st: DHS ten years of data
All,
I am trying to create ten years of infant and child mortality data
from a single Demographic and Health Survey (DHS). I originally had a SAS
program that selected the relevant records and variables, and a Stata do
file that then created the two series. With help from several people on
Statalist (thanks) I was able to combine the two into a single
Stata do file. This do file now runs all the way through, but it does not
always produce ten years of data for both series. For example, for the
Bangladesh 1996 survey (individual recode) it produces ten years of data
for one series but only seven years for the other. For the Kenya 1989
survey (individual recode) it produces only a few years of data. Can anyone
take a look at the do file and tell me why this is happening? I have pasted
all three files below.
Any help would be greatly appreciated.
**** DO FILE 1 COMBINED STATA FILE ********
*Explanation of Stata do file and SAS File
set more off
* Let's open a log
log using c:\DHSLOG1, replace
* This do file creates ten years' worth of time series data for infant,
* under-three and child mortality from a Demographic and Health Survey.
* This do file has to be run for each survey. The results are then pasted
* into the panel data time series Excel sheet, which also contains
* information from WDI, and the results are converted back into Stata.
* The file is always loaded from the desktop DHSEX
* Step 1 Setting up Stata
* The next line tells Stata we are using 8.2, because you are using your
* full laptop version of Stata 8, but WED has version 9
version 8.2
* The next command increases the system's memory because we are going to be
* working with a lot of variables
set mem 36m
* The next command increases the maximum matrix size Stata can handle
set matsize 3500
* Step 2 loading the data
* the next command loads the data file. The file has to be put in
use C:\DHSEX.DTA, clear
* Step 3 Recoding
* Recode the variable that tells you whether the woman interviewed is from a
* rural or urban area, so that urban = 1 and rural = 0
recode v102 2 = 0
*rescale the sample weight
replace v005 = v005/1000000
* Step 4 Extracting the relevant information for each child
* The first stage is to keep only the relevant variables, because the
* reshape command crashes if we keep everything. We are only keeping 6 kids
* because this is what Stifel's SPS prog does, but we need to check with
* him that this is right.
sort caseid
g wid=_n
keep caseid v001 v002 v003 v005 v006 v007 v011 v008 v101 v102 ///
    bidx_01-bidx_06 bord_01-bord_06 b0_01-b0_06 b3_01-b3_06 b4_01-b4_06 ///
    b7_01-b7_06 b10_01-b10_06
* The second stage is to create a macro holding the stubs of all the
* variables that we are making long
global varx "bidx_ bord_ b0_ b3_ b4_ b7_ b10_ "
* The third stage is to reshape so that each child is on its own line
reshape long $varx, i(caseid) j(j 01 02 03 04 05 06)
* Let's create variables with new names
gen hcluster = v001
gen hhnumber = v002
gen mother = v003
gen wgt = v005
gen monthint = v006
gen yearint = v007
gen dateint = v008
gen mdob = v011
gen region = v101
gen urban = v102
gen twin = b0_
gen dob_ = b3_
gen sex_ = b4_
gen aged_ = b7_
gen flag_ = b10_
* Now let's drop all variables that are not relevant
keep caseid hcluster hhnumber mother wgt monthint yearint dateint mdob ///
    region urban bidx_ bord_ twin dob_ sex_ aged_
* Now let's drop the empty birth-history slots (no child recorded)
drop if bidx == .
* Now let's create variables for the mother's age at the birth and the child's age
gen magekdob = (dob_ - mdob)/12
gen age = dateint - dob
gen yob = int(dob/12)
* Now let's sort the observations
sort hcluster hhnumber mother bidx
* From here on is Stifel's do file
#delimit ;
*Load dataset and keep only those kids born to mothers whose age was 15-39
at the time of birth;
* Survey year (two digits), hard-coded. survgen derives the ten birth cohorts from it;
scalar year = 96;
gen ageint = dateint - dob;
drop if ageint<12;
keep yob aged magekdob wgt ageint;
drop if magekdob<15 | magekdob>35;
save c:\DHSTEMP1, replace;
*------------------------------------------------*
| |
| Program to create 10 scalars indicating |
| the year in which each cohort of kids |
| was born. |
| |
| For example, if the survey year was 1988, |
| |
| surv1 = 78 |
| surv2 = 79 |
| . |
| . |
| surv10 = 87 |
| |
| Syntax for the program is simply "survgen" |
| |
*------------------------------------------------*;
capture program drop survgen;
program define survgen;
local i = 1;
while `i' <= 10 {;
scalar surv`i' = year - 11 + `i';
local i = `i' + 1;
};
drop if yob == year | yob < surv1;
end;
*------------------------------------------------*
| |
| Program to calculate IMR & CMR rates |
| for cohorts of kids born in each of the |
| 10 years prior to the survey |
| |
| Saves the output as two datasets: |
| c:\DHSTEMP1 -- IMR |
| c:\temp\DHSTEMP2 -- CMR |
| |
| Syntax for the program is simply "cimrgen" |
| |
*------------------------------------------------*;
capture program drop cimrgen;
program define cimrgen;
local i = 1;
local thsnd = 1000;
qui gen imrv=0;
qui replace imrv = 1 if aged <= 12;
qui gen cmrv = 0;
qui replace cmrv = 1 if aged <= 36;
sum imrv cmrv;
while `i' <= 10 {;
matrix yr`i' = surv`i';
qui gen x = imrv;
qui replace x = . if yob ~= surv`i';
qui summ x [aw=wgt], meanonly;
local imrl = r(mean);
matrix imr`i' = `imrl' * `thsnd';
drop x;
if `i' == 2 {;
matrix imr = ( yr1 , imr1 \ yr2 , imr2 );
};
if `i' > 2 {;
matrix imrn = yr`i' , imr`i';
matrix imr = imr \ imrn;
};
if `i' <= 8 {;
qui gen x = cmrv;
qui replace x = . if yob ~= surv`i';
qui summ x [aw=wgt], meanonly;
local cmrl = r(mean);
matrix cmr`i' = `cmrl' * `thsnd';
drop x;
if `i' == 2 {;
matrix cmr = ( yr1 , cmr1 \ yr2 , cmr2 );
};
if `i' > 2 {;
matrix cmrn = yr`i' , cmr`i';
matrix cmr = cmr \ cmrn;
};
};
local i = `i' + 1;
};
matrix colnames imr = year imr;
matrix colnames cmr = year cmr;
drop _all;
svmat cmr, name(col);
sort year;
save c:\DHSTEMP1, replace;
drop _all;
svmat imr, name(col);
sort year;
merge year using c:\DHSTEMP1;
drop _m;
save c:\DHSTEMP1, replace;
list;
graph imr cmr year;
end;
*********************************************;
survgen;
cimrgen;
save c:\DHSEX2, replace;
log close;
clear;
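A note on the cohort arithmetic in survgen, as a minimal sketch rather than part of the do file: all ten scalars are derived from the hard-coded `scalar year = 96`. The snippet below repeats that arithmetic with `forvalues` (a substitution for the `while` loop, written for the default line delimiter rather than `#delimit ;`) and only displays the result.

```stata
* Sketch only: the survgen arithmetic on its own, outside the program.
* 96 is the value hard-coded in the do file above.
scalar year = 96
forvalues i = 1/10 {
    scalar surv`i' = year - 11 + `i'
}
* Show the first and last birth cohort that cimrgen will look for
display "surv1 = " scalar(surv1) "   surv10 = " scalar(surv10)
```

With year left at 96 the cohorts run from 86 to 95. A survey fielded in 1989 cannot contain births in 90-95, so it may be worth checking that `scalar year` is changed for each survey before reading too much into a short series.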
******** DO FILE 2 ORIGINAL SAS FILE
************************************************
/* SAS program to create SAS dataset of Mortality */
/* of children in country (cc) in year (YY) from DHS */
libname user '/home4/ds52/aadata';
libname work1 '/home4/ds52/temp';
options ls=132 ps=54 nocenter;
data work1.temp0;
set user.bdir3afl;
* Recode urban so that urban = 1 and rural = 0;
if v102 = 2 then v102 = 0;
* Rescale the weights;
v005 = v005/1000000;
********************************************
Extract Info Relevant to Each Child
********************************************;
array bidx_a{20} bidx_01-bidx_20;
array bord_a{20} bord_01-bord_20;
array b0_a{20} b0_01- b0_20;
array b3_a{20} b3_01- b3_20;
array b4_a{20} b4_01- b4_20;
array b7_a{20} b7_01- b7_20;
array b10_a{20} b10_01- b10_20;
do i=1 to 6 ;
if bidx_a{i}>0 then do;
hcluster = v001;
hhnumber = v002;
mother = v003;
wgt = v005;
monthint = v006;
yearint = v007;
dateint = v008;
mdob = v011;
region = v101;
urban = v102;
bidx = bidx_a{i};
bord = bord_a{i};
twin = b0_a{i};
dob = b3_a{i};
sex = b4_a{i};
aged = b7_a{i};
flag = b10_a{i};
keep caseid hcluster hhnumber mother wgt monthint yearint dateint
mdob region urban bidx bord twin dob sex aged;
if bidx_a{i}>0 then output;
end;
end;
run;
data work1.temp0;
set work1.temp0;
magekdob = (dob - mdob)/12;
age = dateint - dob;
yob = int(dob/12);
run;
proc sort; by hcluster hhnumber mother bidx; run;
proc contents;
proc means;
weight wgt;
title "Bangladesh 1996 (DHS) Mortality Data of Kids";
run;
libname trn1 xport '/home4/ds52/aadata/bd96mort.v5x'; **;
proc copy in=work1 out=trn1;
select temp0;
run;
endsas;
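Both the SAS step above and the combined Stata file compute `yob = int(dob/12)`. Assuming `b3` is a DHS century month code, that is (year - 1900)*12 + month, the small sketch below shows how that expression treats a December birth compared with `int((cmc - 1)/12)`. The variable names `cmc`, `yob_source`, `yob_alt` and `month` are made up for the illustration, and the three input values are invented.

```stata
* Sketch only: year of birth from a century month code (CMC).
* Assumes CMC = (year - 1900)*12 + month, the usual DHS definition.
clear
input cmc
1141
1152
1153
end
* Expression used in the listings above
gen yob_source = int(cmc/12)
* Alternative that keeps month 12 in the same year
gen yob_alt = int((cmc - 1)/12)
* Month implied by the alternative (1 = January, 12 = December)
gen month = cmc - 12*yob_alt
list
```

In this toy input the second row (December 1995) gets 96 from the first expression and 95 from the second. Whether that shift matters for the series is something to verify against the real data.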
**** DO FILE THREE --- ORIGINAL STATA DO FILE
*********************************************
version 6.0
clear
#delimit ;
set matsize 350;
set more off;
log using c:\dstifel\amort\rates\bd96imr.log, replace;
*************************************************************
Load dataset and keep only those kids born to mothers
whose age was 15-39 at the time of birth
*************************************************************;
scalar year = 96;
use c:\dstifel\amort\data\bd96mort;
gen ageint = dateint - dob;
drop if ageint<12;
keep yob aged magekdob wgt ageint;
drop if magekdob<15 | magekdob>35;
save c:\temp\temp1, replace;
*------------------------------------------------*
| |
| Program to create 10 scalars indicating |
| the year in which each cohort of kids |
| was born. |
| |
| For example, if the survey year was 1988, |
| |
| surv1 = 78 |
| surv2 = 79 |
| . |
| . |
| surv10 = 87 |
| |
| Syntax for the program is simply "survgen" |
| |
*------------------------------------------------*;
capture program drop survgen;
program define survgen;
local i = 1;
while `i' <= 10 {;
scalar surv`i' = year - 11 + `i';
local i = `i' + 1;
};
drop if yob == year | yob < surv1;
end;
*------------------------------------------------*
| |
| Program to calculate IMR & CMR rates |
| for cohorts of kids born in each of the |
| 10 years prior to the survey |
| |
| Saves the output as two datasets: |
| c:\temp\temp1 -- IMR |
| c:\temp\temp2 -- CMR |
| |
| Syntax for the program is simply "cimrgen" |
| |
*------------------------------------------------*;
capture program drop cimrgen;
program define cimrgen;
local i = 1;
local thsnd = 1000;
qui gen imrv=0;
qui replace imrv = 1 if aged <= 12;
qui gen cmrv = 0;
qui replace cmrv = 1 if aged <= 36;
sum imrv cmrv;
while `i' <= 10 {;
matrix yr`i' = surv`i';
qui gen x = imrv;
qui replace x = . if yob ~= surv`i';
qui summ x [aw=wgt], meanonly;
local imrl = r(mean);
matrix imr`i' = `imrl' * `thsnd';
drop x;
if `i' == 2 {;
matrix imr = ( yr1 , imr1 \ yr2 , imr2 );
};
if `i' > 2 {;
matrix imrn = yr`i' , imr`i';
matrix imr = imr \ imrn;
};
if `i' <= 8 {;
qui gen x = cmrv;
qui replace x = . if yob ~= surv`i';
qui summ x [aw=wgt], meanonly;
local cmrl = r(mean);
matrix cmr`i' = `cmrl' * `thsnd';
drop x;
if `i' == 2 {;
matrix cmr = ( yr1 , cmr1 \ yr2 , cmr2 );
};
if `i' > 2 {;
matrix cmrn = yr`i' , cmr`i';
matrix cmr = cmr \ cmrn;
};
};
local i = `i' + 1;
};
matrix colnames imr = year imr;
matrix colnames cmr = year cmr;
drop _all;
svmat cmr, name(col);
sort year;
save c:\temp\temp1, replace;
drop _all;
svmat imr, name(col);
sort year;
merge year using c:\temp\temp1;
drop _m;
save c:\temp\temp1, replace;
list;
graph imr cmr year;
end;
*********************************************;
survgen;
cimrgen;
save c:\dstifel\amort\rates\bd96imr.dta, replace;
log close;
clear;
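To make the rate calculation in cimrgen concrete, here is a minimal sketch on made-up data (the five observations and their weights are invented for illustration). It mirrors the listing in flagging `aged <= 12` and taking an analytic-weighted mean for one birth cohort. Note that in the program the CMR block runs only while `i' <= 8, so the cmr matrix can hold at most eight cohorts, whereas imr can hold ten.

```stata
* Sketch only: weighted infant mortality rate for one birth cohort,
* using invented data. A missing aged is assumed to mean the child is alive.
clear
input yob aged wgt
90  5 1
90  . 1
90 40 2
91  . 1
91 10 3
end
* Stata treats missing as larger than any number, so aged <= 12 is 0 for survivors
gen byte imrv = aged <= 12
* Weighted share of flagged children in cohort 90, scaled to per 1000
summarize imrv [aw=wgt] if yob == 90, meanonly
scalar imr90 = 1000 * r(mean)
display "IMR per 1000 for cohort 90: " scalar(imr90)
```

Here the cohort-90 rate is 250 per 1000: one weighted death out of a total weight of four.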