I am sending this email from Bangladesh.
While working with the Stata survey commands (version 8) on cluster-sample data, we have run into a problem analysing appended files. We have two files with the same variables, collected in two different areas. When we analyse a variable (e.g. age) in each file separately, we get a mean and a 95% CI for each area. When we append the files and then analyse age for the two areas in the appended file, we get the same mean ages but different 95% CIs and design effects for each area compared with the non-appended files. Why should the CIs and design effects change after appending the files? This is not clear to us. We give an example below.
For example, we have two Stata data files, File-1 and File-2. File-1 has 10 observations and File-2 has 15 observations. The files are:
File-1:
id cls age area
1 1 10 1
2 1 25 1
3 2 24 1
4 2 28 1
5 2 19 1
6 3 20 1
7 4 22 1
8 4 30 1
9 4 35 1
10 5 27 1
File-2:
id cls age area
1 1 10 2
2 1 25 2
3 2 24 2
4 2 28 2
5 2 19 2
6 3 20 2
7 4 22 2
8 4 30 2
9 4 35 2
10 5 27 2
11 5 35 2
12 5 36 2
13 3 45 2
14 3 40 2
15 1 52 2
File-1+File-2 (appended):
id cls age area psuid
1 1 10 1 1100
2 1 25 1 1100
3 2 24 1 1200
4 2 28 1 1200
5 2 19 1 1200
6 3 20 1 1300
7 4 22 1 1400
8 4 30 1 1400
9 4 35 1 1400
10 5 27 1 1500
1 1 10 2 2100
2 1 25 2 2100
3 2 24 2 2200
4 2 28 2 2200
5 2 19 2 2200
6 3 20 2 2300
7 4 22 2 2400
8 4 30 2 2400
9 4 35 2 2400
10 5 27 2 2500
11 5 35 2 2500
12 5 36 2 2500
13 3 45 2 2300
14 3 40 2 2300
15 1 52 2 2100
Here id = respondent ID number, cls = cluster code, age = age of respondent, area = study site, and psuid = unique cluster code.
When we append these two files, the estimated mean of age and its standard error are exactly the same in the single data files and in the appended data file, but the 95% CI and the design effect are slightly different. Why is this happening?
We used the following Stata (version 8) commands for this exercise.
For appended data file:
. gen psuid = cls*100 + area*1000
. svyset, strata(area) psu(psuid)
. svymean age, by(area) ci obs deff
For single data file:
. svyset, psu(cls)
. svymean age, ci deff obs
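In case it helps anyone reproduce the problem, the appended file can be built roughly as follows. This is only a sketch: file1 and file2 are placeholder names for the two area files, not the actual file names, and the psuid and svyset commands given above for the appended file are run afterwards.

```stata
* Sketch only: file1.dta and file2.dta are placeholder file names
use file1, clear
append using file2

* Check that both areas are present (10 and 15 observations expected)
tabulate area
```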
Is there anyone who can help us with this issue?
Masud Reza
Operations Researcher (Statistician)
HIV/AIDS Program
ICDDRB:Centre for Health and Population Research (http://www.icddrb.org/)
Dhaka
Bangladesh