I am working with Stata 8 and survey data (DHS data), studying the fertility behaviour of families. I have complete birth history data for each family in the sample. I wish to test the following hypothesis: girls have, on average, a larger number of siblings than boys.
This is how I proceed. I calculate the number of boys and girls in each family (*nboy* and *ngirl*); then, I do:
quietly gen alive = nboy + ngirl
quietly gen sibg = (alive - 1) if ngirl > 0
quietly gen sibb = (alive - 1) if nboy > 0
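To make these three lines concrete, here is a minimal sketch on a toy dataset. The three families and the identifier famid are made up for illustration and are not from the DHS data; the sketch also assumes one row per family, which is how I have set up my file.

```stata
* Toy data: three hypothetical families, one row per family (assumption)
clear
input famid nboy ngirl
1 2 1
2 0 3
3 1 0
end

* Same construction as above
gen alive = nboy + ngirl
gen sibg = (alive - 1) if ngirl > 0
gen sibb = (alive - 1) if nboy > 0

* sibg is missing where there is no girl, sibb is missing where there is no boy
list famid nboy ngirl alive sibg sibb
```

With one row per family, each family enters the mean of sibg or sibb once, however many girls or boys it has.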
Thus, *sibg* is the number of siblings for girls and *sibb* is the number of siblings for boys. Then, I do:
gen smpwt = v005/1000000
svyset [pweight=smpwt], psu(v021) strata(v022)
svymean sibg, subpop(ngirl)
matrix t1 = e(b)
matrix t2 = e(V)
local t11 = e(N)
svymean sibb, subpop(nboy)
matrix t3 = e(b)
matrix t4 = e(V)
local t33 = e(N)
gen sibeff = t1[1,1] - t3[1,1]
local g1 = (t1[1,1] - t3[1,1])/sqrt((t2[1,1]/`t11')+(t4[1,1]/`t33'))
Thus, *sibeff* gives me the difference between the average number of siblings for girls and for boys, and *g1* gives me the t-statistic for testing whether *sibeff* is significantly different from zero.
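To show the arithmetic in the last line, here is a minimal sketch with made-up numbers (they are not my DHS estimates). My understanding is that e(V) after an estimation command holds the variance-covariance matrix of the estimates, so t2[1,1] and t4[1,1] may already be squared standard errors of the means. The sketch therefore compares the statistic as I compute it with a version that does not divide by e(N) again. The scalar names t_posted and t_plain are only for this illustration, and both versions ignore any covariance between the two means.

```stata
* Made-up numbers for illustration only; these are NOT my DHS estimates
matrix t1 = (3.40)
matrix t2 = (0.0016)
local t11 = 5000
matrix t3 = (3.25)
matrix t4 = (0.0009)
local t33 = 5200

* Statistic as computed in my code: each variance is divided by e(N)
scalar t_posted = (t1[1,1] - t3[1,1])/sqrt((t2[1,1]/`t11')+(t4[1,1]/`t33'))

* Alternative: treat e(V) as already being the squared standard error
scalar t_plain = (t1[1,1] - t3[1,1])/sqrt(t2[1,1] + t4[1,1])

display "as in my code:     " t_posted
display "without extra 1/N: " t_plain
```

With sample sizes in the thousands, the first version comes out far larger than the second, which would be consistent with what I am seeing.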
The t-statistic I get is much larger than I expected; it is much smaller if I do not correct for the survey design and simply assume that I have a simple random sample. This makes me a little suspicious. My questions:
1) Am I making any mistake in my computation or reasoning?
2) Is there a better way to conduct this t-test?
I looked at: http://www.ats.ucla.edu/STAT/stata/faq/svyttest.htm
but did not find it useful.
Thanks in advance.
Deepankar