Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: convergence of sample mean using gsample with weights
From: tshmak <[email protected]>
To: "[email protected]" <[email protected]>
Subject: st: RE: convergence of sample mean using gsample with weights
Date: Wed, 15 May 2013 16:32:20 +0800
Hi,
I'm not exactly sure what you're trying to do here. Perhaps if you explain what `wt', `wtn', and `wtn2' are, that might help.
Intuitively, though, it looks like you're adding `meannew' - `meanold' to `res' on every pass through the loop.
Let X_i be the random variable `meannew' - `meanold' in loop i. Then
res = sum_i X_i
If the X_i are independent across loops, then:
E(res) = sum_i E(X_i)
Var(res) = sum_i Var(X_i)
Since you're sampling from your original data, let's say
E(X_i) = m, which is `meannew' - `meanold' from your original data (appropriately weighted)
Suppose:
Var(X_i) = Var(X) for all i
then
Var(res) = N0 Var(X)
Var(res/N0) = Var(X)/N0
E(res/N0) = m
Therefore, res/N0 should converge to m as N0 grows, because its variance Var(X)/N0 shrinks toward zero.
Is that what's happening?
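The argument above can be checked with a quick simulation. The sketch below does not use the original data: it assumes each loop's difference X_i is an independent normal draw with a made-up mean m = 0.09 and standard deviation 0.05 (hypothetical values chosen only for illustration), and it tracks the running average res/i.

```stata
* Simulation sketch of the argument above (illustrative values, not real data)
clear
set seed 12345
local N0 = 5000
* assumed mean and standard deviation of X_i = meannew - meanold (hypothetical)
local m  = 0.09
local sd = 0.05
set obs `N0'
* one simulated difference per loop
gen double x = rnormal(`m', `sd')
* running average res/i after i loops
gen double runavg = sum(x) / _n
list runavg if inlist(_n, 10, 100, 1000, `N0'), noobs
* standard error implied by Var(res/N0) = Var(X)/N0
di "theoretical s.e. of res/N0 = " `sd' / sqrt(`N0')
```

Under these assumptions the running average settles near the assumed m rather than near zero, so adding iterations reduces the noise in res/N0 but does not shrink m itself.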
Tim
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Olga Gorbachev
Sent: 14 May 2013 01:51
To: [email protected]
Subject: st: convergence of sample mean using gsample with weights
Dear List servers,
We are trying to match the means of a subsample, randomly generated
using gsample with weights, to those of the original sample, but we
have not been successful: the differences in means persist even
after more than 5000 iterations.
The program we are running to generate a random sample and the table
of differences in means are below:
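// running sum of (meannew - meanold) over N0 iterations
local res = 0
local N0 = 1000
di "i = " _c
forv i = 1/`N0' {
    di " `i'" _c
    // rebuild the resampled weight variable on every iteration
    cap: drop wtn2
    qui: gen wtn2 = .
    qui: levelsof year, local(years)
    foreach yr of local years {
        // pct = 1 minus the weighted mean of work in this year
        su work [aw = wt] if year == `yr', meanonly
        local pct = 1 - r(mean)
        // n = number of working observations with positive weight, scaled by pct
        qui: count if year == `yr' & work & wt > 0
        local n = r(N) * `pct'
        // draw a weighted subsample of workers; the result is stored in smpl
        gsample `n' if year == `yr' & work [aw = wtn], gen(smpl) replace
        // qui: gen smpl`yr' = smpl
        qui: replace wtn2 = wtn * smpl if year == `yr'
    }
    // weighted mean of nokid in 2009 using the resampled weights
    su nokid if year == 2009 [aw = wtn2], meanonly
    local meannew = r(mean)
    // weighted mean of nokid in 2009 among non-workers, original weights
    su nokid if year == 2009 & !work [aw = wt], meanonly
    local meanold = r(mean)
    local res = `res' + `meannew' - `meanold'
}
// average difference across iterations
di `res' / `N0'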
After 5751 iterations, the differences in means persist (white,
nokid, and wife are 0/1 dummies; ed is a 0/1/2/3 categorical variable):
year   white       ed          nokid       wife        age         RMSE
1968   .07075803   .02760528   -.07051057  .10025028   -1.9695697  .64914917
1969   .0685191    .0043999    .00714388   .07798387   -1.0421818  .44748337
1970   .06611464   .05476483   -.02358097  .077666     -1.8464169  .76425403
1971   .06971375   .02524083   -.04641226  .07669907   -1.9877308  .84812203
1972   .06842085   .00459005   -.01252929  .07209953   -1.4438688  .58143546
1973   .07875147   -.0065409   -.00719551  .0762982    -.84213075  .76031927
1974   .07394796   .01265153   .03503028   .06037437   .47233679   .45948809
1975   .0754228    -.02080965  .04125415   .06711441   1.3878045   .44676919
1976   .07922582   -.0270845   .0703499    .08149621   3.0252375   1.2009947
1977   .07757246   -.06248362  .13932287   .05320747   4.381814    1.9495654
1978   .0712201    -.10770348  .09020478   .07284452   3.3190634   1.0499406
1979   .0867201    -.11178738  .11253834   .07264209   2.972378    1.5287306
1980   .07419313   -.03035967  .13589319   .06733552   4.0215276   1.7365936
1981   .07878431   -.01949136  .17420796   .04241048   5.3660359   2.8373346
1982   .0829203    -.11727873  .17645938   .05927291   5.4543346   2.3774178
1983   .07845573   -.10130641  .09725345   .0687734    2.3112865   1.4648557
1984   .09015502   -.07159415  .09821572   .0418674    4.0170757   2.2326577
1985   .07475118   -.17578234  .15213582   .06892136   4.8550365   2.4831803
1986   .09253893   -.20126191  .16269138   .06200215   4.8770742   1.7727842
1987   .08100237   -.17625041  .14548996   .05864067   3.6014147   1.6212222
1988   .10134555   -.08595601  .20253243   .08289725   6.7326522   2.8191897
1989   .08155963   -.10436591  .15625212   .02222005   3.9071876   1.0379599
1990   .09724568   .03089819   .1811577    .08476095   4.8926164   2.3862564
1991   .08948172   -.03575608  .2627551    .08514362   8.0915346   3.8331833
1992   .08865055   -.1055572   .25235049   .10462895   7.4632178   3.3645744
1993   .0951815    -.07997661  .17046405   .06604482   4.2573245   2.0752016
1994   .04873715   -.16646878  .07550069   .03458139   .98317415   .39188131
1995   .06876277   -.13850863  .13320029   .02768267   2.4135662   .66990443
1996   .00856876   -.21758791  .08564262   -.00698818  1.4966446   .57737936
1997   .03627838   -.15611398  .15043455   .05398246   1.6478452   1.009571
1999   .11814375   -.00525869  .02250082   .08790646   .80302795   .41872088
2001   .08085215   .03209268   .00536218   .03539566   .28864816   .08032318
2003   .01760212   -.0463809   .07889079   .03968931   3.0012058   2.1627047
2005   .01684409   -.07215183  .09026966   .0235811    2.3966124   .67002878
2007   .03959067   .03748774   .09446534   .06242606   2.9837086   .87488072
2009   .02200718   -.04037616  .05716718   .05698124   2.5813597   1.7555616
Total  .08024411   -.08345948  .0901088    .06647665   2.531596    1.108369
Does it make sense that the means don't converge? Is there a way to
force the random subsample to have the same means as the main data
set?
Thank you in advance,
--
Olga Gorbachev
Assistant Professor of Economics
University of Delaware
Newark, DE 19716
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/