Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Question about the threshold in subsample size
From
"Yigit Aydede" <[email protected]>
To
"statalist" <[email protected]>
Subject
st: Question about the threshold in subsample size
Date
Fri, 06 Sep 2013 19:48:08 +0000
Hello,
I apologize for asking what may seem like a simple question; if anybody can help me with this, I would greatly appreciate it.
My dataset is too big to run clogit (fixed-effects logit) in Stata. I have more than 800K observations across 282 regions (clusters). My dependent variable is 1 for people who moved across regions (movers, 4% of the total) and 0 for non-movers.
If I reduce the data size by subsampling, I can run clogit on all 282 regions.
Since the success rate is only 4 percent, I would like to subsample with
sample 20 if moved==0, by(region)
where moved is the dependent variable.
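To make this sampling scheme concrete, here is a minimal Python sketch of what the command above does, assuming a simplified dataset of hypothetical (region, moved) pairs. The function name `sample_nonmovers` is hypothetical, and the rounding of the per-region count is an assumption, not Stata's exact rule. As with Stata's `sample ... if`, rows that fail the condition (here, the movers) are all kept.

```python
import random
from collections import defaultdict


def sample_nonmovers(rows, pct, seed=0):
    """Keep every mover; keep about pct% of the non-movers within each region.

    Hypothetical rough analogue of Stata's `sample pct if moved==0, by(region)`.
    rows: list of (region, moved) tuples with moved in {0, 1}.
    """
    rng = random.Random(seed)
    nonmovers = defaultdict(list)
    kept = []
    for row in rows:
        region, moved = row
        if moved == 1:
            kept.append(row)  # outside the `if` condition: always kept
        else:
            nonmovers[region].append(row)
    for group in nonmovers.values():
        k = round(len(group) * pct / 100)  # per-region count; rounding is an assumption
        kept.extend(rng.sample(group, k))  # draw without replacement
    return kept


# Made-up data: two regions, each with 5 movers and 120 non-movers.
rows = [("A", 1)] * 5 + [("A", 0)] * 120 + [("B", 1)] * 5 + [("B", 0)] * 120
kept = sample_nonmovers(rows, 20)
movers = sum(moved for _, moved in kept)
print(len(kept), movers)  # 58 10
```

With the figures in the post (taking 800K observations as a round number and 4% movers), this scheme keeps all of the roughly 32,000 movers plus about 20% of the 768,000 non-movers (about 153,600), so movers rise from 4% to roughly 17% of the subsample (32,000 / 185,600 ≈ 0.17).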
In other words, I subsample only the non-movers and keep all of the movers. My model tries to identify the determinants of the decision to move, so I include a set of variables controlling for individual characteristics. It seems to me that subsampling only the non-movers reduces their influence on the estimates. Am I right?
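A standard result for the logit model bears on this question. As a sketch, assume the log-odds of moving for person i in region r is \alpha_r + x_i'\beta (a region intercept \alpha_r plus covariates x_i), that all movers are kept, and that each non-mover is kept with a fixed probability s (here s = 0.2) that does not depend on x_i. Then, among the retained observations:

```latex
\Pr(y_i = 1 \mid x_i, \text{kept})
  = \frac{\Pr(y_i = 1 \mid x_i)}{\Pr(y_i = 1 \mid x_i) + s\,\Pr(y_i = 0 \mid x_i)}
  = \frac{e^{\alpha_r + x_i'\beta}}{e^{\alpha_r + x_i'\beta} + s}
  = \frac{e^{(\alpha_r - \ln s) + x_i'\beta}}{1 + e^{(\alpha_r - \ln s) + x_i'\beta}}
```

Under these assumptions, subsampling non-movers only shifts each region's intercept by -\ln s. Because clogit conditions the region intercepts out, the slope coefficients \beta are not shifted; the cost is lower precision from having fewer non-movers, not bias. If the sampling probability depended on the covariates, this argument would no longer apply.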
Alternatively, I could subsample everyone:
sample 20, by(region)
I picked 20 here because a 20% sample is small enough for Stata to handle in clogit.
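The alternative command can be sketched the same way. Again this is a hypothetical Python analogue of `sample 20, by(region)` on made-up (region, moved) pairs, with an assumed rounding rule; the function name `sample_all` is illustrative only. It keeps about 20% of every region regardless of `moved`.

```python
import random
from collections import defaultdict


def sample_all(rows, pct, seed=0):
    """Keep about pct% of all rows within each region, movers and non-movers alike.

    Hypothetical rough analogue of Stata's `sample pct, by(region)`.
    rows: list of (region, moved) tuples with moved in {0, 1}.
    """
    rng = random.Random(seed)
    by_region = defaultdict(list)
    for row in rows:
        by_region[row[0]].append(row)
    kept = []
    for group in by_region.values():
        k = round(len(group) * pct / 100)  # per-region count; rounding is an assumption
        kept.extend(rng.sample(group, k))  # draw without replacement
    return kept


# Same made-up data: two regions, each with 5 movers and 120 non-movers.
rows = [("A", 1)] * 5 + [("A", 0)] * 120 + [("B", 1)] * 5 + [("B", 0)] * 120
kept = sample_all(rows, 20)
print(len(kept))  # 50
```

Here the mover share stays at about 4% in expectation, but in an 800K-observation dataset only about 6,400 of the 32,000 movers (20%) survive, whereas the non-mover-only scheme keeps all of them. With a rare outcome, the number of movers typically limits precision more than the number of non-movers does.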
Is there a "right" way to determine the threshold size for the subsample, instead of just using 20%?
Thank you for your time and help. Any advice is much appreciated.
Best,
Yigit Aydede
Saint Mary's University
Halifax, NS, B3H 3C3
Canada