Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Andrew Dyck <tempmail@andrewdyck.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Randomly picking observations based on a certain condition
Date: Wed, 13 Apr 2011 15:16:36 -0700
If, after you consider the comments from Nick and J, you still wish to proceed with your analysis as you initially stated it, I think the following should work. Here I create some sample data with 50 observations and 5 groups (quintiles). See if this might work for your data the way I understood your question. I use a cutoff of 10 adults instead of 100 to keep the dataset small.

* sample data
set obs 50
egen group = seq(), from(1) to(5)
* round(x, 2) rounds to the nearest multiple of 2, so adults is 0, 2, or 4
gen adults = round( runiform()*5, 2 )

* random variable for sorting
gen r = runiform()

* create a cumulative sum of adults,
* sorting randomly within the group.
* (r) is repeated in the second call so the random order is kept.
bysort group (r): gen cumul_adults = adults[1]
bysort group (r): replace cumul_adults = adults[_n] + cumul_adults[_n-1] if _n > 1
drop r

* keep all obs at or below the cutoff
keep if cumul_adults <= 10

Good luck,
Andrew

On Wed, Apr 13, 2011 at 2:02 PM, Nikhil Srivastava <nikhil.del85@gmail.com> wrote:
>
> I am not trying to actually sample households. As I wrote in my reply
> to Nick, I am trying to look at the effectiveness of a transfer program
> targeted to adults of a household which has a certain exclusion error.
> The exclusion error that we are assuming is that 1 percent of eligible
> participants within each expenditure quintile do not receive the
> benefits. In my sample within the first quintile 1 percent of the
> total adults comes to around 100. Thus for the first quintile I need
> to randomly assign non-beneficiary status to households so that the
> total number of adults for these households comes to 100. Similarly I
> have to pick randomly 1 percent of adults for each quintile and assign
> them non-beneficiary status. In my previous mail I used the number 100
> as an example.
> Thanks
>
> Nikhil
>
> On Wed, Apr 13, 2011 at 1:06 PM, Joerg Luedicke
> <joerg.luedicke@gmail.com> wrote:
> > On Wed, Apr 13, 2011 at 3:17 PM, Nikhil Srivastava
> > <nikhil.del85@gmail.com> wrote:
> >> Hi,
> >>
> >> I have a dataset at the household level which contains the expenditure
> >> details of a sample of households. The dataset also records the number
> >> of adults within each household. I have divided this dataset into 5
> >> quintiles based on the level of expenditure. Now I need to randomly
> >> select a set of observations within each quintile so that the sum of
> >> the adults for those observations comes to 100. Could somebody please
> >> help me in writing a code for this part?
> >>
> >> I would really appreciate any help in this regard. Thanks
> >
> > Do I understand that right, you want to sample households, and within
> > each quintile of household expenditure, the number of household
> > members among sampled households is supposed to add up to 100? Why
> > would you do that? Why not just take a random sample of households
> > or a stratified sample with respect to household size, if that is a
> > concern? That way, you would at least have a clear picture of the
> > population you are targeting, whereas in the other case, this picture
> > becomes pretty blurry, no?
> >
> > J.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/