Dear Statalisters,
thanks to Brian Poi, some days ago I solved a problem in drawing random
samples from a given dataset with Stata 9.2/SE.
I would like to share Brian's kind reply with anyone who might be interested in
the same topic.
I also take the chance to thank Martin Weiss one more time for his precious
support along the way.
Kind Regards,
Carlo
-----Original Message-----
From: Brian P. Poi [mailto:[email protected]]
Sent: Monday, 28 September 2009 18:31
To: Carlo Lazzaro
Subject: Re: R: st: odd results after insample
> I take the chance to ask you whether Stata 9.2 SE (I don't know about other
> more recent releases) can be programmed to run -sample- repeatedly (and not
> just one time) for drawing, say, 10,000 random samples from a given dataset,
Yes, you could do
. sysuse auto
. sample
. sample
. sample
or put -sample- in a -forvalues- loop. But you'd have a hard time
convincing me that's the right thing to do.
Or, do you mean something like this:
set seed 1
sysuse auto
gen mean = .
quietly forvalues i = 1/74 {
    preserve
    sample 50
    summ mpg
    scalar mpgm = r(mean)
    restore
    replace mean = mpgm in `i'
}
su mean
di %20.16f r(mean)
That is perfectly valid, as long as you keep in mind that -sample- samples
without replacement. On the other hand,
sysuse auto,clear
set seed 1
bootstrap mu = r(mean), size(50) reps(74) saving(mybs, replace): summ mpg
use mybs, clear
summ mu
di %20.16f r(mean)
will give you a slightly different answer because -bootstrap- samples with
replacement.
Thus, the $64,000 question is whether you want to sample with or without
replacement.
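
A minimal sketch of the two choices side by side, not part of Brian's reply: the loop below draws 50 observations per replicate, once with -sample ..., count- (without replacement) and once with -bsample- (with replacement). It also removes one difference between the two examples above: -sample 50- keeps 50 percent of the data (37 of the 74 cars), whereas size(50) in -bootstrap- requests 50 observations. The variable names mean_wor and mean_wr and the scalar names m_wor and m_wr are made up for illustration.

```stata
* Sketch only: compare sampling without and with replacement, n = 50 each.
* mean_wor, mean_wr, m_wor and m_wr are hypothetical names.
sysuse auto, clear
set seed 1
gen double mean_wor = .
gen double mean_wr  = .
quietly forvalues i = 1/74 {
    preserve
    * keep 50 observations, drawn without replacement
    sample 50, count
    summarize mpg
    scalar m_wor = r(mean)
    * reload the full dataset but keep it preserved
    restore, preserve
    * draw 50 observations with replacement
    bsample 50
    summarize mpg
    scalar m_wr = r(mean)
    restore
    replace mean_wor = scalar(m_wor) in `i'
    replace mean_wr  = scalar(m_wr)  in `i'
}
summarize mean_wor mean_wr
```

With the sample size held equal, any remaining gap between the two columns comes from the sampling scheme (and from chance), not from the number of observations per draw.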
*************************************************************************
Brian P. Poi, Ph.D.
Senior Economist
StataCorp LP
4905 Lakeway Drive
College Station, TX 77845
[email protected]
*************************************************************************
On Mon, 28 Sep 2009, Carlo Lazzaro wrote:
> Dear Brian,
> thanks a lot for your kind reply. I was actually banging my head against the
> wall in trying to understand what went wrong with my code lines and you shed
> light on this.
> I take the chance to ask you whether Stata 9.2 SE (I don't know about other
> more recent releases) can be programmed to run -sample- repeatedly (and not
> just one time) for drawing, say, 10,000 random samples from a given dataset,
> no matter the underlying distribution: in fact, this is the need I am
> currently facing.
> I am very fond of -simulate- as far as my programming skills allow me to
> invoke it, but it requires Stata users to know (or to mimic) the underlying
> distribution of the population.
>
> Thanks a lot again for your kindness and for your time.
>
> Kind Regards,
> Carlo
> -----Original Message-----
> From: Brian P. Poi [mailto:[email protected]]
> Sent: Monday, 28 September 2009 16:07
> To: Carlo Lazzaro
> Subject: Re: st: odd results after insample
>
> Carlo,
>
> I don't think anyone on statalist actually answered the question of why
> your code doesn't produce 2000 observations like you expect. It had me
> stumped for a bit, so I just had to try the code myself to figure it out.
>
> Here's why. In the first part of your loop you randomly sort the data and
> summarize the first 20 observations. In the second part of your loop you
> try and store the mean and standard deviation in the `i'th observation,
> assuming that `i' runs from 1 to 2000 so that you will fill in the 1st
> observation, then the 2nd, and so on up to the 2000th. But that won't
> work, because in every iteration of the loop you change the order of your
> data. Therefore, you essentially are sticking the mean and s.d. into a
> random observation of your dataset. Given the luck of the draw, some
> observations of ln_g_20 are being filled in more than once, and others
> never do get filled in like you expect.
>
> Also, note that because you generate A for only 972 observations, your
> mean and s.d. will on average be computed using (972/2000)*20 = 9.72
> observations, not 20 observations.
>
> You could make your loop work with -preserve- and -restore, preserve-
> statements or perhaps with some contorted logic, but it's easier to just
> let -simulate- do it.
>
> On Sat, 26 Sep 2009, Carlo Lazzaro wrote:
>
>> Dear Statalisters,
>> as an alternative to -simulate-, I have written the following do file
>> (for Stata 9.2/SE) to draw 2000 random samples, 20 observations each, from a
>> normal distribution:
>>
>> drop _all
>> set more off
>> set obs 2000
>> obs was 0, now 2000
>> g double ln_g_20=.
>> g double ln_sd_g_20=.
>> set seed 999
>> qui gen A=5.37 + 1.19*invnorm(uniform()) in 1/972
>> qui forvalues i = 1(1)2000 {
>> qui gen ln_20`i'=A
>> qui generate random`i' = uniform()
>> qui sort random`i'
>> qui generate insample`i' = _n <= 20
>> qui sum ln_20`i' if insample`i' == 1
>> replace ln_g_20=r(mean) in `i'
>> replace ln_sd_g_20=r(sd) in `i'
>> drop ln_20`i'
>> drop random`i'
>> drop insample`i'
>> }
>> drop A
>>
>> However, as a result I have obtained 1271 observations instead of the
>> expected 2000.
>>
>> sum ln_g_20 ln_sd_g_20
>>
>>     Variable |       Obs        Mean    Std. Dev.       Min        Max
>> -------------+--------------------------------------------------------
>>      ln_g_20 |      1271    5.314033    .3800687    3.79247   6.587941
>>   ln_sd_g_20 |      1271    1.101084    .2835007   .0260279   2.161299
>>
>>
>> Besides, results are even more puzzling when I increase the number of
>> samples (again 20 observations each), in that I get a different number of
>> observations for ln_g and ln_sd_g.
>>
>> Comments are gratefully acknowledged.
>>
>> Thanks a lot for your kindness and for your time.
>>
>> Kind Regards,
>> Carlo
>>
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
>>
>
>
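
Brian's first reply (quoted above) suggests letting -simulate- do the work instead of patching the loop. A minimal sketch of that route for the original task, 2000 samples of 20 draws each from a normal distribution with mean 5.37 and s.d. 1.19, might look as follows. The program name draw20 is hypothetical, and the code is an illustration under those assumptions rather than the code anyone in the thread actually ran.

```stata
* Sketch only: one replicate = 20 normal draws, summarized and returned in r().
* draw20 is a hypothetical program name.
capture program drop draw20
program define draw20, rclass
    version 9.2
    drop _all
    set obs 20
    tempvar x
    gen double `x' = 5.37 + 1.19*invnorm(uniform())
    summarize `x'
    return scalar mean = r(mean)
    return scalar sd   = r(sd)
end

drop _all
set seed 999
* -simulate- calls draw20 2000 times and stores one row per replicate
simulate ln_g_20 = r(mean) ln_sd_g_20 = r(sd), reps(2000) nodots: draw20
summarize ln_g_20 ln_sd_g_20
```

Because each replicate builds its own 20 observations and hands its results back in r(), nothing depends on the sort order of a shared dataset, which is what tripped up the original loop.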