Re: st: Draw a random sample of my data...
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Draw a random sample of my data...
Date: Thu, 27 Sep 2012 15:37:14 +0100
First note that
sort countryID year
does nothing useful because you undo it by
by firmID, sort: gen firms = _n
Now focus on that last command. It will sort your data by -firmID- but
precisely which observation comes first within -firmID- is not
reproducible with that syntax. So which observations are selected by
keep if firms == 1
may differ. Nothing that you do afterwards will undo that
indeterminacy. You can ensure consistency by e.g. -sort, stable-, which
keeps tied observations in their existing order.
Here is a demo:
. sysuse auto, clear
. bysort rep78 : gen which = _n == 1
. levelsof make if which
`"AMC Spirit"' `"Cad. Deville"' `"Dodge St. Regis"' `"Pont. Firebird"'
`"Subaru"' `"VW Rabbit"'
. sysuse auto, clear
(1978 Automobile Data)
. bysort rep78 : gen which = _n == 1
. levelsof make if which
`"Buick Century"' `"Chev. Monte Carlo"' `"Ford Fiesta"'
`"Honda Accord"' `"Pont. Firebird"' `"Pont. Phoenix"'
Different -make-s come first.
. sysuse auto, clear
(1978 Automobile Data)
. sort rep78, stable
. by rep78 : gen which = _n == 1
. levelsof make if which
`"AMC Concord"' `"AMC Spirit"' `"Buick Electra"' `"Cad. Eldorado"'
`"Dodge Colt"' `"Olds Starfire"'
. sysuse auto, clear
(1978 Automobile Data)
. sort rep78, stable
. by rep78 : gen which = _n == 1
. levelsof make if which
`"AMC Concord"' `"AMC Spirit"' `"Buick Electra"' `"Cad. Eldorado"'
`"Dodge Colt"' `"Olds Starfire"'
The same -make-s come first each time.
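Applied to the do-file quoted below, the same idea might look like the
following sketch. It is not a tested fix: the file and variable names are
taken from the quoted do-file, and it assumes that -firmID-, -countryID-
and -year- jointly identify observations, which -isid- checks. Sorting on
identifying keys is a different route to the same end as -sort, stable-.
Note also that -by countryID, sort:- in the original is a second sort
with ties, so the sketch breaks those ties on -firmID-. Commands end at
the line break here, so no semicolons are needed.

```stata
* Sketch only: names come from the quoted do-file; the uniqueness
* assumption is checked by -isid-, which stops with an error if it fails.
clear
use all_data8
isid firmID countryID year

* A sort on keys that identify observations has no ties, so the
* resulting order is the same on every run.
sort firmID countryID year
by firmID: gen firms = _n
keep if firms == 1

* firmID is now unique, so it breaks ties within countryID.
bysort countryID (firmID): egen countryfirms = total(firms)

keep if countryID == 244

* With the data order fixed, the same seed should give the same sample.
set seed 260581
sample 63

save usfirms_1, replace
```

If the three variables do not identify observations, add further
variables to the sort until they do, or fall back on -sort, stable-.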
Nick
Mehmet Altun wrote:
> I will code a subset of my data. I used the "sample"
> command. However, I would like to fix my random sample, so that I can
> generate the same sample again. For this I used the "set seed" command.
> However, if I rerun the do-file I get a different random sample
> each time. Here is my do-file:
>
> clear;
> use all_data8;
> sort countryID year;
>
> by firmID, sort: gen firms = _n;
> keep if firms==1;
>
> by countryID, sort: egen countryfirms = total(firms);
>
> keep if countryID==244;
>
> set seed 260581;
>
> sample 63;
>
> save usfirms_1, replace;
>
> Is there a bug in Stata, or what is wrong? Please help.