Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Draw a random sample of my data...
From: [email protected]
To: [email protected]
Subject: Re: st: Draw a random sample of my data...
Date: Thu, 4 Oct 2012 01:46:16 +0200 (CEST)
Thank you very much, Nick, for your answer. The "stable" option
solved my problem. However, a new question has emerged.

I use the commands "set seed" and "sample" to generate a new dataset
containing a random sample of US firms, but I still have problems
integrating this random sample into the original panel data. The reason
for sampling is that US firms account for more than 50% of the dataset,
which affects the cross-country results very strongly, whereas US firms
account for only 29% of worldwide industry business volume. I therefore
draw a random sample of the US firms so that they account for 29%.

My panel data have countryID, firmID and years. After setting the seed
and drawing the random sample, I would like to merge the randomly
generated dataset of US firms (with randomly selected firmID and years)
with my original panel data (with countryID, firmID and years). But how
can I merge the datasets so that only the randomly sampled US firms are
kept (with their additional years from the original panel dataset) and
the other US firms are dropped? How can I generate a variable that marks
only the randomly sampled US firms, so that only they are considered
within the original panel dataset for all years?

Please help. Thank you in advance.

Mehmet Altun
My commands look like:

use all_data8;
by firmID, sort: gen firms = _n;
keep if firms==1;
keep if countryID==244; /* USA */
sort firmID, stable;
set seed 260581;
sample 63;
sort year;
save usfirms_1, replace;
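
One way this could be done, as a minimal sketch only: save just the
sampled US firmIDs together with a marker variable, merge them back onto
the full panel by firmID, and drop the US firms that were not sampled.
The variable name insample is hypothetical. The sketch assumes that
firmID identifies each firm uniquely across countries and that Stata 11
or later is available for the merge m:1 syntax. It is meant to be run
from a do-file with the default line delimiter rather than #delimit ;.

```stata
* Step 1: one row per US firm, then a reproducible 63% sample
use all_data8, clear
keep if countryID == 244        // USA
keep firmID
duplicates drop                 // firmID is now unique, so sorting on it is reproducible
sort firmID
set seed 260581
sample 63
gen byte insample = 1           // hypothetical marker for the sampled US firms
save usfirms_1, replace

* Step 2: bring the marker back to the full panel (all years)
use all_data8, clear
merge m:1 firmID using usfirms_1, keep(master match) nogenerate

* Step 3: keep every non-US firm and only the sampled US firms
keep if countryID != 244 | insample == 1
```

Because the merge is on firmID alone, every year of a sampled US firm
receives insample == 1, while the other US firms have insample missing
and are dropped in the last step.
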
> First note that
>
> sort countryID year
>
> does nothing useful because you undo it by
>
> by firmID, sort: gen firms = _n
>
> Now focus on that last command. It will sort your data by -firmID- but
> precisely which observation comes first within -firmID- is not
> reproducible with that syntax. So which observations are selected by
>
> keep if firms == 1
>
> may differ. Nothing that you do afterwards will undo that
> indeterminacy. You can ensure consistency by e.g. -sort, stable-.
>
> Here is a demo:
>
> . sysuse auto, clear
>
> . bysort rep78 : gen which = _n == 1
>
> . levelsof make if which
> `"AMC Spirit"' `"Cad. Deville"' `"Dodge St. Regis"' `"Pont. Firebird"'
> `"Subaru"' `"VW Rabbit"'
>
> . sysuse auto, clear
> (1978 Automobile Data)
>
> . bysort rep78 : gen which = _n == 1
>
> . levelsof make if which
> `"Buick Century"' `"Chev. Monte Carlo"' `"Ford Fiesta"' `"Honda
> Accord"' `"Pont. Firebird"' `"Pont. Phoenix"'
>
> Different -make-s come first.
>
> . sysuse auto, clear
> (1978 Automobile Data)
>
> . sort rep78, stable
>
> . by rep78 : gen which = _n == 1
>
> . levelsof make if which
> `"AMC Concord"' `"AMC Spirit"' `"Buick Electra"' `"Cad. Eldorado"'
> `"Dodge Colt"' `"Olds Starfire"'
>
> . sysuse auto, clear
> (1978 Automobile Data)
>
> . sort rep78, stable
>
> . by rep78 : gen which = _n == 1
>
> . levelsof make if which
> `"AMC Concord"' `"AMC Spirit"' `"Buick Electra"' `"Cad. Eldorado"'
> `"Dodge Colt"' `"Olds Starfire"'
>
>
> Nick
>
> Mehmet Altun
>
>> I will code a subset of my data. I used the "sample" command.
>> However, I would like to fix my random sample so that I can
>> generate the same sample again. For this I used the "set seed" command.
>> However, if I rerun the do-file I get different random samples.
>> Here is my do-file:
>>
>> clear;
>> use all_data8;
>> sort countryID year;
>>
>> by firmID, sort: gen firms = _n;
>> keep if firms==1;
>>
>> by countryID, sort: egen countryfirms = total(firms);
>>
>> keep if countryID==244;
>>
>> set seed 260581;
>>
>> sample 63;
>>
>> save usfirms_1, replace;
>>
>>
>>
>> Is there a bug in Stata, or what is wrong? Please help.
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
>