Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Creating a data subset with subjects chosen at random
From: Amal Khanolkar <[email protected]>
To: "[email protected]" <[email protected]>
Subject: st: Creating a data subset with subjects chosen at random
Date: Fri, 15 Jun 2012 13:15:47 +0000
Hello all,
I have a large dataset with almost 3 million observations. The following tabulation shows the subjects' country of origin (mother's country of birth):
mother's country of |
              birth |      Freq.     Percent        Cum.
--------------------+-----------------------------------
             Sweden |  2,593,143       86.69       86.69
Western Europe + NA |     71,736        2.40       89.09
            Finland |    108,326        3.62       92.71
     Eastern Europe |     15,636        0.52       93.23
             Poland |     18,179        0.61       93.84
      F. Yugoslavia |     34,110        1.14       94.98
        Arab league |      8,687        0.29       95.27
               Iraq |     13,004        0.43       95.71
            Lebanon |     12,295        0.41       96.12
            Somalia |      7,122        0.24       96.36
              Syria |      9,360        0.31       96.67
             Turkey |     22,083        0.74       97.41
               Iran |     11,717        0.39       97.80
         South Asia |      9,341        0.31       98.11
   Ethiopia+Eritrea |      6,917        0.23       98.34
          East asia |     23,162        0.77       99.12
      Latin America |     10,111        0.34       99.46
              Chile |     10,512        0.35       99.81
             Africa |      5,759        0.19      100.00
--------------------+-----------------------------------
              Total |  2,991,200      100.00
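Keeping this distribution in a 100,000-subject subset means scaling each frequency by 100,000 / 2,991,200 (about 3.343%); Sweden, for example, would contribute about 86,692 subjects. Rounding each country's share independently need not add up to exactly 100,000, so here is a minimal Python sketch (an illustration of the arithmetic, not a Stata command) that assumes largest-remainder rounding. The function name `allocate` is hypothetical; the frequencies are copied from the table above.

```python
# Proportional allocation of a 100,000-subject sample across the strata
# (mother's country of birth) in the tabulation above.
# Largest-remainder rounding is an assumed choice so the quotas sum exactly.

FREQS = {
    "Sweden": 2_593_143, "Western Europe + NA": 71_736, "Finland": 108_326,
    "Eastern Europe": 15_636, "Poland": 18_179, "F. Yugoslavia": 34_110,
    "Arab league": 8_687, "Iraq": 13_004, "Lebanon": 12_295,
    "Somalia": 7_122, "Syria": 9_360, "Turkey": 22_083, "Iran": 11_717,
    "South Asia": 9_341, "Ethiopia+Eritrea": 6_917, "East asia": 23_162,
    "Latin America": 10_111, "Chile": 10_512, "Africa": 5_759,
}


def allocate(freqs: dict[str, int], sample_size: int) -> dict[str, int]:
    """Split sample_size across strata in proportion to freqs.

    Each stratum first gets the floor of its exact share; the units left
    over go, one each, to the strata with the largest fractional remainders.
    """
    total = sum(freqs.values())
    # Exact integer arithmetic for freq * sample_size / total
    quotas = {k: f * sample_size // total for k, f in freqs.items()}
    remainders = {k: f * sample_size % total for k, f in freqs.items()}
    leftover = sample_size - sum(quotas.values())
    # Hand out the leftover units to the largest remainders
    for k in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[k] += 1
    return quotas


if __name__ == "__main__":
    quotas = allocate(FREQS, 100_000)
    for country, n in quotas.items():
        print(f"{country:>20} {n:>7,}")
    print(f"{'Total':>20} {sum(quotas.values()):>7,}")
```

Each quota differs from its exact proportional share by less than one subject, so the subset's percentages match the parent table to within rounding.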
- I would like to create a subset of this dataset that (1) consists of 100,000 subjects, (2) has the same distribution of subjects by country of origin as the parent dataset, and (3) is drawn at random by Stata. The dataset of course has several other variables, but I would like to define the subset by country of origin, as it is my main exposure variable.
- Any ideas on how to go about doing this?
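Taken together, the three requirements describe a stratified random sample with proportional allocation: fix how many subjects each country contributes, then draw that many at random, without replacement, within each country. The sketch below is in Python and only illustrates that logic; it is not a Stata command. The function name `stratified_sample`, its parameters, and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(subjects, stratum_of, quotas, seed=None):
    """Draw quotas[s] subjects at random, without replacement, from each stratum s.

    subjects:   list of records (any type)
    stratum_of: function mapping a record to its stratum label
    quotas:     dict mapping stratum label to the number of subjects to keep
    """
    rng = random.Random(seed)  # seeded generator, so the draw is reproducible
    by_stratum = defaultdict(list)
    for rec in subjects:
        by_stratum[stratum_of(rec)].append(rec)
    chosen = []
    for stratum, n in quotas.items():
        pool = by_stratum.get(stratum, [])
        if n > len(pool):
            raise ValueError(f"quota {n} exceeds {len(pool)} subjects in {stratum!r}")
        # Simple random sample without replacement within this stratum
        chosen.extend(rng.sample(pool, n))
    return chosen


if __name__ == "__main__":
    # Hypothetical toy data: 900 subjects from "Sweden", 100 from "Finland"
    toy = [{"id": i, "country": "Sweden" if i < 900 else "Finland"} for i in range(1000)]
    subset = stratified_sample(toy, lambda r: r["country"], {"Sweden": 90, "Finland": 10}, seed=2012)
    print(len(subset), sum(r["country"] == "Finland" for r in subset))
```

Fixing the seed makes the subset reproducible, which matters if the analysis has to be rerun on the same 100,000 subjects.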
Thanks!
Regards,
/Amal.