Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: semi-random sampling (how to impose properties of one population onto a subsample of a different population)
From: Steven Samuels <[email protected]>
To: [email protected]
Subject: Re: st: semi-random sampling (how to impose properties of one population onto a subsample of a different population)
Date: Mon, 15 Aug 2011 17:13:44 -0400
You are very welcome, Ekaterina, but note Austin's follow-up. If you wish to make these two groups comparable for analysis, there are better, more comprehensive, and more defensible approaches. Austin mentioned propensity score weighting, but -margins- might be superior if you are not interested in causal effects of income.
Steve
On Aug 15, 2011, at 4:12 PM, Ekaterina Hertog wrote:
Thank you very much! Sorry for posing the original question imprecisely!
ekaterina
On 07/08/2011 18:32, Steven Samuels wrote:
>
> Sorry, I misunderstood. Here's code that you can adapt. Note that you set the sample size you want in the first line.
>
> *************CODE BEGINS*************
> scalar sampsize = 500
> set seed 842655
>
> clear
> /* Input Frequencies for External Population
> You can get these from -contract-
> in the original external data set:
> "contract agegp region, freq(freq1)"
> */
> input agegp region freq1
> 1 1 501
> 1 2 415
> 2 1 1809
> 2 2 3003
> 3 1 1288
> 3 2 1400
> end
> egen tot1 = total(freq1)
> gen ssize = round(sampsize*freq1/tot1)
> /* Check Frequencies */
> tab agegp region [fw=freq1], cell
> tab agegp region [fw=ssize], cell
>
> sort agegp region
> tempfile t1
> save `t1'
> /* Create Data set to be sampled from the auto data */
>
> sysuse auto, clear
> expand 100
> rename rep78 agegp
> rename foreign region
>
> recode agegp 2=1 5=1 .=1 3=2 4=3 // values 1,2,3
> replace region = region +1 // values 1,2
>
>
> /* Merge with external counts */
> sort agegp region
> merge m:1 agegp region using `t1'
> tab _merge
> drop _merge
>
> egen stratum = group(agegp region)
> levelsof stratum, local(levels)
> tempfile t2
> save `t2'
> foreach x of local levels {
> use `t2', clear
> keep if stratum==`x'
> gen u = uniform()
> sort u
> keep if _n<=ssize // ssize is constant within a stratum
> tempfile td`x'
> save `td`x''
> }
>
> clear
> tempfile t0 //empty data set to append to
> gen dummy=1
> save `t0'
> foreach x of local levels {
> append using `td`x''
> }
> drop dummy
> /* Check frequencies again */
> tab agegp region , cell missing
> save sample1, replace
> **************CODE ENDS**************
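
The stratum-by-stratum loop in the quoted code can also be written without temporary files. The following is a minimal sketch, not Steve's method: it reuses the illustrative counts and the auto data from the quoted code, swaps in -runiform()- and a by-group -keep-, and adds a check (an assumption about what you would want) that every cell holds at least as many observations as its target. The names targets, u and cellN are hypothetical.

```stata
clear
set seed 842655
scalar sampsize = 500

* External cell counts (same illustrative numbers as in the quoted code)
input agegp region freq1
1 1 501
1 2 415
2 1 1809
2 2 3003
3 1 1288
3 2 1400
end
egen tot1 = total(freq1)
gen ssize = round(sampsize*freq1/tot1)
tempfile targets
save `targets'

* Stand-in for the data set to be sampled (auto data, as in the quoted code)
sysuse auto, clear
expand 100
rename rep78 agegp
rename foreign region
recode agegp 2=1 5=1 .=1 3=2 4=3   // values 1,2,3
replace region = region + 1       // values 1,2

* Attach the target cell sizes; stop if any cell fails to match
merge m:1 agegp region using `targets', assert(match) nogenerate

* Stop if any cell has fewer observations than its target size
bysort agegp region: gen long cellN = _N
assert cellN >= ssize

* Random order within each cell, then keep the first ssize observations
gen double u = runiform()
bysort agegp region (u): keep if _n <= ssize

* Check frequencies again
tab agegp region, cell missing
```

Because each ssize is rounded separately, the total drawn can in general differ from sampsize by a few observations, so it is worth comparing -count- with the requested size.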
>
> On Aug 7, 2011, at 5:05 AM, Ekaterina Hertog wrote:
>
> Dear Steven,
> thank you for your help, however it does not fully solve my problem. Your proposed solution will allow me to roughly preserve the population percentages from the whole sample in a subsample. What I need, however, is to impose population percentages found in a different dataset on a subsample I am creating. Essentially I have two datasets: one of high income women and one of middle income women. High income women tend to be older and are more likely to live in the capital. I need to create a subsample of the dataset of middle income women which would match the high income women dataset on age and location characteristics.
> Does anyone know how to do this in Stata 11?
> Ekaterina
>
> On 07/08/2011 09:08, Steven Samuels wrote:
>> The following code shows how to take a 10% sample within categories formed by two variables. The sample and whole population percentages will be approximately the same, with the agreement better for larger within-cell sample sizes.
>>
>> Steve
>>
>> *************CODE BEGINS*************
>> sysuse auto, clear
>> expand 6
>> set seed 842655
>> recode rep78 1/2=5 .=5
>> tab rep78 foreign, cell
>> sample 10, by(foreign rep78)
>> tab rep78 foreign, cell
>> **************CODE ENDS**************
>>
>>
>>
>> On Aug 6, 2011, at 4:23 PM, Ekaterina Hertog wrote:
>>
>> Dear all,
>> I need to take a subsample of observations from a big dataset making sure that the people in the subsample have a given geographic and age profile. I need to make sure that, say, 50% of people in the subsample come from the capital and 50% from other towns. Within each of these 2 locations I want to preserve a certain age structure: say in a city: 3 people aged 23, 4 people aged 24 …
>> Within those geographic and age profiles I want to select the observations randomly. Is it possible to do that in Stata 11? Any thoughts on how I would go about it?
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/