Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Stas Kolenikov <skolenik@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Sample Weights
Date: Tue, 9 Mar 2010 15:13:20 -0600
On Tue, Mar 9, 2010 at 12:35 PM, Jason Dean, Mr <jason.dean@mail.mcgill.ca> wrote:
> I have a quick question. I currently have a 5% random sample of Canada. I
> also have 4 extra random samples of only the four largest urban cities (I
> have dropped duplicate observations between samples).
>
> What is the best strategy to include these extra samples and keep the
> sample representative of the country? I intend to condition on these cities
> with dummy variables in my regression. However, I would prefer to use sample
> weights, but I am not sure of the best way to go about creating them. Any
> suggestions would be greatly appreciated.

1. Keep the strata identifiers from the original data, say in a variable called stratum.

2. Identify the samples in a variable, say sample, so that 1 is your microcensus (the 5% sample) and 2 through 5 are the extra samples.

3. Your new combined strata should be

   egen new_strata = group( sample stratum )

4. Your new PSUs should be the original PSUs (say, old_PSU). They should work as is, but just to be safe,

   egen new_PSU = group( sample stratum old_PSU )

5. Now, the weight variables are tricky. If you don't have any weight adjustments (and I doubt that), the weights are simply inverse probabilities of selection. If the 5% sample and the extra samples are independent of one another (meaning that the information used to design the extra samples does not rely on any of the pieces on which the 5% sample relies... I doubt that, though), then overall

   P[ selection ] = P[ selected in the first sample ] + P[ selected in the second sample ] - P[ selected in both ]
                  = 1 - (1 - P[ first ]) * (1 - P[ second ])

where the second line uses independence, P[ both ] = P[ first ] * P[ second ]. Since the weight is 1/P[ selection ], and P[ selection ] goes up wherever a unit could also have entered an extra sample, your weights should become lower in the joined sample (in those cities for which extra samples were collected), as Michael indicated.

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
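The identifier construction in steps 3 and 4 and the weight arithmetic in step 5 can be checked with a small numeric sketch. It is written in Python rather than Stata only to make the arithmetic explicit, and everything in it is an assumption for illustration: the five records, the field names `p_first`, `p_second` and `old_psu`, and the probability values are hypothetical; `egen_group` is a hypothetical helper that only mimics Stata's `egen ... = group()`; the 5% sample and the city samples are assumed to be drawn independently, which is exactly the condition flagged above as doubtful; and no nonresponse or other weight adjustments are modeled.

```python
from math import isclose

# Hypothetical deduplicated records. sample: 1 = the 5% microcensus,
# 2-5 = the extra city samples. stratum/old_psu are the design identifiers
# kept from each source file. p_first is the unit's probability of entering
# the 5% sample; p_second is its probability of entering an extra city
# sample (0.0 outside the four cities). All values are made up.
records = [
    {"sample": 1, "stratum": "A", "old_psu": 1, "p_first": 0.05, "p_second": 0.00},
    {"sample": 1, "stratum": "B", "old_psu": 1, "p_first": 0.05, "p_second": 0.02},
    {"sample": 2, "stratum": "B", "old_psu": 1, "p_first": 0.05, "p_second": 0.02},
    {"sample": 3, "stratum": "C", "old_psu": 2, "p_first": 0.05, "p_second": 0.01},
    {"sample": 1, "stratum": "A", "old_psu": 2, "p_first": 0.05, "p_second": 0.00},
]


def egen_group(rows, keys):
    """Rough analogue of Stata's `egen newvar = group(varlist)`: number the
    distinct combinations of `keys` 1, 2, ... in sorted order."""
    combos = sorted({tuple(row[k] for k in keys) for row in rows})
    index = {combo: i + 1 for i, combo in enumerate(combos)}
    return [index[tuple(row[k] for k in keys)] for row in rows]


def combined_selection_probability(p_first, p_second):
    """P[selected in at least one sample], assuming independent draws:
    p1 + p2 - p1*p2, which equals 1 - (1 - p1) * (1 - p2)."""
    return 1.0 - (1.0 - p_first) * (1.0 - p_second)


# Steps 3 and 4: combined strata and PSU identifiers.
strata = egen_group(records, ["sample", "stratum"])
psus = egen_group(records, ["sample", "stratum", "old_psu"])

# Step 5: unadjusted weight = inverse of the combined selection probability.
for row, new_strata, new_psu in zip(records, strata, psus):
    row["new_strata"] = new_strata
    row["new_psu"] = new_psu
    row["p_combined"] = combined_selection_probability(row["p_first"], row["p_second"])
    row["weight"] = 1.0 / row["p_combined"]

for row in records:
    print(row["sample"], row["new_strata"], row["new_psu"], round(row["weight"], 2))
```

With these made-up numbers, units outside the four cities keep a weight of 1/0.05 = 20, while units in a city with P[ second ] = 0.02 get 1/0.069, about 14.5, whichever of the two samples they came from; that is the drop in weights described in step 5. In Stata the corresponding variables would then be declared with something like `svyset new_PSU [pweight = wt], strata(new_strata)`, where `wt` is a hypothetical variable holding the inverse combined probability.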