Re: st: Dealing with survey data when the entire population is also in the dataset
From: "Michael I. Lichter" <[email protected]>
To: [email protected]
Subject: Re: st: Dealing with survey data when the entire population is also in the dataset
Date: Fri, 24 Jul 2009 23:24:10 -0400
Margo,
1. select your sample and save it in a new dataset, and then in the new
dataset:
a. define your stratum variable -stratavar- as you described
b. define your pweight as you described, wt = 1/(sampling fraction) for
each stratum
2. combine the full original dataset with the new one, setting
stratavar = 1 and wt = 1 for the observations from the original dataset,
and adding a new variable sample = 0 for the original and = 1 for the
sample, and then
a. -svyset [pw=wt], strata(stratavar)-
b. do your chi square test or whatever using svy commands, e.g., -svy:
tab var1 sample-
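The steps above could be sketched as a do-file along the following lines.
This is only a sketch under assumptions: the file name cases.dta and the
variable -stratum- (coded 1, 2, 3 for the three sampling strata) are
hypothetical, and -sample- is assumed to be the existing 0/1 indicator for
sampled cases mentioned in the original question. The sample strata are
shifted to 2, 3, 4 so that they do not collide with stratavar = 1 in the
full-population copy; that renumbering is an assumption of the sketch, not
something the steps require.

```stata
* Sketch only. Hypothetical names: cases.dta, stratum (coded 1/2/3).
* Assumes sample is an existing 0/1 flag marking the sampled cases.

* Step 1: build the sample dataset with strata and pweights
use cases, clear
bysort stratum: egen double n_samp = total(sample)   // sampled cases per stratum
bysort stratum: gen double wt = _N / n_samp          // 1/(sampling fraction)
keep if sample == 1
gen byte stratavar = stratum + 1                     // sample strata become 2, 3, 4
drop n_samp
tempfile samp
save `samp'

* Step 2: stack the full population on top of the sample copy
use cases, clear
replace sample = 0                                   // every row here is "population"
gen byte stratavar = 1                               // one stratum for the census copy
gen double wt = 1                                    // census rows count once each
append using `samp'

* Steps 2a and 2b: declare the design and run the design-based test
svyset [pw=wt], strata(stratavar)
svy: tabulate year sample
```

With the sampling probabilities given in the question (1, about .5, and
about .75), wt works out to 1, about 2, and about 1.33. Note that each
sampled case appears twice in the stacked data, once in each group, so the
test treats overlapping groups as if they were separate.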
Michael
Margo Schlanger wrote:
Hi --
I have a dataset in which the unit of observation is a "case". I started with
a complete census of the ~4000 relevant cases; each of them gets a
line in my dataset. I have data filling a few variables about each of
them. (When they were filed, where they were filed, the type of
outcome, etc.)
I randomly sampled them using 3 strata (for one stratum, the sampling
probability was 1, for another about .5, and for a third, about .75).
I end up with a sample of about 2000. I know much more about this
sample.
Ok, my question:
1) How do I use the svyset command to describe this dataset? It would
be easy if I just dropped all the non-sampled observations, but I
don't want to do that, because of question 2:
2) How do I compare something about the sample to the entire
population, just to demonstrate that my sample isn't very different
from that entire population on any of the few variables I actually
have comprehensive data about? I could do this simply, if I didn't
have to worry about weighting:
tabulate year sample, chi2
But I need the weights. In addition, I can't simply use weighting
commands, because in the population (when sample == 0), everything
should be weighted the same; the weights apply only to my sample (when
sample == 1). And I can't (so far) use survey commands, because I
don't know the answer to (1), above.
NOTE: Nearly all the variables I care about are categorical: year of
filing, type of case. But it's easy enough to turn them into dummies,
if that's useful.
Thanks for any help with this.
Margo Schlanger
______________________
Professor of Law
University of Michigan Law School
Director, Civil Rights Litigation Clearinghouse
(http://clearinghouse.wustl.edu)
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
--
Michael I. Lichter, Ph.D. <[email protected]>
Research Assistant Professor & NRSA Fellow
UB Department of Family Medicine / Primary Care Research Institute
UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
Office: CC 126 / Phone: 716-898-4751 / FAX: 716-898-3536