Re: st: Question about svyset command
Thomas,
1. The finite population corrections should affect only standard
errors and confidence intervals, not point estimates such as means,
proportions, or regression coefficients.
2. FPCs should be used only for descriptive analyses
(proportions, means). These analyses describe the specific finite
population that you sampled: tort, contract, and real property trials
in the 75 counties.
If the purpose of your model is analytic: to develop predictions,
estimate odds ratios, compare proportions, or otherwise test
hypotheses, you should *omit* the finite population corrections. The
reasoning is interesting (Cochran, 1977, p. 39): it is seldom of
scientific interest to ask whether a null hypothesis (e.g., that two
proportions are equal) is exactly true in a finite population.
Except by a very rare chance, such a null hypothesis will never be
exactly true, and you would discover this by enumerating the entire
population. This
leads to the adoption of a "superpopulation" viewpoint, which is
taken by almost all statisticians these days. See also Deming (1966),
pp. 247-261, "Distinction between enumerative and analytic studies";
and Korn and Graubard (1999), p. 227.
In other words, you should use one -svyset- for describing the target
population and another for the logistic regression.
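A minimal Python sketch of points 1 and 2, assuming simple random sampling without replacement at a single stage (this is not Stata's multistage linearized variance estimator). The fpc multiplies the variance of the mean by (1 - n/N): the mean itself never changes, while the standard error shrinks and reaches zero when the sample is a census. That census case is the heart of Cochran's argument: with no sampling error left, there is nothing for a significance test to assess. The helper name and the data are hypothetical.

```python
import math
import statistics


def mean_and_se(sample, population_size=None):
    """Return (mean, standard error) for a simple random sample.

    If population_size (N) is given, the variance of the mean is multiplied
    by the finite population correction (1 - n/N); otherwise no fpc is used.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    var_of_mean = statistics.variance(sample) / n  # s^2 / n
    if population_size is not None:
        if population_size < n:
            raise ValueError("population size cannot be smaller than the sample")
        var_of_mean *= 1 - n / population_size  # finite population correction
    return mean, math.sqrt(var_of_mean)


# Hypothetical outcomes: 1.0 = post-trial motion filed, 0.0 = not filed.
sample = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]

print(mean_and_se(sample))                      # no fpc
print(mean_and_se(sample, population_size=40))  # 25% sampling fraction
print(mean_and_se(sample, population_size=10))  # census (n = N): SE is 0.0
```

All three calls return the same mean; only the standard error moves.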
Two questions came to mind:
1. If a trial had >1 plaintiff or >1 defendant, would that not
increase the probability of a post trial motion? How are you going
to account for that?
2. For descriptive analyses, counties selected with certainty need
special treatment. Look up the "singleunit" option for -svyset-.
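The first question can be made concrete under a strong simplifying assumption that is not part of this thread: if each party decided independently, with the same hypothetical probability p, whether to file a motion, then the chance that at least one motion is filed in a trial with k parties is 1 - (1 - p)^k, which rises with k. Parties on the same side are unlikely to act independently, so this sketch only illustrates why the number of parties may matter.

```python
def prob_any_motion(p_per_party: float, n_parties: int) -> float:
    """P(at least one party files), assuming each party files independently
    with the same (hypothetical) probability p_per_party."""
    return 1 - (1 - p_per_party) ** n_parties


# Hypothetical per-party probability of 0.2 for trials with 1, 2, or 3 parties.
for k in (1, 2, 3):
    print(k, round(prob_any_motion(0.2, k), 3))
```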
Good luck!
-Steve
References
Cochran, W. G. (1977). Sampling techniques (3rd ed.). New York: Wiley.
Deming, W. E. (1966). Some theory of sampling. New York: Dover Publications.
Korn, E. L., & Graubard, B. I. (1999). Analysis of health surveys
(Wiley series in probability and statistics). New York: Wiley.
On Feb 19, 2009, at 12:04 AM, [email protected] wrote:
I'm a beginner Stata user and have a question about the svyset
command in Stata that I hope someone can help me with.
For some background, I'm working on a logistic regression model
that examines the likelihood of either a plaintiff or a defendant
filing a post-trial motion. The database I'm working with is the
Civil Justice Survey of State Courts (CJSSC). The CJSSC provides
case-level data for all trials concluded in a sample of 46 of the
nation's 75 most populous counties in 2005. Data are collected on
about 8,000 trials in these 46 counties, which are weighted to
represent about 10,500 trials concluded in the nation's 75 most
populous counties. I understand that one of the nice features of
Stata is that it lets you take the sampling structure of a dataset
into account when fitting a logistic regression model.
Here is the Stata code that I used to take into account the sampling
structure of these civil trial data:
svyset sitecode [pweight=bwgt0], strata(strata) fpc(fpc1) || su2, fpc(fpc2)
Where
sitecode = County where the civil trial took place
bwgt0 = Weights that project the data from the 46 sampled counties
to the 75 most populous counties
strata = Stratum in which the county is located (the dataset has 5
strata)
fpc1 = The probability of a county appearing in the sample. For
example, a county with a weight of 2 would have a 50% probability
of appearing in the sample.
su2 = Unique identifier for the trials that occurred in each of the
46 counties
fpc2 = 1 for all 8,000 trials disposed in the 46 counties. I gave
fpc2 a value of 1 because I wanted to tell Stata that every trial in
these 46 counties had a 100% probability of being included.
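The arithmetic behind the fpc1 and fpc2 definitions can be sketched in Python, assuming each stage is sampled without replacement: the selection probability is the reciprocal of the weight, and a stage sampled at rate f has its variance contribution multiplied by (1 - f). The helper names are hypothetical and this is not Stata's internal code. At a rate of 1 (fpc2 = 1) the factor is 0, so the second stage adds nothing to the variance, which may explain why specifying it did not change the results reported below.

```python
def sampling_fraction_from_weight(weight: float) -> float:
    """Selection probability implied by a base weight: f = 1 / weight."""
    return 1.0 / weight


def fpc_factor(sampling_fraction: float) -> float:
    """Variance multiplier (1 - f) for a stage sampled at rate f."""
    if not 0 < sampling_fraction <= 1:
        raise ValueError("sampling fraction must be in (0, 1]")
    return 1.0 - sampling_fraction


# Stage 1: a county with weight 2 was selected with probability 0.5.
f1 = sampling_fraction_from_weight(2.0)
print(f1, fpc_factor(f1))  # 0.5 0.5

# Stage 2: every trial in a sampled county is included (fpc2 = 1),
# so this stage contributes no sampling variance.
print(fpc_factor(1.0))  # 0.0
```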
I think that I got the part of this code that deals with the
first level of the sample design correct. It's the second level
that I'm having some problems with. At the second level of the
sample design, I'm trying to correct for the fact that I have data
for every civil trial concluded in the 46 counties. Basically, I
want to tell Stata that this part of the sample is actually a census
of all trials concluded in the 46 counties in 2005. I understand Stata
has a finite population correction option that takes into account
the census-like format of these data. However, the logistic regression
results were the same whether I specified only the first stage or
both stages of the sample design. I think this is telling me that
Stata is not correcting for the census-like aspect of this sample.
Can anyone tell me whether I'm correctly taking into account the
sampling structure of these data? In particular, I would like to
know whether I'm using the fpc2 factor correctly. Any assistance you
could give on this matter would be very much appreciated.
Thanks
Thomas Cohen
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*