I hope I will not add to the confusion.
Different kinds of weights refer to different situations.
If your data come from a random sample with unequal selection probabilities, the weights should sum to the size of the population from which the sample was drawn, they should be treated as pweights in Stata, and you should definitely use robust variance estimates based on the number of (independently selected) observations in your sample. Since most samples drawn with unequal selection probabilities do not follow a simple random design, chances are that the sample design includes stratification, clustering, or both. In that case, you should use one of the -svy- estimation commands and provide the information on the sample design. This will affect your variance estimates and t-tests: stratification usually increases the precision of your estimates, whereas clustering usually decreases it.
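To see why clustering matters for the variance, here is a minimal sketch in Python (not Stata's own code) of a linearized variance for a weighted mean, assuming a single stratum and sampling units treated as drawn with replacement. The function name, the toy data and the psu labels are made up for illustration; in Stata itself you would declare the design and let the -svy- commands do this rather than compute it by hand.

```python
import math
from collections import defaultdict

def se_weighted_mean(y, w, psu=None):
    """Linearized SE of a weighted mean (one stratum, units drawn with replacement)."""
    if psu is None:
        psu = list(range(len(y)))  # no clustering: each observation is its own unit
    total_w = sum(w)
    mean = sum(wi * yi for yi, wi in zip(y, w)) / total_w
    # Sum the weighted residuals within each sampling unit.
    unit_totals = defaultdict(float)
    for yi, wi, unit in zip(y, w, psu):
        unit_totals[unit] += wi * (yi - mean)
    n_units = len(unit_totals)
    # The unit totals sum to zero, so their squared sum is the between-unit variation.
    between = sum(z * z for z in unit_totals.values())
    return math.sqrt(n_units / (n_units - 1) * between) / total_w

# Hypothetical data: three clusters whose members resemble one another.
y = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8, 9.0, 9.3, 8.9]
w = [10.0] * 9
psu = [1, 1, 1, 2, 2, 2, 3, 3, 3]

se_ignoring_clusters = se_weighted_mean(y, w)
se_with_clusters = se_weighted_mean(y, w, psu)
print(se_ignoring_clusters, se_with_clusters)
```

With these made-up numbers, the standard error that respects the clusters is larger than the one that ignores them, because the nine observations carry less independent information than nine unrelated draws would.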
If your data file uses weights to "pack" into a single record all observed cases that share the same combination of values on all the observed variables, these weights should be treated as fweights in Stata and you need not worry about the variance estimates. If your data come from a randomized experiment or from simple random sampling with equal selection probability, the conventional estimates of variance should be fine.
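The difference between the two readings of a weight can be checked by hand. The sketch below, in Python rather than Stata and with made-up numbers, computes the standard error of a weighted mean twice: once treating the weights as frequency weights (so that N is the sum of the weights, exactly as if every record had been duplicated), and once treating them as sampling weights with a robust variance based on the number of records. All function names are illustrative, and the robust formula assumes independently selected observations with no strata or clusters.

```python
import math

def weighted_mean(y, w):
    return sum(wi * yi for yi, wi in zip(y, w)) / sum(w)

def se_mean_fweight(y, w):
    # Frequency-weight reading: each record stands for w identical observed cases,
    # so the effective number of observations is N = sum(w).
    big_n = sum(w)
    m = weighted_mean(y, w)
    var = sum(wi * (yi - m) ** 2 for yi, wi in zip(y, w)) / (big_n - 1)
    return math.sqrt(var / big_n)

def se_mean_pweight(y, w):
    # Sampling-weight reading: linearized (robust) variance based on the n records.
    n = len(y)
    m = weighted_mean(y, w)
    total = sum((wi * (yi - m)) ** 2 for yi, wi in zip(y, w))
    return math.sqrt(n / (n - 1) * total) / sum(w)

# Hypothetical data with weights ranging from 1 to 500.
y = [2.0, 3.0, 5.0, 7.0, 11.0, 13.0]
w = [1, 10, 50, 100, 250, 500]

# "Unpacking" the file: repeat each record w times and use unit weights.
expanded = [yi for yi, wi in zip(y, w) for _ in range(wi)]

se_f = se_mean_fweight(y, w)
se_f_expanded = se_mean_fweight(expanded, [1] * len(expanded))
se_p = se_mean_pweight(y, w)
print(se_f, se_f_expanded, se_p)
```

If the weights are true frequency weights, the first standard error is the right one and matches what the unpacked file gives. If they are really sampling weights, the first figure overstates the precision, which is one way t-statistics can jump when weights ranging from 1 to over 500 are declared as fweights.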
What you read about being cautious with variance estimates when using weights was probably referring to sampling with unequal selection probabilities.
Benoit Laplante, Ph.D.
Centre interuniversitaire d'etudes demographiques
INRS-Urbanisation, Culture et Societe
Montreal Qc, CANADA
-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Friday, May 23, 2003 16:22
To: [email protected]
Subject: st: Re: RE: Re: RE: statistical significance in a data set with
weighted observations
I apologize if my question was confusing. I know that the weights in my
sample are frequency weights. The problem is not in accounting for weights
in the regression but in the statistical significance of the coefficients. I
remember from the literature that with weighted data one must be careful with
the interpretation of statistical significance, as t-statistics tend to be
overstated. I am curious if anyone knows how to account for this
statistically.
MM
----- Original Message -----
From: "Copeland, Laurel" <[email protected]>
To: <[email protected]>
Sent: Friday, May 23, 2003 4:09 PM
Subject: st: RE: Re: RE: statistical significance in a data set with
weighted observations
> As I understand things, the t-statistics for the parameter estimates
> correctly reflect the importance of your predictors in your analysis,
> assuming the sample was taken as represented by the weights. The effect is
> taken into account if you use -svy...- specifying weights (psu, strata).
>
> If you do not include the weights (so analyze the small sample as if it were
> an entity unto itself), you will not get correct parameter estimates (or
> accompanying t-statistics) to generalize.
>
> The fact that the t-statistics are significant or insignificant is
> immaterial. Your approach need only be consistent with what the data
> actually represent.
>
> You mention fweights (frequency weights). I am assuming these are 1/pweight
> (probability weights) for your dataset. If this is not the case, you may
> need to find out more about your sample and how it was taken.
>
> Actually, you should find out as much as you can about your sample and how
> it was taken, regardless.
>
> -Laurel
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Friday, May 23, 2003 3:32 PM
> To: [email protected]
> Subject: st: Re: RE: statistical significance in a data set with
> weighted observations
>
> Thank you!
>
> I do account for the weights using fweights in my regression, but the
> weights increase the impact of observations and thereby inflate the
> t-statistics, so that all explanatory variables appear
> significant. Is there a way of accounting for that effect on t-stats?
>
> thanks,
>
> Mikhail
> ----- Original Message -----
> From: "Copeland, Laurel" <[email protected]>
> To: <[email protected]>
> Sent: Friday, May 23, 2003 3:05 PM
> Subject: st: RE: statistical significance in a data set with weighted
> observations
>
>
> > The data can be weighted to reflect the sampling design. The sampling
> > design is complex to give you a sample that is representative of the
> > underlying population, and to allow inferential statistics. The complex
> > sampling lets you get a good sample of a large population of unlisted
> > smaller units (e.g., all US residents), based on a complete list of larger
> > units (e.g., US census tracts). The weight is the inverse of the
> > probability of getting sampled. In your sample, individual units had
> > differing probabilities of being sampled, so they have differing weights.
> > The calculated size of the population that is represented by your sample
> > will be produced by Stata -svy- commands. To analyze such a sample
> > properly, you must include the PSU, strata, and weights in your analysis, if
> > they exist. Without the weights, the estimates you get will be biased.
> > Sometimes weights are used to allow post-stratification (for matching to a
> > known distribution) or to deal with non-response.
> > -Laurel
> >
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]]
> > Sent: Friday, May 23, 2003 2:52 PM
> > To: [email protected]
> > Subject: st: statistical significance in a data set with weighted
> > observations
> >
> > Dear Stata Users,
> >
> > I have encountered this small problem and since I am not sure about how to
> > address it myself I've decided to ask you all. Thank you in advance for any
> > advice you might have for me.
> >
> > I am working with a dataset that has weights for all observations, and these
> > weights exhibit large variation, from 1 to over 500. When I run a
> > nonweighted estimation my t-statistics are relatively small, but when
> > weights are introduced, the t-statistics jump. Is there a way of determining
> > the true statistical significance of coefficients in this case?
> >
> > Thanks again for any help you might have,
> >
> > MM
> >
> > *
> > * For searches and help try:
> > * http://www.stata.com/support/faqs/res/findit.html
> > * http://www.stata.com/support/statalist/faq
> > * http://www.ats.ucla.edu/stat/stata/