Martin Weiss <[email protected]>
SPSS's -weight cases- treats your weights as frequency weights, which is
the wrong type for sampling weights, and will therefore give you
incorrect standard errors. See -help weights- and -help svy- and the
manuals for more.
Perhaps the large size of the Stata file is due to all variables being
stored as doubles? Try -compress- on an extract and see -help
datatypes-.
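
A minimal sketch of what -compress- does, using the auto data shipped
with Stata rather than your file. The -recast- step is only my way of
mimicking an import that stored everything as double; it is an
assumption about how your file came to be so large.

```stata
* Illustration only: auto.dta stands in for the real extract
sysuse auto, clear

* Mimic an import that stored integer-valued variables as doubles
recast double price mpg
describe price mpg

* -compress- demotes each variable to the smallest type that holds
* its values without loss (e.g. double to int or byte for integers)
compress
describe price mpg
```

On your own data you would first load an extract with
-use varlist in range using filename- and then run -compress- on that.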
Note that -mean- with several variables restricts the sample to obs
where all vars are nonmissing (casewise deletion), so every mean is
computed on the same reduced sample. So instead of e.g.

    ds, has(type numeric)
    loc num `r(varlist)'
    mean `num'

try

    ds, has(type numeric)
    loc num `r(varlist)'
    foreach v of loc num {
        mean `v'
    }

or just use -summarize- with aweights (-summarize- does not accept
pweights). For -mean- and other estimation commands,
pweights=aweights+_robust, so point estimates are identical, but
variance estimates differ.
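
To see the aweight/pweight comparison concretely, here is a small
sketch on the auto data. Using the vehicle's -weight- variable as if
it were a sampling weight is purely an assumption for illustration; in
your data you would substitute the weight in the first column.

```stata
* Illustration only: vehicle weight stands in for a sampling weight
sysuse auto, clear

* Weighted descriptive statistics (aweights are allowed here)
summarize price [aw=weight]

* Same point estimate from both, but the standard errors differ
mean price [aw=weight]
mean price [pw=weight]

* Survey route: declare the design once, then prefix commands
* with svy: (each observation is treated as its own PSU here)
svyset _n [pw=weight]
svy: mean price
```

Once the data are -svyset-, the weight is declared a single time and
applied by every -svy:- command, which is the closest Stata equivalent
to setting the weight once per session.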
On Wed, Apr 30, 2008 at 10:57 AM, Martin Weiss
<[email protected]> wrote:
> Dear Statalisters,
>
> can anybody give me a clue as to the array of weighting options in Stata? I
> have an important project where I would really like to make headway...
>
> My dataset features a size of 2.4 GB as .csv. When I translate this into
> SPSS, it ends up with 2.7 GB while the equivalent Stata dataset has 5.5 GB
> (!). Anyway, I usually pick out the interesting variables beforehand because
> Stata is unable to open the entire dataset. The first column of the data
> contains sampling weights. The data provider ships a pdf with the descriptives
> for the marginal distributions of the variables in the population so I know
> the true values.
>
> Now here lies the rub: when I weight -summarize- with analytic weights, the
> approximately correct mean and standard deviation pop out. When I let Stata
> estimate the mean with the -mean- command, with analytic weights attached in
> the same fashion, I get widely differing results for the point estimate of
> the mean, far from the true values. In SPSS, I simply go to -weight cases-
> and everything comes out correct.
>
> Do I have to -svyset- the data? When I try to -frequency weight- the data,
> Stata complains that non-integers are not allowed while SPSS seems to not
> quarrel with them. Why is it that SPSS needs one command at the beginning of
> the session while Stata has a (differing) tab dedicated to weighting for
> every single command?
>