
st: Questions on weights on regressions


From   Tak-wai Chau <[email protected]>
To   [email protected]
Subject   st: Questions on weights on regressions
Date   Tue, 15 Nov 2005 19:30:06 -0500

Hi, statalist users,

Since I did not get a reply to this, I am posting it again in case anyone can help. Thanks!

I have two questions about the use of weights in regression.

First, I have a question about using aweight in regression. As I understand from this:
http://www.stata.com/support/faqs/stat/crc36.html
the (slope) coefficients and standard errors from a regression using aweights (= n) should be the same as those from a regression on variables multiplied by sqrt(n). However, the results I get are not the same. Maybe I have misunderstood the page above. If so, how and why?

I have attached the code and results at the end of this mail.

My second question is as follows: I am working with US census data, pooling data from 1960 to 2000. Because the data set is huge, and following other researchers who have worked with it, I am going to run certain regressions on group means, with aweight = cellsize (the number of original observations each mean is averaged from).

First, I should expect a loss of efficiency, am I correct?

Second, a problem is that individual observations carry a person weight due to the survey design, especially after 1990. One suggestion is to use this person weight (as pweight) to calculate the cell means, and then to run the regression on the cell means with aweight = cellsize, where cellsize is the number of observations each mean is derived from, without regard to the person weight.

I would like to ask whether this is a good approach, and whether there is a better way to deal with this situation. For example, should the person weight be taken into account when constructing the weight used at the regression stage?
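
To make the suggestion concrete, here is a rough sketch of the two steps as I understand them. The variable names cell (the cell identifier) and perwt (the person weight) are placeholders I made up, and y, x1 and x2 stand in for the actual variables; this only illustrates the procedure I am asking about and is not meant as a recommendation.

```stata
* Sketch only: cell and perwt are placeholder names, not actual census variables.
* Record the unweighted number of observations in each cell.
bysort cell: gen cellsize = _N

* Person-weighted cell means. cellsize is constant within a cell,
* so its weighted mean is simply the cell's observation count.
collapse (mean) y x1 x2 cellsize [pw=perwt], by(cell)

* Regression on the cell means, weighted by the number of observations.
regress y x1 x2 [aw=cellsize]
```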

Another question I did not ask last time: will it generally be more efficient if I collapse the data into a larger number of cells to run the regression?

Thank you very much for your assistance and opinion!

Regards,
Tak Wai

The code I used is:
. reg y x1 x2 [aw=celsize]
(sum of wgt is 3.0000e+02)
Number of obs = 20
(something omitted...)
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   2.130015   .5890904     3.62   0.002     .8871432    3.372887
          x2 |   1.364704   .7330079     1.86   0.080     -.181808    2.911215
       _cons |   1.127888   .1742532     6.47   0.000     .7602464    1.495531
------------------------------------------------------------------------------

. gen yw=sqrt(celsize)*y

. gen x1w=sqrt(celsize)*x1

. gen x2w=sqrt(celsize)*x2

.
. reg yw x1w x2w
Number of obs = 20
[also something omitted...]
------------------------------------------------------------------------------
          yw |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         x1w |   2.024784   .5655494     3.58   0.002      .831579    3.217989
         x2w |   1.476072   .7049143     2.09   0.052    -.0111669    2.963311
       _cons |   4.436417   .6484838     6.84   0.000     3.068236    5.804598
------------------------------------------------------------------------------
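
For comparison, here is a minimal sketch of the fully transformed regression, assuming the equivalence requires the constant term to be multiplied by sqrt(celsize) as well, with Stata's own constant suppressed. The variable cw is a name introduced here only for illustration. The second regression above keeps an untransformed constant, which may be where the two sets of results diverge.

```stata
* Sketch only: cw is a hypothetical name for the transformed constant.
* The intercept column (a column of ones) is scaled by sqrt(celsize) too.
gen cw = sqrt(celsize)

* Regress the transformed variables on the transformed constant,
* suppressing the usual constant; the coefficient on cw plays the role of _cons.
reg yw x1w x2w cw, noconstant
```

If the equivalence holds as described, the coefficients and standard errors on x1w, x2w and cw should line up with those on x1, x2 and _cons in the aweight regression.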



