Lee Sieswerda <[email protected]> took my suggestion for one of George
Hoffman's questions in a direction that completely surprised me. I was
answering George's 2nd question, whereas Lee was answering George's 1st.
Even so, Lee found an interesting way to twist my answer toward the 1st
question.

To paraphrase Lee's response loosely, he suggests using -ci- to get the CIs
for two different categories and graphing those along with the original data
using -twoway-. (Lee actually used -egen- but notes that the results are the
same as those from -ci-.)
Lee then picked up on my suggestion to use -predictnl- to get CIs for
INDIVIDUAL observations after a -regress-ion and then cleverly used an
indicator variable as the regressor so that those CIs would be the same for
all observations in a group. He then compared the results of -ci- to those
from -predictnl- and found that they were different.

Using the auto data, Lee gets the following CIs using -predictnl- after
-regress-:

. regress weight foreign
. predictnl yhat = predict(), ci(lb ub)
. bysort foreign: sum ub lb
____________________________________________________________________________
-> foreign = Domestic

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          ub |        52    3491.338           0   3491.338   3491.338
          lb |        52    3142.893           0   3142.893   3142.893
____________________________________________________________________________
-> foreign = Foreign

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          ub |        22     2583.76           0    2583.76    2583.76
          lb |        22    2048.058           0   2048.058   2048.058

Lee notes that these are different from the intervals reported by -ci-:

. ci weight, by(foreign)
____________________________________________________________________________
-> foreign = Domestic

    Variable |       Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+--------------------------------------------------------------
      weight |        52    3317.115     96.4296        3123.525    3510.706
____________________________________________________________________________
-> foreign = Foreign

    Variable |       Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+--------------------------------------------------------------
      weight |        22    2315.909    92.31665        2123.926    2507.892

Let me get the same results as -predictnl- more directly by entering foreign
and domestic indicator variables directly in -regress-.

. gen domestic = !foreign
. regress weight foreign domestic, noconstant
[...]
------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |   2315.909   134.3649    17.24   0.000     2048.058    2583.761
    domestic |   3317.115   87.39676    37.95   0.000     3142.893    3491.338
------------------------------------------------------------------------------

We see that the 95% CIs from -regress- match those from -predictnl- after
-regress-, as they should. Now, however, it is easier to see why the CIs
differ from those of -ci-. -ci- with the -by()- option treated domestic and
foreign as independent samples, one with 52 observations and one with 22,
and it estimated two variances, one for domestic and the other for foreign.
-regress-, on the other hand, estimated a single pooled variance, and that
variance had 72 degrees of freedom. The point estimates (the group means) are
identical under both approaches; only the standard errors and the degrees of
freedom behind the intervals differ. In the parlance of regression, the -ci-
estimates of variance allowed for heteroskedasticity across the domestic and
foreign groups, while -regress- did not. Basically, we make different
assumptions when using -regress- than when using -ci, by()-.
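
For anyone who wants to check the two sets of assumptions side by side, here
is a minimal sketch (not something Lee or I ran above). It assumes the auto
data are loaded with -sysuse-, and the scalar names se_dom and tcrit are my
own. The first part rebuilds the domestic interval by hand from the pooled
root MSE that -regress- stores in e(rmse); the second part fits a
constant-only regression within each group, which estimates a separate
variance per group and so should reproduce the -ci, by()- intervals.

```stata
sysuse auto, clear
generate byte domestic = !foreign

* Pooled-variance fit: one residual variance with 72 degrees of freedom
regress weight foreign domestic, noconstant

* Rebuild the domestic interval by hand from the pooled root MSE.
* With two non-overlapping indicators and no constant, the standard
* error of each coefficient is e(rmse)/sqrt(group size).
scalar se_dom = e(rmse)/sqrt(52)
scalar tcrit  = invttail(e(df_r), 0.025)
display "pooled SE (domestic): " se_dom
display "95% CI: " _b[domestic] - tcrit*se_dom "  " _b[domestic] + tcrit*se_dom

* Separate-variance fits: one constant-only regression per group, each
* with its own variance and its own degrees of freedom (51 and 21)
bysort foreign: regress weight
```

The hand-built standard error should agree with the 87.39676 that -regress-
reported for domestic, and the _cons intervals from the by-group regressions
should agree with the -ci- output.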
-- Vince
[email protected]