
RE: st: graph mean and sd over/by category

From	Lee Sieswerda <[email protected]>
To	"'[email protected]'" <[email protected]>
Subject	RE: st: graph mean and sd over/by category
Date	Mon, 10 Feb 2003 18:51:46 -0500

I was writing a reply to George Hoffman's question when Vince Wiggins's reply
popped into my mailbox. I have a question for Vince. First, here is what I
was suggesting for George.

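sysuse auto, clear
* stratum mean and SD of weight within each level of foreign
bysort foreign: egen mean = mean(weight)
bysort foreign: egen sd = sd(weight)
* 95% CI bounds; under -bysort-, _N is the number of observations in the stratum
bysort foreign: gen ub = mean + invttail(_N-1,.025)*(sqrt((sd^2)/_N))
bysort foreign: gen lb = mean - invttail(_N-1,.025)*(sqrt((sd^2)/_N))
twoway (rcap lb ub foreign) (scatter mean foreign)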

This gives the same results as -ci-. Specifically, it gives a 95% CI with
the t critical value based on _N-1 degrees of freedom within each stratum (of
foreign in this case). I gather it is -ci-'s results that Nick Cox is using to
produce his new -ciplot- (pardon me, Nick, if I'm misrepresenting you).

Then I read Vince's reply suggesting the use of -predictnl-, which is
intriguing. Vince, if I'm understanding your post correctly, I could obtain
a 95% CI for the mean of weight by foreign like so:

* drop any existing lb and ub first, since -predictnl- creates new variables
capture drop lb ub
regress weight foreign
predictnl yhat=predict(), ci(lb ub)

When I do so, I get the following upper and lower bounds:

. bysort foreign: sum ub lb

_______________________________________________________________________________
-> foreign = Domestic

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          ub |        52    3491.338           0   3491.338   3491.338
          lb |        52    3142.893           0   3142.893   3142.893

_______________________________________________________________________________
-> foreign = Foreign

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          ub |        22     2583.76           0    2583.76    2583.76
          lb |        22    2048.058           0   2048.058   2048.058

These differ from what -ci- produces:

. ci weight, by(foreign)

_______________________________________________________________________________
-> foreign = Domestic

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
      weight |         52    3317.115     96.4296        3123.525    3510.706

_______________________________________________________________________________
-> foreign = Foreign

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
      weight |         22    2315.909    92.31665        2123.926    2507.892
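
As a quick arithmetic check, the -ci- bounds can be rebuilt from the means and
standard errors reported in that output, using a t critical value with
(stratum size - 1) degrees of freedom. This is only a sketch: the numbers are
copied from the -ci- output, so the results should agree with its bounds only
up to rounding.

```stata
* Domestic: 52 observations, so 51 degrees of freedom
display 3317.115 - invttail(51, .025)*96.4296
display 3317.115 + invttail(51, .025)*96.4296

* Foreign: 22 observations, so 21 degrees of freedom
display 2315.909 - invttail(21, .025)*92.31665
display 2315.909 + invttail(21, .025)*92.31665
```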


Now, I gather that the difference is a result of this message that I
received after running -predictnl-:

. predictnl yhat=predict(), ci(lb ub)
note: Confidence intervals calculated using t(72) critical values.

So, here is the dumb question. For what George is looking for (and many
others I'm sure), should a person be using t critical values with degrees of
freedom based on the total sample (_N-2=72), or based on the sample within
each stratum (_N-1=51 for Domestic and _N-1=21 for Foreign)?
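
One way to see where the two sets of bounds come from, sketched on the auto
data: the regression pools the residual variance across both groups and uses
its residual degrees of freedom, while -ci- with by() uses each stratum's own
standard deviation and size. The sketch below assumes the model is exactly
-regress weight foreign-. It rebuilds the pooled intervals from the saved
results and takes no position on which interval George should prefer.

```stata
sysuse auto, clear
regress weight foreign

* Domestic (foreign == 0): the fitted mean is the constant.
* e(df_r) holds the residual degrees of freedom (72 here).
display _b[_cons] - invttail(e(df_r), .025)*_se[_cons]
display _b[_cons] + invttail(e(df_r), .025)*_se[_cons]

* Foreign (foreign == 1): the fitted mean is constant + slope.
* -lincom- reports its CI from the same pooled variance estimate.
lincom _cons + foreign
```

Both intervals are centred on the same group means as -ci-; only the standard
errors and the t critical values differ.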

Thanks (and sorry for the long posting),

Lee

Lee Sieswerda, Epidemiologist
Thunder Bay District Health Unit
999 Balmoral Street
Thunder Bay, Ontario
Canada  P7B 6E7
Tel: +1 (807) 625-5957
Fax: +1 (807) 623-2369
[email protected]
www.tbdhu.com



> -----Original Message-----
> From:	[email protected] [SMTP:[email protected]]
> Sent:	Monday, February 10, 2003 5:55 PM
> To:	[email protected]
> Subject:	Re: st: graph mean and sd over/by category
> 
> Among other things, George Hoffman <[email protected]> asks,
> 
> > [...] fitted curves under scatter plots look beautiful - can the
> > regression coefficients from fplotci or qplotci be captured somehow,
> > as poor-man's curve fit?
> 
> I think George is referring to the -fpfitci- and -qfitci- plot types of
> -graph twoway-.  If so, he can readily perform the regressions that produced
> the graphs.
> 
> -qfitci- just performs a quadratic regression.  If we use the auto data,
> -sysuse auto-, the lines for the graph command,
> 
>       . twoway qfitci mpg weight
> 
> are the predictions of the quadratic fit,
> 
>       . gen weight2 = weight^2
>       . regress mpg weight weight2
> 
> The coefficients can be seen in the output of -regress-, or manipulated in the
> usual way through the saved results.
> 
> If George wants to add the predictions, and their CIs to his dataset, he can
> type,
> 
>       . predictnl mpg_hat2 = predict() , ci(ci_low ci_high)
> 
> 
> This is a very simple application of -predictnl-, Bobby Gutierrez
> <[email protected]> said more in a prior post, but it lets us get both the
> predictions and their CIs with one command.
> 
> We could then get a graph similar to our earlier -twoway qfitci-, by typing,
> 
>       . twoway rarea ci_low ci_high weight, sort || line mpg_hat2 weight, sort
> 
> which we will immediately think looks ugly and decide to relabel the CI in the
> legend, option -legend(label())-, and change the fill color of the CI to be
> the standard for our scheme, option -p(ci)-.
> 
>       .  twoway rarea ci_low ci_high weight , sort p(ci) || 
>                 line mpg_hat2 weight , sort legend(label(1 "CI"))
> 
> 
> The -fpfitci- plot type just uses -fracpoly- as the engine to produce the
> fits, much like -regress- is used for the quadratic fit.  For our example, the
> corresponding -fracpoly- estimation command is,
> 
>       . fracpoly regress mpg weight
> 
> and we can repeat the rest of the story, or just use -fracplot-, to plot the
> fit and CI.
> 
> 
> -- Vince 
>    [email protected]
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
