Re: st: Using over() option in -margins- command

From	[email protected] (Jeff Pitblado, StataCorp LP)
To	[email protected]
Subject	Re: st: Using over() option in -margins- command
Date	Thu, 17 Mar 2011 14:56:03 -0500

Thomas Weichle <[email protected]> asks how -margins- with a variable in the
-over()- option compares with including that variable in the marginlist:

> Using the auto dataset, suppose I generate the following indicator
> variables.
> 
> generate weight_gt_50ptile = 0
> replace weight_gt_50ptile = 1 if weight > 3190
> 
> generate length_gt_25ptile = 0
> replace length_gt_25ptile = 1 if length > 170
> 
> Then, I run the following logistic regression model.
> 
> * Case A:
> logistic foreign i.weight_gt_50ptile i.length_gt_25ptile
> estimates store auto
> 
> I would like to estimate average predicted probabilities using
> -margins-.  Suppose I run margins (1) and (2) below.
> 
> * (1)
> margins weight_gt_50ptile#length_gt_25ptile, predict(pr)
> 
> * (2)
> estimates restore auto
> margins length_gt_25ptile, predict(pr) over(weight_gt_50ptile)
> 
> **** Note: Results obtained from (1) are identical to (2)
> 
> Suppose I run a similar model but include the variables mpg and price.
> 
> * Case B:
> logistic foreign i.weight_gt_50ptile i.length_gt_25ptile mpg price
> estimates store auto2
> 
> Again, I would like to estimate average predicted probabilities using
> -margins-.  Suppose I run margins (3) and (4) below (syntax identical to
> (1) and (2)).
> 
> * (3)
> margins weight_gt_50ptile#length_gt_25ptile, predict(pr) post
> 
> * (4)
> estimates restore auto2
> margins length_gt_25ptile, predict(pr) over(weight_gt_50ptile) post
> 
> **** Note: In this case, results obtained from (3) and (4) are not
> identical to each other.
> 
> In Case A, the margins are identical using "margins X#Y" compared to
> "margins Y, over(X)".
> In Case B, the margins are different using "margins X#Y" compared to
> "margins Y, over(X)".
> 
> My question is the following:  In Case B, why are the margins different?
> It appears as if the over() option is the cause for this difference but
> I am unable to understand its reasoning.

The -over()- option identifies groups of observations, and -margins- computes
the requested predictive margins separately within each group.  So in (2)

	. margins length_gt_25ptile, predict(pr) over(weight_gt_50ptile)

-margins- computes the margins of the predicted probabilities at each level of
'length_gt_25ptile', averaging only over the observations in each group
identified by 'weight_gt_50ptile'.  This is generally not the same as (1)

	. margins weight_gt_50ptile#length_gt_25ptile, predict(pr)

which computes the predictive margins at each level combination of
'length_gt_25ptile' and 'weight_gt_50ptile', with each margin computed using
the entire estimation sample.
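
To make the distinction concrete, the two computations can be written out as
averages.  The notation here is an assumption introduced for illustration and
is not taken from the -margins- documentation: w and l are the levels of
'weight_gt_50ptile' and 'length_gt_25ptile', z_i holds any other covariates
for observation i, \hat{p} is the fitted probability, N is the size of the
estimation sample, and N_w is the number of observations with w_i = w.
Weights are ignored.

```latex
\begin{align*}
% (1) margins w#l: average over all N observations in the estimation sample
M^{(1)}_{w,l} &= \frac{1}{N} \sum_{i=1}^{N} \hat{p}(w, l, z_i) \\
% (2) margins l, over(w): average over only the N_w observations with w_i = w
M^{(2)}_{w,l} &= \frac{1}{N_w} \sum_{i \,:\, w_i = w} \hat{p}(w_i, l, z_i)
               = \frac{1}{N_w} \sum_{i \,:\, w_i = w} \hat{p}(w, l, z_i)
\end{align*}
```

The summands are the same; only the set of observations being averaged
differs.  When \hat{p} does not depend on z_i, both averages reduce to the
single value \hat{p}(w, l).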

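If 'length_gt_25ptile' and 'weight_gt_50ptile' are not interacted in the model
and there are no other independent variables in the model, then -margins-
merely produces the predicted probability of a positive outcome at each level
combination of these two factor variables.  That prediction is the same for
every observation, so averaging it within a group or over the entire
estimation sample gives the same number; this is why (1) and (2) agree in
Case A.  In Case B the predictions also depend on 'mpg' and 'price', whose
observed values differ across the groups defined by 'weight_gt_50ptile', so
(3) and (4) differ.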
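
One way to see the difference in Case B is to reproduce a single cell by hand.
The following is a minimal sketch, not a description of how -margins- is
implemented: it assumes the auto dataset and the cutoffs from the question,
builds the indicators in one step instead of with -generate- and -replace-,
and uses the hypothetical helper names w_obs, p_full, and p_over.  It looks
only at the cell weight_gt_50ptile = 0, length_gt_25ptile = 1.

```stata
sysuse auto, clear
generate byte weight_gt_50ptile = weight > 3190
generate byte length_gt_25ptile = length > 170

* Case B model from the question
logistic foreign i.weight_gt_50ptile i.length_gt_25ptile mpg price

* Keep the observed grouping variable so it can be restored later
clonevar w_obs = weight_gt_50ptile

* As in -margins weight_gt_50ptile#length_gt_25ptile-: set both factors
* for every observation, then average over the whole estimation sample
replace weight_gt_50ptile = 0
replace length_gt_25ptile = 1
predict double p_full, pr
summarize p_full if e(sample)

* As in -margins length_gt_25ptile, over(weight_gt_50ptile)-: restore the
* observed grouping variable, set only length_gt_25ptile, then average
* over the observations that belong to the group
replace weight_gt_50ptile = w_obs
predict double p_over, pr
summarize p_over if e(sample) & w_obs == 0
```

The two means should correspond to the margins reported for that cell by (3)
and (4), respectively.  The only difference between them is which
observations' values of 'mpg' and 'price' enter the average.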

In (1) both 'length_gt_25ptile' and 'weight_gt_50ptile' must be present in the
model specification as factor variables predicting the outcome.

In (2) 'weight_gt_50ptile' doesn't have to be a predictor in the model; it
merely identifies groups of observations from which to compute predictive
margins.

--Jeff
[email protected]