Re: st: Using over() option in -margins- command
From: [email protected] (Jeff Pitblado, StataCorp LP)
To: [email protected]
Subject: Re: st: Using over() option in -margins- command
Date: Thu, 17 Mar 2011 14:56:03 -0500
Thomas Weichle <[email protected]> asks about -margins- with a single
variable in the -over()- option compared to using the variable to margin on:
> Using the auto dataset, suppose I generate the following indicator
> variables.
>
> generate weight_gt_50ptile = 0
> replace weight_gt_50ptile = 1 if weight > 3190
>
> generate length_gt_25ptile = 0
> replace length_gt_25ptile = 1 if length > 170
>
> Then, I run the following logistic regression model.
>
> * Case A:
> logistic foreign i.weight_gt_50ptile i.length_gt_25ptile
> estimates store auto
>
> I would like to estimate average predicted probabilities using
> -margins-. Suppose I run margins (1) and (2) below.
>
> * (1)
> margins weight_gt_50ptile#length_gt_25ptile, predict(pr)
>
> * (2)
> estimates restore auto
> margins length_gt_25ptile, predict(pr) over(weight_gt_50ptile)
>
> **** Note: Results obtained from (1) are identical to (2)
>
> Suppose I run a similar model but include the variables mpg and price.
>
> * Case B:
> logistic foreign i.weight_gt_50ptile i.length_gt_25ptile mpg price
> estimates store auto2
>
> Again, I would like to estimate average predicted probabilities using
> -margins-. Suppose I run margins (3) and (4) below (syntax identical to
> (1) and (2)).
>
> * (3)
> margins weight_gt_50ptile#length_gt_25ptile, predict(pr) post
>
> * (4)
> estimates restore auto2
> margins length_gt_25ptile, predict(pr) over(weight_gt_50ptile) post
>
> **** Note: In this case, results obtained from (3) and (4) are not
> identical to each other.
>
> In Case A, the margins are identical using "margins X#Y" compared to
> "margins Y, over(X)".
> In Case B, the margins are different using "margins X#Y" compared to
> "margins Y, over(X)".
>
> My question is the following: In Case B, why are the margins different?
> It appears as if the over() option is the cause for this difference but
> I am unable to understand its reasoning.
The -over()- option identifies groups of observations from which -margins-
will compute the requested predictive margins. So in (2)
. margins length_gt_25ptile, predict(pr) over(weight_gt_50ptile)
-margins- computes the margins of the predicted probabilities at each level of
'length_gt_25ptile' within each group of observations identified by
'weight_gt_50ptile'. This is generally not the same as (1)
. margins weight_gt_50ptile#length_gt_25ptile, predict(pr)
which computes the marginal predicted probabilities at each level combination
of 'length_gt_25ptile' and 'weight_gt_50ptile', and each margin is computed
using the entire estimation sample.
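As a rough by-hand check of this distinction (a sketch, not what -margins-
runs internally), the following reproduces one cell of (3) and (4) for Case B.
The names 'touse', 'w_obs', 'l_obs', 'p_cell', and 'p_over' are made up for
the illustration, and the indicators are built in one line each, which is
equivalent to the generate/replace pairs above.

```stata
sysuse auto, clear
generate byte weight_gt_50ptile = weight > 3190
generate byte length_gt_25ptile = length > 170

logistic foreign i.weight_gt_50ptile i.length_gt_25ptile mpg price
generate byte touse = e(sample)          // mark the estimation sample
clonevar w_obs = weight_gt_50ptile       // keep the observed values
clonevar l_obs = length_gt_25ptile

* Like (3), cell 1#1: set BOTH factors to 1 for every observation,
* then average the predictions over the entire estimation sample
replace weight_gt_50ptile = 1
replace length_gt_25ptile = 1
predict double p_cell if touse, pr
summarize p_cell

* Like (4), same cell: weight stays at its observed value, length is set
* to 1, and the average uses only the observations with w_obs == 1
replace weight_gt_50ptile = w_obs
predict double p_over if touse, pr
summarize p_over if w_obs == 1

* put the observed values back
replace length_gt_25ptile = l_obs
```

The first mean should correspond to the 1#1 row of (3) and the second to the
matching row of (4); the only difference is which observations are averaged
over.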
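If 'length_gt_25ptile' and 'weight_gt_50ptile' are not interacted in the model
and there are no other independent variables in the model, then -margins-
merely produces the predicted probability of a positive outcome at each level
combination of these two factor variables. That prediction is the same for
every observation, so averaging it over a group or over the entire estimation
sample gives the same value; this is why (1) and (2) agree in Case A. In
Case B the predictions also depend on 'mpg' and 'price', so the average
depends on which observations are used, and (3) and (4) differ.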
In (1) both 'length_gt_25ptile' and 'weight_gt_50ptile' must be present in the
model specification as factor variables predicting the outcome.
In (2) 'weight_gt_50ptile' doesn't have to be a predictor in the model; it
merely identifies groups of observations from which to compute predictive
margins.
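A small sketch of that last point, assuming the same indicators as in the
question: with 'weight_gt_50ptile' left out of the model, the -over()- form
still runs, while the interaction form is expected to stop with an error
because the factor is not among the model's covariates.

```stata
sysuse auto, clear
generate byte weight_gt_50ptile = weight > 3190
generate byte length_gt_25ptile = length > 170

* weight_gt_50ptile is deliberately left out of the model
logistic foreign i.length_gt_25ptile mpg price

* allowed: weight_gt_50ptile only defines the groups of observations
margins length_gt_25ptile, predict(pr) over(weight_gt_50ptile)

* expected to fail, since weight_gt_50ptile is not a covariate;
* -capture noisily- shows the message without halting a do-file
capture noisily margins weight_gt_50ptile#length_gt_25ptile, predict(pr)
```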
--Jeff
[email protected]