Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Propensity Score Matching with Multiple Categorical Variables with Multiple Categories...Dummy Variables?
From: TA Stat <[email protected]>
To: [email protected]
Subject: Re: st: Propensity Score Matching with Multiple Categorical Variables with Multiple Categories...Dummy Variables?
Date: Sat, 14 Jul 2012 12:51:19 +0700
Thanks, everyone, for the advice. I am working out how to collapse some
categories of each variable in a way that is meaningful for my research
question. I will keep an eye out for any additional advice.
Pete
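
A minimal sketch of what collapsing categories might look like, using the auto data from the examples quoted below. It is an illustration, not a recommendation: the grouping of rep78 levels 1-3 is an arbitrary assumption, and the names degenerate, rep3, m3, and p3 are hypothetical. The idea is to flag cells whose treatment mean is exactly 0 or 1, merge them with a neighbouring cell, and then check that the saturated logit reproduces the cell means.

```stata
sysuse auto, clear

* Cell means of treatment (foreign) within each level of rep78
egen m = mean(foreign), by(rep78)

* Flag degenerate cells, i.e. cells where the mean of treatment is 0 or 1
gen byte degenerate = inlist(m, 0, 1) if !missing(rep78)
tab rep78 degenerate

* Assumed grouping, for illustration only: fold levels 1 and 2 into level 3
recode rep78 (1 2 3 = 3), gen(rep3)

* Re-estimate on the collapsed variable and compare with the new cell means
logit foreign i.rep3
predict p3 if e(sample)
egen m3 = mean(foreign), by(rep3)
su m3 p3 if p3 < .
```

With a single categorical regressor the logit is saturated, so p3 and m3 should agree up to convergence tolerance. With many variables, the same check could be run on whichever cells one decides to keep distinct.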
On Fri, Jul 13, 2012 at 10:12 PM, Austin Nichols
<[email protected]> wrote:
> Ariel and Pete--
> Estimating a logit with dummies is one way to combine across distinct
> combinations of the 15 observables to estimate a propensity score. A
> fully nonparametric propensity score would include every possible
> interaction as well, or simply compute the mean of treatment across
> all cells (possibly millions of cells). If any cells have pscore 0 or
> 1, and some are almost certain to be degenerate in that way, then you
> must combine that cell with another; one way of doing that is using
> the marginal across some subset of categories. The logit with no
> interactions is one particular method of combining across cells.
>
> sysuse auto
> logit foreign i.rep78
> predict p if e(sample)
> egen m=mean(foreign), by(rep78)
> su m p if p<.
> * Note that if you do not restrict the prediction with "if e(sample)",
> * the estimated p=.818 for rep78=1
> * (taken from the excluded category, rep78=5) when it should be zero.
> ta rep78, mi sum(foreign)
> ta rep78, mi sum(m)
> ta rep78, mi sum(p)
>
> g fakecat=round(mpg,10)
> logit foreign i.rep78##i.fakecat
> predict p2 if e(sample)
> egen m2=mean(foreign), by(rep78 fakecat)
> su m2 p2 if p2<.
>
>
> On Fri, Jul 13, 2012 at 10:19 AM, Ariel Linden, DrPH
> <[email protected]> wrote:
>> Hi Pete,
>>
>> Since estimation of the propensity score is nothing more than a logistic (or
>> probit) regression model, you could leave the categorical variables as-is
>> and use the "i." prefix to denote that they are categorical, such as i.race.
>> The regression output will show you that the levels of the categorical
>> variable have been dealt with accordingly (including if any of the levels
>> are dropped from the model). See for example:
>>
>> sysuse auto
>> logit foreign i.rep78
>>
>> On the other hand, you could certainly create dummy variables for the
>> categorical variable. However, if you have a large number of covariates,
>> your dataset will start looking ugly in a hurry. In any case, your results
>> will be identical:
>>
>> tab rep78, gen(rep78_)
>> logit foreign rep78_1-rep78_5
>>
>> I hope this helps
>>
>> Ariel
>>
>> Date: Fri, 13 Jul 2012 10:06:14 +0700
>> From: TA Stat <[email protected]>
>> Subject: st: Propensity Score Matching with Multiple Categorical Variables
>> with Multiple Categories...Dummy Variables?
>>
>> Dear All
>>
>> In PS matching, how should I handle multiple categorical variables,
>> e.g., 15 variables, each with multiple categories (e.g., 3-5)?
>> Do I have to create dummy variables (n-1 for each variable) for all
>> of those categorical variables before calculating the propensity
>> score?
>>
>> Thanks
>> Pete
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/