Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Propensity Score Matching with Multiple Categorical Variables with Multiple Categories...Dummy Variables?
From
Marcello Pagano <[email protected]>
To
<[email protected]>
Subject
Re: st: Propensity Score Matching with Multiple Categorical Variables with Multiple Categories...Dummy Variables?
Date
Sat, 14 Jul 2012 12:33:12 -0400
We can, Peter. If the Listers want me to, I shall.
You can also vote with your feet; ignore the postings. Do not respond.
m.p.
On 7/14/2012 12:22 PM, Lachenbruch, Peter wrote:
I am also frustrated by anonymous posters. Can we simply block such posts from appearing? Marcello?
________________________________________
From: [email protected] [[email protected]] On Behalf Of Steve Samuels [[email protected]]
Sent: Saturday, July 14, 2012 6:28 AM
To: [email protected]
Subject: Re: st: Propensity Score Matching with Multiple Categorical Variables with Multiple Categories...Dummy Variables?
Please take note of the FAQ section:
• "It is long-standing practice on Statalist that most members, especially the most active members who supply a large fraction of the answers, post using their real names. This is one of the ways in which we show respect to others. So we discourage you from posting from behind fake names or identifiers. Such handles are particularly objectionable if they include the word “Stata” in some way."
I would add that "real name" means first and last name.
Steve
[email protected]
On Jul 14, 2012, at 1:51 AM, TA Stat wrote:
Thanks everyone for advice. I am figuring out how to collapse some
categories of each variable in a meaningful way for my research
question. I will keep my eyes on additional advice from everyone.
Pete
On Fri, Jul 13, 2012 at 10:12 PM, Austin Nichols
<[email protected]> wrote:
Ariel and Pete--
Estimating a logit with dummies is one way to combine across distinct
combinations of the 15 observables to estimate a propensity score. A
fully nonparametric propensity score would include every possible
interaction as well, or simply compute the mean of treatment across
all cells (possibly millions of cells). If any cells have pscore 0 or
1, and some are almost certain to be degenerate in that way, then you
must combine that cell with another; one way of doing that is using
the marginal across some subset of categories. The logit with no
interactions is one particular method of combining across cells.
sysuse auto
logit foreign i.rep78
predict p if e(sample)
egen m=mean(foreign), by(rep78)
su m p if p<.
* Note that if you do not restrict using if e(sample)
* the estimated p=.818 for rep78=1
* (taken from excl cat rep78=5) when it should be zero.
ta rep78, mi sum(foreign)
ta rep78, mi sum(m)
ta rep78, mi sum(p)
g fakecat=round(mpg,10)
logit foreign i.rep78##i.fakecat
predict p2 if e(sample)
egen m2=mean(foreign), by(rep78 fakecat)
su m2 p2 if p2<.
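The point about degenerate cells can be sketched as follows. This is only an illustration under stated assumptions: the variable names degenerate, rep78c and pc are made up here, and merging rep78 levels 1 and 2 into level 3 is one arbitrary way of combining cells, not a recommendation for any real analysis.

```stata
sysuse auto, clear

* cell-level treatment rate, i.e. the nonparametric pscore by rep78
egen m = mean(foreign), by(rep78)

* flag cells whose treatment rate is exactly 0 or 1 (degenerate cells)
gen byte degenerate = inlist(m, 0, 1) if !missing(rep78)
tab rep78 degenerate

* hypothetical collapse: fold the sparse levels 1 and 2 into level 3
recode rep78 (1 2 = 3), gen(rep78c)

* re-estimate the propensity score on the collapsed variable
logit foreign i.rep78c
predict pc if e(sample)
ta rep78c, sum(pc)
```

After the collapse no level of rep78c perfectly predicts foreign, so logit no longer has to drop observations for that reason.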
On Fri, Jul 13, 2012 at 10:19 AM, Ariel Linden, DrPH
<[email protected]> wrote:
Hi Pete,
Since estimation of the propensity score is nothing more than a logistic (or
probit) regression model, you could leave the categorical variables as-is
and use the "i." prefix to denote that they are categorical, such as i.race.
The regression output will show you that the levels of the categorical
variable have been dealt with accordingly (including if any of the levels
are dropped from the model). See for example:
sysuse auto
logit foreign i.rep78
On the other hand, you could certainly create dummy variables for the
categorical variable. However, if you have a large number of covariates,
your dataset will start looking ugly in a hurry. In any case, your results
will be identical:
tab rep78, gen(rep78_)
logit foreign rep78_1-rep78_5
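One way to check that the two specifications agree is to compare their fitted propensity scores directly. The sketch below is an illustration only; the names p_fv, p_dum and d are made up for it, and it assumes the same auto data as the snippets above.

```stata
sysuse auto, clear

* factor-variable specification
logit foreign i.rep78
predict p_fv if e(sample)

* hand-made dummies; Stata drops whatever it must
* (perfect predictors, one collinear dummy)
tab rep78, gen(rep78_)
logit foreign rep78_1-rep78_5
predict p_dum if e(sample)

* the fitted probabilities should differ only by numerical noise
gen double d = abs(p_fv - p_dum)
su d
```

The coefficient tables can look different because the omitted category need not be the same in the two runs, but the fitted probabilities are what matter for matching.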
I hope this helps
Ariel
Date: Fri, 13 Jul 2012 10:06:14 +0700
From: TA Stat <[email protected]>
Subject: st: Propensity Score Matching with Multiple Categorical Variables
with Multiple Categories...Dummy Variables?
Dear All
In PS matching, I am wondering about how to handle multiple
categorical variables e.g. 15 variables. Each variable has multiple
categories e.g. 3-5 categories. Do I have to create dummy variables,
(n-1 for each variable), for all those categorical variables before
calculating propensity score?
Thanks
Pete
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/