Re: st: teffects, caliper, propensity score matching
From: "David M. Drukker" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: teffects, caliper, propensity score matching
Date: Tue, 4 Mar 2014 09:25:50 -0600 (CST)
Scott Cunningham <[email protected]> posted several questions regarding
-teffects psmatch- on Friday, 28 February. We apologize for the delay; one
of us has been traveling.
Here are the short versions of the questions and the answers. We discuss
the details below.
Scott's first question was about how to replicate results from -psmatch2-
using -teffects-. The answer is to use the -ties- option in -psmatch2-.
By default, -psmatch2- drops ties, whereas -teffects- keeps all tied
matches, following the recommendation of Abadie and Imbens (2006).
Scott's second question was about how to replicate the results from
-psmatch2- using -teffects- with caliper matching. Caliper matching
requires that each observation have a match within the specified caliper
distance. -psmatch2- automatically drops observations for which no match
within the caliper distance can be found. Dropping these observations
changes the population parameter. -teffects- refuses to proceed so that you
can choose how to identify a feasible parameter.
Scott's third question pertains to which treatment level is the base
category for the overlap plot. By default, the first treatment level is the
base category, as discussed below.
We now discuss Scott's questions in detail.
1. My first question is regarding the comparability of teffects psmatch and
psmatch2. I have been unable to successfully replicate psmatch2 results
using teffects. The two seemingly identical commands yield very different
treatment effect estimates. -teffects- gives me an estimate of 730.38, but
psmatch2 gave me a return of 951. My understanding is that both used logit to
estimate propensity score, both used nearest neighbor(1) to find nearest
neighbor. So I am at a loss to explain why they are different.
We begin by downloading the data and creating variables that Scott used.
. use http://users.nber.org/~rdehejia/data/nsw_dw.dta
.
. generate double agesq = age*age
. generate double agecubed = age*age*age
. generate double edusq = educ*educ
. generate byte u74 = (re74==0)
. generate byte u75 = (re75==0)
. generate double edure74 = educ*re74
. save nsw_dw, replace
-teffects psmatch- includes all tied matches. To obtain the same results
from -psmatch2-, specify the -ties- option. Here is an example:
******************* Begin Output********************************************
. psmatch2 treat age agesq agecubed edusq edure74 education married ///
nodegree re74 re75 u74 u75 black hispanic, ///
outcome(re78) logit ties neighbor(1) ate
Logistic regression Number of obs = 445
LR chi2(14) = 26.42
Prob > chi2 = 0.0229
Log likelihood = -288.89043 Pseudo R2 = 0.0437
------------------------------------------------------------------------------
treat | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | -.4241147 .4098083 -1.03 0.301 -1.227324 .3790947
agesq | .0147783 .0135348 1.09 0.275 -.0117493 .041306
agecubed | -.0001607 .0001423 -1.13 0.259 -.0004397 .0001182
edusq | .0481057 .0238387 2.02 0.044 .0013828 .0948287
edure74 | .0000127 .0000125 1.02 0.308 -.0000117 .0000371
education | -.9525424 .4257927 -2.24 0.025 -1.787081 -.118004
married | .1694692 .2852625 0.59 0.552 -.3896349 .7285734
nodegree | -.4125117 .3923586 -1.05 0.293 -1.18152 .356497
re74 | -.0001822 .0001407 -1.30 0.195 -.0004579 .0000935
re75 | .0000392 .0000505 0.78 0.438 -.0000598 .0001381
u74 | -.216387 .3851458 -0.56 0.574 -.9712588 .5384849
u75 | -.3428689 .3238441 -1.06 0.290 -.9775917 .2918538
black | -.2541281 .3697488 -0.69 0.492 -.9788225 .4705663
hispanic | -.9218369 .5180363 -1.78 0.075 -1.937169 .0934956
_cons | 9.031523 4.742779 1.90 0.057 -.2641534 18.3272
------------------------------------------------------------------------------
There are observations with identical propensity score values.
The sort order of the data could affect your results.
Make sure that the sort order is random before calling psmatch2.
-------------------------------------------------------------------------------
Variable Sample | Treated Controls Difference S.E. T-stat
----------------------------+-----------------------------------------------------------
re78 Unmatched | 6349.1435 4554.80112 1794.34238 632.853392 2.84
ATT | 6349.1435 4291.20612 2057.93739 873.982463 2.35
ATU | 4554.80112 6355.40981 1800.60869 . .
ATE | 1907.58803 . .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
| psmatch2:
psmatch2: | Common
Treatment | support
assignment | On suppor | Total
-----------+-----------+----------
Untreated | 260 | 260
Treated | 185 | 185
-----------+-----------+----------
Total | 445 | 445
******************* End Output********************************************
Note that the estimated ATE is 1907.58803.
Now we replicate the ATE estimate using -teffects-.
******************* Begin Output********************************************
. teffects psmatch (re78) ///
(treat age agesq agecubed edusq edure74 education married ///
nodegree re74 re75 u74 u75 black hispanic, logit)
Treatment-effects estimation Number of obs = 445
Estimator : propensity-score matching Matches: requested = 1
Outcome model : matching min = 1
Treatment model: logit max = 8
------------------------------------------------------------------------------
| AI Robust
re78 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE |
treat |
(1 vs 0) | 1907.588 879.8298 2.17 0.030 183.1534 3632.023
------------------------------------------------------------------------------
******************* End Output********************************************
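To see the effect of ties directly, one could rerun -psmatch2- with and
without the -ties- option after putting the data in a random order, as the
-psmatch2- warning above suggests. The following is only a sketch: the seed
value and the variable name u are arbitrary choices made for illustration,
the file nsw_dw.dta is the one saved earlier, and we have not listed the
output.

```stata
use nsw_dw, clear

* Covariates of the propensity-score model used throughout this post
local xvars age agesq agecubed edusq edure74 education married ///
    nodegree re74 re75 u74 u75 black hispanic

* Randomize the sort order; the seed is an arbitrary, illustrative value
set seed 20140304
generate double u = runiform()
sort u

* Without -ties-: only one of several equally close neighbors is used
psmatch2 treat `xvars', outcome(re78) logit neighbor(1) ate

* With -ties-: all equally close neighbors are used, as in -teffects psmatch-
psmatch2 treat `xvars', outcome(re78) logit neighbor(1) ate ties
```

We would expect only the second call to agree with the -teffects psmatch-
estimate; the first may change with the seed, because the sort order decides
which tied neighbor is used.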
Specifying the -ties- option on -psmatch2- also causes it to produce the same
estimate of the average treatment effect on the treated (ATET) as
-teffects psmatch-. (In the command below, we did not specify the -ate-
option, so -psmatch2- estimates only the ATET, which it labels ATT.)
******************* Begin Output********************************************
.
. * psmatch2 results for the Average treatment effect for the treatment
. * group (here, ATT)
. psmatch2 treat age agesq agecubed edusq edure74 education married ///
nodegree re74 re75 u74 u75 black hispanic, ///
outcome(re78) logit ties neighbor(1)
[Output Omitted]
******************* End Output********************************************
This -psmatch2- command produces an estimated ATET of 2057.937. Specifying
the -atet- option on -teffects psmatch- reproduces it:
******************* Begin Output********************************************
. teffects psmatch (re78) ///
(treat age agesq agecubed edusq edure74 education married ///
nodegree re74 re75 u74 u75 black hispanic, logit), ///
atet vce(iid)
Treatment-effects estimation Number of obs = 445
Estimator : propensity-score matching Matches: requested = 1
Outcome model : matching min = 1
Treatment model: logit max = 8
------------------------------------------------------------------------------
re78 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATET |
treat |
(1 vs 0) | 2057.937 873.4073 2.36 0.018 346.0906 3769.784
------------------------------------------------------------------------------
******************* End Output********************************************
Here is Scott's second question:
2. My second question is regarding caliper matching. I have been
unsuccessful at estimating caliper matching for -teffects- but was able to do
so for -psmatch2- for the same given caliper. I only was successful when I
increased the caliper to 0.1. The code for that is below.
There are observations for which no match can be found within the specified
caliper distance. As mentioned above, -psmatch2- drops these observations
and proceeds with the estimation on the remaining subsample. -psmatch2-
also generates a variable named _support, which contains 0 if no match is
found for an observation within the specified caliper and 1 if a match is
found.
Here is an example:
******************* Begin Output********************************************
. * Caliper matching (0.00001) with psmatch2
. psmatch2 treat age agesq agecubed edusq edure74 education married ///
nodegree re74 re75 u74 u75 black hispanic, ///
outcome(re78) caliper(0.00001) logit ties
Logistic regression Number of obs = 445
LR chi2(14) = 26.42
Prob > chi2 = 0.0229
Log likelihood = -288.89043 Pseudo R2 = 0.0437
------------------------------------------------------------------------------
treat | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | -.4241147 .4098083 -1.03 0.301 -1.227324 .3790947
agesq | .0147783 .0135348 1.09 0.275 -.0117493 .041306
agecubed | -.0001607 .0001423 -1.13 0.259 -.0004397 .0001182
edusq | .0481057 .0238387 2.02 0.044 .0013828 .0948287
edure74 | .0000127 .0000125 1.02 0.308 -.0000117 .0000371
education | -.9525424 .4257927 -2.24 0.025 -1.787081 -.118004
married | .1694692 .2852625 0.59 0.552 -.3896349 .7285734
nodegree | -.4125117 .3923586 -1.05 0.293 -1.18152 .356497
re74 | -.0001822 .0001407 -1.30 0.195 -.0004579 .0000935
re75 | .0000392 .0000505 0.78 0.438 -.0000598 .0001381
u74 | -.216387 .3851458 -0.56 0.574 -.9712588 .5384849
u75 | -.3428689 .3238441 -1.06 0.290 -.9775917 .2918538
black | -.2541281 .3697488 -0.69 0.492 -.9788225 .4705663
hispanic | -.9218369 .5180363 -1.78 0.075 -1.937169 .0934956
_cons | 9.031523 4.742779 1.90 0.057 -.2641534 18.3272
------------------------------------------------------------------------------
There are observations with identical propensity score values.
The sort order of the data could affect your results.
Make sure that the sort order is random before calling psmatch2.
----------------------------------------------------------------------------------------
Variable Sample | Treated Controls Difference S.E. T-stat
----------------------------+-----------------------------------------------------------
re78 Unmatched | 6349.1435 4554.80112 1794.34238 632.853392 2.84
ATT | 5257.79482 3951.72019 1306.07463 1187.7825 1.10
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
psmatch2: | psmatch2: Common
Treatment | support
assignment | Off suppo On suppor | Total
-----------+----------------------+----------
Untreated | 0 260 | 260
Treated | 130 55 | 185
-----------+----------------------+----------
Total | 130 315 | 445
******************* End Output********************************************
Let's look at _support created by -psmatch2-.
******************* Begin Output********************************************
. label list _support
_support:
0 Off support
1 On support
. count if _support
315
******************* End Output********************************************
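To see which observations the caliper removes, one could tabulate _support
against the treatment and compare the estimated propensity scores on and off
support. This is a sketch that assumes the caliper -psmatch2- command above
has just been run, so that the variables _support and _pscore it creates are
in memory; we do not show the output.

```stata
* Cross-tabulate treatment status against the support indicator
tabulate treat _support

* Estimated propensity scores of treated observations without a match
summarize _pscore if treat == 1 & _support == 0

* Estimated propensity scores of treated observations with a match
summarize _pscore if treat == 1 & _support == 1
```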
-teffects- will refuse to proceed, because there are observations that
violate the specified caliper condition.
******************* Begin Output********************************************
. capture noisily teffects psmatch (re78) (treat age agesq agecubed edusq ///
edure74 education married nodegree re74 re75 u74 u75 black ///
hispanic, logit), atet gen(cstub) caliper(0.00001) vce(iid)
no propensity-score matches for observation 1 within caliper 1e-05; this is not allowed
. list _support in 1
+-------------+
| _support |
|-------------|
1. | Off support |
+-------------+
******************* End Output********************************************
The same occurs for the other caliper values.
In order for the two commands to produce the same results, both the
propensity score model and the matching on the estimated propensity score
must be run on the same sample.
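One way to choose a feasible sample with -teffects psmatch- is its
-osample()- option, which creates an indicator for the observations that
violate the overlap conditions, including the caliper condition. The sketch
below is an illustration under stated assumptions, not a recipe that
reproduces the -psmatch2- numbers: the variable name nomatch is our own
choice, and the second call refits the propensity-score model on the
restricted sample, so further observations may fail the caliper condition
and the estimate need not equal the one from -psmatch2-.

```stata
use nsw_dw, clear

local xvars age agesq agecubed edusq edure74 education married ///
    nodegree re74 re75 u74 u75 black hispanic

* First pass: expected to stop with an error, but osample() flags the
* observations that have no match within the caliper
capture noisily teffects psmatch (re78) (treat `xvars', logit), ///
    atet caliper(0.00001) osample(nomatch)

* How many observations were flagged?
tabulate treat nomatch

* Second pass: refit on the observations that were not flagged
* (the propensity score is re-estimated on this subsample)
capture noisily teffects psmatch (re78) (treat `xvars', logit) ///
    if nomatch == 0, atet caliper(0.00001) vce(iid)
```

Dropping the flagged observations changes the population parameter being
estimated, so the choice of sample should be reported along with the result.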
3. I have a question regarding the interpretation in teffects overlap. Am I
correct that the propensity score is being estimated as the probability of
being in the control group (as opposed to the treatment group)? The caption
in the default overlap graph makes it seem that way. This seems like an
innovation -- I have never seen anyone present the propensity score that way
and was just curious why -teffects- does it.
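By default, -teffects overlap- computes the propensity score for the first
level listed in e(tlevels). Here that level is 0, the control group, which is
why the default graph shows the probability of being in the control group.
Use the -ptlevel()- option to change this behavior.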
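As a short sketch, assuming the nsw_dw.dta file saved earlier, the overlap
plot for the probability of treatment can be requested as follows.

```stata
use nsw_dw, clear

* Fit the propensity-score matching model first
teffects psmatch (re78) ///
    (treat age agesq agecubed edusq edure74 education married ///
    nodegree re74 re75 u74 u75 black hispanic, logit)

* Default: propensity score for the first level in e(tlevels), here treat = 0
teffects overlap

* Propensity score for treatment level 1 instead
teffects overlap, ptlevel(1)
```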
We hope that this discussion helps.
-David Drukker -Rich Gates
[email protected] [email protected]
References
----------
Abadie, A., & Imbens, G. W. (2006). Large Sample Properties of Matching
Estimators for Average Treatment Effects. Econometrica, 74(1), 235–267.