Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Regression Discontinuity (RD) Designs, sharp discontinuity: basic question about implementation with "rd"
From: Stefano Lombardi <[email protected]>
To: [email protected]
Subject: Re: st: Regression Discontinuity (RD) Designs, sharp discontinuity: basic question about implementation with "rd"
Date: Wed, 12 Oct 2011 02:13:56 +0200
Dear Austin,
Thank you very much for the reply. Here is some additional
information about the dataset.
About the forcing variable:
"ten_cat" is measured in months (12-58). The last 5 categories
contain many missing values.
Alternatively, "tenure" is the same variable measured in days. I
would like to use this one, choosing the correct bandwidth.
To give a rough idea of the data, here is the frequency table of
"ten_cat":
. tabdisp ten_cat, cell(freq cumfreq)
----------------------------------
job tenure categories | freq cumfreq
----------+-----------------------
13 | 14296 14296
14 | 13989 28285
15 | 13564 41849
16 | 12595 54444
17 | 11629 66073
18 | 11269 77342
19 | 9735 87077
20 | 9441 96518
21 | 8897 105415
22 | 8426 113841
23 | 7735 121576
24 | 7407 128983
25 | 5672 134655
26 | 5451 140106
27 | 5486 145592
28 | 5224 150816
29 | 5041 155857
30 | 4631 160488
31 | 4516 165004
32 | 4277 169281
33 | 4049 173330
34 | 4059 177389
35 | 4190 181579
36 | 3601 185180
37 | 2938 188118
38 | 2937 191055
39 | 3006 194061
40 | 2790 196851
41 | 2680 199531
42 | 2609 202140
43 | 2417 204557
44 | 2414 206971
45 | 2257 209228
46 | 2221 211449
47 | 2300 213749
48 | 1725 215474
49 | 1682 217156
50 | 1809 218965
51 | 1730 220695
52 | 1602 222297
53 | 1579 223876
54 | 1464 225340
55 | 1486 226826
56 | 1458 228284
57 | 1384 229668
58 | 1375 231043
----------------------------------
Severance pay takes two possible values: people are treated from tenure =
1094 (days) onward, i.e. from ten_cat = 36 (months). I expect the mean
nonemployment duration (y_bar, in days) to rise after the cut-off.
Note, however, that severance pay is generally delivered within one
month of job termination, but I have no information about the exact
moment at which the sum is paid.
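For illustration only, here is a minimal Python sketch (not Stata, and not part of "rd") of recentering the forcing variable at the cut-off and building the sharp treatment dummy. It assumes, as described above, that treatment starts at tenure >= 1094 days (equivalently ten_cat >= 36 months). The helper names are hypothetical.

```python
# Hypothetical helpers: recenter a forcing variable and build the sharp-RD dummy.
CUTOFF_DAYS = 1094   # treatment threshold in days, as described above
CUTOFF_MONTHS = 36   # same threshold in months


def recenter(values, cutoff):
    """Return z = value - cutoff, so the threshold sits at z = 0."""
    return [v - cutoff for v in values]


def sharp_dummy(z):
    """Sharp RD assignment: 1 if z >= 0 (at or above the cut-off), else 0."""
    return [1 if zi >= 0 else 0 for zi in z]


if __name__ == "__main__":
    tenure_days = [1000, 1093, 1094, 1150]
    z = recenter(tenure_days, CUTOFF_DAYS)
    print(z)               # [-94, -1, 0, 56]
    print(sharp_dummy(z))  # [0, 0, 1, 1]
```

This is the same recentering that Austin's "g z=ten_cat-36" performs in Stata. With the threshold at z = 0, "rd" can be run without z0().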
Since I have the forcing variable both in months and in days, I have
plotted the following graphs:
- y_bar vs. tenure: the scatterplot is quite dispersed around the
threshold, but there is a clearly visible decreasing trend before the
cut-off and an increasing trend to the right of it. Fitting one straight
line to the left and one to the right of the cut-off gives an average
treatment effect of about 9.5 days.
- y_bar vs. ten_cat: there is a clear jump between 36 and 37 (y_bar is
148 and 161, respectively). After the jump, the observations stay steadily
higher than those to the left of the cut-off.
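The "one straight line on each side" comparison from the first graph can be sketched in plain Python. The sketch fits ordinary least squares separately to observations left (z < 0) and right (z >= 0) of the recentered cut-off, then takes the difference of the two intercepts at z = 0. This is a simplified, unweighted illustration with no kernel and no bandwidth, so it is not the procedure "rd" uses. The toy data are made up.

```python
def ols_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def linear_jump(z, y):
    """Jump at z = 0: right-hand intercept minus left-hand intercept."""
    left = [(zi, yi) for zi, yi in zip(z, y) if zi < 0]
    right = [(zi, yi) for zi, yi in zip(z, y) if zi >= 0]
    a_left, _ = ols_line([p[0] for p in left], [p[1] for p in left])
    a_right, _ = ols_line([p[0] for p in right], [p[1] for p in right])
    return a_right - a_left


if __name__ == "__main__":
    # Toy data: y = 100 - 0.5*z on the left, y = 110 + 0.3*z on the right.
    z = [-3, -2, -1, 0, 1, 2]
    y = [101.5, 101.0, 100.5, 110.0, 110.3, 110.6]
    print(round(linear_jump(z, y), 6))  # 10.0
```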
The regression you suggested (using either ten_cat or tenure) gives
R^2 = 1: the dummy explains the entire variation in severance payment.
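R^2 = 1 is what a sharp design should produce. Treatment is a deterministic function of the forcing variable, so sevpay coincides with the dummy 1[z >= 0], and regressing one on the other fits perfectly. Adding z and the interaction cannot lower R^2. The Python sketch below uses made-up tenures and assumes sevpay is coded 0/1 (the reported R^2 = 1 suggests it is). The helper name is hypothetical.

```python
def r_squared_simple(x, y):
    """R^2 of a simple regression of y on x (squared correlation)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


ten_cat = [30, 34, 35, 36, 37, 40]               # months (made-up sample)
dummy = [1 if t >= 36 else 0 for t in ten_cat]   # 1[ten_cat >= 36]
sevpay = [1 if t >= 36 else 0 for t in ten_cat]  # sharp rule: paid iff tenure >= 36
print(r_squared_simple(dummy, sevpay))  # 1.0
```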
Using Z in days and running
. rd nonedur Z, bdep
the problem seems to be solved (though I don't know why)! I get:
Two variables specified; treatment is
assumed to jump from zero to one at Z=0.
Assignment variable Z is Z
Treatment variable X_T unspecified
Outcome variable y is nonedur
Estimating for bandwidth 14.14255035704279
Estimating for bandwidth 7.071275178521395
Estimating for bandwidth 28.28510071408558
------------------------------------------------------------------------------
     nonedur |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwald |   30.76441     8.9709     3.43   0.001     13.18177    48.34705
     lwald50 |   34.90172   14.25218     2.45   0.014     6.967965    62.83548
    lwald200 |   23.17764   6.553702     3.54   0.000     10.33262    36.02265
------------------------------------------------------------------------------
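The numbers in this table can be checked by hand. Each interval is the normal-approximation interval coef ± 1.96 × SE, and each z is coef/SE. The two extra bandwidths are exactly 50% and 200% of the first one (7.071... = 14.142.../2 and 28.285... = 14.142... × 2), matching the lwald50 and lwald200 labels. The short Python sketch below recomputes these values from the printed output. It is for illustration only and does not show how "rd" computes its standard errors.

```python
from statistics import NormalDist

# Coefficient and standard error as printed by rd above.
ROWS = {
    "lwald": (30.76441, 8.9709),
    "lwald50": (34.90172, 14.25218),
    "lwald200": (23.17764, 6.553702),
}


def normal_ci(coef, se, level=0.95):
    """Two-sided normal-approximation confidence interval."""
    zcrit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return coef - zcrit * se, coef + zcrit * se


if __name__ == "__main__":
    bw = 14.14255035704279  # first bandwidth reported by rd
    print("bandwidths:", bw, bw * 0.5, bw * 2)
    for name, (coef, se) in ROWS.items():
        lo, hi = normal_ci(coef, se)
        print(f"{name:9s} z={coef / se:5.2f}  [{lo:.5f}, {hi:.5f}]")
```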
With bandwidths 7.1 and 14 the estimated effect is not precise, so I would
go for the third one. However, since I have many observations close to the
cut-off, I could probably also restrict the window of observations
considered through the "n(real)" option. Is that sensible?
Also, if I plot the graph through the "gr" option, it is not informative:
all the observations are plotted (the graph is basically completely full
of dots), not the means of nonedur. Also, the X-axis range is the entire
range of the forcing variable, but I just want to "zoom" near the cut-off
(say, between 950 and 1150). I probably have to work with "scopt", but
how exactly?
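The "zoom" plot needs binned means of nonedur within a window around the cut-off, not the raw scatter. Below is a Python sketch of that data preparation only. It assumes the 950-1150 day window mentioned above and a hypothetical bin width of 10 days anchored at the cut-off, so no bin straddles 1094. How to pass such means to the graph options of "rd" is not shown.

```python
import math
from collections import defaultdict


def binned_means(tenure, outcome, cutoff=1094, lo=950, hi=1150, width=10):
    """Mean outcome per bin of `width` days inside [lo, hi].

    Bins are anchored at the cut-off, so bin k covers
    [cutoff + k*width, cutoff + (k+1)*width) and none straddles the threshold.
    Returns {bin midpoint: mean outcome}, ordered by midpoint.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, y in zip(tenure, outcome):
        if lo <= t <= hi:  # keep only the zoom window
            k = math.floor((t - cutoff) / width)
            sums[k] += y
            counts[k] += 1
    return {cutoff + (k + 0.5) * width: sums[k] / counts[k] for k in sorted(sums)}
```

With one plotted point per bin, the graph shows the binned means in the window rather than every observation.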
Thank you very much!!
Stefano
On 11/10/2011 19:43, Austin Nichols wrote:
Stefano Lombardi <[email protected]>:
Apparently there is a problem in your data; if you give us information
about the actual data, maybe we can diagnose it.
Is ten_cat measured in days, so that it takes on a larger number of
discrete values, many of which are close to the threshold, or does it
take on a small number of discrete values?
Does sevpay take on one of two possible values, or is it more continuous?
What happens when you regress sevpay on z=(ten_cat-36) and a dummy for
z>=0 (ten_cat>=36), and their interaction?
What happens when you type
g z=ten_cat-36
rd nonedur z, bdep
?
The bandwidth calculations assume the data far from the cutoff have
NOT "already been manually eliminated" as you have done, so you may
want to clarify how you want to estimate the optimal bandwidth.
On Tue, Oct 11, 2011 at 1:12 PM, Stefano Lombardi
<[email protected]> wrote:
Hi Ariel,
thank you very much for your interest. You got the correct interpretation
for X and the cut-off as well.
With respect to the treatment ("severance payment"), I wrote a bit
confusingly. Treatment is sharply discontinuous in "job tenure" at month 36,
in the sense that if a person is laid off after having worked for 13, 14,
..., or 35 months at the same place, he does not receive any kind of
lump-sum payment. If instead one works for 36 months or more and is laid
off, the employer is obliged to immediately pay him a fixed amount of
money (three months of salary from the job just lost).
Hence, every person in my dataset has been laid off, but only some of them
receive the lump-sum severance payment (with probability 1 after 36 months
of job tenure). What may cause some confusion is that I am not considering
an unemployment benefit (which starts at a certain point and then continues
to be received over time), but a "one-time" payment.
Also, we are interested in knowing whether this kind of treatment affects
the duration of unemployment (the "nonemployment" duration, which runs from
the layoff to the start of the new job).
You are completely right: job position could be a very important issue. But
the dataset is quite homogeneous from this point of view. In any case, in the
hypothesis-checking part of the work I have checked graphically whether this
variable "jumps" at the threshold. So you are right, but I can still check
whether the continuity assumption is violated at the threshold, and (at
least graphically) there is no evidence of that.
The same reasoning applies to the salary level of the previous job. Since
the severance payment equals three months of salary from the last job, the
size of the payment is not the same for everyone who receives it. But again,
the range of previous salaries is not very wide. There are indeed some
extreme cases in both directions, but graphically the "previous salary"
variable passes quite smoothly through the cut-off.
One main concern could be that employers fire more people "just to the left"
of the 36-month cut-off (in order to avoid the compulsory payment). But
this is not the case: the number of layoffs (by previous job tenure)
does not change much at the threshold. For those more familiar with the
labor economics framework, my dataset is quite comparable to the one in
David Card's 2007 work. Of course some criticism is always warranted, but
I consider it very good work, and I wanted to start from it.
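A rough numerical version of the "number of layoffs does not change much at the threshold" check compares layoff counts in the k tenure bins at and just above the cut-off with counts in the k bins just below it. This is a simplified Python sketch with a hypothetical helper and an arbitrary k. It is not a formal density test. Because layoff counts may trend with tenure anyway, a ratio away from 1 is only suggestive.

```python
def count_ratio(freq_by_bin, cutoff, k=3):
    """Layoffs in the k bins at/above the cut-off divided by the k bins below it.

    freq_by_bin maps a bin value (e.g. tenure month) to its number of layoffs;
    missing bins count as zero.
    """
    left = sum(freq_by_bin.get(cutoff - j, 0) for j in range(1, k + 1))
    right = sum(freq_by_bin.get(cutoff + j, 0) for j in range(0, k))
    return right / left
```

For example, with a frequency table keyed by month, count_ratio(freq, 36) compares months 36-38 with months 33-35.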
In fact, none of the other variables that could cause problems seem to be
discontinuous at the threshold. Hence I would like to proceed with the
"rd" command, but I really cannot understand what the syntax/input
problem is.
Basically, on the y axis I want the mean nonemployment duration (in days),
while on the X axis I want the job tenure in months. Hence I computed the
mean of y conditional on X, through:
egen cond_mean_y = mean(nonedur), by(ten_cat)
Now I have, for each job tenure month between 13 and 52, the corresponding
mean nonemployment duration (and I can easily make the plot). But then why
does "rd" not return the same? Where did I go wrong?
I believe that "rd" should do this "automatically" by (1) using "job
tenure" in days, and (2) choosing the correct bandwidth. The first thing
that I tried was to include the forcing variable as continuous, but I
could not get a graph like the one described in the paragraph above.
And apart from the graph itself, I am clearly making some kind of error
somewhere in the "rd" command, since I receive the error that I reported
in my previous post. The error is clearly due to my ignorance, but how
can I solve this problem?
Thank you very much,
Stefano
I clearly have to make Stata consider just the points near the cut-off in
order to estimate the jump. However, even without specifying that
explicitly, I think Stata should do it by itself. As for the bandwidth, if
I am not wrong, Stata chooses the optimal one and also tries two others.
I do not understand whether I have to insert the average.
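The idea of "considering only points near the cut-off" can be sketched as a kernel-weighted local linear regression. Each side of the cut-off is fitted by weighted least squares with triangular weights w = max(0, 1 - |z|/h), which vanish outside the bandwidth h. The estimate is the difference of the two fitted values at z = 0. Rerunning at h/2 and 2h mirrors the three bandwidths in the output above. This is a simplified Python illustration under those assumptions (triangular kernel, no covariates, no standard errors), not the code of "rd".

```python
def weighted_line(xs, ys, ws):
    """Weighted least-squares (intercept, slope) for y = a + b*x."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def triangular(z, h):
    """Triangular kernel weight; zero outside the bandwidth h."""
    return max(0.0, 1.0 - abs(z) / h)


def local_linear_jump(z, y, h):
    """Estimated jump in E[y | z] at z = 0 using bandwidth h."""
    sides = {"left": ([], [], []), "right": ([], [], [])}
    for zi, yi in zip(z, y):
        w = triangular(zi, h)
        if w > 0:  # observations outside the bandwidth get no weight
            xs, ys, ws = sides["right"] if zi >= 0 else sides["left"]
            xs.append(zi)
            ys.append(yi)
            ws.append(w)
    a_left, _ = weighted_line(*sides["left"])
    a_right, _ = weighted_line(*sides["right"])
    return a_right - a_left


def jumps_at_multiples(z, y, h, multiples=(1.0, 0.5, 2.0)):
    """Jump estimates at h and at the given multiples of h."""
    return {m: local_linear_jump(z, y, m * h) for m in multiples}
```

Only the observations inside the bandwidth enter each fit, which is why trimming the data by hand is not needed for the estimate itself.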
On 11/10/2011 17:09, Ariel Linden, DrPH wrote:
Hi Stefano,
I am a bit confused by your variables. If I understand correctly, your X
variable is previous job tenure, which ranges from 0-52 months, and your
cutoff is 36. However, your "treatment" is whether a person gets severance,
which, I am assuming, can occur at any point along the X variable continuum?
In the RD design, the cutoff determines treatment assignment, so to make it
work, you'd have to have everyone at or above 36 months receive severance
and everyone below 36 months not receive severance. Is that what you have
done here?
I am not an economist (I don’t even play one on television), but I am not
sold on the premise that length of previous tenure is associated with the
outcome variable (unless it is mediated vis-à-vis the severance). I also
assume that the size of the severance will be associated with the Y
variable, and may or may not have a strong independent association with
the X variable (the recent CEO of HP just got fired after a year on the job
and got a multi-million dollar severance). Thus, the type of position (or
perhaps the salary level of the previous job) will moderate the relationship.
Therefore, I am not sure you have the right variables, or the right
modeling approach here. Perhaps you should consider switching to a mediation
approach (controlling for moderators), or perhaps a time series approach
with two or three variables: (a) length of previous job tenure, (b) length
of time unemployed thereafter, (c) relative size of severance?
I hope this helps
Ariel
Date: Mon, 10 Oct 2011 21:15:37 +0200
From: Stefano Lombardi <[email protected]>
Subject: st: Regression Discontinuity (RD) Designs, sharp discontinuity:
basic question about implementation with "rd"
Hello everybody,
I have a big problem estimating a sharp regression discontinuity
design with the "rd" command. I have read a number of papers about the
underlying theory, but I cannot carry out even a very basic RD design.
Unfortunately, I found very little information on Statalist, or on the
Internet as a whole. Could you please give me a hand? Every comment
would be tremendously helpful. Here is my (labor economics) setting:
"ten_cat": discrete forcing variable, Z = last job tenure (in
months = 13, 14, ..., 52)
"sevpay": treatment, X_T = lump-sum severance payment
"nonedur": outcome, y = non-employment duration (days between the
layoff and the start of the new job)
The cut-off is at Z_0 = 36 months (after three years of job tenure, a
person who is laid off is going to receive a severance payment with
probability 1).
Does the severance payment affect job search?
I also have "mean_nonedur" = mean of "nonedur" conditional on "ten_cat"
(basically, the mean of y for each month from 13 to 52).
My aim is to set up an RD design with mean nonemployment duration (in
days) against Z (in months). Ideally, I would estimate the outcome gap
with a second- or higher-order polynomial. All the data "far" from the
cut-off have already been eliminated manually, so I simply need to run
the RD design on all the available data.
As a very first step, I simply tried running the following command:
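Here is a plain-Python sketch of the "second or higher order polynomial" approach. It fits a polynomial of order p separately on each side of the recentered cut-off by least squares (normal equations solved by Gaussian elimination) and takes the difference of the two intercepts, i.e. the fitted values at z = 0. The helper names and toy data are hypothetical, and this is not what "rd" itself estimates, since "rd" fits local linear regressions.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def poly_fit(xs, ys, p):
    """Least-squares coefficients c[0..p] of y = sum_j c[j] * x**j."""
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(p + 1)] for i in range(p + 1)]
    xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(p + 1)]
    return solve(xtx, xty)


def poly_jump(z, y, p=2):
    """Gap at z = 0 between right- and left-hand polynomial fits of order p."""
    left = [(zi, yi) for zi, yi in zip(z, y) if zi < 0]
    right = [(zi, yi) for zi, yi in zip(z, y) if zi >= 0]
    c_left = poly_fit([q[0] for q in left], [q[1] for q in left], p)
    c_right = poly_fit([q[0] for q in right], [q[1] for q in right], p)
    return c_right[0] - c_left[0]
```

With z = ten_cat - 36, months 13 to 52 correspond to z = -23, ..., 16. On curved data, p = 1 and p = 2 can give different gaps.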
. rd nonedur sevpay ten_cat, z0(36)
Three variables specified; jump in treatment
at Z=36 will be estimated. Local Wald Estimate
is the ratio of jump in outcome to jump in treatment.
Assignment variable Z is ten_cat
Treatment variable X_T is sevpay
Outcome variable y is nonedur
Estimating for bandwidth 9.826534218815946
A predicted value of treatment at cutoff lies outside feasible range;
switching to local mean smoothing for treatment discontinuity.
score variables for model __00000P contain missing values
r(322);
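The output above says the local Wald estimate is the ratio of the jump in the outcome to the jump in the treatment, and that "rd" switched to local mean smoothing for the treatment. Below is a minimal Python sketch of that ratio using local means within a bandwidth h on each side. It is a simplification, and the helper names, bandwidth and data are made up. In a sharp design the treatment jumps from 0 to 1, so the denominator is 1 and the ratio reduces to the outcome jump itself.

```python
def local_mean_jump(z, v, h):
    """Mean of v on [0, h) minus mean of v on [-h, 0)."""
    right = [vi for zi, vi in zip(z, v) if 0 <= zi < h]
    left = [vi for zi, vi in zip(z, v) if -h <= zi < 0]
    return sum(right) / len(right) - sum(left) / len(left)


def local_wald(z, y, d, h):
    """Jump in outcome y divided by jump in treatment d at z = 0."""
    return local_mean_jump(z, y, h) / local_mean_jump(z, d, h)
```

With d = [0, 0, 1, 1] (sharp) the estimate equals the outcome jump. With a fuzzy d whose jump is 0.5, the same outcome jump is doubled.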
It is probably nonsense, but I also tried running the same command with
"mean_nonedur" instead of "nonedur", with the same result.
Could you give me any suggestions about this issue? Is it related
to the bandwidth choice?
Thank you very much,
Stefano Lombardi
*
* For searches and help try:
*http://www.stata.com/help.cgi?search
*http://www.stata.com/support/statalist/faq
*http://www.ats.ucla.edu/stat/stata/
*