Indeed. Your example is well behaved, but
as we all know, real data need not be. Also,
it is easy to get confused about the details
of -by:-, so that for example
bysort state county :
does not _guarantee_ that -year- is in the
right order within -state- and -county-.
Once the data are -tsset-, neither pitfall should catch you.
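
A minimal sketch of both routes, assuming toy data entered deliberately
out of year order. The values and the names diff1, diff2 and countyid
are illustrative only, not taken from the real dataset.

```stata
* toy data, deliberately NOT sorted by year within state and county
clear
input year state county employment
2 1 1 20
1 1 1 10
3 1 1 22
2 2 1 30
1 2 1 15
end

* year in parentheses: the data are sorted on year within each
* state-county group, but year does not define the groups
bysort state county (year) : gen diff1 = employment - employment[_n-1]

* -tsset- route: one panel identifier, one time variable
* (-tsset- stops with an error if a year is repeated within a panel)
egen countyid = group(state county)
tsset countyid year

* D. is the difference from the previous time period, so it is
* missing wherever the previous year is absent from the data
gen diff2 = D.employment

* on these gap-free toy data the two results agree
assert diff1 == diff2
list, sepby(countyid)
```

With gaps in -year-, the two would differ: the subscript version
subtracts whatever observation happens to come before, while D.
returns missing.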
Nick
Eric G. Wruck
> Yeah, I goofed. For one thing, I entered the data
> incorrectly. I was trying to follow what Gregor said he
> wanted, which I'm not sure I understood or that he wrote down
> clearly. I fully acknowledge that using the D. operator
> --which you & Kit suggested-- is probably the way to go.
>
> Nevertheless, I want to try to correct what I did earlier. I
> added a third observation for one of the state county
> combinations. I am assuming that Gregor wants a difference
> in employment from one year to the next within state &
> county. So here goes:
>
> . sort state county year
>
> . l
>
>       +----------------------------------+
>       | year   state   county   employ~t |
>       |----------------------------------|
>   1. |    1       1        1         10 |
>   2. |    2       1        1         20 |
>   3. |    3       1        1         22 |
>   4. |    1       2        1         15 |
>   5. |    2       2        1         30 |
>       +----------------------------------+
>
> . bysort state county: gen diff = employment - employment[_n - 1]
> (2 missing values generated)
>
> . l
>
>       +-----------------------------------------+
>       | year   state   county   employ~t   diff |
>       |-----------------------------------------|
>   1. |    1       1        1         10      . |
>   2. |    2       1        1         20     10 |
>   3. |    3       1        1         22      2 |
>   4. |    1       2        1         15      . |
>   5. |    2       2        1         30     15 |
>       +-----------------------------------------+
>
>
>
> If I understand the tsset stuff at all, that approach would
> force Gregor to come to terms with any date gaps & duplicate
> years which my approach glosses over. Is that right?
>
>
> Eric
>
>
>
> >There are two issues here: what to calculate and
> >how to do it. Eric's example presumes two
> >estimates for each combination of state, county, year
> >and wanting to find the difference between them.
> >Evidently this could arise, but on the face of it
> >I would guess rather at
> >
> >bysort state county (year) : gen diff = emp - emp[_n-1]
> >
> >i.e. the difference between each year and the previous.
> >
> >A more robust approach would be to -tsset-
> >
> >egen countyid = group(state county), label
> >tsset countyid year
> >gen diff = D.emp
> >
> >Nick
> >
> >Eric G. Wruck
> >
> >> You were close but your generate (gen) statement wasn't quite right.
> >>
> >>
> >> . bysort year state county: gen employdiff = employment - employment[_n - 1]
> >> (2 missing values generated)
> >>
> >> . l, noobs
> >>
> >>   +---------------------------------------------+
> >>   | year   state   county   employ~t   employ~f |
> >>   |---------------------------------------------|
> >>   |    1       1        1         10          . |
> >>   |    1       1        1         15          5 |
> >>   |    2       2        1         20          . |
> >>   |    2       2        1         30         10 |
> >>   +---------------------------------------------+
> >
> >> >My data is structured as follows
> >> >
> >> >year state county employment
> >> >1 1 1 10
> >> >2 1 1 20
> >> >1 2 1 15
> >> >2 2 1 30
> >> >...
> >> >for 6 years, 50 states, and some counties in each state. I have
> >> >1.5 million observations.
> >> >
> >> >I want to construct a variable that is the difference in employment
> >> >by year in each state and county.
> >> >
> >> >I tried
> >> >
> >> >by year state county, sort: gen newvar = employment-employment[_n-1]
> >> >
> >> >but that didn't work.