Yeah, I goofed. For one thing, I entered the data incorrectly. I was trying to follow what Gregor said he wanted, but I'm not sure I understood it, or that he stated it clearly. I fully acknowledge that using the D. operator, which you & Kit suggested, is probably the way to go.
Nevertheless, I want to try to correct what I did earlier. I added a third observation for one of the state-county combinations. I am assuming that Gregor wants the difference in employment from one year to the next within each state & county. So here goes:
. sort state county year
. l
     +----------------------------------+
     | year   state   county   employ~t |
     |----------------------------------|
  1. |    1       1        1         10 |
  2. |    2       1        1         20 |
  3. |    3       1        1         22 |
  4. |    1       2        1         15 |
  5. |    2       2        1         30 |
     +----------------------------------+
. bysort state county: gen diff = employment - employment[_n - 1]
(2 missing values generated)
. l
     +-----------------------------------------+
     | year   state   county   employ~t   diff |
     |-----------------------------------------|
  1. |    1       1        1         10      . |
  2. |    2       1        1         20     10 |
  3. |    3       1        1         22      2 |
  4. |    1       2        1         15      . |
  5. |    2       2        1         30     15 |
     +-----------------------------------------+
If I understand the tsset stuff at all, the -tsset- approach would force Gregor to come to terms with any date gaps & duplicate years, which my approach glosses over. Is that right?
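To make the question concrete, here is a minimal sketch on made-up data (not Gregor's) in which one county has no row for year 3. The variable names diff_n and diff_d are only for illustration. As far as I can tell, the [_n-1] version quietly differences across the gap, while D. after -tsset- returns missing there; and -isid- (like -tsset- itself) should stop with an error if a state-county-year combination is repeated.

```stata
* Hypothetical toy data: county (1,1) has no observation for year 3
clear
input year state county employment
1 1 1 10
2 1 1 20
4 1 1 22
1 2 1 15
2 2 1 30
end

* Exits with an error if any state-county-year combination is repeated
isid state county year

* Difference by position: year 4 is treated as directly following year 2
bysort state county (year): gen diff_n = employment - employment[_n-1]

* Difference by time: D. looks for year - 1, so year 4 gets missing
egen countyid = group(state county)
tsset countyid year
gen diff_d = D.employment

list, sepby(countyid)
```

In the year-4 row, diff_n is 2 (22 - 20, spanning two years) and diff_d is missing, which is the gap the position-based approach glosses over.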
Eric
>There are two issues here: what to calculate and
>how to do it. Eric's example presumes two
>estimates for each combination of state, county, year
>and wanting to find the difference between them.
>Evidently this could arise, but on the face of it
>I would guess rather at
>
>bysort state county (year) : gen diff = emp - emp[_n-1]
>
>i.e. the difference between each year and the previous.
>
>A more robust approach would be to -tsset-
>
>egen countyid = group(state county), label
>tsset countyid year
>gen diff = D.emp
>
>Nick
>
>Eric G. Wruck
>
>> You were close but your generate (gen) statement wasn't quite right.
>>
>>
>> . bysort year state county: gen employdiff = employment - employment[_n - 1]
>> (2 missing values generated)
>>
>> . l, noobs
>>
>>   +---------------------------------------------+
>>   |  year   state   county   employ~   employ~f |
>>   |---------------------------------------------|
>>   |     1       1        1        10          . |
>>   |     1       1        1        15          5 |
>>   |     2       2        1        20          . |
>>   |     2       2        1        30         10 |
>>   +---------------------------------------------+
>
>> >My data is structured as follows
>> >
>> >year state county employment
>> >1 1 1 10
>> >2 1 1 20
>> >1 2 1 15
>> >2 2 1 30
>> >...
>> >for 6 years, 50 states, and some counties in each state. I have 1.5 million observations.
>> >
>> >I want to construct a variable that is the difference in employment by year in each state and county.
>> >
>> >I tried
>> >
>> >by year state county, sort: gen newvar = employment-employment[_n-1]
>> >
>> >but that didn't work.
--
===================================================
Eric G. Wruck
Econalytics
2535 Sherwood Road
Columbus, OH 43209
ph: 614.231.5034
cell: 614.330.8846
eFax: 614.573.6639
website: http://www.econalytics.com
====================================================