The solutions suggested all work with this kind of data and all have a clear logic.
Note that only Tirthankar's and Kieran's would apply as well to a string identifier.
They all involve a constructed extra variable. That can be avoided in this way:
bysort analystID (employerID) : drop if employerID[1] == employerID[_N]
The logic here is that if all values in a group are the same, then the first will equal the last, provided that we sort on -employerID- within each group first.
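As a minimal sketch of that logic, here is the command applied to the example data from the original question. The -input- block is typed in by hand purely for illustration; it is not the real dataset.

```stata
clear
input forecast_no analystID employerID
1 1 1
2 1 1
3 1 1
1 2 1
2 2 1
3 2 2
4 2 2
1 3 3
2 3 4
end

* within each analyst, sort on employerID; if the first and last
* values agree, every value in the group is the same
bysort analystID (employerID) : drop if employerID[1] == employerID[_N]

list, sepby(analystID)
```

Only the three observations for analyst 1 should be dropped, leaving the six observations for analysts 2 and 3.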
See also the FAQ
How do I list observations in a group that differ on a variable?
http://www.stata.com/support/faqs/data/diff.html
This may not sound like the same problem, but change != to == and -list- to -drop- and the logic carries over.
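A cautious way to use that correspondence, sketched here on a small made-up dataset, is to -list- the groups that do differ before dropping anything. The data values are invented for illustration only.

```stata
clear
input analystID employerID
1 1
1 1
2 1
2 2
3 3
3 4
end

* show the observations for analysts whose employerID varies;
* nothing is dropped at this stage
bysort analystID (employerID) : list if employerID[1] != employerID[_N]
```

Once the listing looks right, the same qualifier with == and -drop- removes the complementary groups.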
Experienced users would want me to underline that any missing values on -employerID- would need consideration.
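One possible treatment, offered only as a sketch, is to ignore missing values when deciding whether an analyst changed employer. It assumes a numeric -employerID- and treats an analyst with nothing but missing values as never changing; both choices are assumptions and may not suit the real data. The variable names -ismiss- and -differ- are made up here, and this version does construct extra variables.

```stata
clear
input analystID employerID
1 1
1 1
1 .
2 1
2 2
2 .
3 .
3 .
end

* separate missing from nonmissing values within each analyst
gen byte ismiss = missing(employerID)

* differ is 1 only in the nonmissing block of an analyst whose
* nonmissing values are not all the same
bysort analystID ismiss (employerID) : gen byte differ = (employerID[1] != employerID[_N]) & !ismiss

* spread the largest value of differ to all of that analyst's observations
bysort analystID (differ) : replace differ = differ[_N]

drop if !differ
drop ismiss differ
```

With these made-up data only analyst 2 is kept. Without this step the one-line command would keep analyst 1 as well, because numeric missing sorts last and so differs from the first value.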
Nick
Eric A. Booth
==============
bysort analystID: egen max = max(employerID)
bysort analystID: egen min = min(employerID)
drop if max==min
Tirthankar Chakravarty
======================
Using Nick Cox's -egenmore- package (SSC):
/* Spells */
clear
// ssc install egenmore, replace
input forecast_no analystID employerID
1 1 1
2 1 1
3 1 1
1 2 1
2 2 1
3 2 2
4 2 2
1 3 3
2 3 4
end
egen nvalsID = nvals(employerID), by(analystID)
drop if nvalsID==1
list, clean
Howie Lempel
============
Create a variable with the mean absolute deviation from the mean of employer ID for each analyst. This will be 0 if the employer ID never changes.
bysort analystID: egen Demp = mdev(employerID)
Drop observations where the employer ID never changed.
drop if Demp==0
Kieran McCaul
=============
sort analystID employerID
by analystID employerID: gen N1=_N
by analystID: gen N2=_N
drop if N2==N1
Stefano Bonini
==============
I have a huge panel dataset containing analyst forecasts. Each analyst is associated with an employer. Sometimes analysts change employer. I want to restrict my dataset by dropping the observations of analysts who never change employer. The dataset may look like this:
forecast# analystID employer ID
1 1 1
2 1 1
3 1 1
1 2 1
2 2 1
3 2 2
4 2 2
1 3 3
2 3 4
In this case I'd need to drop all observations for analyst 1, because he never changes employer, while keeping those for analysts 2 and 3.
I really cannot figure out how to do it, as visual inspection is just impossible with over 1.2 million observations.