"Experienced users would want me to underline that any missing values on -employerID- would need consideration."
Difficult indeed; everything depends on what Stefano wants to assume about the missing cases. In the code below, I have included several analysts with various degrees of "missingness"...
*************
clear*
input forecast /*
*/ analystID employerID
1 1 1
2 1 1
3 1 1
1 2 1
2 2 1
3 2 2
4 2 2
1 3 3
2 3 4
1 4 .
2 4 5
3 4 .
4 4 5
1 5 6
2 5 .
3 5 7
4 5 .
1 6 .
2 6 .
end
compress
list, noobs /*
*/ sepby(analy)
bys anal (employ): /*
get the last nonmissing
employer, trick from
http://www.stata.com/support/faqs/data/dropmiss.html
*/ egen lastnonmiempl =/*
egen allows expressions for some
of its functions
*/ max(cond(!missing(employ), employ, .))
bys anal:/*
*/ egen miss=/*
*/ total(mi(employ))
replace miss=miss!=0
list, noobs /*
*/ sepby(analy)
bysort analystID (employerID) :/*
*/ drop if employerID[1] /*
*/ == lastnonmiempl[1] /*
additionally: only those w/o
missings on the employer var
*/ & miss==0
list, noobs /*
*/ sepby(analy)
/*
Now it really depends
whether you want to drop
those who did not change jobs
during the "visible" part
of their career. If so, uncomment
this:
bysort analystID (employerID) :/*
*/ drop if employerID[1] /*
*/ == lastnonmiempl[1]
*/
/* OR you could give them the
benefit of doubt, assuming
that the missing indicates
a job change. Leave everything
as it is, then.
You still have to decide
how to go about this business
regarding analyst # 6
who has all missings...
*/
list, noobs /*
*/ sepby(analy)
*************
HTH
Martin
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Thursday, 11 June 2009 10:26
To: [email protected]
Subject: st: RE: dropping observation
The solutions suggested all work with this kind of data and all have a clear logic.
Note that only Tirthankar's and Kieran's would apply as well to a string identifier.
They all involve a constructed extra variable. That can be avoided in this way:
bysort analystID (employerID) : drop if employerID[1] == employerID[_N]
The logic here is that, once the values are sorted within each group, the first equals the last only if all values in the group are the same. That is why the sort on employerID is needed too.
See also the FAQ
How do I list observations in a group that differ on a variable?
http://www.stata.com/support/faqs/data/diff.html
This may not sound like the same problem, but change != to == and -list- to -drop- and the logic carries over.
Experienced users would want me to underline that any missing values on -employerID- would need consideration.
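As a rough sketch of what that consideration might involve: numeric missing values sort last in Stata, so for an analyst with one employer plus a missing value, employerID[_N] is missing and the first == last test no longer flags that analyst. The toy data and the names same_naive, n_ok and same_nonmiss below are illustrative assumptions, not part of the thread's code, and comparing only nonmissing values is just one possible rule.

```stata
* Sketch only: analyst 1 has a single employer plus one missing value
clear
input analystID employerID
1 1
1 1
1 .
2 1
2 2
end
* Naive test: the missing value sorts last, so first != last for analyst 1
bysort analystID (employerID) : gen byte same_naive = employerID[1] == employerID[_N]
* Count the nonmissing values per analyst
bysort analystID (employerID) : egen n_ok = count(employerID)
* Compare the first value with the last NONMISSING value instead
bysort analystID (employerID) : gen byte same_nonmiss = (n_ok > 0) & (employerID[1] == employerID[n_ok])
list, noobs sepby(analystID)
```

Whether an analyst flagged by same_nonmiss should actually be dropped depends on what the missing values are taken to mean.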
Nick
[email protected]
Eric A. Booth
==============
bysort analystID: egen max = max(employerID)
bysort analystID: egen min = min(employerID)
drop if max==min
Tirthankar Chakravarty
======================
Using Nick Cox's -egenmore- package (SSC):
/* Spells */
clear
// ssc install egenmore, replace
input forecast_no analystID employerID
1 1 1
2 1 1
3 1 1
1 2 1
2 2 1
3 2 2
4 2 2
1 3 3
2 3 4
end
egen nvalsID = nvals(employerID), by(analystID)
drop if nvalsID==1
list, clean
Howie Lempel
============
Create a variable with the mean absolute deviation from the mean of employer ID for each analyst. This will be 0 if the employer ID never changes.
bysort analystID: egen Demp = mdev(employerID)
Drop observations where the employer ID never changed.
drop if Demp==0
Kieran McCaul
=============
sort analystID employerID
by analystID employerID: gen N1=_N
by analystID: gen N2=_N
drop if N2==N1
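A minimal sketch of why this counting approach carries over to a string identifier, as remarked elsewhere in the thread: it only groups and counts rows and never does arithmetic on the employer variable. The variable name employer and the employer names are hypothetical.

```stata
* Sketch only: employer stored as a string (hypothetical names)
clear
input analystID str10 employer
1 "Alpha"
1 "Alpha"
2 "Alpha"
2 "Beta"
end
sort analystID employer
* Rows per analyst-employer pair
by analystID employer : gen N1 = _N
* Rows per analyst
by analystID : gen N2 = _N
* With a single employer the two counts coincide
drop if N2 == N1
list, noobs sepby(analystID)
```

By contrast, the max/min and mdev approaches rely on numeric egen functions, so a string employer would first need a numeric code, for example via -encode- or egen's group() function.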
Stefano Bonini
==============
I have a huge panel dataset containing analyst forecasts. Each analyst is associated with an employer. Sometimes analysts change employer. I want to restrict my dataset, dropping the observations of analysts that never change employer. The dataset may look like this:
forecast# analystID employer ID
1 1 1
2 1 1
3 1 1
1 2 1
2 2 1
3 2 2
4 2 2
1 3 3
2 3 4
In this case I'd need to drop all observations by analyst 1 because he never changes employer, while keeping those of analysts 2 and 3.
I really cannot figure out the way to do it as visual inspection is just impossible with over 1.2m obs.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/