Riano, Alejandro
>
> I have a huge industrial survey which is a panel dataset. I have the id of
> each firm and the region in which this firm is based. I'd like to check
> how many of the firms in this dataset have errors, in the sense that the
> same id is associated with different regions and/or that a given firm
> has different years of foundation (to get an idea of the % of errors
> in the database).
> I also want to know which ones are the "problematic" firms.
>
. bysort id (year) : gen prob1 = year[1] != year[_N]
. bysort id (region) : gen prob2 = region[1] != region[_N]
. list id year region if prob1 | prob2
Logic: for example, sort by -id- and within each -id- by -year-, where
-year- here is the year of foundation. If the last value of -year- for
each -id- differs from the first, you have a problem.
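To see the logic at work, here is a minimal sketch on made-up data. The
variable names -id-, -region- and -founded- and all the values are
assumptions for illustration, not the poster's data. It flags each firm
whose values disagree and then uses -egen, tag()- so that the percentage
is of firms, not of observations.

```stata
clear
input id str5 region founded
1 "North" 1980
1 "North" 1980
1 "South" 1980
2 "East"  1975
2 "East"  1975
3 "West"  1990
3 "West"  1991
end

* after sorting within firm, first and last values differ
* only if the firm has inconsistent values
bysort id (region) : gen byte prob_region = region[1] != region[_N]
bysort id (founded) : gen byte prob_founded = founded[1] != founded[_N]
gen byte problem = prob_region | prob_founded

* tag one observation per firm so that each firm counts once
egen byte firmtag = tag(id)

* the mean of a 0/1 indicator is the proportion of problematic firms
summarize problem if firmtag
display "percent of firms with a problem: " %4.1f 100*r(mean)

* the problematic firms themselves
list id region founded if problem, sepby(id)
```

In this toy dataset firms 1 and 3 are flagged and firm 2 is not. Note
that a missing value counts as different from a nonmissing one, so a firm
with a region or year recorded in only some observations would also be
flagged.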
FAQ explaining another example and giving further comment at
How do I list observations in a group that differ on a variable?
http://www.stata.com/support/faqs/data/diff.html
Also the same stuff, ad nauseam, at
How to move step by: step. The Stata Journal 2, 86-102.
This is also one problem where -egen, mode()- may be useful. You
let the data decide by majority vote what they really are.
As I recall, there is some generality built into -mode()-
so that it can be used with string variables as well as numeric.
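A sketch of the majority-vote idea, again on hypothetical data with
assumed variable names: -egen, mode()- under -by:- gives each firm its
most common region, and observations that disagree with it are the
suspects. Ties need care: as far as I know, without an option such as
-minmode- or -maxmode-, -mode()- returns missing when a firm has more
than one mode, so every observation of such a firm would be listed.

```stata
clear
input id str5 region
1 "North"
1 "North"
1 "South"
2 "East"
2 "East"
end

* most common region within each firm (string or numeric variables)
bysort id : egen region_mode = mode(region)

* observations that disagree with their firm's majority value
gen byte suspect = region != region_mode
list id region region_mode if suspect
```

Here only the "South" observation of firm 1 disagrees with its firm's
majority value.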
Nick