A problem that comes up occasionally when collapsing data on organizations
is that -collapse- cannot keep the first value of a variable within each
group being collapsed. This matters most when the variable is a string, but
I believe it would be of general use. Below is my workaround, but perhaps
someone has a better one?
Imagine data on organizations:
id  org_name  _1935  _1936
1   foo       10     .
1   foo_blah  .      12
2   noo       54     55
I have duplicate ids and I want to collapse the data, but I don't want to
lose the name. The command I would like, which does not exist, is
-collapse (first) org_name (sum) _*, by(id)-
Instead, I do the following:
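* keep a copy of the full data
save orig, replace
* one row per id, holding the first name that appears for that id
duplicates drop id, force
keep id org_name
save temp_name, replace
* sum the yearly counts within id, then reattach the saved names
use orig, clear
collapse (sum) _*, by(id)
* (in Stata 11 and later this would be: merge 1:1 id using temp_name)
merge id using temp_name, sort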
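One possible alternative that avoids the save/merge round trip is to copy the first name onto every row of its id and then list that copy in by(). This is only a sketch on the example data above; the variable names obs_order and first_name are made up for the illustration.

```stata
* Rebuild the example data from the post
clear
input id str10 org_name _1935 _1936
1 foo      10 .
1 foo_blah .  12
2 noo      54 55
end

* Record the current row order so "first" is well defined after sorting
* (obs_order and first_name are hypothetical names used only here)
gen long obs_order = _n

* Within each id, copy the name from the first row in the original order
bysort id (obs_order): gen first_name = org_name[1]

* first_name is constant within id, so adding it to by() does not split groups
collapse (sum) _1935 _1936, by(id first_name)
rename first_name org_name
list
```

Because the copied name never varies within an id, the collapsed data still has one row per id, and no temporary files are needed.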
A (first) statistic in collapse could also be used to privilege the data
from one type of case, e.g. the record with an organization's start date:
-sort id start-, then collapse, keeping the values from the row with the
earliest start date.
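The start-date case can be approximated in a similar way, by sorting on the date within id before copying the name. Again this is only a sketch: the variable start, its values, and the name name_at_start are invented for the illustration and are not part of the data shown earlier.

```stata
* Illustration data: "start" and its values are invented for this sketch
clear
input id str10 org_name start _1935 _1936
1 foo_blah 1936 .  12
1 foo      1935 10 .
2 noo      1930 54 55
end

* Order rows by start date within id and copy the earliest row's name
* (name_at_start is a hypothetical variable name)
bysort id (start): gen name_at_start = org_name[1]

* Keep the earliest start date and sum the yearly counts within id
collapse (min) start (sum) _1935 _1936, by(id name_at_start)
rename name_at_start org_name
list
```

If two rows of the same id share a start date, the order between them is not determined by this sort, so a further tie-breaking variable would be needed inside the parentheses.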