Michael,
Thanks a lot. I'll do that.
Louis
I guess you need to look at the help for collapse some more. -collapse-
calculates whatever summary statistics you specify for each unique
combination of the by variable list and then collapses the dataset to one
observation for each of these unique combinations. It will, by design,
always result in the by varlist becoming a unique identifier.
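
A minimal sketch of that behaviour, assuming a small made-up dataset (the values below are hypothetical and not taken from your data). The first assert is expected to fail because region 1, district 1 appears twice; after -collapse- the same check passes, and x records how many original observations fell into each group:

```stata
* Toy data: two observations share region 1, district 1 (hypothetical values)
clear
input region district
1 1
1 1
1 2
2 1
end

* The raw data are not uniquely identified by region and district,
* so this assert fails; capture noisily shows the error but keeps going
sort region district
capture noisily by region district: assert _N==1

* collapse keeps one observation per region-district combination;
* x then holds the number of original observations in each group
gen x = 1
collapse (count) x, by(region district)
list

* isid exits with an error if the variables do not uniquely identify the data
isid region district
```

Any group with x greater than 1 in the collapsed data is one of the by-groups that contradicted the assert in the original data.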
Michael Blasnik
[email protected]
----- Original Message -----
From: "louis boakye-yiadom" <[email protected]>
To: <[email protected]>
Sent: Tuesday, March 08, 2005 10:48 AM
Subject: st: Making sure identifiers are unique
Dear all,
I've been trying to determine the identifiers of a data set, and to ensure
they're unique. Suspecting the variables, "region" and "district" are the
identifiers, I gave the commands below, and got the output shown:
. sort region district
. by region district: assert _N==1
62 contradictions in 97 by-groups
assertion is false
r(9);
Because I'm more interested in the "district"-level data, I
wanted to know whether a collapsed version of the data would have unique
identifiers. I therefore gave the following set of commands and got the
results shown:
. gen x=1
. collapse (count) x, by (region district)
. sort region district
. by region district: assert _N==1
My question is: What can account for the collapsed data being uniquely
identified by "region" and "district", whilst the original data are not?
I'm using version 8.2.
Many thanks,
Louis