Nick--
There are several applications, e.g. -xtreg, i(id)-, where a numeric
id is required (for no apparent reason, but required nonetheless).
Why we cannot simply type
egen g=group(id)
and keep both the numeric and the string identifiers is perhaps not
clear, but suppose we want
list g
to produce correct-looking identifiers, for whatever reason. Then the
idea behind my posted approach is correct, though the details are
not: there is a missing -if- condition, and -labmask- will not work
here. A solution from first principles is easy, I think:
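clear
loc N 500
set obs `N'
* make example string ids "1", "2", ..., with "65A" through "90Z" mixed in
g id=string(_n)
replace id=id+char(_n) in 65/90
codebook id
*-encode- won't work if N too great
*encode id, gen(numid)
*(nor will -labmask- apparently)
* ids that are purely numeric keep their own value
gen numid=real(id)
* the remaining ids get new codes above the largest numeric id
gen strid=id if mi(numid)
egen g=group(strid)
su numid, meanonly
replace numid=r(max)+g if mi(numid)
* label each new code with the original string it stands for
levelsof strid, loc(vals)
foreach v of loc vals {
    su numid if strid=="`v'", meanonly
    la def numid `r(max)' "`v'", modify
}
la val numid numid
codebook numid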
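As a check, the same steps can be run on a tiny made-up dataset (including a repeated string id) and followed by assertions that the id <-> numid mapping is one-to-one, that purely numeric ids keep their values, and that the labels reproduce the original strings. This is only a sketch: the data values and the variable -check- are illustrative, not part of the approach above.

```stata
* Self-contained check on a small, made-up dataset (values are illustrative)
clear
input str8 id
"1"
"2"
"65A"
"7"
"90Z"
"65A"
end

gen numid = real(id)                  // purely numeric ids keep their value
gen strid = id if mi(numid)           // ids that are not purely numeric
egen g = group(strid)                 // 1, 2, ... over distinct strid; missing otherwise
summarize numid, meanonly
replace numid = r(max) + g if mi(numid)

* attach the original string as a value label to each new code
levelsof strid, local(vals)
foreach v of local vals {
    summarize numid if strid == "`v'", meanonly
    label define numid `r(max)' "`v'", modify
}
label values numid numid

* each id maps to exactly one numid, and vice versa
bysort id (numid): assert numid == numid[1]
bysort numid (id): assert id == id[1]
* purely numeric ids are unchanged
assert numid == real(id) if !mi(real(id))
* labelled codes decode back to the original strings
decode numid, generate(check)
assert check == id if !mi(strid)
list id numid, sepby(id)
```

Here "65A" should become 8 and "90Z" 9, since the largest numeric id is 7 and -egen, group()- numbers the string ids in sort order.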
On 11/13/07, Nick Cox wrote:
> Austin is right that -egen, group()- will assign integers
> 1 up. But if -encode- won't play at assigning labels because
> there are too many distinct values, then I don't think -labmask-
> (or even -egen, group()- with the -label- option) will help
> either.
>
> I am still puzzled at the original question. On the face of
> it the variable in question is some kind of identifier. It
> is difficult to see any sense in which it is better off as
> a numeric variable. If there are thousands of distinct values
> it would be no use for any kind of modelling, so far as I can imagine.
>
> Nick