I agree that -egen, group()- will get you numeric identifiers
even if you have to give up on the labels. Thanks for your information
on -xtreg-, which raises a question for StataCorp: why this insistence?
The issue with -encode- is a limit on the number of labels allowed.
That limit bites whatever side you try to scale the mountain from.
Nick
Austin Nichols
Nick--
There are several applications, e.g. -xtreg, i(id)-, where a numeric
id is required (for no apparent reason, but required nonetheless).
Why we cannot simply:
egen g=group(id)
and keep numeric and string identifiers is not clear, perhaps, but
suppose we want:
list g
to produce correct-looking identifiers, for whatever reason. Then the
idea of my posted approach is correct, though the details are
not--there is a missing -if- condition and -labmask- will not work
here. But a solution from first principles is easy, I think:
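clear
loc N 500
set obs `N'
* build a string id; observations 65-90 get a letter (A-Z) appended
g id=string(_n)
replace id=id+char(_n) in 65/90
codebook id
*-encode- won't work if N too great
*encode id, gen(numid)
*(nor will -labmask- apparently)
* ids that are purely numeric convert directly; the rest become missing
gen numid=real(id)
gen strid=id if mi(numid)
* number the nonnumeric ids 1 up, then shift them past the largest numeric id
egen g=group(strid)
su numid, meanonly
replace numid=r(max)+g if mi(numid)
* label each shifted value with its original string
levelsof strid, loc(vals)
foreach v of loc vals {
su numid if strid=="`v'", meanonly
la def numid `r(max)' "`v'", modify
}
la val numid numid
codebook numid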
On 11/13/07, Nick Cox wrote:
Austin is right that -egen, group()- will assign integers
1 up. But if -encode- won't play at assigning labels because
there are too many distinct values, then I don't think -labmask-
(or even -egen, group()- with the -label- option) will help
either.
I am still puzzled at the original question. On the face of
it the variable in question is some kind of identifier. It
is difficult to see any sense in which it is better off as
a numeric variable. If there are thousands of distinct values
it would be no use for any kind of modelling, so far as I can imagine.
Nick
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/