Statalist



RE: st: string variable


From   [email protected]
To   [email protected]
Subject   RE: st: string variable
Date   Tue, 13 Nov 2007 18:10:40 +0100

Thanks Nick and Austin.
I will try the approach that Austin suggests.

Anyway, id is an important variable when you use panel data, because the panel identifier has to be numeric.
For example:
-iis id- to set your panel.
Thanks again.
Catia



Quoting Nick Cox <[email protected]>:


I agree that -egen, group()- will get you numeric identifiers
even if you have to give up on the labels. Thanks for your information
on -xtreg-, which raises a question for StataCorp: why this insistence?

The issue with -encode- is a limit on the number of labels allowed.
That limit bites whatever side you try to scale the mountain from.

Nick
[email protected]

Austin Nichols

Nick--
There are several applications, e.g. -xtreg, i(id)-, where a numeric
id is required (for no apparent reason, but required nonetheless).
Why we cannot simply:
  egen g=group(id)
and keep numeric and string identifiers is not clear, perhaps, but
suppose we want:
  list g
to produce correct-looking identifiers, for whatever reason.  Then the
idea of my posted approach is correct, though the details are
not--there is a missing -if- condition and -labmask- will not work
here.  But a solution from first principles is easy,  I think:

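clear
loc N 500
set obs `N'
* toy string id: "1", "2", ..., with letters A-Z appended in obs 65-90
g id=string(_n)
replace id=id+char(_n) in 65/90
codebook id
* -encode- won't work if N is too great
*encode id, gen(numid)
* (nor will -labmask-, apparently)
* ids that are purely numeric convert directly; the rest become missing
gen numid=real(id)
gen strid=id if mi(numid)
* number the nonnumeric ids 1 up, then shift them past the largest numeric id
egen g=group(strid)
su numid, meanonly
replace numid=r(max)+g if mi(numid)
* attach each nonnumeric id as a value label on its new numeric code
levelsof strid, loc(vals)
foreach v of loc vals {
 su numid if strid=="`v'", meanonly
 la def numid `r(max)' "`v'", modify
 }
la val numid numid
codebook numid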
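
If readable labels are not needed, the simpler route mentioned above may be enough. A minimal sketch on made-up data (the toy ids and the variable names g and year are assumptions for illustration, not taken from the original poster's data): -egen, group()- builds a numeric id, the string id is kept alongside, and the numeric id is then used to declare the panel.

```stata
* Sketch only: toy panel with a string identifier (all values are made up)
clear
set obs 6
* three units, two observations each: a1 a1 a2 a2 a3 a3
gen str5 id = "a" + string(ceil(_n/2))
gen year = 2000 + mod(_n, 2)
* numeric group codes 1 up; the string id stays in the data
egen long g = group(id)
* declare the panel with the numeric id
xtset g year
list id g year, sepby(g)
```

-xtset- is used here; in older versions of Stata, -iis g- and -tis year- play the same role.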


On 11/13/07, Nick Cox <[email protected]> wrote:
Austin is right that -egen, group()- will assign integers
1 up. But if -encode- won't play at assigning labels because
there are too many distinct values, then I don't think -labmask-
(or even -egen, group()- with the -label- option) will help
either.

I am still puzzled at the original question. On the face of
it the variable in question is some kind of identifier. It
is difficult to see any sense in which it is better off as
a numeric variable. If there are thousands of distinct values
it would be no use for any kind of modelling, so far as I can imagine.

Nick
[email protected]
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




Catia Nicodemo





