Despite the title, the issue here is a one-to-one mapping
from string identifiers to numeric identifiers.
As Giorgia points out, -destring, ignore- is quite wrong for
her problem, as ignoring the non-numeric characters throws away
important information.
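To make the collision concrete, here is a minimal sketch on toy data. The variable names firm_id and firm_num are made up for illustration, and the two identifiers are the ones from Giorgia's example:

```stata
clear
input str7 firm_id
"FR12345"
"GB12345"
end

* strip the letters and convert what is left to a number
destring firm_id, generate(firm_num) ignore("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

* two distinct firms now share one numeric value
list firm_id firm_num
assert firm_num[1] == firm_num[2]
```

Once the country prefix is ignored, nothing in firm_num distinguishes the French firm from the British one.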
Joseph's solution is a reinvention of -egen, group()-.
It shows the logic to follow, but for convenience
you can do it directly:
egen numeric_panel_id = group(string_panel_id)
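As a rough illustration of that one-liner, the sketch below uses invented toy data. The variable names firm_id, year and panel_id and the identifier DE00001 are assumptions for the example, not anything from Giorgia's dataset:

```stata
clear
input str7 firm_id int year
"FR12345" 2004
"FR12345" 2005
"GB12345" 2004
"GB12345" 2005
"DE00001" 2005
end

* one integer per distinct string, numbered 1, 2, ... in sorted order
egen long panel_id = group(firm_id)

* check that the mapping is one-to-one in both directions
bysort panel_id (firm_id): assert firm_id[1] == firm_id[_N]
bysort firm_id (panel_id): assert panel_id[1] == panel_id[_N]

* declare the panel structure using the new numeric identifier
tsset panel_id year
```

Because group() numbers the distinct values in sorted order, FR12345 and GB12345 end up with different integers, and the count of distinct values is limited only by the storage type of the new variable, not by the ceiling that -encode- imposes on value labels.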
(Incidentally, keeping track of all the non-numeric
characters in a string variable is not that difficult.
A utility -charlist- on SSC is dedicated to this
small question.)
(Giorgia: the Statalist FAQ explains the Statalist
convention of using -cmdname- to refer to a command
of that name.)
Nick
Joseph Coveney
> First, generate a numeric variable that takes the value one at the first
> observation of a (sorted) panel unit, and zero at all succeeding
> observations of that panel unit. Then -sum()- the numeric variable across
> the dataset. The technique is illustrated below with dummy data of about
> 150 000 panel units.
>
> clear
> set more off
> set seed `=date("2006-02-25", "YMD")'
> set obs 150000
> generate str panel_unit = string(uniform(), "%19.18g")
> *
> * Begin here
> *
> bysort panel_unit: generate byte panel_number = _n == 1
> replace panel_number = sum(panel_number)
> exit
>
Giorgia Maffini
> I am working with a panel of more than 70,000 firms.
> When running FE and RE I need to specify the panel unit (firms in my
> dataset). The panel unit has to be recorded as a numeric variable, as I
> understand.
>
> In my data the firm identifier is a STRING variable with both numbers and
> letters. Example: firm with identifier FR12345 is different
> from firm with identifier GB12345.
>
> I used DESTRING-IGNORE but
> 1) it is difficult to track down all the characters present
> in the firm identifier variable
> 2) Different firms will get the same id number. Example: FR12345 and
> GB12345.
>
> I used ENCODE but I got the following error message (134): You attempted to
> encode a string variable that takes on more than 65,536 unique values.