Giorgia Maffini wrote:
Is anybody aware of how I could solve the following problem?
I am working with a panel of more than 70,000 firms.
When running FE and RE I need to specify the panel unit (firms in my
dataset). The panel unit has to be recorded as a numeric variable, as I
understand.
In my data the firm identifier is a STRING variable with both numbers and
letters. Example: firm with identifier FR12345 is different from firm with
identifier GB12345.
I used DESTRING-IGNORE but
1) it is difficult to track down all the characters present in the firm
identifier variable
2) Different firms will get the same id number. Example: FR12345 and
GB12345.
I used ENCODE but I got the following error message (134): You attempted to
encode a string variable that takes on more than 65,536 unique values.
--------------------------------------------------------------------------------
First, generate a numeric variable that takes the value one at the first
observation of a (sorted) panel unit, and zero at all succeeding
observations of that panel unit. Then replace that variable with its
running -sum()- across the dataset: the sum increases by one each time a
new panel unit begins, so every panel unit ends up with its own
consecutive integer. The technique is illustrated below with dummy data of
about 150 000 panel units.
Joseph Coveney
clear
set more off
set seed `=date("2006-02-25", "ymd")'
set obs 150000
generate str panel_unit = string(uniform(), "%19.18g")
*
* Begin here
*
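* Flag the first observation of each (sorted) panel unit with 1, others 0
bysort panel_unit: generate byte panel_number = _n == 1
* Running sum of the flags; -replace- promotes the byte type as needed
replace panel_number = sum(panel_number)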
exit
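
A small follow-up sketch, not part of the original reply: it applies the
same running-sum steps to a handful of identifiers like those in the
question, and compares the result with -egen-'s group() function, a
different built-in route to the same kind of numeric identifier. The
variable names (firm_id, firm_num, firm_num2, year, y, x) and the data
values are made up for illustration only.

```stata
clear
input str7 firm_id int year double y double x
"FR12345" 2001 1.2 0.5
"FR12345" 2002 1.4 0.7
"GB12345" 2001 2.1 0.9
"GB12345" 2002 2.0 1.1
"DE00007" 2001 0.8 0.2
"DE00007" 2002 1.1 0.6
end

* Running-sum approach: flag the first row of each firm, then cumulate
bysort firm_id: generate long firm_num = _n == 1
replace firm_num = sum(firm_num)

* Alternative: group() numbers the distinct values of firm_id 1, 2, ...
egen long firm_num2 = group(firm_id)

* Both identifiers should agree; -assert- stops with an error if not
assert firm_num == firm_num2

* Declare the panel structure and fit a fixed-effects model
tsset firm_num year
xtreg y x, fe
```

FR12345 and GB12345 receive different numbers under either approach,
because the whole string is compared rather than only its digits.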