Statalist The Stata Listserver


Re: st: Encode/destring

From   Joseph Coveney <[email protected]>
To   Statalist <[email protected]>
Subject   Re: st: Encode/destring
Date   Sat, 25 Feb 2006 20:58:56 +0900

Giorgia Maffini wrote:

Is anybody aware of how I could solve the following problem?
I am working with a panel of more than 70,000 firms.
When running FE and RE I need to specify the panel unit (firms in my
dataset). The panel unit has to be recorded as a numeric variable, as I

In my data the firm identifier is a STRING variable with both numbers and
letters. Example: firm with identifier FR12345 is different from firm with
identifier GB12345.

1) it is difficult to track down all the characters present in the firm
identifier variable
2) Different firms will get the same id number. Example: FR12345 and

I used ENCODE but I got the following error message (134): You attempted to
encode a string variable that takes on more than 65,536 unique values.


First, generate a numeric variable that takes the value one at the first
observation of each (sorted) panel unit, and zero at all succeeding
observations of that panel unit.  Then replace that variable with its
running sum, -sum()-, across the dataset, so that every observation of a
panel unit carries the same integer and each panel unit gets its own.  The
technique is illustrated below with dummy data of about 150 000 panel units.

Joseph Coveney

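set more off
* Dummy data: about 150 000 string identifiers
set seed `=date("2006-02-25", "ymd")'
set obs 150000
generate str panel_unit = string(uniform(), "%19.18g")
* Begin here
* Flag the first observation of each panel unit with 1, all others with 0
bysort panel_unit: generate byte panel_number = _n == 1
* The running sum of the flags numbers the panel units 1, 2, 3, ...
* (-replace- promotes the byte storage type as needed)
replace panel_number = sum(panel_number)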
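
As a cross-check on the steps above, the same numbering can probably be obtained with -egen-'s -group()- function, which is not limited the way -encode- is. This is a different technique from the running-sum approach in the post, offered only as a sketch. It assumes the string identifier is named panel_unit, as in the dummy data, and that panel_number has already been created as above; panel_id is a hypothetical variable name.

```stata
* Sketch only: panel_id is a hypothetical name, not part of the original post.
* -group()- numbers the distinct values of panel_unit 1, 2, 3, ... in sort order.
egen long panel_id = group(panel_unit)

* Both approaches follow the sort order of panel_unit, so they should agree.
assert panel_id == panel_number

* Each string identifier should map to exactly one number.
bysort panel_unit (panel_id): assert panel_id[1] == panel_id[_N]
```

Either numeric variable can then serve as the panel identifier, for instance through -xtreg-'s -i()- option.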
