-sencode- can indeed solve Friedrich's problem (using the -gsort()- option to encode in an arbitrary order). The current version of -sencode- (downloadable from SSC) uses file manipulation in only two places:

1. There is an initial -preserve- and a final -restore, not-, in case the user presses -Break- in the middle of executing -sencode-.

2. For -sencode- to work when the -label()- option names an existing label, -sencode- uses -label save- to save that label to a temporary file, and then uses -file- to read the temporary file and find the highest integer that already has a label, so that any additional string values encoded are allocated higher integers. I couldn't find a better way, at least in Stata 7 or 8, to obtain the highest labelled integer for an existing label.
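
The approach in point 2 can be sketched roughly as follows. This is an illustration only, not the code inside -sencode-: the label name -mylab- and the local macro names are hypothetical, and the sketch assumes that each line written by -label save- has the form "label define mylab # ...", so that the integer is the fourth word of the line.

```stata
* Sketch only: -mylab- is a hypothetical existing value label
label define mylab 1 "a" 2 "b" 7 "g"

* Save the label definitions to a temporary do-file
tempfile labfile
tempname fh
quietly label save mylab using `"`labfile'"'

* Read the file back line by line, tracking the highest labelled integer
local maxval = .
file open `fh' using `"`labfile'"', read text
file read `fh' line
while r(eof) == 0 {
        * Assumed line format: label define mylab # `"text"', modify
        local val : word 4 of `line'
        capture confirm integer number `val'
        * max() ignores missing arguments, so the first integer replaces "."
        if _rc == 0 local maxval = max(`maxval', `val')
        file read `fh' line
}
file close `fh'

display "highest labelled integer: `maxval'"
```

New string values would then be given codes starting at `maxval' + 1. In later Stata releases, -label list mylab- also returns the largest labelled value in r(max), which may make the temporary file unnecessary; that is outside the Stata 7 and 8 setting discussed above.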


At 15:46 25/04/2005, Nick Cox wrote (in reply to Friedrich Huebler):

This problem, or at least a relative of it, can be attacked, I think, using Roger Newson's -sencode-.

His solution includes a certain amount of file manipulation. In my version of the problem when I looked at it two years ago I didn't find any need for that, but I haven't looked closely enough to work out what aspects of the problem Roger solves that I don't or indeed vice versa.

There doesn't seem to be a help file for my resulting program,
but the code is a bit more general than yours.

program seqencode, sortpreserve
*! NJC 1.0.0 1 May 2003
        version 8
        syntax varname(string) [if] [in], Generate(str) [ Label(str) Unique ]

        local limit = cond(c(flavor) == "Small", 1000, 65536)

        quietly {
                marksample touse, strok
                count if `touse'
                if r(N) == 0 error 2000

                // variable is new?
                confirm new variable `generate'

                // label is new?
                if "`label'" == "" local label "`generate'"
                capture label list `label'
                if _rc != 111 {
                        di as err "label `label' already defined"
                        exit 110
                }

                if "`unique'" != "" {
                        // each value `touse' mapped to its own -label-
                        replace `touse' = -`touse'
                        sort `touse' `_sortindex'

                        // define labels
                        count if `touse'
                        if `r(N)' > `limit' error 134
                        forval i = 1 / `r(N)' {
                                label def `label' `i' ///
                                        `"`= `varlist'[`i']'"', modify
                        }

                        gen long `generate' = _n  if `touse'
                }
                else {
                        // get first occurrences
                        tempvar first
                        bysort `touse' `varlist' (`_sortindex') : ///
                                gen byte `first' = -(_n == 1 & `touse')
                        sort `first' `_sortindex'

                        // define labels
                        count if `first'
                        if `r(N)' > `limit' error 134
                        forval i = 1 / `r(N)' {
                                label def `label' `i' ///
                                        `"`= `varlist'[`i']'"', modify
                        }

                        // copy values from first occurrences
                        gen long `generate' = _n  if `touse'
                        bysort `touse' `varlist' (`generate'): ///
                                replace `generate' = `generate'[1]
                }

                compress `generate'

                // assign labels
                label val `generate' `label'
                label var `generate' `"`: variable label `varlist''"'
        }
end
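
As a usage sketch, the program above could be applied to Friedrich's example data as follows. This assumes that -seqencode- has been saved as seqencode.ado on the ado-path or defined earlier in the same do-file; the data are typed in from the question quoted below. Because the program is declared -sortpreserve- and numbers values by first occurrence in the current sort order, sorting on -mean- first should give b, a and c the codes 1, 2 and 3.

```stata
* Sketch only: assumes -seqencode- has already been defined
clear
input str1 var mean
a 1.5
b 1.2
b 1.2
b 1.2
c 1.8
c 1.8
end

* Codes follow the order of first occurrence, so sort on -mean- first
sort mean
seqencode var, generate(newvar)

* Show the codes next to the strings, then the label definitions
list var mean newvar, nolabel
label list newvar
```

Without the -label()- option the value label takes the name of the new variable, which is why -label list newvar- is used here.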


Friedrich Huebler

> When a string variable is converted to a numeric variable with
> -encode-, the numeric values follow the sort order of the string
> variable. I would like to -encode- a string variable based on the
> sort order of another variable. My original data is like this:
>
> var   mean
> a     1.5
> b     1.2
> b     1.2
> b     1.2
> c     1.8
> c     1.8
>
> I would like to create the variable "newvar" like this, using the
> sort order of the variable "mean":
>
> var   mean   newvar   (label for newvar)
> b     1.2    1        b
> b     1.2    1        b
> b     1.2    1        b
> a     1.5    2        a
> c     1.8    3        c
> c     1.8    3        c
>
> My solution is shown below. Creating "newvar" itself is simple but
> there must be a better way to assign the labels.
>
> sort mean
> egen newvar = group(mean)
> lab def newvar 1 "temp"
> levels(newvar), local(levels)
> foreach l of local levels {
>   gen temp = ""
>   replace temp = var if newvar==`l'
>   levels(temp), local(templabel)
>   lab def newvar `l' `templabel', modify
>   drop temp
> }
> lab val newvar newvar
>
> How can this code be improved? Thank you for your suggestions.
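
For the -sencode- route mentioned at the top of this message, a call along the following lines may be all that is needed. This is a sketch under assumptions: it presumes that -sencode- has been installed (for example with -ssc install sencode-) and that its -generate()- and -gsort()- options behave as described in its help file, which should be checked before relying on it.

```stata
* Sketch only: assumes -sencode- is installed from SSC
clear
input str1 var mean
a 1.5
b 1.2
b 1.2
b 1.2
c 1.8
c 1.8
end

* Encode -var- in the order given by -mean- rather than alphabetically
sencode var, generate(newvar) gsort(mean)

* Inspect the codes and whatever value labels were created
list var mean newvar, nolabel
label list
```

The intention is that the codes follow the first appearance of each string value in the -gsort()- ordering, which would reproduce the "newvar" column in the question.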

