To recap this thread: In response to my posting where I expressed my wish
to avoid explicit looping over observations to create a value label mapping
between a numeric variable and a string variable, Michael Blasnik suggested
the following approach:
Michael Blasnik wrote
[...]
>
> gen str1 cmd=""
> replace cmd="label define mylab "+string(nvar)+char(34)+svar+char(34)+",modify"
> outsheet cmd using cmd.do, nonames noquote
> drop cmd
>
> Then you can -run cmd-
>
> This approach just builds a string variable that has the label define
> commands and uses char(34) to insert quotes. Of course, if you have
> repetitive values of nvar you should first collapse the dataset down to
> one obs for each nvar
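
The reduction to one observation per value that Michael mentions could be
sketched as follows. This is only an illustration: it assumes the variable
names nvar and svar from his example, and that svar is constant within each
value of nvar.

```stata
* work on a temporary copy so the full dataset comes back afterwards
preserve
* keep a single observation for each distinct value of nvar
bysort nvar: keep if _n == 1
* ... build cmd and -outsheet- it here, as in the quoted code ...
restore
```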
Nick Cox had previously suggested:
> > > I don't know of anything quite like this, but
> > > for once a looping over observations would seem
> > > to solve the problem:
> > >
> > > local N = _N
> > > forval i = 1/`N' {
> > > local val = naics[`i']
> > > local label = labelnaics[`i']
> > > label def naicslab `val' "`label'" , modify
> > > }
> > >
> > > Nick
> > > [email protected]
> >
Nick already commented on Michael's solution, stating that although the
latter avoids explicit looping over observations, it nevertheless requires
converting a numeric variable to a string, something which Stata must
accomplish one observation at a time.
Michael's line of attack is to concatenate the pieces of the
-label define # "..."- command into a single string variable, which may
then be sent to a text file via -outsheet-. It works. However, my first
reaction was that, given (Intercooled) Stata's 80-character limit for
string variables, the line
    replace cmd="label define mylab "+string(nvar)+ /*
    */ char(34)+svar+char(34)+",modify"
will impose a significant constraint on the length of the value label
(contained in variable _svar_): the fixed text alone takes up 28 of the 80
characters, before counting the digits of nvar.
I then realised that by saving the data in tab-delimited format as Michael
did, we can avoid the concatenation altogether, since Stata doesn't really
complain when it encounters a tab character in a .do file. The only
drawback I see so far is that we lose four characters of the value label,
since we must add compound double quotes to the string variable.
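
To illustrate, one line of a file written this way would look roughly like
the following. The label name and the value/label pair are made up for
illustration, and the wide gaps stand for the tab characters that -outsheet-
places between variables.

```stata
* each gap below is a tab in the real file; Stata treats it as white space
lab def     mylab   11  `"Agriculture, Forestry, Fishing and Hunting"'  ,modify
```

The compound double quotes also keep the command valid should the label
text itself contain a double quote.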
To compare the relative efficiency of both solutions, I put together two
separate routines, one using the -forvalues- approach, the other using the
-outsheet- line. I then compared their relative performance (see table
below).
*! maplab1: define label using the mapping between two variables
program define maplab1
    syntax varlist(min=2 max=2), [ labname(str) ]
    tokenize `varlist'
    * `1' holds the numeric values, `2' the label text
    cap confirm numeric var `1'
    if _rc {
        di as err "`1' must be a numeric variable"
        exit 198
    }
    cap confirm str var `2'
    if _rc {
        di as err "`2' must be a string variable"
        exit 198
    }
    * the label name defaults to the name of the numeric variable
    if "`labname'"=="" {
        local labname `1'
    }
    else {
        local wc : word count `labname'
        if `wc'!=1 {
            di as err "labname() invalid"
            exit 198
        }
    }
    tempfile labfile
    tempvar labdef labnam strvar mod
    * one variable per piece of the -label define- command
    qui {
        gen str8 `labdef' = "lab def "
        gen str1 `labnam' = ""
        replace `labnam' = "`labname'"
        gen str1 `strvar' = ""
        * wrap the label text in compound double quotes
        replace `strvar' = char(96)+char(34)+ /*
            */ substr(`2',1,76)+char(34)+char(39)
        gen str7 `mod' = ",modify"
    }
    order `labdef' `labnam' `1' `strvar' `mod'
    * write the pieces tab-delimited, then run the file as a .do file
    outsheet `labdef' `labnam' `1' `strvar' `mod' /*
        */ using `labfile', nonames noquote
    run `labfile'
end
*! maplab2: define label using the mapping between two variables
program define maplab2
    syntax varlist(min=2 max=2), [ labname(str) ]
    tokenize `varlist'
    * `1' holds the numeric values, `2' the label text
    cap confirm numeric var `1'
    if _rc {
        di as err "`1' must be a numeric variable"
        exit 198
    }
    cap confirm str var `2'
    if _rc {
        di as err "`2' must be a string variable"
        exit 198
    }
    * the label name defaults to the name of the numeric variable
    if "`labname'"=="" {
        local labname `1'
    }
    else {
        local wc : word count `labname'
        if `wc'!=1 {
            di as err "labname() invalid"
            exit 198
        }
    }
    * loop over observations, adding one value-label mapping at a time
    local N = _N
    forval i = 1/`N' {
        local val = `1'[`i']
        local label = `2'[`i']
        label def `labname' `val' "`label'" , modify
    }
end
maplab2 turned out to be much faster once the dataset grows! It is running
the .do file in maplab1 that is time-consuming. Here's a quick tabulation:
                 seconds elapsed
             ----------------------
 obs ('000s) |  maplab1    maplab2
-------------+----------------------
           1 |       .3         .2
           3 |      1.1        1.3
           5 |      4.7        2.7
           7 |     11.6        4.2
           9 |     21.8        5.6
          30 |    300.3       21.2
where we have a different value label for each observation.
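
For anyone wishing to run a comparison of this kind, a minimal sketch
follows. It assumes both programs are already loaded, builds made-up data
with a distinct label for every observation, and relies on -set rmsg on- to
report the time each command takes. The number of observations and the
label names lab1 and lab2 are arbitrary choices for illustration.

```stata
* made-up test data: one distinct value and label per observation
clear
set obs 5000
gen long naics = _n
gen str20 labelnaics = "label " + string(_n)

* have Stata report the elapsed time after each command
set rmsg on
maplab1 naics labelnaics, labname(lab1)
maplab2 naics labelnaics, labname(lab2)
set rmsg off
```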
I'll go with maplab2 then. Thanks all.
Pat Joly
[email protected]
[email protected]