Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: merging variables with labels
From
Nick Cox <[email protected]>
To
[email protected]
Subject
Re: st: merging variables with labels
Date
Thu, 13 Jan 2011 10:10:13 +0000
Another answer to Ben's longer question is to check out -multencode-
from SSC, which can -encode- several string variables together using a
common set of value labels.
There is a much more direct answer to his shorter question. See
SJ-4-4 dm0009 . . . . . . . Stata tip 14: Using value labels in expressions
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . K. Higbee
Q4/04 SJ 4(4):488--489 (no commands)
tips for using value labels in expressions
http://www.stata-journal.com/sjpdf.html?articlenum=dm0009
Nick
On Thu, Jan 13, 2011 at 6:15 AM, Sergiy Radyakin <[email protected]> wrote:
> Ben describes a situation where two variables are obtained as a result of
> the encoding corresponding string variables. Stata automatically generated
> two systems of values labeling, which both overlap in their codes, even
> though the labels are different.
>
> When values of either of these two variables are put into the third one,
> neither of the labeling system applies, and the situation really reminds the
> classical "apples and oranges" basket. I will proceed under the assumption
> that the mixture problem is ruled out and the resulting variable makes sense.
>
> Stata's language is used in the following code, Mata can be more convenient
> and efficient.
>
> Here is an example:
>
> . clear
> . input str30 xs str30 ys price
> xs ys price
> 1. "North" "Red" 50
> 2. "South" "Green" 30
> 3. "East" "Blue" 48
> 4. "West" "Yellow" 24
> 5. "North" "Green" 36
> 6. "North" "Blue" 85
> 7. "South" "Red" 75
> 8. end
> . encode xs, generate(x)
> . encode ys, generate(y)
> . generate zs = cond(price>45,"x","y")
> . generate z = cond(price>45,x,y)
> . list, clean
> xs ys price x y zs z
> 1. North Red 50 North Red x 2
> 2. South Green 30 South Green y 2
> 3. East Blue 48 East Blue x 1
> 4. West Yellow 24 West Yellow y 4
> 5. North Green 36 North Green y 2
> 6. North Blue 85 North Blue x 2
> 7. South Red 75 South Red x 3
>
> Here zs shows where the value is coming from and z the value of the
> new variable itself. When the value is coming from variable x the label
> from the x's labeling system must be used, and similarly for y.
>
> The problem stems from the fact that the two labeling systems were
> created independently from one another, and thus may not be used
> in such an operation.
>
> If the original string data is still available, Ben may change the above
> program to make the two encodings dependent on each other, by
> forcing Stata to use the same labeling system for both of them.
> The encode command provides a convenient option -label- exactly
> for this purpose.
>
> . clear
> . input str30 xs str30 ys price
> xs ys price
> 1. "North" "Red" 50
> 2. "South" "Green" 30
> 3. "East" "Blue" 48
> 4. "West" "Yellow" 24
> 5. "North" "Green" 36
> 6. "North" "Blue" 85
> 7. "South" "Red" 75
> 8. end
> . encode xs, generate(x)
> . encode ys, generate(y) label(x)
> . generate zs = cond(price>45,"x","y")
> . generate z = cond(price>45,x,y)
> . label values z x
> . list, clean
> xs ys price x y zs z
> 1. North Red 50 North Red x North
> 2. South Green 30 South Green y Green
> 3. East Blue 48 East Blue x East
> 4. West Yellow 24 West Yellow y Yellow
> 5. North Green 36 North Green y Green
> 6. North Blue 85 North Blue x North
> 7. South Red 75 South Red x South
>
>
> See that in the z variable the directions names are used when the
> values are taken from the variable x and colors names are used when
> the values are taken from the variable y.
>
> We further confirm that labels are applicable to this variable:
> . label list
> x:
> 1 East
> 2 North
> 3 South
> 4 West
> 5 Blue
> 6 Green
> 7 Red
> 8 Yellow
>
> If the original string information is not available anymore, Ben can
> adjust the values and the labels of one variable in such a way that they
> do not overlap with values of the other variable.
>
> For values, often adding a constant, e.g. 10000 is sufficient.
> It's slightly more difficult with the labeling system. But here is an example:
>
> . clear
> . input str30 xs str30 ys price
> xs ys price
> 1. "North" "Red" 50
> 2. "South" "Green" 30
> 3. "East" "Blue" 48
> 4. "West" "Yellow" 24
> 5. "North" "Green" 36
> 6. "North" "Blue" 85
> 7. "South" "Red" 75
> 8. end
> . encode xs, generate(x)
> . encode ys, generate(y)
> . list, clean
> xs ys price x y
> 1. North Red 50 North Red
> 2. South Green 30 South Green
> 3. East Blue 48 East Blue
> 4. West Yellow 24 West Yellow
> 5. North Green 36 North Green
> 6. North Blue 85 North Blue
> 7. South Red 75 South Red
> . preserve
> . uselabel x y, clear
> . assert trunc==0
> . sum value if lname=="x"
> Variable | Obs Mean Std. Dev. Min Max
> -------------+--------------------------------------------------------
> value | 4 2.5 1.290994 1 4
> . local shift = r(max)+1
> . replace value=value+`shift' if lname=="y"
> (4 real changes made)
> . drop lname trunc
> . forvalues i=1/`=_N' {
> 2. label define zlab `=value[`i']' `"`=label[`i']'"', modify
> 3. }
> . tempfile lbl
> . label save zlab using `lbl'
> file C:\Users\Username\AppData\Local\Temp\ST_0000000q.tmp saved
> . restore
> . replace y=y+`shift'
> (7 real changes made)
> . label drop y // old labels not applicable anymore
> . do `lbl'
> . label define zlab 1 `"East"', modify
> . label define zlab 2 `"North"', modify
> . label define zlab 3 `"South"', modify
> . label define zlab 4 `"West"', modify
> . label define zlab 6 `"Blue"', modify
> . label define zlab 7 `"Green"', modify
> . label define zlab 8 `"Red"', modify
> . label define zlab 9 `"Yellow"', modify
> end of do-file
> . generate zs = cond(price>45,"x","y")
> . generate z = cond(price>45,x,y)
> . label values z zlab
> . list, clean
> xs ys price x y zs z
> 1. North Red 50 North 8 x North
> 2. South Green 30 South 7 y Green
> 3. East Blue 48 East 6 x East
> 4. West Yellow 24 West 9 y Yellow
> 5. North Green 36 North 7 y Green
> 6. North Blue 85 North 6 x North
> 7. South Red 75 South 8 x South
> . label list
> zlab:
> 1 East
> 2 North
> 3 South
> 4 West
> 6 Blue
> 7 Green
> 8 Red
> 9 Yellow
> x:
> 1 East
> 2 North
> 3 South
> 4 West
> end of do-file
>
> Note that the resulting listing is the same as with the first case.
> As always with Stata, there are other (and more efficient) ways of
> getting the same result.
>
> Finally to answer your question regarding obtaining the value based on
> the label.
>
> If you know the code, the label is obtained instantly:
> . sysuse auto
> . di `"`:label origin 1'"'
> Foreign
> . di `"`:label (foreign) 1'"'
> Foreign
>
> See help for extended macro functions.
>
> If you need an inverse transformation, i.e. obtaining the values by
> labels, you may do:
> . sysuse auto
> . uselabel origin
> . sum value if label=="Foreign", meanonly
> . di r(mean)
> 1
>
> If you don't want to destroy the data, then loop through all labeled
> values, and use
> `:label labelname labeledvalue' to check if it matches the label you
> are looking for.
>
> I would personally prefer Mata, which simplifies the matter a lot:
>
> .mata st_local("v",strofreal(st_vlsearch("origin", "Foreign")))
> .display `v'
> 1
>
> Hope this helps.
> On Wed, Jan 12, 2011 at 8:24 PM, Ben Hoen <[email protected]> wrote:
>> I have two variables (e.g. x and y) with value labels (e.g. xlab and ylab).
>> Both x and y were created via .encode (they were derived from string
>> variables) so they overlap in their underlying values but the value labels
>> are unique between x and y and within x and y.
>>
>> I would like to create a new variable z which would take on the value labels
>> of x (xlab) for some cases and y (ylab) for other cases based on an
>> expression. I do not care what the underlying values are, only that the
>> value labels are correctly assigned?
>>
>> Suggestions?
>>
>> Somewhat relatedly, is there a quick way on the command line to display the
>> underlying value of a variable if one only has the label? I imagine
>> something like: .display y if ylab="something"
>>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/