Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
There is a much more direct answer to his shorter question. See SJ-4-4 dm0009 . . . . . . . Stata tip 14: Using value labels in expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . K. Higbee Q4/04 SJ 4(4):488--489 (no commands) tips for using value labels in expressions http://www.stata-journal.com/sjpdf.html?articlenum=dm0009 Nick On Thu, Jan 13, 2011 at 6:15 AM, Sergiy Radyakin <serjradyakin@gmail.com> wrote: > Ben describes a situation where two variables are obtained as a result of > the encoding corresponding string variables. Stata automatically generated > two systems of values labeling, which both overlap in their codes, even > though the labels are different. > > When values of either of these two variables are put into the third one, > neither of the labeling system applies, and the situation really reminds the > classical "apples and oranges" basket. I will proceed under the assumption > that the mixture problem is ruled out and the resulting variable makes sense. > > Stata's language is used in the following code, Mata can be more convenient > and efficient. > > Here is an example: > > . clear > . input str30 xs str30 ys price > xs ys price > 1. "North" "Red" 50 > 2. "South" "Green" 30 > 3. "East" "Blue" 48 > 4. "West" "Yellow" 24 > 5. "North" "Green" 36 > 6. "North" "Blue" 85 > 7. "South" "Red" 75 > 8. end > . encode xs, generate(x) > . encode ys, generate(y) > . generate zs = cond(price>45,"x","y") > . generate z = cond(price>45,x,y) > . list, clean > xs ys price x y zs z > 1. North Red 50 North Red x 2 > 2. South Green 30 South Green y 2 > 3. East Blue 48 East Blue x 1 > 4. West Yellow 24 West Yellow y 4 > 5. North Green 36 North Green y 2 > 6. North Blue 85 North Blue x 2 > 7. South Red 75 South Red x 3 > > Here zs shows where the value is coming from and z the value of the > new variable itself. When the value is coming from variable x the label > from the x's labeling system must be used, and similarly for y. > > The problem stems from the fact that the two labeling systems were > created independently from one another, and thus may not be used > in such an operation. > > If the original string data is still available, Ben may change the above > program to make the two encodings dependent on each other, by > forcing Stata to use the same labeling system for both of them. > The encode command provides a convenient option -label- exactly > for this purpose. > > . clear > . input str30 xs str30 ys price > xs ys price > 1. "North" "Red" 50 > 2. "South" "Green" 30 > 3. "East" "Blue" 48 > 4. "West" "Yellow" 24 > 5. "North" "Green" 36 > 6. "North" "Blue" 85 > 7. "South" "Red" 75 > 8. end > . encode xs, generate(x) > . encode ys, generate(y) label(x) > . generate zs = cond(price>45,"x","y") > . generate z = cond(price>45,x,y) > . label values z x > . list, clean > xs ys price x y zs z > 1. North Red 50 North Red x North > 2. South Green 30 South Green y Green > 3. East Blue 48 East Blue x East > 4. West Yellow 24 West Yellow y Yellow > 5. North Green 36 North Green y Green > 6. North Blue 85 North Blue x North > 7. South Red 75 South Red x South > > > See that in the z variable the directions names are used when the > values are taken from the variable x and colors names are used when > the values are taken from the variable y. > > We further confirm that labels are applicable to this variable: > . label list > x: > 1 East > 2 North > 3 South > 4 West > 5 Blue > 6 Green > 7 Red > 8 Yellow > > If the original string information is not available anymore, Ben can > adjust the values and the labels of one variable in such a way that they > do not overlap with values of the other variable. > > For values, often adding a constant, e.g. 10000 is sufficient. > It's slightly more difficult with the labeling system. But here is an example: > > . clear > . input str30 xs str30 ys price > xs ys price > 1. "North" "Red" 50 > 2. "South" "Green" 30 > 3. "East" "Blue" 48 > 4. "West" "Yellow" 24 > 5. "North" "Green" 36 > 6. "North" "Blue" 85 > 7. "South" "Red" 75 > 8. end > . encode xs, generate(x) > . encode ys, generate(y) > . list, clean > xs ys price x y > 1. North Red 50 North Red > 2. South Green 30 South Green > 3. East Blue 48 East Blue > 4. West Yellow 24 West Yellow > 5. North Green 36 North Green > 6. North Blue 85 North Blue > 7. South Red 75 South Red > . preserve > . uselabel x y, clear > . assert trunc==0 > . sum value if lname=="x" > Variable | Obs Mean Std. Dev. Min Max > -------------+-------------------------------------------------------- > value | 4 2.5 1.290994 1 4 > . local shift = r(max)+1 > . replace value=value+`shift' if lname=="y" > (4 real changes made) > . drop lname trunc > . forvalues i=1/`=_N' { > 2. label define zlab `=value[`i']' `"`=label[`i']'"', modify > 3. } > . tempfile lbl > . label save zlab using `lbl' > file C:\Users\Username\AppData\Local\Temp\ST_0000000q.tmp saved > . restore > . replace y=y+`shift' > (7 real changes made) > . label drop y // old labels not applicable anymore > . do `lbl' > . label define zlab 1 `"East"', modify > . label define zlab 2 `"North"', modify > . label define zlab 3 `"South"', modify > . label define zlab 4 `"West"', modify > . label define zlab 6 `"Blue"', modify > . label define zlab 7 `"Green"', modify > . label define zlab 8 `"Red"', modify > . label define zlab 9 `"Yellow"', modify > end of do-file > . generate zs = cond(price>45,"x","y") > . generate z = cond(price>45,x,y) > . label values z zlab > . list, clean > xs ys price x y zs z > 1. North Red 50 North 8 x North > 2. South Green 30 South 7 y Green > 3. East Blue 48 East 6 x East > 4. West Yellow 24 West 9 y Yellow > 5. North Green 36 North 7 y Green > 6. North Blue 85 North 6 x North > 7. South Red 75 South 8 x South > . label list > zlab: > 1 East > 2 North > 3 South > 4 West > 6 Blue > 7 Green > 8 Red > 9 Yellow > x: > 1 East > 2 North > 3 South > 4 West > end of do-file > > Note that the resulting listing is the same as with the first case. > As always with Stata, there are other (and more efficient) ways of > getting the same result. > > Finally to answer your question regarding obtaining the value based on > the label. > > If you know the code, the label is obtained instantly: > . sysuse auto > . di `"`:label origin 1'"' > Foreign > . di `"`:label (foreign) 1'"' > Foreign > > See help for extended macro functions. > > If you need an inverse transformation, i.e. obtaining the values by > labels, you may do: > . sysuse auto > . uselabel origin > . sum value if label=="Foreign", meanonly > . di r(mean) > 1 > > If you don't want to destroy the data, then loop through all labeled > values, and use > `:label labelname labeledvalue' to check if it matches the label you > are looking for. > > I would personally prefer Mata, which simplifies the matter a lot: > > .mata st_local("v",strofreal(st_vlsearch("origin", "Foreign"))) > .display `v' > 1 > > Hope this helps. > On Wed, Jan 12, 2011 at 8:24 PM, Ben Hoen <bhoen@lbl.gov> wrote: >> I have two variables (e.g. x and y) with value labels (e.g. xlab and ylab). >> Both x and y were created via .encode (they were derived from string >> variables) so they overlap in their underlying values but the value labels >> are unique between x and y and within x and y. >> >> I would like to create a new variable z which would take on the value labels >> of x (xlab) for some cases and y (ylab) for other cases based on an >> expression. I do not care what the underlying values are, only that the >> value labels are correctly assigned? >> >> Suggestions? >> >> Somewhat relatedly, is there a quick way on the command line to display the >> underlying value of a variable if one only has the label? I imagine >> something like: .display y if ylab="something" >> Ben Hoen Principal Research Associate Lawrence Berkeley National Laboratory Office: 845-758-1896 Cell: 718-812-7589 bhoen@lbl.gov * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/