st: Problem with -reshape- and value labels
I am having a problem with a data set in which a number of variables
carry different value labels. The variables' names share a common
prefix, and when I reshape the data to long format, the value label
assigned to the _last_ of the variables is carried over to the new
variable named by the common prefix. For example:
. des
Contains data
obs: 10
vars: 7
size: 160 (99.9% of memory free)
-----------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-----------------------------------------------------------------------------
seq             int    %8.0g
resp1           byte   %8.0g       boolean    1 resp
resp2           byte   %8.0g       boolean    2 resp
resp3           byte   %8.0g       boolean    3 resp
resp4           byte   %8.0g       boolean    4 resp
resp5           byte   %8.0g       boolean    5 resp
resp6           byte   %8.0g       other      6 resp
-----------------------------------------------------------------------------
Sorted by: seq
. reshape long resp, i(seq) j(item)
(note: j = 1 2 3 4 5 6)
Data wide -> long
-----------------------------------------------------------------------------
Number of obs. 10 -> 60
Number of variables 7 -> 3
j variable (6 values) -> item
xij variables:
resp1 resp2 ... resp6 -> resp
-----------------------------------------------------------------------------
. des
Contains data
obs: 60
vars: 3
size: 720 (99.9% of memory free)
-----------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-----------------------------------------------------------------------------
seq             int    %8.0g
item            byte   %9.0g
resp            byte   %8.0g       other
-----------------------------------------------------------------------------
Sorted by: seq item
Note: dataset has changed since last saved
But the real problem arises further on:
<snip> do stuff to resp variable
<end snip>
. reshape wide
(note: j = 1 2 3 4 5 6)
Data long -> wide
-----------------------------------------------------------------------------
Number of obs. 60 -> 10
Number of variables 3 -> 7
j variable (6 values) item -> (dropped)
xij variables:
resp -> resp1 resp2 ... resp6
-----------------------------------------------------------------------------
. des
Contains data
obs: 10
vars: 7
size: 160 (99.9% of memory free)
-----------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-----------------------------------------------------------------------------
seq             int    %8.0g
resp1           byte   %8.0g       other      1 resp
resp2           byte   %8.0g       other      2 resp
resp3           byte   %8.0g       other      3 resp
resp4           byte   %8.0g       other      4 resp
resp5           byte   %8.0g       other      5 resp
resp6           byte   %8.0g       other      6 resp
-----------------------------------------------------------------------------
Sorted by: seq
Notice that the value label "other" has now been applied to all of
the variables resp1-resp5, which originally had value label "boolean."
This then raises problems because I later attempt to select a group
of variables for some further analyses with:
ds, has(vallabel boolean)
which now comes up empty.
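To try this without my data, a sketch along the following lines should
set up a comparable example. The data values and the contents of the
two label definitions are made up for illustration; only the variable
names, storage types, and label names match the listing above.

```stata
* Hypothetical toy data: 10 observations, six resp* variables.
clear
set obs 10
generate int seq = _n

* Made-up label contents; only the names "boolean" and "other"
* come from the real data set.
label define boolean 0 "No" 1 "Yes"
label define other 0 "None" 1 "Some"

forvalues k = 1/6 {
    generate byte resp`k' = mod(_n, 2)
    label variable resp`k' "`k' resp"
}

* resp1-resp5 get "boolean"; resp6 gets "other".
foreach v of varlist resp1-resp5 {
    label values `v' boolean
}
label values resp6 other
sort seq

* Round trip through long format, then look for "boolean" variables.
reshape long resp, i(seq) j(item)
reshape wide
describe
ds, has(vallabel boolean)
```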
I can't get around this by just moving the resp6 variable earlier in
the data set: its unique value label gets singled out for the
long-format prefix-named variable regardless of where it physically
is in the data set. In fact, the workaround seems to be to rename
one of the "boolean" labeled variables to have a name that is
alphabetically last.
That would keep the "boolean" label from getting wiped out, but then
all the variables end up with that label when I reshape back to wide,
so the -ds- command then picks up variables that should be excluded
from further analysis. Is there any way to have -reshape- restore the
original labels?
(Evidently I can just relabel them by hand in this example, but the
real data set I'm working with has several dozen such variables, so
this starts to get impractical.)
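For the six-variable example, relabeling by hand would amount to
something like the sketch below. It assumes that the label definitions
"boolean" and "other" are still present in the data set after the
reshape and that only their attachment to the variables was lost; I
have not shown that in the output above.

```stata
* Reattach the original value labels after -reshape wide-.
* Assumes both label definitions still exist in the data set.
foreach v of varlist resp1-resp5 {
    label values `v' boolean
}
label values resp6 other

* Check which variables now carry "boolean".
ds, has(vallabel boolean)
```

With several dozen variables and more than two labels, each group
would need its own loop like this, which is the part I would like to
avoid.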
I checked the -reshape- section of the manual and I find no mention
of anything about how value labels are handled.
Any help would be appreciated. Thanks in advance.
Clyde Schechter
Albert Einstein College of Medicine
Bronx, New York, USA