Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
st: Is there a better way to transpose a dataset and automatically label variables based on a string variable?
From
John Bensin <[email protected]>
To
[email protected]
Subject
st: Is there a better way to transpose a dataset and automatically label variables based on a string variable?
Date
Fri, 21 Jun 2013 08:34:26 -0500
I have several datasets that fit this general format (variable names
are in the first row):
varlabels category diff_category category3
"Some label" 0.79221 0.67874 0.70605
"Another label" 0.95949 0.75774 0.031833
"Additional label" 0.65574 0.74313 0.27692
"Fourth label" 0.035712 0.39223 0.046171
"Fifth label" 0.84913 0.65548 0.097132
"Final label" 0.93399 0.17119 0.82346
My goal is to use Stata 12 to get them into this format:
_var1 _var2 _var3 _var4 _var5
_var6 cat
0.792210 0.959490 0.655740 0.035712 0.849130
0.933990 "category"
0.678740 0.757740 0.743130 0.392230 0.655480
0.171190 "diff category"
0.706050 0.031833 0.276920 0.046171 0.097132
0.823460 "category3"
where _var1 is labelled "Some label", _var2 is labelled "Another
label", etc. The datasets are
different and numerous enough that labelling the variables
individually isn't possible. The names
of the category variables aren't predictable either.
My current solution is to improvise with Mata, xpose, and destring/tostring
clear
input str50 varlabels category diff_category category3
"Some label" -18.2059 0.67874 105869.3
"Another label" 0.95949 598.165 0.031833
"Additional label" 0.65574 0.74313 0.27692
"Fourth label" 194 0.39223 0.046171
"Fifth label" 0.84913 394.069 -5968387
"Final label" 0.93399 0.17119 0.82346
end
putmata varlabels, replace
mata
varlabels = varlabels'
end
xpose, clear varname
tostring v*, force usedisplayformat replace
getmata (v*)=varlabels, update force
foreach v of varlist v* {
local name = `v'[1]
lab var `v' "`name'"
}
drop if _varname == "varlabels"
destring, replace
Although this works, I question if there is a better way because I see
a few problems with this:
1. The combination of tostring/destring may cause a loss of
precision, according to Stata's
warnings.
2. The --getmata-- call forces the vector of labels into the
first observation, which seems clunky.
Is it safe to always rely on this working or do I risk
overwriting data?
I also pursued a solution using the --sxpose-- command from the SSC:
clear
input str50 varlabels category diff_category category3
"Some label" 0.79221 0.67874 0.70605
"Another label" 0.95949 0.75774 0.031833
"Additional label" 0.65574 0.74313 0.27692
"Fourth label" 0.035712 0.39223 0.046171
"Fifth label" 0.84913 0.65548 0.097132
"Final label" 0.93399 0.17119 0.82346
end
sxpose, clear force
foreach v of varlist _var* {
local name = `v'[1]
lab var `v' "`name'"
}
This costs me the names of the category variables, which I need to
preserve; it yields data that look like
this:
_var1 _var2 _var3 _var4
_var5 _var6
Some label Another label Additional label Fourth
label Fifth label Final label
0.792209983 0.959490001 0.655740023 0.035712
0.849129975 0.933990002
0.678740025 0.757740021 0.743130028
0.392230004 0.655480027 0.171189994
0.706049979 0.031833001 0.276919991
0.046170998 0.097131997 0.823459983
Any suggestions? I'm happy to continue with my Mata/xpose solution if
it won't cost me data, but if there's a
better solution I'll use it.
John Bensin
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/