Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Is there a better way to transpose a dataset and automatically label variables based on a string variable?
From
John Bensin <[email protected]>
To
[email protected]
Subject
Re: st: Is there a better way to transpose a dataset and automatically label variables based on a string variable?
Date
Fri, 21 Jun 2013 09:55:49 -0500
That is much simpler; thank you for the help!
John
On Fri, Jun 21, 2013 at 9:08 AM, Nick Cox <[email protected]> wrote:
> I am the author of the -sxpose- (SSC) referred to; it is as stated
> designed to transpose a string variable dataset. What you did comes
> with a warning, so I readily agree that it is not a good solution.
>
> With your dataset, I can save the string values, -drop- that variable,
> -xpose- and then copy back.
>
> . l
> +---------------------------------------------------+
> | varlabels category diff_c~y catego~3 |
> |---------------------------------------------------|
> 1. | Some label .79221 .67874 .70605 |
> 2. | Another label .95949 .75774 .031833 |
> 3. | Additional label .65574 .74313 .27692 |
> 4. | Fourth label .035712 .39223 .046171 |
> 5. | Fifth label .84913 .65548 .097132 |
> |---------------------------------------------------|
> 6. | Final label .93399 .17119 .82346 |
> +---------------------------------------------------+
> . forval i = 1/6 {
> 2. local label`i' = varlabels[`i']
> 3. }
> . drop varlabels
> . xpose, clear
> . l
> +--------------------------------------------------------+
> | v1 v2 v3 v4 v5 v6 |
> |--------------------------------------------------------|
> 1. | .79221 .95949 .65574 .035712 .84913 .93399 |
> 2. | .67874 .75774 .74313 .39223 .65548 .17119 |
> 3. | .70605 .031833 .27692 .046171 .097132 .82346 |
> +--------------------------------------------------------+
> . forval i = 1/6 {
> 2. label var v`i' "`label`i''"
> 3. }
>
> This could be programmed up for more general situations.
> .
> Nick
> [email protected]
>
>
> On 21 June 2013 14:40, John Bensin <[email protected]> wrote:
>> It looks like my email client mangled my first post, and I think the
>> formatting is relevant for the data sections, so I'll try this again.
>>
>> I have several datasets that fit this general format (variable names
>> are in the first row):
>>
>>
>> varlabels category diff_category category3
>> "Some label" 0.79221 0.67874 0.70605
>> "Another label" 0.95949 0.75774 0.031833
>> "Additional label" 0.65574 0.74313 0.27692
>> "Fourth label" 0.035712 0.39223 0.046171
>> "Fifth label" 0.84913 0.65548 0.097132
>> "Final label" 0.93399 0.17119 0.82346
>>
>>
>> My goal is to use Stata 12 to get them into this format:
>>
>> _var1 _var2 _var3 _var4 _var5 _var6 cat
>> 0.792210 0.959490 0.655740 0.035712 0.849130 0.933990
>> "category"
>> 0.678740 0.757740 0.743130 0.392230 0.655480 0.171190
>> "diff category"
>> 0.706050 0.031833 0.276920 0.046171 0.097132 0.823460
>> "category3"
>>
>> where _var1 is labelled "Some label", _var2 is labelled "Another
>> label", etc. The datasets are different and numerous enough that
>> labelling the variables individually isn't possible. The names of the
>> category variables aren't predictable either.
>>
>> My current solution is to improvise with Mata, xpose, and destring/tostring
>>
>>
>> clear
>> input str50 varlabels category diff_category category3
>> "Some label" -18.2059 0.67874 105869.3
>> "Another label" 0.95949 598.165 0.031833
>> "Additional label" 0.65574 0.74313 0.27692
>> "Fourth label" 194 0.39223 0.046171
>> "Fifth label" 0.84913 394.069 -5968387
>> "Final label" 0.93399 0.17119 0.82346
>> end
>>
>> putmata varlabels, replace
>>
>> mata
>> varlabels = varlabels'
>> end
>>
>> xpose, clear varname
>> tostring v*, force usedisplayformat replace
>> getmata (v*)=varlabels, update force
>>
>> foreach v of varlist v* {
>> local name = `v'[1]
>> lab var `v' "`name'"
>> }
>>
>> drop if _varname == "varlabels"
>> destring, replace
>>
>>
>>
>> Although this works, I question if there is a better way because I see
>> a few problems with this:
>>
>> 1. The combination of tostring/destring may cause a loss of precision,
>> according to Stata's warnings.
>>
>> 2. The --getmata-- call forces the vector of labels into the first
>> observation, which seems clunky. Is it safe to always rely on this
>> working or do I risk overwriting data?
>>
>> I also pursued a solution using the --sxpose-- command from the SSC:
>>
>>
>> clear
>> input str50 varlabels category diff_category category3
>> "Some label" 0.79221 0.67874 0.70605
>> "Another label" 0.95949 0.75774 0.031833
>> "Additional label" 0.65574 0.74313 0.27692
>> "Fourth label" 0.035712 0.39223 0.046171
>> "Fifth label" 0.84913 0.65548 0.097132
>> "Final label" 0.93399 0.17119 0.82346
>> end
>>
>> sxpose, clear force
>>
>> foreach v of varlist _var* {
>> local name = `v'[1]
>> lab var `v' "`name'"
>> }
>>
>>
>> This costs me the names of the category variables, which I need to
>> preserve, as well as some accuracy. It yields data that look like
>> this:
>>
>> _var1 _var2 _var3 _var4
>> _var5 _var6
>> Some label Another label Additional label Fourth label
>> Fifth label Final label
>> 0.792209983 0.959490001 0.655740023 0.035712
>> 0.849129975 0.933990002
>> 0.678740025 0.757740021 0.743130028 0.392230004
>> 0.655480027 0.171189994
>> 0.706049979 0.031833001 0.276919991 0.046170998
>> 0.097131997 0.823459983
>>
>>
>> Any suggestions? I'm happy to continue with my Mata/xpose solution if
>> it won't cost me data, but if there's a better solution I'll use it.
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/