Re: st: Is there a better way to transpose a dataset and automatically label variables based on a string variable?

From	John Bensin <[email protected]>
To	[email protected]
Subject	Re: st: Is there a better way to transpose a dataset and automatically label variables based on a string variable?
Date	Fri, 21 Jun 2013 09:55:49 -0500

That is much simpler; thank you for the help!

John

On Fri, Jun 21, 2013 at 9:08 AM, Nick Cox <[email protected]> wrote:
> I am the author of the -sxpose- (SSC) referred to; it is, as stated,
> designed to transpose a dataset of string variables. What you did comes
> with a warning, so I readily agree that it is not a good solution.
>
> With your dataset, I can save the string values in local macros, -drop-
> that variable, -xpose-, and then copy the values back as variable labels.
>
> . l
>      +---------------------------------------------------+
>      |        varlabels   category   diff_c~y   catego~3 |
>      |---------------------------------------------------|
>   1. |       Some label     .79221     .67874     .70605 |
>   2. |    Another label     .95949     .75774    .031833 |
>   3. | Additional label     .65574     .74313     .27692 |
>   4. |     Fourth label    .035712     .39223    .046171 |
>   5. |      Fifth label     .84913     .65548    .097132 |
>      |---------------------------------------------------|
>   6. |      Final label     .93399     .17119     .82346 |
>      +---------------------------------------------------+
> . forval i = 1/6 {
>   2. local label`i' = varlabels[`i']
>   3. }
> . drop varlabels
> . xpose, clear
> . l
>      +--------------------------------------------------------+
>      |     v1        v2       v3        v4        v5       v6 |
>      |--------------------------------------------------------|
>   1. | .79221    .95949   .65574   .035712    .84913   .93399 |
>   2. | .67874    .75774   .74313    .39223    .65548   .17119 |
>   3. | .70605   .031833   .27692   .046171   .097132   .82346 |
>      +--------------------------------------------------------+
> . forval i = 1/6 {
>   2. label var v`i' "`label`i''"
>   3. }
>
> This could be programmed up for more general situations.
>
> Nick
> [email protected]
>
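A minimal sketch of how the steps above might be generalised, not taken from the thread itself. It assumes the string variable is called varlabels, reads the number of rows from _N instead of hard-coding 6, and uses the varname option of -xpose- (which the original post already uses) to keep the category names. The double storage type on input, the promote option, and the final name cat are choices made for this illustration.

```stata
* Sketch only: same idea as the log above, for any number of rows.
clear
input str50 varlabels double category double diff_category double category3
"Some label"        0.79221   0.67874   0.70605
"Another label"     0.95949   0.75774   0.031833
"Additional label"  0.65574   0.74313   0.27692
"Fourth label"      0.035712  0.39223   0.046171
"Fifth label"       0.84913   0.65548   0.097132
"Final label"       0.93399   0.17119   0.82346
end

* Save each label in a local macro before the string variable is dropped.
local n = _N
forvalues i = 1/`n' {
    local label`i' = varlabels[`i']
}
drop varlabels

* varname keeps the old variable names in _varname;
* promote asks -xpose- not to store the doubles as float.
xpose, clear varname promote

* Copy the saved strings back as variable labels on v1, v2, ...
forvalues i = 1/`n' {
    label variable v`i' `"`label`i''"'
}

* Hypothetical final name for the category column.
rename _varname cat
order cat, last
describe
list
```

Because the numeric values are never converted to strings, no -tostring- or -destring- step is needed.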
>
> On 21 June 2013 14:40, John Bensin <[email protected]> wrote:
>> It looks like my email client mangled my first post, and I think the
>> formatting is relevant for the data sections, so I'll try this again.
>>
>> I have several datasets that fit this general format (variable names
>> are in the first row):
>>
>>
>> varlabels             category     diff_category     category3
>> "Some label"        0.79221      0.67874         0.70605
>> "Another label"     0.95949      0.75774         0.031833
>> "Additional label"  0.65574      0.74313         0.27692
>> "Fourth label"      0.035712     0.39223         0.046171
>> "Fifth label"       0.84913      0.65548         0.097132
>> "Final label"       0.93399      0.17119         0.82346
>>
>>
>> My goal is to use Stata 12 to get them into this format:
>>
>> _var1      _var2      _var3      _var4      _var5      _var6      cat
>> 0.792210   0.959490   0.655740   0.035712   0.849130   0.933990   "category"
>> 0.678740   0.757740   0.743130   0.392230   0.655480   0.171190   "diff category"
>> 0.706050   0.031833   0.276920   0.046171   0.097132   0.823460   "category3"
>>
>> where _var1 is labelled "Some label", _var2 is labelled "Another
>> label", etc. The datasets are different and numerous enough that
>> labelling the variables individually isn't possible. The names of the
>> category variables aren't predictable either.
>>
>> My current solution is to improvise with Mata, xpose, and destring/tostring:
>>
>>
>> clear
>> input str50 varlabels category diff_category category3
>> "Some label"        -18.2059     0.67874     105869.3
>> "Another label"     0.95949      598.165     0.031833
>> "Additional label"  0.65574      0.74313     0.27692
>> "Fourth label"      194          0.39223     0.046171
>> "Fifth label"       0.84913      394.069     -5968387
>> "Final label"       0.93399      0.17119     0.82346
>> end
>>
>> putmata varlabels, replace
>>
>> mata
>>     varlabels = varlabels'
>> end
>>
>> xpose, clear varname
>> tostring v*, force usedisplayformat replace
>> getmata (v*)=varlabels, update force
>>
>> foreach v of varlist v* {
>>     local name = `v'[1]
>>     lab var `v' "`name'"
>> }
>>
>> drop if _varname == "varlabels"
>> destring, replace
>>
>>
>>
>> Although this works, I wonder whether there is a better way, because I
>> see a few problems with it:
>>
>> 1. The combination of tostring/destring may cause a loss of precision,
>> according to Stata's warnings.
>>
>> 2. The -getmata- call forces the vector of labels into the first
>> observation, which seems clunky. Is it safe to always rely on this
>> working or do I risk overwriting data?
>>
>> I also pursued a solution using the -sxpose- command from the SSC:
>>
>>
>> clear
>> input str50 varlabels category diff_category category3
>> "Some label"        0.79221      0.67874     0.70605
>> "Another label"     0.95949      0.75774     0.031833
>> "Additional label"  0.65574      0.74313     0.27692
>> "Fourth label"      0.035712     0.39223     0.046171
>> "Fifth label"       0.84913      0.65548     0.097132
>> "Final label"       0.93399      0.17119     0.82346
>> end
>>
>> sxpose, clear force
>>
>> foreach v of varlist _var* {
>>     local name = `v'[1]
>>     lab var `v' "`name'"
>> }
>>
>>
>> This costs me the names of the category variables, which I need to
>> preserve, as well as some accuracy. It yields data that look like
>> this:
>>
>> _var1          _var2           _var3              _var4          _var5          _var6
>> Some label     Another label   Additional label   Fourth label   Fifth label    Final label
>> 0.792209983    0.959490001     0.655740023        0.035712       0.849129975    0.933990002
>> 0.678740025    0.757740021     0.743130028        0.392230004    0.655480027    0.171189994
>> 0.706049979    0.031833001     0.276919991        0.046170998    0.097131997    0.823459983
>>
>>
>> Any suggestions? I'm happy to continue with my Mata/xpose solution if
>> it won't cost me data, but if there's a better solution I'll use it.
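A short sketch on the accuracy point raised above, offered as a likely explanation and not something confirmed in the thread. -input- stores numeric variables as float unless a storage type is given, and a value such as 0.79221 has no exact float representation, so extra digits appear once it is shown or converted to a string with many decimal places. The variable names x_float and x_double are made up for this illustration.

```stata
* Sketch only: compare the same literal stored as float and as double.
clear
set obs 1
generate float  x_float  = 0.79221
generate double x_double = 0.79221

* Show nine decimal places, as in the -sxpose- output quoted above.
format x_float x_double %12.9f
list

* The float() function rounds a value to float precision.
display %12.9f float(0.79221)
display %12.9f 0.79221
```

If this is the cause, reading the data as double in the first place (for example, `input str50 varlabels double category ...`) and avoiding the round trip through strings would sidestep the issue.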
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/