Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: renaming variables based on long labels
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: renaming variables based on long labels
Date: Thu, 21 Jul 2011 10:14:46 -0500
You can use -abbrev(,)- followed by -subinstr()-. I don't think that
is absolutely guaranteed to produce distinct variable names, but you
can try. If you are happy with 32-character names, the combinatorics
would be in your favour, although labels that share long elements such
as "Relationship" work the other way.
Nick
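
[Editor's sketch, not part of the original message.] One way the -abbrev()- then -subinstr()- suggestion might look in practice is given below. It assumes variables named VAR1-VAR40 with non-empty labels, as described in the thread. The "Relationship" to "Reln" substitution is borrowed from the earlier example. The replacement of spaces by underscores, the final -strtoname()- pass and the -confirm new variable- check are assumptions added for illustration. As noted above, distinct names are not guaranteed, so any clash is reported and that variable is left unchanged.

```stata
* Sketch only: assumes VAR1-VAR40 exist and each has a non-empty label.
foreach var of varlist VAR* {
    * fetch the variable label
    local lbl : variable label `var'

    * shorten recurring elements first (illustrative substitution)
    local lbl : subinstr local lbl "Relationship" "Reln", all

    * join words so that abbrev() sees a single long string
    local lbl : subinstr local lbl " " "_", all

    * abbreviate to at most 32 characters; abbrev() marks the cut with ~
    local new = abbrev(`"`lbl'"', 32)

    * ~ is not allowed within variable names
    local new : subinstr local new "~" "_", all

    * tidy any remaining characters that are illegal in names
    local new = strtoname(`"`new'"')

    * rename only if the candidate is a legal name not already in use
    capture confirm new variable `new'
    if _rc {
        display as error "`var': candidate name `new' not usable, left unchanged"
    }
    else {
        rename `var' `new'
    }
}
```

Variables reported as left unchanged would still need a hand-picked name or a further substitution rule.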
On Thu, Jul 21, 2011 at 9:51 AM, Cohen, Elan <[email protected]> wrote:
> Nick,
>
> My -su- example was simply meant to illustrate that Stata _does_ have machinery for recognizing unique variable names, or at least unique combinations of characters. Here's another example.
>
> local pre somethinglong
> g `pre'abcd = 0
> g `pre'abdd = 0
> su `pre'*
> * The variables display as:
> something~cd
> something~dd
>
> g `pre'aacd = 0
> su `pre'*
> * The variables display as:
> somethin~bcd
> something~dd
> somethin~acd
>
> Somehow Stata knows to abbreviate the variables differently based on the other relevant variables. I'm imagining I could apply this same machinery to -rename- (obviously replacing '~' with say '_'). I can't view the source code for -summarize- so I'm not sure how Stata does this.
>
> Does anyone know of another command that does something similar with source code available?
>
> Thanks,
>
> - Elan
>
>
>> -----Original Message-----
>> From: [email protected] [mailto:owner-
>> [email protected]] On Behalf Of Nick Cox
>> Sent: Thursday, July 21, 2011 10:37
>> To: [email protected]
>> Subject: Re: st: renaming variables based on long labels
>>
>> You are confusing quite different facts.
>>
>> In your code, -renvars- (SJ) is adding the prefix "longprefix" to the
>> variable names in the -bg2- dataset. But it so happens that all the
>> variable names so created remain legal. The longest is just 18
>> characters long so the upper limit of 32 is not biting at all.
>>
>> However, the names are long enough for Stata's abbreviation machinery
>> to be used when you call -summarize-, but this is not a matter of
>> changing the variable names, and in any case the character ~ is not
>> allowed within variable names.
>>
>> I imagine your problem is perfectly soluble, but will require more ad
>> hoc code from you. Example:
>>
>> foreach var of varlist VAR* {
>> local label `=strtoname("`:var lab `var''")'
>> local label : subinstr local label "Relationship" "Reln", all
>> local label : subinstr local label "Democrat" "D", all
>> ....
>> }
>>
>> On Thu, Jul 21, 2011 at 9:00 AM, Cohen, Elan <[email protected]> wrote:
>> >
>> > I have a dataset with variable names VAR1-VAR40. All variables have
>> (pretty long) labels. I'm using
>> >
>> > foreach var of varlist VAR* {
>> > rename `var' `=strtoname("`:var lab `var''")'
>> > }
>> >
>> > to rename each variable according to its label. I'm running into a
>> problem because the variable labels are not unique within the first 32
>> characters and so I'm getting "Relationship_of_interviewee_to_s already
>> defined".
>> >
>> > Can Stata automatically generate a unique name based on existing variables?
>> This seems like it's possible based on the following output:
>> >
>> > webuse bg2
>> > renvars, prefix(longprefix)
>> > su
>> >
>> > where Stata truncates variable names and automatically finds a unique
>> string to display for each variable (-findit renvars-).
>> >