Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Another question regarding string variables
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Another question regarding string variables
Date: Wed, 27 Feb 2013 10:30:10 +0000
Often overlooked in this territory are -egen- functions for strings.
The tale starts with -head()- and -tail()- from 1999:
    STB-50 dm70  Extensions to generate, extended  (N. J. Cox)
        7/99, pp. 9-17; STB Reprints Vol. 9, pp. 34-45
        (help egenodd if installed)
        24 additional egen functions presented; includes various string,
        data management, and statistical functions; many of the egen
        functions were added to Stata 7.
These were implemented in official Stata version 7 as different
_options_ to -egen-'s -ends()- function, but I still prefer my
original syntax.
egen first = ends(name), head
egen last = ends(name), tail
should do what Michael wants. Repeated calls to -word()- and regular
expressions are the better tricks in general, but these functions already
exist, tailor-made for this job.
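A self-contained version of the -ends()- approach, run on Michael's two example names, might look like the sketch below. The variable names rest and lastword are illustrative choices, not from the post; the -last- option, which returns only the final word rather than the remainder, is included for contrast with -tail-.

```stata
clear
input str20 name
"John Howard R"
"Howard James R"
end

* head: text before the first space; tail: everything after that space
egen first = ends(name), head
egen rest = ends(name), tail
* for contrast: last returns only the final word
egen lastword = ends(name), last
list
```

For "John Howard R" this should give first = "John", rest = "Howard R", and lastword = "R".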
Nick
[previous posts combined and edited]
Michael Stewart
> Thank you very much, Kieran and Steve, for the timely help.
> All the functions are working.
Kieran McCaul
You can do this with regular expressions:
clear *
input str20 name
"John Howard R"
end
gen first = regexs(1) if regexm(name, "([a-zA-Z]+)[ ]*([a-zA-Z]+)[ ]*[a-zA-Z]")
gen last = regexs(2) + " " + regexs(3) if regexm(name, "([a-zA-Z]+)[ ]*([a-zA-Z]+)[ ]*([a-zA-Z])")
list
Or you could do it with the -word()- function:
clear *
input str20 name
"John Howard R"
end
gen first = word(name,1)
gen last = word(name,2)+ " " +word(name,3)
list
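Both versions above assume exactly three words. A sketch that splits at the first space whatever the number of words follows, using an anchored regular expression and, as an alternative, -strpos()- with -substr()-. The four-word name is made up for illustration; the code assumes no leading spaces, and the -strpos()- version also assumes a single space after the first word.

```stata
clear
input str30 name
"John Howard R"
"Howard James R"
"Mary Ann Lee Smith"
end

* regex: group 1 = run of non-space characters at the start,
* group 2 = everything after the first run of spaces
gen first = regexs(1) if regexm(name, "^([^ ]+) +(.*)$")
gen rest = regexs(2) if regexm(name, "^([^ ]+) +(.*)$")

* same split via the position of the first space
gen first2 = substr(name, 1, strpos(name, " ") - 1)
gen rest2 = substr(name, strpos(name, " ") + 1, .)
list
```

Under the regex version, a name with no space at all fails -regexm()- and leaves first and rest empty.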
Steve Nakoneshny
>> I don't have access to the help file from my phone, but I'm fairly certain you should be able to extract *any* word from a string var using the -word- function.
>>
>> Completely untested off the top of my head (with no recollection of the appropriate syntax):
>>
>> g lname = word(yourvar,1)
>> g fname = word(yourvar,2) + " " + word(yourvar,3)
>>
>> The above is an inelegant means of approximating your needs. Adjusting for valid syntax would be a good start. I have no doubt that there are other string function solutions that would equally suffice.
>>
>> If you are wedded to using -split-, you may wish to insert a comma between words 1 & 2 of your string via -subinstr- and then proceed with -split yourvar, parse(,)-.
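The -subinstr()- plus -split- route might be sketched as follows. The fourth argument of -subinstr()- limits the replacement to the first space only; the temporary variable temp and the stub part are illustrative names. This assumes the names contain no commas of their own, since any existing comma would produce extra pieces.

```stata
clear
input str20 name
"John Howard R"
"Howard James R"
end

* replace only the first space (4th argument = 1) with a comma
gen temp = subinstr(trim(name), " ", ",", 1)
* split on the comma into part1 (first word) and part2 (the rest)
split temp, parse(",") generate(part)
drop temp
list
```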
Michael Stewart
-word()- will give me the second word,
but what I am trying to get is the first word as one variable and the rest
of the string as a second variable.
For example: John Howard R --> "John" & "Howard R" as two strings, NOT
"John" & "Howard" & "R" as three separate strings.
Steve Nakoneshny
There is a string function called -word()- that will serve your
purpose. See -h word()- for more details.
Michael Stewart
>>>>> I am trying to find out whether there is a function to split a string
>>>>> "Howard James R" --> "Howard" & "James R".
>>>>> If I use -split-, I get Howard, James, and R, which is not what I want.
>>>>> I want to split the string after the first word into two string
>>>>> variables, the first containing the first word and the second
>>>>> containing the rest of the string.