Have you considered -tokenize- using "," as the parse character?
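A minimal sketch of what I have in mind, assuming each label contains exactly two commas and that `varname' holds the variable name, as in your code below. With parse(","), -tokenize- returns the commas themselves as separate tokens, so the three pieces should land in `1', `3' and `5'; trim() is there to drop any blanks left around each piece. I have not run this against your labels.

```stata
* Assumes `varname' names a variable whose label has exactly two commas
local varlbl : variable label `varname'

* Split on commas only; the commas come back as tokens `2' and `4'
tokenize `"`varlbl'"', parse(",")

* Keep the text pieces and strip leading/trailing blanks
local varlbl1 = trim(`"`1'"')
local varlbl2 = trim(`"`3'"')
local varlbl3 = trim(`"`5'"')

display `"`varlbl1' | `varlbl2' | `varlbl3'"'
```

Two caveats: -tokenize- overwrites the positional macros `1', `2', ..., so save any program arguments first, and a label with an empty piece (two adjacent commas) would shift the token positions.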
-----Original Message-----
From: [email protected]
[mailto:[email protected]]On Behalf Of Michael S. Hanson
Sent: Wednesday, June 29, 2005 4:30 PM
To: [email protected]
Subject: st: Substring extraction based on punctuation
I have a (large) set of variables with labels of the (general) form:
Some text, some more text, still more text
Also some text, lots and lots more text, text
(etc.)
The commas are the separators of interest to me: I would like to
extract the sub-strings before, between and after the commas (excluding
the commas and trailing spaces themselves) into three local string
variables for further use. The number of words in each part of the
label varies, as does the total number of words; hence the -word # of
`varname'- extended macro does not appear to apply here. The closest I
have come with extended macros is:
local varlbl : variable label `varname'
local varlbl1 : piece 1 20 of "`varlbl'"
local varlbl2 : piece 2 20 of "`varlbl'"
local varlbl3 : piece 3 20 of "`varlbl'"
but this doesn't reliably return the desired substrings, given the
variation in the number and length of words between commas -- 20 here is
simply an approximate value that works for a particular subset of
labels. The -nobreak- option has the same problem. (This code also does
not strip off the commas.)
So instead of extended macros, I've tried using string functions. I
suspect that if I knew and understood regular expression syntax, I
could make use of -regexm- and -regexs- on `varlbl' -- but I don't.
Instead, the following "works":
local varlbl : variable label `varname'
local l = length("`varlbl'")
local c1 = strpos("`varlbl'",",")
local c2 = strpos(reverse("`varlbl'"),",")
local varlbl1 = substr("`varlbl'",1,`c1'-1)
local varlbl2 = substr("`varlbl'",`c1'+2,`l'-`c1'-`c2'-1)
local varlbl3 = substr("`varlbl'",`l'-`c2'+3,`l')
... but I'm really hoping to find some alternative code that is
"cleaner" and more transparent. Any such suggestions are welcome.
Thanks in advance.
-- Mike
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/