Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Re: How do I split my string variable by capital letters?
From: "Joseph Coveney" <[email protected]>
To: <[email protected]>
Subject: st: Re: How do I split my string variable by capital letters?
Date: Thu, 24 Nov 2011 12:58:48 +0900
Elena Vidal wrote:
I'm having a bit of a problem splitting a string variable.
An example of the variable reads:
var12
"Startup/Seed
Early Stage
Expansion
Expansion
Expansion"
This corresponds to a single cell. That is, all five lines appear in one cell in
my data, and I want to split it so that each line becomes a separate variable.
I've tried splitting it with this code:
split var12, gen(var12b)
but that didn't work.
I'd appreciate any help sorting this out!
--------------------------------------------------------------------------------
Are you saying that there are carriage-return/line-feed characters in the cell?
If so, then you can still use -split-. You just need to specify the -parse()-
option. See the illustration below. I've seen this before when retrieving data
from prettified Excel workbooks.
Before using -split-, you should double-check which character(s) actually
delimit the lines. Sometimes it's only a line-feed character, char(10), or only
a carriage-return character, char(13), rather than both. An earlier thread this
week (about the perennial problem of character 160, the non-breaking space)
contains recommendations that also apply to identifying nonprinting characters
in your string variables. Take a look at that thread for how to determine what
the line delimiter is in your dataset.
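As a minimal sketch of that check (my own illustration, not taken from the earlier thread), you can flag whether each cell contains a carriage return, char(13), or a line feed, char(10), and count the line feeds. The helper variable names has_cr, has_lf, and n_lf are hypothetical. The demo data assume CR+LF breaks like those in the question.

```stata
* Sketch: check which line delimiters a string variable contains.
* Demo data assume CR+LF breaks; has_cr, has_lf, n_lf are hypothetical names.
clear
set obs 1
generate str60 var12 = "Startup/Seed" + char(13) + char(10) + "Early Stage" + char(13) + char(10) + "Expansion"

* 1 if the cell contains a carriage return (CR, char(13)), else 0
generate byte has_cr = strpos(var12, char(13)) > 0
* 1 if the cell contains a line feed (LF, char(10)), else 0
generate byte has_lf = strpos(var12, char(10)) > 0
* Number of LFs = length of the string minus its length with all LFs removed
generate byte n_lf = length(var12) - length(subinstr(var12, char(10), "", .))

tabulate has_cr has_lf
list has_cr has_lf n_lf
```

If only has_cr is 1, the cells use bare carriage returns. If only has_lf is 1, they use bare line feeds. If both are 1, the breaks are most likely CR+LF pairs, and n_lf + 1 is then the number of lines in the cell.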
Joseph Coveney
. * Set up
. set obs 1
obs was 0, now 1
. input str54 var12
var12
1. "Startup/Seed?Early Stage?Expansion?Expansion?Expansion"
. replace var12 = subinstr(var12, "?", "`=char(13)'`=char(10)'", .)
var12 was str54 now str58
(1 real change made)
. tempfile tmpfil0
. outsheet using "`tmpfil0'", names quote
.
. * Looks like this
. type "`tmpfil0'"
var12
"Startup/Seed
Early Stage
Expansion
Expansion
Expansion"
.
. * Solution
. split var12, generate(var12b) parse("`=char(13)'`=char(10)'")
variables created as string:
var12b1 var12b2 var12b3 var12b4 var12b5
. list var12b?
+----------------------------------------------------------------+
| var12b1 var12b2 var12b3 var12b4 var12b5 |
|----------------------------------------------------------------|
1. | Startup/Seed Early Stage Expansion Expansion Expansion |
+----------------------------------------------------------------+
.
.
. exit
end of do-file
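If you are not sure which delimiter you have, or different cells use different ones, one option is to normalize every break to a single line feed with -subinstr()- before calling -split-. This is a sketch under that assumption, not the only approach. CR+LF pairs are collapsed first, so each pair becomes one line feed rather than two. The mixed-delimiter demo cell is hypothetical and deliberately contrived.

```stata
* Sketch: normalize CR+LF, bare CR, and bare LF to LF, then split on LF.
* Demo data deliberately mix delimiters (hypothetical; most files use one).
clear
set obs 1
generate str60 var12 = "Startup/Seed" + char(13) + char(10) + "Early Stage" + char(10) + "Expansion" + char(13) + "Expansion"

* Step 1: collapse each CR+LF pair to a single LF
replace var12 = subinstr(var12, char(13) + char(10), char(10), .)
* Step 2: convert any remaining bare CR to LF
replace var12 = subinstr(var12, char(13), char(10), .)
* Step 3: split on LF alone, as in the solution above but with one delimiter
split var12, generate(var12b) parse("`=char(10)'")
list var12b*
```

Normalizing first keeps the -split- call simple: it needs only one parse string, no matter which convention produced the data.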