Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Breaking one string variable into several new variables
From
Eric Booth <[email protected]>
To
[email protected]
Subject
Re: st: Breaking one string variable into several new variables
Date
Wed, 24 Feb 2010 17:02:24 -0600
It's easier to fix your problem by importing the data correctly; it appears that Stata doesn't understand your data structure.
Your data are in a .txt file, but how are they delimited? (From your example, it looks like tabs.)
What command did you use to import them? You may want to try opening the file in a spreadsheet program and saving it as a
tab-delimited or comma-delimited file so that you know how to specify your import command properly (e.g., insheet, infile, etc.).
You could also try converting the file to .dta or another file type using Stat Transfer. The point is that whatever import command you
used did not tell Stata the correct delimiter, so it placed every field of each observation into a single variable (v1).
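To show what a correct import looks like, here is a minimal sketch. It assumes the file really is tab-delimited with a header row, as your example suggests. It writes a small stand-in file using two rows copied from your example, then reads that file back with -insheet-. The file name example.txt is hypothetical. With your real file, you would only need the -insheet- line, pointed at your own path.

******
* Write a small tab-delimited stand-in file (example.txt is a hypothetical name)
clear
tempname fh
file open `fh' using "example.txt", write text replace
file write `fh' "industry1" _tab "industry1_def" _tab "industry2" _tab "industry2_def" _tab "year" _tab "value" _n
file write `fh' "1" _tab "oilseed farming" _tab "100" _tab "cotton farming" _tab "2000" _tab ".1" _n
file write `fh' "2" _tab "logging" _tab "200" _tab "iron ore mining" _tab "2000" _tab ".2" _n
file close `fh'
* tab: fields are separated by tabs; names: the first row holds variable names
insheet using "example.txt", tab names clear
list
*****

If the file turns out to be comma-delimited instead, the -comma- option takes the place of -tab-.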
Your data structure looks consistent, so I expect one of the import commands will work for you. If not,
try the -split- command rather than the -substr- function. For example, with the first observation:
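******
* Enter the first observation as a single string variable
clear
input str90 var1
"oilseed farming 100 cotton farming 2000 .1"
end
* Split on spaces (the default): one new variable per word, var11-var17
split var1
list
*****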
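Splitting on spaces breaks multi-word names such as "oilseed farming" into separate variables. If the fields inside v1 are separated by tabs, splitting on the tab instead keeps each industry name together. The sketch below assumes a tab delimiter, which is a guess from your example, and assumes the character "|" never appears in the data. The record is built by hand to stand in for your v1. Replacing the tabs with "|" first means the parse() option only has to handle an ordinary visible character.

******
* Hypothetical record: one observation with tab-separated fields in v1
clear
set obs 1
generate str90 v1 = "1" + char(9) + "oilseed farming" + char(9) + "100" + char(9) + "cotton farming" + char(9) + "2000" + char(9) + ".1"
* Replace every tab, char(9), with "|" (assumed never to occur in the data)
replace v1 = subinstr(v1, char(9), "|", .)
* Split on "|" into part1-part6; multi-word industry names stay intact
split v1, parse("|") generate(part)
* Convert the numeric fields from strings to numbers
destring part1 part3 part5 part6, replace
list
*****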
~ Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
Office: +979.845.6754
On Feb 24, 2010, at 4:22 PM, Anna Rakhman wrote:
> Dear Statalist,
>
> I have the following issue I was hoping you could help with. I've imported
> data from a .txt file and no matter how I import it, I always end up with
> one variable while I really need 6 different variables.
>
> This is what my file looks like now (these are the first 4 observations of
> variable v1, the only variable in the dataset):
>
> industry1 industry1_def industry2 industry2_def year value
> 1 oilseed farming 100 cotton farming 2000 .1
> 2 logging 200 iron ore mining 2000 .2
> 3 blah and blah and blah 300 yata, yata 2000 .3
>
> This is a made-up example, but as you can see, the problem is that each
> column should be a separate variable.
>
> I've tried using gen split1=(v1,1), gen split2=(v1,-1) and gen
> split3=(v1,-2) to get industry1, value, and year as separate variables, but
> I'm not sure how to get industry2 as a separate variable because it is not a
> fixed number of words from either end of the string.
>
> Any suggestions?
>
> Thanks!
> Anna
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/