Re: st: reading in long string variables (yet again)
From: Steve Nakoneshny <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: reading in long string variables (yet again)
Date: Fri, 20 Apr 2012 10:08:59 -0600
Eric,
The initial datafile is tab-delimited and contains a mixture of categorical and non-categorical numeric variables, along with a healthy number of string variables (without double quotes). I have already written a fair bit of code to manipulate these data into a workable dataset. This one string variable only became problematic when we realized that, for some observations, its length exceeded Stata's 244-character limit for string variables.
Thanks to your suggestion of -intext-, I think I've found a solution that will work for me. I can probably operationalize my workflow better (not to mention the code I've written), but that's a separate concern. Here's my solution:
Starting with my exported tab-delimited text file, I used StatTransfer to create two new text files: one containing only my unique id variable, the other containing only this sticky string variable. I assumed that doing it this way would preserve the sort order of the source file (an assumption I rely on later, when I merge the two files by row number). Although I don't include that code here, I then merged the file resulting from the code below back into my source dataset so I could make use of the "chunk" I wanted in the first place.
--- begin code ---
intext using "ds.txt", gen(ds) length(21)
drop in 1 // intext reads the header row (varname) in as the first obs; not wanted
gen n = _n
tempfile ds
save `ds'
insheet using "tumorid.txt", clear
gen n = _n
merge 1:1 n using `ds'
drop n _merge
reshape long ds, i(tumorid)
drop if ds == "" // removes blank records
drop if ds == `";""' // removes incomplete chunk fragments
bys tumorid: gen N = _N
keep if _j == N // keeps only the chunk of interest
isid tumorid
drop _j N
--- end code ---
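The merge on n above is purely positional, so it depends on the sort-order assumption mentioned earlier. As a minimal sketch (not part of the workflow above, and assuming it runs in the same do-file so that the tempfile `ds' still exists), the merge step could be followed by a check that every row number appears in both files. Note that this only confirms the two files have the same number of rows; it cannot prove that row k of one file belongs with row k of the other.
--- begin code ---
* Sketch only: guard the positional merge on row number.
insheet using "tumorid.txt", clear
gen n = _n
merge 1:1 n using `ds'
assert _merge == 3 // stops with an error if either file has unmatched rows
drop n _merge
--- end code ---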
Steve