Re: st: Re: String variables over 244 in a dataset with two delimiters
From: Austin Nichols <[email protected]>
To: [email protected]
Subject: Re: st: Re: String variables over 244 in a dataset with two delimiters
Date: Tue, 20 Sep 2011 10:21:52 -0400
Joseph Coveney <[email protected]>:
Good answer. If some of the semicolon-delimited substrings are longer
than 244 characters and you want to keep all of the information,
you can also use -file- to step through the file one line at a time
and save pieces of the longer strings as separate variables, e.g., in
100-character chunks.
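
A minimal sketch of the -file- approach, not a tested solution for
the original data. It first writes a small sample file so that it can
be run as is; the two-column layout (id, tab, long string), the macro
names, and the sample contents are all assumptions made for
illustration. It uses the macro extended function -piece-, which
prefers to break at spaces, so chunks of text containing spaces may be
shorter than 100 characters.

```stata
* Hypothetical sample file: id <tab> one long semicolon-delimited string
tempfile raw chunks
tempname out fh ph
file open `out' using `"`raw'"', write text replace
file write `out' "1" _tab
forvalues i = 1/40 {
    file write `out' "item`i';"
}
file write `out' _n
file close `out'

* Read the file one line at a time and post each chunk as one observation
local tab = char(9)
postfile `ph' long row int piece str100 chunk using `"`chunks'"'
file open `fh' using `"`raw'"', read text
local row = 0
file read `fh' line
while r(eof) == 0 {
    local ++row
    * Split on tabs; with two nonempty columns, token 3 is the long field
    tokenize `"`line'"', parse("`tab'")
    local txt `"`3'"'
    * Take successive pieces of at most 100 characters until none are left
    local p = 1
    local chunk : piece `p' 100 of `"`txt'"'
    while `"`chunk'"' != "" {
        post `ph' (`row') (`p') (`"`chunk'"')
        local ++p
        local chunk : piece `p' 100 of `"`txt'"'
    }
    file read `fh' line
}
file close `fh'
postclose `ph'

* One row per input line, with variables chunk1, chunk2, ...
use `"`chunks'"', clear
reshape wide chunk, i(row) j(piece)
list
```

Posting one observation per chunk and then using -reshape wide- avoids
having to know in advance how many chunk variables the longest line
needs. Lines with empty fields or embedded double quotes would need
extra handling that this sketch does not attempt.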
On Mon, Sep 19, 2011 at 11:40 PM, Joseph Coveney <[email protected]> wrote:
> Adam Ozimek wrote:
>
> I have a dataset that is tab delimited, and one of the variables is a string
> that can be over 244 characters. If I read this using insheet, or inputst, or I
> think anything else, it truncates this variable. However, there is an aspect of
> the string variable that I hope will let me get around this: it is delimited by
> semicolon. Is there a way to select one of the columns in a tab delimited
> dataset, and read in by parsing it as semi-colon delimited? Is there some
> other way to rescue the long variable without the truncation?
>
> --------------------------------------------------------------------------------
>
> There are a couple of ways to approach this problem, but I think that the most
> direct is to use Stata's -filefilter- command to convert semicolons to
> double-quote + tab + double-quote, and then read the converted file in with
> -insheet-. (To learn more about -filefilter-, see Stata's online help for the
> command or see its entry in the user manual.)
>
> Notes:
>
> 1. This assumes that your string column's contents are surrounded by
> double-quotation marks. If not, then just convert the semicolons to tabs alone.
>
> 2. If your tab-delimited file has a header row (column names), then remember to
> insert a new name for your newly created column. There are a couple of ways to
> do that, too, in Stata, but again -filefilter- might be the most direct.
>
> 3. Don't overwrite your original. (I'm not sure that -filefilter- will even
> allow you to name <newfile> the same as <oldfile>, but if it does, don't do it.)
>
>
> 4. The converted file can be a temporary file by using -tempfile- in conjunction
> with -filefilter-. This makes the project's intermediate-file-cleanup chores
> easier.
>
> Joseph Coveney
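
The -filefilter- approach quoted above might be sketched as follows.
This is an illustration under stated assumptions, not Joseph's own
code: the sample file, its contents, and the macro names are
hypothetical, the long column is assumed to be wrapped in double
quotes, and there is no header row. In -filefilter- patterns, \Q
stands for a double quote and \t for a tab.

```stata
* Hypothetical sample file: id <tab> "semicolon-delimited string"
tempfile raw converted
tempname out
file open `out' using `"`raw'"', write text replace
file write `out' "1" _tab `""alpha;beta;gamma""' _n
file write `out' "2" _tab `""delta;epsilon;zeta""' _n
file close `out'

* Turn each semicolon into quote + tab + quote, writing to a temporary file
* so that the original is left untouched
filefilter `"`raw'"' `"`converted'"', from(";") to("\Q\t\Q")

* Read the converted file as ordinary tab-delimited data
insheet using `"`converted'"', tab nonames clear
list
```

Every semicolon in the file is converted, including any that occur in
other columns, so it is worth checking the result against the original.
If the long column is not quoted, to("\t") alone would be the
corresponding pattern, as in note 1.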