Re: st: Re: String variables over 244 in a dataset with two delimiters
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Re: String variables over 244 in a dataset with two delimiters
Date: Thu, 22 Sep 2011 19:53:05 +0100
There is a bug in the Mata function _nth() in the code quoted below. The line
if (sep == "")
should be
if (sep == " ")
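
A minimal sketch of why the comparison has to be against a space, assuming tokens() treats parsing characters the way the quoted code relies on: with a space as the parsing character the separators are discarded, so the nth token is the nth field, whereas any other parsing character comes back as a token of its own and has to be counted. The example strings are made up for illustration.

```stata
mata:
// Space as the parsing character: the blanks are not returned as tokens,
// so a three-word string should yield three tokens
tokens("a b c", " ")

// Any other parsing character, here ";", is returned as a token itself,
// so the fields and the separators alternate in the result
tokens("a;b;c", ";")
end
```

With the original test against an empty string, a space delimiter would fall into the branch that counts separator tokens, and no such tokens would ever be found.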
On Thu, Sep 22, 2011 at 2:35 PM, Nick Cox <[email protected]> wrote:
> What's implicit, I hope, is my guess that the best strategy for
> Adam's specific problem is to separate out the long variable, which
> can then be parsed on semi-colons and merged back in somehow.
>
> I am not keen on trying to write a program for Adam's mix of tabs
> delimiting variables and semi-colons also being used within the
> longest string.
>
> Here is an nth field program. It selects the nth field from each line
> (record) of a text file and writes it to a new file. Asking for an nth
> field that does not exist, or one that is empty, is not a problem; an
> empty string is returned in each case. I can't guarantee that this copes
> with all problems and would be pleased to hear of cleaner approaches.
>
> *! NJC 1.0.0 22 Sept 2011
> program nthfield
>     version 9
>     syntax anything(name=files) [, N(int 1) DELIMiter(str) ]
>
>     gettoken data files : files
>     gettoken field files : files
>     if "`data'" == "" | "`field'" == "" | "`files'" != "" {
>         di as err "syntax is: " ///
>             as txt "nthfield {it:datafile fieldfile}"
>         exit 198
>     }
>
>     confirm file "`data'"
>     confirm new file "`field'"
>
>     if "`delimiter'" == "" local sep = char(9)
>     else local sep "`delimiter'"
>
>     tempname in out
>     file open `in' using "`data'", r
>     file open `out' using "`field'", w
>     file read `in' line
>
>     while r(eof) == 0 {
>         mata : _nth("line", `n', "`sep'")
>         file write `out' `"`line'"' _n
>         file read `in' line
>     }
>     file close `out'
> end
>
> version 9
> mata :
>
> void _nth(string scalar macname, scalar n, string scalar sep) {
>     string rowvector fields
>     string scalar nth
>     scalar nf, nsep, j
>
>     fields = tokens(st_local(macname), sep)
>     nf = cols(fields)
>     nth = ""
>
>     if (sep == "") {
>         if (n <= nf) nth = fields[n]
>     }
>     else {
>         j = nsep = 0
>         while (nsep < (n - 1) & j < nf) {
>             if (fields[++j] == sep) nsep++
>         }
>         if (j < nf) {
>             if (fields[j + 1] != sep) nth = fields[j + 1]
>         }
>     }
>
>     st_local(macname, nth)
> }
>
> end
>
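
For orientation, calls to the quoted program might look like the sketch below, going only by its syntax line. The file names are hypothetical, and each output file must not already exist because the program runs `confirm new file` on it.

```stata
* Default delimiter (tab): copy the 3rd field of each line
* of mydata.txt (hypothetical file) into a new file
nthfield mydata.txt field3.txt, n(3)

* Semicolon-delimited input: copy the 2nd field of each line
nthfield mydata.txt field2.txt, n(2) delimiter(";")
```

The resulting one-field file could then be read in separately and merged back with the rest of the data, which is the strategy suggested at the top of the quoted message.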