Re: st: skipping rogue commas when importing csv file using -infile-
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: skipping rogue commas when importing csv file using -infile-
Date: Fri, 26 Oct 2012 18:43:41 +0100
I'd be tempted to read the whole thing in as one string variable and
process it within Stata.
I realise that there are limits on this, in terms of both storage
required and whether the beast will fit into str244. (But Mata may
help on the latter.)
If you can do that, the number of commas in each line is just
gen nocommas = length(strvar) - length(subinstr(strvar, ",", "", .))
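For concreteness, a minimal sketch along those lines (the filename
rawdata.csv, the 244-character width, and the assumption of four fields
per well-formed record are all placeholders, not part of your setup):

* read each raw line into a single string variable, ignoring the commas
infix str strvar 1-244 using rawdata.csv, clear

* count the commas in each line
gen nocommas = length(strvar) - length(subinstr(strvar, ",", "", .))

* a well-formed four-field record has 3 commas; inspect the rogue lines
list strvar if nocommas != 3

* split only the well-formed lines into their four variables
split strvar if nocommas == 3, parse(",") generate(var)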
Nick
On Fri, Oct 26, 2012 at 4:59 PM, Rob Shaw <[email protected]> wrote:
> Hi
>
> I'm importing (part of) a large text file into Stata using --infile--.
> The file is a csv.
>
> However, it seems that a small number of lines have a rogue extra
> comma in them, which then pushes all the data along by one
> variable. This happens not just for that line but for all subsequent
> lines as well!
>
> I'm not too bothered if I have to later drop or reprocess this
> individual line, but does anyone know if there is a way to stop
> it affecting all the lines afterwards as well?
>
> File example (with identical records in this example)
>
> ABC,DEF,GH,IJK
> ABC,DEF,GH,IJK
> ABC,DEF,G,H,IJK
> ABC,DEF,GH,IJK
> ABC,DEF,GH,IJK
>
> What I then get for var1 is
>
> ABC
> ABC
> ABC
> IJK
> IJK
>
> and var2 is
>
> DEF
> DEF
> DEF
> ABC
> ABC
>
> etc
>
> Using -hexdump-, it seems that all the lines finish with \r\n, so if
> there is a way to use this to 'reset' at each line then that would work.