Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: importing quirky csv
From: James Sams <[email protected]>
To: [email protected]
Subject: Re: st: importing quirky csv
Date: Fri, 25 Nov 2011 11:03:31 -0600
On Thursday 24, November 2011 08:51:16 you wrote:
> I have a large number of large comma-separated text files that I am
> trying to import. "insheet" is not working; it imports the data, but
> many lines are missing. I think the reason is the file contains string
> fields that a) have embedded spaces, and b) are not enclosed in
> quotes.
I've run across this and found that, unless you want to write your own CSV
parser (which is trickier than you might think), you will have to work outside
of Stata. That said, it is easy to automate from a do-file. I've found Python's
csv module to be quite robust and able to write out the CSV files in such a way
that Stata will happily read them. The approach I took was to just parse the
entire directory of CSV files and then import those into Stata. However, if
you want a script that you call for each file from within Stata, the Python
code should look like this (assuming Python 2.7 and actual commas as the
separators; note that whitespace is significant in Python):
#!/usr/bin/env python
# reprocess_csv.py
# make files readable for stata
import sys
import csv

DELIMITER = ","

def reprocess(in_fn, out_fn):
    with open(in_fn, 'rb') as in_fd:
        with open(out_fn, 'wb') as out_fd:
            reader = csv.reader(in_fd, delimiter=DELIMITER)
            writer = csv.writer(out_fd, delimiter=DELIMITER)
            writer.writerows(reader)

if __name__ == "__main__":
    reprocess(sys.argv[1], sys.argv[2])
and then in Stata (the file names are quoted so that paths containing spaces
survive the shell call):
local my_original_file "bad.csv"
tempfile good
! python reprocess_csv.py "`my_original_file'" "`good'"
insheet using "`good'", comma
I did write this on the fly, so there may be typos that I didn't catch, but it
is based on code I've used previously that works reliably.
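
For the whole-directory approach mentioned above, a minimal sketch might look like the following. It is not the code referred to in this message: it assumes Python 3 rather than 2.7 (so the files are opened in text mode with newline="" instead of 'rb'/'wb'), and the names reprocess_dir.py and reprocess_dir, as well as the choice to keep the original file names in a separate output directory, are hypothetical.

```python
#!/usr/bin/env python3
# reprocess_dir.py (hypothetical name): rewrite every CSV in a directory
import csv
import sys
from pathlib import Path

DELIMITER = ","


def reprocess(in_fn, out_fn):
    # Python 3's csv module expects text-mode files opened with newline=""
    with open(in_fn, newline="") as in_fd, \
            open(out_fn, "w", newline="") as out_fd:
        reader = csv.reader(in_fd, delimiter=DELIMITER)
        writer = csv.writer(out_fd, delimiter=DELIMITER)
        writer.writerows(reader)


def reprocess_dir(in_dir, out_dir):
    # Assumption: cleaned copies keep their file names and go to out_dir
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(in_dir).glob("*.csv")):
        dst = out_path / src.name
        reprocess(src, dst)
        written.append(dst)
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python3 reprocess_dir.py <input_dir> <output_dir>
    for path in reprocess_dir(sys.argv[1], sys.argv[2]):
        print(path)
```

The cleaned files in the output directory can then be read one at a time with insheet from a loop in a do-file.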
--
James Sams
[email protected]