Re: st: Getting rid of line-breaks in Data
Elmar:
I had a similar issue with an unknown character (it wasn't a box; it
was a symbol that looked like an em-dash with a dot over it) that,
similar to your situation, acted like an end-of-line character for some
programs. I ran -filefilter- with several of its EOL patterns
until one of them knocked it out, which solved my issue.
So, you may try all the EOL patterns mentioned in the -filefilter-
help file:
filefilter oldfile.txt newfile.txt , from(\n) to(\t)
If "\n" doesn't work, try substituting "\r", "\M", "\W", or
"\U", or a specific ASCII character code (you might want to try the ASCII
"\254d", see: http://www.theasciicode.com.ar/ascii-table-codes/ascii-codes-254.html
for more).
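A rough sketch of trying the patterns in one pass, rather than retyping the command each time. The file names (oldfile.txt, try1.txt, and so on) are placeholders, and the loop is only an illustration of the approach above, not a tested recipe for your data. -hexdump- with the analyze option is included as an optional first look at which line-end characters the file contains.

```stata
* Optional: summarize the file's contents, including line-end characters.
hexdump oldfile.txt, analyze

* Try each EOL pattern in turn, writing a separate output file per pattern
* (try1.txt ... try5.txt are hypothetical names) so the results can be compared.
local i = 0
foreach pat in \n \r \M \W \U {
    local ++i
    filefilter oldfile.txt try`i'.txt, from(`pat') to(\t) replace
    display "pattern `pat': " r(occurrences) " occurrence(s) replaced, output in try`i'.txt"
}
```

A pattern that reports zero occurrences can be ruled out; for the others, open the matching output file and check whether the unwanted breaks are gone.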
Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
Office: +979.845.6754
Fax: +979.845.0249
On Jun 18, 2009, at 7:32 PM, Matt Spittal wrote:
Dear Elmar,
Carriage returns can be very difficult to deal with. I don't have any
clear answers, except to say that I have found a good text editor to be
invaluable for cleaning a file. For instance, with my text editor
(TextWrangler) I can change between UNIX, Windows and Mac carriage
returns and I can use grep functions to find and replace symbols like
the carriage return. If you can export your data from Access as a text
file (csv) and then clean it within a text editor, then this might be a
good solution.
I am not sure what computer system or text editor you are using at
present, but some very good advice on text editors is given here:
http://fmwww.bc.edu/repec/bocode/t/textEditors.html
Good luck,
-- Matt
[email protected]
On 18/6/09 5:28 PM, "Elmar Saathoff" <[email protected]> wrote:
Dear list members,
I am frequently using data that were imported from PDAs via MsAccess.
In some cases these data contain some little squares that do not seem
to do much harm in Stata, but that other applications interpret as
linebreaks/carriage returns/paragraph marks, which is quite a hassle.
It seems that these things are inadvertently entered into the PDAs by
the people collecting the data. Unfortunately I cannot show them in
this email, because my email client also interprets them as carriage
returns.
Anyway, I have been trying to identify and get rid of these things by
programming (using "subinstr", "egen...split" etc.), but unfortunately,
whatever I do, Stata also interprets them as carriage returns, both in
do files and in the command window, even if I change the delimiter to
";" via the delimit command.
Any advice would be greatly appreciated.
Thanks in advance, Elmar
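One possible way around the problem of typing or pasting the character itself is to refer to it by its character code with char(), so the do-file never contains the literal line break. This is a minimal sketch that assumes the offending characters are line feed (code 10) or carriage return (code 13) and that they sit in a string variable; the variable name notes is hypothetical. If the character turns out to have a different code, substitute that number.

```stata
* Assumes a string variable called notes (a hypothetical name).
* char(10) is a line feed, char(13) is a carriage return.
foreach code in 10 13 {
    replace notes = subinstr(notes, char(`code'), " ", .)
}

* Count observations that still contain either character.
count if strpos(notes, char(10)) | strpos(notes, char(13))
```

If the count is zero but the little squares remain, the character is something other than these two codes.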
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/