Hi everyone,
Sorry for bothering you again but I have a tricky problem here (at
least for me).
I'm trying to load a 500 MB text file into Stata (actually, I need to
open 20 files like that). I have already read the manual, the Statalist
archive, the FAQ, etc., and still can't get what I want.
Characteristics of the data:
- It is in tab-separated values (TSV) format.
- I don't want to load the entire variable list (only 4 variables from
a total of 50). Note that those vars are not in consecutive order.
- The first line is used for the headers. The headers CONTAIN spaces
and no quotes (e.g., User Code).
- Instead of dots for missing values, there are spaces.
- When a value is missing in the last variable from a line, the file
just omits the missing value (instead of putting a TAB plus a SPACE).
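To make the layout above concrete, here is a minimal sketch in Python
(not Stata) of one way such a file could be reduced to the wanted
columns before loading. It is only an illustration under assumptions:
the function name select_columns and the sample column names are
hypothetical, and writing "." for a blank field assumes the kept
columns are numeric.

```python
import csv


def select_columns(lines, wanted):
    """Yield rows holding only the wanted columns from tab-separated lines.

    `lines` is any iterable of text lines (for example an open file);
    `wanted` lists header names exactly as they appear in the first line.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    # Headers may contain spaces ("User Code"), so match them exactly.
    positions = [header.index(name) for name in wanted]
    # Emit a header row with spaces replaced by underscores.
    yield [name.replace(" ", "_") for name in wanted]
    for row in reader:
        # A line whose trailing value is missing is short: pad it.
        row = row + [""] * (len(header) - len(row))
        # Blank or whitespace-only fields stand for missing values.
        yield [row[p].strip() or "." for p in positions]


if __name__ == "__main__":
    # Hypothetical sample: second data line has a blank and a dropped last field.
    sample = [
        "User Code\tAge\tCity\tScore\n",
        "1\t30\tLima\t7\n",
        "2\t \tQuito\n",
    ]
    for out_row in select_columns(sample, ["User Code", "Score"]):
        print("\t".join(out_row))
```

The rows are produced one at a time, so the whole file never has to sit
in memory; they could be written back out as tab-separated text with
csv.writer.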
Failed Attempts:
- First I tried to use -insheet-, but the data is too big and won't
fit in memory. I suspect that part of the problem is the use of "float"
even for variables that could be stored as "byte".
- As the data is not in fixed format, I ruled out -infix-.
- When I tried to use the free format -infile- (infile1) Stata got
confused with the use of spaces as missing values. So I used the
-filefilter- command to replace the spaces with dots.
- Afterwards, I discovered that when there was a missing value in the
last variable in observation "N", infile1 would use the first value of
observation "N+1" instead of the missing value! But as the manual
states, that behaviour is correct, since the program allows for
observations to span multiple lines.
- In a last attempt, I tried using infile2 with a dictionary. However,
I couldn't use _skip because it would skip columns, not variables.
Also, even if I only wanted 4 variables, the names of all 50 variables
need to be stated in the dictionary.
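The free-format problem described in the attempts above (a value from
observation "N+1" filling the gap at the end of observation "N") can be
mimicked with a few lines of Python. This is only a toy model of
reading values as one continuous stream, not Stata's actual
implementation; the function name free_format_read is hypothetical.

```python
def free_format_read(lines, n_vars):
    """Group whitespace-separated values into observations of n_vars each.

    Line breaks are ignored on purpose, which is what lets one
    observation span several lines.
    """
    # Flatten every line into a single stream of tokens.
    tokens = [tok for line in lines for tok in line.split()]
    # Cut the stream into chunks of n_vars; a trailing short chunk is
    # returned as-is (what Stata does in that case is not modeled here).
    return [tokens[i:i + n_vars] for i in range(0, len(tokens), n_vars)]


if __name__ == "__main__":
    # The second line has lost its last value.
    for obs in free_format_read(["1 2 3", "4 5", "6 7 8"], 3):
        print(obs)
```

In the sample, the second observation picks up "6" from the third line,
and every later observation is shifted as well.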
My last hope is to copy/paste names for the 50 vars, and to use -in-
to open chunks of the entire file, dropping the unwanted variables
afterwards. However, even if it works, the solution will probably be
very slow, and I feel like there must be a better way.
Any ideas? Or should I go with the "split/append" logic?
Thanks a lot!
Sergio Correia