st: -use- from a compressed file
From: Daniel Feenberg <[email protected]>
To: [email protected]
Subject: st: -use- from a compressed file
Date: Thu, 18 Aug 2011 16:06:28 -0400 (EDT)
The Stata knowledge base includes a note on reading ASCII data from a
pipe, which would allow one to read a compressed file without storing the
decompressed version on disk. We have never had success with the method
shown there: it always fails with the error message "mypipe.pip: not
found". We have terabytes of data that compress very well, so this has
always been a disappointment. We'd be interested in hearing whether it
works for anyone else.
While investigating this we found a workaround that seems much better.
Unlike the knowledge base suggestion, it works with .dta files as well as
ASCII files, which is much more useful to us. It relies on the ability of
the -use- command to read a file from an http URL.
Our first try was to add the file test.cgi to our web server's cgi-bin
directory:
#!/bin/sh
# CGI headers must be followed by a blank line before the body
echo "Content-type: application/x-stata"
echo ""
/usr/bin/zcat /data/sample.dta.gz
With this in place,
. use http://www.nber.org/test
works from Stata. However, it involves a lot of overhead, since the file
crosses the LAN several times, so we haven't pursued taking the file name
from the URL or otherwise making this practical.
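
For anyone who does want to take the file name from the URL, the following
is a minimal sketch of how the CGI script might be extended, not something
we have deployed. The data directory, the .dta.gz naming rule, and the
function names valid_name and serve are all assumptions made for
illustration, and whether -use- passes a query string through unchanged is
untested here.

```shell
#!/bin/sh
# Hypothetical variant of test.cgi: serve $DATA_DIR/<name>.dta.gz, where
# <name> is taken from the query string (e.g. .../test.cgi?sample).
DATA_DIR=/data   # assumption: every servable file lives in this directory

# Accept only plain names (letters, digits, _ and -) so that a request
# cannot escape DATA_DIR with "/" or "..".
valid_name() {
    case "$1" in
        ''|*[!A-Za-z0-9_-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Write a complete CGI response for the requested name to stdout.
serve() {
    name=$1
    if valid_name "$name" && [ -r "$DATA_DIR/$name.dta.gz" ]; then
        printf 'Content-type: application/x-stata\r\n\r\n'
        gzip -dc "$DATA_DIR/$name.dta.gz"
    else
        printf 'Status: 404 Not Found\r\n'
        printf 'Content-type: text/plain\r\n\r\nnot found\n'
    fi
}

serve "${QUERY_STRING:-}"
```

The name check matters because the web server runs the script with its own
permissions; without it, a request could name any file the server can read.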
We are developing an alternative that doesn't require an actual web
server, or even root permissions. This is done with the nc command, which
ships with most Linux distributions and is also available for Windows. At
the Stata prompt, run the compound command:
.! (echo -ne "HTTP/1.0 200 OK\r\n\r\n"; zcat /data/sample.dta.gz;) | nc -l 8080 &
This sets up the computer to send a minimal HTTP header followed by the
decompressed file to the first process that connects to port 8080. Since
8080 is an unprivileged port (above 1023), no special permission is
required to listen on it. nc does not exit until the file has been read
from that port, at which point it prints the exact request Stata sent.
Because of the trailing &, Stata continues while nc waits. Then run
. use http://127.0.0.1:8080
Note that you can't use "localhost" instead of 127.0.0.1 because the -use-
command won't accept one-part host names.
If there is no nc on your machine, look for ncat, netcat or socat. Some
versions will require a '-p' before the port number. You can install nc on
a Windows machine and should be able to do the same thing, but we haven't
tried it.
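
The one-liner above can be wrapped in a small script so that the header
and the decompression are easier to test separately from nc. This is only
a sketch: the function names http_payload and serve_once are made up for
illustration, printf replaces echo -ne because not every /bin/sh honors
echo's -e flag, and gzip -dc is used as an equivalent of zcat.

```shell
#!/bin/sh
# Emit a minimal HTTP/1.0 response: status line, blank line, then the
# decompressed contents of the gzip file named in $1.
http_payload() {
    printf 'HTTP/1.0 200 OK\r\n\r\n'
    gzip -dc "$1"
}

# Send that response to the first client that connects to port $2.
# Assumption: this nc accepts "nc -l PORT"; some versions need "-l -p PORT".
serve_once() {
    http_payload "$1" | nc -l "$2"
}

# Example (hypothetical path and port), after sourcing this file:
#   serve_once /data/sample.dta.gz 8080 &
```

Running http_payload on its own and piping the result through od -c is a
quick way to confirm that the header ends with the blank line that
separates it from the data.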
This could also be used for ASCII files, encrypted files, split files, and
perhaps other types. If only Stat/Transfer would write to the standard
output!
There is a security issue: you give up the read restrictions in the Unix
permission bits, because any process that can connect to the port before
Stata does will receive the file. It is also slower than reading the
uncompressed file from disk, but still fast enough for us.
We have been trying to package this into an ado file, but without much
success. A user-friendly ado program would need to find an available port
by itself, which we haven't seen a good way to do yet, and to communicate
that port back to the -use- command, for which we are also at a loss. I
was hoping someone on the list might be inspired to suggest a method, or
that StataCorp might just incorporate decompression into the -use-
command.
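
One possible direction for the port problem, offered as an untested sketch
rather than a solution: probe a range of ports with nc and pick the first
one with no listener. It assumes an nc that supports -z (connect without
sending data) and -w (timeout in seconds), which not every version does.
The function names and the /tmp/stata_port.txt hand-off file are
hypothetical.

```shell
#!/bin/sh
# Succeed if something is already listening on local port $1.
# Assumption: this nc supports -z and -w. Any probe failure, including a
# missing nc, is treated as "nothing is listening".
port_in_use() {
    nc -z -w 1 127.0.0.1 "$1" >/dev/null 2>&1
}

# Print the first port in the range $1..$2 that has no listener.
# Returns non-zero if every port in the range is taken.
find_free_port() {
    port=$1
    while [ "$port" -le "$2" ]; do
        if ! port_in_use "$port"; then
            echo "$port"
            return 0
        fi
        port=$((port + 1))
    done
    return 1
}

# Hypothetical hand-off: write the chosen port where Stata can read it.
#   find_free_port 8080 8099 > /tmp/stata_port.txt
```

An ado program could then read that file back with -file read- and build
the URL from it. Note the race: another process may take the port between
the probe and the moment nc starts listening, so a real implementation
would need to retry on failure.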
Daniel Feenberg
[email protected]