
Re: st: using data off the web

From	Alan Riley <[email protected]>
To	[email protected]
Subject	Re: st: using data off the web
Date	Mon, 14 Apr 2008 14:28:25 -0500

David Airey wished that -insheet- could
read from files over the web, just like -use-:
> I know that "use" allows using Stata .dta files from the web, but if a
> site has "insheet"-like data (and nothing else), it would be nice if I
> could insheet from the http site, like you can "use" from an http site.

Some discussion ensued with various workarounds such as first -copy-ing
the file from the web to the local disk.

However, none of the workarounds are necessary. -insheet- already can
read files directly from the web:

. insheet using "http://www.genenetwork.org/cgi-bin/WebQTL.py?cmd=cor&probeset=rs13481111&db=BXDGeno&searchdb=bra12-03MAS5&return=500"
(4 vars, 500 obs)

. describe

Contains data
  obs:           500
 vars:             4
 size:        17,000 (99.8% of memory free)
----------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
----------------------------------------------------------------------
preprobesetid   str11  %11s                   <pre>ProbesetID
correlation     float  %9.0g                  Correlation
strains         byte   %8.0g                  #Strains
pvalue          str14  %14s                   p-value
----------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

I suspect that David tried a command similar to the one above without
quotes around the filename, which may have resulted in an error message.
With a simple filename, quotes would not be required, but the filename
above contains several characters (such as '=') that could trip up
Stata's parser.
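
For use in a do-file, here is a minimal sketch of one way to keep the
quoting manageable: store the URL in a local macro and quote the macro
reference. The macro name `url' is only an illustration, and -clear-
assumes you are willing to drop whatever data are in memory.

```stata
* Store the long URL in a local macro (the name "url" is arbitrary)
local url "http://www.genenetwork.org/cgi-bin/WebQTL.py?cmd=cor&probeset=rs13481111&db=BXDGeno&searchdb=bra12-03MAS5&return=500"

* Quote the macro reference so characters such as '=' and '&'
* are treated as part of the filename
insheet using "`url'", clear
```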

By the way, the URL above does NOT return a plain text file. In the
output of -describe- above, you will notice the HTML tag "<pre>" at
the start of the first variable label. And, if you -list- the data,
you will see that the last value of the variable 'pvalue' has a
closing HTML tag "</pre>" appended to it.

It is important when reading data directly from the web to remember
that Stata will see exactly what is sent to it by the remote
webserver. This may not be the same as what your eye sees in your
browser. It is a good idea to use a browser to do a "view source" on
the page of interest to make sure it contains no unwanted HTML tags.
If it does, one possibility would be to use Stata's -filefilter-
command to strip out such tags.
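
As a rough sketch of that -filefilter- approach for the URL above: it
assumes the only unwanted tags are "<pre>" and "</pre>", that
-filefilter- accepts an empty to() to delete a pattern, and that you
are working in a do-file. The macro and tempfile names (url, raw,
step1, clean) are illustrative, and this version first -copy-s the page
to a temporary file so that -filefilter- works on a local copy.

```stata
* Illustrative names; adjust the URL and tags to your own page
local url "http://www.genenetwork.org/cgi-bin/WebQTL.py?cmd=cor&probeset=rs13481111&db=BXDGeno&searchdb=bra12-03MAS5&return=500"
tempfile raw step1 clean

* Save exactly what the webserver sends to a temporary local file
copy "`url'" "`raw'"

* Strip the opening and closing tags, one pattern per pass
filefilter "`raw'" "`step1'", from("<pre>") to("")
filefilter "`step1'" "`clean'", from("</pre>") to("")

* Read the cleaned file
insheet using "`clean'", clear
describe
```

If the tags are removed as intended, the first variable label should no
longer begin with "<pre>", and 'pvalue' may then be read as numeric
rather than as a string.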

--Alan Riley