st: Re: collecting raw data from the web via browser automation
As a later post indicates, you can use Perl's LWP module for this, or,
as Phil suggests, Python. But when it comes down to it, Michael's
suggestion below (using Stata's own -copy-) is far more useful:
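--cut here--
* Start clean: drop any programs already in memory
capt program drop _all
program goograb, rclass
    syntax , Name(string)
    * Encode spaces in the name as "+" for the query string
    local name : subinstr local name " " "+", all
    * Build the Google Scholar query URL (must stay on a single line)
    local url "http://scholar.google.com/scholar?q=`name'&ie=UTF-8&oe=UTF-8&hl=en&btnG=Search"
    * Save the results page as text, overwriting any earlier copy
    copy "`url'" test.html, text replace
end
--cut here--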
goograb, name(blasnik michael)
returns test.html (hardcoded out of laziness; could use a tempfile
and then use file commands to snarf it and work with the contents).
Give -goograb- any other name and it will look for their stuff in
Google Scholar.
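The tempfile variant mentioned above might look roughly like the
following. This is only a sketch: -goograb2- and the returned r(lines)
are names made up here for illustration, and all it does with the page
is count its lines, to show the -file- read loop. It assumes the URL
still answers automated requests.

```stata
capture program drop goograb2
program goograb2, rclass
    syntax , Name(string)
    * Encode spaces in the name as "+" for the query string
    local name : subinstr local name " " "+", all
    local url "http://scholar.google.com/scholar?q=`name'&ie=UTF-8&oe=UTF-8&hl=en&btnG=Search"

    * Download to a temporary file instead of the hardcoded test.html
    tempfile page
    copy "`url'" "`page'", text replace

    * Read the page line by line; here we only count the lines
    tempname fh
    local n = 0
    file open `fh' using "`page'", read text
    file read `fh' line
    while r(eof) == 0 {
        local ++n
        file read `fh' line
    }
    file close `fh'

    * Hypothetical return value: number of lines read
    return scalar lines = `n'
end
```

After running it, -goograb2, name(blasnik michael)- followed by
-display r(lines)- shows how many lines came back. Real parsing would go
inside the -while- loop, referring to the line as `"`macval(line)'"' so
that quotes and macro characters in the HTML are left alone.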
Kit Baum, Boston College Economics
http://ideas.repec.org/e/pba1.html
On May 23, 2006, at 2:33 AM, Michael wrote:
I'm not sure if any of these tools can actually solve the problem
originally posted.

The example Kit gives shows accessing a static web page -- a page that
already exists "as is" and one you could also simply copy to your local
drive using Stata itself (copy http:/.../...) and then parse it as
needed. It's easy to download that data directly to Stata and I don't
think that is the problem.

I think what the original post asked for (and what I would be interested
in as well) is a way to access web pages that are only created when an
action is taken or selection is made on a different web page, so there
is no specific web address that holds the data you want. I have thought
about trying to use auto-it or another scripting language to launch a
browser, make selections on a web page and then capture the data that's
spawned typically in a new window.

Do any of the tools mentioned by Kit or Phil actually do this?
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/