In an earlier reply of mine in this thread I blamed the ...
(dot-dot-dot) special character for breaking my file read code. That
was not the cause.
The actual problem is that the command file read `fh' line chokes on
do-file lines that carry a comment before the end of the line, written
with the double forward slash (//) syntax. I do not know how to get
around that. I tried wrapping my file read/file write routine in this
if-condition:
if !regexm("macval(`line')","[[a-zA-Z0-9][:punct:]]*\/\/"){
    read line in this file
    write line in that file
}
But that had no effect.
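
For reference, here is a minimal sketch of the kind of read/write loop I mean. It is not a confirmed fix for the problem above. The file names in.do and out.do and the handle names src and dst are made up for illustration. One thing that may matter: as far as I can tell from the documentation, macval() is only recognised inside macro quotes, i.e. `macval(line)', so "macval(`line')" as I wrote it above would just be matched as literal text. The sketch also uses strpos() rather than a regular expression to look for the double slash, which is a substitution on my part.

```stata
* Sketch only: copy a do-file line by line, skipping lines that contain //
* "in.do" and "out.do" are hypothetical file names
tempname src dst
file open `src' using "in.do", read text
file open `dst' using "out.do", write text replace

file read `src' line
while r(eof) == 0 {
    * macval() keeps macro references inside the line from being expanded;
    * compound double quotes protect any plain quotes in the line
    if strpos(`"`macval(line)'"', "//") == 0 {
        file write `dst' `"`macval(line)'"' _n
    }
    file read `src' line
}

file close `src'
file close `dst'
```

Lines containing a double slash anywhere are dropped here, including ones where it appears inside a string such as a URL, so the test would need refining before real use.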
Gabi
On Thu, Apr 3, 2008 at 12:20 AM, Sebastian Bauhoff <[email protected]> wrote:
> Dear Statalisters,
>
> I need to download a large number of html files from the internet and parse
> their content. The structure of the html pages is always the same, and I
> need to extract only a small part that is identified within the html code.
> I would like to use Stata to download the files, extract the information I
> want, and save the result in a dataset. Any suggestions or pointers much
> appreciated.
>
> Thanks,
> Sebastian
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>