Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: reading HTML source in Chinese but get a messy code
From
"Li Chuntao (Tony)" <[email protected]>
To
[email protected]
Subject
st: reading HTML source in Chinese but get a messy code
Date
Thu, 6 Jun 2013 21:36:21 +0800
Dear Listers,
I want to import the following HTML source file:
http://qq.ico.la/qq459322466.html
The source file contains some information in Chinese, which is
located in lines 32 to 73.
I tried to import the information using the following code:
clear all
set obs 500
copy "http://qq.ico.la/qq459322466.html" d:\qq.txt, replace
mata:
fh = fopen("d:\qq.txt", "r")
// skip the first 34 lines of the page source
for(i=1; i<=34; i++) {
    junk=fget(fh)
}
// read and display the next 20 lines
for(i=1; i<=20; i++) {
    junk=fget(fh)
    junk
}
fclose(fh)
end
But the result is only garbled characters.
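For illustration, here is a minimal sketch of one way the lines might be decoded before display. It rests on two assumptions that are not confirmed anywhere in this post: that Stata 14 or later is available (earlier versions have no Unicode string functions), and that the page is encoded in GBK/GB18030, which is common for Chinese-language sites. If the page uses a different encoding, the name passed to ustrfrom() would need to change.

```stata
mata:
// Assumptions: Stata 14+ and a GB18030-encoded file (neither is confirmed here)
fh = fopen("d:\qq.txt", "r")
// skip the first 34 lines, as in the code above
for (i=1; i<=34; i++) {
    junk = fget(fh)
}
for (i=1; i<=20; i++) {
    junk = fget(fh)
    // ustrfrom() converts a string from the named encoding to UTF-8;
    // mode 1 substitutes the Unicode replacement character for invalid bytes
    ustrfrom(junk, "gb18030", 1)
}
fclose(fh)
end
```

If the converted lines still look wrong, the encoding guess is probably the thing to revisit; the charset declared in the page's own meta tag is a reasonable place to check.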
Similar code, for which I thank Prof. Kit Baum, has worked for
other webpages, as shown below:
clear all
set obs 500
local stkcd="000002"
gen str20 date="2012.12.31"
copy "http://stockdata.stock.hexun.com/2008/lr.aspx?stockid=`stkcd'&accountdate=2012.12.31" d:\date.txt, replace
mata:
fh = fopen("d:\date.txt", "r")
for(i=1; i<=444; i++) {
junk=fget(fh)
}
fclose(fh)
end
Can someone familiar with Chinese encoding give me some hints?
Best
Chuntao
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/