st: Re: need help on reading big ASCII file to make a panel data set on temperature across geographic coordinates
From: Gordon Hughes <[email protected]>
To: [email protected]
Date: Wed, 04 Jan 2012 11:00:22 +0000
The HadCRUT3 dataset is relatively straightforward to handle once
you have made a minor edit to the header lines. All you really need
to do is a global replacement of
"rows" by " " and of "columns. Missing=" by " ".
You can use Notepad++ for this if your regular text editor cannot
handle files of 60+ MB.
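If you would rather stay inside Stata, the same two replacements can
probably be done with -filefilter-. This is only a minimal sketch: the
file names "hadcrut3_raw.txt" and "hadcrut3_tmp.txt" are made up for
illustration, and it assumes the header text matches the two patterns
exactly.

```stata
* Sketch only: hadcrut3_raw.txt and hadcrut3_tmp.txt are hypothetical file names
* Step 1: blank out the word "rows" in every header line
filefilter "hadcrut3_raw.txt" "hadcrut3_tmp.txt", from("rows") to(" ") replace
* Step 2: blank out "columns. Missing=" so only numbers remain in the header
filefilter "hadcrut3_tmp.txt" "hadcrut3.txt", from("columns. Missing=") to(" ") replace
* Look at the first few lines to confirm the headers are now purely numeric
type "hadcrut3.txt", lines(3)
```

-filefilter- reports how many occurrences it changed, which gives a
quick check that each pattern matched once per monthly header.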
Once you have done this, the code below will do almost everything you
want. [I wrote it to handle a version of the HadCRUT3 dataset up
to mid-2010.] The code reads the edited version of the original data
from the file "hadcrut3.txt". The panel_id and time_id variables are
defined at the end of the code. Replace the command -save- by
-saveold- if you want to save the data as a Stata 9 dataset.
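Once the dataset has been saved, the grid indices can be turned into
cell-centre coordinates and the data declared as a panel. The sketch
below is an illustration rather than part of my original code: the
names lat, lon and mdate are made up, and it assumes that rows run
north to south and columns west to east in 5-degree steps, as the
panel_id comment in the code implies.

```stata
* Sketch only: lat, lon and mdate are hypothetical variable names
use "hadcrut3_grid_data.dta", clear

* Cell centres, assuming vgrid 1 = 90-85N and hgrid 1 = 180-175W
gen lat = 92.5 - 5*vgrid        // 87.5 for vgrid 1, -87.5 for vgrid 36
gen lon = -182.5 + 5*hgrid      // -177.5 for hgrid 1, 177.5 for hgrid 72

* A Stata monthly date as an alternative to time_id
gen mdate = ym(year, month)
format mdate %tm

* Declare the panel structure: one panel per grid cell, monthly time index
xtset panel_id mdate
```

With rows of 72 cells, panel_id=(vgrid-1)*72+hgrid gives 1 for the
first cell and 2592 for the last, and time_id=(year-1850)*12+month
gives 1 for January 1850.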
Gordon Hughes
[email protected]
========
#delimit ;
capture cd "g:\CRU_Data";
capture log close;
log using "hadcrut3_grid_data.log", replace;
infile year month unit nrows ncolumns missval
row1_1-row1_72 row2_1-row2_72 row3_1-row3_72 row4_1-row4_72
row5_1-row5_72 row6_1-row6_72 row7_1-row7_72 row8_1-row8_72
row9_1-row9_72 row10_1-row10_72 row11_1-row11_72 row12_1-row12_72
row13_1-row13_72 row14_1-row14_72 row15_1-row15_72 row16_1-row16_72
row17_1-row17_72 row18_1-row18_72 row19_1-row19_72 row20_1-row20_72
row21_1-row21_72 row22_1-row22_72 row23_1-row23_72 row24_1-row24_72
row25_1-row25_72 row26_1-row26_72 row27_1-row27_72 row28_1-row28_72
row29_1-row29_72 row30_1-row30_72 row31_1-row31_72 row32_1-row32_72
row33_1-row33_72 row34_1-row34_72 row35_1-row35_72 row36_1-row36_72
using hadcrut3.txt;
drop unit nrows ncolumns missval;
compress;
reshape long
row1_ row2_ row3_ row4_ row5_ row6_ row7_ row8_ row9_
row10_ row11_ row12_ row13_ row14_ row15_ row16_ row17_ row18_
row19_ row20_ row21_ row22_ row23_ row24_ row25_ row26_ row27_
row28_ row29_ row30_ row31_ row32_ row33_ row34_ row35_ row36_,
i(year month) j(hgrid);
forvalues n=1/36 {;
rename row`n'_ dtemp`n';
};
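* Second reshape: one observation per grid cell and month;
reshape long dtemp, i(year month hgrid) j(vgrid);
* Recode the large negative missing-value code to Stata missing;
replace dtemp=. if dtemp <= -1000000;
sort vgrid hgrid year month;
* Drop grid cells that have no observations in any month;
by vgrid hgrid: egen cell_nobs=count(dtemp);
drop if cell_nobs <= 0;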
* panel_id: 5 deg grid cells starting from 90-85N, 180-175W = 1;
gen panel_id=(vgrid-1)*72+hgrid;
* time_id: months starting with Jan 1850 = 1;
gen time_id=(year-1850)*12+month;
sort panel_id year time_id;
compress;
describe;
save "hadcrut3_grid_data.dta", replace;
==================
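The two -reshape- steps in the code above can be checked on a toy grid
before running them on the full file. This is a minimal sketch with
made-up data (2 rows by 3 columns, two months), not part of the
original code; each value is 10*row + column so that the result can be
verified with -assert-. It uses the default line delimiter rather
than ";".

```stata
* Toy example: 2 rows x 3 columns, two months of made-up data
clear
input year month row1_1 row1_2 row1_3 row2_1 row2_2 row2_3
1850 1 11 12 13 21 22 23
1850 2 11 12 13 21 22 23
end

* First reshape: the column index becomes hgrid
reshape long row1_ row2_, i(year month) j(hgrid)

* Rename so that the row index becomes the numeric suffix
forvalues n = 1/2 {
    rename row`n'_ dtemp`n'
}

* Second reshape: the row index becomes vgrid
reshape long dtemp, i(year month hgrid) j(vgrid)

* Same panel_id formula as above, with 3 columns instead of 72
gen panel_id = (vgrid-1)*3 + hgrid

* Checks: 2 months x 6 cells, and every value sits in the right cell
assert _N == 12
assert dtemp == 10*vgrid + hgrid
list, sepby(year month)
```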