There are a few ways that you could approach this, but I think the simplest
may be to just read in the data as text strings and then manipulate it. If
the rows are no more than 80 characters (or if you hace Stata/SE, then you
could change the first line to read in through column 244 instead of 80),
then this might work:
infix str line 1-80 using rawdata.txt
* identify lines with Grid-ref info
gen gridref=line if index(line,"Grid")
gen gridgroup=sum(gridref!="")
replace gridref=gridref[_n-1] if gridref==""
drop if index(line,"Grid")
split line, gen(month) destring
mvdecode month*, mv(-999)
gen year=1900+gridgroup
replace gridref=subinstr(gridref,"Grid-ref=","",.)
split gridref, gen(coord) parse(,) destring
rename coord1 xcoord
rename coord2 ycoord
order year xcoord ycoord
drop gridref gridgroup line
This code should give you:
xcoord and ycoord numeric vars,
year (numeric var running 1901 through 2000)
month1, month2, month3, ...month12 : numeric variables containing the data
values
If it doesn't work as expected, get rid of the last line (that drops the
working variables) and take a look at the data where i doesn't seem to work.
You may have problems with the destring option on the split command if there
are non-numeric characters.
Good luck,
Michael Blasnik
[email protected]
----- Original Message -----
From: "Rio, Martin" <[email protected]>
To: <[email protected]>
Sent: Monday, March 15, 2004 11:35 AM
Subject: st: Reading non-standard data
> I need to read a climate dataset into Stata. My problem is that the
dataset
> in not 1-observation per row, but the information in each row depends on a
> previous row. The dataset contains monthly measurements for each cell on a
> grid from 1901-2000. The grid is big, but for the purpose of illustration
> suppose it is 10x10 (10 cells wide by 10 cells tall).
>
> Each cell of the grid is identified by a header observation that states
its
> x, y coordinated and is followed by rows that contain the data
corresponding
> to that grid, which are 100 rows (years 1901-2000) and 12 columns (months
> Jan-Dec). The format is fixed in the sense that there is always a header
row
> followed by 100 rows with 12 columns sepparated by a space (missing values
> are coded -999, so that there are always values for columns)
>
> This is an example of how the dataset looks:
>
> Grid-ref= 1, 1
> 40 73 139 97 66 338 523 265 172 57 96 72
> 86 87 125 106 74 233 275 419 89 37 67 72
> ...
> 60 112 58 22 145 273 576 451 82 91 194 61
> Grid-ref= 1, 2
> 32 103 108 68 126 312 565 486 213 54 76 29
> 13 106 42 93 112 344 244 422 88 104 59 73
> ...
> 101 104 57 46 91 165 601 107 163 60 161 59
> [and so on, until]
> Grid-ref= 10, 10
> 13 102 19 98 58 309 247 303 181 48 68 81
> 123 60 133 107 107 208 321 502 48 107 80 103
> ...
> 101 89 187 74 201 515 509 490 102 86 100 52
>
> I would like to read these data into a Stata dataset with columns:
> x-coordinate, y-coordinate, year, jan, feb, ..., dec.
>
> I've been trying for a while, but I am really at a loss and would
appreciate
> any help. If someone wants to help me and finds this email confusing,
please
> e-mail me for clarification.
>
> Thanks in advance,
>
> Mart�n Rio
> Charles River Associates
> Washington, DC
> phone: (202) 662-3957
> fax: (202) 662-3910
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/