That does look like a rather complex problem, and I don't have a golden
key for it. The problem with your present setup is that, for instance,
'egen daymean = rmean(day1-day2)' will not work, because flag1 sits
between day1 and day2 in the variable order and is therefore part of that
varlist as well.
My approach would be to first reshape the data to long form (which is more
convenient for most data-manipulation tasks), so that you have the
variables day (which would be better renamed 'temperature' or something),
flag, an indicator for day 1 to 31, and an indicator for a/b/c.
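
The reshape step could look roughly like the sketch below. It is only an
illustration under assumptions: the row identifier 'id', the day counter
'daynum', the five-day toy data and the numeric values standing in for a2,
b3, c5 and so on are all made up here, not taken from the real file.

```stata
* Toy wide data (hypothetical): 3 rows, 5 days instead of 31.
* id is an assumed row identifier standing in for a/b/c.
clear
input id day1 str1 flag1 day2 str1 flag2 day3 str1 flag3 day4 str1 flag4 day5 str1 flag5
1 0 "s"  6 "a"  0 "s"  0 "s"  9 "a"
2 0 "s"  0 "s" 12 "a"  4 ""   5 ""
3 7 ""   0 "s"  0 "s"  0 "s"  8 "a"
end

* One record per id and day; daynum takes the values 1..5 here.
reshape long day flag, i(id) j(daynum)

* Give the measurement a more descriptive name.
rename day temperature

list id daynum temperature flag, sepby(id)
```

With the real data the same reshape command should apply unchanged, as long
as the variables are named day1-day31 and flag1-flag31 and 'id' uniquely
identifies a row.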
From there it should be possible to construct a grouping variable that
identifies each run of records where temperature should be filled in,
together with the record holding the accumulated value from which it is
calculated. In your example, the second row has a group where day1, day2
and day3 belong together, while day4 is a group on its own and so is day5.
Let your grouping variable assign a value of 1 to the first 3 records
(day1, day2, day3), a value of 2 to the record of day4, etc.
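
One possible way to build such a grouping variable in long form is sketched
below. The idea is that a new group starts at the first day of a row, or
whenever the previous day was not flagged "s". The names 'id', 'daynum',
'start' and 'groupvar' and the toy values are assumptions for illustration,
and the sketch does not deal with an accumulation that is still open at the
end of a row.

```stata
* Toy long data (hypothetical), rows b and c of the example, 5 days each.
clear
input id daynum temperature str1 flag
2 1  0 "s"
2 2  0 "s"
2 3 12 "a"
2 4  4 ""
2 5  5 ""
3 1  7 ""
3 2  0 "s"
3 3  0 "s"
3 4  0 "s"
3 5  8 "a"
end

sort id daynum

* A record starts a new group if it is the first day of its row, or if the
* previous day was not part of an ongoing accumulation (flag "s").
by id: gen byte start = (_n == 1) | (flag[_n-1] != "s")

* Running count of group starts gives a group number unique across rows.
gen groupvar = sum(start)

list id daynum temperature flag groupvar, sepby(groupvar)
```

On this toy data the first three records get groupvar 1, day4 and day5 of
the same row get 2 and 3, and the four records day2-day5 of the next row
share one group.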
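From there you could work something out using the 'by groupvar:' construct
and explicit subscripting, as in "by groupvar: replace
temperature=temperature[_N] if temperature<.". This relies on the records
being sorted by day within each group, so that temperature[_N] is the
accumulated value reported on the "a" day. You might want to convert
accumulated temperatures to averages before this.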
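
As a rough sketch of that last step, the replace and the conversion to
averages can be combined by dividing the last value in each group by the
group size _N. The variable names and toy values are again assumptions, and
'groupvar' is taken as already constructed.

```stata
* Toy long data (hypothetical) with the grouping variable already built.
clear
input id daynum temperature str1 flag groupvar
2 1  0 "s" 1
2 2  0 "s" 1
2 3 12 "a" 1
2 4  4 ""  2
2 5  5 ""  3
3 1  7 ""  4
3 2  0 "s" 5
3 3  0 "s" 5
3 4  0 "s" 5
3 5  8 "a" 5
end

* Sort by day within group so that temperature[_N] is the accumulated
* value; dividing by _N spreads it evenly over the days in the group.
* Groups of a single record are left unchanged (value / 1).
bysort groupvar (daynum): replace temperature = temperature[_N] / _N

list id daynum temperature flag groupvar, sepby(groupvar)
```

To return to the original layout, one option is to drop groupvar and run
'reshape wide temperature flag, i(id) j(daynum)'.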
There are probably a lot of other ways to tackle the problem, but I think
the reshape-to-long is the most important thing here.
At 06:07 17-7-2003, Radu Ban wrote:
Dear listers,
This is a data management question. The data that I'm looking at (daily
U.S. weather) has the following structure.
day1  flag1  day2  flag2  day3  flag3  day4  flag4  day5  flag5  ...  day31  flag31
0     s      a2    a      0     s      0     s      a5    a      ...  a31
0     s      0     s      b3    a      b4           b5           ...  b31
c1           0     s      0     s      0     s      c5    a      ...  c31
The "s" flag means that the measured element (say inches of rain) is
accumulated over those days, which are assigned a 0 value, and the
accumulated amount is reported in the day flagged with "a". I would like
to replace the 0 value for the accumulation days with the average of the
accumulated value over those days.
Given the notation above, specifically, I would like to replace 0, 0, b3
(in the second row) with b3/3; 0, 0, 0, c5 (in the third row) with c5/4,
and so on. Note that, as in the first row, there can be more than one
accumulation series per row.
I figured out that each type of accumulation, a_ij (starting at day i,
ending at day j), must be identified, so that in the end I can use:
forval j = 2/31 {
    forval i = 1/`j' {
        egen daymean = rmean(day`i'-day`j') if a_`i'`j' == 1
        replace day`i' = daymean
        drop daymean
    }
}
But I'm not sure how to define all the a_ij.
Ernest Berkhout
SEO Amsterdam Economics
University of Amsterdam
Room 3.08
Roetersstraat 29
1018 WB Amsterdam
The Netherlands
tel.:+ 31 20 525 1657
fax:+ 31 20 525 1686
http://www.seo.nl
===========================
A statistician: someone who insists
on being certain about uncertainty
===========================