Immense thanks to Ernest, Nick, and Bill who took their time to answer my
query. In my post I forgot to mention that there are other types of flags,
"T" for trace ammounts, "M" for missing. Also each row is identified by a
weather station id and the type of meteorological element it represents
(rain, snow, temperature, wind, etc). And the accumulation flags apply
only to rain and snow.
Pondering over it last night I came up with a solution of my own, albeit a
bit forced. I'm pasting it below. Let me know, time permitting of course,
if you think it is correct. I tested it and it works. I tend to use it
because I understand it better.
*generating accumulation begining and end numeric flags;
forval i=1/31 {;
gen accb`i' = 1 if flag`i' == "S";
gen acce`i' = 1 if flag`i' == "A";
};
*ordering so that i can use the varlist abbreviation safely;
order accb* acce*;
*identifying all accumulations of type a_i_j, starting on day i ending on
day j;
forval j = 2/31 {;
local k = `j' - 1;
forval i = 1/`k' {;
egen test = rmiss(accb`i'-accb`k');
gen a_`i'_`j' = test == 0 & acce`j' == 1;
qui sum a_`i'_`j';
if r(max) == 0 {drop a_`i'_`j'};
drop test;
};
};
*have to drop accumulations that are included in each other (i.e. 2-4
included in 1-4);
forval k = 30(-1)1 {;
local k2 = `k' - 2;
local k1 = `k' - 1;
forval j = 1/`k2' {;
local j1 = `j' + 1;
forval i = `j1'/`k1' {;
noisily cap replace a_`i'_`k' = 0 if a_`j'_`k' == 1;
};
};
};
*dropping non-existing accumulations;
foreach var of varlist a_* {;
qui sum `var';
if r(max) == 0 {drop `var'};
};
*replacing with average value;
unab listacc: a_*;
foreach acc of local listacc {;
local sep1 = index("`acc'","_");
local rem = substr("`acc'",`sep1'+1,.);
local sep2 = index("`rem'","_");
local i = substr("`rem'",1,`sep2'-1);
local j = substr("`rem'",`sep2'+1,.);
egen dayavg = rmean(day`i'-day`j') if `acc' == 1;
foreach var of varlist day`i'-day`j' {;
replace `var' = dayavg if `acc' == 1;
};
drop dayavg;
};
On Thu, 17 Jul 2003, William Gould, Stata wrote:
> Radu Ban <[email protected]> writes,
>
> > This is a data management question. The data that I'm looking at (daily U.S.
> > weather) has the following structure.
> >
> > day1 flag1 day2 flag2 day3 flag3 day4 flag4 day5 flag5 ... day31 flag31
> > 0 s a2 a 0 s 0 s a5 a a31
> > 0 s 0 s b3 a b4 b5 b31
> > c1 0 s 0 s 0 s c5 a c31
> >
> > the "s" flag means that the measured element (say inches of rain) is
> > accumulated over those days, which are assigned a 0 value, and the
> > accumulated amount is reported in the day flagged with "a". i would like to
> > replace the 0 value for the accumulation days with the average of the
> > accumulated value over those days.
> >
> > given the notations above, specifically, i would like to replace 0, 0, b3
> > (in the second row) with b3/3; 0, 0, 0, c5 (in the third row) with c5/4, and
> > so on. note that, as in the first row there can be more than one
> > accumulation series per row.
>
> I do now know what the solution to this problem is yet -- I will -- but I
> do know these kinds of problems are easier viewed the long way:
>
> orig_obs i day flag
> 1 1 0 s
> 1 2 a2 a
> 1 3 0 s
> 1 4 0 s
> 1 5 a5 a
> ...
> 1 31 a31
> 2 1 0 s
> 2 2 0 s
> 2 3 b3 a
> 2 4 0 s
> 2 5 c5 a
> ...
> 2 31 c31
> etc.
>
> -reshape- can make the data look like that -- we'll worry about the details
> later.
>
> I am unsure from Radu's description whether variable flag ever contains
> anything on that "a" and "s". Radu sort of implies " " is also possible,
> in which case the measurement would be for that day. If so, I could
> view that as an "a" observation: One accumulates the single day and
> divides by one. So just in case there are any blanks,
>
> . replace flag = "a" if flag=="s"
>
> and then, just to verify that these data are as they have been explained to
> be, let's verify that flag is now always "s" or "a":
>
> . assert flag=="s" | flag=="a"
>
> Now the problem is getting easier:
>
> 1. We start accumulation at the first observation for each orig_obs
> group.
>
> 2. We continue accumulation up until we see an "a".
>
> 3. Then, we start accumulation again.
>
> So let's add a new variable begin that will record 1 every time an accumulation
> begins:
>
> . by orig_obs: gen begin = cond(_n==1 | flag[_n-1]=="a", 1, 0)
>
> Now our dataset looks like,
>
> orig_obs i day flag begin
> 1 1 0 s 1
> 1 2 a2 a 0
> 1 3 0 s 1
> 1 4 0 s 0
> 1 5 a5 a 0
> ...
> 1 31 a31 a <could be 0 or 1>
> 2 1 0 s 1
> 2 2 0 s 0
> 2 3 b3 a 0
> 2 4 0 s 1
> 2 5 c5 a 0
> ...
> 2 31 c31 a <could be 0 or 1>
>
> Understand what I did: I merely created a variable equal to 1 marking
> the beginning of each group in which we need to distribute sum. If I now
> sum variable begin, I will have group numbers:
>
> . gen group = sum(begin)
>
> The data set now looks like:
>
> orig_obs i day flag begin group
> 1 1 0 s 1 1
> 1 2 a2 a 0 1
> 1 3 0 s 1 2
> 1 4 0 s 0 2
> 1 5 a5 a 0 2
> ...
> 1 31 a31 a ? 10 <- i just made up 10
> 2 1 0 s 1 11
> 2 2 0 s 0 11
> 2 3 b3 a 0 11
> 2 4 0 s 1 12
> 2 5 c5 a 0 12
> ...
> 2 31 c31 a ? ?
> etc.
>
> Now the problem is easy: within group, replace all the observations
> with the value of the last observation in the group, divided by the
> number of observations in the group:
>
> . sort group
> . by group: gen newday = day[_N]/_N
>
> I have my solution. Actually, I could have typed
>
> . by group: replace day = day[N]/_N
>
> but I want to -list- the result and make sure it looks right to me. Then
> I can replace day:
>
> . replace day = newday
> . drop newday
>
> Now all I need to do is switch the data back to being in the wide format.
>
> So here is the complete solution:
>
> . gen orig_obs = _n
> . reshape long day flag, i(i)
>
> . replace flag = "a" if flag=="s"
> . assert flag=="s" | flag=="a"
>
> . sort orig_obs i
> . by orig_obs: gen begin = cond(_n==1 | flag[_n-1]=="a", 1, 0)
> . gen group = sum(begin)
>
> . sort group
> . by group: gen newday = day[_N]/_N
> . list <-- look to make sure right
> . replace day = newday
> . drop newday group
>
> . reshape wide
>
> -- Bill
> [email protected]
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/