Radu Ban
> i have a dataset in which a variable has to be the same within groups
> created by other variables. the variable is a 0-1 binary. if, within a
> group there's at least one 0 zero value i want to set all the values to 0.
> the groups are generated by the "serial" and "month" variables.
>
> i used the following command:
>
> bysort serial month: replace var = 0 if (var ~= var[_n-1] & _n>1) |
> (var ~= var[_n+1] & _n==1)
>
> but this appears to be causing some randomness because the end results of
> the code are different with each re-run. and i don't see other commands
> that might cause randomness.

What you want is equivalent to replacing your binary variable by
its minimum within groups, which is canned as an -egen- function:

    egen min = min(var), by(serial month)
    replace var = min

A way to do it from first principles is in fact shorter
and avoids the device of another variable:

    bysort serial month (var) : replace var = var[1]

On the other hand, there is some information loss in your overwriting
the original variable.
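
If the original values matter, one possible way to keep them is to put
the group minimum into a new variable instead of overwriting -var-.
This is only a sketch: the name -var_min- is an assumption made for
illustration and is not in the original data.

```stata
* var_min is a hypothetical name for the new variable
bysort serial month (var) : generate byte var_min = var[1]

* show the observations that would change if var were overwritten
list serial month var var_min if var != var_min
```

The -list- is just a check on what would be lost; -var- itself is left
untouched.
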
Anyway, there is a lot going on here, so let's break it into steps:
1. Sort on -serial-, within that order by -month-, and within that
by -var-.
2. Within the categories defined by -serial- and -month-,
-replace var- by its first value, -var[1]-. Note the principle
that the subscript, here [1], is interpreted within categories, that
is, within the groups defined by -serial- and -month-. After
sorting, the minimum value of -var- is held in the first
observation of each group.
3. If there are ties for minimum, you still get the right answer.
4. Missing values won't mess this up as they are sorted to
the high end of each group. However, this implies that getting
the maximum in this way would require more care.
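
The steps above, and the extra care needed for a maximum, can be
sketched on a small made-up dataset. The data values and the names
-var_min-, -var_max- and -negvar- are assumptions for illustration
only. Negating the variable before sorting is one possible device for
the maximum, not the only one; -egen-'s max() function is another.

```stata
clear
input serial month var
1 1 1
1 1 0
1 1 .
1 2 1
1 2 1
end

* minimum: missings sort last, so var[1] is the smallest nonmissing value
bysort serial month (var) : generate byte var_min = var[1]

* maximum: sort on the negated values, so that the largest value comes
* first; -var is missing whenever var is missing, so missings still sort last
generate negvar = -var
bysort serial month (negvar) : generate byte var_max = -negvar[1]
drop negvar

list, sepby(serial month)
```

In the first group the missing value ends up last under either sort,
so it affects neither result.
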
The difference between this approach and yours is that
you don't sort on -var- within the categories defined
by -serial- and -month-.
Stata in this context, as in others, is quite literal. Given
your instruction

    bysort serial month: ...

it is content with any sort order that satisfies that instruction.
It pays no attention whatsoever to the order of -var- within
the categories. In addition, as you observe, there is even
some unpredictability about their order. So it is essential
that you arrange the exact -sort- order you want.
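
If a calculation really does depend on neighbouring observations, as
with -var[_n-1]-, one possible way to make the order within groups
reproducible is to record the current order in a variable and name it
in parentheses. This is a sketch only: -obsno- and -changed- are
hypothetical names, and it assumes that the order of the data at the
moment -obsno- is created is the order you want to preserve.

```stata
* obsno is a hypothetical name; it records the current order of the data
generate long obsno = _n

* within each group, observations are now always in obsno order,
* so references to var[_n-1] mean the same thing on every run
bysort serial month (obsno) : generate byte changed = var != var[_n-1] if _n > 1
```

Because -obsno- is unique, there are no ties left for -sort- to break
arbitrarily.
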
There was a tutorial on -by:- in Stata Journal 2(1), 2002.
Nick