Tweaking Rodrigo's program, which improves a lot
on the original. Four points on Rodrigo's version:
1. -by `ivar':- will fail if the data are not already
sorted by `ivar'. Use -bysort-.
2. Using -egen- is over the top here. You can and
should use _N directly.
3. As you just want the maximum, use -summarize,
meanonly-.
4. It will fail if the data are not set up
as panel data, that is, if _dta[iis] is empty.
Now find my bugs...
program balance, sort
        version 8.2
        syntax [if] [in]
        marksample touse
        tempvar count
        qui {
                * panel identifier, as recorded by -tsset-
                local ivar: char _dta[iis]
                if "`ivar'" == "" {
                        di as err "no identifier set"
                        exit 498
                }
                * observations per panel, within the -if-/-in- sample
                bysort `touse' `ivar': gen long `count' = _N
                sum `count' if `touse', meanonly
                * keep only the panels with the maximum count
                keep if `count' == r(max) & `touse'
        }
end
(Also, contrary to what is said below, -syntax id- does not
create a temporary variable: it is just illegal -syntax- syntax.)
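
Points 1 to 3 can also be tried interactively, outside any program. What follows is only a sketch on a made-up toy panel; the variable name nobs and the data values are assumptions for illustration, not part of the original thread.

```stata
clear
input id year
1 1
1 2
1 3
2 1
2 2
3 1
3 2
3 3
end
* points 1 and 2: -bysort- sorts first, and _N under -by- is the group size
bysort id: gen long nobs = _N
* point 3: -meanonly- skips the variance calculation but still leaves r(max)
summarize nobs, meanonly
keep if nobs == r(max)
list, sepby(id)
```

With these toy data, ids 1 and 3 have three observations each and id 2 has two, so only ids 1 and 3 should remain.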
Nick
Rodrigo A. Alfaro
> If you are planning to run this program only on one dataset (that uses
> id as cross-sectional identifier) you don't need to put it in the syntax.
> When you put id in the syntax you are creating a temporary variable
> called id that will be used further; for that reason you have to invoke
> it using `id' instead of id.
>
> You are not using [if] and [in]; putting these in the syntax line just
> allows the user to specify them. This means that users can condition
> the result, but your program still needs to apply the restrictions...
> read the help for -marksample- for the formal use of [if] and [in].
>
> You have to define a temporary variable count to guard against this
> variable already existing in the dataset. Again, you have to invoke it
> using `count' instead of count. After keep you will lose information;
> be careful with that.
>
> Finally, I don't understand what you meant by "compile". You just load
> the program with -do balance-, or you can go further and save this
> program with the extension .ado in your personal folder, and it will be
> your own command.
>
> Rodrigo.
> PS: This is my version of your program:
>
> program balance
>         version 8.2
>         syntax [if] [in]
>         tempvar count
>         qui {
>                 local ivar: char _dta[iis]
>                 by `ivar': egen `count'=count(`ivar') `if' `in'
>                 sum `count'
>                 local max=r(max)
>                 keep if `count'==`max'
>         }
> end
>
> I dropped rclass, because I don't need to save any value after reducing
> the panel. Also, I deleted id from the syntax because you can use char
> _dta[iis], which tells you which cross-sectional variable was defined
> using -tsset-. Note that local variables, as well as [if] and [in], are
> invoked using `'.
Dirk Nachbar
> I am trying to write my first Stata program and was wondering if someone
> could go through it and tell me if it's correct, how I should refer to id
> and what rclass means (just copied that).
> Another thing, I compiled it once and then wanted to recompile it. How do
> I do that?
>
> /*
> program to balance an unbalanced panel, keep only those individuals with
> the max duration
> */
> program balance, rclass
>         version 8.2
>         syntax id [if] [in]
>         qui {
>                 sort id
>                 by id: egen count=count(id)
>                 sum count
>                 local max=r(max)
>                 keep if count==max
>         }
> end
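
On the question of "recompiling": Stata does not compile programs, but once a program is defined in memory, defining it again fails until the old copy is dropped. A minimal sketch of the usual idiom, assuming the definition is kept in a do-file; the placeholder body is hypothetical and stands in for the real one.

```stata
* drop any copy already in memory; -capture- swallows the error if there is none
capture program drop balance
program balance
        version 8.2
        * placeholder body, for illustration only
        display as text "balance (re)defined"
end
* call the freshly defined program
balance
```

If the program is saved as balance.ado instead, -discard- drops automatically loaded programs, so the edited file is read again on the next call.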
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/