Guillermo Cruces
[ ... ]
> In my example, I have a household survey where I don't have
> direct information
> about the number of kids of each individual, but I have
> something like this:
> hhid and member are just the household id and number of
> member. Variables
> fatherm and motherm tell you the number of the member of
> the father and the
> mother, if in the household:
[ ... ]
> I want to create the variable ownkids that gives me the
> number of own kids
> living in the house:
[ ... ]
I replied to Guillermo's posting with a proffered
solution, but I didn't answer one of his questions.
> My brute-force solution, which makes a lot of unnecessary
> comparisons and takes
> very long (because I generate and drop many variables) is
> of the form: with
> maxmem being the number of members of each household (group
> i, max is the number
> of groups),
> forvalues i = 1/`max' {
>     qui sum member if group==`i'
>     local maxmem=r(max)
>     forvalues j = 1/`maxmem' {
>         di "-----------Household number `i', number of members: `maxmem'"
>         forvalues k = 1/`maxmem' {
>             di "Household `i', member `j', comparing with `k'"
>             qui gen a=motherm==`j' if member==`k'&group==`i'
>             qui egen b=max(a)
>             qui replace mkids=mkids+b if member==`j'&group==`i'
>             drop a b
>             qui gen a=fatherm==`j' if member==`k'&group==`i'
>             qui egen b=max(a)
>             qui replace fkids=fkids+b if member==`j'&group==`i'
>             drop a b
>         }
>     }
> }
>
> This creates two variables, mkids and fkids, which are the
> number of kids for
> mothers and fathers. For each member of the household, I
> compare if . The egen,
> replace, drop, takes very long, and even longer if the
> dataset in memory is
> large (I had to partition the dataset in 25 parts to make
> this run faster).
> The main problem (the main awkwardness in this program) is
> that I gen, egen,
> etc. because I could not just create a scalar that reflects
> the value of a
> variable for one precise observation, something of the form
> (which of course
> doesn't work):
> local a=mother==`j' if member==`k'&group==`i' (meaning:
> mother etc. should
> refer to the observation: member==`k'&group==`i')
> I couldn't use something like motherm[_...] because I was not
> using by: ... .
> What I would like to know is whether there are more efficient ways
> of doing this (I'm
> sure there are!).
As indicated separately, this code is a triple
loop which can be reduced to at most one loop.
For the details, see my earlier posting.
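For readers without that posting to hand, here is a minimal sketch of
what a single loop could look like. It is not necessarily the solution
given in the earlier posting. The variable names hhid, member, fatherm,
motherm and ownkids come from Guillermo's description; the helper
variable nkids and the toy data are made up for illustration. The loop
runs over member numbers only, and -by- does the work within households.

```stata
* Toy data (hypothetical): two households
clear
input hhid member fatherm motherm
1 1 . .
1 2 . .
1 3 1 2
1 4 1 2
2 1 . .
2 2 . 1
end

gen ownkids = 0

* Largest member number in any household: one number, so a local will do
summarize member, meanonly
local maxmem = r(max)

forvalues j = 1/`maxmem' {
    * Running count, within household, of people naming member j as a parent
    quietly bysort hhid (member): gen nkids = sum(fatherm == `j' | motherm == `j')
    * The last value of the running count is the household total
    quietly by hhid: replace ownkids = nkids[_N] if member == `j'
    drop nkids
}

list, sepby(hhid)
```

Missing values of fatherm and motherm never equal `j', so people with no
parent in the household add nothing to the count.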
But the steps
. egen b = max(a)
...
. drop b
could have been cut in a way that is of much wider
interest and applicability.
Guillermo wants just one number, the maximum. A good way to
get it is, in general,
. summarize a, meanonly
followed by
. scalar b = r(max)
or
. local b = r(max)
or just by using r(max) or `r(max)' directly after
the -summarize-:
. qui replace fkids=fkids + r(max) if member==`j'&group==`i'
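To make the substitution concrete, here is a small sketch on made-up
data. The variables member, a and fkids play the same roles as in
Guillermo's loop, but the values are hypothetical, and the commented-out
lines show the -egen- route being replaced.

```stata
* Toy data (hypothetical): a is an indicator, as in the quoted loop
clear
input member a
1 0
2 1
3 0
end
gen fkids = 0

* The route being replaced: a whole variable to hold one number
*   egen b = max(a)
*   replace fkids = fkids + b if member == 2
*   drop b

* The direct route: -summarize- leaves the maximum behind in r(max)
summarize a, meanonly
quietly replace fkids = fkids + r(max) if member == 2

list
```

No variable is generated or dropped, which is where the time goes when
the same steps sit inside a loop.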
If you try this out for yourself, say with the auto
data
. su mpg, meanonly
you will see nothing! The point, however, is what -summarize-
leaves in its own wake. Type
. ret li
and you will see results which can be picked up
for subsequent use. Note in particular that
. su mpg, meanonly
is faster than
. su mpg
because the second also calculates the sd and the
variance. If you don't need either, you should
use the speedier command.
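A short sketch of the difference, assuming a Stata recent enough to
have -sysuse- (otherwise read in the auto data however your version
allows). The scalar names are made up for illustration. A result that
-summarize- did not save evaluates to missing.

```stata
sysuse auto, clear

* With meanonly: the maximum is saved, the sd is not
summarize mpg, meanonly
return list
scalar max_meanonly = r(max)
scalar sd_meanonly  = r(sd)

* Without meanonly: the sd and variance are calculated and saved as well
summarize mpg
return list
scalar sd_full = r(sd)

display "max after meanonly:        " max_meanonly
display "sd missing after meanonly? " missing(sd_meanonly)
display "sd after full summarize:   " sd_full
```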
A separate point is that -egen- is an ado which
calls another ado, and so there is an overhead
for Stata which is obliged to interpret a few
dozen command lines. Done once, that is less
than a blink, but done repeatedly, it doesn't
help any process which is already too slow.
Some of these points were mentioned
in the recently posted -stylerules-
package on SSC:
Use -summarize, meanonly- for speed when its returned results are
sufficient.
Avoid -egen- within programs: it is usually slower than a direct
attack.
Never use a variable to hold a constant: a macro or a scalar is all
that is needed.
Nick
P.S. On the last rule, I just found an exception. For
a graphical purpose, I need a variable which is a constant.
The variable defines a horizontal line, on which I show
the information from another variable, something like this
. gen bar = 0
. gra foo bar bazz, sy(o[anothervariable])
That's the trouble with style "rules": style is
a subject on which there are exceptions to every
rule you can think of, even this one.