Re: st: RE: 'sophisticated' subscripting
Here is another way to do it. In some ways, it is worse
technique, but in other ways it shows some of the power of Stata.
bysort city : egen pop1700 = total(cond(year == 1700, pop, .))
What is going on here?
1. bysort city :
Stata must work within panels defined by -city-. We have to -sort- if we
have not already sorted by -city-. With panel data, we have probably
done that already, say as a side-effect of -tsset-, but it does no harm
to specify the -sort-. You can say
bys city :
or
by city, sort:
or
sort city
by city:
I like what I wrote first, but it's a matter of taste only.
2. cond(year == 1700, pop, .)
If the year is 1700, I want to use the value of -pop-; otherwise, forget
it.
3. egen pop1700 = total( )
-egen- should add up the results of the expression I just used -- within
panel, as explained in #1. I am assuming that 1700 occurs at most once
within each panel. If there is an observation for 1700, then only the
value for that will be used in the total, as it should be, because
-egen, total()- treats missings as zeros. If there is no observation for
1700 in a panel, every value passed to -total()- is missing, and by
default the result is then 0, not missing. Specify the -missing- option
of -egen, total()- if you want the result to be missing in that case.
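To see the three pieces working together, here is a minimal sketch on a toy panel. The data are made up for illustration, and the variable names -pop_if_1700- and -pop1700m- are mine, not from the original question. City B has no observation for 1700, so it shows what happens when every value in a panel is missing; as far as I know, -egen, total()- then returns 0 unless the -missing- option is given.

```stata
* Toy unbalanced panel (made-up numbers): city B has no observation for 1700
clear
input str1 city int year long pop
"A" 1600 1000
"A" 1700 1200
"A" 1800 1500
"B" 1650  400
"B" 1750  600
"C" 1700  900
end

* Step 2 on its own: pop where year is 1700, missing otherwise
gen pop_if_1700 = cond(year == 1700, pop, .)

* Steps 1 and 3: total that expression within each city
bysort city : egen pop1700 = total(cond(year == 1700, pop, .))

* Same, but asking for missing when every value in the panel is missing
bysort city : egen pop1700m = total(cond(year == 1700, pop, .)), missing

list city year pop pop_if_1700 pop1700 pop1700m, sepby(city)
```

If a zero for cities without a 1700 observation would be misleading in your data, the version with -missing- is probably the one you want.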
Note that this is _not_ equivalent to
bysort city : egen pop1700 = total(pop) if year == 1700
as that leaves missings almost everywhere, and is absolutely
no gain over
gen pop1700 = pop if year == 1700
In fact, it is much less efficient.
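A quick check of that point, again on made-up toy data with hypothetical variable names (-viaegen-, -viagen-): with the -if- qualifier outside the -egen- function, only the 1700 observations receive a value, so the result is identical to the plain -gen-.

```stata
* Toy unbalanced panel (made-up numbers)
clear
input str1 city int year long pop
"A" 1600 1000
"A" 1700 1200
"A" 1800 1500
"B" 1650  400
"B" 1750  600
"C" 1700  900
end

* -if- restricts which observations get a result at all
bysort city : egen viaegen = total(pop) if year == 1700

* The plain -gen- does the same job
gen viagen = pop if year == 1700

* Both are missing everywhere except in the 1700 rows
list city year pop viaegen viagen, sepby(city)
```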
In various versions before Stata 9, -egen, total()- was called
-egen, sum()-.
Nick
Nick Cox
Precisely this problem was discussed just a few days ago. See
this post from 18 May:
http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.0705/Author/article-645.html
Here it is again:
gen pop1700 = pop if year == 1700
bysort city (pop1700) : replace pop1700 = pop1700[1]
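Why this works: within each city the observations are sorted on -pop1700-, and numeric missings sort last, so the first observation holds the 1700 value if there is one. A minimal sketch on made-up toy data follows; the final -sort- is my addition, only to put the data back in year order.

```stata
* Toy unbalanced panel (made-up numbers): city B has no observation for 1700
clear
input str1 city int year long pop
"A" 1600 1000
"A" 1700 1200
"A" 1800 1500
"B" 1650  400
"B" 1750  600
"C" 1700  900
end

* Non-missing only in the 1700 rows
gen pop1700 = pop if year == 1700

* Sort within city so any non-missing value comes first, then copy it down
bysort city (pop1700) : replace pop1700 = pop1700[1]

* Restore the usual panel order
sort city year
list city year pop pop1700, sepby(city)
```

Here a city with no 1700 observation ends up with missing throughout, since its first value after sorting is itself missing.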
Davide Cantoni
> I have an (unbalanced) panel of cities, with their respective
> populations over a series of years. Now I want to generate a new
> variable that gives me, for each city, the population of that city in
> 1700. Due to the unbalanced nature of the panel, it is NOT the case
> that 1700 is the first (or nth) observation within group, so that
> subscripting within groups is not going to help me in this case. Any
> suggestions?