Nick,
Sorry I did not describe the data. The two vars are part of a huge
dataset with more than 100,000 observations. What I really want to do
is use the percentages as weights to adjust regression
coefficients. That is, I ran a regression of logincome on about 70
independent vars, 52 of which are dummies for industry. I saved the
coefficients on these dummies as b1-b52 and then obtained the percentage
for each industry as p1-p52. The final product I want is the standard
deviation of the industry effects, calculated as follows (pseudocode,
with each sum taken over i = 1/52):
    mubar    = sum(b`i' * p`i')
    variance = sum(p`i' * (b`i' - mubar)^2)
    sd       = sqrt(variance)
I can get p`i' by counting N for the whole sample and then counting
N`i' for each industry, so that p`i'=N`i'/N. But this takes a lot of time
because I need to generate 52 dummy variables. I am wondering if there is
a faster way of doing this. Thanks very much.
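
For what it is worth, here is a rough sketch of one way the shares and the
weighted standard deviation might be obtained without creating any dummies,
by looping over the industry codes with -count-. It assumes that ind is
coded 1, 2, ..., 52 and that the coefficients are held in scalars b1-b52;
both are assumptions on my part, and the scalar name Ntot is made up for
illustration.

```stata
* Sketch only. Assumes ind is coded 1, 2, ..., 52 and that the
* coefficients are already stored in scalars b1-b52 (an assumption).

* Total number of observations with a nonmissing industry code
quietly count if !missing(ind)
scalar Ntot = r(N)

* First pass: industry shares p1-p52 and the weighted mean of the b's
scalar mubar = 0
forvalues i = 1/52 {
    quietly count if ind == `i'
    * share of industry i; this is 0 if the industry has no cases
    scalar p`i' = r(N) / Ntot
    scalar mubar = mubar + b`i' * p`i'
}

* Second pass: weighted variance around mubar, then the SD
scalar variance = 0
forvalues i = 1/52 {
    scalar variance = variance + p`i' * (b`i' - mubar)^2
}
scalar sd = sqrt(variance)
display "weighted SD of industry effects: " sd
```

Two loops are used because the variance needs the finished value of mubar.
If the dataset also contains variables named b1 or p1, the scalars would
need different names, since Stata resolves a name to a variable before a
scalar.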
Best,
Zun
On Tue, 3 Dec 2002, Nick Cox wrote:
> Zun
> >
> > I have two vars ind (52 categories) and occ (7 categories), and I want
> > the percentage distribution of ind for each category of occ. Note that
> > not each ind category has cases. For instance:
> >
> > Occ=1
> > ind pct
> > 1 .0309522
> > 2 .0334331
> > 3 0
> > 4 .0356777
> > 5 .3402772
> > 6 .0294558
> > . .
> > . .
> > 52 .3151532
> >
> > Occ=2
> > ind pct
> > 1 .0036623
> > 2 .0006301
> > 3 0
> > 4 .0064976
> > 5 0
> > 6 .0455619
> > . .
> > . .
> > 52 .0953769
> >
> > As shown above, ind=3 is not in both occ=1 and occ=2 while ind=5 is in
> > occ=1 but not in occ=2.
> >
> > My questions are:
> >
> > First, if I use tabulate to get the percentage distribution of any
> > categorical variable, how can I save the percentages in a new dataset
> > that looks like one of the tables above.
> >
> > Second, in the specific example above, is there a way I can create a new
> > dataset that looks like this:
> >
> > ind pctocc1 pctocc2
> > 1 .0309522 .0036623
> > 2 .0334331 .0006301
> > 3 0 0
> > 4 .0356777 .0064976
> > 5 .3402772 0
> > 6 .0294558 .0455619
> > . . .
> > . . .
> > 52 .3151532 .0953769
> >
>
> I guess that you have at most 52 * 7 observations.
> Forget -tabulate-: a direct calculation is better.
>
> Typing
>
> . findit percent
>
> does point to lots of things; but one pertinent is -egen-.
>
> . bysort occ : egen pctocc = pc(ind)
>
> followed by a -reshape- may help. You may need
> to -replace- any missings by 0.
>
>
> Nick
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
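
For the wide table asked about in the quoted message, a possible sketch
along the lines Nick suggests (a direct calculation, then -egen- and
-reshape-) is below. It assumes the raw data hold one observation per
person with variables ind and occ, and it uses -contract- to reduce the
data to cell counts first; that step is my own addition, not part of
Nick's advice.

```stata
* Sketch only: assumes one observation per person, with ind and occ.
preserve

* One observation per ind-occ cell; zero keeps empty cells with _freq = 0
contract occ ind, zero

* Within each occ, express each cell as a proportion of the occ total
bysort occ: egen pctocc = pc(_freq), prop

* _freq varies within ind, so drop it before going wide
drop _freq
reshape wide pctocc, i(ind) j(occ)

list, noobs
restore
```

This should leave one row per ind with variables pctocc1, pctocc2, and so
on. An industry code that never occurs anywhere in the data will not get a
row, because -contract- only crosses values that are actually observed.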