The Stata listserver

st: RE: RE: unique value count in several variables


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: RE: unique value count in several variables
Date   Sun, 19 Jun 2005 22:54:25 +0100

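Scott's program does not claim to subdivide by 
your gvkey and year, and it does not do so. Run on 
the whole dataset, it counts distinct values over all 
observations, so your single value of 1005 is presumably 
the number of distinct codes in psic and ssic combined 
across the entire panel. 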

What you call "Nick's original program" appears
to be my first code as modified by you. It was based 
on the idea that -nvals- did not exist beforehand, 
and indeed the purpose of the code is to create 
-nvals-. In your case, you appear to have used it after 
creating -nvals- in some other way. That won't 
work. At a minimum, you need to drop -nvals- first. 
It is possible also that complications you didn't 
tell us about have not been taken into account in 
modifying the code, as you are here using variable
names not previously explained. 
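A minimal sketch of that first step, assuming the variable names used in the loop posted earlier in this thread (-nvals- and -group-): clear out results from an earlier run before creating them again.

```stata
* Remove results left over from an earlier run, so that -gen- and
* -egen- can create them afresh. -capture- suppresses the error when
* a variable does not exist; dropping one variable per command means
* a missing -group- cannot stop -nvals- from being dropped.
capture drop nvals
capture drop group
```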

Naturally, people often simplify their problem 
for Statalist to show the essence of it. That's 
great for the people who answer the questions. 
However, the original posters then need to add back the 
complications in exactly the right way. 

Otherwise put, there is nothing in this report 
that looks to me like a bug in Scott's code or mine
given the original example you specified. 

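You are right that the second approach will be slower 
than the first. It loops over every gvkey-year group and, 
on each pass, tests -if- on every observation in the 
dataset, so the work grows with the number of groups 
times the number of observations. 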
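A minimal sketch of one way to avoid the loop entirely. It uses a different technique from the code posted in this thread: stack psic and ssic into a single helper variable with -expand-, tag the first occurrence of each nonmissing code within each gvkey-year, and total the tags. It assumes psic and ssic are numeric and that the hypothetical helper names -obs-, -sic- and -first- are not already in use; it temporarily doubles the number of observations. The -clear- and -input- lines only load the example data from the thread; on real data, start from -gen long obs-.

```stata
* Sketch: count distinct nonmissing values of psic and ssic combined
* within each gvkey-year, without looping over groups.
* Assumes numeric psic and ssic; obs, sic and first are hypothetical
* helper names that must not already exist.
clear
input gvkey psic ssic year subno
1223 4767 4743 1999 1
1223 4767 4763 1999 2
1223 4757 4767 1999 3
1223 4767 4753 1999 4
1223 4777 4787 1999 5
1223 4767 4743 1999 6
end

* remember the original order, then make two copies of every observation
gen long obs = _n
expand 2

* the first copy carries psic, the second copy carries ssic
bysort obs: gen sic = cond(_n == 1, psic, ssic)

* flag the first occurrence of each nonmissing code within gvkey-year
bysort gvkey year sic: gen byte first = _n == 1 & !missing(sic)

* the number of flags is the number of distinct codes in the group
by gvkey year: egen nvals = total(first)

* back to one observation per original observation, in original order
bysort obs: keep if _n == 1
drop obs sic first

list, noobs
assert nvals == 7
```

Each step is a sort or a pass over sorted data rather than one pass per group, which is why this kind of approach usually scales better than looping over groups. Missing codes are never flagged, so a gvkey-year whose psic and ssic are all missing gets nvals = 0.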

Nick 
[email protected] 

Wanli Zhao
 
> I feel I should report on my results for anyone interested. I have a
> large panel: about 1600 cross-sections and 11 years. Scott's program
> generates an nvals variable with the single value 1005 (I do not know
> what it means) for every gvkey-year. Nick's modification seems to work,
> but the run time is unacceptable. I interrupted the program, and the
> values seem correct for the part that had finished.
> Nick's original "reshape" program also gave me the following error
> message:
> 
>   (note: j = ssic1 ssic2)
>   i (gvkey year sid) indicates the top-level grouping such as subject id.
>   j (_j) indicates the subgrouping such as time.
>   xij variable is K.
>   Thus, the following variable(s) should be constant within i:
>         nvals
>   nvals not constant within i (gvkey year sid) for 28662 values of i:
> 
> I guess the problem is that my ssic1 and ssic2 have many missing values.
> Thanks.
> 
> Wanli Zhao
> 
> 
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: Sunday, June 19, 2005 8:06 AM
> To: [email protected]
> Subject: st: RE: RE: RE: RE: unique value count in several variables
> 
> Please remove the "gen" from the last line of the loop. 
> 
> Nick
> [email protected] 
> 
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]]On Behalf Of Nick Cox
> > Sent: 19 June 2005 12:37
> > To: [email protected]
> > Subject: st: RE: RE: RE: unique value count in several variables
> > 
> > 
> > I too am fond of -levelsof-. For the problem mentioned, this would 
> > need to be embedded in a loop over groups, somewhat as follows:
> > 
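> > * nvals = number of distinct values of psic and ssic combined
> > * within each gvkey-year
> > gen nvals = . 
> > egen group = group(gvkey year)
> > su group, meanonly
> > qui forval i = 1/`r(max)' { 
> > 	* distinct nonmissing values of each variable in the current group
> > 	levelsof psic if group == `i', local(p) 
> > 	levelsof ssic if group == `i', local(s)
> > 	* union of the two lists, then the number of elements in it
> > 	local total: list s | p
> > 	local total: list uniq total
> > 	local count: list sizeof total
> > 	replace nvals = `count' if group == `i' 
> > }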
> > 
> > Nick
> > [email protected]
> > 
> > > -----Original Message-----
> > > From: [email protected]
> > > [mailto:[email protected]]On Behalf Of Scott Merryman
> > > Sent: 19 June 2005 12:30
> > > To: [email protected]
> > > Subject: st: RE: RE: unique value count in several variables
> > > 
> > > 
> > > In addition to Nick's suggestion of using -reshape-, another 
> > > possibility is to use -levelsof- and the macro extended functions 
> > > (assuming your cross sections are not too large):
> > > 
> > > 
> > > . l, noobs
> > > 
> > >   +------------------------------------+
> > >   | gvkey   psic   ssic   year   subno |
> > >   |------------------------------------|
> > >   |  1223   4767   4743   1999       1 |
> > >   |  1223   4767   4763   1999       2 |
> > >   |  1223   4757   4767   1999       3 |
> > >   |  1223   4767   4753   1999       4 |
> > >   |  1223   4777   4787   1999       5 |
> > >   |------------------------------------|
> > >   |  1223   4767   4743   1999       6 |
> > >   +------------------------------------+
> > > 
> > > . levelsof psic, local(p)
> > > 4757 4767 4777
> > > 
> > > . levelsof ssic, local(s)
> > > 4743 4753 4763 4767 4787
> > > 
> > > . local total: list s | p
> > > 
> > > . local total:list uniq total
> > > 
> > > . local count:list sizeof total
> > > 
> > > . gen nvals = `count'
> > > 
> > > . l, noobs
> > > 
> > >   +--------------------------------------------+
> > >   | gvkey   psic   ssic   year   subno   nvals |
> > >   |--------------------------------------------|
> > >   |  1223   4767   4743   1999       1       7 |
> > >   |  1223   4767   4763   1999       2       7 |
> > >   |  1223   4757   4767   1999       3       7 |
> > >   |  1223   4767   4753   1999       4       7 |
> > >   |  1223   4777   4787   1999       5       7 |
> > >   |--------------------------------------------|
> > >   |  1223   4767   4743   1999       6       7 |
> > >   +--------------------------------------------+
> > > 
> > > 
> > > Scott
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: [email protected]
> > > > [mailto:[email protected]] On Behalf Of Wanli Zhao
> > > > Sent: Saturday, June 18, 2005 3:17 PM
> > > > To: [email protected]
> > > > Subject: st: RE: unique value count in several variables
> > > > 
> > > > Thanks, Nick. I looked into the suggestions, and I think I might
> > > > have confused you about my problem. My panel data look like this:
> > > > Gvkey  psic  ssic  year  subno
> > > > 1223   4767  4743  1999  1
> > > > 1223   4767  4763  1999  2
> > > > 1223   4757  4767  1999  3
> > > > 1223   4767  4753  1999  4
> > > > 1223   4777  4787  1999  5
> > > > 1223   4767  4743  1999  6
> > > > 
> > > > Using the command -unique-, I can count the distinct values of psic
> > > > and ssic separately by gvkey and year: here 3 for psic and 5 for
> > > > ssic. What I want is to count the distinct values of psic and ssic
> > > > combined by gvkey and year. In this case it's 7 (4767, 4757, 4777,
> > > > 4743, 4763, 4753, 4787). How can I generate a new variable for
> > > > this? I hope this is clearer now. Please help.
> > > > 
> > > > Thanks.
> > > > Wanli Zhao
> > > > 
> > > 
> > > 
> > > *
> > > *   For searches and help try:
> > > *   http://www.stata.com/support/faqs/res/findit.html
> > > *   http://www.stata.com/support/statalist/faq
> > > *   http://www.ats.ucla.edu/stat/stata/
> > > 



© Copyright 1996–2024 StataCorp LLC