Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Ivan Png <iplpng@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | st: Identifying first observation in each panel of unbalanced panel |
Date | Mon, 4 Jun 2012 13:44:35 -0400 |
Many thanks, Nick. Incidentally, thanks for the yeoman service to all
STATAlisters.

The discrepancy I found was by using xtreg to run a fixed-effects
regression on the sample. xtreg reported 2773 companies. Yet, when I
used the code below on the regression sample, I got only 1048
companies. So, the only reason I could think of was that the flag
identified only companies that were present in year 1.

On 4 June 2012 13:21, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> Your code looks fine to me, so I have difficulty understanding why you think it doesn't work.
>
> The -sort- on the second command is unnecessary given the previous command, but I don't see that it will change the sort order.
>
> You can check logic in terms of this example:
>
> . webuse grunfeld
>
> . su year
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>         year |       200      1944.5    5.780751       1935       1954
>
> . drop if year == 1935 & mod(company, 2)
> (5 observations deleted)
>
> . tab year
>
>        year |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>        1935 |          5        2.56        2.56
>        1936 |         10        5.13        7.69
>        1937 |         10        5.13       12.82
>        1938 |         10        5.13       17.95
>        1939 |         10        5.13       23.08
>        1940 |         10        5.13       28.21
>        1941 |         10        5.13       33.33
>        1942 |         10        5.13       38.46
>        1943 |         10        5.13       43.59
>        1944 |         10        5.13       48.72
>        1945 |         10        5.13       53.85
>        1946 |         10        5.13       58.97
>        1947 |         10        5.13       64.10
>        1948 |         10        5.13       69.23
>        1949 |         10        5.13       74.36
>        1950 |         10        5.13       79.49
>        1951 |         10        5.13       84.62
>        1952 |         10        5.13       89.74
>        1953 |         10        5.13       94.87
>        1954 |         10        5.13      100.00
> ------------+-----------------------------------
>       Total |        195      100.00
>
> . bysort company (year) : gen first = _n == 1
>
> . l company year if first
>
>      +----------------+
>      | company   year |
>      |----------------|
>   1. |       1   1936 |
>  20. |       2   1935 |
>  40. |       3   1936 |
>  59. |       4   1935 |
>  79. |       5   1936 |
>      |----------------|
>  98. |       6   1935 |
> 118. |       7   1936 |
> 137. |       8   1935 |
> 157. |       9   1936 |
> 176. |      10   1935 |
>      +----------------+
>
> Nick
> n.j.cox@durham.ac.uk
>
> Ivan Png
>
> I am analyzing an unbalanced panel of company data, organized by
> company (gvkey) and year. I want to create a flag to the first
> observation of each company in the panel. I tried
>
> . sort gvkey year
> . by gvkey , sort: gen flag = 1 if _n == 1
>
> However, this only flagged flag = 1 if a company was present in year 1
> of the panel. It missed any company that appeared in later years.
>
> I searched statalist and found this:
> http://www.stata.com/statalist/archive/2005-04/msg00334.html
>
> But it doesn't work. I'd be grateful for any relevant help.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
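
One way to investigate a gap like 2773 versus 1048 is to compare a first-observation flag built on all observations with one built only within the estimation sample. The sketch below is a guess at a possible cause, not something established in this thread: if a company's first observation is dropped from the regression (for example, because of missing values), a flag created on the full data will not mark any observation for that company inside the regression sample. The Grunfeld data stand in for the gvkey-year panel, the missing values in mvalue are introduced artificially for illustration, and the variable names insample, first_all and first_est are hypothetical.

```stata
* Sketch only: Grunfeld data stand in for the gvkey-year panel.
webuse grunfeld, clear
drop if year == 1935 & mod(company, 2)

* Artificial assumption: make mvalue missing in 1935 so that the first
* observation of some companies drops out of the regression.
replace mvalue = . if year == 1935

xtset company year
xtreg invest mvalue kstock, fe

* Mark the estimation sample before doing anything else.
gen byte insample = e(sample)

* Flag built on ALL observations, then counted within the sample.
bysort company (year): gen byte first_all = _n == 1
count if first_all & insample
scalar n_naive = r(N)

* Flag built WITHIN the estimation sample only.
bysort insample company (year): gen byte first_est = insample & _n == 1
count if first_est
scalar n_est = r(N)

display "flag from full data, counted in sample: " n_naive
display "flag built within sample:               " n_est
display "groups reported by xtreg:               " e(N_g)
```

If the first two counts differ, the flag itself is fine and the discrepancy comes from which observations enter the regression. Sorting on insample first means _n restarts within each company separately for included and excluded observations, so the second flag should agree with the number of groups xtreg reports.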