Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Identifying first observation in each panel after regression
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Identifying first observation in each panel after regression
Date: Tue, 5 Jun 2012 00:00:49 +0100
It should make absolutely no difference whether you do this before or
after a regression. I think we need to see evidence of what you think
is happening in terms of a dataset you provide in its entirety or
using a dataset downloadable by all. Otherwise I'd advise taking up
your puzzlement with Stata tech-support. They would want a copy of
your dataset.
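
A minimal sketch of how this could be checked on a dataset downloadable by all. It assumes the Grunfeld data and an arbitrary fixed-effects model; the variable names first_before, first_after, insample and first_insample are illustrative only and do not come from your code. Naming year as a secondary sort key, in parentheses, makes "first" mean the earliest year, whereas sorting on the panel identifier alone leaves the order within each panel unspecified.

```stata
* Sketch only: Grunfeld data stand in for the real panel
webuse grunfeld, clear

* Make the panel unbalanced, so that some companies start in 1936
drop if year == 1935 & mod(company, 2)
xtset company year

* Flag the first observation of each company before the regression
bysort company (year) : gen byte first_before = _n == 1

* An arbitrary fixed-effects regression, for illustration
xtreg invest mvalue kstock, fe

* The same flag, created after the regression
bysort company (year) : gen byte first_after = _n == 1

* Stata stops with an error here if the two flags ever differ
assert first_before == first_after

* First observation of each company within the estimation sample
gen byte insample = e(sample)
bysort company insample (year) : gen byte first_insample = (insample == 1) & (_n == 1)
count if first_insample
```

In a dataset with missing values, the first observation of a company need not be in the estimation sample, which is why the last flag is computed within groups defined by e(sample) as well as by company.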
On Mon, Jun 4, 2012 at 11:43 PM, Ivan Png <[email protected]> wrote:
> What I don't understand: Why the
>
> . by gvkey , sort : gen flag = 1 if _n ==1
>
> works when I invoke it before the regression (it then picks up the
> first observation of each company), but not when I invoke it after the
> regression (it misses many companies).
>
> I used exactly the same command in both cases.
>
>
> On 4 June 2012 18:31, Nick Cox <[email protected]> wrote:
>> Which bit don't you understand?
>>
>> On Mon, Jun 4, 2012 at 11:16 PM, Ivan Png <[email protected]> wrote:
>>> Dear Nick--
>>>
>>> Many thanks for your hint. I found the solution. I execute
>>> . by gvkey , sort: gen flag = 1 if _n == 1
>>> before the regression.
>>>
>>> Then, after the regression, I execute
>>> . gen regsample = 1 if e(sample)
>>>
>>> And, to identify the first observation of each company in the
>>> regression sample, I use
>>> regsample == 1 & flag == 1
>>>
>>> However, I still don't understand the reason it works.
>>>
>>>
>>> On 4 June 2012 14:24, Nick Cox <[email protected]> wrote:
>>>> What code do you mean by "the code below"?
>>>>
>>>> I suspect there's something else up with your dataset that leads to
>>>> what you see. Examine the data omitted by
>>>>
>>>> . edit if !e(sample)
>>>>
>>>> after your -xtreg- command.
>>>>
>>>> Nick
>>>>
>>>> On Mon, Jun 4, 2012 at 6:44 PM, Ivan Png <[email protected]> wrote:
>>>>> Many thanks, Nick. Incidentally, thanks for the yeoman service to all
>>>>> STATAlisters.
>>>>>
>>>>> The discrepancy I found was by using xtreg to run a fixed-effects
>>>>> regression on the sample. xtreg reported 2773 companies. Yet, when I
>>>>> used the code below on the regression sample, I got only 1048
>>>>> companies. So, the only reason I could think of was that the flag
>>>>> identified only companies that were present in year 1.
>>>>
>>>> On 4 June 2012 13:21, Nick Cox <[email protected]> wrote:
>>>>
>>>>>> Your code looks fine to me, so I have difficulty understanding why you think it doesn't work.
>>>>>>
>>>>>> The -sort- on the second command is unnecessary given the previous command, but I don't see that it will change the sort order.
>>>>>>
>>>>>> You can check logic in terms of this example:
>>>>>>
>>>>>> . webuse grunfeld
>>>>>>
>>>>>> . su year
>>>>>>
>>>>>>     Variable |       Obs        Mean    Std. Dev.       Min        Max
>>>>>> -------------+--------------------------------------------------------
>>>>>>         year |       200      1944.5     5.780751      1935       1954
>>>>>>
>>>>>> . drop if year == 1935 & mod(company, 2)
>>>>>> (5 observations deleted)
>>>>>>
>>>>>> . tab year
>>>>>>
>>>>>>        year |      Freq.     Percent        Cum.
>>>>>> ------------+-----------------------------------
>>>>>>        1935 |          5        2.56        2.56
>>>>>>        1936 |         10        5.13        7.69
>>>>>>        1937 |         10        5.13       12.82
>>>>>>        1938 |         10        5.13       17.95
>>>>>>        1939 |         10        5.13       23.08
>>>>>>        1940 |         10        5.13       28.21
>>>>>>        1941 |         10        5.13       33.33
>>>>>>        1942 |         10        5.13       38.46
>>>>>>        1943 |         10        5.13       43.59
>>>>>>        1944 |         10        5.13       48.72
>>>>>>        1945 |         10        5.13       53.85
>>>>>>        1946 |         10        5.13       58.97
>>>>>>        1947 |         10        5.13       64.10
>>>>>>        1948 |         10        5.13       69.23
>>>>>>        1949 |         10        5.13       74.36
>>>>>>        1950 |         10        5.13       79.49
>>>>>>        1951 |         10        5.13       84.62
>>>>>>        1952 |         10        5.13       89.74
>>>>>>        1953 |         10        5.13       94.87
>>>>>>        1954 |         10        5.13      100.00
>>>>>> ------------+-----------------------------------
>>>>>>       Total |        195      100.00
>>>>>>
>>>>>> . bysort company (year) : gen first = _n == 1
>>>>>>
>>>>>> . l company year if first
>>>>>>
>>>>>>      +----------------+
>>>>>>      | company   year |
>>>>>>      |----------------|
>>>>>>   1. |       1   1936 |
>>>>>>  20. |       2   1935 |
>>>>>>  40. |       3   1936 |
>>>>>>  59. |       4   1935 |
>>>>>>  79. |       5   1936 |
>>>>>>      |----------------|
>>>>>>  98. |       6   1935 |
>>>>>> 118. |       7   1936 |
>>>>>> 137. |       8   1935 |
>>>>>> 157. |       9   1936 |
>>>>>> 176. |      10   1935 |
>>>>>>      +----------------+
>>>>>>
>>>>>> Nick
>>>>>> [email protected]
>>>>>>
>>>>>> Ivan Png
>>>>>>
>>>>>> I am analyzing an unbalanced panel of company data, organized by
>>>>>> company (gvkey) and year. I want to create a flag to the first
>>>>>> observation of each company in the panel. I tried
>>>>>>
>>>>>> . sort gvkey year
>>>>>> . by gvkey , sort: gen flag = 1 if _n == 1
>>>>>>
>>>>>> However, this set flag = 1 only if a company was present in year 1
>>>>>> of the panel. It missed any company that first appeared in later years.
>>>>>>
>>>>>> I searched statalist and found this:
>>>>>> http://www.stata.com/statalist/archive/2005-04/msg00334.html
>>>>>>
>>>>>> But it doesn't work. I'd be grateful for any relevant help.
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
>
>
>
> --
> Best wishes
> Ivan Png
> Skype: ipng00
>