Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Identifying first observation in each panel after regression
From: Steve Samuels <[email protected]>
To: [email protected]
Subject: Re: st: Identifying first observation in each panel after regression
Date: Tue, 5 Jun 2012 06:30:20 -0400
One likely reason: some of your covariates have missing values. If a
missing value falls in a company's first observation (year), that
observation is excluded from the analysis sample even when other
observations from the company are included. The flag then sits on an
excluded observation, so that company is not counted.
Steve
[email protected]
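
A minimal sketch of this mechanism, using the Grunfeld data that appears later in this thread. The missing values are planted artificially for illustration, and the variable names insample and first are assumptions, not taken from the original do file.

```stata
* Illustration only: reproduce the discrepancy on the Grunfeld panel
webuse grunfeld, clear
xtset company year

* Assumption: make a covariate missing in the first year of odd-numbered companies
replace mvalue = . if year == 1935 & mod(company, 2)

xtreg invest mvalue kstock, fe        // still reports 10 groups
gen byte insample = e(sample)

* Flag the first observation of each company over ALL observations
bysort company (year): gen byte first = _n == 1

count if first & insample             // 5: only companies whose first row was used
count if first & !insample            // 5: first rows dropped because mvalue is missing

* Inspect the flagged rows that fell outside the estimation sample
list company year mvalue if first & !insample
```

Here -xtreg- counts all 10 companies, but only 5 flagged rows lie inside e(sample), which is the same pattern as 2773 versus 1048.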
On Jun 5, 2012, at 5:50 AM, Ivan Png wrote:
Many thanks. Sorry, you are right; I misstated the problem. What I meant was that
when I run the regression, it shows 2773 groups (companies). But when I run
. gen rdsample = 1 if e(sample)
. by gvkey , sort : gen flag = 1 if _n == 1
/* flag first observation of each company */
. su year if flag == 1 & rdsample == 1
It indicates 1048 unique companies. I do not understand where the
other 2773 - 1048 = 1725 companies are.
Anyhow, a friend just suggested the following (and it works):
. sort rdsample gvkey year
. by rdsample gvkey , sort: gen flag = 1 if rdsample == 1 & _n == 1
. su year if flag == 1
This shows 2773 companies. I just do not understand why.
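
A sketch of why the second version counts every company: with rdsample in the by-list, _n restarts within each combination of rdsample and gvkey, so the flag lands on the first observation that was actually used in the regression. The example below uses the Grunfeld data with artificially planted missing values; company stands in for gvkey, and the names flag and tag are illustrative. Putting year in parentheses makes the within-company order explicit rather than relying on an earlier -sort-.

```stata
* Illustration only: flag the first observation WITHIN the estimation sample
webuse grunfeld, clear
xtset company year

* Assumption: plant missing values in the first year of odd-numbered companies
replace mvalue = . if year == 1935 & mod(company, 2)

xtreg invest mvalue kstock, fe
gen byte rdsample = e(sample)

* _n restarts within each (rdsample, company) group, ordered by year
bysort rdsample company (year): gen byte flag = rdsample & _n == 1
count if flag                         // 10, matching the groups reported by -xtreg-
list company year if flag             // odd companies start in 1936, even in 1935

* If only the count matters, egen's tag() marks one observation per company
egen byte tag = tag(company) if rdsample
count if tag                          // 10
```

Note that tag() marks one observation per company, not necessarily the earliest year, so it suits counting rather than identifying the first observation.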
On 4 June 2012 22:36, Steve Samuels <[email protected]> wrote:
>
> Correction: the "flag2" statement is run after the regression.
>
>
>
> Your claim of discrepancy is false, and you did not test it in the do file, which runs the "by gvkey:" statement only after -xtreg-.
>
> When I run your do file with:
>
> . by gvkey , sort : gen flag1 = 1 if _n ==1 // before the xtreg statement
>
> . by gvkey , sort : gen flag2 = 1 if _n ==1 // after the xtreg statement
>
> . tab flag1 flag2, missing
>
>            |         flag2
>      flag1 |         1          . |     Total
> -----------+----------------------+----------
>          1 |     6,982          0 |     6,982
>          . |         0     70,797 |    70,797
> -----------+----------------------+----------
>      Total |     6,982     70,797 |    77,779
>
>
>
> Steve
> [email protected]
>
> On Jun 4, 2012, at 8:13 PM, Ivan Png wrote:
>
> Thanks, Nick.
>
> Here's the code
> https://docs.google.com/open?id=0Bxt3Gm6VpSgiZmJkRUZUUktJQzA
>
> And here's the data
> https://docs.google.com/open?id=0Bxt3Gm6VpSgiNFhhV3dsang4b3M
>
>
>
> On 4 June 2012 19:00, Nick Cox <[email protected]> wrote:
>> It should make absolutely no difference whether you do this before or
>> after a regression. I think we need to see evidence of what you think
>> is happening in terms of a dataset you provide in its entirety or
>> using a dataset downloadable by all. Otherwise I'd advise taking up
>> your puzzlement with Stata tech-support. They would want a copy of
>> your dataset.
>>
>> On Mon, Jun 4, 2012 at 11:43 PM, Ivan Png <[email protected]> wrote:
>>> What I don't understand: Why the
>>>
>>> . by gvkey , sort : gen flag = 1 if _n ==1
>>>
>>> works when I invoke it before the regression (it then picks up the
>>> first observation of each company), but not when I invoke it after the
>>> regression (it misses many companies).
>>>
>>> I used exactly the same command in both cases.
>>>
>>>
>>> On 4 June 2012 18:31, Nick Cox <[email protected]> wrote:
>>>> Which bit don't you understand?
>>>>
>>>> On Mon, Jun 4, 2012 at 11:16 PM, Ivan Png <[email protected]> wrote:
>>>>> Dear Nick--
>>>>>
>>>>> Many thanks for your hint. I found the solution. I execute
>>>>> . by gvkey , sort: gen flag = 1 if _n == 1
>>>>> before the regression.
>>>>>
>>>>> Then, after the regression, I execute
>>>>> . gen regsample = 1 if e(sample)
>>>>>
>>>>> And, to identify the first observation of each company in the
>>>>> regression sample, I use
>>>>> regsample == 1 & flag == 1
>>>>>
>>>>> However, I still don't understand the reason it works.
>>>>>
>>>>>
>>>>> On 4 June 2012 14:24, Nick Cox <[email protected]> wrote:
>>>>>> What code do you mean by "the code below"?
>>>>>>
>>>>>> I suspect there's something else up with your dataset that leads to
>>>>>> what you see. Examine the data omitted by
>>>>>>
>>>>>> . edit if !e(sample)
>>>>>>
>>>>>> after your -xtreg- command.
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>> On Mon, Jun 4, 2012 at 6:44 PM, Ivan Png <[email protected]> wrote:
>>>>>>> Many thanks, Nick. Incidentally, thanks for the yeoman service to all
>>>>>>> STATAlisters.
>>>>>>>
>>>>>>> The discrepancy I found was by using xtreg to run a fixed-effects
>>>>>>> regression on the sample. xtreg reported 2773 companies. Yet, when I
>>>>>>> used the code below on the regression sample, I got only 1048
>>>>>>> companies. So, the only reason I could think of was that the flag
>>>>>>> identified only companies that were present in year 1.
>>>>>>
>>>>>> On 4 June 2012 13:21, Nick Cox <[email protected]> wrote:
>>>>>>
>>>>>>>> Your code looks fine to me, so I have difficulty understanding why you think it doesn't work.
>>>>>>>>
>>>>>>>> The -sort- on the second command is unnecessary given the previous command, but I don't see that it will change the sort order.
>>>>>>>>
>>>>>>>> You can check logic in terms of this example:
>>>>>>>>
>>>>>>>> . webuse grunfeld
>>>>>>>>
>>>>>>>> . su year
>>>>>>>>
>>>>>>>>     Variable |       Obs        Mean    Std. Dev.       Min        Max
>>>>>>>> -------------+--------------------------------------------------------
>>>>>>>>         year |       200      1944.5    5.780751       1935       1954
>>>>>>>>
>>>>>>>> . drop if year == 1935 & mod(company, 2)
>>>>>>>> (5 observations deleted)
>>>>>>>>
>>>>>>>> . tab year
>>>>>>>>
>>>>>>>>        year |      Freq.     Percent        Cum.
>>>>>>>> ------------+-----------------------------------
>>>>>>>>        1935 |          5        2.56        2.56
>>>>>>>>        1936 |         10        5.13        7.69
>>>>>>>>        1937 |         10        5.13       12.82
>>>>>>>>        1938 |         10        5.13       17.95
>>>>>>>>        1939 |         10        5.13       23.08
>>>>>>>>        1940 |         10        5.13       28.21
>>>>>>>>        1941 |         10        5.13       33.33
>>>>>>>>        1942 |         10        5.13       38.46
>>>>>>>>        1943 |         10        5.13       43.59
>>>>>>>>        1944 |         10        5.13       48.72
>>>>>>>>        1945 |         10        5.13       53.85
>>>>>>>>        1946 |         10        5.13       58.97
>>>>>>>>        1947 |         10        5.13       64.10
>>>>>>>>        1948 |         10        5.13       69.23
>>>>>>>>        1949 |         10        5.13       74.36
>>>>>>>>        1950 |         10        5.13       79.49
>>>>>>>>        1951 |         10        5.13       84.62
>>>>>>>>        1952 |         10        5.13       89.74
>>>>>>>>        1953 |         10        5.13       94.87
>>>>>>>>        1954 |         10        5.13      100.00
>>>>>>>> ------------+-----------------------------------
>>>>>>>>       Total |        195      100.00
>>>>>>>>
>>>>>>>> . bysort company (year) : gen first = _n == 1
>>>>>>>>
>>>>>>>> . l company year if first
>>>>>>>>
>>>>>>>>      +----------------+
>>>>>>>>      | company   year |
>>>>>>>>      |----------------|
>>>>>>>>   1. |       1   1936 |
>>>>>>>>  20. |       2   1935 |
>>>>>>>>  40. |       3   1936 |
>>>>>>>>  59. |       4   1935 |
>>>>>>>>  79. |       5   1936 |
>>>>>>>>      |----------------|
>>>>>>>>  98. |       6   1935 |
>>>>>>>> 118. |       7   1936 |
>>>>>>>> 137. |       8   1935 |
>>>>>>>> 157. |       9   1936 |
>>>>>>>> 176. |      10   1935 |
>>>>>>>>      +----------------+
>>>>>>>>
>>>>>>>> Nick
>>>>>>>> [email protected]
>>>>>>>>
>>>>>>>> Ivan Png
>>>>>>>>
>>>>>>>> I am analyzing an unbalanced panel of company data, organized by
>>>>>>>> company (gvkey) and year. I want to create a flag for the first
>>>>>>>> observation of each company in the panel. I tried
>>>>>>>>
>>>>>>>> . sort gvkey year
>>>>>>>> . by gvkey , sort: gen flag = 1 if _n == 1
>>>>>>>>
>>>>>>>> However, this set flag = 1 only if a company was present in year 1
>>>>>>>> of the panel. It missed any company that first appeared in later years.
>>>>>>>>
>>>>>>>> I searched statalist and found this:
>>>>>>>> http://www.stata.com/statalist/archive/2005-04/msg00334.html
>>>>>>>>
>>>>>>>> But it doesn't work. I'd be grateful for any relevant help.
>>>>
>>>> *
>>>> * For searches and help try:
>>>> * http://www.stata.com/help.cgi?search
>>>> * http://www.stata.com/support/statalist/faq
>>>> * http://www.ats.ucla.edu/stat/stata/
>>>
>>>
>>>
>>> --
>>> Best wishes
>>> Ivan Png
>>> Skype: ipng00