Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: xtdescribe and panel data
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: xtdescribe and panel data
Date: Thu, 8 Sep 2011 13:20:31 +0100

There is no need to modify any code. You can just apply standard Stata
logic and existing code.
Your problem falls into two parts.
1. Identifying panels with (at least) five consecutive observations
Once you have a pattern variable, the condition that an observation
belongs to a panel with at least five consecutive observations is
... if strpos(patternvar, "11111")
as "11111" will be included somewhere as a substring within the value
of the pattern variable. The condition that an observation belongs to
a panel with precisely five consecutive observations is
... if strpos(patternvar, "11111") & !strpos(patternvar, "111111")
and so forth. You implied that you wanted the second (precisely five),
but your examples make clear that you really want the first (at least five).
That's fine. You could -drop- any observations, and thus any panels,
that don't satisfy your criterion, but that would not reduce each
panel to its longest spell of consecutive observations.
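
As a minimal sketch of the two conditions, the four example patterns from
the question can be typed in directly and flagged. The variable names
atleast5 and exactly5 are made up for illustration. Note that the second
condition really means "contains a run of five but no run of six or more",
so a pattern such as 11111.111111 would fail it even though it includes a
run of exactly five.

```stata
* Sketch only: atleast5 and exactly5 are hypothetical names
clear
input str14 pattern
"......11111111"
"111..........."
"11....11111111"
"11..11111.1111"
end

* strpos() returns the position of the substring, or 0 if it is absent
gen byte atleast5 = strpos(pattern, "11111") > 0

* a run of five somewhere, and no run of six or more anywhere
gen byte exactly5 = strpos(pattern, "11111") > 0 & strpos(pattern, "111111") == 0

list, noobs
```

On these four patterns atleast5 is 1 for the first, third and fourth, and
exactly5 is 1 only for the fourth.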
2. Keeping just the longest spell of consecutive observations of at least
some length
A little searching turns up relevant material. See the help for
-tsspell- (SSC) and the FAQ
FAQ     . . . . . . Identifying runs of consecutive observations in panel data
        . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox and V. Wiggins
        8/02    How do I identify runs of consecutive observations
                in panel data?
                http://www.stata.com/support/faqs/data/panel.html
In fact the FAQ contains all that you really need to answer your
question. I will use -tsspell- (SSC) on an example.
. webuse abdata
You don't need -xtpatternvar- (previously posted in this thread) to see
what the data are like, but I will illustrate nevertheless.
. xtpatternvar, gen(pattern)
. tab pattern
    pattern |      Freq.     Percent        Cum.
------------+-----------------------------------
  ..1111111 |         14        1.36        1.36
  .1111111. |        273       26.48       27.84
  .11111111 |        152       14.74       42.58
  1111111.. |        434       42.10       84.68
  11111111. |         32        3.10       87.78
  111111111 |        126       12.22      100.00
------------+-----------------------------------
      Total |      1,031      100.00
. xtset
       panel variable:  id (unbalanced)
        time variable:  year, 1976 to 1984
                delta:  1 unit
The help for -tsspell- gives an example of identifying spells of
consecutive observations. The FAQ explains the logic.
. tsspell, f(L.year==.)
-tsspell- creates three new variables, by default _spell, _seq, _end.
. ds
c1        emp       indoutpt  k         yearm1    nL2       kL2       yr1976    yr1979    yr1982    pattern   _end
ind       wage      n         ys        id        wL1       ysL1      yr1977    yr1980    yr1983    _spell
year      cap       w         rec       nL1       kL1       ysL2      yr1978    yr1981    yr1984    _seq
The length of a spell is the highest value of _seq within that spell.
. egen length = max(_seq), by(id _spell)
The length of the _longest_ spell for any panel will be
. egen maxlength = max(_seq), by(id)
Now we can use any relevant condition(s) we like to select spells.
. keep if length == 8
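
For the problem as stated (keep only each panel's longest run, provided it
has at least five observations), the steps can be sketched without
-tsspell-, using the by: logic that the FAQ explains. This is one possible
version, not the only one. The names start, spell, length, maxlength and
first are made up for illustration, and the rule for ties (keep the
earliest of equally long runs) is an assumption, since the question did not
say what should happen then.

```stata
* Sketch only: all new variable names are hypothetical
webuse abdata, clear
xtset id year

* a run starts at the first observation of a panel or after a gap in year
bysort id (year): gen byte start = (_n == 1) | (year - year[_n-1] > 1)

* number the runs within each panel
by id: gen spell = sum(start)

* length of each run, and of the longest run in each panel
bysort id spell (year): gen length = _N
egen maxlength = max(length), by(id)

* assumed tie rule: of equally long runs, keep the earliest
egen first = min(cond(length == maxlength, spell, .)), by(id)

keep if spell == first & length >= 5
```

In abdata every panel is a single run of seven to nine years, as the
tabulation of pattern shows, so nothing is dropped here. The selection only
bites with gappy panels like those in the question.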
In other words, for your problem as now stated, you don't need my
-xtpatternvar- at all. But -tsspell- might come in handy. See also
SJ-7-2  dm0029  . . . . . . . . . . . . . . Speaking Stata: Identifying spells
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q2/07   SJ 7(2):249--265                                 (no commands)
        shows how to handle spells with complete control over
        spell specification
That's a discussion of principles; there is no reference to -tsspell-.
Nick
On Thu, Sep 8, 2011 at 12:40 PM, A. Berâ <[email protected]> wrote:
> Dear Dr. Cox,
>
> Thank you very much for your detailed and helpful response.
>
> May I ask one more question if you don't mind? Is it possible to
> modify your code as follows:
>
> Assume I would like to include in my analysis those firms that have n,
> say five, consecutive observations. So for the firms below, the first
> should be included; the second will not be included; for the third
> one, the first two years should be deleted and the last 8 years should
> be included; and for the last one, middle 5 observations will be
> included
>
> ......11111111
> 111...........
> 11....11111111
> 11..11111.1111
>
> Regards,
>
> a.b.
>
> On Tue, Sep 6, 2011 at 7:25 PM, Nick Cox <[email protected]> wrote:
>>
>> This is a fiddly calculation, so I packaged it in a more respectable
>> program. The main algorithm is simplified a bit too. Example first,
>> code later.
>>
>> . webuse abdata
>>
>> . xtset
>> panel variable: id (unbalanced)
>> time variable: year, 1976 to 1984
>>
>> . xtpatternvar , gen(pattern)
>>
>> . tab pattern
>>
>>     pattern |      Freq.     Percent        Cum.
>> ------------+-----------------------------------
>>   ..1111111 |         14        1.36        1.36
>>   .1111111. |        273       26.48       27.84
>>   .11111111 |        152       14.74       42.58
>>   1111111.. |        434       42.10       84.68
>>   11111111. |         32        3.10       87.78
>>   111111111 |        126       12.22      100.00
>> ------------+-----------------------------------
>>       Total |      1,031      100.00
>>
>>
>> *! NJC 1.0.0 6 Sept 2011
>> program xtpatternvar, sort
>>     version 9.2
>>     syntax [if] [in] , GENerate(name)
>>
>>     confirm new var `generate'
>>     local g `generate'
>>
>>     quietly {
>>         xtset
>>         local t `r(timevar)'
>>         local id `r(panelvar)'
>>
>>         marksample touse
>>         count if `touse'
>>         if r(N) == 0 error 2000
>>
>>         su `t' if `touse', meanonly
>>         local max = r(max)
>>         local min = r(min)
>>         local range = r(max) - r(min) + 1
>>
>>         if `range' > 244 {
>>             di as err "no go; patterns too long for str244"
>>             exit 498
>>         }
>>
>>         local miss : di _dup(`range') "."
>>
>>         bysort `touse' `id' (`t') : ///
>>             gen `g' = substr("`miss'", 1, `t'[1]-`min') + "1" if _n == 1
>>
>>         by `touse' `id' : replace `g' = ///
>>             substr("`miss'", 1, `t'- `t'[_n-1] - 1) + "1" if _n > 1
>>
>>         by `touse' `id': replace `g' = ///
>>             `g' + substr("`miss'", 1, `max'-`t'[_N]) if _n == _N
>>
>>         by `touse' `id' : replace `g' = `g'[_n-1] + `g' if _n > 1
>>
>>         by `touse' `id' : replace `g' = cond(`touse', `g'[_N], "")
>>
>>         compress `g'
>>     }
>> end
>>
>>
>>
>> On Tue, Sep 6, 2011 at 10:31 AM, Nick Cox <[email protected]> wrote:
>> > On Tue, Sep 6, 2011 at 9:12 AM, A. Berâ <[email protected]> wrote:
>> >
>> >> I have some panel data as described below. Few questions:
>> >>
>> >> 1. Can these data be analyzed by panel data methods? I would
>> >> appreciate any suggestions about a suitable approach for these data.
>> >
>> > You have panel data. You let slip that the panels are firms. Do
>> > something that makes economic sense.
>> > That seems all that can be advised.
>> >
>> >> 2. How can I delete firms that have a specific pattern? For example
>> >> how can I delete these type of firms: 1..........111 ?
>> >
>> > You can create a pattern variable like this.
>> >
>> > use http://www.stata-press.com/data/r10/xtdatasmpl.dta, clear
>> > xtset idcode year
>> > keep if idcode <= 5
>> > su year, meanonly
>> > local max = r(max)
>> > local min = r(min)
>> > local range = r(max) - r(min) + 1
>> > local miss : di _dup(`range') "."
>> > bysort idcode (year) : gen this = ///
>> >     substr("`miss'", 1, year[1]-`min') + "1" if _n == 1
>> > by idcode : replace this = ///
>> >     substr("`miss'", 1, year- year[_n-1] - 1) + "1" if _n > 1
>> > by idcode : replace this = ///
>> >     this + substr("`miss'", 1, `max'-year[_N]) if _n == _N
>> > by idcode : gen pattern = this[1]
>> > by idcode : replace pattern = pattern[_n-1] + this if _n > 1
>> > by idcode : replace pattern = pattern[_N]
>> > tab pattern
>> > xtdes
>> >
>> > After that you can do things conditionally on values of -pattern-.
>> >
>> >> 3. Is imputation appropriate if "holes" between years is more than one?
>> >
>> > You could interpolate. People usually don't with this kind of data.
>> >
>> >> Many thanks for any help.
>> >> --
>> >> abdullah berâ
>> >>
>> >>
>> >> . xtdescribe, patterns(1000)
>> >>
>> >> id: 2, 3, ..., 37376 n = 22997
>> >> date: 1996, 1997, ..., 2009 T = 14
>> >> Delta(date) = 1 unit
>> >> Span(date) = 14 periods
>> >> (id*date uniquely identifies each observation)
>> >>
>> >> Distribution of T_i:   min      5%     25%     50%     75%     95%     max
>> >>                          1       1       2       4       9      14      14
>> >>
>> >>      Freq.  Percent    Cum. |  Pattern
>> >>  ---------------------------+----------------
>> >>      3171     13.79   13.79 |  1.............
>> >>      2447     10.64   24.43 |  11111111111111
>> >>      1932      8.40   32.83 |  11............
>> >>      1471      6.40   39.23 |  ...........111
>> >>      1066      4.64   43.86 |  ..........1111
>> >
>> > <big snip>
>> >