Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: limiting time series data fills based on duration between observations

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: limiting time series data fills based on duration between observations
Date	Fri, 5 Jul 2013 11:25:04 +0100

Another approach is to install -tsspell- from SSC, use it to identify
spells of missing values, and then use the variable -_seq- that it
creates to restrict replacement to the first 3 observations in each spell.

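* declare the panel structure, then add the months missing within each company
tsset company date
tsfill
* tag spells of missing values; _seq counts 1, 2, 3, ... within each spell
tsspell, cond(missing(employ))
* carry the last value forward, but only for the first 3 months of a spell
replace employ = L.employ if inrange(_seq,1,3)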
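
Note that the code above fills the first 3 months of every spell, whereas
the output requested in the original question leaves a gap entirely
missing when two readings are more than 3 months apart. A minimal sketch
of that stricter rule follows, to be run on the original data in place of
the code above. It assumes that -tsspell- is installed, that -company- is
a numeric panel identifier, and that the only missing values are those
created by -tsfill-. The variable name -spell_len- is made up for
illustration.

```stata
tsset company date
tsfill
tsspell, cond(missing(employ))
* length of each spell of missing months (0 for observed months)
egen spell_len = max(_seq), by(company _spell)
* readings at most 3 months apart leave at most 2 missing months between them
replace employ = L.employ if _seq > 0 & spell_len <= 2
```

With the example data this should fill only 2010m3 and 2010m4 for ABC
and leave the longer gaps as missing.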

Nick
[email protected]


On 5 July 2013 09:48, Nick Cox <[email protected]> wrote:
> The Statalist FAQ does spell out that you are asked to use full real
> names for posting. I am guessing that "StataQ" really is not your
> surname. Please note for future postings.
>
> This is almost an FAQ:
>
> http://www.stata.com/support/faqs/data-management/replacing-missing-values/
>
> but the restriction to only three time slots adds a complication.
>
> There are various ways to do it. Here is one:
>
> . list
>
>      +------------------------------+
>      | company   employ~s      date |
>      |------------------------------|
>   1. |     ABC      10000    2010m1 |
>   2. |     ABC      10100    2010m2 |
>   3. |     ABC       9500    2010m5 |
>   4. |     ABC       9600   2010m12 |
>   5. |     DEF       2000    2009m5 |
>      |------------------------------|
>   6. |     DEF       2100   2009m10 |
>   7. |     DEF       2300   2009m11 |
>      +------------------------------+
>
> . tsset company date
>        panel variable:  company (unbalanced)
>         time variable:  date, 2009m5 to 2010m12, but with gaps
>                 delta:  1 month
>
> . tsfill
>
> . gen miss1 = missing(employ) & !missing(L.employ)
>
> . gen miss2 = missing(employ) & !missing(L2.employ)
>
> . gen miss3 = missing(employ) & !missing(L3.employ)
>
> . replace employ = L.employ if miss1|miss2|miss3
> (8 real changes made)
>
> . l
>
>      +------------------------------------------------------+
>      | company   employ~s      date   miss1   miss2   miss3 |
>      |------------------------------------------------------|
>   1. |     ABC      10000    2010m1       0       0       0 |
>   2. |     ABC      10100    2010m2       0       0       0 |
>   3. |     ABC      10100    2010m3       1       1       0 |
>   4. |     ABC      10100    2010m4       0       1       1 |
>   5. |     ABC       9500    2010m5       0       0       0 |
>      |------------------------------------------------------|
>   6. |     ABC       9500    2010m6       1       0       0 |
>   7. |     ABC       9500    2010m7       0       1       0 |
>   8. |     ABC       9500    2010m8       0       0       1 |
>   9. |     ABC          .    2010m9       0       0       0 |
>  10. |     ABC          .   2010m10       0       0       0 |
>      |------------------------------------------------------|
>  11. |     ABC          .   2010m11       0       0       0 |
>  12. |     ABC       9600   2010m12       0       0       0 |
>  13. |     DEF       2000    2009m5       0       0       0 |
>  14. |     DEF       2000    2009m6       1       0       0 |
>  15. |     DEF       2000    2009m7       0       1       0 |
>      |------------------------------------------------------|
>  16. |     DEF       2000    2009m8       0       0       1 |
>  17. |     DEF          .    2009m9       0       0       0 |
>  18. |     DEF       2100   2009m10       0       0       0 |
>  19. |     DEF       2300   2009m11       0       0       0 |
>      +------------------------------------------------------+
>
> All that said, this is a rather arbitrary interpolation method. For
> other possibilities, see
>
> ipolate (official)
> cipolate (SSC)
> csipolate (SSC)
> pchipolate (SSC)
>
> I particularly recommend -pchipolate-.
>
>
>
> Nick
> [email protected]
>
>
> On 5 July 2013 04:20, Ethan StataQ <[email protected]> wrote:
>> I have data that looks like this:
>>
>> Company   Employees   YearMonth
>> ABC       10,000      2010m1
>> ABC       10,100      2010m2
>> ABC       9,500       2010m5
>> ABC       9,600       2010m12
>> DEF       2,000       2009m5
>> DEF       2,100       2009m10
>> DEF       2,300       2009m11
>>
>> I would like to create a time series of this data such that the number
>> of employees is presumed to remain unchanged during the months between
>> observations. However, if the number of months between observations
>> exceeds 3 months then I do not want to make this assumption. In this
>> case the number of employees should be represented as "." instead such
>> that the above data appears as follows:
>>
>>
>> Company   Employees   YearMonth
>> ABC       10,000      2010m1
>> ABC       10,100      2010m2
>> ABC       10,100      2010m3
>> ABC       10,100      2010m4
>> ABC       9,500       2010m5
>> ABC       .           2010m6
>> ABC       .           2010m7
>> ABC       .           2010m8
>> ABC       .           2010m9
>> ABC       .           2010m10
>> ABC       .           2010m11
>> ABC       9,600       2010m12
>> DEF       2,000       2009m5
>> DEF       .           2009m6
>> DEF       .           2009m7
>> DEF       .           2009m8
>> DEF       .           2009m9
>> DEF       2,100       2009m10
>> DEF       2,300       2009m11
>>
>>
>> The idea is that I presume the number of employees remains the same as
>> the last reading as long as a new reading occurs within 3 months.
>>
>> I would really appreciate some help in figuring out how to achieve
>> this. If this question has been answered, a redirect url would be
>> greatly appreciated.
