Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: limiting time series data fills based on duration between observations

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: limiting time series data fills based on duration between observations
Date	Fri, 5 Jul 2013 09:48:57 +0100

The Statalist FAQ does spell out that you are asked to use full real
names for posting. I am guessing that "StataQ" really is not your
surname. Please note for future postings.

This is almost an FAQ:

http://www.stata.com/support/faqs/data-management/replacing-missing-values/

but the limit of three time periods adds a complication.

There are various ways to do it. Here is one:

. list

     +------------------------------+
     | company   employ~s      date |
     |------------------------------|
  1. |     ABC      10000    2010m1 |
  2. |     ABC      10100    2010m2 |
  3. |     ABC       9500    2010m5 |
  4. |     ABC       9600   2010m12 |
  5. |     DEF       2000    2009m5 |
     |------------------------------|
  6. |     DEF       2100   2009m10 |
  7. |     DEF       2300   2009m11 |
     +------------------------------+

. tsset company date
       panel variable:  company (unbalanced)
        time variable:  date, 2009m5 to 2010m12, but with gaps
                delta:  1 month

. tsfill

. gen miss1 = missing(employ) & !missing(L.employ)

. gen miss2 = missing(employ) & !missing(L2.employ)

. gen miss3 = missing(employ) & !missing(L3.employ)

. replace employ = L.employ if miss1|miss2|miss3
(8 real changes made)

. l

     +------------------------------------------------------+
     | company   employ~s      date   miss1   miss2   miss3 |
     |------------------------------------------------------|
  1. |     ABC      10000    2010m1       0       0       0 |
  2. |     ABC      10100    2010m2       0       0       0 |
  3. |     ABC      10100    2010m3       1       1       0 |
  4. |     ABC      10100    2010m4       0       1       1 |
  5. |     ABC       9500    2010m5       0       0       0 |
     |------------------------------------------------------|
  6. |     ABC       9500    2010m6       1       0       0 |
  7. |     ABC       9500    2010m7       0       1       0 |
  8. |     ABC       9500    2010m8       0       0       1 |
  9. |     ABC          .    2010m9       0       0       0 |
 10. |     ABC          .   2010m10       0       0       0 |
     |------------------------------------------------------|
 11. |     ABC          .   2010m11       0       0       0 |
 12. |     ABC       9600   2010m12       0       0       0 |
 13. |     DEF       2000    2009m5       0       0       0 |
 14. |     DEF       2000    2009m6       1       0       0 |
 15. |     DEF       2000    2009m7       0       1       0 |
     |------------------------------------------------------|
 16. |     DEF       2000    2009m8       0       0       1 |
 17. |     DEF          .    2009m9       0       0       0 |
 18. |     DEF       2100   2009m10       0       0       0 |
 19. |     DEF       2300   2009m11       0       0       0 |
     +------------------------------------------------------+
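The listing above carries a value forward for up to three months even when the next reading is more than three months away (ABC 2010m6 to 2010m8, DEF 2009m6 to 2009m8), whereas the requested output leaves those months missing. Below is a minimal sketch that fills only gaps of at most three months between readings. It assumes the data can be re-entered as shown, with a numeric panel identifier created by -encode-; the names id, prevdate, nextdate and negdate are hypothetical. The backward carry uses a negated time variable, as in the FAQ cited above.

```stata
* Sketch: carry a reading forward only across gaps of at most 3 months.
* Variable names (employees, id, prevdate, nextdate, negdate) are hypothetical.
clear
input str3 company employees year month
"ABC" 10000 2010  1
"ABC" 10100 2010  2
"ABC"  9500 2010  5
"ABC"  9600 2010 12
"DEF"  2000 2009  5
"DEF"  2100 2009 10
"DEF"  2300 2009 11
end
generate date = ym(year, month)
format date %tm
encode company, generate(id)
tsset id date

* Record the date of each actual reading before -tsfill- adds the gap months
generate prevdate = date if !missing(employees)
generate nextdate = date if !missing(employees)
tsfill

* Carry the date of the previous reading forward within each company
bysort id (date): replace prevdate = prevdate[_n-1] if missing(prevdate)

* Carry the date of the next reading backward by sorting on negated time
generate negdate = -date
bysort id (negdate): replace nextdate = nextdate[_n-1] if missing(nextdate)

* Copy the last value forward only where the surrounding readings are <= 3 months apart
bysort id (date): replace employees = employees[_n-1] if missing(employees) & nextdate - prevdate <= 3

list id employees date, sepby(id)
```

With these data only ABC 2010m3 and 2010m4 are filled (with 10100); the longer gaps stay missing, matching the output requested in the question.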

All that said, this is a rather arbitrary interpolation method. For
other possibilities, see

-ipolate- (official)
-cipolate- (SSC)
-csipolate- (SSC)
-pchipolate- (SSC)

I particularly recommend -pchipolate-.
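As one example of these alternatives, here is a minimal sketch of straight-line interpolation with the official -ipolate-, run separately within each company. The data entry and the names id and employees_lin are hypothetical. Unlike the requirement in the question, this interpolates across every gap, whatever its length.

```stata
* Sketch: linear interpolation with official -ipolate- (variable names hypothetical)
clear
input str3 company employees year month
"ABC" 10000 2010  1
"ABC" 10100 2010  2
"ABC"  9500 2010  5
"ABC"  9600 2010 12
"DEF"  2000 2009  5
"DEF"  2100 2009 10
"DEF"  2300 2009 11
end
generate date = ym(year, month)
format date %tm
encode company, generate(id)
tsset id date

* Add the missing months so that there are rows to interpolate into
tsfill

* Interpolate employees on date within each company
bysort id (date): ipolate employees date, generate(employees_lin)

list id date employees employees_lin, sepby(id)
```

For ABC, the interpolated values for 2010m3 and 2010m4 are 9900 and 9700, on the straight line from 10100 (2010m2) to 9500 (2010m5).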



Nick
[email protected]


On 5 July 2013 04:20, Ethan StataQ <[email protected]> wrote:
> I have data that looks like this:
>
> Company   Employees   YearMonth
> ABC          10,000   2010m1
> ABC          10,100   2010m2
> ABC           9,500   2010m5
> ABC           9,600   2010m12
> DEF           2,000   2009m5
> DEF           2,100   2009m10
> DEF           2,300   2009m11
>
> I would like to create a monthly time series from these data in which the
> number of employees is presumed to remain unchanged during the months
> between observations. However, if the gap between observations exceeds
> 3 months, I do not want to make this assumption; in that case the number
> of employees should be represented as "." instead, so that the above data
> appear as follows:
>
> Company   Employees   YearMonth
> ABC          10,000   2010m1
> ABC          10,100   2010m2
> ABC          10,100   2010m3
> ABC          10,100   2010m4
> ABC           9,500   2010m5
> ABC               .   2010m6
> ABC               .   2010m7
> ABC               .   2010m8
> ABC               .   2010m9
> ABC               .   2010m10
> ABC               .   2010m11
> ABC           9,600   2010m12
> DEF           2,000   2009m5
> DEF               .   2009m6
> DEF               .   2009m7
> DEF               .   2009m8
> DEF               .   2009m9
> DEF           2,100   2009m10
> DEF           2,300   2009m11
>
> The idea is that I presume the number of employees remains the same as
> the last reading as long as a new reading occurs within 3 months.
>
> I would really appreciate some help in figuring out how to achieve
> this. If this question has already been answered, a link to the answer
> would be greatly appreciated.