Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: limiting time series data fills based on duration between observations
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: limiting time series data fills based on duration between observations
Date: Fri, 5 Jul 2013 11:25:04 +0100
Another approach is to install -tsspell- from SSC to identify spells of
missing values, and then to use the variable -_seq- that it creates to
restrict replacement to the first 3 observations in each spell.
tsset company date
tsfill
tsspell, cond(missing(employ))
replace employ = L.employ if inrange(_seq,1,3)
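Note that the code above fills the first 3 months of every spell, even
when the spell is longer than 3 months. If a long spell should instead be
left entirely missing, as in the layout shown in the original question,
something like the following sketch may be closer. It rests on several
assumptions: the variable names -name-, -employees- and -spell_len- are
made up for illustration, the panel identifier is made numeric with
-encode-, and a spell counts as short when it contains at most 3 missing
months.

```stata
* Sketch only: the seven observations from the original question
clear
input str3 name employees str7 ym
"ABC" 10000 "2010m1"
"ABC" 10100 "2010m2"
"ABC"  9500 "2010m5"
"ABC"  9600 "2010m12"
"DEF"  2000 "2009m5"
"DEF"  2100 "2009m10"
"DEF"  2300 "2009m11"
end
gen date = monthly(ym, "YM")
format date %tm
encode name, gen(company)          // tsset needs a numeric panel identifier

tsset company date
tsfill
tsspell, cond(missing(employees))  // requires: ssc install tsspell

* _seq at the last observation of a spell is the length of that spell
* (it is 0 for observations that are not in any spell)
bysort company _spell (date): gen spell_len = _seq[_N]
sort company date                  // restore panel-time order before using L.

* assumption: fill only spells of at most 3 missing months
replace employees = L.employees if inrange(spell_len, 1, 3)
```

With the example data only 2010m3 and 2010m4 for ABC are filled. Change
the 3 in -inrange()- if "within 3 months" should be counted differently.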
Nick
[email protected]
On 5 July 2013 09:48, Nick Cox <[email protected]> wrote:
> The Statalist FAQ does spell out that you are asked to use full real
> names for posting. I am guessing that "StataQ" really is not your
> surname. Please note for future postings.
>
> This is almost an FAQ:
>
> http://www.stata.com/support/faqs/data-management/replacing-missing-values/
>
> but the restriction to only three time slots adds a complication.
>
> There are various ways to do it. Here is one:
>
> . list
>
>      +------------------------------+
>      | company   employ~s      date |
>      |------------------------------|
>   1. |     ABC      10000    2010m1 |
>   2. |     ABC      10100    2010m2 |
>   3. |     ABC       9500    2010m5 |
>   4. |     ABC       9600   2010m12 |
>   5. |     DEF       2000    2009m5 |
>      |------------------------------|
>   6. |     DEF       2100   2009m10 |
>   7. |     DEF       2300   2009m11 |
>      +------------------------------+
>
> . tsset company date
> panel variable: company (unbalanced)
> time variable: date, 2009m5 to 2010m12, but with gaps
> delta: 1 month
>
> . tsfill
>
> . gen miss1 = missing(employ) & !missing(L.employ)
>
> . gen miss2 = missing(employ) & !missing(L2.employ)
>
> . gen miss3 = missing(employ) & !missing(L3.employ)
>
> . replace employ = L.employ if miss1|miss2|miss3
> (8 real changes made)
>
> . l
>
>      +------------------------------------------------------+
>      | company   employ~s      date   miss1   miss2   miss3 |
>      |------------------------------------------------------|
>   1. |     ABC      10000    2010m1       0       0       0 |
>   2. |     ABC      10100    2010m2       0       0       0 |
>   3. |     ABC      10100    2010m3       1       1       0 |
>   4. |     ABC      10100    2010m4       0       1       1 |
>   5. |     ABC       9500    2010m5       0       0       0 |
>      |------------------------------------------------------|
>   6. |     ABC       9500    2010m6       1       0       0 |
>   7. |     ABC       9500    2010m7       0       1       0 |
>   8. |     ABC       9500    2010m8       0       0       1 |
>   9. |     ABC          .    2010m9       0       0       0 |
>  10. |     ABC          .   2010m10       0       0       0 |
>      |------------------------------------------------------|
>  11. |     ABC          .   2010m11       0       0       0 |
>  12. |     ABC       9600   2010m12       0       0       0 |
>  13. |     DEF       2000    2009m5       0       0       0 |
>  14. |     DEF       2000    2009m6       1       0       0 |
>  15. |     DEF       2000    2009m7       0       1       0 |
>      |------------------------------------------------------|
>  16. |     DEF       2000    2009m8       0       0       1 |
>  17. |     DEF          .    2009m9       0       0       0 |
>  18. |     DEF       2100   2009m10       0       0       0 |
>  19. |     DEF       2300   2009m11       0       0       0 |
>      +------------------------------------------------------+
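In the listing above, 2010m6 to 2010m8 and 2009m6 to 2009m8 are filled
although the next reading is more than 3 months away. If such gaps should
stay missing throughout, one possibility that needs nothing from SSC is to
measure the gap between successive readings before calling -tsfill-. This
is a sketch under assumptions, not a definitive solution: the names
-name-, -employees- and -gap- are invented for illustration, every
original row is assumed to have a nonmissing reading, and a gap is
measured as the difference between the two monthly dates.

```stata
* Sketch only: the seven observations from the original question
clear
input str3 name employees str7 ym
"ABC" 10000 "2010m1"
"ABC" 10100 "2010m2"
"ABC"  9500 "2010m5"
"ABC"  9600 "2010m12"
"DEF"  2000 "2009m5"
"DEF"  2100 "2009m10"
"DEF"  2300 "2009m11"
end
gen date = monthly(ym, "YM")
format date %tm
encode name, gen(company)          // tsset needs a numeric panel identifier

* months from each reading to the next reading for the same company
* (missing for the last reading of each company)
bysort company (date): gen gap = date[_n+1] - date

tsset company date
tsfill

* copy the gap of the preceding reading onto the rows that tsfill added
bysort company (date): replace gap = gap[_n-1] if missing(employees)

* carry the last reading forward only if the next one is at most 3 months on
bysort company (date): replace employees = employees[_n-1] ///
    if missing(employees) & gap <= 3
```

With the example data this fills 2010m3 and 2010m4 for ABC and leaves the
longer gaps as missing.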
>
> All that said, this is a rather arbitrary interpolation method. For
> other possibilities, see
>
> ipolate (official)
> cipolate (SSC)
> csipolate (SSC)
> pchipolate (SSC)
>
> I particularly recommend -pchipolate-.
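As a rough illustration of the first of these, official -ipolate- could be
used along the following lines after -tsfill-. The names -name-,
-employees- and -employees_lin- are made up for the example. Note that
-ipolate- on its own interpolates linearly across every interior gap,
however long, so any limit on gap length would have to be imposed
separately.

```stata
* Sketch only: the seven observations from the original question
clear
input str3 name employees str7 ym
"ABC" 10000 "2010m1"
"ABC" 10100 "2010m2"
"ABC"  9500 "2010m5"
"ABC"  9600 "2010m12"
"DEF"  2000 "2009m5"
"DEF"  2100 "2009m10"
"DEF"  2300 "2009m11"
end
gen date = monthly(ym, "YM")
format date %tm
encode name, gen(company)          // tsset needs a numeric panel identifier

tsset company date
tsfill

* linear interpolation on date between neighbouring readings, by company
bysort company: ipolate employees date, generate(employees_lin)
```

For ABC this gives 9900 for 2010m3 and 9700 for 2010m4, the values on the
straight line between 10100 in 2010m2 and 9500 in 2010m5.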
>
>
>
> Nick
> [email protected]
>
>
> On 5 July 2013 04:20, Ethan StataQ <[email protected]> wrote:
>> I have data that looks like this:
>>
>> Company Employees YearMonth
>> ABC 10,000 2010m1
>> ABC 10,100 2010m2
>> ABC 9,500 2010m5
>> ABC 9,600 2010m12
>> DEF 2,000 2009m5
>> DEF 2,100 2009m10
>> DEF 2,300 2009m11
>>
>> I would like to create a time series from these data such that the number
>> of employees is presumed to remain unchanged during the months between
>> observations. However, if the gap between observations exceeds 3 months,
>> I do not want to make this assumption. In that case the number of
>> employees should be shown as "." instead, so that the above data appear
>> as follows:
>>
>>
>> Company Employees YearMonth
>> ABC 10,000 2010m1
>> ABC 10,100 2010m2
>> ABC 10,100 2010m3
>> ABC 10,100 2010m4
>> ABC 9,500 2010m5
>> ABC . 2010m6
>> ABC . 2010m7
>> ABC . 2010m8
>> ABC . 2010m9
>> ABC . 2010m10
>> ABC . 2010m11
>> ABC 9,600 2010m12
>> DEF 2,000 2009m5
>> DEF . 2009m6
>> DEF . 2009m7
>> DEF . 2009m8
>> DEF . 2009m9
>> DEF 2,100 2009m10
>> DEF 2,300 2009m11
>>
>>
>> The idea is that I presume the number of employees remains the same as
>> the last reading as long as a new reading occurs within 3 months.
>>
>> I would really appreciate some help in figuring out how to achieve
>> this. If this question has been answered, a redirect url would be
>> greatly appreciated.