Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: Finding patterns of consecutive number


From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   RE: st: Finding patterns of consecutive number
Date   Thu, 26 Apr 2012 17:10:55 +0100

Nick 
[email protected] 


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Marshall Garland
Sent: 26 April 2012 17:04
To: [email protected]
Subject: Re: st: Finding patterns of consecutive number

Hi Nick-

Thanks for the simplified code and the links. This was far easier and
more intuitive than the programming somersaults that I was attempting.

I had read this FAQ, but I was conceptually struggling with how to
apply it to my circumstance.

Cheers,

-mwg

On Thu, Apr 26, 2012 at 3:45 AM, Nick Cox <[email protected]> wrote:
> The code can be simplified, as
>
> if _n == 1 | (test - test[_n-1] != 1)
>
> could be written
>
> if (test - test[_n-1] != 1)
>
> because -test[0]- is evaluated as missing, so the difference is also
> missing and the test -!= 1- is true. But in practice with
> spell problems, the first observation in a panel often needs explicit
> attention as we know nothing about what preceded it. And code that
> deals explicitly with the first observation is often easier to
> understand.
>
> This may also be of interest:
>
> FAQ     . . . . . . Identifying runs of consecutive observations in panel data
>        . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox and V. Wiggins
>        8/02    How do I identify runs of consecutive observations
>                in panel data?
>                http://www.stata.com/support/faqs/data/panel.html
>
> On Thu, Apr 26, 2012 at 2:58 AM, Marshall Garland
> <[email protected]> wrote:
>
>> This is exactly what I needed.
>>
>> Thanks so much for your help and prompt reply.
>>
>> On Wed, Apr 25, 2012 at 8:24 PM, Nick Cox <[email protected]> wrote:
>
>>> I think of your problem as defining spells of consecutive integers, so
>>> that a spell starts with the first observation in each panel or
>>> whenever the previous value was not exactly one less.
>>>
>>> bysort id (year) : gen progress = string(test) if _n == 1 | (test - test[_n-1] != 1)
>>> by id : replace progress = progress[_n-1] + string(test) if missing(progress)
>>>
>>> Dealing with spells: see also -tsspell- (SSC) or
>>>
>>> SJ-7-2  dm0029  . . . . . . . . . . . . . . Speaking Stata: Identifying spells
>>>        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>>>        Q2/07   SJ 7(2):249--265                                 (no commands)
>>>        shows how to handle spells with complete control over
>>>        spell specification
>>>
>>> By the way, as the putative author of -tostring-, I note that it is
>>> overkill here. The -string()- function is all you need.
>>>
>>> On Wed, Apr 25, 2012 at 11:47 PM, Marshall Garland
>>> <[email protected]> wrote:
>>>
>>>> I have panel data on student testing spanning six years. For each
>>>> year, I have one record per student, with the student's test level and
>>>> outcome. Neither test levels nor years are necessarily consecutive.
>>>> For each student in each year, I'd like to create a variable that
>>>> captures the student's longitudinal test progression up to that year.
>>>> However, I want only the longest run of consecutive test levels,
>>>> without disruptions, and it should be calculated only over
>>>> consecutive years, too.
>>>>
>>>> I've posted my data at the end of this message, which will help
>>>> describe my objective. For student A, in 2008/09, her test progression
>>>> is 6543, since she had 4 consecutive years of test data. This is
>>>> perfect. Student B, however, in 2008/09, has a test progression of
>>>> 7643. However, I only want to record, for student B, the maximum
>>>> consecutive test progression, which is 76 and ignore the 43. The 43
>>>> progression will be captured in the corresponding year (2006/07).
>>>>
>>>> I can't figure out a way to adjust for this discontinuity. I've tried
>>>> a number of things, including the code below, but it still captures
>>>> repeated test levels across years (student C below, in 2008/09).
>>>>
>>>> Thanks for help in advance.
>>>>
>>>> Cheers,
>>>>
>>>> -mwg
>>>>
>>>> /****************************************************
>>>> bys research_id: gen test_t=d.test_level_2
>>>> bys research_id: egen max_test_t=max(test_t)
>>>>
>>>> ///group creation for consecutive runs
>>>> forvalues i=0/6 {
>>>>        gen group_`i'=.
>>>>        bys research_i (sch_yr): replace group_`i'=test_level_2[_n-`i'] if max_test_t==1 & test_t==1
>>>>        tostring group_`i', replace
>>>>        replace group_`i'="" if group_`i'=="."
>>>> }
>>>>
>>>> egen group_ty_cons=concat(group_0- group_6)
>>>> tab group_ty_cons
>>>> /**************************************************************
>>>>
>>>> Here's my data:
>>>> student year    test_level      progression
>>>> A       2005/06 3
>>>> A       2006/07 4       43
>>>> A       2007/08 5       543
>>>> A       2008/09 6       6543
>>>> B       2005/06 3
>>>> B       2006/07 4       43
>>>> B       2007/08 6       643
>>>> B       2008/09 7       7643
>>>> C       2005/06 6
>>>> C       2006/07 7       76
>>>> C       2007/08 8       876
>>>> C       2008/09 8       8876
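
As a check on the approach discussed in this thread, here is a minimal sketch that enters the example data from the original question and applies the two-line spell code from the reply. The variable names -id-, -year- and -test- follow the reply rather than the poster's own dataset. The -progress_desc- variant, which prepends rather than appends each level so that the string reads from highest to lowest as in the original question, is an assumption added for illustration and was not posted to the list. Like the posted code, this sketch checks only that test levels are consecutive, not that years are.

```stata
clear
input str1 id str7 year byte test
"A" "2005/06" 3
"A" "2006/07" 4
"A" "2007/08" 5
"A" "2008/09" 6
"B" "2005/06" 3
"B" "2006/07" 4
"B" "2007/08" 6
"B" "2008/09" 7
"C" "2005/06" 6
"C" "2006/07" 7
"C" "2007/08" 8
"C" "2008/09" 8
end

* A spell starts at the first observation in each panel, or wherever
* the level is not exactly one more than the previous level
bysort id (year) : gen progress = string(test) if _n == 1 | (test - test[_n-1] != 1)

* Within a spell, append the current level to the string built so far
by id : replace progress = progress[_n-1] + string(test) if missing(progress)

* Assumed variant (not from the thread): prepend instead of append,
* so the string runs from the current level back to the spell start
by id : gen progress_desc = string(test) if _n == 1 | (test - test[_n-1] != 1)
by id : replace progress_desc = string(test) + progress_desc[_n-1] if missing(progress_desc)

list, sepby(id) noobs
```

With these data, student B in 2008/09 should show 67 in -progress- and 76 in -progress_desc-, and the repeated level 8 for student C in 2008/09 starts a new spell rather than extending the previous one.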
