Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: capturing the sizes of the sequences of countinous (uninterrupted) values equal to 1
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: capturing the sizes of the sequences of countinous (uninterrupted) values equal to 1
Date: Wed, 30 Nov 2011 09:49:02 +0000
Toy example using -tsspell- (SSC).
clear
set obs 10
gen id = _n
* each time`j' is 1 with probability 0.7
forval j = 1/5 {
    gen time`j' = runiform() < 0.7
}
. l
     +--------------------------------------------+
     | id   time1   time2   time3   time4   time5 |
     |--------------------------------------------|
  1. |  1       1       1       1       0       0 |
  2. |  2       1       1       1       1       1 |
  3. |  3       1       1       1       1       1 |
  4. |  4       1       0       0       1       1 |
  5. |  5       1       1       0       1       0 |
     |--------------------------------------------|
  6. |  6       1       0       1       1       1 |
  7. |  7       1       1       1       0       1 |
  8. |  8       1       1       0       1       0 |
  9. |  9       1       1       1       1       1 |
 10. | 10       0       0       1       1       0 |
     +--------------------------------------------+
. reshape long time , i(id)
(note: j = 1 2 3 4 5)
Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       10   ->      50
Number of variables                   6   ->       3
j variable (5 values)                     ->   _j
xij variables:
                  time1 time2 ... time5   ->   time
-----------------------------------------------------------------------------
. rename time state
. d
Contains data
  obs:            50
 vars:             3
 size:           650 (99.9% of memory free)
--------------------------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
--------------------------------------------------------------------------------------------------
id              float  %9.0g
_j              byte   %9.0g
state           float  %9.0g
--------------------------------------------------------------------------------------------------
Sorted by:  id  _j
     Note:  dataset has changed since last saved
. rename _j time
. tsset id time
       panel variable:  id (strongly balanced)
        time variable:  time, 1 to 5
                delta:  1 unit
. l
     +-------------------+
     | id   time   state |
     |-------------------|
  1. |  1      1       1 |
  2. |  1      2       1 |
  3. |  1      3       1 |
  4. |  1      4       0 |
  5. |  1      5       0 |
     |-------------------|
  6. |  2      1       1 |
  7. |  2      2       1 |
  8. |  2      3       1 |
  9. |  2      4       1 |
 10. |  2      5       1 |
     |-------------------|
 11. |  3      1       1 |
 12. |  3      2       1 |
 13. |  3      3       1 |
 14. |  3      4       1 |
 15. |  3      5       1 |
     |-------------------|
 16. |  4      1       1 |
 17. |  4      2       0 |
 18. |  4      3       0 |
 19. |  4      4       1 |
 20. |  4      5       1 |
     |-------------------|
 21. |  5      1       1 |
 22. |  5      2       1 |
 23. |  5      3       0 |
 24. |  5      4       1 |
 25. |  5      5       0 |
     |-------------------|
 26. |  6      1       1 |
 27. |  6      2       0 |
 28. |  6      3       1 |
 29. |  6      4       1 |
 30. |  6      5       1 |
     |-------------------|
 31. |  7      1       1 |
 32. |  7      2       1 |
 33. |  7      3       1 |
 34. |  7      4       0 |
 35. |  7      5       1 |
     |-------------------|
 36. |  8      1       1 |
 37. |  8      2       1 |
 38. |  8      3       0 |
 39. |  8      4       1 |
 40. |  8      5       0 |
     |-------------------|
 41. |  9      1       1 |
 42. |  9      2       1 |
 43. |  9      3       1 |
 44. |  9      4       1 |
 45. |  9      5       1 |
     |-------------------|
 46. | 10      1       0 |
 47. | 10      2       0 |
 48. | 10      3       1 |
 49. | 10      4       1 |
 50. | 10      5       0 |
     +-------------------+
. tsspell, cond(state==1)
. l
     +------------------------------------------+
     | id   time   state   _seq   _spell   _end |
     |------------------------------------------|
  1. |  1      1       1      1        1      0 |
  2. |  1      2       1      2        1      0 |
  3. |  1      3       1      3        1      1 |
  4. |  1      4       0      0        0      0 |
  5. |  1      5       0      0        0      0 |
     |------------------------------------------|
  6. |  2      1       1      1        1      0 |
  7. |  2      2       1      2        1      0 |
  8. |  2      3       1      3        1      0 |
  9. |  2      4       1      4        1      0 |
 10. |  2      5       1      5        1      1 |
     |------------------------------------------|
 11. |  3      1       1      1        1      0 |
 12. |  3      2       1      2        1      0 |
 13. |  3      3       1      3        1      0 |
 14. |  3      4       1      4        1      0 |
 15. |  3      5       1      5        1      1 |
     |------------------------------------------|
 16. |  4      1       1      1        1      1 |
 17. |  4      2       0      0        0      0 |
 18. |  4      3       0      0        0      0 |
 19. |  4      4       1      1        2      0 |
 20. |  4      5       1      2        2      1 |
     |------------------------------------------|
 21. |  5      1       1      1        1      0 |
 22. |  5      2       1      2        1      1 |
 23. |  5      3       0      0        0      0 |
 24. |  5      4       1      1        2      1 |
 25. |  5      5       0      0        0      0 |
     |------------------------------------------|
 26. |  6      1       1      1        1      1 |
 27. |  6      2       0      0        0      0 |
 28. |  6      3       1      1        2      0 |
 29. |  6      4       1      2        2      0 |
 30. |  6      5       1      3        2      1 |
     |------------------------------------------|
 31. |  7      1       1      1        1      0 |
 32. |  7      2       1      2        1      0 |
 33. |  7      3       1      3        1      1 |
 34. |  7      4       0      0        0      0 |
 35. |  7      5       1      1        2      1 |
     |------------------------------------------|
 36. |  8      1       1      1        1      0 |
 37. |  8      2       1      2        1      1 |
 38. |  8      3       0      0        0      0 |
 39. |  8      4       1      1        2      1 |
 40. |  8      5       0      0        0      0 |
     |------------------------------------------|
 41. |  9      1       1      1        1      0 |
 42. |  9      2       1      2        1      0 |
 43. |  9      3       1      3        1      0 |
 44. |  9      4       1      4        1      0 |
 45. |  9      5       1      5        1      1 |
     |------------------------------------------|
 46. | 10      1       0      0        0      0 |
 47. | 10      2       0      0        0      0 |
 48. | 10      3       1      1        1      0 |
 49. | 10      4       1      2        1      1 |
 50. | 10      5       0      0        0      0 |
     +------------------------------------------+
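
In the listing above, _seq appears to be the LENGTH asked for: at each time it counts the uninterrupted 1s up to and including that time, and it is 0 whenever state is 0. If installing -tsspell- is not an option, the same running count can be sketched with -by:- alone. This is only a sketch under assumptions: the variable name length is made up here, the data are already in long form with no gaps in time within id, and state is never missing. The input data are the rows for id 1 and id 4 from the listing.

```stata
clear
input id time state
1 1 1
1 2 1
1 3 1
1 4 0
1 5 0
4 1 1
4 2 0
4 3 0
4 4 1
4 5 1
end

* first observation in each panel: the count is just state itself (1 or 0)
bysort id (time): gen length = state if _n == 1

* later observations: extend the previous count if state is 1, else reset to 0
* (-replace- works through observations in order, so length[_n-1] is already set)
by id: replace length = cond(state == 1, length[_n-1] + 1, 0) if _n > 1

list, sepby(id)
```

For these ten rows length should agree with _seq in the -tsspell- listing. The -by:- version is less general than -tsspell-, which also numbers the spells and flags their ends.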
On Wed, Nov 30, 2011 at 9:36 AM, Nick Cox <[email protected]> wrote:
> You can't get this information given your data structure into a single
> Stata variable. What you seek is a matrix.
>
> If w <= 244, you could try concatenating your variables into a string
> variable holding individuals' history.
>
> But I guess this would be easier after -reshape long-. Then a spell is
> defined as a sequence with all 1s for the same id. See then
>
> SJ-7-2  dm0029  . . . . . . . . . . . . . .  Speaking Stata: Identifying spells
>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>         Q2/07   SJ 7(2):249--265                                 (no commands)
>         shows how to handle spells with complete control over
>         spell specification
>
> tsspell from http://fmwww.bc.edu/RePEc/bocode/t
> 'TSSPELL': module for identification of spells or runs in time series /
> tsspell examines the data, which must be tsset time series, to / identify
> spells or runs, which are contiguous sequences defined / by some
> condition. tsspell generates new variables indicating / distinct spells,
>
> Nick
>
> On Wed, Nov 30, 2011 at 9:24 AM, massimiliano stacchini
> <[email protected]> wrote:
>
>> I have a huge dataset. The rows identify the person ID (i) (i=1,...,n), while the columns are the reference dates TIME(t) (t=1,...,w). Each cell contains the value 1 or 0 (zero).
>>
>> I need to create a variable (LENGTH) varying over both ID and TIME.
>> For each i of ID(i) and t of TIME(t), LENGTH should capture the number of continuous (uninterrupted) values equal to 1 in the interval of cells starting from the reference date t of TIME and moving backwards through the previous reference dates.
>> In other words, for each i of ID and each t of TIME, LENGTH should capture the number of consecutive cells TIME(t-s) with values equal to 1 (i.e., the size of the sequence of uninterrupted 1s moving backwards through the previous reference dates).
>>