Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Analysis of event history data
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Analysis of event history data
Date: Tue, 20 Mar 2012 14:14:15 +0000
Never say "one final question"!
-help egen- shows that there are -egen- functions -anycount()-,
-anymatch()-, and -anyvalue()-. So
egen ones = anycount(y_*), values(1)
keep if ones
Even if those functions did not exist, you could do this
gen ones = 0
quietly foreach v of var y_* {
    replace ones = ones + (`v' == 1)
}
keep if ones
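As a hedged sketch of the -anymatch()- route mentioned above, using the three-person toy data from the first message in this thread (the variable name any1 is made up for illustration):

```stata
* Toy wide data copied from the example at the start of the thread
clear
input id y_1001 y_1002 y_1003
1 0 1 0
2 0 0 0
3 1 1 0
end

* anymatch() is 1 if any variable in the list equals a listed value, else 0
egen byte any1 = anymatch(y_*), values(1)

* Keep only people with at least one week of status 1, then tidy up
keep if any1
drop any1
list
```

Unlike -anycount()-, this records only whether a 1 occurs, not how many times, which is all that -keep- needs here.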
Nick
On Tue, Mar 20, 2012 at 1:28 PM, Kristian Thor Jakobsen <[email protected]> wrote:
> Thanks again, Nick. I figured it out with your help. But I have one final question. Given that my dataset consists of several million observations, I would like to trim the dataset down before I do the -reshape- command in order to avoid wasting time on observations that I would subsequently throw out. Say that I want to keep those observations where y_* is equal to 1 in one or more cases:
>
> Id y_1001 y_1002 y_1003 ... y_1101 area_10 area_11
> 1 1 1 0 1 10 5
>
> I guess I could do the following:
>
> keep if y_1001==1| y_1002==1 etc.
>
> But given that I have around 1000 variables to check for that condition, writing it out would be quite tedious. Is there a smart way to get around this?
Nick Cox
> Do spend some time studying the resources for -reshape- including FAQs.
>
> First off, your -y_- cannot be an identifier! It doesn't identify observations.
>
> Second off, you can include -area- in the -reshape- but I guess you will need some extra surgery before and after. I would try a -rename- of the -area*- such as
>
> foreach v of var area* {
>     rename `v' `v'01
> }
>
> and then there will be some fill-in afterwards.
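A minimal sketch of the rename, -reshape-, and fill-in steps, assuming the one-row example from this thread with a lowercase id, and assuming that carrying the last nonmissing area forward within each person is the fill-in wanted (time is an illustrative name for the j variable):

```stata
* Toy wide data modelled on the example in this thread (one person, four weeks)
clear
input id y_1001 y_1002 y_1003 y_1101 area_10 area_11
1 1 1 0 0 10 5
end

* Give each yearly variable the suffix of the first week of its year
foreach v of varlist area_* {
    rename `v' `v'01
}

* y_ and area_ now share j suffixes; area_ is missing except in week 01
reshape long y_ area_, i(id) j(time)

* Fill-in: carry the last nonmissing area forward within each id
bysort id (time): replace area_ = area_[_n-1] if missing(area_)

rename y_ status
rename area_ area
list
```

The carry-forward works because -replace- proceeds observation by observation, so each filled value is available to the next observation within the same id.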
>
> Nick
>
> On Mon, Mar 19, 2012 at 12:30 PM, Kristian Thor Jakobsen <[email protected]> wrote:
>> Thanks, Nick. -reshape- is a big help. But what if I have time-varying variables that I would like to carry over as well, but not at the same intervals? For example:
>>
>> Id y_1001 y_1002 y_1003 ... y_1101 area_10 area_11
>> 1 1 1 0 0 10 5
>>
>> If I do -reshape- using y_ as the identifier, I would get something like:
>>
>> Id j y_ area_10 area_11
>> 1 1001 1 10 5
>> 1 1002 1 10 5
>> 1 1003 0 10 5
>> .
>> .
>> .
>> 1 1101 0 10 5
>>
>> But I would like to have something like:
>>
>> Id j y_ area
>> 1 1001 1 10
>> 1 1002 1 10
>> 1 1003 0 10
>> .
>> .
>> .
>> 1 1101 0 5
>>
>> Is that possible with -reshape-? Or would I have to convert the yearly time-varying variables into weekly first?
>>
>> Thanks again,
>> Kristian
>>
>> -----Original message-----
>> From: [email protected]
>> [mailto:[email protected]] On behalf of Nick Cox
>> Sent: 19 March 2012 12:43
>> To: [email protected]
>> Subject: Re: st: Analysis of event history data
>>
>> For most Stata purposes your data would indeed be better reshaped to a long data structure or shape or form (some people do say "format", but in a Stata context format implies -format-, etc.).
>>
>> reshape long y_ , i(id) j(time)
>> rename y_ status
>>
>> should do it. See also -tsspell- (SSC) and
>>
>> SJ-7-2  dm0029  . . . . . . . . .  Speaking Stata: Identifying spells
>>         . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>>         Q2/07   SJ 7(2):249--265                       (no commands)
>>         shows how to handle spells with complete control over
>>         spell specification
>>
>> as well as the literature on survival analysis with which you are evidently familiar.
>>
>> Nick
>>
>> On Mon, Mar 19, 2012 at 11:32 AM, Kristian Thor Jakobsen <[email protected]> wrote:
>>
>>> I am trying to do an analysis of transition in and out of public
>>> income transfers. My data is organized roughly the following way:
>>>
>>> Id y_1001 y_1002 y_1003
>>> 1 0 1 0
>>> 2 0 0 0
>>> 3 1 1 0
>>>
>>> This means that I have the weekly status of each individual from 1991
>>> to 2011. But in order to do any sort of analysis (for example,
>>> survival analysis), I would guess that I have to convert the data
>>> into the following layout instead:
>>>
>>> Id Status Time
>>> 1 0 1
>>> 1 1 2
>>> 1 0 3
>>> 2 0 1
>>> 2 0 2
>>> 2 0 3
>>> 3 1 1
>>> 3 1 2
>>> 3 0 3
>>>
>>> Is that correct, and if so, does there exist a smart way to convert
>>> the data from one format into the other? Or can I perhaps use the
>>> data as given?