Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: time-series data identified by three variables
From
Nick Cox <[email protected]>
To
[email protected]
Subject
Re: st: time-series data identified by three variables
Date
Fri, 30 Nov 2012 11:24:44 +0000
It's best to think that you are addressing Statalist, and not any individual.
You should be able to work this out. If the answer is all zeros,
evidently there is one and only one distinct value of -date- in each group
defined by -by:-. Indeed, it is likely that there is one and only one
observation in each group. I imagine that you want
bysort patient_id illness_id (date): gen duration = date - date[1]
Note that [_n] does no harm, but is unnecessary. The difference
implied by () is however crucial here: variables in parentheses
determine the sort order within groups, but do not define the groups.
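To make the difference concrete, here is a minimal sketch on invented toy data. The values and the variable name -wrong- are made up for illustration, and -date- is assumed to be a numeric Stata date, as in the question.

```stata
* Toy data, invented for illustration: patient 1 has three visits
* for illness 1; patient 2 has a single visit.
clear
input patient_id illness_id date
1 1 100
1 1 110
1 1 125
2 1 200
end

* -date- as a grouping variable: each group holds a single distinct date,
* so date - date[1] is always 0.
bysort patient_id illness_id date: gen wrong = date - date[1]

* -date- in parentheses: observations are sorted by date within each
* patient-illness group, but date does not define the groups.
bysort patient_id illness_id (date): gen duration = date - date[1]

list, sepby(patient_id illness_id)
```

With these toy values -wrong- is 0 throughout, while -duration- is 0, 10 and 25 for patient 1 and 0 for patient 2.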
On Fri, Nov 30, 2012 at 9:58 AM, YANNAN SHEN <[email protected]> wrote:
> There is one more thing I need your help with. Within each group in which a patient returns to be treated for the same disease, I want to calculate the duration between each repeat visit and the first visit.
> I wrote the following code:
>> bysort patient_id illness_id date: gen duration = date[_n]-date[1]
> but it returns all zeros.
> What is wrong?
On Nov 28, 2012, at 4:21 AM, Nick Cox <[email protected]> wrote:
>> You want commands like
>>
>> bysort patient_id illness_id date_of_visit : egen meansev = mean(severity)
>> by patient_id illness_id : gen repeat = _n - 1
>>
>> as you want to number 0 upwards.
>>
>>
>> Nick
>>
>> On Wed, Nov 28, 2012 at 6:28 AM, yannan shen <[email protected]> wrote:
>>
>>> I am working with some panel data on hospital visits and I want to learn
>>> about the severity of various diseases.
>>> The variables I have in the dataset are: patient_id, illness_id,
>>> date_of_visit, severity
>>> each observation contains: patient_id, illness_id, date_of_visit, severity.
>>>
>>> For each patient (identified by patient_id), I want to know how many
>>> times he has visited for the same illness (illness_id).
>>> I used the duplicates command to tag the observations of patients who
>>> have visited the hospital more than once.
>>>
>>>> duplicates tag patient_id illness_id , generate(duple)
>>>
>>> However, duple does not carry any information about the order of
>>> the visits. If a patient has 5 visiting records, I want to be able to
>>> know which is the 0th repeat, 1st repeat, 2nd repeat, 3rd repeat, and
>>> 4th repeat...I have a vague feeling that I can order those variables
>>> via date_of_visit but I am still not sure how exactly that can be
>>> done.
>>>
>>> Furthermore, I want to create two new variables: one variable equal
>>> to the average severity of each disease (illness_id) being treated on
>>> the same date_of_visit. The other variable equals the highest severity
>>> of a certain disease being treated on that day. (Ideally, I want to
>>> create additional variables for each observation)
>>>
>>> I have used “bysort” in the past but since the grouping is now a
>>> combination of illness_id and date_of_visit, I am a little confused.