Bellinda Kallimanis writes in with a data-management question:
> I have a large data set (panel format) and I am trying to calculate length
> of stay for each individual. The thing is many individuals come in (adm
> asses) and stay then leave (disch) then come back. But to complicate it
> more, some people leave with anticipated return (dis_ret) then come back
> (reentry) and some people are admitted and have follow up but have not been
> discharged. What I would like to do is create a variable that has LOS for
> each separate admission. Below is a sample of my data
>
>
> resident target aa8a LOS (which I want to create)
> ------------------------------------------------
> 1 30sep2001 adm asses 0
> 1 10oct2001 disch 10
> ------------------------------------------------
> 1 18nov2002 adm asses 0
> 1 22nov2002 disch 4
> ------------------------------------------------
> 1 18jul2004 adm asses 0
> 1 28jul2004 disch 10
> ------------------------------------------------
> 2 07may2003 adm asses 0
> 2 20jul2003 quart review 74
> 2 12oct2003 quart review 158
> 2 04jan2004 quart review 242
> 2 28mar2004 ann asses 326
> 2 20jun2004 quart review 410
> 2 12sep2004 quart review 494
> ------------------------------------------------
> 3 15oct2000 adm asses 0
> 3 23oct2000 dis_ret 8
> ------------------------------------------------
> 3 25oct2000 reentry 0
> 3 22nov2000 disch 28
> ------------------------------------------------
> 4 17oct2000 adm asses 0
> 4 01nov2000 disch 15
> ------------------------------------------------
> 5 15oct2000 adm asses 0
> 5 22nov2000 disch 38
> ------------------------------------------------
>
>
> I created a flag for any type of discharge and I think if I create a unique
> id for each admission that I would be able to calculate the LOS. I'm just
> not sure how to get this admission id [...]
In the listing above, I added the dividing lines between admissions and
discharges just so I could see the problem better.
Bellinda is exactly right when she says "I think if I create a unique id for
each admission that I would be able to calculate the LOS". I also very much
like that Bellinda created a flag variable for any type of discharge.
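Now let's do the same thing for any type of admission. In Bellinda's sample,
a reentry restarts the LOS at 0 just as an admission assessment does, so both
count as the start of a stay:
. gen byte isadmit = (aa8a == "adm asses" | aa8a == "reentry")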
With that, we can create the unique admission id that Bellinda wants:
. sort resident target
. by resident: gen admission = sum(isadmit)
To see how this works, look at just a little bit of Bellinda's data. This
time, I'll put the dividing lines (which are meaningless except that they help
to guide the eye) between the resident ids:
resident target aa8a isadmit admission
----------------------------------------------------------------
1 30sep2001 adm asses 1 1
1 10oct2001 disch 0 1
1 18nov2002 adm asses 1 2
1 22nov2002 disch 0 2
1 18jul2004 adm asses 1 3
1 28jul2004 disch 0 3
----------------------------------------------------------------
2 07may2003 adm asses 1 1
2 20jul2003 quart review 0 1
2 12oct2003 quart review 0 1
2 04jan2004 quart review 0 1
2 28mar2004 ann asses 0 1
2 20jun2004 quart review 0 1
2 12sep2004 quart review 0 1
----------------------------------------------------------------
3 15oct2000 adm asses 1 1
etc.
The trick to creating any kind of id variable is:
1. Create a variable that is 1 at the start of each group and
0 elsewhere.
2. Sum it.
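Here is a minimal sketch of that two-step recipe on toy data. The variables id, t, event, first, and group are hypothetical names invented for illustration; they do not come from Bellinda's dataset.

```stata
* Toy data: hypothetical variables id, t, and event (not from the post)
clear
input id t str5 event
1 1 start
1 2 mid
1 3 start
1 4 mid
2 1 start
2 2 mid
end

* Step 1: a flag that is 1 at the start of each group and 0 elsewhere
gen byte first = (event == "start")

* Step 2: the running sum of the flag within id numbers the groups 1, 2, ...
sort id t
by id: gen group = sum(first)

list, sepby(id)
```

Because sum() is a running sum, every observation inherits the count of group starts seen so far within its id, which is exactly a group number.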
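With the new variable admission in hand, we can create the LOS. Once the data
are sorted by target within each admission, target[1] is the first date of
the stay: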
. sort resident admission target
. by resident admission: gen los = target - target[1]
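To check the whole sequence end to end, here is a sketch that types in residents 1 (first two stays) and 3 from the sample and runs the steps. It rests on a few assumptions: the string variable tstr is a hypothetical name for the date as text, target is stored as a numeric Stata daily date, a reentry is treated as the start of a new stay (as the LOS column in the sample suggests), and the "DMY" mask is the spelling used by recent versions of Stata.

```stata
clear
input resident str9 tstr str12 aa8a
1 "30sep2001" "adm asses"
1 "10oct2001" "disch"
1 "18nov2002" "adm asses"
1 "22nov2002" "disch"
3 "15oct2000" "adm asses"
3 "23oct2000" "dis_ret"
3 "25oct2000" "reentry"
3 "22nov2000" "disch"
end

* Convert the text dates to a numeric daily date so they can be subtracted
gen long target = date(tstr, "DMY")
format target %td

* Assumption: a reentry starts a new stay, like an admission assessment
gen byte isadmit = inlist(aa8a, "adm asses", "reentry")

* Running sum of the flag within resident gives the admission id
sort resident target
by resident: gen admission = sum(isadmit)

* Days since the first record of each stay
sort resident admission target
by resident admission: gen los = target - target[1]

list resident target aa8a admission los, sepby(resident admission)
```

On these rows, los should reproduce the values in Bellinda's listing: 0 and 10, then 0 and 4 for resident 1, and 0 and 8, then 0 and 28 for resident 3.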
-- Bill