Modulo a reflection of the time axis, this is in part an FAQ. In fact
. search first date
in Stata yields this reference
FAQ     . . . . . . . . . . . . . . . . . . . . Generating the last date
        4/05    How can I generate a variable containing the last of
                several dates?
                http://www.stata.com/support/faqs/data/lastdate.html
which despite its title does treat the calculation of first (minimum)
dates.
FAQ or not, your question yields to -by:-.
As you don't say here whether your dates are numeric or string
variables, I take the easier option and assume numeric. Then one way to
get the first start is through -egen-:
. egen first_start = min(datestart), by(id)
Another way is from first principles:
. bysort id (datestart): gen first_start = datestart[1]
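Should the dates turn out to be strings, they would first need converting to numeric daily dates. A minimal sketch, in which the string variable name datestart_str, the toy values, and the day-month-year layout are all my assumptions, not something from your post:

```stata
* Assumption: datestart_str is a hypothetical string variable
* holding day-month-year text such as "1jan2001"
clear
input id str12 datestart_str
1 "1jan2001"
1 "12jan2002"
2 "1feb2001"
end

* date() turns the text into a numeric daily date
gen datestart = date(datestart_str, "DMY")
format datestart %td

* Either route above then applies to the numeric variable
bysort id (datestart): gen first_start = datestart[1]
format first_start %td
list, sepby(id)
```

The "DMY" mask would need changing if the strings are laid out in some other order.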
The major wrinkle seems to be that your first stop cannot precede your
first start.
Thus we clone the first stops, but blank out any dates that don't
qualify:
. gen work = cond(datestop < first_start, . , datestop)
And then proceed as before
. egen first_stop = min(work), by(id)
Or
. bysort id (work): gen first_stop = work[1]
To keep just one observation for each -id-:
. bysort id: keep if _n == 1
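To see the steps working together, here is a small sketch on made-up data. The values are hypothetical day numbers rather than real dates, and I am assuming that starts and stops sit on separate rows with missing values elsewhere, which may not match your layout:

```stata
* Toy data: hypothetical day numbers, not real dates
clear
input id datestart datestop
1 10  .
1  . 20
2  .  5
2  8  .
2  . 15
2 15  .
end

* First start within each id (missing values are ignored by min())
egen first_start = min(datestart), by(id)

* Blank out stops that precede the first start (id 2, day 5)
gen work = cond(datestop < first_start, ., datestop)

* First qualifying stop within each id
egen first_stop = min(work), by(id)

* One observation per id
bysort id: keep if _n == 1
list id first_start first_stop
```

On these data id 1 ends up with first_start 10 and first_stop 20, and id 2 with first_start 8 and first_stop 15. Note that with < a stop dated the same day as the first start is kept; whether that is what you want depends on your data.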
For more on the power of -by:-, note that a leisurely tutorial in the
Stata Journal is now in the public domain:
SJ-2-1  pr0004  . . . . . . . . . . Speaking Stata: How to move step by: step
        Q1/02   SJ 2(1):86-102                                 (no commands)
        explains the use of the by varlist : construct to tackle
        a variety of problems with group structure, ranging from
        simple calculations for each of several groups to more
        advanced manipulations that use the built-in _n and _N
Visit
http://www.stata-journal.com/sjpdf.html?articlenum=pr0004
for a .pdf version.
Nick
Paul O'Brien
The data are longer than that, Svend!
id datestart datestop
1 1stJan01
1 12thJan02
2 1stFeb01
2 1stFeb01
2 1stApr04
2 1stApr04
2 1stJan07
3 1stJan03
3 censordate
Two points:
1. The patient can start drug before she attends our clinic.
2. The patient can stop and start on the same day (it is actually a
   hormonal implant, removed at end of life span and another inserted at
   same visit).
We want to measure the continuation rate for the first episode of
implant use that we inserted ourselves. Data should look like this
id   datestart   datestop
1    1stJan01    12thJan02
2    1stFeb01    1stApr04
3    1stJan03    censordate
So, we want the first datestart on the same row as the next datestop.
On 3/13/08, Svend Juul wrote:
Paul wrote:
We have a database of patients on and off a drug in the long form,
some stopping before starting later. I want to do a survival analysis
on the first instance of starting and stopping use under our care, but
have difficulty isolating the first episode of use.
=============================================================
I assume that long form means something like this:
clear
input id timeon timeoff
1 1 3
1 6 7
2 1 5
2 6 9
end
You want to keep the first treatment period for each id:
by id (timeon), sort: generate incl = _n==1
keep if incl==1
sort id timeon
list
     +------------------------------+
     | id   timeon   timeoff   incl |
     |------------------------------|
  1. |  1        1         3      1 |
  2. |  2        1         5      1 |
     +------------------------------+