I stand corrected as far as the multiple observations on each
individual; I missed that. The rest is still valid if you collapse
your data (or clone it, in the other direction, if you need to
keep the multiplicities).
m.p.
Lars Kroll wrote:
I don't think you're on the right track, Marcello. Data like this:
id year age firstsex_yr
1 1997 15 .
1 1998 16 16
1 1999 17 16
1 2000 18 16
2 1997 12 .
2 1998 13 .
2 1999 14 11
2 2000 15 11
3 1997 16 12
3 1998 17 12
3 1999 18 12
3 2000 19 12
are single-failure data with multiple observations per subject, so I would suggest:
gen failure = age == firstsex_yr if firstsex_yr < .
sort id year
by id: gen enterstudy = year == year[1]  // flags each subject's first observed year,
                                         // in case your first year isn't 0;
                                         // one never knows...
stset age, id(id) failure(failure==1) exit(failure==1) ///
    enter(enterstudy==1)
Hope this helps,
Lars
On Sunday, 16.10.2005, at 21:46 -0400, Marcello Pagano wrote:
I do not understand your dilemma. Assuming everyone is telling the truth,
what you seem to have is time to first sex as your outcome of interest, with
the very Victorian identification of "death" as that time. If someone is 17
at the time of the survey without having had sex, then that is a censored
observation. So your "time" variable is firstsex_yr if sa==1
and age if sa==0. So you need to generate such a variable:
gen time = age
replace time = firstsex_yr if sa==1
stset time , failure(sa)
Your hazard should be zero for time <= 10, but that
depends on your data. You actually do have information from back then, assuming
you have done a decent job of sampling and things have not changed
that much over the years. (By that I mean that if everyone you question
is over 12, say, then their experience in the 0 to 12 time period is
still representative of what is going on in those years today.)
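For long-format data with several rows per person, the same idea could be sketched by first keeping one record per subject. This is only an illustration, not necessarily how the poster's data should be handled: it assumes the variable names id, year, age and firstsex_yr from the example listing in this thread, uses each subject's last interview as the retained record, and counts a missing firstsex_yr as "not yet".

```stata
* Sketch only: toy data copied from the example listing in this thread
clear
input id year age firstsex_yr
1 1997 15 .
1 1998 16 16
1 1999 17 16
1 2000 18 16
2 1997 12 .
2 1998 13 .
2 1999 14 11
2 2000 15 11
3 1997 16 12
3 1998 17 12
3 1999 18 12
3 2000 19 12
end

* Keep one record per subject (assumption: the last interview)
bysort id (year): keep if _n == _N

* sa = 1 if first sex is reported at or before current age;
* a missing firstsex_yr is treated as not yet sexually active
gen byte sa = firstsex_yr < . & firstsex_yr <= age

* Time is age at first sex for failures, current age for censored subjects
gen time = cond(sa, firstsex_yr, age)

stset time, failure(sa)
list id time sa
```

With one row per subject, no id() option is needed in stset; whether the last interview is the right record to keep depends on how the survey was fielded.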
Hope this helps,
m.p.
Scott Cunningham wrote:
On Oct 16, 2005, at 9:01 PM, Nick Cox wrote:
Not my field, but your dummy calculation can
be put more succinctly:
gen sa = firstsex_yr <= age
However, safer would be to trap missings:
gen sa = cond(mi(firstsex_yr, age), ., firstsex_yr <= age)
Nick
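A small sketch of why trapping missings matters here: Stata treats a numeric missing value as larger than any nonmissing number, so the plain comparison silently returns 0 or 1 when either variable is missing. The three rows below are made-up values for illustration only, and the names sa_plain and sa_safe are hypothetical.

```stata
* Illustration only: invented values, not the poster's data
clear
input firstsex_yr age
16 17
.  15
14 .
end

* Plain comparison: a missing value compares as +infinity
gen sa_plain = firstsex_yr <= age

* Trapped version: result is missing if either input is missing
gen sa_safe = cond(mi(firstsex_yr, age), ., firstsex_yr <= age)

list
```

In the second row the plain version yields 0, and in the third row it yields 1 even though age is unknown; the trapped version returns missing in both cases.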
Nick,
Thanks for helping make the dummies more succinct.
Do you think, though, that it is correct to use "age" as the actual
duration variable? So, for instance, I have a long dataset like this:
id year age firstsex_yr
1 1997 15 .
1 1998 16 16
1 1999 17 16
1 2000 18 16
2 1997 12 .
2 1998 13 .
2 1999 14 11
2 2000 15 11
3 1997 16 12
3 1998 17 12
3 1999 18 12
3 2000 19 12
Suppose I stset the data like so:
. stset age, failure(sa)
where "sa" is an indicator equalling "1" if the person has become
sexually active (signalling "death" in this context) and 0
otherwise. If I stset the data such that "age" is the duration, have
I really made the right decision? Or should I use "year" or should
have some other variable that I create to correspond to time that has
passed? Because I really want to look at ten periods, initially -
from 10 years to 19 years of age. It's a short duration, relatively
speaking, and most "exits" occur at 15-17. So I don't actually have
data for respondents for those early, pre-survey ages, i.e., 10-12.
So what's the best solution here? Do I create a variable, maybe
"time" or "virgin_time", that takes on a value of 1 to 10, and that
variable matches up to the years that are covered in the data, and
the years not covered?
Is this post making sense? I'm mainly just not sure of the proper
way to execute this stset command to make use of the information I
have in the form I currently have it in.
scott
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/