I'm estimating a hazard model and had some basic questions. The
dataset I'm using is the NLSY97. It's a panel consisting of six
waves, with roughly 5500 individuals each year (after eliminating
various observations). The outcome that I'm interested in is the
exit from virginity. Individuals are not asked questions about sex
until they are 14, but when they are asked, they report the age at
which they first experienced vaginal intercourse, and that age is
often earlier than the age at which they were first asked about
their sexuality (i.e., earlier than 14). So, I have, for all
individuals, an integer corresponding to their age, in years, when
they lost their virginity, or missing data for those who are still
virgins. After pulling the variables, I reshaped the data into a
long panel.
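For reference, a minimal sketch of that kind of reshape. The wide variable names (age1997 through age2002, one per wave) and the wave years are assumptions for illustration, not the actual NLSY97 variable names; "id" and "firstsex_yr" are the names used in the commands further down.

```stata
* Hypothetical wide layout: one row per person, with age1997 ... age2002
* holding the respondent's age at each of the six interviews, and
* firstsex_yr constant within person.
reshape long age, i(id) j(year)

* One row per person-year, ordered within person
sort id year
```

After this, firstsex_yr is simply repeated on every row for a given id, since it does not vary across waves.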
Thinking about the "stset" command, I decided to follow this route.
* generate sexually active dummy equalling 1 if sexually active, and
* 0 otherwise (a missing firstsex_yr compares as larger than any age
* in Stata, so those who are still virgins get 0)
gen sa=.
replace sa=0 if firstsex_yr>age
replace sa=1 if firstsex_yr<=age & !missing(age)
* stset the data
stset age, failure(sa) id(id)
where "age" is the age of the individual in any given year, and
"firstsex_yr" is the age at which the individual first experienced
vaginal intercourse.
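One way to check what stset makes of these records is to look at the variables it creates. This is only a sketch, assuming the data have already been stset as above; picking id==1 is arbitrary and purely for illustration.

```stata
* Summarize the spells stset has constructed
stdescribe

* Inspect one person's records: _t0 and _t are the start and end of each
* record's interval, _d is the failure flag, _st marks records in use
sort id age
list id age firstsex_yr sa _t0 _t _d _st if id==1, sepby(id)
```

With id() specified, stset by default stops following a subject at the first failure, so any person-year records after that point should show _st equal to 0.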
What I've basically done, though, is make the person's age my
duration variable, but I don't think this is correct. Ideally, I'd
like to simply have some sort of year variable to be the duration
variable, but the problem I'm imagining is how to handle events that
happened prior to the survey. For instance, I know that some lost
their virginity when they were 10, an age that is at best 2 years prior
to the survey for some people, and 4 years prior to the survey for
others. So, it would seem that making "age" the duration variable is
not the appropriate strategy, but I'm not sure of a better solution
at this point. Can someone provide me some suggestions on getting
this data together?