Not my field, but your dummy calculation can
be put more succinctly:
gen sa = firstsex_yr <= age
However, it would be safer to trap missings:
gen sa = cond(mi(firstsex_yr, age), ., firstsex_yr <= age)
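
To see why the missings matter, here is a minimal sketch on four made-up
observations (the values and the names sa1, sa2, sa3 are purely
illustrative, not from your data). Stata treats numeric missing as larger
than any number, so the one-line comparison returns 0 when firstsex_yr is
missing and 1 when age is missing.

    clear
    input firstsex_yr age
    13 15
    16 15
    .  15
    13 .
    end
    * plain comparison: missing sorts high, so rows 3 and 4 give 0 and 1
    gen sa1 = firstsex_yr <= age
    * trap a missing in either variable: rows 3 and 4 give missing
    gen sa2 = cond(mi(firstsex_yr, age), ., firstsex_yr <= age)
    * trap a missing age only: row 3 gives 0, row 4 gives missing
    gen sa3 = cond(mi(age), ., firstsex_yr <= age)
    list

Since you say that a missing firstsex_yr means "not yet sexually active",
you may well prefer 0 rather than missing for those observations, in which
case something like sa3 might be closer to what you want. That is a
substantive choice for you to make, not a recommendation from me.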
Nick
Scott Cunningham wrote:
> I'm estimating a hazard model and had some basic questions. The
> dataset I'm using is the NLSY97. It's a panel consisting of six
> waves, and each year roughly 5500 individuals (after eliminating
> various observations). The outcome that I'm interested in is the
> exit from virginity. Individuals are not asked questions about sex
> until they are 14, but when they are asked, they are asked at what
> age they first experienced vaginal intercourse, and that age
> oftentimes is prior to the year in which they were first asked about
> their sexuality (ie, earlier than 14). So, I have, for all
> individuals, an integer corresponding to their age, in years, when
> they lost their virginity, or missing data for those who are still
> virgins. After pulling the variables, I reshaped the data into a
> long panel.
>
> Thinking about the "stset" command, I decided to follow this route.
>
> * generate sexually active dummy equalling 1 if sexually active,
> * and 0 otherwise
> gen sa=.
> replace sa=0 if firstsex_yr<age
> replace sa=1 if firstsex_yr==age
> replace sa=1 if firstsex_yr>age
>
> * stset the data
> stset age, failure(sa) id(id)
>
> where "age" is the age of the individual in any given year, and
> "firstsex_yr" is the age at which the individual first experienced
> vaginal intercourse.
>
> What I've basically done, though, is made the person's age to be my
> duration variable, but I don't think this is correct. Ideally, I'd
> like to simply have some sort of year variable to be the duration
> variable, but the problem I'm imagining is how to handle events that
> happened prior to the survey. For instance, I know that some lost
> their virginity when they were 10, a year that is at best 2 years
> prior to the survey for some people, and 4 years prior to the survey
> for others. So, it would seem that making "age" the duration variable
> is not the appropriate strategy, but I'm not sure of a better
> solution at this point. Can someone provide me some suggestions on
> getting this data together?