The attachment of code promised here
didn't make it, fortunately, so you are saved a sermon on
the iniquity of attachments.
Of your questions, I have answers to two.
Nick
Scott Cunningham
> 1. I am occasionally worried that I am replacing variables with
> values that are incorrect. In this example, it is easy to find
> contradictions, though. If someone is sexually active in an earlier
> wave (say 1997) but then later reports that they are no longer
> sexually active (say 2002), then it would mean the person reported he
> was not a virgin in 1997 but is a virgin in 2002. How do others of
> you check to make sure you do not have mistakes like this, once you
> have already reshaped the data into a panel, for instance? I think I
> do not possess enough of these checks in my programming, in fact, and
> am making many mistakes along the way that I'm not catching.
I don't want to start a discussion on Statalist about quite what
virginity is, but unfortunately you seem to need to define exactly
what _you_ understand by it. I don't regard your example here
as contradictory at all as long as virgin here means "not
currently sexually active". Alternatively, if virgin means "never
sexually active", I do not see how a person who was ever previously
sexually active can revert to being a virgin (barring some
legalistic redefinition).
More generally, you can check for correctness if you independently
have correct answers or have some rule that guesses correct
answers for you (e.g. a majority vote). I don't see either here.
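What you can do without outside answers is check internal
consistency once a definition is fixed. A minimal sketch, assuming
-sa- is coded 0/1 with . for missing and that virgin means "never
sexually active": flag any 0 reported after an earlier 1 within a
person. Ids 5 and 9 copy the -sa- values from your listing quoted
below; id 99 is hypothetical, added only to show a reversion. The
names -ever_sa-, -revert- and -any_revert- are made up here.

```stata
* Toy panel: ids 5 and 9 from the listing in the question;
* id 99 is HYPOTHETICAL, added only to show a reversion
clear
input id year sa
 5 1997 1
 5 1998 1
 5 1999 .
 5 2000 1
 5 2001 1
 5 2002 1
 9 1997 0
 9 1998 .
 9 1999 .
 9 2000 0
99 1997 1
99 1998 0
end

* ever_sa is 1 from the first year with sa == 1 onwards
* (sa == 1 evaluates to 0 when sa is missing)
bysort id (year) : gen byte ever_sa = sum(sa == 1) > 0

* flag a reported 0 after any earlier 1; ever_sa[0] is missing,
* so the first year of each person is never flagged
by id : gen byte revert = sa == 0 & ever_sa[_n-1] == 1

* mark every observation of a person with at least one reversion
by id : egen byte any_revert = max(revert)
list id year sa ever_sa revert if any_revert, sepby(id) noobs
```

Under the other definition ("not currently sexually active") id 99
is no contradiction at all, so a flag like this is only as good as
the definition behind it.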
> 3. Finally, sexual activity has holes, as I said, which if there are
> no contradictions (like going from 0 to 1 over time), can be
> corrected by filling all missing observations with a 0 or 1, assuming
> the first time a 1 appears is truly the first year the person made
> their sexual debut. What is the best way to fill in a missing value
> in the context of this type of duration modeling? I need to tell
> Stata to make all missing observations a 0, unless a 1 had appeared
> at some point earlier, in which case replace with a 1.
Again, going from 0 to 1 over time does not seem contradictory to me.
The maximum of -sa- seen so far is just
gen max_sa_sofar = .
bysort id (year) : replace max_sa_sofar = max(sa, max_sa_sofar[_n-1])
The -max()- function ignores missing arguments unless all of them
are missing: -max(0,.)- is 0, -max(1,.)- is 1, and so on, so the
usual rule that . is larger than any number is set aside. (This is
a feature, not a bug.)
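A quick way to see this for yourself, using nothing beyond official
-display- and -max()- (run as a do-file, so that the // comments are
allowed):

```stata
* max() skips missing arguments unless all of them are missing
display max(0, .)      // shows 0
display max(1, .)      // shows 1
display max(., .)      // shows . (every argument missing)

* contrast: in a comparison, . counts as larger than any number
display (. > 1)        // shows 1, i.e. true
```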
This principle is implemented in the -egen- function -record()- from
-egenmore- on SSC, attributable to Kit Baum and S.B. Else.
Thus you just need to copy across from this -max_sa_sofar- variable
whenever -sa- is missing. That still leaves open for discussion whether
this method of imputation is socially or sexually valid, which I doubt.
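Putting the pieces together on the -id-, -year- and -sa- values from
your listing quoted below gives the following sketch. -sa_filled- is
a made-up name so that the original -sa- is kept. The last -replace-
follows your stated rule that a missing before any 1 becomes 0; that
is an assumption on top of copying across, and it only matters for a
person whose first waves are missing.

```stata
* The ten observations listed in the question (id, year, sa only)
clear
input id year sa
5 1997 1
5 1998 1
5 1999 .
5 2000 1
5 2001 1
5 2002 1
9 1997 0
9 1998 .
9 1999 .
9 2000 0
end

* running maximum of sa within person; replace works through the
* observations in order, so max_sa_sofar[_n-1] is already updated
gen max_sa_sofar = .
bysort id (year) : replace max_sa_sofar = max(sa, max_sa_sofar[_n-1])

* keep the original; copy across where sa is missing
gen sa_filled = sa
replace sa_filled = max_sa_sofar if missing(sa)

* ASSUMPTION (the asker's rule): still-missing values, which can only
* come before a person's first nonmissing sa, are set to 0
replace sa_filled = 0 if missing(sa_filled)

list id year sa max_sa_sofar sa_filled, sepby(id) noobs
```

For id 5 the 1999 gap becomes 1; for id 9 the 1998 and 1999 gaps
become 0.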
> I've attached a copy of the code, so that one can know what I'm
> describing if it's not clear. The variables are "person
> identification number," "year of survey," "sexually active," "age of
> respondent at date of interview," "race," "number of partners
> reported that year," and "marital status."
> +-----------------------------------------+
> | id year sa age race rp ms |
> |-----------------------------------------|
> 1. | 5 1997 1 15 1 2 0 |
> 2. | 5 1998 1 16 1 3 0 |
> 3. | 5 1999 . 17 1 0 0 |
> 4. | 5 2000 1 18 1 0 0 |
> 5. | 5 2001 1 19 1 . 0 |
> |-----------------------------------------|
> 6. | 5 2002 1 20 1 4 0 |
> 7. | 9 1997 0 15 1 0 0 |
> 8. | 9 1998 . 16 1 0 0 |
> 9. | 9 1999 . 17 1 0 0 |
> 10. | 9 2000 0 18 1 0 0 |
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/