st: basic programming tips
respondents questions about sexual behavior every year. The survey
uses extensive skipping and branching requiring the researcher to
search over the various branches and collect the information. I am
trying to determine the proportion of people by sex, race and age in
the survey who reportedly were sexually active ("sa") at any point in
their life prior to the survey. Because people sometimes do not
receive a given question, depending on how they answered earlier
questions or the same questions in previous years, there are holes
in my data. I have three questions.
1. I am occasionally worried that I am replacing variables with
values that are incorrect. In this example, it is easy to find
contradictions, though. If someone is sexually active in an earlier
wave (say 1997) but then later reports that they are no longer
sexually active (say 2002), then it would mean the person reported he
was not a virgin in 1997 but is a virgin in 2002. How do others
check that they do not have mistakes like this, for instance once
the data have already been reshaped into a panel? I suspect I do not
have enough of these checks in my code and am making mistakes along
the way that I am not catching.
2. The NLSY97 has a very difficult skip structure, and for many of
the questions I am interested in, I must comb over the questions
carefully and make sure that I am accounting for every one. For
those of you who work frequently with surveys that have elaborate
skip and branching patterns, how do you manage the code efficiently
so that you can be sure you have not lost people along the way or
accidentally overwritten values?
3. Finally, the sexual activity variable has holes, as I said. If
there are no contradictions (like going from 1 to 0 over time), these
can be corrected by filling each missing observation with a 0 or 1, assuming
the first time a 1 appears is truly the first year the person made
their sexual debut. What is the best way to fill in a missing value
in the context of this type of duration modeling? I need to tell
Stata to make all missing observations a 0, unless a 1 had appeared
at some point earlier, in which case replace with a 1.
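
The contradiction check in question 1 and the fill rule in question 3 could be sketched together as below. This is only a rough illustration, not a tested solution for the full NLSY97 file: respondents 5 and 9 are taken from the listing in this post, respondent 999 is a hypothetical case invented to trigger the contradiction flag, and the names ever, contra, has_contra and sa_filled are made up for the example.

```stata
clear
input id year sa
  5 1997 1
  5 1998 1
  5 1999 .
  5 2000 1
  5 2001 1
  5 2002 1
  9 1997 0
  9 1998 .
  9 1999 .
  9 2000 0
  9 2001 0
  9 2002 1
999 1997 1
999 1998 0
end

* Stop immediately if the panel is not one row per person-year
isid id year
sort id year

* ever = 1 from the first wave with sa == 1 onward (missing sa counts as 0 here)
by id (year): gen byte ever = sum(sa == 1) > 0

* Contradiction: a reported 0 after a 1 has already appeared
gen byte contra = sa == 0 & ever
by id: egen byte has_contra = max(contra)

* Fill holes only for respondents with no contradiction:
* missing becomes 1 if a 1 appeared earlier, otherwise 0
gen byte sa_filled = sa
replace sa_filled = ever if missing(sa) & !has_contra

list id year sa ever contra sa_filled, sepby(id)
```

Keeping the filled values in a new variable leaves the original sa untouched, so the fill can be compared against the raw reports. Note that filling a 0 before the first observed 1 rests on the assumption stated above, that the first 1 really marks the debut year.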
I've attached a sample of the data so that it is clear what I'm
describing. The variables are "person identification number" (id),
"year of survey" (year), "sexually active" (sa), "age of respondent
at date of interview" (age), "race" (race), "number of partners
reported that year" (rp), and "marital status" (ms).
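
For the skip-pattern worry in question 2, one possible habit is to build the combined variable in a new column, never overwrite a non-missing value, and assert the expected invariants after each step. The sketch below uses toy data: sa_a and sa_b are hypothetical stand-ins for the same question asked on two different skip branches, and are not actual NLSY97 variable names.

```stata
clear
* Hypothetical toy data: each wave, a respondent answers on at most one branch
input id year sa_a sa_b
5 1997 1 .
5 1998 . 1
5 1999 . .
9 1997 0 .
9 1998 . 0
end

isid id year
local N0 = _N

* A respondent routed down both branches in one wave signals a coding error
assert missing(sa_a) | missing(sa_b)

* Combine the branches without overwriting anything already filled in
gen byte sa = sa_a
replace sa = sa_b if missing(sa)

* Nobody was dropped, and every answered branch was carried over
assert _N == `N0'
assert !missing(sa) if !missing(sa_a) | !missing(sa_b)

* What remains missing is a true hole, not a lost answer
count if missing(sa)
```

Because assert halts the do-file on failure, a broken assumption about the skip structure surfaces at the step where it occurs rather than later in the analysis.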
sc
     +---------------------------------------+
     | id   year   sa   age   race   rp   ms |
     |---------------------------------------|
  1. |  5   1997    1    15      1    2    0 |
  2. |  5   1998    1    16      1    3    0 |
  3. |  5   1999    .    17      1    0    0 |
  4. |  5   2000    1    18      1    0    0 |
  5. |  5   2001    1    19      1    .    0 |
     |---------------------------------------|
  6. |  5   2002    1    20      1    4    0 |
  7. |  9   1997    0    15      1    0    0 |
  8. |  9   1998    .    16      1    0    0 |
  9. |  9   1999    .    17      1    0    0 |
 10. |  9   2000    0    18      1    0    0 |
     |---------------------------------------|
 11. |  9   2001    0    19      1    0    0 |
 12. |  9   2002    1    20      1    1    0 |
 13. | 10   1997    .    14      1    0    0 |
 14. | 10   1998    .    15      1    0    0 |
 15. | 10   1999    0    16      1    0    0 |
     |---------------------------------------|
 16. | 10   2000    0    17      1    0    0 |
 17. | 10   2001    0    18      1    0    0 |
 18. | 10   2002    0    19      1    0    0 |
 19. | 18   1997    1    15      2    1    0 |
 20. | 18   1998    1    16      2   99    0 |
     |---------------------------------------|
 21. | 18   1999    1    17      2    3    0 |
 22. | 18   2000    1    18      2    5    0 |
 23. | 18   2001    .    19      2    .    0 |
 24. | 18   2002    1    20      2   10    0 |
 25. | 19   1997    .    12      2    0    0 |
     |---------------------------------------|
 26. | 19   1998    .    13      2    0    0 |
 27. | 19   1999    0    14      2    0    0 |
 28. | 19   2000    1    15      2    4    0 |
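
Cross-variable checks on a listing like this one might look like the sketch below, which re-enters rows 19 to 28 (respondents 18 and 19). Treating rp == 99 as a nonresponse code is an assumption made for illustration and should be confirmed against the codebook; rp_suspect and rp_sa_clash are made-up names.

```stata
clear
* Rows 19-28 of the listing (respondents 18 and 19)
input id year sa age race rp ms
18 1997 1 15 2  1 0
18 1998 1 16 2 99 0
18 1999 1 17 2  3 0
18 2000 1 18 2  5 0
18 2001 . 19 2  . 0
18 2002 1 20 2 10 0
19 1997 . 12 2  0 0
19 1998 . 13 2  0 0
19 1999 0 14 2  0 0
19 2000 1 15 2  4 0
end

isid id year

* sa may only be 0, 1, or missing
assert inlist(sa, 0, 1) | missing(sa)

* Age should never fall from one wave to the next; race should not change
bysort id (year): assert age >= age[_n-1] if _n > 1
bysort id (year): assert race == race[1]

* ASSUMPTION: 99 partners is a nonresponse code, not a real count
gen byte rp_suspect = rp == 99

* A positive partner count without sa == 1 would be a contradiction
gen byte rp_sa_clash = rp > 0 & !missing(rp) & !rp_suspect & sa != 1
count if rp_sa_clash
list id year sa rp if rp_suspect | rp_sa_clash
```

Flagging rather than asserting on the partner count lets the suspect rows be listed and inspected before deciding how to recode them.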