Re: st: basic programming tips
For some reason, the first part of this email was cut off. It should
have begun like this:
I am working with the National Longitudinal Survey of Youth 1997
(NLSY97). I suspect that I have some coding errors, and am going
back over my code to find where they may be. The problem, I suspect,
is due to the extensive tree-like branchings and skip patterns, which
send respondents through a set of questions along separate paths
before they meet back up again later. These different paths will
often cover the same or similar topics, but questions are worded
slightly differently depending on earlier questions. This has made
it difficult to keep track of everyone, and I think it exposes a
weakness in my programming: I do not have a good set of failsafes
and checks to ensure that I am, first, keeping track of every
respondent and, second, not imputing the wrong information. I have
3 questions (mentioned in the previous email). I am trying not only
to solve a specific issue but, more importantly, to get some tips
and strategies from others who know how to work well with datasets
that use such extensive skip patterns.
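
As a concrete illustration of the kind of failsafe meant here, the following is a minimal sketch, not a definitive recipe. It assumes the same item was asked on two branches of the skip pattern and stored in two variables; the names sa_a, sa_b and sa_src are hypothetical and the four data rows are made up for the example. The idea is to assert what should be true before combining, never overwrite a nonmissing value, and record which branch each value came from.

```stata
* Hypothetical example: sa_a and sa_b hold the same item asked on two
* different branches of the skip pattern (names and rows are made up).
clear
input id year sa_a sa_b
5 1997 1 .
5 1998 . 1
9 1997 0 .
9 1998 . .
end

* Check 1: exactly one row per person-year; stops with an error otherwise.
isid id year

* Check 2: nobody should have an answer on both branches in the same year.
assert missing(sa_a) | missing(sa_b)

* Combine the branches, filling only where the result is still missing,
* so an existing value is never overwritten.
generate byte sa = sa_a
replace sa = sa_b if missing(sa) & !missing(sa_b)

* Record the source of each value so remaining gaps can be traced back.
generate byte sa_src = 0
replace sa_src = 1 if !missing(sa_a)
replace sa_src = 2 if missing(sa_a) & !missing(sa_b)
label define src 0 "no branch answered" 1 "branch A" 2 "branch B"
label values sa_src src

* Cross-tabulate source against value, showing missings explicitly.
tabulate sa_src sa, missing
```

Because assert stops the do-file as soon as a condition fails, a violated assumption surfaces at the step where it occurs rather than much later in the analysis.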
On Oct 11, 2006, at 2:20 PM, Scott Cunningham wrote:
respondents questions about sexual behavior every year. The survey
uses extensive skipping and branching requiring the researcher to
search over the various branches and collect the information. I am
trying to determine the proportion of people by sex, race and age
in the survey who reportedly were sexually active ("sa") at any
point in their life prior to the survey. Because people sometimes
do not receive a given question, depending on how they answered
earlier questions or how they answered the same questions in
previous years, there are holes in my data. I have three
questions.
1. I am occasionally worried that I am replacing variables with
values that are incorrect. In this example, it is easy to find
contradictions, though. If someone is sexually active in an
earlier wave (say 1997) but then later reports that they are no
longer sexually active (say 2002), then it would mean the person
reported he was not a virgin in 1997 but is a virgin in 2002. How
do others of you check to make sure you do not have mistakes like
this - once you have already reshaped the data into a panel, for
instance? I think I do not possess enough of these checks in my
programming, in fact, and am making many mistakes along the way
that I'm not catching.
2. The NLSY97 has a very difficult skip structure, and for many of
the questions I am interested in, I must comb over the questions
carefully and make sure that I am accounting for every one. For
those of you who work frequently with surveys that have elaborate
skip and branching patterns, how do you efficiently manage the code
such that you can be assured you have not lost people along the
way, or accidentally overwritten values?
3. Finally, sexual activity has holes, as I said. If there are no
contradictions (like going from 1 to 0 over time), these can be
corrected by filling each missing observation with a 0 or 1,
assuming the first time a 1 appears is truly the first year the
person made their sexual debut. What is the best way to fill in a
missing value in the context of this type of duration modeling? I
need to tell Stata to make all missing observations a 0, unless a 1
had appeared at some point earlier, in which case replace with a 1.
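
The fill rule described in question 3 could be sketched as follows. This is one possible approach under the assumption stated there (the first observed 1 is the true debut year), not the only way to do it. The rows are taken from ids 5 and 9 in the listing in this message; sa_filled and seen1 are names introduced here for illustration.

```stata
* Rows for ids 5 and 9 from the listing; only id, year and sa are needed.
clear
input id year sa
5 1997 1
5 1998 1
5 1999 .
5 2000 1
9 1997 0
9 1998 .
9 1999 .
9 2000 0
9 2001 0
9 2002 1
end

* Keep the raw answers untouched and work on a copy.
generate byte sa_filled = sa

* Within person, in year order, flag whether a 1 has been seen so far.
* (sa == 1) evaluates to 0 when sa is missing, so missing rows add nothing
* to the running sum.
bysort id (year): generate byte seen1 = sum(sa == 1) > 0

* Missing after an earlier 1 becomes 1; any other missing becomes 0.
replace sa_filled = 1 if missing(sa) & seen1
replace sa_filled = 0 if missing(sa) & !seen1

list id year sa sa_filled, sepby(id)
```

Keeping sa alongside sa_filled makes it possible to see afterwards which values were reported and which were imputed. Note that filling a gap with 0 when the next observed value is 1 leans entirely on the stated assumption about the debut year.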
I've attached a sample of the data, so that one can see what I'm
describing if it's not clear. The variables are person
identification number (id), year of survey (year), sexually active
(sa), age of respondent at date of interview (age), race (race),
number of partners reported that year (rp), and marital status (ms).
sc
+-----------------------------------------+
| id year sa age race rp ms |
|-----------------------------------------|
1. | 5 1997 1 15 1 2 0 |
2. | 5 1998 1 16 1 3 0 |
3. | 5 1999 . 17 1 0 0 |
4. | 5 2000 1 18 1 0 0 |
5. | 5 2001 1 19 1 . 0 |
|-----------------------------------------|
6. | 5 2002 1 20 1 4 0 |
7. | 9 1997 0 15 1 0 0 |
8. | 9 1998 . 16 1 0 0 |
9. | 9 1999 . 17 1 0 0 |
10. | 9 2000 0 18 1 0 0 |
|-----------------------------------------|
11. | 9 2001 0 19 1 0 0 |
12. | 9 2002 1 20 1 1 0 |
13. | 10 1997 . 14 1 0 0 |
14. | 10 1998 . 15 1 0 0 |
15. | 10 1999 0 16 1 0 0 |
|-----------------------------------------|
16. | 10 2000 0 17 1 0 0 |
17. | 10 2001 0 18 1 0 0 |
18. | 10 2002 0 19 1 0 0 |
19. | 18 1997 1 15 2 1 0 |
20. | 18 1998 1 16 2 99 0 |
|-----------------------------------------|
21. | 18 1999 1 17 2 3 0 |
22. | 18 2000 1 18 2 5 0 |
23. | 18 2001 . 19 2 . 0 |
24. | 18 2002 1 20 2 10 0 |
25. | 19 1997 . 12 2 0 0 |
|-----------------------------------------|
26. | 19 1998 . 13 2 0 0 |
27. | 19 1999 0 14 2 0 0 |
28. | 19 2000 1 15 2 4 0 |
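
For the contradiction check asked about in question 1, a minimal sketch on panel data shaped like the listing above might look like this. The rows for ids 5 and 9 come from the listing; id 999 is a hypothetical respondent added only so that the check has something to catch, and seen1, bad and anybad are names introduced here.

```stata
* ids 5 and 9 are from the listing; id 999 is a made-up contradictory case.
clear
input id year sa
5   1997 1
5   1998 1
5   1999 .
5   2000 1
9   1997 0
9   1998 .
9   1999 .
9   2000 0
9   2001 0
9   2002 1
999 1997 1
999 1998 0
end

* The panel must have one row per person-year for the logic below to hold.
isid id year

* Within person, in year order, flag whether a 1 has been reported so far.
bysort id (year): generate byte seen1 = sum(sa == 1) > 0

* A reported 0 after any earlier 1 is a contradiction.
generate byte bad = (sa == 0) & seen1

* Mark every row of a person who has at least one contradiction.
bysort id: egen byte anybad = max(bad)

* Show full histories of the affected people and count the bad rows.
list id year sa if anybad, sepby(id)
count if bad
```

On these rows only the hypothetical id 999 is flagged. Once the flagged cases have been investigated and resolved, a line such as assert bad == 0 placed in the do-file would make it stop if a later change reintroduces a contradiction.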
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/