Statalist The Stata Listserver



Re: st: basic programming tips


From   Scott Cunningham <[email protected]>
To   [email protected]
Subject   Re: st: basic programming tips
Date   Wed, 11 Oct 2006 14:29:16 -0400

For some reason, the first part of this email was cut off. It should have begun like this:

I am working with the National Longitudinal Survey of Youth 1997 (NLSY97). I suspect that I have some coding errors and am going back over my code to find them. The likely cause is the survey's extensive tree-like branching and skip patterns, which send respondents down separate paths through a set of questions before the paths rejoin later. The different paths often cover the same or similar topics, but the questions are worded slightly differently depending on earlier answers. This has made it difficult to keep track of everyone, and I think it exposes a weakness in my programming: I do not have a good set of failsafes and checks to ensure, first, that I am keeping up with everyone and, second, that I am not imputing the wrong information. I have 3 questions (mentioned in the previous email). I am trying not only to solve a specific issue but, more importantly, to get tips and strategies from others who know how to work well with datasets that use such extensive skip patterns.

On Oct 11, 2006, at 2:20 PM, Scott Cunningham wrote:


respondents questions about sexual behavior every year. The survey uses extensive skipping and branching, which requires the researcher to search over the various branches and collect the information. I am trying to determine the proportion of people, by sex, race and age, who reported being sexually active ("sa") at any point in their life prior to the survey. Because people sometimes do not receive a given question, depending on how they answered earlier questions or how they answered the same questions in previous years, there are holes in my data. I have three questions.

1. I am occasionally worried that I am replacing variables with values that are incorrect. In this example, though, contradictions are easy to find. If someone reports being sexually active in an earlier wave (say 1997) but then reports not being sexually active in a later wave (say 2002), the person would have reported not being a virgin in 1997 but being a virgin in 2002. How do others of you check that you do not have mistakes like this - once you have already reshaped the data into a panel, for instance? I think I do not have enough of these checks in my programming, and am making many mistakes along the way that I'm not catching.

2. The NLSY97 has a very difficult skip structure, and for many of the questions I am interested in, I must comb over the questions carefully and make sure that I am accounting for every one. For those of you who work frequently with surveys that have elaborate skip and branching patterns, how do you manage the code efficiently, so that you can be assured you have not lost people along the way or accidentally overwritten values?

3. Finally, sexual activity has holes, as I said. If there are no contradictions (like going from 1 to 0 over time), the holes can be corrected by filling all missing observations with a 0 or 1, assuming the first time a 1 appears is truly the year of the person's sexual debut. What is the best way to fill in a missing value in the context of this type of duration modeling? I need to tell Stata to make all missing observations a 0, unless a 1 had appeared at some point earlier, in which case replace with a 1.
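
On question 2, one defensive pattern is to check that the branches agree before combining them, and then to fill the combined variable only where it is still missing. The following is a minimal sketch, not NLSY97-specific code: the variables q_a and q_b and the four respondents are hypothetical, standing in for the same question asked on two different branches.

```stata
clear
* Hypothetical data: q_a and q_b are the same item asked on two branches.
input id q_a q_b
1 1 .
2 . 0
3 . .
4 1 1
end

* The two branches should never disagree for the same respondent.
assert q_a == q_b if !missing(q_a) & !missing(q_b)

* Build the combined variable, filling only where it is still missing,
* so that a later replace cannot overwrite an earlier value.
gen byte sa = q_a
replace sa = q_b if missing(sa) & !missing(q_b)

* Nobody who answered on either branch should be lost.
assert !missing(sa) if !missing(q_a) | !missing(q_b)

* Review how many respondents are still missing, and why.
count if missing(sa)
```

Because assert stops a do-file as soon as a condition fails, each merge step is verified every time the code is rerun.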

I've attached a copy of the code, so that it is clear what I'm describing. The variables, in the order they appear in the listing, are "person identification number" (id), "year of survey" (year), "sexual active" (sa), "age of respondent at date of interview" (age), "race" (race), "number of partners reported that year" (rp), and "marital status" (ms).
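
For the contradiction check in question 1, a sketch along the following lines may help once the data are in long (panel) form. It uses the variable names id, year and sa from the listing. Respondent 99 is hypothetical and is added only so that the check has something to flag; the helper variables prior_sa and contradiction are also made up for illustration.

```stata
clear
input id year sa
 5 1997 1
 5 1998 1
 5 1999 .
 5 2000 1
 9 1997 0
 9 1998 .
 9 2000 0
 9 2002 1
99 1997 1
99 1998 0
end

* Confirm one row per person-year before any by-group logic.
isid id year

* prior_sa = 1 if a 1 was reported in any EARLIER wave for this person.
bysort id (year): gen byte prior_sa = (sum(sa == 1) - (sa == 1)) > 0

* A contradiction is a reported 0 after an earlier reported 1.
gen byte contradiction = prior_sa == 1 & sa == 0

* Inspect the offending person-years.
list id year sa if contradiction == 1
```

Once the flagged cases have been resolved, a line such as `assert contradiction == 0` in the do-file would stop execution if a contradiction ever reappears.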

sc


     +---------------------------------------+
     | id   year   sa   age   race   rp   ms |
     |---------------------------------------|
  1. |  5   1997    1    15      1    2    0 |
  2. |  5   1998    1    16      1    3    0 |
  3. |  5   1999    .    17      1    0    0 |
  4. |  5   2000    1    18      1    0    0 |
  5. |  5   2001    1    19      1    .    0 |
     |---------------------------------------|
  6. |  5   2002    1    20      1    4    0 |
  7. |  9   1997    0    15      1    0    0 |
  8. |  9   1998    .    16      1    0    0 |
  9. |  9   1999    .    17      1    0    0 |
 10. |  9   2000    0    18      1    0    0 |
     |---------------------------------------|
 11. |  9   2001    0    19      1    0    0 |
 12. |  9   2002    1    20      1    1    0 |
 13. | 10   1997    .    14      1    0    0 |
 14. | 10   1998    .    15      1    0    0 |
 15. | 10   1999    0    16      1    0    0 |
     |---------------------------------------|
 16. | 10   2000    0    17      1    0    0 |
 17. | 10   2001    0    18      1    0    0 |
 18. | 10   2002    0    19      1    0    0 |
 19. | 18   1997    1    15      2    1    0 |
 20. | 18   1998    1    16      2   99    0 |
     |---------------------------------------|
 21. | 18   1999    1    17      2    3    0 |
 22. | 18   2000    1    18      2    5    0 |
 23. | 18   2001    .    19      2    .    0 |
 24. | 18   2002    1    20      2   10    0 |
 25. | 19   1997    .    12      2    0    0 |
     |---------------------------------------|
 26. | 19   1998    .    13      2    0    0 |
 27. | 19   1999    0    14      2    0    0 |
 28. | 19   2000    1    15      2    4    0 |
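
The fill rule in question 3 (missing becomes 0 unless a 1 appeared in an earlier wave, in which case it becomes 1) can be sketched with a running sum within each respondent. This is one possible approach, shown on a few rows taken from the listing above; ever_sa and sa_filled are hypothetical names, and the original sa is left untouched so the filled values can be audited. Filling leading missings with 0 relies on the assumption stated in the question, namely that the first observed 1 marks the true year of debut.

```stata
clear
input id year sa
 5 1997 1
 5 1998 1
 5 1999 .
 5 2000 1
 9 1997 0
 9 1998 .
 9 1999 .
 9 2000 0
10 1997 .
10 1998 .
10 1999 0
end

* Running indicator within person, in year order: has a 1 been seen so far?
* (sa == 1) is 0 when sa is missing, so a missing wave never adds to the sum.
bysort id (year): gen byte ever_sa = sum(sa == 1) > 0

* Keep the original sa and fill a copy.
gen byte sa_filled = sa
replace sa_filled = ever_sa if missing(sa)

list id year sa sa_filled, sepby(id)
```

Running the contradiction check before this step matters, because the fill propagates a 1 forward without looking at later reported values.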

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*


