There are probably better ways, but something like the code below should do it.
(Note that I'd normally prefer something more like
generate byte education_yrs = mod(hi_educ, 10) + ///
    7 * inrange(hi_educ, 21, 24) + ///
    11 * inrange(hi_educ, 31, 35)
because it would be easier to maintain--more self-documenting--but there's an
outside chance that it is somewhat slower in execution, perhaps even
noticeably so if you've got a very large amount of data.)
Joseph Coveney
. clear *
. set more off
. input hhid hi_educ years
hhid hi_educ years
1. 1 11 1
2. 2 21 8
3. 3 17 7
4. 4 16 6
5. 5 24 11
6. 6 31 12
7. 7 32 13
8. 8 13 3
9. 9 22 9
10. end
. generate byte education_yrs = mod(hi_educ, 10) + ///
7 * floor(hi_educ / 20) + ///
4 * floor(hi_educ / 30)
. list, noobs separator(0)
+-----------------------------------+
| hhid   hi_educ   years   educat~s |
|-----------------------------------|
|    1        11       1          1 |
|    2        21       8          8 |
|    3        17       7          7 |
|    4        16       6          6 |
|    5        24      11         11 |
|    6        31      12         12 |
|    7        32      13         13 |
|    8        13       3          3 |
|    9        22       9          9 |
+-----------------------------------+
. exit
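As a rough check, the inrange() version mentioned at the top can be compared with education_yrs from the log above. This is only a sketch, meant to be run from a do-file (because of the /// continuations) after the commands in the log; alt_yrs is a made-up variable name, not something from the original data.

```stata
* Sketch: build the inrange() version and compare it with the floor() version.
* alt_yrs is a hypothetical name used only for this check.
generate byte alt_yrs = mod(hi_educ, 10) + ///
    7 * inrange(hi_educ, 21, 24) + ///
    11 * inrange(hi_educ, 31, 35)

* Both should match each other and the hand-entered years column.
assert alt_yrs == education_yrs
assert alt_yrs == years
```

The two formulas agree on the codes in the stated ranges, but they can differ for codes outside them. For example, a code of 25 would give 12 with the floor() version and 5 with the inrange() version, so it is worth tabulating hi_educ first.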
Ronnie Babigumira wrote:
I have an interesting data management problem. My data look like this
[see below]
Where hi_educ is the highest level of education for household. From this I
would like to extract the number of years of schooling.
Now, for values up to 17, the years of schooling is the last digit;
for values between 21 and 24, it is 7 + the last digit;
for values between 31 and 35, it is 11 + the last digit.
What I would like to end up with is something like this
hhid hi_educ years
1 11 1
2 21 8
3 17 7
4 16 6
5 24 11
6 31 12
7 32 13
8 13 3
9 22 9
I am stuck here
gen str3 test = ""
replace test = substr(string(hi_educ), -1,.) if inrange(hi_educ,11,17)
I would appreciate any help
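For what it's worth, the string-based attempt above could probably be finished along the following lines. This is a sketch only, assuming hi_educ is numeric and takes just the codes described; last_digit and years_str are made-up variable names.

```stata
* Sketch: take the last digit via string functions, then apply the three rules.
* last_digit and years_str are hypothetical names.
generate byte last_digit = real(substr(string(hi_educ), -1, 1))

generate byte years_str = last_digit if inrange(hi_educ, 11, 17)
replace years_str = 7 + last_digit if inrange(hi_educ, 21, 24)
replace years_str = 11 + last_digit if inrange(hi_educ, 31, 35)
```

Codes that fall outside the three ranges are left missing in years_str, which makes unexpected values easy to spot.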