Hi, I want to get the household ID from a DHS women file.
But somehow, substr() is not working properly.
My identification variables are strings (sometimes with spaces
between the components).
Here is what I did:
. use hhid using amhr41rt
. so hhid
. ta hhid in 1/10
        Case |
Identificati |
          on |      Freq.     Percent        Cum.
-------------+-----------------------------------
         1 2 |          1       10.00       10.00
         1 3 |          1       10.00       20.00
         1 5 |          1       10.00       30.00
         1 6 |          1       10.00       40.00
         1 7 |          1       10.00       50.00
         1 8 |          1       10.00       60.00
         1 9 |          1       10.00       70.00
        1 10 |          1       10.00       80.00
        1 11 |          1       10.00       90.00
        1 12 |          1       10.00      100.00
-------------+-----------------------------------
       Total |         10      100.00
. ta hhid in 5970/5980
        Case |
Identificati |
          on |      Freq.     Percent        Cum.
-------------+-----------------------------------
      260 19 |          1        9.09        9.09
      260 20 |          1        9.09       18.18
      260 21 |          1        9.09       27.27
      260 22 |          1        9.09       36.36
      260 23 |          1        9.09       45.45
      260 24 |          1        9.09       54.55
      260 25 |          1        9.09       63.64
      260 26 |          1        9.09       72.73
      260 27 |          1        9.09       81.82
      260 28 |          1        9.09       90.91
      260 29 |          1        9.09      100.00
-------------+-----------------------------------
       Total |         11      100.00
. g le=length(hhid)
. ta le
         le |      Freq.     Percent        Cum.
------------+-----------------------------------
         12 |      5,980      100.00      100.00
------------+-----------------------------------
      Total |      5,980      100.00
. ta hh
(output omitted)
Max observation: 260 29
. use caseid using amir41rt, clear
. so ca
. ta c in 1/10
           Case |
 Identification |      Freq.     Percent        Cum.
----------------+-----------------------------------
          1 3 1 |          1       10.00       10.00
          1 5 3 |          1       10.00       20.00
          1 7 2 |          1       10.00       30.00
          1 8 2 |          1       10.00       40.00
          1 9 4 |          1       10.00       50.00
         1 10 2 |          1       10.00       60.00
         1 10 3 |          1       10.00       70.00
         1 11 2 |          1       10.00       80.00
         1 12 2 |          1       10.00       90.00
         1 13 2 |          1       10.00      100.00
----------------+-----------------------------------
          Total |         10      100.00
. ta c in 6420/6430
           Case |
 Identification |      Freq.     Percent        Cum.
----------------+-----------------------------------
       260 18 7 |          1        9.09        9.09
       260 21 2 |          1        9.09       18.18
       260 22 2 |          1        9.09       27.27
       260 22 4 |          1        9.09       36.36
       260 24 4 |          1        9.09       45.45
       260 25 2 |          1        9.09       54.55
       260 25 3 |          1        9.09       63.64
       260 26 2 |          1        9.09       72.73
       260 26 3 |          1        9.09       81.82
       260 27 3 |          1        9.09       90.91
       260 29 4 |          1        9.09      100.00
----------------+-----------------------------------
          Total |         11      100.00
. ta caseid
(output omitted)
max obs: 260 29 4
. g le=length(c)
. ta le
         le |      Freq.     Percent        Cum.
------------+-----------------------------------
         15 |      6,430      100.00      100.00
------------+-----------------------------------
      Total |      6,430      100.00
. g str9 hhid = substr(c, 1, 12)
. ta hhid
(output omitted)
Max obs: 260
What is the problem?
Is there a way to get the correct hhid?
Best regards.
Amadou.
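
One likely explanation, going only by the output above: the household file
reports hhid as 12 characters long, but the generate command declares the new
variable as str9, so Stata would keep only the first 9 characters of whatever
substr() returns. That would be consistent with the largest value showing as
260 rather than 260 29. A minimal sketch of a fix follows. It assumes that hhid
really is the first 12 characters of caseid, which the reported lengths of 12
and 15 suggest but do not prove; le_hhid is only an illustrative variable name.

```stata
* Sketch only: assumes hhid is the first 12 characters of caseid.
use caseid using amir41rt, clear

* Declare the new variable wide enough to hold all 12 characters;
* str9 would cut the result off after the 9th character.
generate str12 hhid = substr(caseid, 1, 12)

* Check: every observation should now have length 12,
* matching what the household file reports for hhid.
generate byte le_hhid = length(hhid)
tabulate le_hhid
```

Leaving out the storage type (generate hhid = substr(caseid, 1, 12)) should
also work, since generate then picks a string type wide enough for the result.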