The one thing that looks unwise is
g str9 hhid = substr(c, 1, 12)
A -str9- variable holds at most 9 characters,
so Stata cannot store character strings of
length 12 in it.
In (British) English, we say that you
can't get a quart in a pint pot. This
is a similar case. (Note for those
with rational units: 1 quart = 2 pints.)
g hhid = substr(c,1,12)
should be sufficient in Stata 8 and later, where -generate-
picks a string type wide enough for the result.
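
To see the difference on a small scale, here is a minimal sketch. The
caseid value is made up for illustration (15 characters, laid out roughly
like the values in the listing quoted below), and the name -hhid9- is
hypothetical. The first -generate- leaves the storage type to Stata; the
second forces -str9-, so that the two results can be compared with
-describe- and -length()-.

```stata
clear
set obs 1

* hypothetical 15-character caseid, laid out like the values in the question
generate str15 caseid = "     260  29  4"

* no storage type given: Stata 8 and later picks one wide enough for the result
generate hhid = substr(caseid, 1, 12)
assert length(hhid) == 12
assert hhid == "     260  29"

* for comparison: the same expression forced into a str9 variable
generate str9 hhid9 = substr(caseid, 1, 12)
describe hhid hhid9
display length(hhid9[1])
```

If the length shown for -hhid9- is less than 12, the explicit -str9- is
what is cutting the identifier short.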
Nick
[email protected]
> Hi, I want to get the household ID from a DHS women file.
> But somehow, substr() is not working properly.
> My identification variables are string formats (with spaces
> between them sometimes).
>
>
> I go:
>
> . use hhid using amhr41rt
>
> . so hhid
>
> . ta hhid in 1/10
>
> Case |
> Identificati |
> on | Freq. Percent Cum.
> -------------+-----------------------------------
> 1 2 | 1 10.00 10.00
> 1 3 | 1 10.00 20.00
> 1 5 | 1 10.00 30.00
> 1 6 | 1 10.00 40.00
> 1 7 | 1 10.00 50.00
> 1 8 | 1 10.00 60.00
> 1 9 | 1 10.00 70.00
> 1 10 | 1 10.00 80.00
> 1 11 | 1 10.00 90.00
> 1 12 | 1 10.00 100.00
> -------------+-----------------------------------
> Total | 10 100.00
>
> . ta hhid in 5970/5980
>
> Case |
> Identificati |
> on | Freq. Percent Cum.
> -------------+-----------------------------------
> 260 19 | 1 9.09 9.09
> 260 20 | 1 9.09 18.18
> 260 21 | 1 9.09 27.27
> 260 22 | 1 9.09 36.36
> 260 23 | 1 9.09 45.45
> 260 24 | 1 9.09 54.55
> 260 25 | 1 9.09 63.64
> 260 26 | 1 9.09 72.73
> 260 27 | 1 9.09 81.82
> 260 28 | 1 9.09 90.91
> 260 29 | 1 9.09 100.00
> -------------+-----------------------------------
> Total | 11 100.00
>
> . g le=length(hhid)
>
> . ta le
>
> le | Freq. Percent Cum.
> ------------+-----------------------------------
> 12 | 5,980 100.00 100.00
> ------------+-----------------------------------
> Total | 5,980 100.00
>
> . ta hh
> (output omitted)
> Max observation: 260 29
>
>
> . use caseid using amir41rt, clear
>
> . so ca
>
> . ta c in 1/10
>
> Case |
> Identification | Freq. Percent Cum.
> ----------------+-----------------------------------
> 1 3 1 | 1 10.00 10.00
> 1 5 3 | 1 10.00 20.00
> 1 7 2 | 1 10.00 30.00
> 1 8 2 | 1 10.00 40.00
> 1 9 4 | 1 10.00 50.00
> 1 10 2 | 1 10.00 60.00
> 1 10 3 | 1 10.00 70.00
> 1 11 2 | 1 10.00 80.00
> 1 12 2 | 1 10.00 90.00
> 1 13 2 | 1 10.00 100.00
> ----------------+-----------------------------------
> Total | 10 100.00
>
> . ta c in 6420/6430
>
> Case |
> Identification | Freq. Percent Cum.
> ----------------+-----------------------------------
> 260 18 7 | 1 9.09 9.09
> 260 21 2 | 1 9.09 18.18
> 260 22 2 | 1 9.09 27.27
> 260 22 4 | 1 9.09 36.36
> 260 24 4 | 1 9.09 45.45
> 260 25 2 | 1 9.09 54.55
> 260 25 3 | 1 9.09 63.64
> 260 26 2 | 1 9.09 72.73
> 260 26 3 | 1 9.09 81.82
> 260 27 3 | 1 9.09 90.91
> 260 29 4 | 1 9.09 100.00
> ----------------+-----------------------------------
> Total | 11 100.00
>
> . ta caseid
> (output omitted)
> max obs: 260 29 4
>
>
> . g le=length(c)
>
> . ta le
>
> le | Freq. Percent Cum.
> ------------+-----------------------------------
> 15 | 6,430 100.00 100.00
> ------------+-----------------------------------
> Total | 6,430 100.00
>
> . g str9 hhid = substr(c, 1, 12)
>
> What is the problem?
> Is there a way to get the correct hhid?