Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Grouping income variables- RECODE COMMAND
From: "Antonio Rodriguez Andres" <[email protected]>
To: <[email protected]>
Subject: RE: st: Grouping income variables- RECODE COMMAND
Date: Tue, 4 Feb 2014 13:29:54 +0200
Nick
In the 2006 ESS, total household income is grouped into 12 categories, each associated with a weekly, a monthly, and an annual range. For instance, J is less than 1,800 euros a year, R is 1,800 to under 3,600 euros, and so on.
tab hinctnt
Household's |
total net |
income, all |
sources | Freq. Percent Cum.
------------+-----------------------------------
J | 1,348 4.07 4.07
R | 1,353 4.08 8.15
C | 1,968 5.94 14.10
M | 3,067 9.26 23.35
F | 3,000 9.06 32.41
S | 2,934 8.86 41.27
K | 2,733 8.25 49.52
P | 2,682 8.10 57.62
D | 4,432 13.38 71.00
H | 1,962 5.92 76.92
U | 608 1.84 78.76
N | 396 1.20 79.95
Refusal | 3,958 11.95 91.90
Don't know | 2,549 7.70 99.60
No answer | 134 0.40 100.00
------------+-----------------------------------
Total | 33,124 100.00
tab hinctnt, nolabel
Household's |
total net |
income, all |
sources | Freq. Percent Cum.
------------+-----------------------------------
1 | 1,348 4.07 4.07
2 | 1,353 4.08 8.15
3 | 1,968 5.94 14.10
4 | 3,067 9.26 23.35
5 | 3,000 9.06 32.41
6 | 2,934 8.86 41.27
7 | 2,733 8.25 49.52
8 | 2,682 8.10 57.62
9 | 4,432 13.38 71.00
10 | 1,962 5.92 76.92
11 | 608 1.84 78.76
12 | 396 1.20 79.95
77 | 3,958 11.95 91.90
88 | 2,549 7.70 99.60
99 | 134 0.40 100.00
------------+-----------------------------------
Total | 33,124 100.00
First, I recode the household income variable using midpoints. The difficulty is defining a midpoint for the open-ended top category; for that, I follow Hout (2004).
*Create income midpoints
recode hinctnt (1=900) (2=2700) (3=4800) (4=9000) (5=15000) (6=21000) (7=27000) (8=33000) (9=48000) (10=75000) (11=105000) (12=175200), gen(hincome)
replace hincome=. if inlist(hinctnt, 77, 88, 99) // codes 77, 88, 99 (Refusal, Don't know, No answer) set to missing
gen lhincome=log(hincome)
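For the open-ended top category (code 12, N), the Pareto-curve midpoint as it is usually presented from Hout (2004) can be sketched as below. Treat the formula as this sketch's assumption, and likewise the annual bounds 90,000 and 120,000, which are inferred from the midpoints in the -recode- above rather than stated in the thread. The counts are the unweighted frequencies from the first -tab hinctnt- output; the scalar names are hypothetical.

```stata
* Sketch: Pareto-curve midpoint for the open-ended top income category.
* Counts: unweighted frequencies of codes 11 (U) and 12 (N) from -tab hinctnt-.
* Bounds: annual lower bounds of U and N, inferred from the midpoints (assumption).
scalar f_next = 608       // frequency of U, code 11
scalar f_top  = 396       // frequency of N, code 12 (open-ended)
scalar L_next = 90000     // lower bound of U
scalar L_top  = 120000    // lower bound of N

* Pareto shape parameter estimated from the two top categories
scalar V = (ln(f_next + f_top) - ln(f_top)) / (ln(L_top) - ln(L_next))

* Mean of a Pareto distribution above L_top, used as the top "midpoint"
scalar top_mid = L_top * V / (V - 1)
display "V = " scalar(V) "   top midpoint = " scalar(top_mid)
```

With these counts the result is roughly 173,700. That is close to, but not identical with, the 175200 used above, and the figure shifts with the counts fed into it (weighted or unweighted, which sample).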
I also need to include in my regression a dummy variable for missing income values. In Stata, I type:
gen missinc=0
replace missinc=1 if missing(hincome)
When I estimate the following model, the dummy variable for missing income is dropped, but it has to be in my model. Is there anything wrong with my Stata code?
. xtmixed dprt age age2 gender married separated divorced widowed eduyrs ichldhm interaction missinc lhincome ihealth iuemp5yr iuemp12m rgdp06 [pw=dweight] || cntry: gender, mle
note: missinc omitted because of collinearity
(29900 missing values generated)
Obtaining starting values by EM:
Mixed-effects regression Number of obs = 7603
Group variable: cntry Number of groups = 20
Obs per group: min = 156
avg = 380.1
max = 698
Wald chi2(15) = 1601.80
Log pseudolikelihood = -20437.471 Prob > chi2 = 0.0000
(Std. Err. adjusted for 20 clusters in cntry)
------------------------------------------------------------------------------
| Robust
dprt | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .076018 .026928 2.82 0.005 .0232401 .1287959
age2 | -.0009855 .0002563 -3.84 0.000 -.0014879 -.0004831
gender | -.3724799 .1183211 -3.15 0.002 -.6043849 -.1405749
married | -.6450081 .1533621 -4.21 0.000 -.9455923 -.3444239
separated | .5868276 .2951732 1.99 0.047 .0082988 1.165356
divorced | .1042908 .1962848 0.53 0.595 -.2804203 .4890018
widowed | 1.208098 .2994334 4.03 0.000 .6212191 1.794976
eduyrs | -.0146007 .0143999 -1.01 0.311 -.042824 .0136225
ichldhm | .1518147 .1852086 0.82 0.412 -.2111874 .5148168
interaction | -.3089155 .2124631 -1.45 0.146 -.7253355 .1075044
missinc | 0 (omitted)
lhincome | -.6049375 .0732486 -8.26 0.000 -.7485022 -.4613728
ihealth | -1.672027 .0842643 -19.84 0.000 -1.837182 -1.506872
iuemp5yr | .2910945 .1074106 2.71 0.007 .0805737 .5016153
iuemp12m | .3892144 .1335323 2.91 0.004 .1274958 .650933
rgdp06 | 6.49e-06 .0000108 0.60 0.547 -.0000146 .0000276
_cons | 17.25528 .9936399 17.37 0.000 15.30778 19.20278
----------------------------------------------------
Antonio
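The omission has a mechanical cause. Stata estimation commands use only observations with no missing values on any model variable. Because missinc is 1 exactly when hincome (and therefore lhincome) is missing, every observation with missinc == 1 is dropped. In the estimation sample missinc is then constant at 0 and collinear with the constant. The toy example below reproduces this, using hypothetical data and -regress- in place of -xtmixed- so that it runs on its own. It also sketches the usual missing-indicator workaround: fill the missing log income with a constant and keep missinc in the model. The name lhincome_f is hypothetical.

```stata
* Hypothetical toy data: 8 observations, income missing for the first 2
clear
set obs 8
generate hincome = cond(_n <= 2, ., 1000 * _n)
generate lhincome = log(hincome)
generate missinc = missing(hincome)   // 1 if income missing, else 0
generate y = _n + mod(_n, 3)          // arbitrary outcome

* Listwise deletion drops the 2 observations with lhincome missing, so
* missinc is 0 for every remaining observation and is omitted.
regress y missinc lhincome
display "N used = " e(N)              // 6, not 8

* Missing-indicator workaround: give the missing cases a fixed log
* income (0 here) so they stay in the sample; missinc absorbs the shift.
generate lhincome_f = cond(missinc, 0, lhincome)
regress y missinc lhincome_f          // all 8 observations, missinc estimated

* In the model above this would mean using lhincome_f instead of lhincome:
* xtmixed dprt ... missinc lhincome_f ... [pw=dweight] || cntry: gender, mle
```

The fill value (0 here) determines how the missinc coefficient is interpreted, and missing-indicator coding is a pragmatic device whose statistical properties are debated.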
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Sunday, February 02, 2014 10:59 AM
To: [email protected]
Subject: Re: st: Grouping income variables- RECODE COMMAND
Your -recode- mapped 1,...,12 to 1,...,12, which makes precisely no progress with the main problem. As I understand what you want, you need something more like
recode hinctnt 1=40 2=70 3=130 ...
Nick
[email protected]
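The question in the message quoted below, how to specify the lower and upper limit of each interval, can be handled with the same -recode- command by creating two variables. A minimal sketch on toy data follows. The annual bounds are inferred from the midpoints used in the thread and from ESS ranges such as "1800 TO UNDER 3600", and the names hinc_lo and hinc_hi are hypothetical. The /// continuations work in a do-file, not at the command line.

```stata
* Hypothetical toy data: a few ESS-style codes, including 77 (Refusal)
clear
input hinctnt
1
5
11
12
77
end

* Lower annual bound (euros) of each income category
recode hinctnt (1=0) (2=1800) (3=3600) (4=6000) (5=12000) (6=18000) ///
    (7=24000) (8=30000) (9=36000) (10=60000) (11=90000) (12=120000) ///
    (77 88 99 = .), generate(hinc_lo)

* Upper annual bound; code 12 (N) is open-ended, so it stays missing
recode hinctnt (1=1800) (2=3600) (3=6000) (4=12000) (5=18000) (6=24000) ///
    (7=30000) (8=36000) (9=60000) (10=90000) (11=120000) ///
    (12 77 88 99 = .), generate(hinc_hi)

list
```

A lower/upper pair with a missing upper bound for the open-ended top category is also the layout that -intreg- expects for right-censored observations, if one prefers to model the income bands directly.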
On 1 February 2014 19:43, Antonio Rodriguez Andres <[email protected]> wrote:
> Nick
>
> You are right. But if I type the following code:
>
> recode hinctnt (1=1 "1st interval") (2=2 "2nd interval") (3=3 "3rd
> interval") (4=4 "4th interval") (5=5 "5th interval") (6=6 "6th
> interval") (7=7 "7th interval") (8=8 "8th interval") (9=9 "9th
> interval") (10=10 "10th interval") (11=11 "11th interval") (12=12
> "12th interval") (.=.m "Missing") (77=.r "Refusal") (88=.d "Don't
> Know") (99=.s "Not answer"), gen (ihinctnt)
>
> This generates a new variable, ihinctnt, which I then tabulated and
> summarized. But these are not incomes. I should specify the upper and
> lower limit for each interval. How can I do that?
>
>
> tab ihinctnt, missing
>
> RECODE of
> hinctnt
> (Household's
> total net
> income, all
> sources) Freq. Percent Cum.
>
> 1st interval 1,663 3.87 3.87
> 2nd interval 1,561 3.63 7.50
> 3rd interval 2,262 5.26 12.76
> 4th interval 3,676 8.55 21.31
> 5th interval 3,545 8.24 29.55
> 6th interval 3,293 7.66 37.21
> 7th interval 3,010 7.00 44.21
> 8th interval 2,871 6.68 50.89
> 9th interval 4,707 10.95 61.83
> 10th interval 2,058 4.79 66.62
> 11th interval 644 1.50 68.12
> 12th interval 428 1.00 69.11
> Don't Know 3,540 8.23 77.34
> Missing 5,037 11.71 89.06
> Refusal 4,525 10.52 99.58
> Not answer 180 0.42 100.00
>
> Total 43,000 100.00
>
> . summ ihinctnt
>
> Variable Obs Mean Std. Dev. Min Max
>
> ihinctnt 29718 6.156504 2.75604 1 12
>
> . summ ihinctnt,d
>
> RECODE of hinctnt (Household's total net income, all sources)
>
> Percentiles Smallest
> 1% 1 1
> 5% 1 1
> 10% 2 1 Obs 29718
> 25% 4 1 Sum of Wgt. 29718
>
> 50% 6 Mean 6.156504
> Largest Std. Dev. 2.75604
> 75% 9 12
> 90% 10 12 Variance 7.595757
> 95% 10 12 Skewness -.080652
> 99% 12 12 Kurtosis 2.098037
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: Saturday, February 01, 2014 9:17 PM
> To: [email protected]
> Subject: Re: st: Grouping income variables- RECODE COMMAND
>
> The numeric values of -hinctnt- don't exceed 99. They are evidently numeric codes, not incomes. So why are you surprised at your results?
> You have to -recode- your data before you can classify them. And that means the -recode- command.
> Nick
> [email protected]
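Nick's point that the codes must be -recode-d before they can be classified can be sketched as two steps. First, the -recode- command maps codes to euro values (the midpoints used in the thread). Second, a function classifies those values. This sketch uses the irecode() function, which returns the group index (0 if x <= x1, 1 if x1 < x <= x2, ..., n if x > xn) rather than the cutoff itself. With recode(), values above the last cutoff also return that cutoff, which would merge U and N. The choice of irecode() and the annual cutoffs are assumptions of this sketch; the cutoffs are the annual counterparts, inferred from the thread, of the weekly limits 70, 120, ..., 2310.

```stata
* Hypothetical toy data: codes 1 (J), 2 (R), 3 (C), 12 (N) and 77 (Refusal)
clear
input hinctnt
1
2
3
12
77
end

* Step 1: -recode- command, codes -> annual euro midpoints (as in the thread)
recode hinctnt (1=900) (2=2700) (3=4800) (4=9000) (5=15000) (6=21000) ///
    (7=27000) (8=33000) (9=48000) (10=75000) (11=105000) (12=175200) ///
    (77 88 99 = .), generate(hincome)

* Step 2: classify euro values with irecode(); cutoffs are inferred annual
* upper bounds. 0 = up to 3600 (J and R together), ..., 10 = above 120000 (N)
generate hinc_gr = irecode(hincome, 3600, 6000, 12000, 18000, 24000, ///
    30000, 36000, 60000, 90000, 120000)
list
```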
>
>
> On 1 February 2014 18:14, Antonio Rodriguez Andres <[email protected]> wrote:
>> Here is a basic description of the income variable:
>>
>> tab hinctnt
>>
>> Household's |
>> total net |
>> income, all |
>> sources | Freq. Percent Cum.
>> ------------+-----------------------------------
>> J | 1,663 4.38 4.38
>> R | 1,561 4.11 8.49
>> C | 2,262 5.96 14.45
>> M | 3,676 9.68 24.13
>> F | 3,545 9.34 33.47
>> S | 3,293 8.67 42.15
>> K | 3,010 7.93 50.08
>> P | 2,871 7.56 57.64
>> D | 4,707 12.40 70.04
>> H | 2,058 5.42 75.46
>> U | 644 1.70 77.15
>> N | 428 1.13 78.28
>> Refusal | 4,525 11.92 90.20
>> Don't know | 3,540 9.32 99.53
>> No answer | 180 0.47 100.00
>> ------------+-----------------------------------
>> Total | 37,963 100.00
>>
>>
>> sum hinctnt, d
>>
>> Household's total net income, all sources
>> -------------------------------------------------------------
>> Percentiles Smallest
>> 1% 1 1
>> 5% 2 1
>> 10% 3 1 Obs 37963
>> 25% 5 1 Sum of Wgt. 37963
>>
>> 50% 7 Mean 22.67271
>> Largest Std. Dev. 31.57352
>> 75% 10 99
>> 90% 77 99 Variance 996.8872
>> 95% 88 99 Skewness 1.378759
>> 99% 88 99 Kurtosis 2.984444
>>
>> .
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Nick Cox
>> Sent: Saturday, February 01, 2014 7:52 PM
>> To: [email protected]
>> Subject: Re: st: Grouping income variables- RECODE COMMAND
>>
>> Your code shows you using the -recode()- function, which is quite different from the -recode- command. In Stata, functions and commands are different!
>>
>> I think that to comment helpfully we need to see more about your
>> -hinctnt-, for example, the results of
>>
>> . su hinctnt, detail
>>
>> Your categories are not disjoint as (e.g.) the definitions [70, 120] and [120, 230] leave ambiguous what happens with 120. Alternatively, your notation here confuses the meaning of [ ] and ( ).
>> Nick
>> [email protected]
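Nick's diagnosis can be checked on toy data. The function recode(x, x1, ..., xn) returns x1 whenever x <= x1, and every category code from 1 to 12 is at most 70, the first cutoff. Applied to the codes rather than to incomes, the function therefore puts every observation in the 70 group, as in the tab in the original post. The toy data are hypothetical.

```stata
* Hypothetical toy data: three category codes as stored in hinctnt
clear
input hinctnt
1
6
12
end

* The function compares the codes (1-12) with euro cutoffs; every code
* is <= 70, the first cutoff, so every observation gets 70.
generate hinc_gr = recode(hinctnt, 70, 120, 230, 350, 460, 580, 690, ///
    1150, 1730, 2310)
tabulate hinc_gr
```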
>>
>>
>> On 1 February 2014 17:29, Antonio Rodriguez Andres <[email protected]> wrote:
>>> Dear Stata users,
>>>
>>> I have to group the income variable into different intervals. In the
>>> original dataset, the household income variable is grouped into 12
>>> categories:
>>>
>>> J <40
>>> R [40,70]
>>> C [70, 120]
>>> M [120, 230]
>>> F [230, 350]
>>> S
>>> K
>>> P
>>> D
>>> H
>>> U [1730, 2310)
>>> N > 2310
>>>
>>> I want to group categories J and R together (<70 euros) and create
>>> dummy variables for all income groups. Below is the Stata output. I
>>> used the recode command, but it does not work:
>>>
>>> gen hinc_gr=recode(hinctnt, 70, 120, 230, 350, 460, 580, 690, 1150, 1730, 2310)
>>> (13282 missing values generated)
>>>
>>> . tab hinc_gr
>>>
>>> hinc_gr | Freq. Percent Cum.
>>> ------------+-----------------------------------
>>> 70 | 29,718 100.00 100.00
>>> ------------+-----------------------------------
>>> Total | 29,718 100.00
>>>
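The goal stated in the original post, merging J and R and then creating dummies for all income groups, can be sketched directly on the codes with the -recode- command followed by -tabulate, generate()-. The group labels and the names hinc_gr and hincd* are hypothetical, and toy data stand in for the ESS file.

```stata
* Hypothetical toy data: codes 1 (J), 2 (R), 3 (C), 12 (N), 88 (Don't know)
clear
input hinctnt
1
2
3
12
88
end

* Combine codes 1 (J) and 2 (R) into one group, keep the others,
* and send the non-substantive codes to missing.
recode hinctnt (1 2 = 1 "J+R") (3 = 2 "C") (4 = 3 "M") (5 = 4 "F") ///
    (6 = 5 "S") (7 = 6 "K") (8 = 7 "P") (9 = 8 "D") (10 = 9 "H") ///
    (11 = 10 "U") (12 = 11 "N") (77 88 99 = .), generate(hinc_gr)

* One 0/1 dummy per observed group: hincd1, hincd2, ...
tabulate hinc_gr, generate(hincd)
```

In Stata 11 and later, factor-variable notation (i.hinc_gr in the model) is an alternative to creating the dummies by hand.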
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
*