Re: st: Chi-square test for Categorical Data Analysis
[This email did not appear to go through the first time, and I
apologize if this is the second time you are seeing it.]
Hugh,
Another possibility is to approximate a continuous measure of income
by assigning each respondent the midpoint of his or her income range.
So, for example, the first value would be $12,500, the second would
be $37,500, etc.
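A minimal sketch of the midpoint step for the seven ranges listed in Hugh's message. The variable names inc_cat and inc_midpt are hypothetical, and the coding 1-7 (in the order the ranges were listed) is an assumption; midpoints are rounded to the nearest dollar.

```stata
* Hypothetical setup: inc_cat is coded 1-7 in the order the ranges were listed.
clear
input byte inc_cat
1
2
3
4
5
6
7
end

* Map each closed range to (approximately) its midpoint. The open-ended
* top category (7) is left missing until a value has been estimated for it.
recode inc_cat (1=12500) (2=37500) (3=75000) (4=125000) ///
    (5=175000) (6=350000) (7=.), generate(inc_midpt)
list
```

Leaving the top category missing keeps it out of any means until the open-ended value has been filled in.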
To estimate a value for the uppermost, open-ended category
($500,001 or more), you can fit a Pareto curve based on Pareto's
Law of Income Distribution (see p. 874 of the reference below). Of
course, both steps require making assumptions about the distribution
of income in your sample, so proceed with caution.
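For readers who want the algebra behind this step, here is a sketch of the usual Pareto-tail reasoning. It is not a transcription of p. 874. It assumes the share of cases with income above x falls off as a power of x. The symbols a, b, c, d and v mirror the variable names in the program below; k is an extra scale symbol introduced only for the derivation.

```latex
% Assumption: the share of cases with income above x is S(x) = (k/x)^v for x >= k.
% c = S(a) and d = S(b) are the observed shares above two limits a < b.
\begin{align*}
\log c - \log d &= v\,(\log b - \log a)
  &&\Rightarrow\quad v = \frac{\log c - \log d}{\log b - \log a},\\
\operatorname{E}[X \mid X > b] &= \frac{v}{v-1}\,b
  &&\text{(finite only if } v > 1\text{)}.
\end{align*}
```

For instance, with b = 200,000 and v = 2 the estimated mean of the open-ended category would be 400,000. Under this derivation, a is the income above which the share c lies, so it is worth checking which limit the reference intends for a.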
Then you can calculate and report estimated group means for A and B
and the difference between them, and you can also do the usual
difference-of-means hypothesis testing.
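As a rough illustration of that last step, the sketch below compares two groups on a numeric midpoint variable. The data are invented, and the names group and income_midpt are assumptions (group coded 1 = A, 2 = B).

```stata
* Hypothetical data: group (1 = A, 2 = B) and an estimated income midpoint.
clear
input byte group double income_midpt
1 12500
1 37500
1 75000
1 125000
2 75000
2 125000
2 175000
2 350000
end

* Estimated group means, standard deviations, and group sizes
tabstat income_midpt, by(group) statistics(mean sd n)

* Two-sample t test of the difference in means, not assuming equal variances
ttest income_midpt, by(group) unequal
display "Difference in means (A - B): " r(mu_1) - r(mu_2)
```

The unequal option is a design choice rather than a requirement; midpoint-coded income tends to be right-skewed, so the equal-variance assumption may be hard to defend.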
Below is a short program I wrote for a categorical income measure
with 11 categories, where the 10th category is $150,000-$199,999 and
the 11th category is $200,000 or more. I did my best to accurately
implement the formula, but I am still a novice Stata programmer and
not 100% sure it is correct. (I am certain it could be written more
elegantly and parsimoniously, but that will have to wait for another
day.)
David
* Pareto's curve calculation
tempvar income_a income_b income_c income_d income_v
* Log of midpoint of interval before open-ended category
gen double `income_a'=log10(175000)
* Log of lower limit of open-ended category
gen double `income_b'=log10(200000)
count if inlist(income,10,11)
local c_numerator=r(N)
count if !mi(income)
local valid_n=r(N)
* Log of percentage of cases in open-ended interval plus preceding interval
gen double `income_c'=log10(100*(`c_numerator'/`valid_n'))
count if inlist(income,11)
local d_numerator=r(N)
* Log of percentage of cases in open-ended interval only
gen double `income_d'=log10(100*(`d_numerator'/`valid_n'))
gen double `income_v'=(`income_c'-`income_d')/(`income_b'-`income_a')
scalar income_p=round(200000*`income_v'/(`income_v'-1))
recode income 1=5000 2=15000 3=27500 4=42500 5=57500 6=72500 7=90000 ///
    8=112500 9=137500 10=175000 11=99, gen(income_midpt)
replace income_midpt=income_p if income_midpt==99
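A more compact variant of the same calculation, using locals and scalars instead of temporary variables, might look like the sketch below. The counts are invented for illustration. One deliberate difference: a is set to the lower limit of category 10 ($150,000) rather than the midpoint used above, on the assumption that the share in categories 10 and 11 is the share of cases above that limit. Treat that choice as something to verify against the reference.

```stata
* Hypothetical counts, for illustration only
* n_valid: respondents with nonmissing income
* n_top2:  respondents in categories 10 and 11
* n_top:   respondents in category 11 only
local n_valid 1000
local n_top2  80
local n_top   30

* Assumption: a is the lower limit of category 10, b of category 11
local a 150000
local b 200000

* Pareto shape parameter from the two tail shares
scalar v = (log10(`n_top2'/`n_valid') - log10(`n_top'/`n_valid')) / ///
           (log10(`b') - log10(`a'))

* The mean of a Pareto tail is finite only when v > 1
if v <= 1 {
    display as error "v <= 1: the Pareto tail mean is undefined"
    exit 459
}
scalar top_mean = round(`b' * v / (v - 1))
display "v = " v "   estimated mean of top category = " top_mean
```

The explicit check on v guards against a division that would otherwise return a negative or missing estimate when the top category is large relative to the one below it.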
Parker, R. N., & Fenwick, R. (1983). The Pareto curve and its utility
for open-ended income distributions in survey research. Social
Forces, Vol. 61, No. 3, 872-885.
http://www.jstor.org/view/00377732/di010900/01p0014t/0
At 4:40 PM -0400 9/18/07, Hugh Colaco wrote:
Participants were asked about their income level and had to choose one
from below. Assume the income ranges are:-
$0 - $25,000
$25,001 - $50,000
$50,001 - $100,000
$100,001 - $150,000
$150,001 - $200,000
$200,001 - $500,000
$500,001 or more
Rather than report so many income ranges, I would now like to report
just two, based on the median of all 150 participants. So, I will have
4 groups in all (i.e. Group A below median income, Group A equal to or
above the median income, Group B below median income, Group B equal to
or above the median income).
--
David Radwin, Principal Analyst // [email protected]
Office of Student Research, University of California, Berkeley