| |
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]
st: Median calculation for interval data
I have written an ado file to calculate a version of the median for
interval data as described below. A synopsis: when there are many
observations with the median value, we may believe there is some
information in the distribution of observations above, below, and
within the median value.
My question for Statalist: is there an existing Stata ado file that I
could have used?
I would also appreciate any comments on the method that I used.
Thanks,
Chris
Method: Calculate the fraction of the median interval above the
median interval's lower bound necessary to have half of all
observations above and half below, assuming that in the median
interval the observations are evenly distributed. Thus, if there are
25 observations above the median interval and 75 observations below
it, 80 observations in it, and the median interval is [10, 15), then
the "median" is
10 + (15-10)*((25 + 80 + 75)/2 - 75)/80 = 10.9375.
capture program drop intervalMedian
program define intervalMedian, rclass
syntax if, ///
lowlim(varname numeric) /// Lower limit of interval
hholds(varname numeric) // Number of households in this interval
preserve
marksample touse
keep if `touse'
keep `lowlim' `hholds'
tempvar runSum /// The runing sum
markMed /// 0, -1, 2 marker for below, median, above
upper // upper limit of interval
// Get the upper limit for each one
sort `lowlim'
generate `upper' = `lowlim'[_n+1]
// Find the median interval
generate `runSum' = sum(`hholds') // Final observation is total
local halfObs = `runSum'[_N]/2 // The index of the median
generate `markMed' = `runSum' - `halfObs' // Negative below median
interval
replace `markMed' = cond(`markMed'<0,0,2) // Marks at & above median
interval
sort `markMed' `lowlim' // Already in this order, but Stata doesn't
know
by `markMed': replace `markMed' = -1 if _n==1 & `markMed'==2 // The
median
// Collect values necessary for calculation (could be 1 line instead
of 3)
sort `markMed' // The median interval is now the first observation
local countBelow = `runSum' - `hholds' // # below median interval
local intervalBelow = `halfObs' - `countBelow' // # below median in
interval
local theMedian = (`intervalBelow'/`hholds')*(`upper' - `lowlim') +
`lowlim'
return local median `theMedian'
display "The median: `theMedian'"
restore
end
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/