One specific comment is that using -preserve- is
a bit heavy-handed for a problem like this.
A more general comment is that
gearing your program closely to the immediate details of
the problem -- even on a cosmetic level like option names
such as -hholds()- -- makes it more difficult to see the
generic problem -- and to reuse this code later for some
related problem.
I would base this more on what -summarize-
will do for you in any case. As I understand it,
you have a variable and what are in effect frequency
weights. The frequency weights are in your data
the number of households in each interval.
su myvar [fw=freq], detail
leaves behind r(p50).
Then you can count, e.g.
su freq if myvar < r(p50), meanonly
local lower = r(sum)
su freq if myvar = r(p50), meanonly
local equal = r(sum)
su freq if myvar > r(p50), meanonly
local upper = r(sum)
A program might start looking like this:
program imedian, rclass
version 8
syntax varname(numeric) [fweight aweight] [if] [in] ///
, lower(varname numeric)
marksample touse
qui count if `touse'
if r(N) == 0 error 2000
quietly {
tempvar wt
tempname median fu fe fl
su `varlist' if `touse', detail
scalar `median' = r(p50)
gen double `wt' = `exp' if `touse'
su `wt' if `varlist' > r(p50) & `touse', meanonly
scalar `fu' = r(sum)
su `wt' if `varlist' == r(p50) & `touse', meanonly
scalar `fe' = r(sum)
su `wt' if `varlist' < r(p50) & `touse', meanonly
scalar `fl' = r(sum)
levelsof `lower' if `touse', local(levels)
tokenize `levels'
local i = 1
while `median' > ``i'' {
local ++i
}
local lo = ``i''
local ++i
local hi = ``i''
scalar `median' = <your formula>
}
di as txt "median: " as res `median'
return scalar median = `median'
end
Nick
[email protected]
Chris Ruebeck
I have written an ado file to calculate a version of the median for
interval data as described below. A synopsis: when there are many
observations with the median value, we may believe there is some
information in the distribution of observations above, below, and
within the median value.
My question for Statalist: is there an existing Stata ado
file that I
could have used?
I would also appreciate any comments on the method that I used.
Method: Calculate the fraction of the median interval above the
median interval's lower bound necessary to have half of all
observations above and half below, assuming that in the median
interval the observations are evenly distributed. Thus, if
there are
25 observations above the median interval and 75 observations below
it, 80 observations in it, and the median interval is [10, 15), then
the "median" is
10 + (15-10)*((25 + 80 + 75)/2 - 75)/80 = 10.9375.
capture program drop intervalMedian
program define intervalMedian, rclass
syntax if, ///
lowlim(varname numeric) /// Lower limit of interval
hholds(varname numeric) // Number of
households in this interval
preserve
marksample touse
keep if `touse'
keep `lowlim' `hholds'
tempvar runSum /// The runing sum
markMed /// 0, -1, 2 marker for below, median, above
upper // upper limit of interval
// Get the upper limit for each one
sort `lowlim'
generate `upper' = `lowlim'[_n+1]
// Find the median interval
generate `runSum' = sum(`hholds') // Final observation is total
local halfObs = `runSum'[_N]/2 // The index of the median
generate `markMed' = `runSum' - `halfObs' // Negative
below median
interval
replace `markMed' = cond(`markMed'<0,0,2) // Marks at &
above median
interval
sort `markMed' `lowlim' // Already in this order, but
Stata doesn't
know
by `markMed': replace `markMed' = -1 if _n==1 &
`markMed'==2 // The
median
// Collect values necessary for calculation (could be 1
line instead
of 3)
sort `markMed' // The median interval is now the first
observation
local countBelow = `runSum' - `hholds' // # below
median interval
local intervalBelow = `halfObs' - `countBelow' // #
below median in
interval
local theMedian = (`intervalBelow'/`hholds')*(`upper' -
`lowlim') +
`lowlim'
return local median `theMedian'
display "The median: `theMedian'"
restore
end
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/