One specific comment is that using -preserve- is
a bit heavy-handed for a problem like this.

A more general comment is that gearing your program closely
to the immediate details of the problem, even on a cosmetic
level such as an option name like -hholds()-, makes it more
difficult to see the generic problem and to reuse the code
later for some related problem.

I would base this more on what -summarize- will do for you
in any case. As I understand it, you have a variable and what
are in effect frequency weights: in your data, the number of
households in each interval.

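    su myvar [fw=freq], detail

leaves behind r(p50). Copy that to a scalar straight away, as each
later -summarize- replaces the r() results. Then you can count, e.g.

    scalar median = r(p50)
    su freq if myvar < scalar(median), meanonly
    local lower = r(sum)
    su freq if myvar == scalar(median), meanonly
    local equal = r(sum)
    su freq if myvar > scalar(median), meanonly
    local upper = r(sum)
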
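
As a minimal sketch of that counting step, here is a self-contained run on toy data. The four intervals and their frequencies are made up for illustration, chosen to match the counts in the question (75 below, 80 in, 25 above the median interval); the scalar name med is likewise just an assumption.

```stata
* Toy data (hypothetical): interval lower limits and household counts
clear
input myvar freq
 0 40
 5 35
10 80
15 25
end

* Weighted median of the lower limits
summarize myvar [fw=freq], detail
scalar med = r(p50)    // copy now: later summarize calls replace r()

* Total frequency below, at and above the median value
summarize freq if myvar < scalar(med), meanonly
local lower = r(sum)
summarize freq if myvar == scalar(med), meanonly
local equal = r(sum)
summarize freq if myvar > scalar(med), meanonly
local upper = r(sum)

display "below: `lower'  at: `equal'  above: `upper'"
```

With these data the median lower limit should be 10, so the three totals should come out as 75, 80 and 25.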
A program might start looking like this:

program imedian, rclass
    version 8
    syntax varname(numeric) [fweight aweight] [if] [in] ///
        , lower(varname numeric)

    marksample touse
    qui count if `touse'
    if r(N) == 0 error 2000

    quietly {
        tempvar wt
        tempname median fu fe fl

        * weights as a variable; 1 if no weights were specified
        * (`exp' already includes the equals sign)
        if "`weight'" == "" gen double `wt' = 1 if `touse'
        else gen double `wt' `exp' if `touse'

        * weighted median; keep it in a scalar, as each later
        * -summarize- replaces the r() results
        su `varlist' [`weight'`exp'] if `touse', detail
        scalar `median' = r(p50)

        * total weights above, at and below the median
        su `wt' if `varlist' > `median' & `touse', meanonly
        scalar `fu' = r(sum)
        su `wt' if `varlist' == `median' & `touse', meanonly
        scalar `fe' = r(sum)
        su `wt' if `varlist' < `median' & `touse', meanonly
        scalar `fl' = r(sum)

        * step through the interval lower limits to find the
        * limits of the median interval
        levelsof `lower' if `touse', local(levels)
        tokenize `levels'
        local i = 1
        while `median' > ``i'' {
            local ++i
        }
        local lo = ``i''
        local ++i
        local hi = ``i''

        scalar `median' = <your formula>
    }

    di as txt "median: " as res `median'
    return scalar median = `median'
end

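
One possible shape for the formula line, as a sketch only: the linear interpolation described in the question quoted below, namely lower limit + width * (half the total - frequency below) / frequency in the median interval. The plain scalar names here (lo, hi, fl, fe, fu, imed) are illustrative stand-ins for the temporary names and locals in the program, and the values are those of the worked example in the question.

```stata
* Hypothetical values, taken from the worked example in the question
scalar lo = 10    // lower limit of the median interval
scalar hi = 15    // upper limit, i.e. lower limit of the next interval
scalar fl = 75    // frequency below the median interval
scalar fe = 80    // frequency in the median interval
scalar fu = 25    // frequency above the median interval

* Linear interpolation within the median interval, assuming that
* observations are spread evenly across it
scalar imed = lo + (hi - lo) * ((fl + fe + fu)/2 - fl) / fe
display "interpolated median: " imed
```

This reproduces the 10.9375 of the question. Note that the highest interval has no upper limit in the data, so a median falling there would need separate handling.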
Nick

Chris Ruebeck wrote:
> I have written an ado file to calculate a version of the median for
> interval data as described below. A synopsis: when there are many
> observations with the median value, we may believe there is some
> information in the distribution of observations above, below, and
> within the median value.
>
> My question for Statalist: is there an existing Stata ado file that I
> could have used?
>
> I would also appreciate any comments on the method that I used.
> Method: Calculate the fraction of the median interval above the
> median interval's lower bound necessary to have half of all
> observations above and half below, assuming that in the median
> interval the observations are evenly distributed. Thus, if there are
> 25 observations above the median interval and 75 observations below
> it, 80 observations in it, and the median interval is [10, 15), then
> the "median" is
>
> 10 + (15-10)*((25 + 80 + 75)/2 - 75)/80 = 10.9375.
>
> capture program drop intervalMedian
> program define intervalMedian, rclass
>     syntax if, ///
>         lowlim(varname numeric) /// Lower limit of interval
>         hholds(varname numeric) // Number of households in this interval
>     preserve
>     marksample touse
>     keep if `touse'
>     keep `lowlim' `hholds'
>     tempvar runSum  /// The running sum
>             markMed /// 0, -1, 2 marker for below, median, above
>             upper   // upper limit of interval
>
>     // Get the upper limit for each one
>     sort `lowlim'
>     generate `upper' = `lowlim'[_n+1]
>
>     // Find the median interval
>     generate `runSum' = sum(`hholds') // Final observation is total
>     local halfObs = `runSum'[_N]/2 // The index of the median
>     generate `markMed' = `runSum' - `halfObs' // Negative below median interval
>     replace `markMed' = cond(`markMed'<0,0,2) // Marks at & above median interval
>     sort `markMed' `lowlim' // Already in this order, but Stata doesn't know
>     by `markMed': replace `markMed' = -1 if _n==1 & `markMed'==2 // The median
>
>     // Collect values necessary for calculation (could be 1 line instead of 3)
>     sort `markMed' // The median interval is now the first observation
>     local countBelow = `runSum' - `hholds' // # below median interval
>     local intervalBelow = `halfObs' - `countBelow' // # below median in interval
>     local theMedian = (`intervalBelow'/`hholds')*(`upper' - `lowlim') + `lowlim'
>     return local median `theMedian'
>     display "The median: `theMedian'"
>
>     restore
> end