Statalist The Stata Listserver


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: RE: Median calculation for interval data


From   Chris Ruebeck <[email protected]>
To   [email protected]
Subject   Re: st: RE: Median calculation for interval data
Date   Mon, 12 Mar 2007 09:24:20 -0400

Thanks for the help.

Chris

On Mar 9, 2007, at 6:13 AM, Nick Cox wrote:

One specific comment is that using -preserve- is
a bit heavy-handed for a problem like this.

A more general comment is that
gearing your program closely to the immediate details of
the problem -- even on a cosmetic level like option names
such as -hholds()- -- makes it more difficult to see the
generic problem -- and to reuse this code later for some
related problem.

I would base this more on what -summarize-
will do for you in any case. As I understand it,
you have a variable and what are in effect frequency
weights. The frequency weights are in your data
the number of households in each interval.

su myvar [fw=freq], detail

leaves behind r(p50).

Then you can count, e.g.

su freq if myvar < r(p50), meanonly
local lower = r(sum)
su freq if myvar = r(p50), meanonly
local equal = r(sum)
su freq if myvar > r(p50), meanonly
local upper = r(sum)

A program might start looking like this:

program imedian, rclass
	version 8
	syntax varname(numeric) [fweight aweight] [if] [in] ///
	, lower(varname numeric)

	marksample touse
	qui count if `touse'
	if r(N) == 0 error 2000

	quietly {
		tempvar wt
		tempname median fu fe fl

		su `varlist' if `touse', detail
		scalar `median' = r(p50)

		gen double `wt' = `exp' if `touse'
		su `wt' if `varlist' > r(p50) & `touse', meanonly
		scalar `fu' = r(sum)
		su `wt' if `varlist' == r(p50) & `touse', meanonly
		scalar `fe' = r(sum)
		su `wt' if `varlist' < r(p50) & `touse', meanonly
		scalar `fl' = r(sum)

		levelsof `lower' if `touse', local(levels)
		tokenize `levels'
		local i = 1
		while `median' > ``i'' {
			local ++i
		}
		local lo = ``i''
		local ++i
		local hi = ``i''

		scalar `median' = <your formula>
	}

	di as txt "median: " as res `median'
	return scalar median = `median'
end 	

Nick
[email protected]

Chris Ruebeck

I have written an ado file to calculate a version of the median for
interval data as described below.  A synopsis: when there are many
observations with the median value, we may believe there is some
information in the distribution of observations above, below, and
within the median value.

My question for Statalist: is there an existing Stata ado
file that I
could have used?

I would also appreciate any comments on the method that I used.

Method: Calculate the fraction of the median interval above the
median interval's lower bound necessary to have half of all
observations above and half below, assuming that in the median
interval the observations are evenly distributed.  Thus, if
there are
25 observations above the median interval and 75 observations below
it, 80 observations in it, and the median interval is [10, 15), then
the "median" is

	10 + (15-10)*((25 + 80 + 75)/2 - 75)/80 = 10.9375.

capture program drop intervalMedian
program define intervalMedian, rclass
	syntax if, ///
		  lowlim(varname numeric) /// Lower limit of interval
		  hholds(varname numeric) // Number of
households in this interval
	preserve
	marksample touse
	keep if `touse'
	keep `lowlim' `hholds'
	tempvar runSum /// The runing sum
		  markMed /// 0, -1, 2 marker for below, median, above
		  upper // upper limit of interval

	// Get the upper limit for each one
	sort `lowlim'
	generate `upper' = `lowlim'[_n+1]

	// Find the median interval
	generate `runSum' = sum(`hholds') // Final observation is total
	local halfObs = `runSum'[_N]/2 // The index of the median
	generate `markMed' = `runSum' - `halfObs' // Negative
below median
interval
	replace `markMed' = cond(`markMed'<0,0,2) // Marks at &
above median
interval
	sort `markMed' `lowlim' // Already in this order, but
Stata doesn't
know
	by `markMed': replace `markMed' = -1 if _n==1 &
`markMed'==2 // The
median

	// Collect values necessary for calculation (could be 1
line instead
of 3)
	sort `markMed' // The median interval is now the first
observation
	local countBelow = `runSum' - `hholds' // # below
median interval
	local intervalBelow = `halfObs' - `countBelow' // #
below median in
interval
	local theMedian = (`intervalBelow'/`hholds')*(`upper' -
`lowlim') +
`lowlim'
	return local median `theMedian'
	display "The median: `theMedian'"

	restore
end
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index