There are several slightly different recipes for
this line. Tukey used similar ideas around the
time of his Exploratory Data Analysis (1977),
and there is an older literature going back
at least to the 1940s. The key point of most
of the recipes I have seen is that they
are amenable to hand calculation, insofar as the x and y
medians of each group can be determined by
eye on a scatter plot for modest sample sizes.
Now that hand calculation is rarely the constraint, I think
it is arguable that the method has been superseded by
quantile regression. You are right that it is not
guaranteed to be exactly the same as quantile regression.
(However, is it true that a quantile regression necessarily
passes through (median of x, median of y)? I doubt it.)
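That doubt can at least be checked on a single dataset. The lines below are a minimal sketch, assuming the auto dataset shipped with Stata: they compare the median regression's fitted value at the median of x with the median of y. The local names xmed, ymed and fitted are made up for illustration, and one dataset settles nothing in general.

```stata
* A sketch only: does the median regression line pass through
* (median of x, median of y) for one example dataset?
sysuse auto, clear
qreg mpg weight

* fitted value at the median of weight
quietly summarize weight, detail
local xmed = r(p50)
local fitted = _b[_cons] + _b[weight] * `xmed'

* median of mpg, for comparison
quietly summarize mpg, detail
local ymed = r(p50)

display as txt "fitted at median x: " as res `fitted'
display as txt "median of y:        " as res `ymed'
```

If the two displayed numbers differ, the line misses that point for these data.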
I am not aware of a Stata implementation of the resistant line.
Still, it is possible to make a hack at one.
*! NJC 1.0.0 9 February 2005
program resline
    version 8
    syntax varlist(min=2 max=2) [if] [in] [, * ]

    quietly {
        marksample touse
        count if `touse'
        if r(N) == 0 error 2000

        tokenize `varlist'
        args y x

        * split the sample into three groups of about equal size on x
        tempvar cut
        egen `cut' = cut(`x') if `touse', group(3)

        * median of y in the lower, middle and upper groups
        su `y' if `cut' == 0, detail
        local y0 = r(p50)
        su `y' if `cut' == 1, detail
        local y1 = r(p50)
        su `y' if `cut' == 2, detail
        local y2 = r(p50)

        * median of x in the same groups
        su `x' if `cut' == 0, detail
        local x0 = r(p50)
        su `x' if `cut' == 1, detail
        local x1 = r(p50)
        su `x' if `cut' == 2, detail
        local x2 = r(p50)

        * slope from the two outer summary points
        local slope = ((`y2') - (`y0')) / ((`x2') - (`x0'))
        if `slope' == . {
            di as err "no go: slope indeterminate"
            exit 498
        }

        * level of the line: mean of the three y medians
        local intercept = ((`y2') + (`y1') + (`y0')) / 3
        if `intercept' == . {
            di as err "no go: intercept indeterminate"
            exit 498
        }
    }

    * report results; a, b and X1 are rounded copies for the graph title
    di
    di as txt "slope" "{col 12}" as res %12.3f `slope'
    local b : di %4.3f `slope'
    di as txt "y summary" "{col 12}" as res %12.3f `intercept'
    local a : di %4.3f `intercept'
    local X1 : di %4.3f `x1'

    * plot the line, centered on the middle x median, over the scatter
    twoway function resistant = ///
        `intercept' + `slope' * (x - `x1'), ///
        range(`x') t1(`y' = `a' + `b' * (`x' - `X1')) ///
        || scatter `y' `x' if `touse', `options'
end
e.g. resline mpg weight
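
For comparison, here is a minimal sketch of one common reading of the recipe in the question quoted below, in which the intercept is the mean of the three intercepts implied by the summary points, rather than the line being centered on the middle x median as in resline. It assumes the auto dataset shipped with Stata; the variable name grp3 and the local names are made up for illustration. The regress call at the end is there only so that the two sets of coefficients can be set side by side.

```stata
* A sketch only: median-median line with the intercept averaged over
* all three summary points. grp3 is a made-up variable name.
sysuse auto, clear
egen grp3 = cut(weight), group(3)

* summary point (x median, y median) for each third
forvalues g = 0/2 {
    quietly summarize weight if grp3 == `g', detail
    local x`g' = r(p50)
    quietly summarize mpg if grp3 == `g', detail
    local y`g' = r(p50)
}

* slope from the outer summary points
local b = (`y2' - `y0') / (`x2' - `x0')
* intercept as the mean of the three intercepts y_g - b * x_g
local a = ((`y0' + `y1' + `y2') - `b' * (`x0' + `x1' + `x2')) / 3

display as txt "resistant line: mpg = " as res %9.4f `a' ///
    as txt " + " as res %9.6f `b' as txt " * weight"

* least squares fit for comparison
regress mpg weight
```

The two versions share the same slope and differ only in where the line is anchored, so they coincide whenever the middle x median equals the mean of the three x medians.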
Nick
Faith Anne wrote:
> I need to calculate a specific type of line through a two-variable
> dataset. In exploratory data analysis, what I need is called a
> resistant line. In my high school classes, we called it a
> median-median line. The way it's calculated is to divide the data into
> three groups, find the x-median and y-median values (called the
> summary point) for each group, and then use those three summary points
> to determine the line. The outer two summary points determine the
> slope, and an average of all of them determines the intercept.
>
> As far as I can tell, this isn't quite the same as the quantile
> regression command, because the resistant line doesn't necessarily go
> through the median of the whole dataset. In the resistant line
> calculation, you ignore all information besides the summary points, so
> you don't actually take into account the absolute deviations and try
> to minimize them. Someone please correct me if I have misunderstood
> this!
>
> I'm aware of the pros and cons of this method as compared to least
> squares linear regression, but I am required to do this analysis and
> compare it to least squares. Minitab can do this through its menu of
> EDA commands, but I'm deeply frustrated with Minitab's data management
> and graphing, so I'd really like to know how to do this with Stata.