
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: difference in medians . Raw vs calculated


From   Steve Samuels <[email protected]>
To   [email protected]
Subject   Re: st: difference in medians . Raw vs calculated
Date   Sun, 20 Jan 2013 11:29:48 -0500

You are welcome, Richard. By the way, I was wrong about p50: Stata computes the "right" median of 2180 for all the commands that I tried: -tabstat-, -centile-, and -summarize, detail-.


Steve

On Jan 19, 2013, at 10:29 PM, Richard Hiscock wrote:

Steve
Thank you for your quick and very helpful response.
I'm sorry that I posted the -cid- analysis as given; it should have been -cid weight, by(foreign) unpaired median-.

Cheers, Richard


On 20 Jan 2013, at 13:02, Steve Samuels <[email protected]> wrote:

> 
> Richard-
> 
> Thanks for illustrating your problem with an accessible data set. Too
> few posters do.  That said, nothing strange is going on here.
> 
> 1. -cendif- estimates the "generalized Hodges-Lehmann median
> difference", which is the median of the differences between all
> possible pairs of observations, one from each group. This is not the
> same as the "difference in medians".
> 
> 2. The output for -cid- clearly states that the command is computing a
> difference in means, not medians.
> 
> 3. Example 1 in the -help- for -qreg- discusses why the estimated regression coefficient might not be the difference in medians.
> 
> 4. Roger Newson's -bpmedian- package (SSC) estimates a Bonett-Price CI for
> the median.
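To make point 1 concrete, here is a minimal sketch in Python (not Stata, and not -cendif-'s own code) contrasting the two estimators on made-up toy data. The function names and the data are hypothetical. It computes only the point estimates; -cendif-'s confidence interval, and the Fisher's z transformation it reports, are not reproduced.

```python
from itertools import product
from statistics import median


def difference_in_medians(x, y):
    """Difference between the two sample medians: median(x) - median(y)."""
    return median(x) - median(y)


def hodges_lehmann(x, y):
    """Two-sample Hodges-Lehmann estimate: the median of all pairwise
    differences x_i - y_j, taking one observation from each group."""
    return median([xi - yj for xi, yj in product(x, y)])


# Hypothetical toy data, chosen only to show that the two estimates can differ.
x = [1, 2, 3, 10]
y = [0, 0, 1]
print(difference_in_medians(x, y))  # 2.5
print(hodges_lehmann(x, y))         # 2.0
```

With these toy values the difference in medians is 2.5 while the Hodges-Lehmann estimate is 2.0: the same kind of gap as between 1180 (from the -tabstat- medians) and 1095 (from -cendif-) in the auto example.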
> 
> By the way, the p50 for a group is _not_ necessarily the sample median:
> 
> . tab weight if foreign
> 
> 
>    Weight |
>    (lbs.) |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>     1,760 |          1        4.55        4.55
>     1,830 |          1        4.55        9.09
>     1,930 |          1        4.55       13.64
>     1,980 |          1        4.55       18.18
>     1,990 |          1        4.55       22.73
>     2,020 |          1        4.55       27.27
>     2,040 |          1        4.55       31.82
>     2,050 |          1        4.55       36.36
>     2,070 |          1        4.55       40.91
>     2,130 |          1        4.55       45.45
>     2,160 |          1        4.55       50.00
>     2,200 |          1        4.55       54.55
>     2,240 |          1        4.55       59.09
>     2,280 |          1        4.55       63.64
>     2,370 |          1        4.55       68.18
>     2,410 |          1        4.55       72.73
>     2,650 |          1        4.55       77.27
>     2,670 |          1        4.55       81.82
>     2,750 |          1        4.55       86.36
>     2,830 |          1        4.55       90.91
>     3,170 |          1        4.55       95.45
>     3,420 |          1        4.55      100.00
> ------------+-----------------------------------
>     Total |         22      100.00
> 
> 
> Notice that n = 22, an even number of observations, so the median is not
> unique. By convention, it is the midpoint between the two middle observations,
> the 11th and 12th, which for these data is (2160 + 2200)/2 = 2180.
> But any value between 2160 and 2200 would serve equally well as a median.
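A small Python sketch (an illustration, not how Stata itself computes p50) using the 22 foreign-car weights from the listing above. It picks out the 11th and 12th ordered values, takes their midpoint, and checks that the sum of absolute deviations, the quantity median regression minimizes, is flat over the whole interval from 2160 to 2200. The helper name sum_abs_dev is hypothetical.

```python
from statistics import median

# Weights (lbs.) of the 22 foreign cars, copied from the -tab weight if foreign- listing.
foreign = [1760, 1830, 1930, 1980, 1990, 2020, 2040, 2050, 2070, 2130, 2160,
           2200, 2240, 2280, 2370, 2410, 2650, 2670, 2750, 2830, 3170, 3420]


def sum_abs_dev(values, m):
    """Sum of absolute deviations of values about m."""
    return sum(abs(v - m) for v in values)


n = len(foreign)                          # 22, an even number
s = sorted(foreign)
lower, upper = s[n // 2 - 1], s[n // 2]   # 11th and 12th ordered values
print(lower, upper)                       # 2160 2200
print(median(foreign))                    # 2180.0, the conventional midpoint

# Every m in [lower, upper] gives the same minimal sum;
# 2150 and 2210 lie outside the interval and give a larger sum.
for m in (2150, 2160, 2180, 2200, 2210):
    print(m, sum_abs_dev(foreign, m))
```

Every m from 2160 to 2200 gives the same sum, 7030, while 2150 and 2210 give 7050. The fitted foreign median implied by the -qreg- output, 3350 - 1150 = 2200, sits at one end of this interval, which is consistent with the "alternate solutions exist" notes in that log.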
> 
> Steve
> 
> Steven J. Samuels
> Consultant in Statistics
> 18 Cantine's Island
> Saugerties NY 12477 USA
> Voice: 845-246-0774
> 
>> On Jan 19, 2013, at 8:00 PM, Richard Hiscock wrote:
>> 
>> I wish to derive a 95% CI for the difference in medians, and noticed that the difference in raw median values between groups didn't equal that calculated using the packages -cendif- (R. Newson) and -cid- (P. Royston). Clearly I'm missing something and would be grateful for an explanation.
>>
>> I suspect it relates to a transformation performed before the difference is calculated and a subsequent back-transformation to the original units.
>>
>> However, it is hard to present medians in raw units alongside a difference in medians (and CI) that does not equal their difference. In my data set (a plasma protein assay) the raw difference in medians is 0.5, whereas the difference calculated by -cid- or -cendif- is 0.33, which is hard to explain to readers.
>> 
>> Thanks for any advice
>>
>> Illustrated using the auto data set:
>>
>> . sysuse auto
>> . tabstat weight, by(foreign) stats(p50)
>>
>> Summary for variables: weight
>>      by categories of: foreign (Car type)
>>
>>  foreign |       p50
>> ---------+----------
>> Domestic |      3360
>>  Foreign |      2180
>> ---------+----------
>>    Total |      3190
>> --------------------
>>
>> *difference = 1180
>>
>> . cendif weight, by(foreign)
>> Y-variable: weight (Weight (lbs.))
>> Grouped by: foreign (Car type)
>> Group numbers:
>>
>>    Car type |      Freq.     Percent        Cum.
>> ------------+-----------------------------------
>>    Domestic |         52       70.27       70.27
>>     Foreign |         22       29.73      100.00
>> ------------+-----------------------------------
>>       Total |         74      100.00
>>
>> Transformation: Fisher's z
>> 95% confidence interval(s) for percentile difference(s)
>> between values of weight in first and second groups:
>> Percent    Pctl_Dif     Minimum     Maximum
>>      50        1095         750        1330
>>
>> . cid weight, by(foreign) unpaired
>>
>> Normal-based confidence interval for difference in means by foreign
>>
>> Variable |     Obs     Estimate    Std. Err.       [95% Conf. Interval]
>> ---------+-------------------------------------------------------------
>>   weight |      74     1001.206    160.2876        681.6788    1320.734
>>
>> . qreg weight foreign
>> Iteration  1:  WLS sum of weighted deviations =  34840.693
>>
>> Iteration  1: sum of abs. weighted deviations =      34860
>> note:  alternate solutions exist
>> Iteration  2: sum of abs. weighted deviations =      34620
>> note:  alternate solutions exist
>> Iteration  3: sum of abs. weighted deviations =      34580
>>
>> Median regression                                    Number of obs =        74
>>   Raw sum of deviations    48860 (about 3180)
>>   Min sum of deviations    34580                     Pseudo R2     =    0.2923
>>
>> ------------------------------------------------------------------------------
>>       weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>      foreign |      -1150   223.2969    -5.15   0.000    -1595.134   -704.8659
>>        _cons |       3350   121.7526    27.51   0.000     3107.291    3592.709
>> ------------------------------------------------------------------------------
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2018 StataCorp LLC