
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Is it valid to use the individual ratios (i.e. Xi/Yi) in the dependent or independent part of a regression model?

From: guhjy@kmu.edu.tw
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Is it valid to use the individual ratios (i.e. Xi/Yi) in the dependent or independent part of a regression model?
Date: Sun, 27 May 2012 09:02:58 +0800

ACR (urinary albumin creatinine ratio, i.e. urinary albumin [Xi]
divided by urinary creatinine [Yi]) is used to standardize for urinary
concentration to ensure comparability of albuminuria among individual
patients (http://en.wikipedia.org/wiki/Microalbuminuria). I am using
ACR as the dependent or independent variable in multiple linear
regressions. However, the "ratio of means" and the "mean of ratios"
(the mean of ACR [Xi/Yi] in this case) are both biased estimators of
the population ratio [X/Y] (see "Mean of ratios or ratio of means or
both?": http://www.sciencedirect.com/science/article/pii/S0378375801001811).
Given this bias and the many pitfalls of ratios described in the
references cited in this thread, is it better to enter X and Y
separately in regressions, adjusting one for the other, rather than to
use the ratio, despite the ratio's clinical usefulness in individual
decisions?
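
One way to frame that choice, sketched on simulated data only: on the
log scale, log(Xi/Yi) = log(Xi) - log(Yi), so a model with the log
ratio as a covariate is the special case of a model with log(Xi) and
log(Yi) entered separately in which the two coefficients are equal and
opposite. The variable names and data-generating values below are
hypothetical, and working on the log scale is an assumption made for
illustration, not something the cited references prescribe.

```stata
* Simulated data only; all names and parameter values are hypothetical
clear
set seed 2012
set obs 500
generate lncreat = rnormal(0, 0.4)               // log urinary creatinine
generate lnalb   = 0.8*lncreat + rnormal(0, 1)   // log urinary albumin
generate lnacr   = lnalb - lncreat               // log(ACR) = log(X) - log(Y)
generate outcome = 1 + 0.5*lnalb - 0.2*lncreat + rnormal(0, 1)

* Restricted model: only the log ratio enters
regress outcome lnacr

* Unrestricted model: numerator and denominator enter separately
regress outcome lnalb lncreat

* The ratio model is the special case _b[lnalb] = -_b[lncreat]
test lnalb + lncreat = 0
```

A rejection here would suggest that collapsing X and Y into the ratio
discards information about this outcome; a non-rejection would be
consistent with the ratio being an adequate summary.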

Thank you.
Jinn-Yuh



2012/5/26 Nick Cox <njcoxstata@gmail.com>:
> The paper you cite is for a very specific problem with a very specific
> generating process. That is the nub of the matter. You need to specify
> what you are estimating with what model. The best way to approach this
> is probably through simulation of what happens with sample size of
> concern to you and plausible assumptions.
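
A minimal sketch of such a simulation, under an assumed generating
process (lognormal denominator, numerator linear in the denominator
with a nonzero intercept, n = 50). The program name ratiosim and all
parameter values are hypothetical. Under this setup the two population
targets differ: E(X)/E(Y) is roughly 2.96 and E(X/Y) is roughly 3.05,
so the simulation shows how each estimator behaves around its own
target.

```stata
capture program drop ratiosim
program define ratiosim, rclass
    * one simulated sample under the assumed data-generating process
    drop _all
    set obs 50
    generate y = exp(rnormal(0, 0.3))         // strictly positive denominator
    generate x = 1 + 2*y + rnormal(0, 0.5)    // numerator, nonzero intercept
    generate r = x/y

    summarize x, meanonly
    local xbar = r(mean)
    summarize y, meanonly
    local ybar = r(mean)
    return scalar rom = `xbar'/`ybar'         // ratio of means

    summarize r, meanonly
    return scalar mor = r(mean)               // mean of ratios
end

* 1000 replications; compare the sampling distributions of the two estimators
simulate rom=r(rom) mor=r(mor), reps(1000) nodots seed(2012): ratiosim
summarize rom mor
```

Changing the sample size in set obs and the generating equations to
something plausible for the data at hand is the point of the exercise.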
>
> There are many takes on this, however. For example, if a ratio is
> badly behaved, then the best way to analyse data may be not to use
> ratios. If a goal is misconceived, establishing which lousy method of
> attempting that goal is least bad is not a good question.
>
> These are banal generalities. One implication is that you may need
> to disclose more specific details about what you want to do to get
> better advice.
>
> Nick
>
> On Sat, May 26, 2012 at 3:31 PM,  <guhjy@kmu.edu.tw> wrote:
>
>> Is it true that "ratio of means" is less biased than "mean of ratios"
>> (Comparing Ratio Estimators Based on Systematic Samples:
>> http://www.isrt.ac.bd/sites/default/files/jsrissues/v40n2/v40n2p1.pdf)?
>
> 2012/5/26 Tirthankar Chakravarty <tirthankar.chakravarty@gmail.com>:
>
>>> They estimate two different quantities - you decide which one you want:
>>>
>>> *******************************************
>>> webuse census2, clear
>>>
>>> // ratio of means
>>> ratio (deathrate: death/pop)
>>> * or, more transparently
>>> mean death pop
>>> di _b[death]/_b[pop]
>>>
>>> // mean of ratios
>>> g deathrate = death/pop
>>> reg deathrate
>>> * or, more transparently
>>> mean deathrate
>>> *******************************************
>
> On Sat, May 26, 2012 at 12:19 AM,  <guhjy@kmu.edu.tw> wrote:
>
>>>> My point is that the mean and se obtained by the "ratio" command
>>>> (which is supposedly more accurate) differ from those obtained by
>>>> the "regress" command. Thus, the results obtained by the "regress"
>>>> command may be invalid. My question is: how should ratios be analyzed
>>>> as dependent or independent variables in regression if the mean and
>>>> se of (Xi/Yi) are incorrect?
>>>> For example:
>>>>
>>>> . webuse census2, clear
>>>> (1980 Census data by state)
>>>>
>>>> .
>>>> . gen drate1=death/pop
>>>>
>>>> .
>>>> . reg drate1
>>>>
>>>>      Source |       SS       df       MS              Number of obs =      50
>>>> -------------+------------------------------           F(  0,    49) =    0.00
>>>>       Model |           0     0           .           Prob > F      =       .
>>>>    Residual |  .000083179    49  1.6975e-06           R-squared     =  0.0000
>>>> -------------+------------------------------           Adj R-squared =  0.0000
>>>>       Total |  .000083179    49  1.6975e-06           Root MSE      =   .0013
>>>>
>>>> ------------------------------------------------------------------------------
>>>>      drate1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>>>> -------------+----------------------------------------------------------------
>>>>       _cons |    .008436   .0001843    45.78   0.000     .0080657    .0088063
>>>> ------------------------------------------------------------------------------
>>>>
>>>> .
>>>> . ratio (deathrate: death/pop)
>>>>
>>>> Ratio estimation                    Number of obs    =      50
>>>>
>>>>    deathrate: death/pop
>>>>
>>>> --------------------------------------------------------------
>>>>             |             Linearized
>>>>             |      Ratio   Std. Err.     [95% Conf. Interval]
>>>> -------------+------------------------------------------------
>>>>   deathrate |   .0087368   .0002052      .0083244    .0091492
>>>> --------------------------------------------------------------
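
The two point estimates above differ because they weight the states
differently, not because one of them is miscomputed: sum(death)/sum(pop)
is algebraically the mean of death/pop weighted by pop, whereas regress
on a constant gives the unweighted mean. A short check, assuming the
same census2 data; the scalar name rom is illustrative, and the ratio is
stored as a double here to limit rounding. Only point estimates are
compared; the standard errors are not expected to match.

```stata
webuse census2, clear
generate double drate1 = death/pop

* ratio of means: sum(death)/sum(pop)
ratio (deathrate: death/pop)
scalar rom = _b[deathrate]

* population-weighted mean of the state-level ratios
summarize drate1 [aweight=pop], meanonly
display "weighted mean of ratios = " r(mean) "   ratio of means = " rom
```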
>
> 2012/5/26 Steve Samuels <sjsamuels@gmail.com>:
>
>>>>> Rich Goldstein's nice summary contains a reference to Dick Kronmal's article:
>>>>>
>>>>> Kronmal, R. A. (1993). Spurious correlation and the fallacy of the ratio standard
>>>>>  revisited. Journal of the Royal Statistical Society. Series A (Statistics in
>>>>>  Society), 379-392.
>>>>>
>>>>> Dick's thinking (and title) were inspired by:
>>>>>
>>>>> Tanner, J. M. (1949). Fallacy of per-weight and per-surface area standards,
>>>>> and their relation to spurious correlation. Journal of Applied Physiology, 2(1), 1-15.
>>>>>
>>>>> Happily, Tanner's article is available online:
>>>>>
>>>>> http://0-jap.physiology.org.library.pcc.edu/content/2/1/1.full.pdf+html
>
> Nick Cox
>
>>>>> Your opening statement is more nearly incorrect than correct. In
>>>>> general, X / Y is indeterminate whenever Y is 0; if X and Y are
>>>>> normally distributed that is an event with probability 0 (which still
>>>>> means possible) but the ratio is otherwise well defined.
>>>>>
>>>>> If Y is ever 0 in your data then the ratio X / Y is unlikely to make
>>>>> scientific sense and so the question of what you can and can't do with
>>>>> it statistically doesn't really arise.
>>>>>
>>>>> I don't think there is a simple answer to whether you should use
>>>>> ratios in regression. Often it is scientifically natural; often it is
>>>>> pretty dangerous.
>>>>>
>>>>> For one statement of various pitfalls see list member Richard
>>>>> Goldstein on ratios:
>>>>>
>>>>> http://biostat.mc.vanderbilt.edu/wiki/pub/Main/BioMod/goldstein.ratios.pdf
>>>>>
>>>>> Better advice might depend on your giving more details on what you
>>>>> want to do, mentioning the scientific or medical context as well.
>
> On Fri, May 25, 2012 at 5:36 AM,  <guhjy@kmu.edu.tw> wrote:
>
>>>>>> The ratio of two normally distributed variables (X and Y) has no mean
>>>>>> or variance.
>>>>>> 1. Why is it valid that the "ratio" command estimates the mean and se of ratios?
>>>>>> 2. Is it valid to use the individual ratios (i.e. Xi/Yi) in the
>>>>>> dependent or independent part of a regression model?
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

