Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | guhjy@kmu.edu.tw |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Is it valid to use the individual ratios (i.e. Xi/Yi) in the dependent or independent part of a regression model? |
Date | Sun, 27 May 2012 09:02:58 +0800 |
ACR (urinary albumin creatinine ratio, i.e. urinary albumin [Xi] divided by urinary creatinine [Yi]) is used to standardize for urinary concentration to ensure comparability of albuminuria among individual patients (http://en.wikipedia.org/wiki/Microalbuminuria). I am using ACR as the dependent or independent variable in multiple linear regressions. However, the "ratio of means" and the "mean of ratios" (ACR [Xi/Yi] in this case) are both biased estimates of the population ratio [X/Y] (Mean of ratios or ratio of means or both?: http://www.sciencedirect.com/science/article/pii/S0378375801001811). In view of these problems and the many pitfalls of ratios mentioned in many references, is it better to use X (or Y) to adjust for Y (or X) in regressions (despite the ratio's clinical usefulness in individual decisions)?

Thank you.

Jinn-Yuh

2012/5/26 Nick Cox <njcoxstata@gmail.com>:
> The paper you cite is for a very specific problem with a very specific
> generating process. That is the nub of the matter. You need to specify
> what you are estimating with what model. The best way to approach this
> is probably through simulation of what happens with sample size of
> concern to you and plausible assumptions.
>
> There are many takes on this, however. For example, if a ratio is
> badly behaved, then the best way to analyse data may be not to use
> ratios. If a goal is misconceived, establishing which lousy method of
> attempting that goal is least bad is not a good question.
>
> These are banal generalities. One implication is that you may need
> to disclose more specific details about what you want to do to get
> better advice.
>
> Nick
>
> On Sat, May 26, 2012 at 3:31 PM, <guhjy@kmu.edu.tw> wrote:
>
>> Is it true that "ratio of means" is less biased than "mean of ratios"
>> (Comparing Ratio Estimators Based on Systematic Samples:
>> http://www.isrt.ac.bd/sites/default/files/jsrissues/v40n2/v40n2p1.pdf)?
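
The simulation suggested above could be sketched roughly as follows. This is only an illustration: the program name ratiosim, the sample size of 50, and the lognormal generating process are all assumptions made for the example, not anything specified in the thread, and a real check would substitute a generating process plausible for albumin and creatinine.

```stata
* Sketch only: the data-generating process below is hypothetical.
clear all
set seed 12345

program define ratiosim, rclass
    drop _all
    set obs 50
    * assumed denominator, strictly positive so x/y is always defined
    generate double y = exp(rnormal(0, 0.5))
    * assumed numerator: proportional to y with multiplicative noise
    generate double x = 2 * y * exp(rnormal(0, 0.5))
    generate double r = x / y

    * mean of the individual ratios
    summarize r, meanonly
    return scalar mean_of_ratios = r(mean)

    * ratio of the two sample means
    summarize x, meanonly
    local mx = r(mean)
    summarize y, meanonly
    return scalar ratio_of_means = `mx' / r(mean)
end

* repeat the experiment and compare the two sampling distributions
simulate mor = r(mean_of_ratios) rom = r(ratio_of_means), ///
    reps(1000) nodots: ratiosim
summarize mor rom
```

Comparing the means and standard deviations of mor and rom against whichever population quantity is the actual target shows how each estimator behaves under the assumed process; changing the set obs value or the generating lines changes the scenario being examined.
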
>
> 2012/5/26 Tirthankar Chakravarty <tirthankar.chakravarty@gmail.com>:
>
>>> They estimate two different quantities - you decide which one you want:
>>>
>>> *******************************************
>>> webuse census2, clear
>>>
>>> // ratio of means
>>> ratio (deathrate: death/pop)
>>> * or, more transparently
>>> mean death pop
>>> di _b[death]/_b[pop]
>>>
>>> // mean of ratio
>>> g deathrate = death/pop
>>> reg deathrate
>>> * or, more transparently
>>> mean deathrate
>>> *******************************************
>
> On Sat, May 26, 2012 at 12:19 AM, <guhjy@kmu.edu.tw> wrote:
>
>>>> My point is that the mean and se differ between those obtained
>>>> by the "ratio" command (which is supposedly more accurate) and the
>>>> "regress" command. Thus, the results obtained by the "regress" command
>>>> may be invalid. My question is: how to analyze ratios as the dependent
>>>> or independent variables in regression if the mean and se of (Xi/Yi)
>>>> are incorrect.
>>>> For example:
>>>>
>>>> . webuse census2, clear
>>>> (1980 Census data by state)
>>>>
>>>> .
>>>> . gen drate1=death/pop
>>>>
>>>> .
>>>> . reg drate1
>>>>
>>>>       Source |       SS       df       MS              Number of obs =      50
>>>> -------------+------------------------------           F(  0,    49) =    0.00
>>>>        Model |           0     0           .           Prob > F      =       .
>>>>     Residual |  .000083179    49  1.6975e-06           R-squared     =  0.0000
>>>> -------------+------------------------------           Adj R-squared =  0.0000
>>>>        Total |  .000083179    49  1.6975e-06           Root MSE      =   .0013
>>>>
>>>> ------------------------------------------------------------------------------
>>>>       drate1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>>>> -------------+----------------------------------------------------------------
>>>>        _cons |    .008436   .0001843    45.78   0.000     .0080657    .0088063
>>>> ------------------------------------------------------------------------------
>>>>
>>>> .
>>>> . ratio (deathrate: death/pop)
>>>>
>>>> Ratio estimation                    Number of obs    =      50
>>>>
>>>>     deathrate: death/pop
>>>>
>>>> --------------------------------------------------------------
>>>>              |             Linearized
>>>>              |      Ratio   Std. Err.     [95% Conf. Interval]
>>>> -------------+------------------------------------------------
>>>>    deathrate |   .0087368   .0002052      .0083244    .0091492
>>>> --------------------------------------------------------------
>
> 2012/5/26 Steve Samuels <sjsamuels@gmail.com>:
>
>>>>> Rich Goldstein's nice summary contains a reference to Dick Kronmal's article:
>>>>>
>>>>> Kronmal, R. A. (1993). Spurious correlation and the fallacy of the ratio standard
>>>>> revisited. Journal of the Royal Statistical Society. Series A (Statistics in
>>>>> Society), 379-392.
>>>>>
>>>>> Dick's thinking (and title) were inspired by:
>>>>>
>>>>> Tanner, J. M. (1949). Fallacy of per-weight and per-surface area standards,
>>>>> and their relation to spurious correlation. Journal of Applied Physiology, 2(1), 1-15.
>>>>>
>>>>> Happily, Tanner's article is available online:
>>>>>
>>>>> http://0-jap.physiology.org.library.pcc.edu/content/2/1/1.full.pdf+html
>
> Nick Cox
>
>>>>> Your opening statement is more nearly incorrect than correct. In
>>>>> general, X / Y is indeterminate whenever Y is 0; if X and Y are
>>>>> normally distributed that is an event with probability 0 (which still
>>>>> means possible) but the ratio is otherwise well defined.
>>>>>
>>>>> If Y is ever 0 in your data then the ratio X / Y is unlikely to make
>>>>> scientific sense and so the question of what you can and can't do with
>>>>> it statistically doesn't really arise.
>>>>>
>>>>> I don't think there is a simple answer to whether you should use
>>>>> ratios in regression. Often it is scientifically natural; often it is
>>>>> pretty dangerous.
>>>>>
>>>>> For one statement of various pitfalls see list member Richard
>>>>> Goldstein on ratios:
>>>>>
>>>>> http://biostat.mc.vanderbilt.edu/wiki/pub/Main/BioMod/goldstein.ratios.pdf
>>>>>
>>>>> Better advice might depend on your giving more details on what you
>>>>> want to do, mentioning the scientific or medical context as well.
>
> On Fri, May 25, 2012 at 5:36 AM, <guhjy@kmu.edu.tw> wrote:
>
>>>>>> The ratio of two normally distributed variables (X and Y) has no mean
>>>>>> or variance.
>>>>>> 1. Why is it valid that the "ratio" command estimates the mean and se of ratios?
>>>>>> 2. Is it valid to use the individual ratios (i.e. Xi/Yi) in the
>>>>>> dependent or independent part of a regression model?
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
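
One way to see why the "ratio" and "regress" results in the thread differ, offered as a sketch rather than as anything a poster proposed: the ratio of means is algebraically a denominator-weighted mean of the individual ratios, since sum(pop * death/pop) / sum(pop) = sum(death) / sum(pop). The example reuses the census2 data and the drate1 variable name from the thread; choosing summarize with aweights to show the identity is an assumption of convenience.

```stata
webuse census2, clear
generate double drate1 = death / pop

* ratio of means (equivalently, ratio of totals)
ratio (deathrate: death/pop)

* unweighted mean of the state-level ratios: every state counts equally
mean drate1

* population-weighted mean of the state-level ratios:
* the point estimate should agree with the ratio command's,
* because the pop weights cancel the denominators
summarize drate1 [aweight = pop]
```

Only the point estimates are expected to match; the standard errors are computed differently. Read this way, the two commands answer different questions (an average over states versus a rate for the pooled population), which is the distinction drawn earlier in the thread.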