Re: st: Is it valid to use the individual ratios (i.e. Xi/Yi) in the dependent or independent part of a regression model?
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Is it valid to use the individual ratios (i.e. Xi/Yi) in the dependent or independent part of a regression model?
Date: Sat, 26 May 2012 16:20:48 +0100
The paper you cite is for a very specific problem with a very specific
generating process. That is the nub of the matter. You need to specify
what you are estimating with what model. The best way to approach this
is probably through simulation of what happens with the sample sizes of
concern to you and under plausible assumptions.
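One way to set up such a simulation is sketched below. Everything about
the data-generating process here is an assumption made for illustration
(n = 50, a denominator uniform on [1, 3], a true ratio of 2, standard
normal noise), and ratiosim is a hypothetical program name; substitute
the process and sample size that match your own problem.

```stata
* Sketch only: the data-generating process below is assumed, not taken
* from the thread or from the cited paper.
capture program drop ratiosim
program define ratiosim, rclass
    drop _all
    set obs 50                          // assumed sample size
    generate y = 1 + 2*runiform()       // denominator, bounded away from 0
    generate x = 2*y + rnormal()        // numerator; true ratio is 2
    generate r = x/y

    * mean of ratios
    summarize r, meanonly
    return scalar mor = r(mean)

    * ratio of means
    summarize x, meanonly
    local xbar = r(mean)
    summarize y, meanonly
    return scalar rom = `xbar'/r(mean)
end

set seed 12345
simulate mor = r(mor) rom = r(rom), reps(1000) nodots: ratiosim

* compare the sampling distributions of the two estimators
summarize mor rom
```

Under this particular assumed process both estimators target the value
2, so the summary shows their bias and spread side by side. With a
different process the two may target different quantities, which is
exactly why the model has to be specified first.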
There are many takes on this, however. For example, if a ratio is
badly behaved, then the best way to analyse data may be not to use
ratios. If a goal is misconceived, establishing which lousy method of
attempting that goal is least bad is not a good question.
These are banal generalities. One implication is that you may need
to disclose more specific details about what you want to do to get
better advice.
Nick
On Sat, May 26, 2012 at 3:31 PM, <[email protected]> wrote:
> Is it true that "ratio of means" is less biased than "mean of ratios"
> (Comparing Ratio Estimators Based on Systematic Samples:
> http://www.isrt.ac.bd/sites/default/files/jsrissues/v40n2/v40n2p1.pdf)?
2012/5/26 Tirthankar Chakravarty <[email protected]>:
>> They estimate two different quantities - you decide which one you want:
>>
>> *******************************************
>> webuse census2, clear
>>
>> // ratio of means
>> ratio (deathrate: death/pop)
>> * or, more transparently
>> mean death pop
>> di _b[death]/_b[pop]
>>
>> // mean of ratios
>> g deathrate = death/pop
>> reg deathrate
>> * or, more transparently
>> mean deathrate
>> *******************************************
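The two quantities differ only in weighting: the ratio of means,
sum(death)/sum(pop), is algebraically the mean of the individual ratios
weighted by the denominator. A minimal check on the same dataset is
sketched below. The variable name drate is introduced here for
illustration, and summarize with analytic weights is used only to
reproduce the point estimate, not the standard error.

```stata
webuse census2, clear
generate double drate = death/pop

* ratio of means: sum(death)/sum(pop)
ratio (deathrate: death/pop)

* same point estimate: mean of the ratios, weighted by pop
summarize drate [aweight = pop]

* unweighted mean of the ratios, for comparison
summarize drate
```

Which weighting is appropriate depends on the question: the ratio of
means describes the pooled population, while the unweighted mean of
ratios treats every state as one equally important unit.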
On Sat, May 26, 2012 at 12:19 AM, <[email protected]> wrote:
>>> My point is that the mean and se obtained by the "ratio" command
>>> (which is supposedly more accurate) differ from those obtained by the
>>> "regress" command. Thus, the results obtained by the "regress" command
>>> may be invalid. My question is: how should ratios be analyzed as the
>>> dependent or independent variables in a regression if the mean and se
>>> of (Xi/Yi) are incorrect?
>>> For example:
>>>
>>> . webuse census2, clear
>>> (1980 Census data by state)
>>>
>>> .
>>> . gen drate1=death/pop
>>>
>>> .
>>> . reg drate1
>>>
>>>       Source |       SS       df       MS              Number of obs =      50
>>> -------------+------------------------------           F(  0,    49) =    0.00
>>>        Model |           0     0           .           Prob > F      =       .
>>>     Residual |  .000083179    49  1.6975e-06           R-squared     =  0.0000
>>> -------------+------------------------------           Adj R-squared =  0.0000
>>>        Total |  .000083179    49  1.6975e-06           Root MSE      =   .0013
>>>
>>> ------------------------------------------------------------------------------
>>>       drate1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>>> -------------+----------------------------------------------------------------
>>>        _cons |    .008436   .0001843    45.78   0.000     .0080657    .0088063
>>> ------------------------------------------------------------------------------
>>>
>>> .
>>> . ratio (deathrate: death/pop)
>>>
>>> Ratio estimation                    Number of obs    =      50
>>>
>>>     deathrate: death/pop
>>>
>>> --------------------------------------------------------------
>>>              |             Linearized
>>>              |      Ratio   Std. Err.     [95% Conf. Interval]
>>> -------------+------------------------------------------------
>>>    deathrate |   .0087368   .0002052      .0083244    .0091492
>>> --------------------------------------------------------------
2012/5/26 Steve Samuels <[email protected]>:
>>>> Rich Goldstein's nice summary contains a reference to Dick Kronmal's article:
>>>>
>>>> Kronmal, R. A. (1993). Spurious correlation and the fallacy of the ratio standard
>>>> revisited. Journal of the Royal Statistical Society. Series A (Statistics in
>>>> Society), 379-392.
>>>>
>>>> Dick's thinking (and title) were inspired by:
>>>>
>>>> Tanner, J. M. (1949). Fallacy of per-weight and per-surface area standards,
>>>> and their relation to spurious correlation. Journal of Applied Physiology, 2(1), 1-15.
>>>>
>>>> Happily, Tanner's article is available online:
>>>>
>>>> http://0-jap.physiology.org.library.pcc.edu/content/2/1/1.full.pdf+html
Nick Cox wrote:
>>>> Your opening statement is more nearly incorrect than correct. In
>>>> general, X / Y is indeterminate whenever Y is 0; if X and Y are
>>>> normally distributed that is an event with probability 0 (which still
>>>> means possible) but the ratio is otherwise well defined.
>>>>
>>>> If Y is ever 0 in your data then the ratio X / Y is unlikely to make
>>>> scientific sense and so the question of what you can and can't do with
>>>> it statistically doesn't really arise.
>>>>
>>>> I don't think there is a simple answer to whether you should use
>>>> ratios in regression. Often it is scientifically natural; often it is
>>>> pretty dangerous.
>>>>
>>>> For one statement of various pitfalls see list member Richard
>>>> Goldstein on ratios:
>>>>
>>>> http://biostat.mc.vanderbilt.edu/wiki/pub/Main/BioMod/goldstein.ratios.pdf
>>>>
>>>> Better advice might depend on your giving more details on what you
>>>> want to do, mentioning the scientific or medical context as well.
On Fri, May 25, 2012 at 5:36 AM, <[email protected]> wrote:
>>>>> The ratio of two normally distributed variables (X and Y) has no mean
>>>>> or variance.
>>>>> 1. Why is it valid that the "ratio" command estimates the mean and se of ratios?
>>>>> 2. Is it valid to use the individual ratios (i.e. Xi/Yi) in the
>>>>> dependent or independent part of a regression model?
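The "no mean or variance" statement can be looked at empirically in one
special case. The sketch below assumes X and Y are independent standard
normals (the thread does not specify means or correlation), in which
case X/Y follows a standard Cauchy distribution. A sample mean and
standard error can always be computed, but the running mean need not
settle down as the sample grows, whereas the quantiles are stable.

```stata
* Sketch under an assumed setup: independent standard normal X and Y
clear
set seed 2012
set obs 100000
generate x = rnormal()
generate y = rnormal()
generate r = x/y

* running mean of the ratio after n observations
generate long n = _n
generate double runmean = sum(r)/n
list n runmean if inlist(n, 100, 1000, 10000, 100000)

* quantiles are well behaved; the population median and quartiles of a
* standard Cauchy are 0, -1 and 1
summarize r, detail
```

This says nothing about denominators that stay well away from zero,
such as the state populations in census2, where the individual ratios
are bounded and their moments exist.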