Statalist



RE: st: Standard error of a ratio of two random variables


From   "Feiveson, Alan H. (JSC-SK311)" <[email protected]>
To   <[email protected]>
Subject   RE: st: Standard error of a ratio of two random variables
Date   Tue, 13 Jan 2009 10:58:28 -0600

Sergiy - If your data set does not have both X and Y observed jointly,
you can't directly estimate the correlation or calculate observed values
of the ratio. But you can still use the delta method or theory to derive
the distribution, or at least the moments, of the ratio if you assume
independence or some known dependence structure. Also, as far as the
distributions go, X is positive and Y is a positive integer (clearly not
normally distributed), so in a simulation you can do better than
assuming both X and Y are normal. For example, perhaps you could assume Y
is an offset Poisson (starting at 1 rather than zero) and X is
lognormal. But you still have to know what the dependence structure is
unless you assume independence.
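
A minimal sketch of such a simulation, assuming X and Y are independent.
The lognormal and Poisson parameters are made-up placeholders, not
estimates from any data, and the delta-method line is only a first-order
comparison:

```stata
* Sketch only: all parameter values below are invented for illustration.
clear
set seed 12345
set obs 100000

* Assumed marginals, drawn independently of each other:
*   X ~ lognormal, log-scale mean 10 and log-scale SD 0.5
*   Y ~ 1 + Poisson(2), so Y takes the values 1, 2, 3, ...
generate double X = exp(rnormal(10, 0.5))
generate long   Y = 1 + rpoisson(2)
generate double ratio = X/Y

* Simulated mean and SD of the ratio
summarize ratio

* First-order delta-method SD under independence, for comparison:
*   Var(X/Y) ~ (mux/muy)^2 * (s2x/mux^2 + s2y/muy^2)
quietly summarize X
local mux = r(mean)
local s2x = r(Var)
quietly summarize Y
local muy = r(mean)
local s2y = r(Var)
display "delta-method SD = " (`mux'/`muy')*sqrt(`s2x'/`mux'^2 + `s2y'/`muy'^2)
```

With a different dependence structure, X and Y would have to be drawn
jointly according to whatever structure is assumed.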

Al Feiveson

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Sergiy
Radyakin
Sent: Tuesday, January 13, 2009 10:36 AM
To: [email protected]
Subject: Re: st: Standard error of a ratio of two random variables

Dear All,

thank you for your responses. Austin's solution to remove rho helped a
lot, but I am a bit confused by the other alternative that he
proposed, namely "keep the rho, but divide by N". What would N be in the
case where the mean of X is computed on one domain and the mean of Y on
another (e.g. because of missing values)? For example, X is income and Y
is family size, and I am interested in per capita income, but income
is reported less often than family size. My observation is that Stata
stops with the error "no observations" when the domains of X and Y do
not overlap at all, but there is nothing in the formula that prevents me
from computing (approximate) moments of X/Y in this case. Should I, for
some reason, restrict the computations to the common domain where both X
and Y are non-missing?

Alan, thank you for the comments regarding the non-existence of moments in
the case where X and Y are N(0,s2). In my case the expectations of X and Y
are guaranteed to be non-zero, as they are the numbers of people with
non-exotic characteristics (like attending primary school).

Thank you,
    Sergiy Radyakin



On Tue, Jan 13, 2009 at 10:56 AM, Stas Kolenikov <[email protected]>
wrote:
> The variance of the ratio of two normals does not exist if the
> denominator has a mean of zero. Then yes, we get a Cauchy distribution
> with all the ugly properties. If the mean is not zero, you are not
> dividing by zero very often, although I don't really know whether the
> distribution of that ratio is generally known.
>
> Back to Sergiy's question -- to gauge the differences between the
> variance estimation methods, you can also try -svy, jackknife- and see
> what it produces; the differences on the -sysuse auto- dataset should be
> somewhere in the fourth or fifth decimal place. But I think Austin
> nailed down the analytical problem with an extra `rho'.
>
> Of course you are shooting sparrows with a cannon -- you probably 
> could've achieved anything you needed using -nlcom-.
>
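
A minimal sketch of the -nlcom- and jackknife routes mentioned here,
reusing the auto data and the unit weights from the do-file quoted further
down. The label "ratio" is just a name chosen for the example, and the
standard errors would be expected to agree closely rather than exactly:

```stata
sysuse auto, clear
generate byte www = 1
svyset [pw=www]

* Delta-method SE of the ratio of means via -nlcom-
svy: mean price length
nlcom (ratio: _b[price]/_b[length])

* Linearized SE from -svy: ratio-, for comparison
svy: ratio price/length

* Jackknife SE of the same ratio
svy jackknife: ratio price/length
```

-nlcom- works from the coefficient vector and covariance matrix that
-svy: mean- leaves behind, so no hand-coded formula is needed.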
> On 1/13/09, Feiveson, Alan H. (JSC-SK311) <[email protected]>
wrote:
>> It should also be noted that this delta method is just an
>> approximation, so it is not surprising that it might disagree with a
>> simulation by 10% or more. Also, technically the variance of a ratio
>> of two normally distributed random variables doesn't even exist!
>> Therefore even a simulation, if carried out long enough, would produce
>> arbitrarily high "SE" values. The saving grace is that if the
>> standard deviation of the denominator is much smaller than its mean,
>> we can "get away" with these approximations for practical usage.
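
For reference, a sketch of the first-order delta-method approximation under
discussion. It comes from linearizing g(x,y) = x/y around the means; the
symbols (mu, sigma, rho for the means, standard deviations and correlation
of X and Y) are notation chosen here, not taken from the code in the
thread:

```latex
\operatorname{Var}\!\left(\frac{X}{Y}\right) \approx
    \frac{\mu_X^2}{\mu_Y^4}\,\sigma_Y^2
  + \frac{\sigma_X^2}{\mu_Y^2}
  - 2\,\frac{\mu_X}{\mu_Y^3}\,\rho\,\sigma_X\,\sigma_Y
```

Here rho*sigma_X*sigma_Y is the covariance of X and Y, so when a covariance
is already available it enters the last term directly, with no further
factor of rho.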
>>
>>  Al Feiveson
>>
>>
>>  -----Original Message-----
>>  From: [email protected]
>>  [mailto:[email protected]] On Behalf Of Jeph 
>> Herrin
>>  Sent: Tuesday, January 13, 2009 7:11 AM
>>  To: [email protected]
>>  Subject: Re: st: Standard error of a ratio of two random variables
>>
>>
>>  For what it's worth, a simulation may give the most accurate answer:
>>
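>>    * Means and standard errors of X and Y
>>    quietly ci X
>>    local mux=r(mean)
>>    local sex=r(se)
>>    quietly ci Y
>>    local muy=r(mean)
>>    local sey=r(se)
>>    * Correlation between X and Y
>>    corr X Y
>>    local rho=r(rho)
>>    * Inputs for -drawnorm-: means, correlation matrix, SDs
>>    matrix b= (`mux' \ `muy')
>>    matrix V= (1, `rho' \ `rho', 1)
>>    matrix sd =(`sex' \ `sey')
>>
>>    * Draw bivariate normal pairs and form the ratio
>>    clear
>>    set obs 100000
>>    drawnorm X Y, means(b) corr(V) sds(sd)
>>    gen ratio=X/Y
>>    sum ratio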
>>
>>  The SD of the variable -ratio- should be the SE of X/Y.
>>
>>  Jeph
>>
>>
>>
>>
>>  Sergiy Radyakin wrote:
>>  > Dear All,
>>  >
>>  >    I need to find a standard error of Z a ratio of two random
>>  > variables X and Y: Z=X/Y, where the means and SEs for X and Y are
>>  > known, as well as their corr coefficient. (In my particular case X
>>  > and Y are numbers of people that have such and such
>>  > characteristics). I am using delta-method according to [
>>  > http://www.math.umt.edu/patterson/549/Delta.pdf ] (see page
>>  > 2(38)). I then use svy:ratio to check my results and they don't
>>  > match.  I wonder if I am doing something wrong, or is it any kind of
>>  > precision-related problem (the difference is about 7%, i.e. 1.7959
>>  > vs. 1.6795), or is my check simply wrong and not applicable in this
>>  > case.
>>  >
>>  >    I would appreciate if someone could look into the code and let me
>>  > know why the results are different.
>>  >
>>  > Thank you,
>>  >     Sergiy Radyakin
>>  >
>>  > Below is a do file that one can Ctrl+C/Ctrl+V to Stata's command line:
>>  >
>>  > **** BEGIN OF RV_RATIO.DO ****
>>  > sysuse auto, clear
>>  > generate byte www=1
>>  > svyset [pw=www]
>>  >
>>  > capture program drop st_error_of_ratio
>>  > program define st_error_of_ratio, rclass
>>  >       syntax varlist(min=2 max=2)
>>  >       svy: mean `varlist'
>>  >       matrix B=e(b)
>>  >       local mux=B[1,1]
>>  >       local muy=B[1,2]
>>  >       matrix V=e(V)
>>  >       local sigma2x=V[1,1]
>>  >       local sigma2y=V[2,2]
>>  >       local sigmax_sigmay=V[1,2]
>>  >              svyset
>>  >       corr `varlist' [aw`=r(wexp)']
>>  >       local rho=r(rho)
>>  >       return scalar se = sqrt((`mux')^2*`sigma2y'/(`muy')^4 +
>>  > `sigma2x'/(`muy')^2 - 2*`mux'/(`muy')^3*`rho'*`sigmax_sigmay')
>>  > end
>>  >       st_error_of_ratio price length
>>  >              display as text "Estimated SE=" as result r(se)
>>  >       svy: ratio price / length
>>  > **** END OF RV_RATIO.DO ****
>>  > *
>>  > *   For searches and help try:
>>  > *   http://www.stata.com/help.cgi?search
>>  > *   http://www.stata.com/support/statalist/faq
>>  > *   http://www.ats.ucla.edu/stat/stata/
>>  >
>>
>
>
> --
> Stas Kolenikov, also found at http://stas.kolenikov.name Small print: 
> I use this email account for mailing lists only.


