RE: Re: st: RE: AW: ratio function

Date   Mon, 26 Apr 2010 10:36:00 +0200

thank you, it helps a lot!

just small correction:

the last command should be "nlcom [y2008]_cons/[y2009]_cons", I assume


My previous example was flawed. Because males and females could be
present in the same PSU (famid), the degrees of freedom for -suest-
did not equal the sum of the d.f. for the separate regressions.  Here
is another example with proper strata. Note that the degrees of
freedom now add up.

webuse income, clear
gen  year = 2008 + (famid>68)
tab year
svyset famid, strata(year)
svy: reg inc if year==2008
estimates store y2008
svy: reg inc if year==2009
estimates store y2009
suest y2008 y2009
matrix list e(V)   //results from different years are independent
nlcom  _b[y2008:_cons]/_b[y2009:_cons]
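
One way to see the point about degrees of freedom is to save the design d.f. from each yearly fit and compare their sum with the d.f. of a pooled fit on the same design. This is only a sketch: the pooled -svy: reg inc- and the scalar names df08 and df09 are assumptions added for illustration and are not part of the example above.

```stata
webuse income, clear
gen year = 2008 + (famid > 68)
svyset famid, strata(year)
quietly svy: reg inc if year == 2008
scalar df08 = e(df_r)             // design d.f. for the 2008 fit
quietly svy: reg inc if year == 2009
scalar df09 = e(df_r)             // design d.f. for the 2009 fit
quietly svy: reg inc              // pooled fit over both strata (illustrative only)
display "2008: " df08 "   2009: " df09 "   pooled: " e(df_r)
```

Because year is defined from famid, every PSU belongs to exactly one stratum, so the two yearly d.f. should sum to the pooled d.f. (number of PSUs minus number of strata).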
On Fri, Apr 23, 2010 at 10:40 AM, Steve Samuels wrote:
> Just use -suest-.  Use the fact that -reg- with no covariates
> (intercept only) is equivalent to estimating the mean.
> ********************
> webuse income, clear
> svyset famid
> svy: reg inc if male
> estimates store Male
> svy: reg inc if !male
> estimates store Female
> suest Male Female
> nlcom _b[Male:_cons]/_b[Female:_cons]
> *************************
> Steve
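
The equivalence between an intercept-only -reg- and the mean can be checked directly. The sketch below uses the auto data and the -svyset _n- design that appear elsewhere in this thread; the scalar name m is hypothetical.

```stata
sysuse auto, clear
svyset _n
svy: mean mpg
scalar m = _b[mpg]                // estimated mean of mpg
svy: regress mpg                  // no covariates: only _cons is estimated
display "mean = " m "    _cons = " _b[_cons]
```

The intercept from the second command should match the mean from the first, which is why -suest- can combine the stored regressions as if they were means.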
2010/4/23 Roman Kasal:
>> yes, the year is another survey (different time; the years cannot be pooled because of degrees of freedom) and is included in strata.
>> so there is no solution for this case? just manually?
>>> "svy: mean wage, over(year)" is not equal "svy: mean wage if year==2009"
>> The "if" qualifier is incorrect unless year was a stratification
>> variable that you identified to Stata.
>> -nlcom- after -svy: mean, over(year)- is the proper approach.
2010/4/23 Roman Kasal:
>>> ok, for this purpose I agree, that is ok...but what about if I want to calculate SE of Mean in years 2009 and 2008 and then ratio with SE of the means?
>>> the problem is that CI of
>>> "svy: mean wage, over(year)" is not equal "svy: mean wage if year==2009"
>>> for the year 2009 because of different degrees of freedom (SE's are equal), the first command gives wrong CI.
>>> is there an elegant solution to handle this in Stata with "nlcom", or do I have to calculate it manually?
>>> thank you
>>> The degrees of freedom are correct.  See any sampling text.
>>> Briefly: To identify a subpopulation, each observation in the sample
>>> receives a 0-1 indicator variable d.  If X is the numerator variable
>>> and Y is the denominator variable, the numerator of the ratio is
>>> the sum *over the entire sample* of Z_x = d * X and the denominator is
>>> the sum of Z_y = d * Y.  The standard errors are based on variability
>>> in the Z's, including the zero values.
>>> By the way, the standard error formulas are valid only if the
>>> expected number of observations in a subpopulation is at least 20.
>>> Steve
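
The indicator-variable description can be tried out on the auto data. In this sketch foreign plays the role of d, price is X and weight is Y; the variable names zx and zy and the ratio labels r_sub and r_z are hypothetical, chosen only for illustration.

```stata
sysuse auto, clear
svyset _n
gen double zx = foreign * price   // Z_x = d * X, zero outside the subpopulation
gen double zy = foreign * weight  // Z_y = d * Y
svy, subpop(foreign): ratio (r_sub: price/weight)
scalar b_sub  = _b[r_sub]
scalar se_sub = _se[r_sub]
svy: ratio (r_z: zx/zy)           // same ratio, computed over the entire sample
display "subpop: " b_sub " (" se_sub ")    Z's: " _b[r_z] " (" _se[r_z] ")"
```

Both commands should give the same estimate and the same linearized standard error, since the second one is just the subpopulation ratio written out with the zeros included.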
2010/4/22 Roman Kasal:
>>>> thank you for the code, but I have found a problem:
>>>> if I calculate over(foreign), the bounds are computed with "e(N_psu)-e(N_strata)" degrees of freedom for the whole dataset, not separately for each level of foreign, and this is wrong, I assume.
>>>> thank you
>>>> Roman
>>>> Perhaps we misunderstand what you are asking for.  We have been
>>>> assuming that you want the ratio of the means of two variables
>>>> ("columns"?) measured possibly on the same person.  Perhaps you want
>>>> the ratio of the means of one variable for two subpopulations.  Both
>>>> analyses will ignore missing values.
>>>> If this is not what you desire, then please demonstrate by hand what
>>>> you do want on a small, non-survey data set. Also, I'd like to know
>>>> which R function does what you are asking for.
>>>>  The following do file computes the ratio of means with CI and then
>>>> does the same for the log ratio and transforms to the original scale.
>>>> -Steve
>>>> **************************CODE BEGINS**************************
>>>> capture program drop _all
>>>> program antilog
>>>> local lparm  el(r(b),1,1)
>>>> local se    sqrt(el(r(V),1,1))
>>>> local bound  invttail(e(df_r),.025)*`se'
>>>> local parm  exp(`lparm')
>>>> local ll  exp(`lparm'  - `bound')
>>>> local ul  exp( `lparm' + `bound')
>>>> di  "parm =" `parm'  "    ll = " `ll'  "   ul = " `ul'
>>>> end
>>>> sysuse auto, clear
>>>> svyset _n
>>>> svy: mean mpg, over(foreign)
>>>> nlcom (myratio1: _b[Domestic]/_b[Foreign])   //ratio
>>>> nlcom (myratio2: log(_b[Domestic]/_b[Foreign]))   // log ratio
>>>> // Confidence interval of last -nlcom- on antilog scale
>>>> antilog
>>>> ***************************CODE ENDS***************************
On Fri, Apr 2, 2010 at 2:37 AM, Roman Kasal wrote:
>>>>> I don't know how to do it when you want to find the ratio between
>>>>> years, male vs. female, ...? So there is no solution? Just keep N, mean,
>>>>> SE, degrees of freedom, N_strata, N_psu, ... and calculate it manually?
>>>>> I don't think that is an appropriate solution; it should at least be
>>>>> available as an option. I think a lot is missing for complex surveys
>>>>> in Stata, and complex survey methods are needed for almost every
>>>>> survey research project; even the freeware R-project is better equipped :(
>>>>> So I hope Stata will get it soon... we would buy it again
>>>>> immediately :)
>>>>> And it should.   Data (x,y) (1,2) (2,4) (3,6) (100,.)    will give an
>>>>> entirely different view of the data if the unpaired observation is
>>>>> included in a mean or ratio calculation.  Or consider data with x
>>>>> missing in half the pairs and y missing in the other half; the ratio
>>>>> of means would be meaningless.
>>>>> The formulas for standard errors for ratios  assume that the data are
>>>>> paired. Formally, they are based on the residual MSE of a regression
>>>>> of y on x through the origin. You cannot do that regression with
>>>>> unpaired data.
>>>>> If your concern is missing data, the solution is to impute the missing
>>>>> values before analysis.
>>>>> Steve
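
The four-point example above can be worked through by hand. The sketch below enters the data (x,y) = (1,2) (2,4) (3,6) (100,.) and compares the ratio of means from the complete pairs with the ratio obtained when each mean uses every available value; the scalar names are hypothetical.

```stata
clear
input x y
1   2
2   4
3   6
100 .
end
quietly summarize y if !missing(x, y)
scalar ybar_p = r(mean)           // mean of y over complete pairs
quietly summarize x if !missing(x, y)
scalar xbar_p = r(mean)           // mean of x over complete pairs
quietly summarize y
scalar ybar_a = r(mean)           // mean of y over all non-missing y
quietly summarize x
scalar xbar_a = r(mean)           // mean of x over all four x values
display "paired ratio = " ybar_p/xbar_p
display "available-case ratio = " ybar_a/xbar_a
```

The paired ratio is 4/2 = 2, matching y = 2x in every complete pair, while the available-case ratio is 4/26.5, about 0.15, because the unpaired x = 100 enters only the denominator.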
