Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: trying to compare means and using xi and xi3 for survey data
From: Austin Nichols <[email protected]>
To: [email protected]
Subject: Re: st: trying to compare means and using xi and xi3 for survey data
Date: Tue, 5 Jul 2011 08:38:57 -0400
Hitesh Chandwani <[email protected]>:
I doubt there is a coding error, and your interpretation is
correct--but this all may be clearer if you use an example everyone
can share:
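webuse nhanes2, clear
* weighted mean height within each race group
svy: mean height, over(race)
* the same comparison as a regression: manual dummies, then -xi- dummies
svy: reg height black orace
xi: svy: reg height i.race
* give one group zero weights, as in your data, and refit both models
replace finalwgt=0 if orace
svy: reg height black orace
xi: svy: reg height i.race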
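For readers on Stata 11 or later, the same comparison can probably be written without -xi-, using the factor-variable notation Steven mentions further down the thread. The lines below are only a sketch on the shared nhanes2 data, assuming that race is coded 1, 2, 3 there (White, Black, Other) and that the dataset arrives already -svyset-; they are not the commands used on the original cost data.

```stata
* Sketch only: assumes Stata 11+ (factor variables) and the nhanes2 example data
webuse nhanes2, clear
* ibn.race keeps every level of race; with no constant, each coefficient is a group mean
svy: regress height ibn.race, noconstant
* difference in mean height, race==2 minus race==1, with a design-based standard error
lincom 2.race - 1.race
* joint adjusted Wald test that the three group means are equal
test (1.race = 2.race) (1.race = 3.race)
```

With this parameterization the t-statistics in the regression table test each mean against zero, so the pairwise comparisons come from -lincom- and -test- rather than from the table itself.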
On Tue, Jul 5, 2011 at 7:44 AM, Hitesh Chandwani
<[email protected]> wrote:
> Steven,
>
> I used the following commands:
>
> . char insured_pub_pvt_un[omit]2
>
> . xi: svy: regress totchg_num i.insured_pub_pvt_un
>
>
> And got the following output:
>
> i.insured_pub~n _Iinsured_p_0-4 (naturally coded; _Iinsured_p_2 omitted)
> (running regress on estimation sample)
>
> Survey: Linear regression
>
> Number of strata   =        75                  Number of obs      =    103817
> Number of PSUs     =       966                  Population size    = 469088.57
>                                                 Design df          =       891
>                                                 F(   3,    889)    =         .
>                                                 Prob > F           =         .
>                                                 R-squared          =    0.0106
>
> ------------------------------------------------------------------------------
>              |             Linearized
>   totchg_num |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> _Iinsured_~0 |  (dropped)
> _Iinsured_~1 |   6504.334   915.0348     7.11   0.000      4708.46    8300.209
> _Iinsured_~3 |  -3015.988   705.0121    -4.28   0.000    -4399.666    -1632.31
> _Iinsured_~4 |   1070.352   1961.327     0.55   0.585    -2779.007    4919.711
>        _cons |   13894.47   837.4082    16.59   0.000     12250.95       15538
> ------------------------------------------------------------------------------
>
> I think the fact that the "0" group was dropped again has something to
> do with the fact that all observations in this group have pweights set
> to zero. The way I interpret the output is that the coefficients are
> the differences in mean between the omitted group (group 2) and the
> other groups (1, 3, and 4, respectively) with the corresponding
> t-statistic values being a comparison of means with the omitted group.
>
> Is this interpretation accurate?
>
> Regards,
> Hitesh
>
>
>
>
> On Tue, Jul 5, 2011 at 7:30 AM, Hitesh Chandwani
> <[email protected]> wrote:
>> Hi Steven,
>>
>> There is no evident coding error that I can see. If I use the
>> -,noomit- option, how do I interpret the results? The coefficients are
>> clearly the means, but what do the t-values indicate?
>>
>> xi, noomit: svy: reg totchg_num i.insured_pub_pvt_un , nocons
>> (running regress on estimation sample)
>>
>> Survey: Linear regression
>>
>> Number of strata   =        75                  Number of obs      =    103817
>> Number of PSUs     =       966                  Population size    = 469088.57
>>                                                 Design df          =       891
>>                                                 F(   4,    888)    =         .
>>                                                 Prob > F           =         .
>>                                                 R-squared          =    0.1513
>>
>> ------------------------------------------------------------------------------
>>              |             Linearized
>>   totchg_num |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>> _Iinsured_~0 |  (dropped)
>> _Iinsured_~1 |   20398.81   1171.304    17.42   0.000     18099.97    22697.64
>> _Iinsured_~2 |   13894.47   837.4082    16.59   0.000     12250.95       15538
>> _Iinsured_~3 |   10878.49   844.9702    12.87   0.000     9220.121    12536.85
>> _Iinsured_~4 |   14964.83   1801.761     8.31   0.000     11428.64    18501.02
>> ------------------------------------------------------------------------------
>>
>> Regards,
>> Hitesh
>>
>>
>> On Tue, Jul 5, 2011 at 12:34 AM, Steven Samuels <[email protected]> wrote:
>>>
>>> I suspect a coding error.
>>>
>>> Suppose insure_cat is your original insurance variable. Have you looked at
>>>
>>> *******************************
>>> bys insure_cat: sum totchg_num
>>>
>>> *****************************
>>> Have you tabulated each insurance indicator against insure_cat?
>>>
>>> In any case, direct survey approaches are:
>>> ************************
>>> svy: mean totchg_num, over(insure_cat)
>>> xi, noomit: svy: reg totchg_num i.insure_cat, nocons   // pre-Stata 11
>>> svy: reg totchg_num ibn.insure_cat, nocons             // Stata 11+
>>> ************************
>>>
>>>
>>> Steve
>>>
>>>
>>> Steven J. Samuels
>>> Consultant in Statistics
>>> 18 Cantine's Island
>>> Saugerties, NY 12477 USA
>>> Voice: 845-246-0774
>>> Fax: 206-202-4783
>>> [email protected]
>>>
>>> On Jul 4, 2011, at 5:02 PM, Hitesh Chandwani wrote:
>>>
>>> Hello Statalisters,
>>>
>>> I am using cost survey data and have 2 questions:
>>>
>>> 1) Comparison of means
>>>
>>> Using the svy: mean procedure, I can get means of cost for all
>>> categories of a particular variable. But since this variable is not
>>> dichotomous, using -test- or -lincom- as a postestimation command to
>>> compare the means doesn't yield any results. What I thought of was
>>> dummy coding the categories and then running a regression. Instead of
>>> manually creating dummy variables, I decided to use -xi-, which brings
>>> me to my next question.
>>>
>>> 2) -xi- and -xi3- will both omit one category as a reference
>>> category, which is fine. But, in my output, after omitting the first
>>> category, another category is indicated as (dropped). Moreover, there
>>> is still no value for the F-statistic.
>>>
>>> Firstly, is my approach correct? And secondly, why are 2 categories
>>> being dropped?
>>>
>>> (One explanation that I could come up with for the 2 dropped
>>> categories is that the pweight for the observations in the omitted
>>> category " _Iinsured_p_0" is set to zero and hence Stata needs to use
>>> another category as reference)
>>>
>>> The following is my syntax as well as output:
>>>
>>>
>>> xi: svy: regress totchg_num i.insured_pub_pvt_un
>>> i.insured_pub~n _Iinsured_p_0-4 (naturally coded; _Iinsured_p_0 omitted)
>>> (running regress on estimation sample)
>>>
>>> Survey: Linear regression
>>>
>>> Number of strata   =        75                  Number of obs      =    103817
>>> Number of PSUs     =       966                  Population size    = 469088.57
>>>                                                 Design df          =       891
>>>                                                 F(   3,    889)    =         .
>>>                                                 Prob > F           =         .
>>>                                                 R-squared          =    0.0106
>>>
>>> ------------------------------------------------------------------------------
>>>              |             Linearized
>>>   totchg_num |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>>> -------------+----------------------------------------------------------------
>>> _Iinsured_~1 |   6504.334   915.0348     7.11   0.000      4708.46    8300.209
>>> _Iinsured_~2 |  (dropped)
>>> _Iinsured_~3 |  -3015.988   705.0121    -4.28   0.000    -4399.666    -1632.31
>>> _Iinsured_~4 |   1070.352   1961.327     0.55   0.585    -2779.007    4919.711
>>>        _cons |   13894.47   837.4082    16.59   0.000     12250.95       15538
>>> ------------------------------------------------------------------------------
>>>
>>> . test _Iinsured_p_1 _Iinsured_p_2 _Iinsured_p_3 _Iinsured_p_4
>>>
>>> Adjusted Wald test
>>>
>>> ( 1) _Iinsured_p_1 = 0
>>> ( 2) _Iinsured_p_2 = 0
>>> ( 3) _Iinsured_p_3 = 0
>>> ( 4) _Iinsured_p_4 = 0
>>> Constraint 2 dropped
>>>
>>> F( 3, 889) = 23.78
>>> Prob > F = 0.0000
>>>
>>> Any help in understanding this issue will be greatly appreciated.
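
The two parameterizations quoted in this thread can be checked against each other by hand. In the fit with a constant, _cons appears to be the mean of the effective reference group (group 2), so adding each coefficient to _cons should reproduce the group means from the -noomit-, -nocons- fit. A rough check using only the figures printed above (the -display- lines are plain arithmetic, not a re-estimation):

```stata
* group 1 mean: _cons + coefficient on _Iinsured_p_1
display 13894.47 + 6504.334
* group 3 mean: _cons + coefficient on _Iinsured_p_3
display 13894.47 - 3015.988
* group 4 mean: _cons + coefficient on _Iinsured_p_4
display 13894.47 + 1070.352
```

The sums come to roughly 20398.80, 10878.48 and 14964.82, which agree with the -noomit- coefficients (20398.81, 10878.49, 14964.83) up to rounding of the displayed values. That is consistent with reading the coefficients as differences in means from group 2.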