Re: st: Comparing overlapping groups
From: David Hoaglin <[email protected]>
To: [email protected]
Date: Tue, 2 Oct 2012 11:05:54 -0400
Dear Fred,
If the 4 definitions were mutually exclusive subsets, you could use a
regression that has indicator variables for FM2, FM3, and FM4 (the
constant term would handle FM1, or you could include an indicator for
FM1 and turn off the constant). The result would be equivalent to a
one-way analysis of variance with 4 groups.
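As a minimal sketch of the mutually exclusive case, the do-file lines below assume hypothetical 0/1 indicator variables fm1, fm2, fm3, and fm4 (these names do not appear in Fred's output), with exactly one of them equal to 1 for each observation. The grouping variable fmgroup is likewise made up for illustration.

```stata
* Assumption: fm1-fm4 are hypothetical 0/1 indicators, and each
* observation has exactly one of them equal to 1.

* Indicators for FM2-FM4; the constant is the mean of phq_sss in FM1
regress phq_sss fm2 fm3 fm4

* All four indicators, no constant; each coefficient is a group mean
regress phq_sss fm1 fm2 fm3 fm4, noconstant

* Same comparison as a one-way ANOVA on a single grouping variable
generate byte fmgroup = 1*fm1 + 2*fm2 + 3*fm3 + 4*fm4
oneway phq_sss fmgroup
```

The overall F test from the first regression should match the F test from oneway. The F reported by the noconstant fit tests a different hypothesis (that all four means are zero), so it is not comparable.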
The definitions overlap, though you have not said which of the
overlaps are present in your data or how many observations fall in
each. If all 2442 observations meet at least one of the 4 definitions,
you could have as many as 15 subgroups (2^4 - 1 nonempty combinations).
You could start with a regression model that has indicators for FM2,
FM3, and FM4. The constant will give you an average for FM1, and the
coefficients of the three indicators will give incremental effects,
relative to FM1. The results may not be satisfactory, and they may be
difficult to interpret. A better approach, along the lines of main
effects and interactions, would also include indicators for each of
the subsets that involve 2 or more of the definitions. Then, for
example, you could get an estimate of the level of phq_sss among
people who meet only FM1, an increment for people who meet both FM1
and FM2, and further increments for people who meet FM1, FM2, and FM3
and people who meet all 4 definitions.
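One way to explore the overlapping case in Stata is sketched below. It again assumes hypothetical 0/1 indicators fm1 through fm4, now allowed to overlap, and a made-up variable name pattern. It is only an illustration of the idea of subgroup indicators, not necessarily the exact parametrization described above.

```stata
* Assumption: fm1-fm4 are hypothetical 0/1 indicators that may overlap.

* One category per combination of definitions that occurs in the data
egen pattern = group(fm1 fm2 fm3 fm4), label
tabulate pattern

* Main effects only: the starting model with indicators for FM2-FM4
regress phq_sss fm2 fm3 fm4

* One indicator per observed combination, relative to a base combination
regress phq_sss i.pattern

* Joint test that mean phq_sss is the same in all observed combinations
testparm i.pattern
```

The tabulate step shows how many of the possible combinations are present and how many observations each contains, which indicates whether the subgroup model is estimable with reasonable precision. If some people meet none of the definitions, they form their own category in pattern.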
I hope this discussion is helpful.
David Hoaglin
On Tue, Oct 2, 2012 at 10:06 AM, Fred Wolfe
<[email protected]> wrote:
> Dear Statalisters,
>
> I am analyzing a medical condition (FM) that has 4 different
> definitions for the same condition. A person can be in 1 or more of
> the four definition-based groups (FM1, FM2, FM3, FM4). There are 2442
> observations.
>
> I am interested in the value of a dependent variable, phq_sss, according
> to each group definition.
>
> For the first two definitions, I get these results
>
> . regress phq_sss i.wsp
>
>       Source |       SS       df       MS              Number of obs =    2442
> -------------+------------------------------           F(  1,  2440) =  605.51
>        Model |  7621.27967     1  7621.27967           Prob > F      =  0.0000
>     Residual |  30711.1417  2440  12.5865335           R-squared     =  0.1988
> -------------+------------------------------           Adj R-squared =  0.1985
>        Total |  38332.4214  2441  15.7035729           Root MSE      =  3.5478
>
> ------------------------------------------------------------------------------
>      phq_sss |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>        1.wsp |   6.247731   .2538992    24.61   0.000      5.74985    6.745611
>        _cons |   2.728905   .0751615    36.31   0.000     2.581518    2.876292
> ------------------------------------------------------------------------------
>
> . regress phq_sss i.mwsp
>
>       Source |       SS       df       MS              Number of obs =    2442
> -------------+------------------------------           F(  1,  2440) =  229.25
>        Model |  3292.19831     1  3292.19831           Prob > F      =  0.0000
>     Residual |  35040.2231  2440  14.3607472           R-squared     =  0.0859
> -------------+------------------------------           Adj R-squared =  0.0855
>        Total |  38332.4214  2441  15.7035729           Root MSE      =  3.7896
>
> ------------------------------------------------------------------------------
>      phq_sss |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       1.mwsp |   10.37138   .6849863    15.14   0.000     9.028161    11.71459
>        _cons |   3.144753   .0771774    40.75   0.000     2.993413    3.296093
> ------------------------------------------------------------------------------
>
> There are two additional definitions that are not shown.
>
> So the difference for group members as opposed to non-members
> in the two analyses above is:
> wsp 6.2
> mwsp 10.4
> (there will be 2 other groups).
>
> My question is, how do I tell if the results are statistically
> different between the 4 groups, given the overlapping membership in
> the groups. I have a feeling that some sort of permutation test is the
> way to get such an answer. I'd appreciate suggestions.