I too have been struggling with trying to "duplicate" ANOVA on complex
designs with -xtmixed-, and I also greatly appreciate Yulia's
contribution. As the Stata FAQ says - when the design is balanced , you
get the same Chidq/df as the F-ratio in ANOVA. So the next issue might
be how to decide between the two approaches when the design is
unbalanced, especially for small designs? In this case, what is ANOVA
actually testing? How serious is the error in assuming F-distributions
for some tests, when they're really not? On the other hand (I think)
inference under -xtmixed- assumes all variances are known - so there is
an error trade-off. Then, of course there is the question of how do do
multiple comparisons.
Al Feiveson
Is the error in ANOVA of assuming F-distributions when they're really
not
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of David Airey
Sent: Thursday, May 22, 2008 10:11 PM
To: [email protected]
Subject: Re: st: xtmixed syntax [anova]
.
Thank you very much for this level of detail, Yulia. This is a terrific
explanation. I now have your article too. I was far from the last
syntax, but now with your explanations, I might be able to figure out
similar syntax de novo. I will try out both approaches and report back
(another person is tackling the same model with SAS Proc Mixed) so it
will make a good comparison.
-Dave
On May 22, 2008, at 2:51 PM, [email protected] wrote:
> David Airey <[email protected]> asks about how to obtain
> variance components by using -xtmixed- for an experimental design with
> both nested and crossed factors:
>
>> I have a variance components mixed model that I am have trouble with
>> syntax using xtmixed. I'm interested in a genetic polymorphism effect
>> (think of this as simple treatment effect) on gene expression, where
>> gene expression is measured on set of genes where genes are actually
>> measured by multiple probes.
>>
>> two group allelic fixed effect A
>> random sample effect S, one sample per subject, subjects nested in
>> allelic group, 80 subjects random gene effect G, measured on all
>> samples, so crossed with group, typically 20-50 genes random probe
>> effect P, nested in gene, < 5 probes per gene, different probes per
>> gene
>>
>> ...
>
> Before demonstrating the actual syntax of -xtmixed- for David's
> example, let me first briefly discuss a more general idea behind using
> -xtmixed- with random-effects experimental designs. A more detailed
> discussion is given in Marchenko (2006) (in fact, David's example is
> similar to the example described in section 4.6 of the article).
>
> There are several ways of using -xtmixed- to obtain variance
> components from a random-effects experimental design. One is what we
> call the brute- force way of fitting the design. It is
> straightforward. There are also other alternative ways of obtaining
> the same results. These approaches are more efficient in terms of
> memory usage and speed but require using the alternate formulation of
> the design as a multilevel model that may be difficult to find for
> certain designs.
>
> With the brute-force way (or design-matrix-based approach), the first
> step is to construct the design matrix for random effects,
> corresponding to the design. Such a random-effects design matrix will
> be comprised of columns containing indicator variables corresponding
> to the levels of all random effects. -xtmixed- accommodates the
> specification of this matrix via the special group identifier -_all-
> and the factor notation -
> R._varname_- (see
> [XT] xtmixed for details). Thus, all we need to do prior to using -
> xtmixed-
> is to identify all random effects (or, equivalently, all variance
> components)
> associated with the design.
>
> Returning to David's example, probes are nested within genes, and
> groups are crossed with genes and probes. Since the group effect is
> fixed, we have the following variance components for this design:
> variability due to genes, variability due to probes (nested within
> genes), variability due to the interaction of groups and genes,
> variability due to the interaction of groups with probes (nested
> within genes), and subject variability (residual variance). Our four
> random effects are genes, probes (nested within genes), group-gene
> interaction, and group-probe-(nested-within-genes) interaction.
> Note that when a fixed factor is interacted with a random factor, the
> interaction term is random.
>
> Suppose that variables Group, Gene, and Probe contain information
> about groups, genes, and probes, respectively. For this design, we
> also need to create the following interaction terms:
>
> . egen GeXPr = group(Gene Probe)
> . egen GrXGe = group(Group Gene)
> . egen GrXGeXPr = group(Group Gene Probe)
>
> Now, the brute-force way of using -xtmixed- for this design is
> straightforward: we simply list all of the random-effects variables as
> separate equations using the -_all: R._varname_- notation. The
> corresponding syntax is:
>
> . xi: xtmixed depvar i.Group || _all: R.Gene || _all: R.GeXPr ||
> _all: R.GrXGe ///
> || _all: R.GrXGeXPr, variance
>
> In the above syntax, -_all: R.Gene-, for example, tells -xtmixed- to
> include indicators corresponding to levels of variable Gene into the
> design matrix.
> The variability within subjects (the lowest-level variability) is
> estimated by the residual variance, reported by -xtmixed-.
>
> The advantage of the brute-force way is that it is straightforward
> once the relevant random-effects terms are identified. However, for
> David's example, the above syntax requires creating a matrix with
> 50+50*5+2*50+2*50*5 = 900 columns (50 is the number of genes, 5 is the
> number of probes, and 2 is the number of groups). We can avoid this
> by using the more efficient way of obtaining the same results by using
> an alternative syntax of - xtmixed-.
>
> First, consider the terms Gene, GrXGe, and GrXGeXPr. Since
> interaction terms can be viewed as nested terms, we can obtain
> variance components for these three terms more efficiently from a
> three-level model with genes defining the first level, groups defining
> the second level, and probes defining the third level. The
> corresponding syntax of -xtmixed- is
>
> . xi: xtmixed depvar i.Group || Gene: || Group: || Probes:, variance
>
> What we did not take into account in the above syntax is the fact that
> probes are nested within genes. As described in example 7 of [XT]
> xtmixed and section 4.4 of the cited article, the nesting can be
> accommodated by viewing the levels of the nested effects (Probes) as
> random coefficients for nesting level (Genes) and specifying the
> exchangeable covariance structure for these random coefficients.
> Taking this into account, the final syntax for
> -xtmixed-
> is
>
> . xi: xtmixed depvar i.Group || Gene: R.Probes, cov(exchangeable) ///
> || Group: || Probes:, variance
>
> By using the above syntax, we also significantly reduced the column
> dimension of the design matrix (from 900 in the brute-force method to
> 5+1+1=7).
>
> Reference:
>
> Marchenko, Y. 2006. Estimating variance components in Stata. The
> Stata Journal, 6(1): 1-22.
>
> The link to the article is
> http://www.stata-journal.com/article.html?article=st0095
>
>
> -- Yulia
> [email protected]
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/