I would tackle this problem using regression with dummy variables,
rather than the -anova- command. This has several advantages:
  1. It gives me more control over what is being estimated.
  2. It presents the results as estimates with confidence intervals,
     rather than as general tests.
  3. The robust option, particularly in conjunction with -xtgee-,
     allows for non-sphericity, heterogeneity etc., rather than just
     testing for a problem.
Starting with the original data set, a possible series of commands might be:
set matsize 500
* Check that x is constant within id and equal to the first-period y
bys id (t): assert x[1] == x
bys id (t): assert y[1] == x
* This being so, the first time period can go.
* Set up dummy variables
* main effects of the groups
for num 1/3: gen gX = g == X if g ~= .
* main effects of time
* (not strictly needed, as only 2 time points remain in the model)
for num 1/3: gen tX = t == X if t ~= .
* treatment-time interaction
* (assuming time effect linear, with zero effect at time 1)
for num 1/3: gen gX_t = gX*(t-1)
* Set up the -xt- structure
iis id
tis t
* Produce some estimates & graphs
version 7: qnorm y
tab y
* Clearly not Normal, so we may need to use something more subtle, such as
* interval regression or ordered logistic regression.
* However, that is not the immediate problem.
xtgraph y , group(g) xlab( 1 2 3) ylab offset(.05) list
* Pretty the graph up using version 8 graphics
preserve
tempfile myfile
xtgraph y , group(g) list savdat("`myfile'")
/* Insert gr8 commands to taste */
restore
* Now the main analysis
* Constant effects of treatment
regress y x g2 g3 t if t > 1, cluster(id)
* Treatment effect increasing linearly with time
regress y x g2_t g3_t t if t > 1, cluster(id)
* Both models suggest that group 2 typically has lower scores.
* However the first model seems to fit the data better.
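Written out, the two models fitted above are roughly as follows. The notation (beta, gamma, epsilon, with i indexing subjects and t indexing time) is my own shorthand and only a sketch; the variable names are those used in the commands.

```latex
% Constant treatment effect (first -regress-), fitted for t = 2, 3
y_{it} = \beta_0 + \beta_1 x_i + \beta_2\,\mathrm{g2}_i + \beta_3\,\mathrm{g3}_i
         + \beta_4\, t + \varepsilon_{it}

% Treatment effect increasing linearly with time (second -regress-)
y_{it} = \gamma_0 + \gamma_1 x_i + \gamma_2\,\mathrm{g2}_i\,(t-1)
         + \gamma_3\,\mathrm{g3}_i\,(t-1) + \gamma_4\, t + \varepsilon_{it}
```

In the first model the group 2 effect is beta_2 at both remaining time points; in the second it is gamma_2 at t = 2 and 2*gamma_2 at t = 3. With cluster(id), the standard errors allow the errors to be correlated within subject.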
In practice, I would probably stop there and report the
constant effects model. However, xt models can also be fitted,
e.g.
xtgls y x g2 g3 t if t > 1, corr(ar1)
xtgee y x g2 g3 t if t > 1, robust
etc.
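For anyone running a more recent Stata, the whole sequence might be sketched as below. This is an illustration under assumptions, not a definitive version: the data are simulated (60 subjects, 3 groups, 3 time points, all values made up), -forvalues- stands in for -for-, -xtset- for -iis- and -tis-, and vce() for the older cluster() and robust options. It begins with -clear-, so run it in a fresh session rather than on the real data.

```stata
* Simulated long-format data with the variable names used above.
* All values are invented purely for illustration.
clear
set seed 12345
set obs 60
gen id = _n
gen g  = 1 + mod(_n, 3)          // three groups of 20 subjects
expand 3                         // three time points per subject
bysort id: gen t = _n
gen y = 5 - (g == 2) + rnormal() // group 2 scores about 1 lower
bysort id (t): gen x = y[1]      // x is the first-period value of y

* Same check as in the original commands
bysort id (t): assert x == y[1]

* Group dummies and group-by-time interactions
* (linear in time, zero at time 1)
forvalues k = 1/3 {
    gen g`k'   = (g == `k') if !missing(g)
    gen g`k'_t = g`k' * (t - 1)
}

* Declare the panel structure
xtset id t

* Constant treatment effect, then linearly increasing treatment effect
regress y x g2 g3 t     if t > 1, vce(cluster id)
regress y x g2_t g3_t t if t > 1, vce(cluster id)

* GEE with an exchangeable working correlation and robust standard errors
xtgee y x g2 g3 t if t > 1, corr(exchangeable) vce(robust)
```

Because the simulated y is continuous, the sketch says nothing about the non-Normality seen in the real data; it only shows the mechanics of the commands.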
I would be interested in comments on which
approach is most appropriate for a data set such as this.
=========================
Paul T Seed ([email protected])
Division of Reproductive Health, Endocrinology and Development
Guy's, King's and St Thomas' School of Medicine, King's College London,
St Thomas' Hospital,
Lambeth Palace Road,
London SE1 7EH