Re: st: Re: vanelteren

From	Joseph Coveney <[email protected]>
To	Statalist <[email protected]>
Subject	Re: st: Re: vanelteren
Date	Mon, 01 Aug 2005 01:19:35 +0900
Ricardo Ovaldia wrote:

--- Joseph Coveney wrote:

> That the medians of the pooled data are identical wouldn't bother me so
> much as the difference between the asymptotic and permutation p-values
> with 256 dams.
>
> Take a look at what -xtreg percent mm, i(dam) fe- gives you (and also
> take a look at, say, -pnorm- on the residuals, for starters).  I'm
> guessing that -xtreg, fe- (which would have been my first choice with
> this design) gives you the same take-home message as what -vanelteren-
> does.

Yes, I first used -xtreg- and obtained a p-value of
0.018. Although the residuals did not look too bad
using -pnorm-, they looked bad on the -qnorm- plot. I
was concerned that I was not meeting the normality
assumption, so I opted to use a non-parametric
test. I also tried to find a transformation, but the
ones I selected did not perform any better.

--------------------------------------------------------------------------------

Good enough.  I'm glad that things worked out with -vanelteren-, then.

I wouldn't necessarily write off -xtreg, fe- completely, though.  Some work by 
Lisa Sullivan and Ralph D'Agostino Sr.* indicates that the power of a t-test on 
differences of paired ordered-categorical data is still pretty good, even with 
small samples.  The normality assumption isn't very well met in their case.  

The do-file below suggests that the findings of Sullivan and D'Agostino can be 
extended beyond the paired t-test.  -vanelteren- and -xtreg, fe- are compared 
in a simulation of an arrangement with 40 variably sized clusters of two to 
eleven observations each, whose observations are divided into two comparison 
groups.  The do-file creates a skewed distribution of ordered categorical data 
with five categories.
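
For reference, the cutpoints used in the do-file imply category probabilities 
of 14/30, 8/30, 4/30, 2/30 and 2/30 for categories 1 through 5 when the latent 
variable is standard normal (that is, for the observations with a latent mean 
of zero).  The fragment below is only a sketch that reproduces this arithmetic; 
the local macro names (lowest, previous, category, cutpoint) are illustrative 
and are not part of the do-file.

```stata
* Category probabilities implied by the cutpoints, assuming a standard
* normal latent variable (so that norm(latent_variable) is uniform)
local lowest = 1 / (2 + 4 + 8 + 16)
local previous = 0
local category = 1
foreach multiple in 16 8 4 2 {
    * Cutpoint on the cumulative-probability scale
    local cutpoint = 1 - `multiple' * `lowest'
    display "P(category `category') = " %5.3f (`cutpoint' - `previous')
    local previous = `cutpoint'
    local ++category
}
* The top category takes whatever probability remains
display "P(category `category') = " %5.3f (1 - `previous')
```

Nearly half of the probability mass sits in the lowest category, which is the 
skew referred to above.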

The performance of -xtreg, fe- isn't too shabby for hypothesis testing with two 
levels of the grouping variable.  Under the null hypothesis, -vanelteren- 
rejected in 55 of 1000 replicates and -xtreg, fe- in 45 of 1000, at a nominal 
5% Type I error rate.  Under the alternative, the rejection counts were 222 of 
1000 (-vanelteren-) versus 216 of 1000 (-xtreg, fe-).  I would expect the 
findings to hold up generally with ten times as many replicates.  
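
As a rough check on how much weight 1000 replicates can bear, the fragment 
below computes the Monte Carlo standard error of a rejection proportion, 
assuming that the true rejection rate equals the nominal 5% level and using a 
normal approximation.  The scalar names (mcse, lo_count, hi_count) are 
illustrative only.

```stata
* Monte Carlo standard error of a rejection proportion when the true
* rejection rate equals the nominal level (normal approximation)
local reps = 1000
local alpha = 0.05
scalar mcse = sqrt(`alpha' * (1 - `alpha') / `reps')
* Approximate 95% range for the number of rejections out of `reps'
scalar lo_count = `reps' * (`alpha' - 1.96 * scalar(mcse))
scalar hi_count = `reps' * (`alpha' + 1.96 * scalar(mcse))
display "Monte Carlo SE of the rejection proportion: " %6.4f scalar(mcse)
display "Approximate 95% range of rejections: " ///
    %4.1f scalar(lo_count) " to " %4.1f scalar(hi_count)
```

Under these assumptions the standard error is about 0.007, so anything from 
roughly 36 to 64 rejections in 1000 replicates is compatible with a true 5% 
rate; both 55 and 45 fall inside that range.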

It might be worthwhile to see how well -xtreg, fe- holds up with smaller 
variable cluster sizes (we know how it does with T = 2 from Sullivan and 
D'Agostino), fewer clusters (smaller samples) and fewer ordered categories 
(down to four or even three).  This is not to suggest -xtreg, fe- for 
*estimation* here.
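
One possible starting point for the smaller-cluster case is to make the upper 
bound on cluster size a parameter.  The fragment below is a sketch under that 
assumption; maxsize is a hypothetical name that does not appear in the do-file, 
and the -generate- line would stand in for the corresponding line inside 
-simem-.

```stata
* Sketch: draw cluster sizes uniformly from 2 through `maxsize'
* (maxsize is a hypothetical parameter, not part of the original do-file)
clear
set obs 40
local maxsize = 4
generate byte number_of_replicates = 2 + floor(uniform() * (`maxsize' - 1))
* Check the distribution of the simulated cluster sizes
tabulate number_of_replicates
```

With maxsize set to 11, the expression reduces to the one used in the do-file.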

Joseph Coveney

*L. M. Sullivan & R. B. D'Agostino Sr., Robustness and power of analysis of 
covariance applied to ordinal scaled data as arising in randomized controlled 
trials. _Statistics in Medicine_ 22(8):1317-34, 2003.


clear
set more off
set seed `=date("2005-08-02", "ymd")'
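* Build a 12 x 12 correlation matrix A with 1 on the diagonal and 0.5
* elsewhere, and collect the names of the 12 latent variables
set obs 12
forvalues i = 1/12 {
    generate float a`i' = 0.5 + 0.5 * (_n == `i')
    local varlist `varlist' latent_variable`i'
}
mkmat a*, matrix(A)
* Mean vectors: all zero under the null; under the alternative the
* even-numbered latent variables have a mean of 1/8
local one_eighth = 1/8
forvalues i = 1/6 {
    local null_means `null_means' 0 0 
    local alternative_means `alternative_means' 0 `one_eighth'   
}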
*
capture program drop simem
program define simem, rclass
    syntax namelist, MEANS(numlist)
    * Draw 40 strata, each with 12 correlated latent normal variables
    drawnorm `namelist', means(`means') corr(A) n(40) clear
    generate byte stratum = _n
    * Stratum size is drawn uniformly from 2 through 11
    generate byte number_of_replicates = 2 + floor(uniform() * 10)
    reshape long latent_variable, i(stratum) j(observation)
    drop if observation > number_of_replicates
    * Cut the latent variable into five ordered categories where
    * norm(latent_variable) crosses 14/30, 22/30, 26/30 and 28/30
    generate byte manifest_variable = 1
    scalar lowest_cutpoint = 1 / (2 + 4 + 8 + 16)
    foreach multiple in 2 4 8 16 {
        quietly replace manifest_variable = manifest_variable + ///
          (norm(latent_variable) > (1 - `multiple' * ///
          scalar(lowest_cutpoint)))
    }
    * Group membership alternates within stratum
    generate byte grouping_variable = mod(observation, 2)
    vanelteren manifest_variable, by(grouping_variable) ///
      strata(stratum)
    return scalar vanelteren = r(p)
    xtreg manifest_variable grouping_variable, i(stratum) fe
    * With a single regressor, the model F test is the test of the
    * group effect
    return scalar xtregfe = Ftail(e(df_b), e(df_r), e(F))
end
*
* Type I error rate: proportion of 1000 replicates with p < 0.05 under
* the null
simulate vanelteren = r(vanelteren) xtregfe = r(xtregfe), ///
  reps(1000) nodots: simem `varlist', means(`null_means')
generate byte positive_vanelteren = vanelteren < 0.05
generate byte positive_xtregfe = xtregfe < 0.05
summarize positive_*
* Power: the same proportion under the alternative
simulate vanelteren = r(vanelteren) xtregfe = r(xtregfe), ///
  reps(1000) nodots: simem `varlist', means(`alternative_means')
generate byte positive_vanelteren = vanelteren < 0.05
generate byte positive_xtregfe = xtregfe < 0.05
summarize positive_*
* One more draw under the alternative, to inspect the residuals from
* -xtreg, fe-
simem `varlist', means(`alternative_means')
predict residuals, e
pause on
version 7: kdensity residuals, norm
pause
version 7: pnorm residuals
pause
version 7: qnorm residuals
exit
