Hi Carlo,
Thanks for your reply. My main problem is the skewness and the small
sample size. In the summary stats I posted, the N is large because it is for
the whole sample, but when I analyse subsamples some of them are very
small, i.e. fewer than 20 observations.
The bootstrap seems like a good idea. Can I do something as simple as
bootstrap r(t), reps(1000) saving(c:\): ttest var, by(catvar) unequal
or is it something more involved, as below?
bootstrap r(mean), reps(1000): sum var if catvar=="cat1"
matrix mu_1=e(b)
matrix sterrsq_1=e(V)
bootstrap r(mean), reps(1000): sum var if catvar=="cat2"
matrix mu_2=e(b)
matrix sterrsq_2=e(V)
scalar Z=((mu_1[1,1]- mu_2[1,1])/sqrt(sterrsq_1[1,1]+ sterrsq_2[1,1]))
scalar p=(1-normal(abs(Z)))*2
di "z-value: " Z
di "p = " p
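A third possibility, given here only as a rough sketch, would be to bootstrap the difference in means directly and read off percentile intervals instead of building a z-value by hand. It assumes var and catvar are the placeholder names used above; grp and diff are hypothetical names introduced just for this illustration, and the seed is arbitrary.

```stata
* Sketch only: var and catvar are the placeholder names used above;
* grp and diff are hypothetical names introduced for this illustration.
encode catvar, generate(grp)    // numeric copy of the grouping variable

* Resample within each group (strata) so each subsample keeps its size,
* and bootstrap the difference in group means reported by ttest.
bootstrap diff=(r(mu_1)-r(mu_2)), reps(1000) strata(grp) seed(12345): ///
    ttest var if inlist(catvar, "cat1", "cat2"), by(grp) unequal

* Percentile and bias-corrected confidence intervals for diff
estat bootstrap, percentile bc
```

If the interval for diff excludes zero, that would point the same way as a significant test, without relying on the normal approximation in the z-value above.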
thanks very much for your help
regards
rich
2008/9/27 Carlo Lazzaro <[email protected]>:
>
> Dear Rich,
> about your concerns over a badly behaved ttest, why not try the following
> steps:
>
> - bootstrap your untransformed data;
> - take a look at the resulting sampling distribution;
> - perform a bootstrap ttest;
> - count how many times t_bootstrap is >= |t_original| or <=
>   -|t_original|;
> - contrast the resulting bootstrap p-value with the original one.
> ---------------------------begin example-----------------------------------
> set obs 100
> g A=10*(uniform())
> g B=15*(uniform())
> swilk A B // Prob>z_A=0.00030; Prob>z_B=0.00032 // Both A and B are not normal
> ttest A == B, unpaired unequal //t = -5.6293 and Pr(|T| > |t|) = 0.0000
> return list
> scalar t=r(t)
> summarize A, mean
> replace A=A-r(mean) + 6.198467
> summarize B, mean
> replace B=B-r(mean) + 6.198467
> sum A B
> bootstrap r(t), reps(10000) ///
>   saving("C:\Documents and Settings\carlo\Documenti\Statistiche\Stata\Richard_boot.dta", every(1) replace) ///
>   verbose : ttest A == B, unpaired unequal
> save "C:\Documents and Settings\carlo\Documenti\Statistiche\Stata\Richard_preboot.dta", replace
> use "C:\Documents and Settings\carlo\Documenti\Statistiche\Stata\Richard_boot.dta", clear
> count if _bs_1>=5.6293 //= 0
> count if _bs_1<=-5.6293 //= 0
> // bootstrap p-value=(0+0)/10000=0, which confirms the p-value calculated
> // on the grounds of the badly behaved ttest.
> ------------------------------end example-----------------------------------
>
>
> About adding an arbitrary constant before log-transforming the data, I
> would refer you to a debate held on this list at the end of last March,
> raised by a question on this topic. To sum up the results of the
> above-mentioned debate, the answer was negative.
>
> However, so-called shifted log transformations (that is, adding a constant
> before taking logs so that zeros in the data can be retained) are
> reported in the literature on cost comparisons of health care programmes
> (for a thorough review and many useful comments on this issue, please see
> Barber JA, Thompson SG. Analysis of cost data in randomized trials: an
> application of the non-parametric bootstrap. Statist. Med. 2000;
> 19:3219-3236). As usual, the main problem is the way back (that is,
> back-transforming from logs to the original metric; that is one reason why
> I prefer the non-parametric bootstrap for analysing skewed cost data).
>
> HTH and Kind Regards.
>
> Enjoy your W-E,
>
> Carlo
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Richard Harvey
> Sent: Saturday, 27 September 2008 10:15
> To: [email protected]
> Subject: st: ttest and log transformation
>
> Hi all,
>
> I hope I can ask a fairly basic stats question. I have a variable that
> I need to compare across two groups.
> The summary stats for the variable NAN across the groups are as below.
> The negative values are legitimate.
>
> group  |     N      mean    p50       max        min  skewness  kurtosis
>
> group1 |  2537    -77535   5278  19051350  -46844688    -11.23     311.1
> group2 |  3031   -211373   4620   4609996  -32617714    -11.18     185.6
> Total  |  5568   -150391   4958  19051350  -46844688    -11.33     278.4
>
> If I do a ttest on the log-transformed data, is it appropriate to add
> an arbitrary constant to make the negative values positive? Is the
> ttest indeed any good for these data, or should I be looking at some
> non-parametric tests?
>
> To make the numbers more manageable I divide by 1,000,000, and the
> summary stats look like this:
>
> group        N      mean       p50    max     min  skewness  kurtosis
>
> group1    2537   -.07753   .005278  19.05  -46.84    -11.23     311.1
> group2    3031    -.2114    .00462   4.61  -32.62    -11.18     185.6
> Total     5568    -.1504   .004958  19.05  -46.84    -11.33     278.4
>
> Is it right to perform ttest on ln((NAN/1000000)+50)? Changing the
> constant I add doesn't seem to make a difference.
>
> Stats on ln((NAN/1000000)+50) are as below:
>
> group        N    mean    p50    max    min  skewness  kurtosis
>
> group1    2537   4.604  4.605   4.78  3.973    -17.21     527.4
> group2    3031   4.603  4.605   4.65   4.21     12.74     242.9
> Total     5568   4.604  4.605   4.78  3.973    -15.94       469
>
> There is still a large negative skewness coefficient. To me this
> does not look like a situation for a ttest, and I should be looking at
> some non-parametric test. Is that right?
>
> The results from the ttest using the unpaired and unequal options,
> on the untransformed data and on ln((NAN/1000000)+50), are as below:
>
> transformation     t      p   95% CI
> None            3.25  .0011   53205.45 - 214470.8
> log(50+var)     2.75  .0060   .000367 - .002185 (I understand this has to be back-transformed)
>
> A ranksum test on the log-transformed NAN shows a z of 3.3999 with a p
> of .0007. On the untransformed NAN it is 3.396 with a p of .0007.
>
> So overall, there doesn't seem to be any change in the conclusions,
> whatever test I use. But is the ttest procedure appropriate?
>
> Your help is much appreciated.
> --
> thanks for your time
> rich
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
--
thanks for your time
rich