Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: somersd resampling question
From
Roger Newson <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: somersd resampling question
Date
Mon, 1 Nov 2010 11:33:03 +0000
Hi Al. As I understand it (correct me if I'm wrong), you have 2
multinomial lists of frequencies of an ordinal multinomial variable for
2 groups of independent observations, and aim to measure ordinal
correlation between membership of Group A (instead of Group B) and the
ordinal variable. I will call the Group A membership indicator -groupa-,
the ordinal variable -y-, and the cell frequency variable -cfreq-, and
assume that you start with a dataset with 1 observation per table cell,
sorted (and keyed uniquely) by -groupa- and -y-.
Normally, I would estimate Somers' D of -y- with respect to -groupa- by
typing
somersd groupa y [fwei=cfreq], tdist transf(z)
which calculates a standard delta-jackknife asymmetric confidence
interval, using the t-distribution and the Fisher z-transform. However,
if you want to use the bootstrap or some other resampling method, then
the -expgen- package, downloadable from SSC, can expand your dataset to
have 1 observation per unit (whatever kind of unit -groupa- and -y- were
measured on). As in:
expgen =cfreq, sortedby(group) copyseq(unit)
where -unit- is the sequence number of the unit within its cell. After
-expgen- has run, the dataset in memory will have 1 observation per
unit, and will be sorted (and keyed uniquely) by -groupa-, -y- and
-unit-. You can then use the bootstrap, or any other resampling method.
As in:
bootstrap, reps(1000): somersd groupa y
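Put together, the steps above might look like the sketch below. The cell counts are made up purely for illustration, and official -expand- stands in for -expgen- so that only -somersd- needs to be installed from SSC. The strata(groupa) option and the seed are assumptions added here, so that each bootstrap replicate keeps the two group sizes fixed and the run is reproducible.

```stata
* Toy cell-frequency data (made-up numbers): 1 observation per table cell
clear
input byte groupa byte y int cfreq
0 1 12
0 2 20
0 3  8
1 1  5
1 2 18
1 3 17
end

* Delta-jackknife interval straight from the cell frequencies
somersd groupa y [fweight=cfreq], tdist transf(z)

* Expand to 1 observation per unit with official -expand-
* (used here in place of -expgen-; empty cells are dropped first,
*  because -expand- would otherwise keep one copy of them)
drop if cfreq == 0
expand cfreq
bysort groupa y: generate long unit = _n

* Resample units within each group (an assumption, see above)
set seed 12345
bootstrap, reps(1000) strata(groupa): somersd groupa y
```

After the -expand- step, -count- should return the sum of the original cell frequencies (80 for the toy data), which is a quick check that the expansion did what was intended.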
I hope this helps.
Best wishes
Roger
Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: [email protected]
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page:
http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/
Opinions expressed are those of the author, not of the institution.
On 29/10/2010 20:41, Feiveson, Alan H. (JSC-SK311) wrote:
Hi Roger, Thanks for the idea of setting up artificial clusters, but I don't see how this can be done with two multinomial lists. Anyway, for anyone who might be interested, I've done a small simulation with 23 categories (because that's what I have) and various combinations of sample sizes in each list. It turns out that the ratio of the empirical SE to the somersd-calculated SE depends almost completely on the minimum of the two sample sizes and is closer to 1 when the minimum sample size is small.
Each row in the data below corresponds to 1000 simulated multinomial data sets with randomly generated independent cell probabilities - fixed over all 1000 data sets within a row, but varying from row to row.
Try plotting rat (= se_emp/se_calc) against nmin [= min(n1,n2)].
By the way, the purpose of all this is to come up with a quantifiable measure of how similar the distributions are with respect to their general patterns as opposed to actual values, such as might be reflected by a chi-squared statistic.
Al Feiveson
n1    n2  se_calc   se_emp    nmin  rat       set
60    30  .1439008  .1321683    30  .9184684  1
120   30  .1519339  .1160367    30  .7637313  1
120   60  .1367752  .1034096    60  .7560548  1
240   30  .1501265  .120686     30  .8038954  1
240   60  .1672979  .0987834    60  .5904641  1
240  120  .1612942  .1094221   120  .6784011  1
480   30  .1448482  .121629     30  .8396998  1
480   60  .1544797  .1151996    60  .7457264  1
480  120  .157679   .1038079   120  .6583494  1
480  240  .1655068  .0882562   240  .5332483  1
960   30  .1471903  .1238696    30  .8415608  1
960   60  .1492855  .1071405    60  .7176883  1
960  120  .1490777  .1053668   120  .7067916  1
960  240  .144429   .0809639   240  .5605789  1
960  480  .1958908  .0645837   480  .3296922  1
60    30  .1457042  .1229061    30  .8435318  2
120   30  .1521924  .1159594    30  .7619262  2
120   60  .1486831  .1267989    60  .8528129  2
240   30  .1444352  .1168832    30  .8092432  2
240   60  .1460266  .1109937    60  .7600925  2
240  120  .1626369  .0910218   120  .5596629  2
480   30  .1431084  .127222     30  .8889909  2
480   60  .1533591  .10581      60  .6899495  2
480  120  .1673665  .0932405   120  .5571038  2
480  240  .1370986  .0833428   240  .6079037  2
960   30  .1434537  .1124708    30  .7840216  2
960   60  .1532602  .1213565    60  .7918329  2
960  120  .1626063  .0967448   120  .5949637  2
960  240  .1578968  .0861469   240  .5455902  2
960  480  .1544878  .0632528   480  .4094355  2
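For anyone who wants to try the suggested plot, a minimal sketch follows. It assumes the table above has been read into Stata with its column headers as variable names. The log-scaled x-axis, the axis titles and the -tabstat- summary are additions here rather than part of the original simulation.

```stata
* Assumes the table above is in memory, with variables
* n1 n2 se_calc se_emp nmin rat set

* Ratio of empirical to calculated SE against the smaller sample size;
* log x-axis because nmin doubles from one level to the next
twoway scatter rat nmin, xscale(log) xlabel(30 60 120 240 480) ///
    ytitle("se_emp / se_calc") xtitle("min(n1, n2)")

* Mean and range of the ratio at each value of nmin
tabstat rat, by(nmin) statistics(mean min max n)
```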
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Roger Newson
Sent: Friday, October 29, 2010 12:01 PM
To: [email protected]
Subject: Re: st: somersd resampling question
Resampling is valid with -somersd-, as long as the units resampled are
clusters rather than non-independent observations within clusters. In
your case, if you start with frequency counts and want to use a
resampling method, then you will presumably have to expand the dataset
(using -expgen-, -reshape- or some similar command) to get the units to
be resampled.
I hope this helps.
Best wishes
Roger
On 29/10/2010 17:04, Feiveson, Alan H. (JSC-SK311) wrote:
Hi - I want to use Kendall's Tau-a to characterize similarity between two multinomial samples. My question is whether the resampling in -somersd- to get standard errors is valid when comparing two multinomial samples, since technically the "observations" (i.e. frequency counts) are not mutually independent. Anyone have an opinion on this?
Thanks
Al Feiveson
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/