|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Confidence interval for a median with weighted, clustered data
From |
Steven Samuels <[email protected]> |
To |
[email protected] |
Subject |
Re: st: Confidence interval for a median with weighted, clustered data |
Date |
Tue, 23 Dec 2008 18:47:01 -0500 |
As is apparent from Roger's argument, his method will estimate a CI
for any quantile, not just the median. All that is necessary is to
add the corresponding centile() option to the -cendif- command. The
median is the default for -cendif-, so the centile() option does not
appear in his code.
-Steve
On Dec 23, 2008, at 12:49 PM, Steven Samuels wrote:
In correspondence, Roger Newson ([email protected]) described
to me a method for computing a CI for a median with weighted,
clustered, data. With his permission, I am posting his description
and the do-file which accompanied it. (I have slightly edited both
to delete some extraneous material.) The do file utilizes Roger's
program -cendif- (part of -somersd- available from SSC) and his
commands -keyby- ,-xcontract-, and -expgen- .
-Steve
From Roger Newson:
I would use the following method to make cendif calculate
confidence intervals for a median (possibly with clusters and/or
weights). This method involves the concept of between-scenario rank
statistics, exemplified by the Gini coefficient and the population
attributable risk. (See Subsection 5.3 of Newson (2006).)
I would use my own expgen package, downloadable from SSC, to
duplicate every observation in the sample, replacing each
observation with a pair of identical observations, and generating a
copy identifier variable named scenario, equal to 1 for the first
member of a pair and equal to 2 for the second member of a pair. I
would then set the value of the variable whose median I wanted to
know to zero in all the second members of all pairs (where
scenario==2). I would then use cendif, with the options
by(scenario) funtype(vonmises)
and with a cluster() option putting both members of each pair in
the same cluster, to calculate a median difference between the
subsamples with scenario==1 and scenario==2. This should definitely
produce a valid confidence interval, with or without weights, for
the median difference between individuals sampled from the world as
we know it (Scenario 1) and a hypothetical fantasy world in which
the variable whose median we are estimating is always zero
(Scenario 2). This median difference is (of course) simply the
median of the variable whose median we wanted to know.
To illustrate my method I have run an example do-file, appended.
I have set options
transf(iden) tdist
which, based on my own simulations so far, are usually a good idea
when estimating median differences. I also set options
by(scenario) funtype(vonmises)
and the appropriate cluster options. To produce an "unclustered"
estimate, I have used the option cluster(make), implying 1 cluster
for each car model. (Therefore, each observation in the old auto
data corresponds to a cluster of 2 observations in the duplicated
dataset.) To produce a "clustered" estimate, I have used the option
cluster(firm), where firm is a derived variable, containing the
first word of make. This cluster variable implies that we are
sampling firms from a population of firms, instead of sampling car
models from a population of car models. For both the clustered and
the unclustered case, I have produced an unweighted estimate, and
also a weighted estimate, using pweights to weight non-American
cars higher, and therefore producing a higher median mpg.
References
Newson R. Confidence intervals for rank statistics: Somers' D and
extensions. The Stata Journal 2006; 6(3): 309-334. Download
pre-publication draft from
http://www.imperial.ac.uk/nhli/r.newson/papers.htm
**************************CODE BEGINS**************************
#delim ;
version 10.1;
*
Demo of confidence interval for median using cendif.
This program uses the SSC packages somersd, keyby, xcontract, and
expgen.
*;
clear;
set memory 1m;
sysuse auto, clear;
keyby foreign make;
gene firm=word(make,1);
lab var firm "Firm";
gene pwt=foreign + 0.25*!foreign;
lab var pwt "Probability weight";
desc;
xcontract foreign pwt, list(,);
xcontract firm, list(,);
preserve;
expgen =2, copy(scenario);
replace mpg=0 if scenario==2;
*
Unclustered analyses
*;
cendif mpg, by(scenario) tdist transf(iden) funtype(vonmises)
cluster(make);
cendif mpg [pwei=pwt], by(scenario) tdist transf(iden) funtype
(vonmises) cluster(make);
*
Analyses clustered by firm
*;
cendif mpg, by(scenario) tdist transf(iden) funtype(vonmises)
cluster(firm);
cendif mpg [pwei=pwt], by(scenario) tdist transf(iden) funtype
(vonmises) cluster(firm);
restore;
exit;
***************************CODE ENDS***************************;
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/