Re: st: svy and pweight postestimation tools
---
Carissa, rather than hand-waving about an "obvious" fact, I thought I
should write down the argument in detail. The main point is:
Probability weights can be fractional, whereas frequency weights must
be integers. However, for any given set of probability weights, one
can construct frequency weights that are equivalent to the original
probability weights, to any chosen degree of accuracy.
The first four facts below are the ones that are "obvious" from inspection of
the equations for estimating population parameters from survey data.
See, for example, equations 3.6-2 and 3.6-3 in the Korn and Graubard
book and equations 5.20 and 11.10 in Sharon Lohr, 1999, Sampling:
Design and Analysis. Duxbury Press.
1. The estimating equations with probability weights are functions
of weighted means of individual observations. Each of these means is
a ratio of weighted sums.
2. In each of these sums, an observation is multiplied by its
probability weight.
3. If the probability weights are integers, the equations are
identical to those that would be set up for frequency-weighted
observations with the same weights.
4. If probability weights are multiplied by the same constant,
estimating equations and, hence, the estimators, do not change. This
is because the equations are based on means, and the constant will
cancel out in numerator and denominator.
5. If the original probability weight is rounded to k decimal places
(e.g. k = 1, the nearest tenth) and then multiplied by 10^k, the
result is an integer, which can be used as a frequency weight. Thus
it is possible to use frequency weights which are equivalent to the
original probability weights to any degree of accuracy desired.
(Thanks to Austin Nichols for pointing this out in a previous
Statalist post.)
6. If k is sufficiently large, estimates based on these rounded,
multiplied probability weights will be equal to the estimates based
on the original weights, again to any desired degree of accuracy.
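A minimal sketch of facts 4 to 6, assuming Stata's bundled auto data and a made-up fractional weight. The variable names pw, pw10, and fw and the choice k = 2 are illustrative and do not come from the original posts. It compares the weighted mean of mpg under the original weights, under the same weights multiplied by 10, and under the rounded, scaled frequency weights:

```stata
sysuse auto, clear
* Hypothetical fractional probability weight, made up for illustration
generate double pw = weight/1000 + 0.003*_n

* Fact 4: multiplying every weight by a constant leaves the estimate unchanged
generate double pw10 = 10*pw
quietly mean mpg [pweight=pw]
matrix b = e(b)
scalar m_pw = b[1,1]
quietly mean mpg [pweight=pw10]
matrix b = e(b)
scalar m_pw10 = b[1,1]

* Facts 5-6: round to k decimal places, multiply by 10^k, use as an fweight
local k = 2
generate long fw = round(pw*10^`k')
quietly mean mpg [fweight=fw]
matrix b = e(b)
scalar m_fw = b[1,1]

display "pweight mean:         " %9.6f m_pw
display "10*pweight mean:      " %9.6f m_pw10
display "fweight mean (k=`k'): " %9.6f m_fw
```

Raising k should bring the last figure closer to the first two. Only the point estimates are compared here.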
Warning:
The computer algorithms based on frequency weights assume that the
sample size is equal to the sum of the weights. If you use the
converted probability weights, this sum will be 10^k times the
population size. Therefore standard errors and confidence intervals
based on these weights will be invalid. Also, too high a value of k
might cause underflow problems if a program computes standard errors.
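A small sketch of why the standard errors go wrong, again assuming the auto data and a made-up weight (pw and fw are illustrative names, and k = 2 is an arbitrary choice). The two runs should agree on the mean but disagree sharply on the reported number of observations and the standard error:

```stata
sysuse auto, clear
* Hypothetical fractional probability weight, made up for illustration
generate double pw = weight/1000
* Convert with k = 2: round to 2 decimal places, multiply by 10^2
generate long fw = round(pw*100)

* Probability weights: the standard error is based on the 74 sampled cars
mean mpg [pweight=pw]

* Frequency weights: Stata treats sum(fw) as the sample size, so the
* reported standard error is far too small
mean mpg [fweight=fw]
```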
-Steve
On Jan 18, 2009, at 1:49 PM, Steven Samuels wrote:
Carissa,
I think that the legitimacy is "obvious" from inspection of the
formulas for weighted data. Still, here's a demonstration that
-lroc- with frequency weights produces the same area under the ROC
curve as a properly probability weighted estimate. I computed
probability-weighted versions of the ROC and AUC with Roger
Newson's programs -somersd- and -senspec-, available at SSC.
-somersd- computes the AUC (he calls it the "c" statistic); and
-senspec- produces sensitivities and specificities for all cut
points. Both take pweights and -somersd- will take a cluster
variable, so that you can compute a proper CI for the area under
the curve. I had to add a zero-zero point to Roger's results before
plotting. If you want to completely satisfy your committee, just
use the probability-weighted versions. Be sure to zap gremlins
before trying this code.
-Steve
**************************CODE BEGINS**************************
sysuse auto,clear
****************************************************
* Frequency weighted analysis
****************************************************
logistic foreign mpg [fw=rep78]
predict phat0
lroc [fw=rep78]
****************************************************
* Probability weights
****************************************************
svyset _n [pweight=rep78]
quietly svy: logistic foreign mpg
predict phat
somersd foreign phat [pweight=rep78], tr(c)
matrix b = e(b)
local auc = b[1,1]
di "Area under the Curve: " %6.5f `auc'
****************************************************
* Graph ROC Curve with probability weights
****************************************************
senspec foreign phat [pweight=rep78], sensitivity(sens) specificity(spec)
** Add zero-zero to graph
tempfile t1
save `t1'
clear
input spec sens
1 0
end
append using `t1'
gen ispec=1-spec
twoway (scatter sens ispec , sort(sens ispec) connect(L) mlab(mpg)) (line sens sens)
***************************CODE ENDS***************************
On Jan 17, 2009, at 5:23 PM, Carissa Moffat Miller wrote:
Steve,
I was able to create the ROC curves using your advice about
converting the pweights to fweights. However, now a dissertation
committee member has asked me to justify (provide documentation)
of the legitimacy of doing such a conversion. Is the conversion
just to put the pweight in a format that will be accepted by the
ROC command and artificially calling it an "fweight"?
I was not able to find this specific issue addressed in the below
reference and I have not been able to find another reference. Do
you have any suggested citations?
Carissa
From: [email protected]
Subject: Re: st: svy and pweight postestimation tools
Date: Sun, 23 Nov 2008 12:13:01 -0500
To: [email protected]
Carissa, consider ROC curves (the classification tables are not very
useful in my experience). ROC curves show the trade-off between
sensitivity and specificity. You would usually want population
estimates of these probabilities, so ignoring the weights wouldn't be
wise.
My previous post describes how you can compute residuals. These are
inherently unweighted, because observations with the same covariate
pattern will have the same predicted value, and so have only two
values of residuals (for events and non-events). If you are
comparing mean residuals, you might choose to weight them. See Korn
& Graubard, Analysis of Health Surveys, Wiley, 1999, pp 105-115.
-Steve
On Nov 23, 2008, at 10:40 AM, Carissa Moffat Miller wrote:
Steve and Joao,
Thank you for your suggestions and the information. I had found the
goodness of fit measure do file from your discussions (svylogitgof)
and thought there might be something similar for the estat clas or
residuals for svy.
All I was trying to say in my note is that the strata and PSUs
account for so little difference in the outcome that if it were
possible to run residuals or classification tables using just
pweights, I wanted to keep that option open. Such as:
xi: logistic aepart i.agecat i.Incomequ i.HIGHEDUC female [pweight=FAWT]
But it appears that I will have the same issues. Thank you so much
for your responses and help.
Carissa
2008/11/22 Steven Samuels :
--
Carissa:
-help logistic postestimation- will show you which commands are
available after -svy: logistic-. The -estat clas- command is not one
of them in Stata 9 or 10. -predict- with a -residuals- option is valid
in Stata 10.1 but not in Stata 9. You _can_ compute your own weighted
survey -linktest- of fit.
predict hat, xb
gen hat2 = hat*hat
svy: logistic aepart hat hat2 // link test is the significance of hat2
You can also construct ROC Curves. Use -logistic- with fweights, the
survey weights rounded to the nearest integer. See the thread at:
http://www.stata.com/statalist/archive/2007-08/msg00739.html#_jmp0_ .
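A rough sketch of the rounding approach, assuming the auto data stands in for a survey file. The weight svywt is made up for illustration, and fw is a hypothetical name for its rounded version:

```stata
sysuse auto, clear
* Hypothetical fractional survey weight, made up for illustration
generate double svywt = weight/500
* Round the survey weight to the nearest integer so it can serve as an fweight
generate long fw = round(svywt)
logistic foreign mpg [fweight=fw]
* ROC curve and area under the curve, using the same frequency weights
lroc [fweight=fw]
```

Only the curve and the area under it should be read from this output. The standard errors from the frequency-weighted fit do not reflect the survey design.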
-Steve
On Nov 21, 2008, at 11:45 AM, Carissa Moffat Miller wrote:
StataList:
I am conducting logistic regression for a complex survey design using
Stata version 9. I have found in your past discussions and the user
manuals that many postestimation tests are not appropriate with svy
commands. I have not found discussion on classification tables and
residuals and have been unable to get the following commands to work
either with an svy command or by just using the pweights in Stata.
I have been able to get these to work in another software program
using the weights, but I'm concerned it isn't appropriately applied.
Can someone tell me: 1) if these tests are appropriate with complex
survey data or just pweights, and 2) if so, what are the commands or
where would I find them? or 3) if not appropriate, a reference I might
cite?
(Note: The strata and PSUs, when analyzed separately, provide design
effects almost equal to 1 so the effects in my model are almost
entirely from the weighting. So, I could get results - except for
standard errors - using just the weights.)
Cheers, Carissa
Syntax and error messages:
svyset APSU [pweight=FAWT], strata (ASTRATUM)
xi: svy: logistic aepart i.agecat i.Incomequ i.HIGHEDUC employed female urban
estat clas
{ERROR}: invalid subcommand clas
predict r, residuals
summarize r, detail
{ERROR}: option residuals not allowed
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
--
----------------------------------------
Joao Ricardo Lima, D.Sc.
Professor
UFPB-CCA-DCFS
Fone: +553138923914
Skype: joao_ricardo_lima
----------------------------------------
Steven Samuels
845-246-0774
18 Cantine's Island
Saugerties, NY 12477
EFax: 208-498-7441