
Re: st: svy and pweight postestimation tools

From	Steven Samuels <[email protected]>
To	[email protected]
Subject	Re: st: svy and pweight postestimation tools
Date	Thu, 22 Jan 2009 08:31:48 -0500

---

Carissa, rather than hand-waving about an "obvious" fact, I thought I should write down the argument in detail. The main point is: Probability weights can be fractional, whereas frequency weights are integers. However, for a given set of probability weights, frequency weights can be constructed which are equivalent to the original probability weights, to any chosen degree of accuracy.

The first four facts below are the ones "obvious" from inspection of the equations for estimating population parameters from survey data. See, for example, equations 3.6-2 and 3.6-3 in the Korn and Graubard book and equations 5.20 and 11.10 in Sharon Lohr, 1999, Sampling: Design and Analysis. Duxbury Press.

1. The estimating equations with probability weights are functions of weighted means of individual observations. Each of these means is a ratio of weighted sums.

2. In each of these sums, an observation is multiplied by its probability weight.

3. If the probability weights are integers, the equations are identical to those that would be set up for frequency-weighted observations with the same weights.

4. If probability weights are multiplied by the same constant, estimating equations and, hence, the estimators, do not change. This is because the equations are based on means, and the constant will cancel out in numerator and denominator.

5. If the original probability weight is rounded to the k-th decimal place (e.g. to the nearest tenth for k = 1) and then multiplied by 10^k, the result is an integer, which can be used as a frequency weight. Thus it is possible to use frequency weights which are equivalent to the original probability weights to any degree of accuracy desired. (Thanks to Austin Nichols for pointing this out in a previous Statalist post.)

6. If k is sufficiently large, estimates based on these rounded, multiplied probability weights will be equal to the estimates based on the original weights, again to any desired degree of accuracy.
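
Points 4 to 6 can be checked directly. What follows is only a minimal sketch, not part of the original argument: the weight variable pw is invented for illustration (built from the auto data so that it is fractional), and k = 2 is an arbitrary choice. The point estimates from the two pairs of commands should be close, and closer as k is increased; the standard errors are not expected to agree.

```stata
* Sketch: approximate fractional probability weights by integer frequency weights.
* pw is a made-up (hypothetical) probability weight, used only for illustration.
sysuse auto, clear
generate double pw = rep78 + weight/10000   // fractional; missing where rep78 is missing

local k = 2                                 // number of decimal places to keep
generate long fw = round(pw * 10^`k')       // same as rounding pw to k decimals, then scaling

* Weighted means: compare the point estimates only.
mean mpg [pweight = pw]
mean mpg [fweight = fw]

* Logistic regression coefficients: again compare the point estimates only.
logit foreign mpg [pweight = pw]
logit foreign mpg [fweight = fw]
```

Rerunning with a larger k (say 4) should bring the frequency-weighted estimates still closer to the probability-weighted ones.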


Warning:

The computer algorithms based on frequency weights assume that the sample size is equal to the sum of the weights. If you use the converted probability weights, this sum will be 10^k times the population size. Therefore standard errors and confidence intervals based on these weights will be invalid. Also, too high a value of k might cause underflow problems if a program computes standard errors.
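
A short worked calculation may make the warning concrete. It is a sketch under simplifying assumptions: the notation (w_i for the probability weights, f_i for the constructed frequency weights, n for the number of sampled observations, \hat{N} for the estimated population size, s for a standard deviation) is introduced here, the standard error is taken to have the simple form s/sqrt(sample size), and the numbers in the last line are hypothetical.

```latex
% Constructed frequency weights and their sum
f_i = \operatorname{round}\!\left(10^{k} w_i\right), \qquad
\sum_{i=1}^{n} f_i \;\approx\; 10^{k} \sum_{i=1}^{n} w_i \;=\; 10^{k}\,\hat{N}.

% The point estimate is unaffected: the constant cancels
\frac{\sum_i f_i\, y_i}{\sum_i f_i}
\;\approx\; \frac{10^{k} \sum_i w_i\, y_i}{10^{k} \sum_i w_i}
\;=\; \frac{\sum_i w_i\, y_i}{\sum_i w_i}.

% The nominal standard error is not: the "sample size" is inflated
\frac{s}{\sqrt{\sum_i f_i}} \;\approx\; \frac{s}{\sqrt{10^{k}\,\hat{N}}}
\quad \text{instead of a quantity of order} \quad \frac{s}{\sqrt{n}}.

% Hypothetical numbers: n = 1000, \hat{N} = 10^{6}, k = 2
\sqrt{\frac{10^{k}\,\hat{N}}{n}} \;=\; \sqrt{\frac{10^{8}}{10^{3}}} \;=\; \sqrt{10^{5}} \;\approx\; 316.
```

In this hypothetical case the frequency-weighted standard error would be too small by a factor of roughly 316, before any design effects are considered.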


-Steve

On Jan 18, 2009, at 1:49 PM, Steven Samuels wrote:


Carissa,

I think that the legitimacy is "obvious" from inspection of the formulas for weighted data. Still, here's a demonstration that -lroc- with frequency weights produces the same area under the ROC curve as a properly probability weighted estimate. I computed probability-weighted versions of the ROC and AUC with Roger Newson's programs -somersd- and -senspec-, available at SSC. -somersd- computes the AUC (he calls it the "c" statistic); and -senspec- produces sensitivities and specificities for all cutpoints. Both take pweights and -somersd- will take a cluster variable, so that you can compute a proper CI for the area under the curve. I had to add a zero-zero point to Roger's results before plotting. If you want to completely satisfy your committee, just use the probability-weighted versions. Be sure to zap gremlins before trying this code.


-Steve

**************************CODE BEGINS**************************
sysuse auto,clear
****************************************************
* Frequency weighted analysis
****************************************************
logistic foreign mpg [fw=rep78]
predict phat0
lroc [fw=rep78]

****************************************************
* Probability weights
****************************************************
svyset _n [pweight=rep78]
quietly svy: logistic foreign mpg
predict phat


somersd foreign phat [pweight=rep78], tr(c)
matrix b = e(b)
local auc = b[1,1]
di   "Area under the Curve: " %6.5f `auc'


****************************************************
*  Graph ROC Curve with probability weights
****************************************************

senspec foreign phat [pweight=rep78], sensitivity(sens) specificity(spec)


** Add zero-zero to graph
tempfile t1
save `t1'
clear
input spec sens
1 0
end
append using `t1'
gen ispec=1-spec

twoway (scatter sens ispec , sort(sens ispec) connect(L) mlab(mpg)) (line sens sens)

***************************CODE ENDS***************************

On Jan 17, 2009, at 5:23 PM, Carissa Moffat Miller wrote:


Steve,

I was able to create the ROC curves using your advice about converting the pweights to fweights. However, now a dissertation committee member has asked me to justify (provide documentation) of the legitimacy of doing such a conversion. Is the conversion just to put the pweight in a format that will be accepted by the ROC command and artificially calling it an "fweight"?

I was not able to find this specific issue addressed in the below reference and I have not been able to find another reference. Do you have any suggested citations?


Carissa

From: [email protected]
Subject: Re: st: svy and pweight postestimation tools
Date: Sun, 23 Nov 2008 12:13:01 -0500
To: [email protected]

Carissa, consider ROC curves (the classification tables are not very
useful in my experience). ROC curves show the trade-off between
sensitivity and specificity. You would usually want population
estimates of these probabilities, so ignoring the weights wouldn't be
wise.

My previous post describes how you can compute residuals. These are
inherently unweighted, because observations with the same covariate
pattern will have the same predicted value, and so have only two
values of residuals (for events and non-events). If you are
comparing mean residuals, you might choose to weight them. See Korn
& Graubard, Analysis of Health Surveys, Wiley, 1999, pp 105-115.

-Steve

On Nov 23, 2008, at 10:40 AM, Carissa Moffat Miller wrote:



Steve and Joao,



Thank you for your suggestions and the information. I had found the
goodness of fit measure do file from your discussions (svylogitgof)
and thought there might be something similar for the estat clas or
residuals for svy.

All I was trying to say in my note is that the strata and PSUs account
for so little difference in the outcome that if it were possible to
run residuals or classification tables using just pweights, I wanted
to keep that option open. Such as:

xi: logistic aepart i.agecat i.Incomequ i.HIGHEDUC female [pweight=FAWT]



But it appears that I will have the same issues. Thank you
so much for your responses and help.



Carissa





2008/11/22 Steven Samuels :

--

Carissa:

-help logistic postestimation- will show you which commands are
available after -svy: logistic-. The -estat clas- command is not one
of them in Stata 9 or 10. -predict- with a -residuals- option is valid
in Stata 10.1 but not in Stata 9. You _can_ compute your own weighted
survey -linktest- of fit.

predict hat, xb
gen hat2 = hat*hat
svy: logistic aepart hat hat2 // link test is the significance of hat2

You can also construct ROC curves. Use -logistic- with fweights, the
survey weights rounded to the nearest integer. See the thread at:
http://www.stata.com/statalist/archive/2007-08/msg00739.html#_jmp0_ .

-Steve


On Nov 21, 2008, at 11:45 AM, Carissa Moffat Miller wrote:


StataList:

I am conducting logistic regression for a complex survey design using
Stata version 9. I have found in your past discussions and the user
manuals that many postestimation tests are not appropriate with svy
commands. I have not found discussion on classification tables and
residuals and have been unable to get the following commands to work
either with an svy command or by just using the pweights in Stata.

I have been able to get these to work in another software program
using the weights, but I'm concerned it isn't appropriately applied.
Can someone tell me: 1) if these tests are appropriate with complex
survey data or just pweights, and 2) if so, what are the commands or
where would I find them? or 3) if not appropriate, a reference I might
cite?

(Note: The strata and PSUs, when analyzed separately, provide design
effects almost equal to 1, so the effects in my model are almost
entirely from the weighting. So, I could get results - except for
standard errors - using just the weights.)

Cheers, Carissa


Syntax and error messages:

svyset APSU [pweight=FAWT], strata(ASTRATUM)
xi: svy: logistic aepart i.agecat i.Incomequ i.HIGHEDUC employed female urban

estat clas

{ERROR}: invalid subcommand clas

predict r, residuals
summarize r, detail

{ERROR}: option residuals not allowed



*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/






--
----------------------------------------
Joao Ricardo Lima, D.Sc.
Professor
UFPB-CCA-DCFS
Fone: +553138923914
Skype: joao_ricardo_lima
----------------------------------------


Steven Samuels
845-246-0774
18 Cantine's Island
Saugerties, NY 12477
EFax: 208-498-7441





