Steve and Joseph,
Thank you both for your continued devotion to this problem. Here is the code
I used for the comparative models GLM in Stata vs GEE in SAS.
**** GLM in Stata:****
. glm rx tx [aweight = iptw], family(gaussian) link(identity) vce(cluster
personnumber)
(sum of wgt is 4.1810e+05)
Iteration 0: log pseudolikelihood = -459638.49
Generalized linear models No. of obs =
188831
Optimization : ML Residual df =
188829
Scale parameter =
7.617026
Deviance = 1438315.445 (1/df) Deviance =
7.617026
Pearson = 1438315.445 (1/df) Pearson =
7.617026
Variance function: V(u) = 1 [Gaussian]
Link function : g(u) = u [Identity]
AIC =
4.868274
Log pseudolikelihood = -459638.4948 BIC =
-855694
(Std. Err. adjusted for 7868 clusters in
personnumber)
----------------------------------------------------------------------------
--
| Robust
rx | Coef. Std. Err. z P>|z| [95% Conf.
Interval]
-------------+--------------------------------------------------------------
--
tx | 2.476136 .4841721 5.11 0.000 1.527176 3.425096
_cons | 1.145776 .0553768 20.69 0.000 1.03724 1.254313
----------------------------------------------------------------------------
--
**** GEE in SAS:*****
PROC GENMOD DATA=DATA;
CLASS PERSONNU MONTH;
MODEL RX = TX;
WEIGHT IPTW;
REPEATED subject=PERSONNU / type=exch;
run;
GEE (in SAS) IPTW
Standard Wald 95% Confidence Chi-
Parameter DF Estimate Error Limits
Square Pr > ChiSq
Intercept 1 1.1458 0.0094 1.1273 1.1642
14816.1 <.0001
SCCENROL 1 2.4761 0.0128 2.4511 2.5011
37695.3 <.0001
Scale 1 4.1067 0.0067 4.0936 4.1198
Date: Tue, 21 Jul 2009 12:22:35 -0400
From: [email protected]
Subject: Re: st: multiple weights per person in GEE?
- --
Thank you, Joseph. I think that you are right. After I posted, I did some
comparisons of aweights and pweights in -glm-. I saw one extreme case
(weight = response) in which the standard errors differed by a factor of
2-4, but in the other comparisons, the standard errors were within 20% of
each other. I saw nothing like Ariel's discrepancies.
In the absence of a -vce- or -cluster- option, pweights lead to robust
standard errors, whereas aweights lead to OIM (Observed Information
Matrix) standard errors. Perhaps it's time for me to purchase Hardin
and Hilbe's book on Generalized Linear Models.
Here was the extreme case:
sysuse auto,clear
glm rep78 mpg [aweight=rep78], family(poisson) glm rep78 mpg
[pweight=rep78], family(poisson)
- -Steve
On Tue, Jul 21, 2009 at 11:05 AM, Joseph Coveney<[email protected]>
wrote:
> Steve Samuels wrote:
>
> The WEIGHT used by SAS GENMOD is a dispersion parameter weight, which
> should be equivalent to Stata's -aweight-. Weights for IPTW should
> be proportional to probability weights, as in the reference Joseh
> quotes. Ariel did not show us her Stata commands and output, as the
> Statalist FAQ specify, but it's quite likely that she is comparing
> different things.
>
> ----------------------------------------------------------------------
> ----------
>
> You make a good point. Without the specifics from Ariel it will be
> impossible to delve into what might be behind the large differences in
> the regression coefficient standard errors between SAS and Stata.
>
> I wonder whether it has much to do with weights, though. In the
> do-file below, which explores -glm- with varying within-panel weights,
> -pweight- (-iweight-) and -aweight- yield identical coefficients and
> standard errors. The same obtains with -iweight- and -pweight- with
> -xtgee- in the dataset having constant within-panel weights. (There
> is no -aweight- in -xtgee-.)
>
> More important, the M. A. Hernán in the _Stata Journal_ article that I
> cited is the same Miguel Hernán in the SAS literature that Ariel
> cites. I'm guessing that if there were a caveat arising from
> different meanings of "weight" between the two packages in this
> context, then it would have been brought out by the advocates of the
> marginal structural model approach (or the journal's reviewrs) who are
> familiar with both. I suspect that a comparison of the output
> (deviance and pseudolikelihood, coefficients and their standard
> errors, etc.) from following SAS code on the PROC IMPORTed CSV dataset
> file produced by the do-file below would indicate that SAS is handling
> the weights much as Stata does with
> -pweight- (-iweight-).
>
> PROC GENMOD DATA=DUMMY DESCENDING;
> CLASS PATIENT TREATMENT TIME;
> MODEL SCORE=TREATMENT TIME / DIST=BIN;
> REPEATED SUBJECT=PATIENT / CORR=IND;
> SCWGT WEIGHT;
> RUN;
>
> I'd wager that the coefficient standard errors would be essentially
> the same as those from Stata. They might differ in the third
> significant figure, or so, because of the convergence properties of
> the algorithms used by Stata and SAS for fitting these models, default
> tolerance settings, and so on. But I doubt that they would show the
> 0.484 versus 0.013 difference that Ariel reports in SEs.
>
> Like you say, without the specifics, we can only guess at the problem.
> In my experience, GLMs are very sensitive to collinearity among the
> predictors, for example. PROC GENMOD is reported* to be coy about
> convergence failures when it encounters compelete or quasicomplete
> separation with binomial data, for another example. It might pay
> Ariel to take a closer look at the SAS log (and at the iteration log in
Stata) in the discrepant case if he hasn't already done so.
>
> Joseph Coveney
>
> * Paul D. Allison, Convergence Failures in Logistic Regression. Paper
360-2008.
> _SAS Global Forum 2008_
> www2.sas.com/proceedings/forum2008/360-2008.pdf
>
>
> clear *
> set more off
> version 10.1
>
> matrix define Mu = J(1, 5, 0.5)
> matrix define Sigma = J(5, 5, 0.5) + I(5) * 0.5 // findit ovbd //
> findit ridder ovbd , stub(score) means(Mu) corr(Sigma) ///
> seed(`=date("2009-07-21", "YMD")') n(100) clear generate byte
> treatment = 0
>
> tempfile tmpfil0
> save `tmpfil0'
>
> matrix input Mu = (0.4 0.45 0.5 0.55 0.6) ovbd , stub(score) means(Mu)
> corr(Sigma) n(100) clear generate byte treatment = 1 append using
> `tmpfil0'
>
> generate int patient = _n
> reshape long score, i(patient) j(time) generate double weight =
> runiform()
>
> outsheet patient treatment time score weight using dummy.csv, comma
> names
>
> insheet patient treatment time score weight using dummy.csv, comma
> names clear
>
> tabulate time, generate(time)
> tabulate treatment, generate(treatment)
>
> log using comparem.log, text
>
> glm score treatment1 time1-time4 [iweight=weight], cluster(patient)
> ///
> family(binomial) link(logit) nolog
>
> glm score treatment1 time1-time4 [aweight=weight], cluster(patient)
> ///
> family(binomial) link(logit) nolog
>
> glm score treatment1 time1-time4 [pweight=weight], cluster(patient)
> ///
> family(binomial) link(logit) nolog
>
> bysort patient (time): replace weight = weight[1]
>
> xtgee score treatment1 time1-time4 [iweight=weight], i(patient) ///
> family(binomial) link(logit) corr(independent) robust nolog
>
> xtgee score treatment1 time1-time4 [pweight=weight], i(patient) ///
> family(binomial) link(logit) corr(independent) robust nolog
>
> glm score treatment1 time1-time4 [pweight=weight], cluster(patient)
> ///
> family(binomial) link(logit) nolog
>
> log close
>
> exit
>
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
<http://www.stata.com/help.cgi?search>
> * http://www.stata.com/support/statalist/faq
<http://www.stata.com/support/statalist/faq>
> * http://www.ats.ucla.edu/stat/stata/
<http://www.ats.ucla.edu/stat/stata/>
>
- --
Steven Samuels
[email protected]
18 Cantine's Island
Saugerties NY 12477
USA
845-246-0774
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/