--
Thank you, Joseph. I think that you are right. After I posted, I did
some comparisons of aweights and pweights in -glm-. I saw one extreme
case (weight = response) in which the standard errors differed by a
factor of 2-4, but in the other comparisons, the standard errors were
within 20% of each other. I saw nothing like Ariel's discrepancies.
In the absence of a -vce- or -cluster- option, pweights lead to robust
standard errors, whereas aweights lead to OIM (Observed Information
Matrix) standard errors. Perhaps it's time for me to purchase Hardin
and Hilbe's book on Generalized Linear Models.
Here was the extreme case:
sysuse auto,clear
glm rep78 mpg [aweight=rep78], family(poisson)
glm rep78 mpg [pweight=rep78], family(poisson)
-Steve
On Tue, Jul 21, 2009 at 11:05 AM, Joseph Coveney<[email protected]> wrote:
> Steve Samuels wrote:
>
> The WEIGHT used by SAS GENMOD is a dispersion parameter weight, which
> should be equivalent to Stata's -aweight-. Weights for IPTW should
> be proportional to probability weights, as in the reference Joseh
> quotes. Ariel did not show us her Stata commands and output, as the
> Statalist FAQ specify, but it's quite likely that she is comparing
> different things.
>
> --------------------------------------------------------------------------------
>
> You make a good point. Without the specifics from Ariel it will be impossible
> to delve into what might be behind the large differences in the regression
> coefficient standard errors between SAS and Stata.
>
> I wonder whether it has much to do with weights, though. In the do-file below,
> which explores -glm- with varying within-panel weights, -pweight- (-iweight-)
> and -aweight- yield identical coefficients and standard errors. The same
> obtains with -iweight- and -pweight- with -xtgee- in the dataset having constant
> within-panel weights. (There is no -aweight- in -xtgee-.)
>
> More important, the M. A. Hernán in the _Stata Journal_ article that I cited is
> the same Miguel Hernán in the SAS literature that Ariel cites. I'm guessing
> that if there were a caveat arising from different meanings of "weight" between
> the two packages in this context, then it would have been brought out by the
> advocates of the marginal structural model approach (or the journal's reviewrs)
> who are familiar with both. I suspect that a comparison of the output (deviance
> and pseudolikelihood, coefficients and their standard errors, etc.) from
> following SAS code on the PROC IMPORTed CSV dataset file produced by the do-file
> below would indicate that SAS is handling the weights much as Stata does with
> -pweight- (-iweight-).
>
> PROC GENMOD DATA=DUMMY DESCENDING;
> CLASS PATIENT TREATMENT TIME;
> MODEL SCORE=TREATMENT TIME / DIST=BIN;
> REPEATED SUBJECT=PATIENT / CORR=IND;
> SCWGT WEIGHT;
> RUN;
>
> I'd wager that the coefficient standard errors would be essentially the same as
> those from Stata. They might differ in the third significant figure, or so,
> because of the convergence properties of the algorithms used by Stata and SAS
> for fitting these models, default tolerance settings, and so on. But I doubt
> that they would show the 0.484 versus 0.013 difference that Ariel reports in
> SEs.
>
> Like you say, without the specifics, we can only guess at the problem. In my
> experience, GLMs are very sensitive to collinearity among the predictors, for
> example. PROC GENMOD is reported* to be coy about convergence failures when it
> encounters compelete or quasicomplete separation with binomial data, for another
> example. It might pay Ariel to take a closer look at the SAS log (and at the
> iteration log in Stata) in the discrepant case if he hasn't already done so.
>
> Joseph Coveney
>
> * Paul D. Allison, Convergence Failures in Logistic Regression. Paper 360-2008.
> _SAS Global Forum 2008_
> www2.sas.com/proceedings/forum2008/360-2008.pdf
>
>
> clear *
> set more off
> version 10.1
>
> matrix define Mu = J(1, 5, 0.5)
> matrix define Sigma = J(5, 5, 0.5) + I(5) * 0.5
> // findit ovbd
> // findit ridder
> ovbd , stub(score) means(Mu) corr(Sigma) ///
> seed(`=date("2009-07-21", "YMD")') n(100) clear
> generate byte treatment = 0
>
> tempfile tmpfil0
> save `tmpfil0'
>
> matrix input Mu = (0.4 0.45 0.5 0.55 0.6)
> ovbd , stub(score) means(Mu) corr(Sigma) n(100) clear
> generate byte treatment = 1
> append using `tmpfil0'
>
> generate int patient = _n
> reshape long score, i(patient) j(time)
> generate double weight = runiform()
>
> outsheet patient treatment time score weight using dummy.csv, comma names
>
> insheet patient treatment time score weight using dummy.csv, comma names clear
>
> tabulate time, generate(time)
> tabulate treatment, generate(treatment)
>
> log using comparem.log, text
>
> glm score treatment1 time1-time4 [iweight=weight], cluster(patient) ///
> family(binomial) link(logit) nolog
>
> glm score treatment1 time1-time4 [aweight=weight], cluster(patient) ///
> family(binomial) link(logit) nolog
>
> glm score treatment1 time1-time4 [pweight=weight], cluster(patient) ///
> family(binomial) link(logit) nolog
>
> bysort patient (time): replace weight = weight[1]
>
> xtgee score treatment1 time1-time4 [iweight=weight], i(patient) ///
> family(binomial) link(logit) corr(independent) robust nolog
>
> xtgee score treatment1 time1-time4 [pweight=weight], i(patient) ///
> family(binomial) link(logit) corr(independent) robust nolog
>
> glm score treatment1 time1-time4 [pweight=weight], cluster(patient) ///
> family(binomial) link(logit) nolog
>
> log close
>
> exit
>
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
--
Steven Samuels
[email protected]
18 Cantine's Island
Saugerties NY 12477
USA
845-246-0774
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/