This question is probably an easy one, but I am baffled.
I am trying to run GEE models separately for subsets of my sample: white
girls, black girls, white boys, and black boys. I have been using an "if"
statement before the comma in the regression command, e.g., if
sex==0&racewh==1, etc. I should also mention that I am limiting the sample
by time of observation as well, so what I really have is, for example: if
sex==0&racewh==1&period==1|2. I became suspicious that I wasn't doing what
I wanted when my sample size stayed large. So I tried preserving the data
set, then dropping males and blacks and running the model again, and the n
was much smaller.
My question:
Shouldn't using "if" before the comma accomplish the same thing as dropping
those people from the sample? What am I missing?
Below are my examples:
xi: xtgee pul per2Xlagccm period2 lagccm age if sex==0&racewh==1&period==1|2 [pweight=wt], robust corr(exch)
GEE population-averaged model            Number of obs      =      2723
Group variable:                    id    Number of groups   =       390
Link:                        identity    Obs per group: min =         6
Family:                      Gaussian                   avg =       7.0
Correlation:             exchangeable                   max =         7
                                         Wald chi2(4)       =    150.68
Scale parameter:             210.9225    Prob > chi2        =    0.0000

After dropping males and blacks and running the model again:

GEE population-averaged model            Number of obs      =       468
Group variable:                    id    Number of groups   =        67
Link:                        identity    Obs per group: min =         6
Family:                      Gaussian                   avg =       7.0
Correlation:             exchangeable                   max =         7
                                         Wald chi2(4)       =     87.58
Scale parameter:             191.5843    Prob > chi2        =    0.0000
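
One way to check what an "if" condition of this form actually selects is to count the matching observations directly. The sketch below is only an illustration: it uses Stata's bundled auto dataset as a stand-in, so foreign and rep78 are not variables from the data above. It relies on Stata evaluating & before |, which would make a trailing "| 4" a nonzero constant that is true for every observation.

```stata
* Stand-in data: Stata's bundled auto dataset (NOT the study data).
sysuse auto, clear

* Parsed as (foreign==1 & rep78==3) | 4. The constant 4 is nonzero,
* so the condition is true for every observation in the dataset.
count if foreign==1 & rep78==3 | 4

* Probable intended meaning: foreign cars with rep78 equal to 3 or 4.
count if foreign==1 & (rep78==3 | rep78==4)

* Equivalent form using inlist().
count if foreign==1 & inlist(rep78, 3, 4)
```

If the first count equals the full sample size while the other two are smaller, the condition is not restricting the sample as intended. Under that reading, the analogous condition for the model above would be sex==0 & racewh==1 & (period==1 | period==2).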
Sarah A. Mustillo, Ph.D
Center for Developmental Epidemiology
Department of Psychiatry and Behavioral Sciences
Duke University School of Medicine
Box 3454
Durham NC 27710