--- Simo Hansen wrote:
> I am using mother's years of schooling and father's years of schooling
> as explanatory variables in my regression model. I also create
> dummy indicators for whether mother's and father's years of schooling
> are missing. <snip> When I run the following regression:
> reg childedyrs dadeduc moteduc misdaded mismoted,
> Stata drops two dummy indicators for whether parents' schooling is
> missing.
Stata (and most other stats packages) ignores observations with missing
values on the dependent variable or on any of the independent variables.
So Stata uses misdaded and mismoted only for observations where both
dadeduc and moteduc are observed, and in those observations both dummies
equal 0. They are thus constants and are dropped. The conventional way of
dealing with this is to replace dadeduc and moteduc with their mean value
when they are missing. The dummies then supposedly measure how much the
child's education deviates from the mean when the child has a missing
value on mother's or father's education, respectively.
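For readers outside Stata, the conventional fix described above (fill in the mean, add a 0/1 missing-value indicator) can be sketched in a few lines of Python. This is only an illustration: the function name `mean_impute_with_dummy` is hypothetical, missing values are assumed to be coded as `None`, and the numbers are made up.

```python
def mean_impute_with_dummy(values):
    """Mean-impute missing entries (coded as None) and flag them with a 0/1 dummy."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    imputed = [mean if v is None else v for v in values]
    missing = [1 if v is None else 0 for v in values]
    return imputed, missing


# Made-up father's years of schooling with two missing values
dadeduc, misdaded = mean_impute_with_dummy([10, None, 14, None])
print(dadeduc)   # [10, 12.0, 14, 12.0]
print(misdaded)  # [0, 1, 0, 1]
```

Every case with a missing value ends up with the same imputed dadeduc, which is what drives the problem discussed next.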
However, this approach leads to biased estimates, and your problem is
likely to be a worst-case scenario for it. For simplicity, assume that
mother's education is completely observed, so the regression equation
is:
childedyrs = b0 + b1*dadeduc + b2*moteduc + b3*misdaded
If father's education is observed, the regression becomes:
childedyrs = b0 + b1*dadeduc + b2*moteduc + b3*0
childedyrs = b0 + b1*dadeduc + b2*moteduc
If father's education is missing, the regression equation becomes:
childedyrs = b0 + b1*dadeduc + b2*moteduc + b3*1
But notice that dadeduc is now a constant: for these cases all its
values were replaced by the mean, so b0 + b1*mean(dadeduc) + b3 is also
a constant. Call this constant b0'. So we can rewrite the regression
equation as:
childedyrs = b0' + b2*moteduc
So the effect of mother's education is the effect controlled for
father's education if father's education is observed, and the effect
not controlled for father's education if father's education is
missing. The parameter you will find is a weighted average of these
two effects. The "uncontrolled" effect gets more weight as the
proportion of missing values increases, and the "controlled" and
"uncontrolled" effects differ more when father's and mother's
education are more strongly correlated. In my experience the
proportion of missing values in father's and mother's education tends
to be pretty high, and the correlation between partners' levels of
education is among the highest nontrivial correlations produced by
social processes. So your problem is a worst-case scenario for this
method.
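The weighted-average argument can be checked with a small simulation. The Python sketch below uses no external libraries, and everything in it is an assumption made for illustration: parents' education is standardized with correlation `rho`, both parents have a true effect of 1, father's education is missing completely at random with probability `p_missing`, and OLS is solved through the normal equations. The function names `solve`, `ols` and `moteduc_coef` are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y_i for r, y_i in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def moteduc_coef(p_missing, rho, n=20000, seed=1):
    """Coefficient on mother's education after mean-imputing father's
    education and adding a missing-value dummy (true effect is 1)."""
    rng = random.Random(seed)
    dad, mot, miss, child = [], [], [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        d = z1
        m = rho * z1 + math.sqrt(1 - rho ** 2) * z2  # corr(d, m) = rho
        child.append(d + m + rng.gauss(0, 1))       # b1 = b2 = 1
        is_missing = rng.random() < p_missing       # missing completely at random
        dad.append(None if is_missing else d)
        mot.append(m)
        miss.append(1.0 if is_missing else 0.0)
    observed = [d for d in dad if d is not None]
    dad_mean = sum(observed) / len(observed)
    rows = [[1.0, dad_mean if d is None else d, m, s]  # cons, dadeduc, moteduc, misdaded
            for d, m, s in zip(dad, mot, miss)]
    return ols(rows, child)[2]


for p in (0.1, 0.6):
    print(f"share missing {p}: coefficient on moteduc = {moteduc_coef(p, rho=0.7):.2f}")
```

Under these assumptions, a Frisch-Waugh-Lovell argument puts the expected coefficient at 1 + rho*p / ((1 - p)*(1 - rho^2) + p), where p is the share missing: about 1.13 for p = 0.1 and about 1.52 for p = 0.6 when rho = 0.7, and 1 whenever rho = 0. That is a weighted average of the controlled effect (1) and the uncontrolled effect (1 + rho), with more weight on the latter as the share missing grows.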
To deal with the missing values you could use multiple imputation with
-ice-. Another option is to use -hotdeck-.
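To give a rough idea of what imputation-based alternatives do differently, here is a minimal Python sketch of two ingredients: a simple random hot-deck draw (each missing value is replaced by the value of a randomly chosen observed donor) and Rubin's rules for pooling an estimate across m imputed datasets. It is a simplification, not the algorithm implemented by -ice- (which imputes from chained regression equations) or by -hotdeck-; the function names are hypothetical and the data are made up. A plain random draw like this also understates uncertainty unless the donor pool is itself resampled (the approximate Bayesian bootstrap), so treat it as illustrative only.

```python
import random
import statistics


def hot_deck(values, rng):
    """Fill each missing value (None) with the value of a randomly drawn observed donor."""
    donors = [v for v in values if v is not None]
    return [rng.choice(donors) if v is None else v for v in values]


def rubin_pool(estimates, variances):
    """Pool m estimates and their sampling variances with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)     # pooled point estimate
    u_bar = statistics.mean(variances)     # mean within-imputation variance
    b = statistics.variance(estimates)     # between-imputation variance
    return q_bar, u_bar + (1 + 1 / m) * b  # (estimate, total variance)


# Five imputations of a made-up father's-education variable; the
# estimate of interest here is simply the mean of dadeduc.
rng = random.Random(42)
dadeduc = [10, None, 14, None, 12]
imputed_sets = [hot_deck(dadeduc, rng) for _ in range(5)]
estimates = [statistics.mean(d) for d in imputed_sets]
variances = [statistics.variance(d) / len(d) for d in imputed_sets]
print(rubin_pool(estimates, variances))
```

Unlike mean imputation, the imputed values vary across imputations, and the between-imputation variance b feeds that extra uncertainty into the pooled standard error.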
HTH,
Maarten
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/