There is a literature on censored regressors. A quick Google search
on "censored regressors" turned up, for example:
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1071239
http://web.mit.edu/tstoker/www/research.htm
The original poster has not yet responded to Al Feiveson, so we do
not know whether the regressor "x", say, is "censored" in the
technical sense (has unobserved values <0 that are coded as zero). If
there are many zeros, perhaps x was generated by one of two
processes: the first for whether x would be non-zero, the second for
the value of x if it were non-zero.
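One way to handle a mixture would be to generate two variables: an
indicator that x is zero and, for non-zero x, the actual value (or
its log). Because log(0) is missing in Stata, the logged version has
to be set to zero explicitly; multiplying log(x) by a zero indicator
would still give missing.
gen x_zero = (x == 0) if x < .
gen x_pos = x*(x > 0) if x < .
or
gen xlog_pos = cond(x > 0, log(x), 0) if x < .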
Insert x_zero and either x_pos or xlog_pos into the predictor list.
In fact, it is not necessary to choose between logged and unlogged
versions; -fracpoly- could model the best transformation of x_pos.
The references above suggest that the indicator approach is biased if
x is truly censored.
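A minimal sketch of the indicator approach on simulated data may make
the steps concrete. The data-generating process, the variable names y
and x, and the seed are all assumptions made up for illustration; they
are not from the original poster's data.

```stata
* Hypothetical data: x is zero for about 40% of observations,
* positive and right-skewed otherwise (an assumption for illustration)
clear
set obs 500
set seed 12345
gen x = cond(runiform() < 0.4, 0, exp(rnormal()))
gen y = 1 + 0.5*(x == 0) + 0.8*cond(x > 0, log(x), 0) + rnormal()

* Indicator for zero, and log(x) for positive x (zero otherwise).
* cond() is used because log(0) is missing, and missing*0 is missing.
gen byte x_zero = (x == 0) if !missing(x)
gen xlog_pos = cond(x > 0, log(x), 0) if !missing(x)

* Both terms enter the predictor list together
regress y x_zero xlog_pos
```

The coefficient on x_zero then shifts the intercept for the zero
group, while the coefficient on xlog_pos is the slope among the
positive values only.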
-Steve
On Jan 8, 2009, at 11:29 AM, Lachenbruch, Peter wrote:
Since the goal is to look at a logarithmic relationship, I'm
wondering if using glm with a log-link for a normal family wouldn't
be helpful. That way you don't need to worry about 0 values.
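A rough sketch of that suggestion, on made-up data (the variable
names, coefficients, and seed are assumptions for illustration). Note
that the log link applies to the mean of the response y; the
regressor x enters untransformed, so its zeros cause no difficulty.

```stata
* Hypothetical data: the mean of y is exponential in x
clear
set obs 500
set seed 2009
gen x = cond(runiform() < 0.4, 0, exp(rnormal()))
gen y = exp(0.2 + 0.1*x) + rnormal()

* Gaussian family with log link: E(y) = exp(b0 + b1*x)
glm y x, family(gaussian) link(log)
```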
Tony
Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Feiveson, Alan H. (JSC-SK311)
Sent: Wednesday, January 07, 2009 11:54 AM
To: [email protected]
Subject: st: RE: RE: RE: RE: how to deal with a censored and skewed regressor?
If there were other X-variables, one way (probably not the best)
would be to use multiple imputation. More generally, some sort of
structural model that relates Y to true X and includes the censoring
mechanism could be estimated (ha!). I suspect there are econometric
models out there that do this sort of thing - possibly even already
programmed in Stata.
AL F.
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Wednesday, January 07, 2009 1:06 PM
To: [email protected]
Subject: st: RE: RE: RE: how to deal with a censored and skewed regressor?
And how would you do that? Other than knowing that c.i.s and P-values
are not as good as they seem, what difference does this knowledge
make to what you do?
Nick
[email protected]
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Feiveson, Alan H. (JSC-SK311)
Sent: 07 January 2009 18:27
To: [email protected]
Subject: st: RE: RE: how to deal with a censored and skewed regressor?
Nick wrote: "If you regard such a regressor as error-free, as one
usually does, then I am not clear that procedure need otherwise be
affected."
But if the variable (say X) is censored, then its real value is
unknown except for an upper or lower bound, and there is error, hence
bias, in the regression parameter estimates if X is used as is. So in
mbaier's case, if X is really censored at zero, that means its true
value, wherever a zero is recorded, is some negative number. This
needs to be taken into account in the estimation.
Al Feiveson
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Wednesday, January 07, 2009 12:09 PM
To: [email protected]
Subject: st: RE: how to deal with a censored and skewed regressor?
(x - r(min)) / (r(max) - r(min))
does not yield missing when r(min) is 0 unless x is missing or r(max)
is also zero. But that's neither here nor there. The above is just a
linear rescaling of a variable and will thus leave skewness
unchanged.
Skewness of a regressor is not itself fatal to anything.
Censoring of a regressor is something to take account of in
interpretation. If you regard such a regressor as error-free, as one
usually does, then I am not clear that procedure need otherwise be
affected.
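Both points about the rescaling can be checked directly. The sketch
below uses simulated data (the variable x, its distribution, and the
seed are assumptions for illustration) and compares skewness before
and after building the index.

```stata
* Hypothetical skewed variable with many zeros
clear
set obs 500
set seed 2009
gen x = cond(runiform() < 0.4, 0, exp(rnormal()))

* Skewness of the raw variable; summarize also leaves r(min), r(max)
quietly summarize x, detail
scalar skew_raw = r(skewness)

* Linear rescaling to a 0-100 index
gen double x_idx = 100*(x - r(min))/(r(max) - r(min))

* Skewness of the index should match the raw value up to rounding
quietly summarize x_idx, detail
display "skewness of x: " skew_raw "   skewness of index: " r(skewness)

* Zeros in x map to 0 in the index, not to missing
count if missing(x_idx)
```

If the index does come out missing for the zeros, the likely cause is
that r(min) and r(max) were not the results of a -summarize- run
immediately beforehand.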
Nick
[email protected]
mbaier
I tried to transform it according to ln(skewed variable), but my
regressor has a lot of values at zero, for which ln is not defined. I
also tried to create an index like
I=100*(x-r(min))/(r(max)-r(min)),
which again leads to many missings (due to many x's being zero).
What can I do?
Besides, do I have to account for the censoring of my regressor? If
so, what can I do?
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/