You need to check that -growth- has a fairly smooth
skewed distribution with a shorter left tail
and a longer right tail. If this were so,
-log(growth + 100)- should be
nearly symmetric, and so the transformation might have some practical justification.
But if I were a paper reviewer or a thesis examiner I would
want to hear that case spelled out step by step. I would still
be worried about replacing what looks like a natural origin
with a fudged one.
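One way to make that check concrete is sketched below. It is written in Python rather than Stata and uses made-up numbers, so treat it as an illustration only. It computes a moment-based skewness before and after the shift-and-log. The hypothetical values are constructed so that growth + 100 has exactly symmetric logs. With real data you would have to look at plots and numbers rather than assume this.

```python
import math

def skewness(values):
    """Moment-based skewness m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def shifted_log(values, offset=100.0):
    """Return ln(v + offset) for each v; fail loudly if any v + offset <= 0."""
    if min(values) + offset <= 0:
        raise ValueError("offset too small: some values + offset <= 0")
    return [math.log(v + offset) for v in values]

# Hypothetical growth values: growth + 100 = exp(z) with z symmetric about 4.5,
# so ln(growth + 100) is symmetric by construction (range about -67 to +145).
growth = [math.exp(z) - 100 for z in (3.5, 4.0, 4.5, 5.0, 5.5)]

print(f"skewness of growth:           {skewness(growth):+.3f}")
print(f"skewness of ln(growth + 100): {skewness(shifted_log(growth)):+.3f}")
```

A single skewness figure can be driven by a few extreme values, so in practice it would accompany a histogram or quantile plot rather than replace one.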
As John says, the second approach is very difficult
to justify. In fact, he understated the case against
it, as 0 is emphatically not the lowest possible
logarithm: logs of values between 0 and 1 are negative,
so a 0 substituted for negative growth ends up above
the logs of small positive growth values. At most, if
negative values are judged to be in some sense mistaken
or irrelevant, then they should be replaced by missing
values, not zeros.
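The following is a small Python illustration of why zero is a poor stand-in, using hypothetical values. Stata's ln() returns missing for zero or negative arguments. The sketch mimics that with None, then compares recoding those results to 0 against leaving them missing.

```python
import math

def log_or_missing(growth):
    """ln(g) where defined; None (standing in for Stata's missing) where g <= 0."""
    return [math.log(g) if g > 0 else None for g in growth]

def log_recoded_to_zero(growth):
    """The second website's recipe: undefined logs replaced by 0."""
    return [0.0 if x is None else x for x in log_or_missing(growth)]

growth = [-50.0, 0.5, 1.0, 2.0]  # hypothetical values

print(log_or_missing(growth))       # the -50 stays missing
print(log_recoded_to_zero(growth))  # the -50 now ties with growth = 1 and exceeds growth = 0.5
```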
But, most of all, it is not obvious that you absolutely
need to transform -growth- at all.
I have often taught how useful transformations can
be and repeatedly emphasised how logs can make
your life easier. I then find students, understandably,
worried about what to do when faced with air temperature
variables which are skewed but include negative values.
Here the situation is less clear-cut: a problem
of negative values with the originals in Fahrenheit is
sometimes "solved" by converting to Celsius and will
always be "solved" by converting to Kelvin, but
in the first case the origin remains arbitrary. However,
my answer -- bearing in mind also various physical
considerations -- is that the variable is
often best left as it is, even if highly skewed.
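The temperature conversions can be sketched in Python as follows. The formulas are the standard ones, but the readings are made up. Converting Fahrenheit to Celsius removes some negative values but not all, because 0 degrees C is as arbitrary a zero as 0 degrees F. Kelvin values are positive for any physically possible temperature, because 0 K is absolute zero.

```python
def fahrenheit_to_celsius(f):
    """Shift and rescale: 32 degrees F is 0 degrees C."""
    return (f - 32) * 5 / 9

def celsius_to_kelvin(c):
    """Shift only: 0 K (absolute zero) is -273.15 degrees C."""
    return c + 273.15

readings_f = [-10.0, 20.0, 40.0, 95.0]  # hypothetical air temperatures

for f in readings_f:
    c = fahrenheit_to_celsius(f)
    k = celsius_to_kelvin(c)
    print(f"{f:6.1f} F  {c:7.2f} C  {k:7.2f} K")
```

Here 20 F is still negative in Celsius while 40 F is not, which is the "sometimes" above; every reading is positive in kelvin.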
You don't say what kind of growth you're dealing
with but, whatever it is, zero is surely a natural
origin.
Nick
Wallace, John
> I think it's usually a mistake to throw data away. I'd be in favour of the
> first approach, as you can do your log transformations, play with models,
> etc. and then project the results back onto your original number line by
> reversing the math. The second approach would only make sense if the
> negative values were the result of goofy arithmetic, where negative values
> wouldn't occur in reality (negative brightness or mass, for instance).
> As long as negative growth makes sense (you aren't starting with a
> population of zero, for example), then it's perfectly reasonable to add an
> offset to make logarithmic math work... just keep track of what the offset
> is.
> I'll leave the stat questions for the statisticians to answer!
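John's "reversing the math" can be sketched in Python, using hypothetical values and the offset of 100 from the question. The inverse of y = ln(growth + 100) is growth = exp(y) - 100. Back-transforming a mean of logs does not recover the mean of growth. It gives the geometric mean of growth + 100, less 100, and that is never larger than the arithmetic mean of growth.

```python
import math

OFFSET = 100.0  # the offset used in the first approach

def forward(growth):
    """y = ln(growth + OFFSET)."""
    return math.log(growth + OFFSET)

def backward(y):
    """Inverse of forward(): exp(y) - OFFSET."""
    return math.exp(y) - OFFSET

growth = [-50.0, 0.0, 100.0, 300.0]  # hypothetical values
logs = [forward(g) for g in growth]

mean_log = sum(logs) / len(logs)
print("back-transformed mean of logs:", backward(mean_log))  # geometric-mean based
print("arithmetic mean of growth:    ", sum(growth) / len(growth))
```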
[email protected]
> I am transforming a bunch of variables into their natural logs, and I
> have read conflicting advice on how to treat negative values in variables
> such as growth, which ranges from -99 to +300 in my dataset.
>
> One website suggests I just add 100 to the variable and then log it:
>
> gen log_growth = ln(growth+100)
>
> A second website I visited suggests turning all negative values into 0:
>
> gen log_growth = ln(growth)
> (75 missing values generated)
>
> recode log_growth .=0
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/