Subject: Re: st: About taking log on zero values
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Date: Wed, 19 Feb 2014 20:18:21 +0000
Adding a very small constant is actually the worst thing you can do!
People often get this the wrong way round, and forget what they
otherwise know about logarithms.
A very small non-zero value implies a very large negative
logarithm. Far from being a conservative change, it is a radical one.
You _create_ one or more outliers that will dominate any later
analysis.
For convenience assume -log10()-. Now
log10(a millionth) = -6
log10(a billionth) = -9
etc.
So, while arbitrarily small fractions are, agreed, arbitrarily close
to zero, on a logarithmic scale they are arbitrarily far from log(1) =
0, which is what matters once you choose a logarithmic scale.
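
A minimal sketch of this point in Stata, assuming a made-up toy dataset. The variable names sales, logsales and logsales_eps, and all the values, are hypothetical and not from the thread:

```stata
* Hypothetical toy data: five firms, two with zero sales
clear
input sales
0
0
120
4500
98000
end

* log10(0) is missing in Stata, so these two rows would be
* dropped by any later -regress- that uses logsales
generate logsales = log10(sales)

* The "small constant" patch: zeros become 1e-8 before taking logs
generate logsales_eps = log10(cond(sales == 0, 1e-8, sales))

* Compare the two versions side by side
list sales logsales logsales_eps
summarize logsales logsales_eps
```

With these made-up numbers the positive sales give log10 values between roughly 2 and 5, while the two patched zeros sit at -8, so they are the extreme points on the log scale.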
Nick
[email protected]
On 19 February 2014 19:56, Alfonso Sanchez-Penalver
<[email protected]> wrote:
> Seb,
>
> Stata would interpret the "." as a missing value and thus drop the observation from the estimation. You would thus only be using the observations with positive values of the original variable in the regression. A simple trick to not lose any observations is to add a very small constant (say 0.00000001) to those zero values before taking logs. That would keep all observations. I'm sure this will have many detractors too.
>
> In your case, I guess the log of sales is entered as an explanatory variable to capture nonlinearities in the relationship? If that's the case, to avoid the problem with the zeros, have you thought of entering a quadratic relationship with sales instead of a linear one?
>
> Best,
>
> Alfonso Sanchez-Penalver
>
>> On Feb 19, 2014, at 2:44 PM, Sebastian Say <[email protected]> wrote:
>>
>> Hi, my question is about how Stata treats a log-transformed variable
>> when the original variable contains zeros.
>>
>> In my dataset, I have firm sales data, but some firms have zero sales. I
>> created a logsales variable and noticed that the observations with zeros
>> are shown as "."
>>
>> I plan to run a regression, e.g.
>>
>> reg y x1 x2 logsales
>>
>> My question is, how would Stata treat these "." if I do not remove them?
>>
>> Technically the log of zero should be undefined.
>>
>> I've read some papers and they usually put a 1 for those sales data
>> with zeros in them. Is this a usual practice?
>>
>> Thank you very much.
>>
>> Seb
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/