Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: About taking log on zero values
From: Austin Nichols <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: About taking log on zero values
Date: Thu, 20 Feb 2014 07:18:40 -0500
Sebastian Say <[email protected]>
Sorry, that should be addressed to
Sebastian Say, Alfonso, Jeph, Maarten, and Nick (not just Alfonso):
On Thu, Feb 20, 2014 at 7:16 AM, Austin Nichols <[email protected]> wrote:
> Alfonso:
>
> Whether sales=0 means
> "literally nothing" or "so small that it could not be detected"
> you can't do any of the things suggested without introducing bias.
> In the former case, you must run separate models for cases with and
> without sales (or fully interact by a dummy variable nosales) while in
> the latter case you must multiply impute sales using a sensible model,
> not simply add a constant.
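For the "literally nothing" case, the two routes can be sketched in Stata as follows. This is only a minimal sketch on simulated data: the seed, sample size, and coefficients are invented for illustration, and logsales0 is a hypothetical helper name (nosales and logsales follow the names used elsewhere in the thread).

```stata
* Simulated data, purely illustrative
clear
set seed 12345
set obs 200
generate x1 = rnormal()
generate x2 = rnormal()
generate sales = cond(runiform() < 0.3, 0, exp(rnormal(5, 1)))
generate y = 1 + 0.5*x1 - 0.5*x2 + cond(sales > 0, 0.2*ln(sales), 2) + rnormal()

* Flag zero sales; ln(0) is missing, so logsales is defined only for sales > 0
generate byte nosales = (sales == 0) if sales < .
generate logsales = ln(sales)

* Route 1: separate models for firms with and without sales
regress y x1 x2 if nosales == 1
regress y x1 x2 logsales if nosales == 0

* Route 2: one model, fully interacted by nosales
* logsales0 is set to 0 where nosales == 1, so its coefficient is
* identified only from firms with positive sales
generate logsales0 = cond(nosales == 1, 0, logsales)
regress y i.nosales##c.x1 i.nosales##c.x2 logsales0
```

The point estimates from the two routes should agree; the standard errors differ because the interacted model pools the residual variance across the two groups.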
>
>
> On Thu, Feb 20, 2014 at 4:57 AM, Maarten Buis <[email protected]> wrote:
>> One option you could also consider is to treat the value 0 as a
>> special case that needs its own effect. This depends on whether 0 means
>> "literally nothing" or "so small that it could not be detected". In the
>> former case you would often want to treat the value 0 as qualitatively
>> different, while in the latter case adding a small but not too small
>> number to the 0 values could be justified.
>>
>> If you want to treat the value 0 as qualitatively
>> different, then I would do something like this:
>>
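>> * flag firms with exactly zero sales (missing sales stay missing)
>> gen byte nosales = (sales == 0) if sales < .
>> * ln(0) is missing, so logsales is missing where sales == 0
>> gen logsales = ln(sales)
>> * fill those cases with the smallest observed log sales
>> sum logsales, meanonly
>> replace logsales = r(min) if nosales == 1
>> reg y x1 x2 logsales nosales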
>>
>> In that case the coefficient for logsales can be interpreted as
>> before, but refers only to sales > 0. The coefficient for nosales
>> represents the difference in expected value of y between those units
>> with no sales at all and those units with the smallest non-zero sales.
>>
>> Hope this helps,
>> Maarten
>>
>>
>> On Wed, Feb 19, 2014 at 9:11 PM, Nick Cox <[email protected]> wrote:
>>> Stata would ignore numeric missings in anything like a regression calculation.
>>>
>>> That applies also to missings that result from calculating log(0).
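A minimal sketch of this behaviour, using a five-row toy dataset whose values are made up for illustration: the two firms with zero sales get a missing logsales and are dropped from the estimation sample.

```stata
* Toy data; the y and sales values are invented for illustration
clear
input y sales
1 0
2 10
3 100
4 0
5 1000
end

* ln(0) is undefined, so Stata stores system missing (.) for those rows
generate logsales = ln(sales)
count if missing(logsales)

* regress uses only observations that are complete on all model variables
regress y logsales
display e(N)
```

Here count reports 2 and e(N) is 3, so the regression silently runs on the three firms with positive sales.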
>>>
>>> Changing values of 0 to values of 1 so that you can take logarithms is
>>> not something I would call "usual practice". It is, I suspect,
>>> regarded differently by different people on a spectrum from unethical
>>> and incorrect to an acceptable fudge, depending partly on the rest of
>>> the data and what you are doing with them.
>>>
>>> An incomplete list of things to think about:
>>>
>>> 0. If values of 1 occur otherwise, you have created an inconsistency.
>>> If values between 0 and 1 occur otherwise, you have created a bigger
>>> one. Applying log(x + 1) consistently solves this problem only by
>>> creating another. Applying log(x + 1) and pretending that it is really
>>> applying log(x) is not widely accepted.
>>>
>>> 1. If 0 really means what it says, changing it to 1 is a
>>> falsification. Whether you can put a spin on it as an acceptable or
>>> necessary falsification is an open question.
>>>
>>> 2. If 0 really means "small but not detected", changing it to e.g.
>>> half the smallest observable value is sometimes an accepted or acceptable
>>> modification.
>>>
>>> 3. Replacing log(0) with log(1) is not necessarily even a small and
>>> conservative modification. If, apart from the values of 0, values range
>>> from e^3 to e^6, then after logging you have 0 and otherwise a range of 3
>>> to 6. You may have _created_ a bundle of outliers that will dominate
>>> analyses.
>>>
>>> 4. Doing something about 0s is only necessary with logarithmic
>>> transformation. If you have 0s in the response, you can leave them and
>>> use a logarithmic link. That won't necessarily be a good model, but
>>> using a logarithmic link doesn't require positive values in the
>>> response, only that the mean function be always positive. (This
>>> doesn't apply in your case as the variable in question is a
>>> predictor.)
>>>
>>> 5. There are usually alternatives, such as transformations other than
>>> logarithms.
>>>
>>> 6. I wouldn't do anything without considering some kind of sensitivity
>>> analysis, i.e. a consideration of how much difference an arbitrary
>>> treatment of zeros makes.
>>>
>>> 7. There is often an argument that implies that the observations with
>>> zeros don't belong anyway.
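The sensitivity analysis in point 6 could be sketched in Stata as a loop over candidate replacement values for the zeros. This is a rough illustration on simulated data, not a recommended procedure: the seed, the data-generating process, the list of constants, and the name logsales_c are all assumptions made for the example.

```stata
* Simulated data, purely illustrative
clear
set seed 2014
set obs 200
generate x1 = rnormal()
generate sales = cond(runiform() < 0.2, 0, exp(rnormal(5, 1)))
generate y = 1 + 0.5*x1 + cond(sales > 0, 0.3*ln(sales), 2) + rnormal()

* Replace zeros by each candidate constant, log, and refit
foreach c of numlist 0.01 0.5 1 10 {
    capture drop logsales_c
    generate logsales_c = ln(cond(sales == 0, `c', sales))
    quietly regress y x1 logsales_c
    display "c = `c'" _col(12) "b[logsales_c] = " %9.4f _b[logsales_c]
}

* Benchmark: leave ln(0) missing, so the zeros drop out of the sample
generate logsales = ln(sales)
regress y x1 logsales
```

If the coefficient on the logged variable moves a lot across the constants, the results depend on an arbitrary choice, which is the concern raised in points 3 and 6.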
>>>
>>> (I have generalised your question, but suspect that zero values for
>>> sales usually mean exactly what they say.)
>>>
>>> Nick
>>> [email protected]
>>>
>>> On 19 February 2014 19:44, Sebastian Say
>>> <[email protected]> wrote [edited]
>>>
>>>> My question is about how Stata treats a log-transformed variable
>>>> that draws upon an original variable that contains zero.
>>>>
>>>> In my dataset, I have firm sales data but some of them have values of zero. I
>>>> created a logsales variable and noticed that those with zeros are
>>>> indicated as a "."
>>>>
>>>> I plan to run a regression, e.g.
>>>>
>>>> reg y x1 x2 logsales
>>>>
>>>> My question is, how would Stata treat these "." if I do not remove them?
>>>>
>>>> Technically the "." should be undefined.
>>>>
>>>> I've read some papers and they usually put a 1 for those sales data
>>>> with zeros in them. Is this a usual practice?
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/
>>
>>
>>
>> --
>> ---------------------------------
>> Maarten L. Buis
>> WZB
>> Reichpietschufer 50
>> 10785 Berlin
>> Germany
>>
>> http://www.maartenbuis.nl
>> ---------------------------------