Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: About taking log on zero values
From: Maarten Buis <[email protected]>
To: [email protected]
Subject: Re: st: About taking log on zero values
Date: Fri, 21 Feb 2014 10:37:24 +0100
I was trying to answer, but noticed I was just repeating my earlier
arguments. I propose that we call this discussion closed with the
somewhat unsatisfying conclusion that we were somehow unable to
communicate with one another.
-- Maarten
On Fri, Feb 21, 2014 at 12:08 AM, Alfonso Sánchez-Peñalver
<[email protected]> wrote:
> Ok, so the coefficient on the log of sales should be a semi-elasticity between the dependent variable and sales: a 1% increase in sales changes the average value of the dependent variable by roughly the coefficient divided by 100, in the dependent variable's units. Your estimation method doesn't add any information to ln(sales), and thus we cannot use the coefficient there as the semi-elasticity, since it will be biased because of all the zeros. In addition, the dummy variable captures the average additional effect of all unobserved characteristics that are common to the observations that have zero sales, not just the fact that they have zero sales. How then can we estimate the right semi-elasticity using your method?
>
> It's not a question of whether it is possible to have a negative sale. The fact that it's impossible is what censors the sales variable at zero. It remains a censored variable, and so if you're looking to replace the zero values (or the ln(0) values in the log-transformed variable), the most appropriate approach is an estimation method designed for censored variables, such as Tobit or Heckman.
>
> Sorry, but I cannot agree with you, so I guess we must agree to disagree.
>
> Alfonso Sánchez-Peñalver, PhD
>
> Visiting Assistant Professor
> Suffolk University
> Senior Instructor
> UMass Boston
>
>
>
> On Feb 20, 2014, at 4:29 PM, Maarten Buis <[email protected]> wrote:
>
>> On Thu, Feb 20, 2014 at 7:55 PM, Alfonso Sanchez-Penalver wrote:
>>> I don't agree with that. There are two reasons why sales can take the value of zero:
>>>
>>> 1. Because it's an actual zero
>>> 2. Because sales cannot be negative and thus the variable is censored at zero.
>>>
>>> As long as there's one observation with a zero value that is not a true zero the real relationship between the dependent variable y and ln(sales) is broken, and that's what needs to be fixed.
>>
>> I am trying to think of an owner of a neighbourhood shop and I imagine
>> asking her or him if (s)he ever encountered scenario 2. I cannot
>> imagine any other response than a very very very blank stare (if (s)he
>> is polite). Either you sell something or you don't.
>>
>> In other situations you might get a mixture of different meanings of
>> the value 0. Consider a measurement device that measures the
>> concentration of a substance. Such devices are not infinitely precise,
>> and there will be a concentration below which the device can no longer
>> detect the substance. So in such cases the value 0 could mean the
>> substance is totally absent (often unlikely) or the concentration is
>> very very low. However, I cannot imagine how in that case either
>> Heckman or Tobit would be a solution.
>>
>>> Your methodology only computes the average effect of the observations for which sales is zero, while not accounting for the true variation in the means of the log of sales in the observations where sales equals zero. A Heckman or Tobit intermediate estimation of the ln(sales) will replace the values of ln(sales) with the predicted ones, and thus account for at least the explained variation across those observations, thus producing a better estimate of the coefficient on ln(sales) than your methodology.
>>
>> What you describe is a general measurement problem. If you are worried
>> about that, then you should do much much more than just look at 0s.
>> For that reason alone I would suspect that the Heckman or Tobit
>> solution would make the problem worse rather than better.
>>
>> My rule of thumb is not to stack fragile methods like Heckman or Tobit
>> with other models. In all likelihood you make things a lot worse when
>> you do.
>>
>> -- Maarten
>>
>> ---------------------------------
>> Maarten L. Buis
>> WZB
>> Reichpietschufer 50
>> 10785 Berlin
>> Germany
>>
>> http://www.maartenbuis.nl
>> ---------------------------------
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/