Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Log of the mean vs mean of the log

From	Austin Nichols <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: Log of the mean vs mean of the log
Date	Wed, 23 Apr 2014 09:29:20 -0400

Estrella Gomez <[email protected]>:

I misread your first post; I thought you meant to include the
downloads as an explanatory variable in a gravity model (which seemed
an interesting idea, as that might be a proxy for levels of trade that
would obtain without respect to distance between countries). The
gravity model would then be estimated using -glm- with a log link, not
by taking logs and then running a linear regression.  See e.g. refs in
http://www.stata.com/meeting/boston10/boston10_nichols.pdf
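
A minimal sketch of why the two approaches differ, written in Python rather than Stata and using made-up trade flows (the values and variable names are assumptions, not data from this thread). In an intercept-only model, a log link with a family such as Poisson fits ln(mean(y)), while a linear regression on ln(y) fits mean(ln(y)). The first is never smaller than the second (Jensen's inequality), and ln(y) is undefined for zero flows, so those observations drop out of the logged regression:

```python
import math

# Hypothetical trade flows for a handful of country pairs (made-up numbers).
flows = [1.0, 10.0, 100.0]

# Intercept-only model with a log link (e.g. Poisson family):
# the fitted constant is ln(mean(y)).
log_of_mean = math.log(sum(flows) / len(flows))

# Intercept-only linear regression on ln(y):
# the fitted constant is mean(ln(y)).
mean_of_log = sum(math.log(y) for y in flows) / len(flows)

print(round(log_of_mean, 4))  # 3.6109, i.e. ln(37)
print(round(mean_of_log, 4))  # 2.3026, i.e. ln(10)

# Zero flows can stay in a log-link model but cannot be logged.
flows_with_zero = [0.0] + flows
usable_after_logging = [y for y in flows_with_zero if y > 0]
print(len(flows_with_zero), len(usable_after_logging))  # 4 3
```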

If downloads are your depvar, then I can't see how a gravity model is
appropriate, since distance in the traditional sense is irrelevant for
song downloads.

I cannot see why you want ranks at all, but perhaps there was more
information at the start of this post that got cut off:

On Wed, Apr 23, 2014 at 8:40 AM, Estrella Gomez <[email protected]> wrote:
> ranks, that is, the top 300 songs per each country, and I want to use
> this (inverted) variable as a proxy for sales (downloads), because I
> don't have real downloads. I have already done the analysis at the
> song level, but I would also like to aggregate at the country level to
> see the total cross border sales per country. That's why I would like
> to sum all the ranks, because I understand that the sum of all
> (inverted) ranks would be a proxy for total sales from a country to
> another. Then I use this as the dependent variable in a gravity equation,
> which requires the use of logarithms, but I'm not clear whether I should
> first take the logarithms of the ranks and then sum all the logs (by
> country), or first sum all the ranks (by country) and then take the
> logarithm of this sum.
>
> Thank you very much,
> Estrella
>
> 2014-04-23 14:05 GMT+02:00 Austin Nichols <[email protected]>:
>> Estrella Gomez <[email protected]>:
>>
>> Neither sounds right to me.  You want to take the sum over many songs
>> for one country with few downloads, ranking say 200th out of many
>> countries on all those songs, and take the log of the sum?  Or compute
>> the sum of many ln(200) values? What interpretation would this
>> variable have--movements up or down in percentage terms in rank of
>> downloads is some kind of measure of changes in intrinsic propensity
>> to engage in internet trade? I would think you could get much more
>> interesting information by preserving the data at the song level,
>> because it could inform who are likely to be trading partners, if you
>> have country of origin of the song (at least language can play a role,
>> if not other cultural factors). Also, the number of downloads is no doubt
>> more informative than rank. If you are committed to using ranks
>> instead of numbers, I would think computing ranks from 0 to 1, or
>> 1/200 to 1-1/200 for 200 countries, as "percentile" scores, would be
>> better than raw rank.  For that kind of rank, logit is a more natural
>> transformation than log, but I doubt any transformation is required
>> here--just keep it on the scale from 0 to 1.
>>
>> On Wed, Apr 23, 2014 at 4:51 AM, Estrella Gomez <[email protected]> wrote:
>>> Hi,
>>>
>>> I have a variable that is the number of downloads in a country at the
>>> song level, so each observation is song & artist & number of downloads
>>> & country & rank. I want to aggregate this at the country level and
>>> introduce the sum of the ranks as dependent variable in a gravity
>>> equation. I have aggregated taking the sum of the ranks and then the
>>> logarithm of this sum. My question is: is this correct, or should I
>>> instead first take the logarithm of the ranks at the song level and
>>> then take the sum of these logarithms? I am not very clear on the
>>> difference between the sum of the log ranks and the log of the sum of
>>> the ranks.
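
The difference asked about in the original question can be seen with a few made-up numbers. The sum of the logs equals the log of the product, not the log of the sum, so two countries with the same total can end up with different values. A small Python sketch (the country labels and rank values are hypothetical):

```python
import math

# Hypothetical inverted ranks for two exporting countries (made-up values).
ranks = {"A": [1, 2, 3, 4], "B": [5, 5]}

results = {}
for country, r in ranks.items():
    log_of_sum = math.log(sum(r))              # ln(r1 + r2 + ...)
    sum_of_logs = sum(math.log(x) for x in r)  # ln(r1 * r2 * ...)
    results[country] = (log_of_sum, sum_of_logs)
    print(country, round(log_of_sum, 3), round(sum_of_logs, 3))

# Prints:
# A 2.303 3.178
# B 2.303 3.219
```

Both countries have a total of 10, so the log of the sum is identical, while the sum of the logs depends on how that total is split across songs and on how many songs there are.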