Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Regression question
From
"Mike Kim" <[email protected]>
To
<[email protected]>
Subject
RE: st: Regression question
Date
Mon, 28 May 2012 09:45:29 -0500
Hi Cam and David,
Thank you for your suggestions. The number of companies is not large (around
150), but I will try your suggestions.
Mike.
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Cameron McIntosh
Sent: Saturday, May 26, 2012 7:28 PM
To: STATA LIST
Subject: RE: st: Regression question
Yes, for looking at the effect of AD spending, aggregation may be the most
feasible option. But you may also want to try a different, or at least a
supplemental, approach. I think it would be quite interesting to know
which media types cluster most often across companies (the association-rule
and frequent-itemset methods in the references below address exactly this).
How many companies are there?
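Cam's question about which media types tend to appear together is a frequent-itemset question. The following is only a rough sketch. It is plain Python, not the arules package cited below, and the company and media names are hypothetical. It computes the support of each media pair, meaning the share of companies that use both. It also computes the confidence of a rule A -> B, meaning the share of companies using A that also use B.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: the set of media types each company advertises in.
company_media = {
    "c1": {"TV", "Newspaper", "Radio"},
    "c2": {"TV", "Newspaper"},
    "c3": {"TV", "Online"},
    "c4": {"Newspaper", "Radio"},
}


def pair_support(baskets, min_support=0.0):
    """Return {(a, b): support} for every media pair whose support
    (share of companies using both) is at least min_support."""
    n = len(baskets)
    counts = Counter()
    for media in baskets.values():
        # sorted() gives each pair one canonical order, e.g. ("Newspaper", "TV").
        for pair in combinations(sorted(media), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def confidence(baskets, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, estimated across companies:
    the share of companies using the antecedent that also use the consequent."""
    with_a = [m for m in baskets.values() if antecedent in m]
    if not with_a:
        return 0.0
    return sum(consequent in m for m in with_a) / len(with_a)


frequent = pair_support(company_media, min_support=0.5)
print(frequent)
print(confidence(company_media, "TV", "Newspaper"))
```

With about 150 companies and up to 50 media types, counting pairs like this is cheap. Larger itemsets are where a dedicated Apriori-style implementation, such as the arules package, becomes worthwhile.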
Zhang, S., & Wu, X. (2011). Fundamentals of association rules in data mining and knowledge discovery. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(2), 97-116. http://onlinelibrary.wiley.com/doi/10.1002/widm.10/pdf
Hahsler, M., Buchta, C., Gruen, B., & Hornik, K. (2012, April 23). Mining Association Rules and Frequent Itemsets: Package arules (Version 1.0-8). http://cran.r-project.org/web/packages/arules/index.html
Liu, G., Zhang, H., & Wong, L. (2011). Controlling false positives in association rule mining. Proceedings of the VLDB Endowment, 5(2), 145-156. http://vldb.org/pvldb/vol5/p145_guimeiliu_vldb2012.pdf
Adamo, J.-M. (2001). Data Mining for Association Rules and Sequential Patterns: Sequential and Parallel Algorithms. New York, NY: Springer-Verlag.
Cam
> From: [email protected]
> To: [email protected]
> Subject: RE: st: Regression question
> Date: Sat, 26 May 2012 16:43:20 -0500
>
> Hi David,
>
> Thank you for your opinion. The data structure is in fact more complicated.
> Say there are 50 different media types, and each company (i) spends on a
> different number of them (from 1 to 50). So, including all of these as
> separate independent variables is not possible.
>
> Anyway, the regression form I specified below does not seem correct.
> Probably the only way is to aggregate over (j) so that all variables
> vary only by (i).
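Mike's aggregation step can be sketched in Python. The sketch reuses the toy numbers from the original question: Y = 10 and 20, X1 = 2 and 3, and X2 = 1 to 5, with X2 read as AD spending. The company labels A and B and the media names are made up. The media dimension collapses into total AD spending plus a count of media used, giving one row per company. In Stata itself, -collapse- performs this kind of by-group aggregation.

```python
from collections import defaultdict

# Hypothetical long-format rows: (company, media type, AD spending).
# Spending values are X2 from the question's example; labels are made up.
ad_rows = [
    ("A", "TV", 1.0), ("A", "Newspaper", 2.0), ("A", "Radio", 3.0),
    ("B", "TV", 4.0), ("B", "Online", 5.0),
]
# Hypothetical company-level variables: company -> (revenue Y, R&D spending X1).
company_vars = {"A": (10.0, 2.0), "B": (20.0, 3.0)}


def aggregate_by_company(rows, company_level):
    """Collapse the media dimension (j) so every variable varies only by company (i).
    Assumes at most one row per (company, media type), so the row count is the
    number of media types used."""
    total = defaultdict(float)
    n_media = defaultdict(int)
    for company, _media, spend in rows:
        total[company] += spend
        n_media[company] += 1
    out = []
    for company, (revenue, rd) in sorted(company_level.items()):
        out.append({
            "company": company,
            "revenue": revenue,
            "rd": rd,
            "ad_total": total[company],
            "n_media": n_media[company],
        })
    return out


aggregated = aggregate_by_company(ad_rows, company_vars)
for row in aggregated:
    print(row)
```

Each company now contributes exactly one row to REVENUE(i) = b0 + b1*R&D(i) + b2*AD_TOTAL(i) + e(i). Which aggregates to keep (totals, counts, shares) is a modeling choice, not something the sketch settles.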
>
> Thank you,
> Mike.
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of David
> Hoaglin
> Sent: Friday, May 25, 2012 10:38 PM
> To: [email protected]
> Subject: Re: st: Regression question
>
> Hi, Mike.
>
> I may not understand the structure of your data, but it seems that the
> explanatory variable that you denote by X2(ij) is actually several
> related variables. That is, in your example, j seems to index the
> various media (j = 1 for TV, j = 2 for Newspaper, etc.). In that
> situation, you should treat each of the media as a separate
> explanatory variable, with its own regression coefficient. You might
> learn from the analysis that those coefficients are essentially equal,
> in which case the interpretation would be that what matters is the
> total amount of AD spending. Then you could simplify the model by
> using total AD spending as the explanatory variable, instead of the
> amount spent on each of the types. It seems more likely, however, that
> the coefficients for the types of media will differ.
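David's suggestion can be sketched in two steps:

1. Reshape the long data so each medium becomes its own spending column.
2. Compare that per-medium model with one that uses only total AD spending.

The Python sketch below does this with hypothetical data: companies c1 to c5, two media, and revenue generated from known coefficients without noise. It also assumes that a missing (company, medium) row means zero spending. The small normal-equations solver is only for illustration. In Stata, -reshape wide- followed by -regress- would be the usual route.

```python
from collections import defaultdict


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def sse(X, y, b):
    """Residual sum of squares of a fitted linear model."""
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, yi in zip(X, y))


# Hypothetical long-format AD data: (company, medium, spending).
long_rows = [
    ("c1", "Newspaper", 2.0),
    ("c2", "TV", 1.0),
    ("c3", "TV", 3.0), ("c3", "Newspaper", 1.0),
    ("c4", "TV", 2.0), ("c4", "Newspaper", 4.0),
    ("c5", "TV", 1.0), ("c5", "Newspaper", 1.0),
]
# Hypothetical company-level data: company -> (revenue, R&D spending).
# Revenue was generated as 1 + 2*R&D + 3*TV + 0.5*Newspaper, with no noise.
company_vars = {"c1": (4.0, 1.0), "c2": (8.0, 2.0), "c3": (10.5, 0.0),
                "c4": (15.0, 3.0), "c5": (6.5, 1.0)}
media = ["TV", "Newspaper"]

# Reshape long -> wide: one spending column per medium, 0 where a company has no row.
spend = defaultdict(float)
for company, medium, amount in long_rows:
    spend[(company, medium)] += amount

companies = sorted(company_vars)
y = [company_vars[c][0] for c in companies]
X_sep = [[1.0, company_vars[c][1]] + [spend[(c, m)] for m in media] for c in companies]
X_tot = [[1.0, company_vars[c][1], sum(spend[(c, m)] for m in media)] for c in companies]

b_sep = ols(X_sep, y)  # intercept, R&D, one coefficient per medium
b_tot = ols(X_tot, y)  # intercept, R&D, a single coefficient on total AD spending
print("separate:", [round(v, 3) for v in b_sep], "SSE", round(sse(X_sep, y, b_sep), 6))
print("total:   ", [round(v, 3) for v in b_tot], "SSE", round(sse(X_tot, y, b_tot), 6))
```

The data were built with unequal TV and Newspaper coefficients, so the total-spending model leaves a clearly nonzero residual sum of squares. Had the two coefficients been equal, both models would fit equally well, which is the situation David describes where total spending is all that matters.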
>
> In interpreting the regression coefficients, please keep in mind that
> the set of other explanatory variables in the model is part of the
> definition of each coefficient, and that each estimated coefficient
> reflects the contribution of its explanatory variable after adjusting
> for the contributions of the other explanatory variables.
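David's point about adjusting for the other explanatory variables has a precise form, the Frisch-Waugh-Lovell result. The coefficient on x1 in a multiple regression equals the slope from regressing y on x1 after both have been purged of the other regressors. A minimal Python sketch checks this for two regressors plus an intercept, using hypothetical numbers: x1 plays the role of R&D and x2 of total AD spending. It also shows that dropping x2 changes the x1 coefficient.

```python
def mean(v):
    return sum(v) / len(v)


def cov_sum(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


def residuals(v, z):
    """Residuals from the simple regression of v on an intercept and z."""
    slope = cov_sum(z, v) / cov_sum(z, z)
    intercept = mean(v) - slope * mean(z)
    return [vi - intercept - slope * zi for vi, zi in zip(v, z)]


def b1_two_regressors(y, x1, x2):
    """Coefficient on x1 in the OLS fit of y on an intercept, x1 and x2 (closed form)."""
    s11, s22, s12 = cov_sum(x1, x1), cov_sum(x2, x2), cov_sum(x1, x2)
    s1y, s2y = cov_sum(x1, y), cov_sum(x2, y)
    return (s1y * s22 - s2y * s12) / (s11 * s22 - s12 ** 2)


def b1_partialled_out(y, x1, x2):
    """Same coefficient via Frisch-Waugh-Lovell: regress the part of y not explained
    by x2 on the part of x1 not explained by x2 (residuals have mean zero)."""
    ey, ex = residuals(y, x2), residuals(x1, x2)
    return sum(a * b for a, b in zip(ex, ey)) / sum(a * a for a in ex)


# Hypothetical data: y = revenue, x1 = R&D, x2 = total AD spending.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [3.0, 4.0, 8.0, 7.0, 12.0]

full = b1_two_regressors(y, x1, x2)
fwl = b1_partialled_out(y, x1, x2)
alone = cov_sum(x1, y) / cov_sum(x1, x1)  # x1 coefficient when x2 is left out
print(full, fwl, alone)
```

The first two numbers agree. The third differs because x1 and x2 are correlated in these data, so "the coefficient of R&D" depends on which other variables are in the model.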
>
> I hope you are planning to make plots of the data and use various
> regression diagnostics to spot influential observations.
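Two common diagnostics for influential observations are leverage h_i and Cook's distance D_i:

- h_i measures how unusual observation i's x value is.
- D_i combines h_i with the size of observation i's residual.

A minimal Python sketch for a simple regression follows. The data are hypothetical, with one point that has an extreme x value and breaks the pattern. In Stata, -predict- after -regress- provides these quantities through its -leverage- and -cooksd- options.

```python
def simple_fit(x, y):
    """Intercept and slope of the OLS line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def influence(x, y):
    """Leverage h_i and Cook's distance D_i for each observation of a simple regression."""
    n, p = len(x), 2  # p = number of coefficients (intercept and slope)
    a, b = simple_fit(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    lev = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    cooks = [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]
    return lev, cooks


# Hypothetical data: the last point has an unusual x value and breaks the pattern.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [2.0, 4.0, 6.0, 8.0, 5.0]
lev, cooks = influence(x, y)
most_influential = max(range(len(x)), key=lambda i: cooks[i])
print([round(h, 3) for h in lev])
print([round(d, 3) for d in cooks], "most influential index:", most_influential)
```

The leverages sum to p, the number of coefficients. The last point dominates Cook's distance even though its residual is not the largest. This is why a plot of the data, as David suggests, is worth making alongside the numbers.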
>
> David Hoaglin
>
> On Fri, May 25, 2012 at 9:52 AM, Mike Kim <[email protected]> wrote:
> > Hi all,
> >
> > This question is not about Stata, but I would appreciate your opinion.
> > I wonder whether the following regression (e.g., OLS) makes sense.
> >
> > Y(i) = b0 + b1*X1(i) + b2*X2(ij) + e(i)
> >
> > That is, Y varies by i, but some independent variables vary by both i
> > and j. Each Y(i) is repeated once for every j, so the data structure is:
> >
> > Y X1 X2
> > 10 2 1
> > 10 2 2
> > 10 2 3
> > 20 3 4
> > 20 3 5
> > ...
> >
> > For example:
> > i: company; j: advertising medium (TV, Newspaper, etc.)
> > REVENUE(i) = b0 + b1*R&D SPENDING(i) + b2*AD SPENDING BY MEDIA(ij) + e(i)
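The stacked layout in the question has a concrete consequence for OLS. Y(i) is copied onto every (i, j) row, so company i enters the fit once per medium it uses. The Python sketch below uses the question's example rows plus a hypothetical company id column, which the original table does not have. It shows this for the simplest case, a regression on a constant alone: the stacked fit is an average weighted by the number of media rows, not the plain average over companies.

```python
from collections import Counter

# The question's example rows as (company id, Y, X1, X2); the id column is hypothetical.
rows = [
    (1, 10, 2, 1), (1, 10, 2, 2), (1, 10, 2, 3),
    (2, 20, 3, 4), (2, 20, 3, 5),
]


def intercept_only_ols(values):
    """OLS of a variable on a constant alone: the estimate is the sample mean."""
    return sum(values) / len(values)


stacked_fit = intercept_only_ols([y for _, y, _, _ in rows])  # every (i, j) row counts
company_y = {i: y for i, y, _, _ in rows}                     # one Y per company
company_fit = intercept_only_ols(list(company_y.values()))
weights = Counter(i for i, *_ in rows)                        # rows per company = n_i
weighted = sum(weights[i] * company_y[i] for i in company_y) / sum(weights.values())
print(stacked_fit, company_fit, dict(weights), weighted)
```

The stacked estimate equals the n_i-weighted company average, so companies that use more media count for more. The stacked rows would also be treated as independent observations, although there are only as many distinct Y values as companies.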
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>