Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: exploratory data analysis for finding substitutes and complements
From: Cameron McIntosh <[email protected]>
To: STATA LIST <[email protected]>
Subject: RE: st: exploratory data analysis for finding substitutes and complements
Date: Fri, 30 Sep 2011 13:11:16 -0400
Hi Dimitriy,
This type of analysis might be a bit dicey without basket data (one record per customer transaction, with the transaction date and the items purchased), but I don't imagine ecological (store-level aggregate) data makes it completely impossible, either; this is discussed in the Nestorov and Jukić (2003) paper below. I don't know of anything in Stata specifically, though.
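Association-rule tools such as the arules package cited below summarize item pairs with support, confidence, and lift. As a rough illustration only, the Python sketch below computes those three measures for every pair of items in a small, made-up set of baskets; the baskets and item names are hypothetical, not anyone's real data. Lift above 1 means two items are bought together more often than independence would predict (a hint of complementarity); lift well below 1 is at most weak evidence of substitution.

```python
from collections import Counter
from itertools import combinations

# Hypothetical basket data (made up for illustration): one set of items per transaction.
baskets = [
    {"burgers", "fries"},
    {"burgers", "fries", "drinks"},
    {"hot_dogs", "fries"},
    {"hot_dogs", "fries", "drinks"},
    {"drinks"},
    {"drinks"},
]


def pair_metrics(baskets):
    """Return {(a, b): (support, confidence of a -> b, lift)} for every item pair."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    metrics = {}
    for a, b in combinations(sorted(item_counts), 2):
        n_ab = pair_counts[(a, b)]  # Counter gives 0 for pairs never bought together
        support = n_ab / n  # share of baskets containing both items
        confidence = n_ab / item_counts[a]  # estimate of P(b | a)
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        metrics[(a, b)] = (support, confidence, lift)
    return metrics


for pair, (s, c, l) in sorted(pair_metrics(baskets).items()):
    print(pair, f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

These measures need transaction-level baskets; with only weekly store totals they cannot be computed directly, which is the limitation mentioned above.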
Hahsler, M., Buchta, C., Gruen, B., & Hornik, K. (2011, September 19). Mining Association Rules and Frequent Itemsets: Package 'arules', Version 1.0-6. http://cran.r-project.org/web/packages/arules/arules.pdf ; http://cran.r-project.org/web/packages/arules/index.html ; http://cran.r-project.org/web/packages/arules/vignettes/arules.pdf
Hahsler, M., Chelluboina, S., Hornik, K., & Buchta, C. (2011). The arules R-Package Ecosystem: Analyzing Interesting Patterns from Large Transaction Data Sets. Journal of Machine Learning Research, 12, 2021-2025. http://jmlr.csail.mit.edu/papers/volume12/hahsler11a/hahsler11a.pdf
Zhang, S., & Wu, X. (2011). Fundamentals of association rules in data mining and knowledge discovery. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(2), 97-116. http://onlinelibrary.wiley.com/doi/10.1002/widm.10/pdf
Ben Messaoud, R., Loudcher Rabaséda, S., Missaoui, R., & Boussaid, O. (2008). OLEMAR: An On-Line Environment for Mining Association Rules in Multidimensional Data. In D. Taniar (Ed.), Data Mining and Knowledge Discovery Technologies (pp. 1-35). IGI Global. http://eric.univ-lyon2.fr/~sabine/adwm_2007.pdf
Khan, A., Baharudin, B., & Khan, K. (2011). Mining customer data for decision-making using new hybrid classification algorithm. Journal of Theoretical and Applied Information Technology, 27(1), 54-61. http://www.jatit.org/volumes/research-papers/Vol27No1/7Vol27No1.pdf
Nestorov, S., & Jukić, N. (2003). Ad-Hoc Association-Rule Mining within the Data Warehouse. Proceedings of the 36th Annual Hawaii International Conference on System Sciences (HICSS'03), Track 8, Volume 8. Washington, DC, USA: IEEE Computer Society.
Cam
> Date: Fri, 30 Sep 2011 11:34:50 -0400
> Subject: st: exploratory data analysis for finding substitutes and complements
> From: [email protected]
> To: [email protected]
>
> I have a panel data set with store-level sales data for 125 items at a
> chain restaurant. My variables are the quantities of each item sold at
> a particular store in a particular week, so my data look like this:
> store_id, week, hot_dogs, burgers, fries, and drinks. For each item, I
> would like to figure out which other items are its substitutes or
> complements. For example, I would expect hamburgers and fries, and hot
> dogs and fries, to be complements, and hot dogs and hamburgers to be
> substitutes. I would also like to group items into clusters to make
> some time-series graphs, since plotting all 125 items on the same
> graph is messy.
>
> My first attempt involved calculating pairwise correlations between
> items and keeping the pairs whose correlation exceeds some threshold X
> in absolute value. This works reasonably well, but I don't want to do
> this by hand for all the items, and my loop-over-items approach is
> slow and inefficient.
>
> Is there a command that can accomplish this for me? Or is there a
> better way of doing this using some sort of clustering algorithm?
>
> DVM
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
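The pairwise-correlation screen described in the question does not have to be done item by item: each series can be standardized once, every pairwise correlation computed from those, and the pairs above the threshold X kept. The Python sketch below shows that idea on made-up weekly numbers for a single store; the data, the threshold of 0.8, and the reading of a positive r as "possible complements" and a negative r as "possible substitutes" are all illustrative assumptions, not results.

```python
from itertools import combinations
from math import sqrt

# Hypothetical weekly quantities sold at ONE store (made-up numbers, not real data).
sales = {
    "hot_dogs": [10, 12, 9, 15, 11],
    "burgers": [20, 17, 22, 14, 19],
    "fries": [25, 22, 27, 20, 24],
    "drinks": [40, 38, 41, 40, 39],
}


def standardize(xs):
    """Center a series and scale it to unit length, so dot products are Pearson correlations."""
    mean = sum(xs) / len(xs)
    centered = [x - mean for x in xs]
    norm = sqrt(sum(c * c for c in centered))
    if norm == 0:
        raise ValueError("a constant series has no defined correlation")
    return [c / norm for c in centered]


def strong_pairs(data, threshold):
    """Return (item_a, item_b, r) for every pair with |r| > threshold, strongest first."""
    z = {item: standardize(xs) for item, xs in data.items()}  # standardize each item once
    pairs = []
    for a, b in combinations(sorted(z), 2):
        r = sum(p * q for p, q in zip(z[a], z[b]))
        if abs(r) > threshold:
            pairs.append((a, b, r))
    return sorted(pairs, key=lambda t: -abs(t[2]))


for a, b, r in strong_pairs(sales, threshold=0.8):
    label = "possible complements" if r > 0 else "possible substitutes"
    print(f"{a} / {b}: r = {r:+.3f} ({label})")
```

With store-level panel data, store size and overall traffic can dominate raw correlations; computing them within stores (for example, on deviations from each store's mean) is one option, but that is an assumption rather than something discussed in this thread.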
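For the clustering question, one common choice is agglomerative (hierarchical) clustering on a correlation-based distance. The Python sketch below is a minimal average-linkage version using the distance 1 - r, so items that rise and fall together end up in the same group; the data, the distance, and the cutoff are hypothetical choices for illustration, not a recommended setting.

```python
from itertools import combinations
from math import sqrt

# Hypothetical weekly quantities sold at ONE store (made-up numbers, not real data).
sales = {
    "hot_dogs": [10, 12, 9, 15, 11],
    "burgers": [20, 17, 22, 14, 19],
    "fries": [25, 22, 27, 20, 24],
    "drinks": [40, 38, 41, 40, 39],
}


def correlation(xs, ys):
    """Pearson correlation of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def average_linkage_clusters(data, cutoff):
    """Agglomerative clustering on the distance 1 - r with average linkage.

    Repeatedly merges the two closest clusters and stops once the closest
    pair of clusters is farther apart than `cutoff`.
    """
    items = sorted(data)
    dist = {}
    for a, b in combinations(items, 2):
        dist[(a, b)] = dist[(b, a)] = 1 - correlation(data[a], data[b])

    def avg_dist(c1, c2):
        # Average linkage: mean distance over all cross-cluster item pairs.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[item] for item in items]
    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), avg_dist(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if d > cutoff:
            break
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # safe because combinations() guarantees i < j
    return clusters


print(average_linkage_clusters(sales, cutoff=0.5))
```

Each resulting cluster could get its own time-series graph. Using 1 - |r| instead would also put strong substitutes in the same group; which distance suits the graphs is a judgment call.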