Calendar Views on Consumption
In this project, we began tackling the visualization of consumption patterns in 729 Portuguese supermarkets from May 2012 to April 2014. To represent the data, the Calendar View uses a small-multiples approach that shows deviations from typical consumption. Small multiples use display space efficiently, maximizing data density while minimizing the use of ink. We analyzed consumption by Department (the highest level of the product hierarchy), by Business Unit, and finally by Product (the lowest level).
Calendar: Cod Fish
Eating cod fish on Christmas Eve is an important Portuguese tradition, which is consistent with the positive deviations observed in December 2013. High deviations also appear on July 27-28, coinciding with discounts on cod fish during that period.
The dataset comprises 278 GB of data covering 2.86 billion transactions in 729 supermarkets from May 2012 to April 2014. Each transaction records the product purchased, the time of purchase, and an anonymized customer ID.
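Before any baseline can be computed, raw transactions have to be rolled up into a daily consumption series per product. The sketch below shows one way to do this; the `Transaction` record, its field names, and `daily_consumption` are hypothetical, and counting transactions as the consumption measure is an assumption (the source does not say whether consumption means transactions, units sold, or revenue).

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Transaction:
    # Hypothetical field names: the source only states that each record holds
    # the product, the purchase time, and an anonymized customer id.
    product: str
    timestamp: datetime
    customer_id: str


def daily_consumption(transactions, product) -> dict[date, int]:
    """Count the transactions of `product` on each calendar day (assumed consumption measure)."""
    counts = Counter()
    for t in transactions:
        if t.product == product:
            counts[t.timestamp.date()] += 1
    return dict(counts)


sample = [
    Transaction("cod fish", datetime(2013, 12, 23, 10, 5), "c1"),
    Transaction("cod fish", datetime(2013, 12, 23, 18, 40), "c2"),
    Transaction("cod fish", datetime(2013, 12, 24, 9, 15), "c1"),
    Transaction("olive oil", datetime(2013, 12, 24, 9, 16), "c1"),
]
print(daily_consumption(sample, "cod fish"))
```

At the scale of 2.86 billion transactions this aggregation would in practice run in a database or distributed engine rather than a Python loop; the loop only makes the roll-up explicit.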
Finding Deviations: Baselines
After an initial analysis, we observed that most weeks repeat the same weekly behavior. Exploiting this periodicity, we emphasize atypical days by computing a week-based baseline. Baselines are computed by clustering similar normalized weekly patterns: two patterns are considered similar if the Euclidean distance between them is below a given threshold. Our clustering approach is a centroid-based algorithm that assigns each pattern to a cluster according to its distance to the cluster's centroid.
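A minimal sketch of threshold-based centroid clustering of weekly patterns follows. It assumes each week is a 7-day consumption vector normalized to unit sum, that a week joins the nearest cluster when its Euclidean distance to that centroid is below the threshold (otherwise it starts a new cluster), and that centroids are the running mean of their members. The normalization, the update rule, and the names `normalize` and `cluster_weeks` are illustrative assumptions, not the project's exact method.

```python
import numpy as np


def normalize(week):
    """Scale a 7-day vector to unit sum so weeks compare by shape (assumed normalization)."""
    week = np.asarray(week, dtype=float)
    total = week.sum()
    return week / total if total > 0 else week


def cluster_weeks(weeks, threshold):
    """Assign each normalized week to the nearest centroid closer than `threshold`,
    or open a new cluster; centroids are the mean of their members."""
    centroids, members, labels = [], [], []
    for p in (normalize(w) for w in weeks):
        if centroids:
            dists = [np.linalg.norm(p - c) for c in centroids]
            best = int(np.argmin(dists))
            if dists[best] < threshold:
                members[best].append(p)
                centroids[best] = np.mean(members[best], axis=0)  # update centroid
                labels.append(best)
                continue
        # No centroid is close enough: this week seeds a new cluster.
        centroids.append(p.copy())
        members.append([p])
        labels.append(len(centroids) - 1)
    return centroids, labels


weeks = [
    [10, 12, 11, 13, 20, 30, 5],   # typical week, weekend peak
    [11, 12, 12, 14, 21, 29, 6],   # similar shape
    [10, 10, 10, 10, 10, 10, 10],  # flat week, different shape
]
centroids, labels = cluster_weeks(weeks, threshold=0.05)
# The baseline of a week is its cluster centroid; the daily deviation is the difference.
deviations = [normalize(w) - centroids[k] for w, k in zip(weeks, labels)]
print(labels)
```

With unit-sum normalization the deviations capture changes in the shape of a week rather than its total volume; flagging whole-week surges would need a different normalization.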
Visualization: Calendar View
We developed a Calendar View that gives an overview of the deviations from the baselines and enables comparison between them. Each day is represented by a rectangle whose top and bottom edges represent, respectively, the maximum and minimum consumption values. The baseline is drawn as a black horizontal line over the rectangle. From the baseline, we draw a second rectangle whose height corresponds to that day's deviation in consumption, colored red if the deviation is positive and Persian green if it is negative.
This visualization provides two levels of information: (i) an overview that reveals all the days with high deviations, and (ii) a local view that lets us quantify each specific deviation. The Calendar View highlights deviations over time, factoring out periodic repetitions and emphasizing singular moments.
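The per-day layout described above can be sketched as a small geometry function. It assumes screen coordinates where y grows downward (as in SVG), a linear mapping of the consumption range onto each cell with the maximum at the top edge and the minimum at the bottom, and `#00A693` as the hex value for Persian green. The names `DayGlyph` and `day_glyph` are hypothetical.

```python
from dataclasses import dataclass

RED = "#FF0000"
PERSIAN_GREEN = "#00A693"  # assumed hex value for Persian green


@dataclass
class DayGlyph:
    baseline_y: float  # y of the black baseline line
    bar_y: float       # top edge of the deviation bar
    bar_height: float  # height of the deviation bar (the day's deviation)
    color: str


def day_glyph(value, baseline, vmin, vmax, cell_y, cell_height):
    """Lay out one calendar cell: vmax maps to the top edge (cell_y), vmin to the
    bottom edge (cell_y + cell_height). Assumes vmax > vmin."""
    def to_y(v):
        return cell_y + (vmax - v) * cell_height / (vmax - vmin)

    base_y = to_y(baseline)
    value_y = to_y(value)
    return DayGlyph(
        baseline_y=base_y,
        bar_y=min(base_y, value_y),           # bar grows up or down from the baseline
        bar_height=abs(value_y - base_y),
        color=RED if value > baseline else PERSIAN_GREEN,
    )


print(day_glyph(value=150, baseline=100, vmin=0, vmax=200, cell_y=0, cell_height=40))
```

Because every cell shares the same scale, bar heights remain comparable across days, which is what makes the overview useful for spotting the largest deviations.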