Calendar Views on Consumption

In this project, we visualize consumption patterns in 729 Portuguese supermarkets from May 2012 to April 2014. To represent the data, the Calendar View uses a small-multiples approach to show deviations from typical consumption. Small multiples use the display space efficiently, maximizing data density while minimizing the use of ink. We analyzed consumption by Department (the highest level of the product hierarchy), by Business Unit, and finally by Product (the bottom level).
 
 

Calendar: Drinks

 
Looking at drinks consumption, noticeable positive deviations can be found at Christmas, on New Year’s Eve, during the summer, and on other Portuguese holidays such as São João, on June 23-24.
 

Figure 1

Positive and negative deviations in the Drinks Business Unit, July 2012 – April 2014



Calendar: Cod Fish

 
Eating cod fish on Christmas Eve is an important Portuguese tradition, which is consistent with the positive deviations observed in December 2013. High deviations can also be found on July 27-28, coinciding with discounts on cod fish during that period.
 

Figure 2

Positive and negative deviations in the consumption of cod fish, March 2013 – March 2014.



Calendar: Condoms

 
Portuguese people also enjoy fishing in other waters. Condom sales increase on Valentine’s Day, on New Year’s Eve, during the summer overall, and on May 1st, Labor Day.
 

Figure 3

Positive and negative deviations in the sales of condoms, July 2012 – April 2014



Data

 
The dataset is 278 GB in size and holds 2.86 billion transactions from 729 supermarkets, from May 2012 to April 2014. Each transaction records the product acquired, the time of purchase, and the anonymized customer id.
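As an illustration of how such records could be turned into a daily series, here is a minimal sketch. The field names, the `Transaction` class, and the `daily_counts` helper are hypothetical, and counting transactions as a stand-in for consumption is an assumption; the project does not state how consumption was measured.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Transaction:
    # Hypothetical field names; the source only lists what each record holds.
    product: str
    purchased_at: datetime
    customer_id: str


def daily_counts(transactions, product):
    """Count the transactions for one product on each calendar day."""
    counts = Counter()
    for t in transactions:
        if t.product == product:
            counts[t.purchased_at.date()] += 1
    return dict(counts)
```

A series like this, grouped into weeks, is the kind of input the baseline computation would need.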
 
 

Finding Deviations: Baselines

 
After an initial analysis, we detected a weekly pattern that repeats across most weeks. Given this periodic behavior, we created a mechanism to emphasize atypical days by computing a week-based baseline. The baselines are computed by clustering similar normalized patterns. Two patterns are considered similar if the Euclidean distance between them is below a certain threshold. Our clustering approach is a centroid-based algorithm that assigns points to a cluster according to their distance to the cluster’s centroid.
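The clustering idea can be sketched roughly as follows. This is not the project's implementation: the sum-to-one normalization, the greedy single pass over the weeks, the running-mean centroid update, and the names `normalize`, `euclidean`, and `cluster_weeks` are all assumptions made for illustration.

```python
import math


def normalize(week):
    """Scale a week of daily totals so the values sum to 1 (assumed normalization)."""
    total = sum(week)
    return [v / total for v in week] if total else list(week)


def euclidean(a, b):
    """Euclidean distance between two equally long patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_weeks(weeks, threshold):
    """Greedy centroid-based clustering of normalized weekly patterns.

    Each week joins the cluster with the nearest centroid if that distance
    is below the threshold; otherwise it starts a new cluster.
    """
    clusters = []
    for index, week in enumerate(weeks):
        pattern = normalize(week)
        best = min(
            clusters,
            key=lambda c: euclidean(pattern, c["centroid"]),
            default=None,
        )
        if best is not None and euclidean(pattern, best["centroid"]) < threshold:
            best["members"].append(index)
            best["patterns"].append(pattern)
            n = len(best["patterns"])
            # The centroid is the per-day mean of all patterns in the cluster.
            best["centroid"] = [sum(col) / n for col in zip(*best["patterns"])]
        else:
            clusters.append(
                {"centroid": pattern, "members": [index], "patterns": [pattern]}
            )
    return clusters
```

Under these assumptions, the centroid of a cluster can serve as the week-based baseline for the weeks assigned to it, and the threshold controls how many distinct baselines emerge.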
 

Figure 4

First two clusters of the Business Unit Fruits and Vegetables.



Visualization: Calendar View

 
We developed a Calendar View to create an overview of the deviations from the baselines, enabling the comparison between deviations. Each day is represented by a rectangle. The top and bottom edges of the rectangle represent, respectively, the maximum and minimum consumption values. The baseline is a black horizontal line drawn across the rectangle. From each baseline, we draw a rectangle whose height corresponds to the deviation in consumption for that day, colored red if the deviation is positive and Persian green if it is negative.
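The geometry of a single day cell can be sketched as below. The function name `day_glyph`, its parameters, the linear mapping, and the exact red hex value are assumptions for illustration; `#00A693` is the usual hex code for Persian green. Screen coordinates are assumed, with y growing downward, and `vmax` is assumed to be greater than `vmin`.

```python
def day_glyph(value, baseline, vmin, vmax, cell_top, cell_height):
    """Return the baseline position and deviation bar for one day cell."""

    def to_y(v):
        # vmax maps to the top edge of the cell, vmin to the bottom edge.
        return cell_top + (vmax - v) / (vmax - vmin) * cell_height

    baseline_y = to_y(baseline)
    value_y = to_y(value)
    deviation = value - baseline
    return {
        "baseline_y": baseline_y,
        # The bar spans from the baseline to the day's value.
        "bar_y": min(baseline_y, value_y),
        "bar_height": abs(value_y - baseline_y),
        # Red for positive deviations, Persian green otherwise
        # (a zero deviation yields a bar of height 0).
        "color": "#FF0000" if deviation > 0 else "#00A693",
    }
```

For example, with a cell 100 units tall covering values from 0 to 200, a baseline of 100 sits at mid-height, and a day with a value of 150 gets a red bar that rises a quarter of the cell above it.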
 

Figure 5

Scheme of the calendar design


This visualization gives us two levels of information: (i) a general overview of all the days with higher deviations, and (ii) a local view that makes it possible to quantify a specific deviation. The Calendar View highlights deviations over time, filtering out periodic repetitions and emphasizing singular moments.
 
 

Interactive Application

 
We have created an application to ease the selection and exploration of calendars for any product category.
 

Figure 6

Application: Hello Screen


 
Figure 7

Application: Selecting a calendar


 
Figure 8

Application: Browsing


 

Authors

Catarina Maçãs

Pedro Miguel Cruz

Evgheni Polisciuc

Hugo Amaro

Penousal Machado


Acknowledgements

This project was developed within a partnership with Sonae: Sonae Viz — Big Data Visualization for retail. Special thanks to Tiago Carvalho, Frederico Santos and João Amaral.