Hexagonal Gridded Maps and Information Layers: an Approach for the Exploration and Analysis of Retail Data

In retail business intelligence, and more specifically in the analysis of customer-supermarket relationships, factors such as the geographic location of customers, demographic distribution, customer preferences, and accessibility of the store are crucial for decision-making. Visualization is an important tool for analysis and decision-making. This work presents a novel approach to hexagonal gridded maps that integrates diverse information layers with adaptive zoom. The visualization provides means to: (i) explore and analyze data on customer-supermarket relations; (ii) reveal the impact of supermarket location on customer preferences; (iii) suggest areas with low supermarket coverage. Additionally, the interplay among the complementary graphical layers provided by the visualization increases its exploratory and analytical power.
  

Diverse combinations of graphical layers along with the hexagonal gridded map.
Figure 1



 
 

Data

 
The consumption datasets were retrieved from 729 Portuguese supermarkets and hypermarkets of the Sonae chain, which cover the entire country. Purchases are linked to the customer through a client card issued per household, which can be shared by the entire family. Each transaction corresponds to one product bought and has several properties, such as the amount spent (in euros) and the date, time, and place of purchase. The data for this project were aggregated by month and divided into three datasets.
 

  • Dataset A contains the total amount spent and the number of visits to supermarkets for each customer.
  • Dataset B contains customer–supermarket pairs with the amount spent at, and the number of visits to, that particular supermarket.
  • Dataset C is composed of customer–supermarket pairs with the supermarket preference score.

 
The nearest supermarket was calculated in two ways: (i) by geometric distance, i.e. the nearest supermarket in a straight line; and (ii) by distance along the road network, i.e. the nearest supermarket taking the road network topology into account.
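The two notions of "nearest" can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names, the haversine formula for straight-line distance, and the adjacency-list road graph with a multi-source Dijkstra search are all assumptions chosen for illustration.

```python
import heapq
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_straight_line(customer, stores):
    """(i) Id of the store with the smallest straight-line distance.

    stores: {store_id: (lat, lon)} (hypothetical layout).
    """
    return min(stores, key=lambda sid: haversine_km(customer, stores[sid]))


def nearest_by_road(graph, store_nodes):
    """(ii) Multi-source Dijkstra over a road graph.

    graph: {node: [(neighbour, edge_length), ...]} (hypothetical layout).
    store_nodes: {node: store_id} for the nodes where stores are located.
    Returns {node: (network_distance, store_id)} for every reachable node.
    """
    best = {}
    heap = [(0.0, node, sid) for node, sid in store_nodes.items()]
    heapq.heapify(heap)
    while heap:
        dist, node, sid = heapq.heappop(heap)
        if node in best:
            continue  # already settled with a shorter distance
        best[node] = (dist, sid)
        for nxt, length in graph.get(node, []):
            if nxt not in best:
                heapq.heappush(heap, (dist + length, nxt, sid))
    return best
```

Starting the search from all stores at once labels every road node with its closest store in a single pass, which is also one way to obtain a road-network counterpart of a Voronoi partition.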

 
 

Hexagonal Binning

 
Hexagonal binning is a process that produces hexagonal grids of variable resolution, subdividing space according to the density of points in geographical space. Similar to a quadtree, the hexagonal tree is a hierarchical structure with hexagons at the leaves and compositions of seven copies at the higher levels of the tree. Constructing the hexagonal tree and inserting points is more involved than for a quadtree: since a hexagon cannot be evenly subdivided into smaller hexagons, we employed an algorithm based on a simplified version of the snowflake fractal.
 
The process proceeds in five successive steps:

  1. start with a hexagon and recursively subdivide it down to a certain depth;
  2. compute the geometry of the polygons at each level using a bottom-up approach;
  3. insert the points into the tree;
  4. merge branches that contain no points, or whose number of points is below a certain threshold;
  5. compute the composite circles for each cell.
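As a simplified illustration of the point-insertion and counting idea, the sketch below bins points into a flat, single-resolution hexagonal grid using axial coordinates and cube rounding. It deliberately swaps the hierarchical, variable-resolution tree described above for a simpler technique; the function names, the pointy-top orientation, and the assumption that points are already in planar (projected) coordinates are all illustrative choices, not the method of this work.

```python
import math
from collections import Counter


def _cube_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon."""
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    # Re-derive the component with the largest rounding error so x + y + z == 0.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def point_to_hex(x, y, size):
    """Axial (q, r) of the pointy-top hexagon containing (x, y).

    size is the hexagon circumradius (centre to corner).
    """
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return _cube_round(q, r)


def hex_center(q, r, size):
    """Planar centre of the hexagon with axial coordinates (q, r)."""
    return (size * math.sqrt(3) * (q + r / 2), size * 1.5 * r)


def hex_bin(points, size):
    """Count the points that fall into each hexagonal cell."""
    return Counter(point_to_hex(x, y, size) for x, y in points)
```

Calling `hex_bin` with different `size` values gives coarser or finer aggregations of the same points; merging sparse cells, as in step 4, is not covered by this sketch.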

 

Visualization of dataset B in a close view of the Lisbon area.
Figure 2


 
 

Composite Circles

 

The aggregation of points is performed in each cell of the grid and is represented by composite circles. The composite circles, located at the center of each cell, depict the type of points found in the corresponding cell. The number of points of each type is encoded by the area of the corresponding circle. The order in which the circles are drawn depends on the frequency of points of each type – from the inner circle to the outer circle, from high to low numbers, respectively.
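One possible reading of this encoding can be sketched in Python. The sketch assumes that the circles are concentric and that each type's count is encoded by the area of its own band (the inner disc for the most frequent type, then successive rings), so the radii follow the cumulative counts. This interpretation, the function name, the `area_per_point` parameter, and the type labels in the usage note are assumptions, not details confirmed by the text.

```python
import math


def composite_circle_radii(counts, area_per_point=1.0):
    """Return [(type, outer_radius)] ordered from the innermost to the outermost circle.

    counts: {type: number_of_points} for a single cell.
    Assumption: each type's band (inner disc or ring) has an area equal to
    count * area_per_point, so the outer radius depends on the cumulative count.
    """
    # Most frequent type first, i.e. innermost.
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    radii, cumulative = [], 0
    for kind, n in ordered:
        cumulative += n
        radii.append((kind, math.sqrt(cumulative * area_per_point / math.pi)))
    return radii
```

For example, `composite_circle_radii({"nearest": 9, "other": 16})` (hypothetical type labels) places "other" in the inner disc and "nearest" in the surrounding ring. When drawing, the circles would be painted from the outermost to the innermost so that the inner ones stay visible.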

Aggregation by hexagonal grid (left) and traditional rectangular grid (right).
Figure 3


Adaptive Zooming

 
Adaptive zooming enables users to identify areas of interest at higher levels of zoom (overview) and to analyze these areas in more detail at lower levels of zoom (close-up), while retaining a coherent representation at each zoom level. The map provides different levels of abstraction and granularity depending on the zoom level. In particular, the cells are scaled proportionally to the zoom level, providing different granularities of aggregation. Consequently, the variation in the number of points per cell diminishes as the user zooms in, providing a close-to-optimal representation at the lower levels of zoom.
 

Illustration of the adaptive zooming behavior. Images from left to right correspond to low, middle and high levels of zoom. The visualization represents the values from dataset B.
Figure 4



 
 

Bivariate Color Space

 
To encode dataset A we used the following scheme: two corners of the triangle encode high values of consumption and frequency, respectively; the top corner encodes low values for both components; the remaining colors are linearly interpolated. With the color scheme generated, we map the normalized values as follows: first we locate a point on the consumption edge (point A) and a point on the frequency edge (point B); then, to find the corresponding color, we take the ratio between A and B and project it onto the segment AB. The resulting point C indicates the color that the point on the map should take.
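The mapping can be sketched in Python under a few stated assumptions: colors are interpolated in RGB, the corner colors are pure yellow, red and cyan, and the "ratio between A and B" is taken to be the share of frequency in the sum of both normalized values. The function names and this reading of the projection step are illustrative and may differ from the scheme actually used.

```python
YELLOW, RED, CYAN = (255, 255, 0), (255, 0, 0), (0, 255, 255)


def lerp(c0, c1, t):
    """Linear interpolation between two RGB colors, t in [0, 1]."""
    return tuple(a + (b - a) * t for a, b in zip(c0, c1))


def bivariate_color(consumption, frequency):
    """Map two values normalized to [0, 1] to an RGB color.

    Point A lies on the yellow-red (consumption) edge and point B on the
    yellow-cyan (frequency) edge. Point C is placed on segment AB at
    t = frequency / (consumption + frequency), which is an assumed
    interpretation of the ratio-and-project step.
    """
    a = lerp(YELLOW, RED, consumption)   # point A
    b = lerp(YELLOW, CYAN, frequency)    # point B
    total = consumption + frequency
    t = frequency / total if total > 0 else 0.0
    return tuple(round(v) for v in lerp(a, b, t))  # point C
```

With this reading, zero consumption and zero frequency give yellow, while the maximum of a single component gives red or cyan respectively. A discrete version of the color space would additionally quantize the two inputs into a fixed number of steps.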

Discrete triangular color space. Red and cyan colors correspond to the upper bound for consumption and frequency of visits, respectively. Yellow color corresponds to zero consumption and frequency of visits. Other colors are linearly interpolated. Points A, B and C exemplify the projection of values to this color space.
Figure 5


 
 

Graphical Layers

 
The two additional layers depict different types of information and have different goals: (i) the identification of territorial domain boundaries and areas with low supermarket coverage;
 

The blue lines show the traditional Voronoi diagram; the black lines show the Voronoi diagram distorted in accordance with the road network.
Figure 6



 
(ii) the representation of the population densities, providing the user with additional information about the demographics.
 
The shades of gray encode population density; the data were scaled using a predefined maximum threshold.
Figure 7
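Scaling with a predefined maximum threshold can be sketched as a clamp followed by a linear mapping to a gray level. The function name, the 8-bit output range, and the choice that darker shades mean denser areas are assumptions made for illustration.

```python
def density_to_gray(density, max_threshold):
    """Map a population density to an 8-bit gray level (255 = white, 0 = black).

    Densities above max_threshold are clamped, so a few extreme values do not
    compress the rest of the scale. Darker = denser is an assumed convention.
    """
    if max_threshold <= 0:
        raise ValueError("max_threshold must be positive")
    t = min(max(density, 0.0), max_threshold) / max_threshold
    return round(255 * (1.0 - t))
```

Clamping at a threshold trades accuracy in the densest areas for better contrast everywhere else.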


 
 

Findings

 

The figure below shows a close view of the Lisbon area. As can be observed, areas with a high frequency of visits are in most cases located near supermarkets. This behavior is expected, since customers who live near a supermarket tend to purchase in smaller quantities but more frequently, and vice versa. It is also possible to find areas with greater purchasing power. Finally, a strong correlation with rich neighborhoods is visible, such as Odivelas, central Lisbon, the south of Oeiras, and Cascais.

Close view of the Lisbon area. Visualization of dataset A for May 2012.
Figure 8


 
Another case study captures the customers who reside around the intersection of three districts: Coimbra, Viseu and Guarda. These areas are mostly mountainous. Nevertheless, a significant number of customers seems to live there. As can be observed, these clients are colored in red, which means that they prefer not to shop at the nearest supermarket. We discovered that SONAE has recently opened a new store at the intersection of the three districts.
  

Visualization of the negative impact of distance.
Figure 9


 
 

In Proceedings

  • E. Polisciuc, C. Maçãs, F. Assunção, and P. Machado, “Hexagonal Gridded Maps and Information Layers: A Novel Approach for the Exploration and Analysis of Retail Data,” in SIGGRAPH ASIA 2016 Symposium on Visualization, New York, NY, USA, 2016, pp. 6:1–6:8.

Author

Evgheni Polisciuc

Catarina Maçãs

Filipe Assunção

Penousal Machado


Date

07/12/2016


Acknowledgements

This project is partially funded by Fundação para a Ciência e Tecnologia (FCT), Portugal, under the grant SFRH/BD/109745/2015.