# Hexagonal Gridded Maps and Information Layers: an Approach for the Exploration and Analysis of Retail Data

In retail business intelligence, and more specifically in the analysis of customer-supermarket relationships, factors such as the geographic location of customers, demographic distribution, customer preferences, and accessibility of the store are crucial in decision-making tasks. Visualization is an important tool for analysis and decision-making. This work presents a novel approach to hexagonal gridded maps that integrates diverse information layers with adaptive zoom. The visualization provides means to: (i) explore and analyze data regarding customer-supermarket relations; (ii) reveal the impact of supermarket location on customer preferences; (iii) suggest areas of low supermarket coverage. Additionally, the interplay among the complementary graphical layers provided by the visualization increases its exploratory and analytical power.

### Data

The consumption data were retrieved from 729 Portuguese supermarkets and hypermarkets of the Sonae chain, which cover the entire country. Purchases are linked to the client through a client card issued per household, which can be shared by the entire family. Each transaction corresponds to one product bought and has several properties, such as the amount spent (in euros), date, time, and place of purchase. The data for this project were aggregated by month and divided into three datasets.

- Dataset A contains the total amount spent and the number of visits to the supermarkets for each customer.
- Dataset B contains customer–supermarket pairs with the amount spent at, and the number of visits to, that particular supermarket.
- Dataset C is composed of customer–supermarket pairs with the supermarket preference score.
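As a rough illustration of how datasets A and B relate, the sketch below aggregates per-product transaction rows into both shapes. The field names (`customer`, `store`, `date`, `amount`), the sample rows, and the rule that one visit equals one distinct day at a store are all assumptions made for this example; the source does not specify how visits were counted. Dataset C is not covered, since the preference score is not defined here.

```python
from collections import defaultdict

# Hypothetical rows for a single month: one row per product bought.
transactions = [
    {"customer": "c1", "store": "s1", "date": "day-01", "amount": 2.50},
    {"customer": "c1", "store": "s1", "date": "day-01", "amount": 4.00},
    {"customer": "c1", "store": "s2", "date": "day-09", "amount": 10.00},
    {"customer": "c2", "store": "s1", "date": "day-02", "amount": 7.25},
]

def aggregate(rows):
    """Return (dataset_a, dataset_b) built from per-product transaction rows."""
    spent = defaultdict(float)      # (customer, store) -> total amount
    visit_days = defaultdict(set)   # (customer, store) -> distinct purchase days
    for row in rows:
        key = (row["customer"], row["store"])
        spent[key] += row["amount"]
        visit_days[key].add(row["date"])

    # Dataset B: one record per customer-supermarket pair.
    dataset_b = {
        key: {"spent": spent[key], "visits": len(visit_days[key])}
        for key in spent
    }

    # Dataset A: per-customer totals, summed over all supermarkets.
    dataset_a = {}
    for (customer, _store), record in dataset_b.items():
        totals = dataset_a.setdefault(customer, {"spent": 0.0, "visits": 0})
        totals["spent"] += record["spent"]
        totals["visits"] += record["visits"]
    return dataset_a, dataset_b

dataset_a, dataset_b = aggregate(transactions)
```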

The nearest supermarket was computed in two ways: (i) through geometric distances, i.e. the nearest supermarket in a straight line; and (ii) through distances computed by traversing the road network, i.e. the nearest supermarket taking the road network topology into account.
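The two notions of "nearest" can disagree, which the following minimal sketch tries to make concrete. It assumes great-circle (haversine) distance as the straight-line measure and Dijkstra's algorithm over a weighted adjacency list as the road-network measure; the source does not say which algorithms were actually used, and the coordinates, node names, and edge lengths are hypothetical.

```python
import heapq
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def nearest_straight_line(customer, stores):
    """Id of the store with the smallest straight-line distance to the customer."""
    return min(stores, key=lambda s: haversine_km(customer, stores[s]))

def nearest_by_road(graph, source, store_nodes):
    """Dijkstra from `source`; return (store, distance) for the first store reached."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        if node in store_nodes:
            return node, d
        for neighbor, length in graph.get(node, []):
            nd = d + length
            if nd < dist.get(neighbor, math.inf):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return None, math.inf

# Hypothetical customer and store positions as (lat, lon).
customer = (38.72, -9.14)
stores = {"s1": (38.73, -9.14), "s2": (38.80, -9.20)}

# Hypothetical road graph: node -> [(neighbor, length in km)].
roads = {
    "home": [("a", 1.0), ("b", 2.0)],
    "a": [("s1", 6.0)],
    "b": [("s2", 1.5)],
}

by_line = nearest_straight_line(customer, stores)
by_road, road_km = nearest_by_road(roads, "home", {"s1", "s2"})
```

In this toy setup the straight-line choice is `s1`, while the road network favors `s2` because the only road to `s1` is a long detour.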

### Hexagonal Binning

Hexagonal binning is a process that produces hexagonal grids of variable resolution, subdividing space according to the density of points in geographical space. Similar to quadtrees, the hexagonal tree uses a hierarchical structure with hexagons at the leaves and a composition of seven copies at higher levels of the tree. The construction of the hexagonal tree and the insertion of points are more involved than for quadtrees. Since a hexagon cannot be evenly subdivided into smaller hexagons, we employed an algorithm based on a simplified version of the Snowflake fractal.

The process proceeds in five successive steps:

- start with the hexagon and recursively subdivide it down to a certain depth;
- compute the geometry of the polygons at each level using a bottom-up approach;
- insert the points into the tree;
- merge branches that contain no points or fewer points than a certain threshold;
- compute composite circles for each cell.
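The subdivide, insert, and merge steps above can be sketched on a purely structural tree. This is a simplified illustration, not the authors' implementation: the polygon geometry and the snowflake-based subdivision are omitted, so a point is placed by a hypothetical `path` of child indices (0 to 6) that a real implementation would derive from point-in-polygon tests. The names `HexNode`, `build`, `insert`, `merge_sparse`, and `leaves` are invented for this example.

```python
from dataclasses import dataclass, field

@dataclass
class HexNode:
    """One cell of the hexagonal tree; geometry is omitted in this sketch."""
    points: int = 0                                # points held by this cell
    children: list = field(default_factory=list)   # seven sub-cells, empty for a leaf

def build(depth):
    """Recursively subdivide a cell into seven copies down to `depth`."""
    node = HexNode()
    if depth > 0:
        node.children = [build(depth - 1) for _ in range(7)]
    return node

def insert(node, path):
    """Add one point to the leaf reached by following `path` (child indices 0..6)."""
    for index in path:
        node = node.children[index]
    node.points += 1

def merge_sparse(node, threshold):
    """Collapse every branch holding fewer than `threshold` points; return its count."""
    total = node.points + sum(merge_sparse(c, threshold) for c in node.children)
    if node.children and total < threshold:
        node.points, node.children = total, []     # the branch becomes a single cell
    return total

def leaves(node):
    """All cells that remain after merging, i.e. the cells that get drawn."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

root = build(2)                 # 7 * 7 = 49 leaf cells
for _ in range(5):
    insert(root, [0, 0])        # a dense area
insert(root, [3, 2])            # a single isolated point
total_points = merge_sparse(root, threshold=3)
remaining_cells = leaves(root)
```

With a threshold of 3, only the branch containing the dense area keeps its seven sub-cells; the other six branches collapse into one coarse cell each, which gives the variable resolution described above.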

### Composite Circles

The aggregation of points is performed in each cell of the grid and is represented by composite circles. The composite circles, located at the center of each cell, depict the types of points found in the corresponding cell. The number of points of each type is encoded by the area of the corresponding circle. The order in which the circles are drawn depends on the frequency of points of each type: from the inner circle to the outer circle, the types are ordered from the highest to the lowest count.
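One way to turn counts into concentric circles is sketched below. It rests on an assumption the text does not settle: that each type's count is encoded by the area of its own band (the inner disc for the most frequent type, then one ring per remaining type), so outer radii follow from cumulative counts. The function name `composite_radii` and the `area_per_point` scale are hypothetical.

```python
import math

def composite_radii(counts, area_per_point=1.0):
    """Return [(type, outer_radius)] ordered from the innermost to the outermost circle.

    counts: mapping of point type -> number of points in the cell.
    """
    # Most frequent type first, so it ends up as the inner circle.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    radii, cumulative = [], 0
    for kind, n in ordered:
        cumulative += n
        # Area of a disc is pi * r^2, so r = sqrt(area / pi).
        radii.append((kind, math.sqrt(cumulative * area_per_point / math.pi)))
    return radii

# Hypothetical cell with two point types.
rings = composite_radii({"type_a": 1, "type_b": 3}, area_per_point=math.pi)
```

When rendering, drawing the list in reverse (outermost first) lets each smaller circle overlay the larger ones.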

### Adaptive Zooming

Adaptive zooming enables users to identify areas of interest at higher levels of zoom and to analyze these areas in more detail at lower levels of zoom, while retaining a coherent representation at each zoom level. The map provides different levels of abstraction and granularity depending on the zoom level. In particular, the cells are scaled proportionally to the zoom level, providing different granularities of aggregation. Consequently, the variation in the number of points per cell diminishes as the user zooms in, providing a close-to-optimal representation at the lower levels of zoom.

### Bivariate Color Space

To encode dataset A we used a triangular color scheme: two corners of the triangle encode high values for consumption and for frequency, respectively; the top corner encodes low values for both components; the other colors are linearly interpolated. Once the color scheme is generated, we map the normalized values as follows: first we locate a point on the consumption edge (point A) and a point on the frequency edge (point B); then, to find the corresponding color, we take the ratio between A and B and project it onto the segment AB. The resulting point C indicates the color that the point on the map should take.
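The mapping can be read in more than one way; the sketch below shows one plausible interpretation, offered as an assumption and not as the authors' exact formula. Point A is interpolated along the edge from the low corner to the high-consumption corner, point B along the edge from the low corner to the high-frequency corner, and C is placed on segment AB at the fraction `frequency / (consumption + frequency)`. The three corner colors are hypothetical placeholders.

```python
def lerp(c0, c1, t):
    """Linear interpolation between two RGB tuples, t in [0, 1]."""
    return tuple(a + (b - a) * t for a, b in zip(c0, c1))

# Hypothetical corner colors of the triangle (RGB, 0-255).
LOW = (255, 255, 255)              # low consumption and low frequency
HIGH_CONSUMPTION = (220, 30, 30)
HIGH_FREQUENCY = (30, 30, 220)

def bivariate_color(consumption, frequency):
    """Map two normalized values in [0, 1] to an RGB color inside the triangle."""
    a = lerp(LOW, HIGH_CONSUMPTION, consumption)   # point A on the consumption edge
    b = lerp(LOW, HIGH_FREQUENCY, frequency)       # point B on the frequency edge
    total = consumption + frequency
    # Assumed reading of "ratio between A and B": the share of frequency in the total.
    t = 0.5 if total == 0 else frequency / total
    c = lerp(a, b, t)                              # point C on the segment AB
    return tuple(round(v) for v in c)

low_both = bivariate_color(0.0, 0.0)
only_consumption = bivariate_color(1.0, 0.0)
only_frequency = bivariate_color(0.0, 1.0)
```

Under this reading, the three extreme inputs land exactly on the three corners of the triangle, and equal consumption and frequency fall on the midpoint of AB.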

### Graphical Layers

The two additional layers depict different types of information and have different goals: (i) the identification of territorial domain boundaries and areas with low supermarket coverage; (ii) the representation of population densities, providing the user with additional information about the demographics.

### Findings

The figure below illustrates a close view of the Lisbon area. As can be observed, areas with a high frequency of visits are in most cases located near supermarkets. This behavior is expected, since customers who live near a supermarket tend to purchase smaller quantities but more frequently, and vice versa. It is also possible to find areas with greater purchasing power. Finally, a strong correlation with rich neighborhoods is visible, such as Odivelas, central Lisbon, south of Oeiras, and Cascais.

Another case study captures the customers who reside around the intersection of three districts: Coimbra, Viseu and Guarda. These areas are mostly covered with mountains. Nevertheless, a significant number of customers seem to live there. As can be observed, these clients are colored in red, which means that they do not prefer to shop in the nearest supermarket. We discovered that SONAE has recently opened a new store at the intersection of these three districts.