GlyphSOMe — Using SOM with Data Glyphs for Customer Profiling

By storing customer data, retail companies can improve their marketing strategies and create promotions and special offers tailored to individual customers. Combining information visualisation with machine learning methods can facilitate the tasks related to customer profiling and, therefore, the creation of individualised campaigns. More specifically, we argue that clustering and segmentation methods, in particular Self-Organising Map (SOM) algorithms, foster customer characterisation by defining a shopping topology that can distinguish different patterns of consumption. Furthermore, we believe that adding visual descriptors of shopping behaviours by means of data glyphs can further improve the efficiency and efficacy of SOMs. In this project, we present a visualisation method that combines SOMs and data glyphs, with the ultimate goal of revealing the purchasing patterns of individual customers. With access to a large and complex dataset on consumption, we also argue that it is possible to identify individualised behaviours and enable the creation of individualised marketing campaigns. Additionally, we apply two SOM projections: the traditional matrix projection and a novel force-directed projection, which gives a more detailed view of the clusters of the SOM.




The data used in this project consists of an anonymised dataset of all purchases made within 729 Portuguese supermarkets and hypermarkets from SONAE, a Portuguese retail company. When shopping in these chains, customers tend to use their client cards, enabling the company to track their shopping behaviour. We retrieved the transactions made by different customers between January and December of 2013. Each transaction in the dataset contains details of the purchase (e.g., price, product ID), the client (e.g., zip code, client ID), and the store (e.g., store ID, location). Additionally, all products are categorised according to a product hierarchy that starts with departments and descends to the product itself.


SOM Algorithm


We applied a variant of the batch algorithm designed to handle mixed data, the Frequency neuron Mixed Self-Organising Map (FMSOM) [1]. It preserves the original algorithm for the numerical part of the data and extends the neuron prototype with a set of category frequency vectors. The features extracted from the raw data are the following: price, quantity, season, nearest store, department, product necessity, and discount. These features characterise each purchase and thus the customers' consumption behaviour.
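To make the idea of a mixed prototype more concrete, the following is a minimal sketch of one batch step, not the implementation used in this project. It assumes that the numerical part is compared with the Euclidean distance, that each categorical feature contributes one minus the frequency of the sample's category in the neuron, and that a Gaussian neighbourhood weights the update; the exact formulation in [1] may differ. The names dissimilarity and batch_step, the dictionary layout, and the sample values are all hypothetical.

```python
import math
from collections import defaultdict


def dissimilarity(num, cat, neuron):
    """Assumed mixed dissimilarity: Euclidean distance on the numerical part
    plus, per categorical feature, 1 - frequency of the sample's category."""
    numeric = math.sqrt(sum((x - w) ** 2 for x, w in zip(num, neuron["num"])))
    categorical = sum(1.0 - freqs.get(value, 0.0)
                      for value, freqs in zip(cat, neuron["cat"]))
    return numeric + categorical


def sq_dist(p, q):
    """Squared distance between two neuron positions on the map grid."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def batch_step(samples, neurons, grid, sigma=1.0):
    """One batch update: find each sample's best-matching unit (BMU), then
    rebuild every prototype as a neighbourhood-weighted summary of all samples."""
    bmus = [min(range(len(neurons)),
                key=lambda j: dissimilarity(num, cat, neurons[j]))
            for num, cat in samples]
    updated = []
    for j, neuron in enumerate(neurons):
        # Gaussian neighbourhood weight of each sample with respect to neuron j.
        weights = [math.exp(-sq_dist(grid[j], grid[b]) / (2 * sigma ** 2))
                   for b in bmus]
        total = sum(weights)
        # Numerical part: weighted mean, as in the standard batch SOM.
        num = [sum(w * s[0][k] for w, s in zip(weights, samples)) / total
               for k in range(len(neuron["num"]))]
        # Categorical part: one weighted relative-frequency vector per feature.
        cat = []
        for f in range(len(neuron["cat"])):
            freqs = defaultdict(float)
            for w, s in zip(weights, samples):
                freqs[s[1][f]] += w / total
            cat.append(dict(freqs))
        updated.append({"num": num, "cat": cat})
    return updated


# Hypothetical data: numerical = (price, quantity) scaled to [0, 1],
# categorical = (season,).
neurons = [{"num": [0.0, 0.0], "cat": [{"winter": 1.0}]},
           {"num": [1.0, 1.0], "cat": [{"summer": 1.0}]}]
grid = [(0, 0), (0, 1)]
samples = [([0.1, 0.2], ["winter"]), ([0.9, 0.8], ["summer"])]
neurons = batch_step(samples, neurons, grid, sigma=0.5)
```

After each step, every frequency vector sums to one, so a neuron can be read as a small profile: typical price and quantity, plus the share of each season or department among the purchases it represents.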


Neuron Representation


To visualise the multiple features of the neurons and to enable their comparison, we created a glyph-based visualisation (Figure 1). We defined a distinct visual mapping for each of the features described previously. The base shape of every neuron is a circle. The other components of the glyph, created to represent the features, are then placed inside or outside the circle.


Figure 1

First row, from left to right, the representations of: (i) the four seasons of the year; (ii) (un)necessary products. Second row: (i) products with discount; (ii) products bought in the closest store. Third row: quarter-circle bar graphs depicting the price and quantity values (both graphs represent the values as depicted in the rightmost image). Fourth row: colours attributed to each department.
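As an illustration of how a quarter-circle bar graph could encode a value, the sketch below maps a value linearly onto an arc of at most 90 degrees and samples points along that arc for drawing. The linear mapping, the function names quarter_arc_sweep and arc_points, and the drawing convention are assumptions made for this example; the glyphs in Figure 1 may be constructed differently.

```python
import math


def quarter_arc_sweep(value, max_value):
    """Map a value to an arc sweep in degrees, clamped to the range 0..90."""
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    ratio = min(max(value / max_value, 0.0), 1.0)
    return 90.0 * ratio


def arc_points(cx, cy, radius, start_deg, sweep_deg, steps=16):
    """Sample steps + 1 points along an arc centred on (cx, cy)."""
    points = []
    for i in range(steps + 1):
        angle = math.radians(start_deg + sweep_deg * i / steps)
        points.append((cx + radius * math.cos(angle),
                       cy + radius * math.sin(angle)))
    return points


# Hypothetical neuron: mean price 5 on a scale whose maximum is 10.
price_sweep = quarter_arc_sweep(5, 10)
price_arc = arc_points(0.0, 0.0, 1.0, 0.0, price_sweep)
```

Giving price and quantity their own quadrant (different start_deg values) would keep the two bars visually separate around the base circle.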



SOM Projections


We implemented two different approaches for positioning the neurons on the canvas. In the first, we place each neuron within a conventional matrix, the layout commonly used to visualise SOMs.


Figure 2

Matrix projection
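The matrix placement can be sketched in a few lines. This example assumes that neurons are indexed in row-major order and drawn in square cells; matrix_position and its parameters are illustrative names, not part of the project's code.

```python
def matrix_position(index, columns, cell_size, margin=0.0):
    """Return the canvas centre (x, y) of the neuron with the given index,
    assuming row-major ordering and square cells."""
    row, col = divmod(index, columns)
    x = margin + (col + 0.5) * cell_size
    y = margin + (row + 0.5) * cell_size
    return x, y


# Hypothetical 4-column map drawn with 100-pixel cells.
centres = [matrix_position(i, columns=4, cell_size=100) for i in range(8)]
```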


In the second, we place each neuron and each transaction from the raw data within a force-directed graph, to represent the relations between neurons and transactions and to achieve a better comprehension of the customer profile.


Figure 3

Force-directed projection
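A force-directed placement of this kind can be approximated with a simple spring-and-repulsion simulation. The sketch below is a generic layout, not the projection implemented in this project: it assumes that every transaction is linked by a spring to one neuron (for instance its best-matching unit) and that all nodes repel each other. The function force_layout and all parameter values are hypothetical.

```python
import math
import random


def force_layout(n_nodes, edges, iterations=200, rest_length=1.0,
                 repulsion=0.5, step=0.05, max_move=0.1, seed=0):
    """Return 2D positions after a basic spring/repulsion simulation."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_nodes)]
    for _ in range(iterations):
        force = [[0.0, 0.0] for _ in range(n_nodes)]
        # Every pair of nodes pushes apart with an inverse-square force.
        for a in range(n_nodes):
            for b in range(a + 1, n_nodes):
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = repulsion / dist ** 2
                fx, fy = push * dx / dist, push * dy / dist
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        # Each edge acts as a spring pulling its ends towards rest_length.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist - rest_length
            fx, fy = pull * dx / dist, pull * dy / dist
            force[a][0] += fx
            force[a][1] += fy
            force[b][0] -= fx
            force[b][1] -= fy
        # Move each node, capping the displacement to keep the run stable.
        for i in range(n_nodes):
            dx, dy = step * force[i][0], step * force[i][1]
            move = math.hypot(dx, dy)
            if move > max_move:
                dx, dy = dx * max_move / move, dy * max_move / move
            pos[i][0] += dx
            pos[i][1] += dy
    return pos


# Hypothetical graph: nodes 0-1 are neurons, nodes 2-5 are transactions,
# and each edge links a transaction to the neuron assumed to match it best.
edges = [(2, 0), (3, 0), (4, 1), (5, 1)]
positions = force_layout(6, edges)
```

With this setup, transactions settle around the neuron they are attached to, so dense groups of transactions make the most populated neurons stand out.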




[1] Del Coso, C., Fustes, D., Dafonte, C., Nóvoa, F. J., Rodríguez-Pedreira, J. M., and Arcay, B. (2015). Mixing numerical and categorical data in a self-organizing map by means of frequency neurons. Applied Soft Computing, 36:246–254.