SpeechTyper: From Speech to Typographic Composition

Many authors describe typography as what language looks like. Over time, designers have explored connections between type design and sound, trying to bridge the gap between the two fields. SpeechTyper is a system, currently under development, that generates typographic compositions based on speech. Our goal is to create typographic representations that expressively convey aspects of oral communication. The system takes a pre-processed analysis of speech recordings and uses it to shape the glyph design of the recited words. The glyphs’ structure is generated with a system we developed previously that extracts skeletons from existing typefaces.


Figure 1: Results of the system


Analysing Speech Data

We asked 12 people to record a declamation of a prose excerpt from “Stories and Texts for Nothing”, a collection of stories by Samuel Beckett. We perform word segmentation using Vosk [1], a speech recognition library. We measure the duration of each word and, frame by frame, the loudness of each frequency in the recording, computing a spectrogram for each word. This way, we capture the voice pitch and how the frequencies are distributed. All the collected information is exported to CSV files for later use in the system.
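As a rough sketch of this per-word analysis, assuming Vosk has already produced start/end timestamps for each word (the function names and parameters below are ours for illustration, not the system's actual code):

```python
import numpy as np

def word_spectrogram(samples, sr, start, end, n_fft=512, hop=256):
    """Magnitude spectrogram (frames x frequency bins) for one word,
    delimited by the start/end times (in seconds) that the word
    segmentation provides. Hypothetical helper, not the authors' code."""
    seg = np.asarray(samples[int(start * sr):int(end * sr)], dtype=float)
    if len(seg) < n_fft:
        seg = np.pad(seg, (0, n_fft - len(seg)))
    win = np.hanning(n_fft)
    return np.array([np.abs(np.fft.rfft(seg[i:i + n_fft] * win))
                     for i in range(0, len(seg) - n_fft + 1, hop)])

def frame_features(spec, sr, n_fft=512):
    """Per-frame loudness and dominant frequency, the two measures
    described above: how loud each frame is and where its energy peaks."""
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    return spec.sum(axis=1), freqs[spec.argmax(axis=1)]
```

The per-word matrices and features can then be flattened into rows of a CSV file, one row per frame.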


Designing Glyphs

One of our goals is to create typographic compositions capable of adapting to the variability of voices. As such, the glyphs need a set of adjustable parameters. To deform and transform the structure of the glyphs, we use our previously developed library, which extracts skeletons from existing fonts and converts them into a list of points. With the skeleton points, we designed a method for horizontally expanding the structure of the glyphs. To fill the generated glyphs, we take into account a series of parameters (such as axis, contrast, and weight, among others). This way, we create glyphs that resemble those created in traditional type design.
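A minimal sketch of the two operations, assuming the skeleton is a list of (x, y) points; the parameterisation (especially the contrast model) is a plausible simplification, not the library's exact behaviour:

```python
import math

def expand_skeleton(points, factor, anchor_x=0.0):
    """Horizontally expand a glyph skeleton: scale every x coordinate
    away from anchor_x by `factor`; y coordinates stay untouched."""
    return [(anchor_x + (x - anchor_x) * factor, y) for x, y in points]

def fill_circles(points, weight=4.0, contrast=0.0, spacing=2.0):
    """Place filling circles along consecutive skeleton points.
    `weight` sets the base radius; `contrast` (0..1) thins strokes
    depending on their angle, mimicking pen contrast in type design."""
    circles = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        angle = math.atan2(dy, dx)
        # near-horizontal strokes get thinner as contrast grows
        radius = weight * (1.0 - contrast * abs(math.cos(angle)))
        n = max(int(math.hypot(dx, dy) / spacing), 1)
        for i in range(n + 1):
            t = i / n
            circles.append((x0 + dx * t, y0 + dy * t, radius))
    return circles
```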


Creating Typographic Compositions Based on Speech

The speech analysis and the generative method for designing the glyphs are essential to constructing the final system. With both in place, the goal is to translate the speech analysis into typographic/visual features. Each circle that composes the filling of a generated glyph adapts its colour according to the frequency being visualised. To represent variations in amplitude, we use the size of the circles. The weight of the glyphs and their contrast are affected by the volume of the speech. The structure of the glyph takes the colour of the maximum frequency, and its stroke weight changes according to that frequency's intensity. We represent the duration of each word through the expansion of the skeletons of the letters that compose it.
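The correspondences above can be sketched as one mapping function; the specific ranges and constants here are illustrative assumptions, not the system's tuned values:

```python
def map_speech_to_glyph(freq_hz, amplitude, volume, duration_s,
                        max_freq=4000.0, base_weight=2.0):
    """Map one analysis frame of a word to glyph parameters, following
    the correspondences described in the text: frequency -> colour,
    amplitude -> circle size, volume -> weight and contrast,
    duration -> horizontal expansion. Hypothetical scaling throughout."""
    hue = min(freq_hz / max_freq, 1.0) * 360.0   # frequency as a hue angle
    circle_size = 1.0 + amplitude * 10.0          # louder bins: bigger circles
    weight = base_weight + volume * 8.0           # overall volume thickens strokes
    contrast = min(volume, 1.0)                   # and raises stroke contrast
    expansion = 1.0 + duration_s                  # longer words stretch wider
    return {"hue": hue, "circle_size": circle_size,
            "weight": weight, "contrast": contrast, "expansion": expansion}
```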
Finally, we add a time component to the system. Since we have timing values, a static representation of the text would lose information. Thus, for each frame, the system updates the representation of the word to follow what is being said at that moment.
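Driving the animation then reduces to looking up, for the current playback time, which word the segmentation places there; a minimal sketch (our own helper, assuming (word, start, end) tuples in seconds):

```python
def current_word(words, t):
    """Return the word being spoken at time t (seconds), given
    (word, start, end) tuples from the segmentation; None in silences."""
    for word, start, end in words:
        if start <= t < end:
            return word
    return None
```

Each frame of the composition then renders only the returned word, styled with that moment's analysis values.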
