«Máquina de Ouver» — From Sound to Type

What is it?

 

«Máquina de Ouver» is a system that analyses speech recordings and represents their expressiveness through typographic manipulation. Although it began as a Master's thesis in Design and Multimedia, it remains a work in progress, given its promising results and the integrations it opens up.

 

Figure 1

Máquina de Ouver Dynamic Example – Winner of Moscow International Festival Typomania’s Typographic Video Contest 2019

 

Abstract

 

Typography is considered by many the visual representation of language; through writing, human beings found a way to record and share information. Throughout the history of typography, numerous authors have explored the connection between words and their sounds, trying to narrow the gap between what we say, how we hear it, and how it should be read.

 

We introduce «Máquina de Ouver», a system that analyses speech recordings and creates a visual representation of their expressiveness using typographic variables and composition.

 

Our system takes advantage of the scripting capabilities of Praat, Adobe InDesign, and Adobe After Effects to retrieve sound features from speech recordings and dynamically create a typographic composition, resulting in either a static artefact, such as a poster, or a dynamic one, such as a video.

 

Most of our experimentation uses poetry performances as the system's input, since poetry is one of the most dynamic and expressively rich forms of speech.

 

How does it work?

 

As we can see in Figure 2, the pipeline of the system is composed of four stages: (1) Annotation, (2) Sound Analysis, (3) Data Treatment, and (4) Artefact Generation.

 

Figure 2

Pipeline of the system.

 

Annotation

 

Before the system can dynamically produce the files containing the data required for artefact generation, a prior annotation phase is imperative: the sound recording must be transcribed.

 

The software used for the annotation process, as well as for the later sound analysis, is Praat [1]. Fairly easy to learn and use, well documented, and well established in the scientific community, Praat provides a transcription tool and scripting capabilities, enabling the dynamic creation of files containing all the textual information together with its corresponding measurement values and timestamps.

 

Praat annotation files, known as "TextGrids", can store both textual and time information and, in this case, must be created manually in the Praat application. TextGrid files organise information in layers called "tiers", which we use to distinguish the different structural units of a poem, from the most general to the most detailed: stanzas, verses, words, syllables, and letters, as we see in Figure 3.

 

Figure 3

Screenshot from the Praat GUI during the annotation process.
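
A TextGrid is itself a plain-text file. As a truncated, hypothetical excerpt (times, labels, and tier layout here are illustrative, not taken from an actual poem), a "words" tier among the five tiers described above could look like this:

```
File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 12.5
tiers? <exists>
size = 5
item []:
    item [3]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 12.5
        intervals: size = 42
        intervals [1]:
            xmin = 0
            xmax = 0.31
            text = ""
        intervals [2]:
            xmin = 0.31
            xmax = 0.74
            text = "mar"
```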

 

This method gives the next stages of the system access to the full structure of the poem. The TextGrid file that results from this stage, along with the Wave file, is used as input to the Sound Analysis stage.

 

Sound Analysis

 

The TextGrid file with the proper annotation and the Wave file with the speech recording are the inputs to a simple Praat script developed by the authors. This script takes advantage of Praat's Pitch and Intensity objects, which the Praat Scripting Language can create dynamically from a given sound file. Using these objects and the TextGrid, the script extracts the start and end timestamps, duration, mean intensity, and mean pitch for every annotated interval on each tier of the TextGrid.
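
The authors' script is not reproduced here, but as a minimal sketch of the idea, a Praat script along these lines could extract those measurements for one tier (file names, tier number, and analysis parameters are illustrative):

```praat
# Minimal sketch, not the authors' script: write mean pitch and
# intensity per annotated interval of one tier to a CSV file.
sound = Read from file: "poem.wav"
textgrid = Read from file: "poem.TextGrid"
selectObject: sound
pitch = To Pitch: 0.0, 75, 600
selectObject: sound
intensity = To Intensity: 75, 0.0, "yes"
# e.g. tier 3 holds the word-level annotation
tier = 3
writeFileLine: "words.csv", "text,start,end,duration,pitch,intensity"
selectObject: textgrid
n = Get number of intervals: tier
for i to n
    selectObject: textgrid
    label$ = Get label of interval: tier, i
    if label$ <> ""
        start = Get starting point: tier, i
        end = Get end point: tier, i
        selectObject: pitch
        f0 = Get mean: start, end, "Hertz"
        selectObject: intensity
        db = Get mean: start, end, "energy"
        # unvoiced or silent stretches yield undefined pitch values,
        # which Praat writes out as "--undefined--"
        appendFileLine: "words.csv", label$, ",", start, ",", end, ",", end - start, ",", f0, ",", db
    endif
endfor
```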

 

This process results in a collection of Comma-Separated Values (CSV) files, one per tier, each containing the intended data for every interval of the respective tier (Figure 4). These CSV files are processed in the Data Treatment phase.

 

Figure 4

Example of one of the CSV files that result from the Sound Analysis step; in this case, words.csv.

 

Data Treatment

 

After the data acquisition from the sound file, there is a data treatment step in which the CSV files are read, processed, and merged into a single JavaScript Object Notation (JSON) file representing the full poem.

 

The JavaScript script developed for this step serves two purposes: extracting and writing the poem metadata, and filtering possible errors that result from the Sound Analysis step.

 

Since some pitch and intensity values can be returned and written as "undefined", this script identifies them as outliers and replaces each one with the value of the next or previous valid interval, depending on which exists.
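
A minimal JavaScript sketch of that neighbour-fill rule, assuming the CSV rows have already been parsed into objects and Praat's "--undefined--" markers converted to undefined (names are illustrative, not the authors' code):

```javascript
// Minimal sketch: replace undefined pitch or intensity values
// with the value of the nearest valid neighbouring interval.
function fillUndefined(intervals, key) {
  for (var i = 0; i < intervals.length; i++) {
    if (intervals[i][key] !== undefined) continue;
    // search forwards for the next valid value...
    var j = i + 1;
    while (j < intervals.length && intervals[j][key] === undefined) j++;
    if (j < intervals.length) {
      intervals[i][key] = intervals[j][key];
    } else if (i > 0) {
      // ...otherwise fall back to the previous (already filled) value
      intervals[i][key] = intervals[i - 1][key];
    }
  }
  return intervals;
}

// e.g., with `words` holding the parsed rows of words.csv:
// fillUndefined(words, "pitch");
// fillUndefined(words, "intensity");
```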

 

Figure 5

Screenshot of the final JSON file that results from the Data Treatment process.

 

The script then finds the minimum and maximum values for pitch, intensity, and duration and saves them as metadata inside the poem JSON object, as shown in Figure 5, making this file the only input necessary for the Artefact Generation step.
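
Figure 5 shows the real file; purely for orientation, a hypothetical poem object with this kind of structure (all names and values here are illustrative, not the authors' exact schema) might look like:

```json
{
  "title": "Example Poem",
  "meta": {
    "minPitch": 96.2, "maxPitch": 310.5,
    "minIntensity": 48.1, "maxIntensity": 74.9,
    "minDuration": 0.05, "maxDuration": 1.32
  },
  "stanzas": [{
    "verses": [{
      "words": [{
        "text": "mar",
        "start": 0.31, "end": 0.74, "duration": 0.43,
        "pitch": 182.4, "intensity": 63.2
      }]
    }]
  }]
}
```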

 

Artefact Generation

 

The final step in the system chain is Artefact Generation. The chosen development environment was Adobe's ExtendScript Toolkit, together with the basil.js library, which makes scripting for Adobe InDesign very similar to programming in Processing.

 

The system is able to produce two types of artefacts: static (image) and dynamic (video).

 

Figure 6

Screenshot of a static artefact generated by the Artefact Generation stage.

 

After creating a new Adobe InDesign document, the script loads the poem JSON and maps the sound features' values to their respective typographic variables. The process concludes by generating either a PDF file with the final artefact (Figure 6) or an image sequence that is later used by an After Effects script to create a video merged with the corresponding sound file.
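
Exactly which sound feature drives which typographic variable is the authors' design decision; as a minimal sketch of the mapping step itself, assuming the JSON structure shown earlier, a linear mapping like the following could be used (the map() helper mirrors Processing's map(), which basil.js also exposes as b.map(); the feature-to-variable pairings are illustrative only):

```javascript
// Minimal sketch: linearly map a value from one range onto another.
function map(value, inMin, inMax, outMin, outMax) {
  return outMin + (outMax - outMin) * (value - inMin) / (inMax - inMin);
}

// Illustrative pairings, not the authors' actual mapping.
function typographyFor(word, meta) {
  return {
    // e.g. louder words could be set in larger type...
    pointSize: map(word.intensity, meta.minIntensity, meta.maxIntensity, 9, 72),
    // ...and longer words with wider letter spacing
    tracking: map(word.duration, meta.minDuration, meta.maxDuration, 0, 200)
  };
}
```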

 

Results & Dissemination

 

You can see the full Master's dissertation here, and the paper, part of ARTECH 2019: Proceedings of the 9th International Conference on Digital and Interactive Arts, here.

 

Here is an example of a dynamic artefact generated by «Máquina de Ouver». This video was distinguished as Niklaus Troxler's Personal Choice at the Moscow International Festival Typomania's Typographic Video Contest 2019.

 

«Máquina de Ouver» was presented at the GO! Romaria Cultural 2019 festival and in Porto Design Biennale's exhibition "Y, Desenhar Portugal — Projetos de Escolas de Design Nacionais".

 

The project was also featured in national and international media such as Público – P3, TSF (PT-PT audio), and Pravda.

 

You can hear a bit about it in 90 Segundos de Ciência – Episode 878 (PT-PT), a science communication radio program on Antena 1.

 

Current Work

 

Currently, «Máquina de Ouver» is being adapted to integrate a digital art installation called "Sonar", to be exhibited at the ELO2020 Conference and Media Arts Festival.

 

References

 

[1] Boersma, Paul & Weenink, David (2019). Praat: doing phonetics by computer [Computer program]. Version 6.0.53, retrieved 26 May 2019 from http://www.praat.org/

 

[2] J. C. e Castro, P. Martins, A. Boavida, and P. Machado (2019). "Máquina de Ouver – From Sound to Type: Finding the Visual Representation of Speech by Mapping Sound Features to Typographic Variables," in ARTECH 2019: Proceedings of the 9th International Conference on Digital and Interactive Arts, Braga, Portugal, October 23–25, 2019, pp. 13:1–13:8.