«Máquina de Ouver» — From Sound to Type

Typography is considered by many authors to be the visual representation of language; through writing, human beings found a way to record and share information. Throughout the history of design, many authors have explored this connection between words and their sounds, trying to narrow the gap between what we say, how we hear it, and how it should be read. «Máquina de Ouver» is a system that analyses sound recordings of speech and creates a visual representation of their expressiveness, using typographic variables and composition.


As we can see in Figure 1, the pipeline of the system comprises four stages: (1) Annotation, (2) Sound Analysis, (3) Data Treatment, and (4) Artifact Generation.


Figure 1

Pipeline of the system.




To dynamically produce the files containing the data required for artifact generation, a prior annotation phase is imperative, and it requires the transcription of the sound recording.


Praat [1] is the software used both to annotate the verbal content of the sound file and to perform the later sound analysis. Fairly easy to learn and use, well documented, and well established in the scientific community, Praat features a transcription tool and scripting capabilities, enabling the dynamic creation of files containing all the textual information together with its corresponding measurement values and timestamps.


Praat annotation files, known as “TextGrids”, can save both textual and time information and, in this case, must be created manually in the Praat application. TextGrid files can store information by layers called “Tiers” that can be of two types: interval tiers and point tiers. For the purpose of this investigation, interval tiers are used to distinguish the different structural pieces of a poem, from the most general to the most detailed: stanzas, verses, words, syllables, and letters — as we see in Figure 2.
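For reference, a minimal interval tier in the TextGrid text format looks roughly like the fragment below; the tier name, timestamps, and texts are illustrative, not taken from the actual annotation files.

```
File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 2.5
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 2.5
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 1.2
            text = "ouver"
        intervals [2]:
            xmin = 1.2
            xmax = 2.5
            text = ""
```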


Figure 2

Screenshot from the Praat GUI during the annotation process.


This method allows the later stages of the system to access the full structure of the poem. The TextGrid file that results from this stage, along with the Wave file, is used as input in the Sound Analysis stage.


Sound Analysis


The TextGrid file with the proper annotation and the Wave file with the speech recording are the inputs to a simple Praat script developed by the authors. The script takes advantage of Praat’s Pitch and Intensity objects, which the Praat Scripting Language can create dynamically from a given sound file. Using these objects and the TextGrid, the script extracts the start and end timestamps, duration, and mean pitch and intensity values for every interval on each tier of the TextGrid.


This process results in a collection of Comma-Separated Values (CSV) files, one per tier, each containing the extracted data for every interval of that tier (Figure 3). These CSV files are processed in the Data Treatment phase.


Figure 3

Example of one of the CSV files that result from the Sound Analysis step; in this case, words.csv.


Data Treatment


After the data acquisition from the sound file, a data treatment step reads, processes, and merges the CSV files into a single JavaScript Object Notation (JSON) file that represents the full poem.
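A sketch of that merge step, assuming Node.js and CSV columns along the lines of Figure 3; the column names and the poem object layout are assumptions for illustration, not the authors' actual schema, and reading the files from disk is omitted.

```javascript
// Parse one tier CSV (header row + comma-separated values) into an
// array of interval objects. Numeric fields become numbers; text and
// the literal string "undefined" stay as strings.
function parseTierCsv(csvText) {
  const [header, ...rows] = csvText.trim().split("\n");
  const cols = header.split(",");
  return rows.map(function (row) {
    const values = row.split(",");
    const interval = {};
    cols.forEach(function (col, i) {
      const n = Number(values[i]);
      interval[col] = isNaN(n) ? values[i] : n;
    });
    return interval;
  });
}

// Merge every tier's CSV text into a single poem object, keyed by
// tier name, ready to be serialized as one JSON file.
function buildPoem(tierFiles) {
  const poem = { metadata: {}, tiers: {} };
  for (const name of Object.keys(tierFiles)) {
    poem.tiers[name] = parseTierCsv(tierFiles[name]);
  }
  return poem;
}
```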


The JavaScript script developed for this step serves two purposes: computing and writing the poem metadata, and filtering possible errors that result from the Sound Analysis step.


Since some pitch and intensity values may be returned and written as “undefined”, the script identifies these and rewrites each one with the value of the next valid interval or, if no such interval exists, with the value of the previous one.
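This repair pass can be sketched as follows, assuming each tier is an array of interval objects and that missing values arrive as the string "undefined", as written in the CSV files; the field names are illustrative assumptions.

```javascript
// Replace "undefined" values for one field (e.g. "pitch") in a tier:
// search forward for the next valid value; if none exists, fall back
// to the previous valid value. Returns a new array, leaving the
// original intervals untouched.
function fillUndefined(intervals, field) {
  return intervals.map(function (interval, i) {
    if (interval[field] !== "undefined") return interval;
    let replacement;
    // Forward search for the next interval with a valid value...
    for (let j = i + 1; j < intervals.length; j++) {
      if (intervals[j][field] !== "undefined") {
        replacement = intervals[j][field];
        break;
      }
    }
    // ...falling back to the previous valid value if none exists.
    if (replacement === undefined) {
      for (let j = i - 1; j >= 0; j--) {
        if (intervals[j][field] !== "undefined") {
          replacement = intervals[j][field];
          break;
        }
      }
    }
    const copy = Object.assign({}, interval);
    copy[field] = replacement;
    return copy;
  });
}
```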


Figure 4

Screenshot of the final JSON file that results from the Data Treatment process.


The script then finds the minimum and maximum values for pitch, intensity, and duration and saves them as metadata inside the poem JSON object, as shown in Figure 4, making that file the only input needed for the Artifact Generation step.
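A sketch of that metadata pass, under the assumption that the poem object keeps each tier as an array of interval objects with numeric feature fields; the exact JSON layout in Figure 4 may differ.

```javascript
// Compute min/max for each sound feature over one tier and store the
// ranges under poem.metadata, so the Artifact Generation step can
// normalize values without rescanning the tiers.
function addRanges(poem, tierName, fields) {
  const intervals = poem.tiers[tierName];
  poem.metadata = poem.metadata || {};
  for (const field of fields) {
    const values = intervals
      .map(function (it) { return it[field]; })
      .filter(function (v) { return typeof v === "number"; });
    poem.metadata[field] = {
      min: Math.min.apply(null, values),
      max: Math.max.apply(null, values),
    };
  }
  return poem;
}
```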


Artifact Generation


The final step in the system chain is the Artifact Generation. Given the authors’ familiarity with Adobe InDesign and the Processing programming language, the chosen development environment was Adobe’s ExtendScript Toolkit together with the basil.js library, which makes scripting for Adobe InDesign very similar to programming in Processing.


Figure 5

Screenshot of a static artifact generated by the Artifact Generation stage.


After creating a new Adobe InDesign document, the script loads the poem JSON and maps the sound feature values to their respective typographic variables. The process concludes with the generation of a PDF file containing the final artifact (Figure 5).
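The mapping can be sketched in plain JavaScript; the real script runs inside InDesign via BasilJS, and the specific pairings below (intensity to font size, pitch to baseline shift) are illustrative assumptions, not the authors' actual design choices.

```javascript
// Linear interpolation in the style of Processing's map(): rescale a
// value from the feature range (taken from the poem metadata) to a
// typographic range.
function map(value, inMin, inMax, outMin, outMax) {
  return outMin + ((value - inMin) / (inMax - inMin)) * (outMax - outMin);
}

// Derive typographic variables for one word from its sound features.
// The output ranges (8-36 pt, -6 to +6 pt shift) are hypothetical.
function wordStyle(word, meta) {
  return {
    // Louder words are set larger.
    fontSize: map(word.intensity, meta.intensity.min, meta.intensity.max, 8, 36),
    // Higher-pitched words sit higher on the line.
    baselineShift: map(word.pitch, meta.pitch.min, meta.pitch.max, -6, 6),
  };
}
```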


Current Work


To show how the typography reacts to the corresponding sound recording, we are currently working on a solution to automate the generation of video artifacts.






[1] Boersma, Paul & Weenink, David (2019). Praat: doing phonetics by computer [Computer program]. Version 6.0.53, retrieved 26 May 2019 from http://www.praat.org/