
Volume 1, Issue 2, March 2017

Formalizing Fado: A Contribution to Automatic Song-Making

Tiago Gonzaga Videira, Bruce Pennycook, Jorge Martins Rosa

Keywords

creative music systems, algorithmic composition, generative music, fado

Abstract

This article is a contribution to the formalisation of the process of song composition, focusing on a case study: Portuguese fado (a performance practice). Based on previous musicological studies, we describe a theoretical model which is parametrised using empirical data retrieved from a representative symbolic corpus. Starting from the model, we present the formalisation of a symbolic artificial intelligence implemented as a generative system. This system is able to automatically generate instrumental music based on the musics and vocal sounds typically associated with fado’s practice. The output of the system is evaluated resorting both to a supervised classification system and to a quasi-Turing test. We conclude that the output of our system approaches the one from the corpus and has relevant pleasantness to human listeners.

1 Introduction

Music composition can be a very complex process. Previous studies suggest that there are as many different creative processes in composing music as there are composers in the world (Collins, 2016; DeMain, 2004). Based on these previous studies and observations, a logical starting point for further investigation of songwriting processes might involve a model that reflects the actual practices of various methods of composing. There are many styles and traditions of lyric-based storytelling, such as ballads, romances, blues, mornas, country music and most recently rap and hip-hop, all of which could make for excellent case studies. While these styles share many common traits with each other (as well as with past and present traditions), there is one style of songwriting in particular which is a very peculiar and unique method of storytelling with very distinctive features: Portuguese fado. Fado is considered Intangible Cultural Heritage by UNESCO,(1) providing further motivation for pursuing this research within our own interests (songwriting), while simultaneously aiding in the preservation and memory of this performance practice.

In this article we will briefly present and characterise the practice of Portuguese fado. The description presented herein is based on previous ethnomusicological and historiographical studies. These studies are complemented with additional research conducted on a representative collection of 100 encoded musical transcriptions. Following the characterisation of fado, we will proceed with a brief discussion about the current state and practice of algorithmic composition. We will refer to several approaches, focusing on the type of algorithmic compositional processes being modelled. The previous observations and characterisation of the fado practice and the analyses of the collection of transcriptions will shape and highlight the various processes that inform our own model. We present a generative model that is able to generate instrumental music similar in style and structure to the music of the representative sample collection, starting from some minimal structural features of the characteristic lyrical shape, which constrain the musical materials that follow. Such a model requires that our methodology be a hierarchical linear generative process that translates into a series of constraint-satisfaction problems resorting to mixed techniques. We describe in detail how we implemented our model as a system built using the Symbolic Composer software. Our system consists of a number of modules used to deal with specific musical parameters, from the general structure of the new compositions to melody, harmony, instrumentation and performance, leading to the generation of MIDI files. We propose a method for creating sound files from the generated MIDI files. Subsequently, we evaluate the output both by devising an automatic test using a supervised classification system and by collecting human feedback in the context of a quasi-Turing test.
We conclude that the output of our system approaches the one from the representative collection and maintains a relative pleasantness to human listeners. However, there is room for improvement. We realise that a module dealing with full lyrical content (including coherent semantic content) might be crucial for optimal results, that the original collection should be enlarged in order to collect more data and that further parametrisation of human performance is needed.

2 Fado

Several previous studies show how the concept and practice of fado historically emerged, what it represents, and how it has changed. These studies have mainly followed the lines of ethnomusicology, resorting to ethnographic and historiographic methods (Carvalho, 1994; Castelo-Branco, 1998; Côrte-Real, 2000; Gray, 2013; Nery, 2004, 2010a, 2010b; Videira, 2015). For a more detailed survey see Holton (2006). From these various studies one can conclude that fado tells a sorrowful tale, a narrative emerging from the life experience of the people, using the vocabulary and lexical field appropriate to be shared by the people who relate to the story. This narrative is then embodied and communicated through the means of several gestural codes. The text shapes the form of the music. The structure of the narratives, which are usually organised in stanzas of equal length, translates into strophic musical forms repeated over and over, while more complex texts typically generate more complex forms. Since most people have limited musical knowledge, the most common practice is to play very simple musical patterns (often just arpeggios, other times simple stock figurations and repetitive ostinati), maximising the number of open strings and the easiest fingering positions, using the simplest time signature. Therefore, a binary pulse and a chord progression alternating tonic and dominant is vastly preferred over any other more complex variation. The first description of the music that accompanied fado practice appeared in the musical dictionary of Ernesto Vieira:

A section of eight bars in binary tempo, divided into two equal and symmetric parts, with two melodic contours each; preferably in the minor mode, although many times it modulates into the major, carrying the same melody or another; harmony built on an arpeggio in sixteenth-notes using only alternating tonic and dominant chords, each lasting two bars. (Vieira, 1890, p. 238)
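Vieira’s description is specific enough to be written down as data. The following Python sketch is an illustration only and is not taken from Vieira or from the system described later: the function names, the chord labels, the A minor voicings and the reading of “binary tempo” as a 2/4 bar holding eight sixteenth-notes are all assumptions.

```python
def vieira_harmonic_plan(bars=8, bars_per_chord=2):
    """Return one chord label per bar, alternating tonic (i) and dominant (V)."""
    chords = ["i", "V"]  # minor mode, which the description says is preferred
    return [chords[(bar // bars_per_chord) % 2] for bar in range(bars)]


def arpeggio_bar(chord_tones, notes_per_bar=8):
    """Cycle through the chord tones to fill one bar with sixteenth-notes."""
    return [chord_tones[i % len(chord_tones)] for i in range(notes_per_bar)]


# Hypothetical voicings in A minor, as MIDI note numbers.
VOICINGS = {"i": [57, 60, 64], "V": [56, 59, 64]}  # A-C-E and G#-B-E

plan = vieira_harmonic_plan()
section = [arpeggio_bar(VOICINGS[chord]) for chord in plan]
```

Here `plan` holds the eight-bar chord scheme and `section` the corresponding arpeggiated accompaniment, one list of pitches per bar.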

If skilled performers are available, then the degree of complexity and inventiveness of a fado tends to increase accordingly. These conditions constrain the way the melody is shaped. In fado practice, the combined ability to manipulate, not only the rhythm of a melody, but also the pitch, is called estilar (styling). Styling is a kind of improvisation, and is similar in practice to that found in other musical styles that rely heavily on improvisation, such as Jazz (Gray, 2013). The rhythm of the melody is dependent both on the prosody of the text and the time signature chosen, while the pitches of the melody are constrained by the underlying chord progression presented in the instrumental accompaniment. The tempo, dynamics and articulations of the same melody are constrained by the emotional (semantic) content of the lyrics and by the physical characteristics of the fadistas (performers of fado). The performer’s body shape, the way he/she positions him/herself, his/her age and health condition all affect and determine the acoustic and phonatory characteristics of the voice (Mendes et al., 2013), thus shaping most of the timbral characteristics of the melody and how they will be perceived by the audience. These various physical and musical elements combine to create a complex network of relationships and interdependencies.

The available musical instruments determine the musical arrangement. The most common ensemble since the twentieth century consists of one classical six-string nylon or steel guitar (named viola, in European Portuguese) and one Portuguese Guitar (pear-shaped twelve-string lute, called guitarra). The viola provides the pulse and harmony, guiding the singer. The guitarra complements the voice using stock melodic formulas. Armando Augusto Freire, “Armandinho” (1891–1946), significantly renewed the accompaniment practice by introducing guitarra countermelodies between the sung melodic lines. These were so characteristic that in many cases they were appropriated and integrated by the community into the fados where they were first performed (Nery, 2010b, pp. 131–132; Brito, 1994; Freitas, 1973). The timbral uniqueness of these guitarra gestures, absent from other musical practices and styles, makes them a key feature in the characterisation of fado. During the nineteenth century several written and iconographic sources show fadistas accompanying themselves, often with a guitarra, playing arpeggiated chords. Sources also mention other string instruments, the accordion or the piano (Cabral, 1982; Castelo-Branco, 1994; Cristo, 2014; Nery, 2004; Sá de Oliveira, 2014). We speculate that one would play with whatever was available in the given context. Due to the limitations of recording technologies in the early twentieth century, fados were often recorded using pianos and wind or brass instruments (Losa, 2013). The instrumental group’s function is to provide a ground for the fadista, a foundation on which to create, and the spatial organisation of instruments in performance reflects this hierarchical organisation. Usually the instrumentalists are in the background; they are seldom mentioned and their interaction with the audience is minimal (Côrte-Real, 1991, pp. 30–35).
Several contemporary examples of live fado performance in the context of tertúlia (“informal gathering”) can be found in the YouTube channel “Tertúlia”.(2)

We have complemented the ethnographic and historiographic analyses discussed above with the use of computational musicology on empirical data. Since fado performance predates the era of sound recordings, the study of symbolic notation is required to access the music of the earlier periods. There were no previously encoded transcriptions of fado available for computational research. Therefore, we compiled a representative collection of 100 samples to perform our initial research. Given time and budget constraints we privileged the older sources that were already photographed. The Cancioneiro de Músicas Populares, compiled by César das Neves and Gualdim Pais (Neves & Pais, 1893), contains several traditional and popular songs transcribed during the second half of the nineteenth century, 48 of which are categorised as fados by the collectors, either in the title or in the footnotes. This set was accompanied by a transcription of Fado do Marinheiro, provided and claimed by Pimentel to be the oldest fado available (Pimentel, 1904, p. 35). We have also had access to the collection of musical scores from the Theatre Museum in Lisbon.(3) The first 51 scores categorised as fados were selected in order to reach the desired sample number of 100 representative works. We estimate that there are at least 200 more works categorised as fados to be found in the collection, and several others in other collections. Future work will be to encode more scores categorised as fados and enlarge the database. These transcriptions represent the practice c. 1840–1970, and one example is shown in Fig. 1.

Figure 1 – An edited fado transcription, Serenata à morena (“Serenade to the brunette”) originally found in (Neves & Pais, 1893, p. 26).

These transcriptions are piano reductions (adapted for the domestic market) of both the instrumental accompaniment and the vocal line, the vocal line being reduced to an instrumental version. Most of the transcriptions confirm the model described by Ernesto Vieira. The guitarra countermelodies and ornamental motifs are largely absent in the musical scores from the nineteenth century and in the earlier recordings; however, they become rather common in later examples. This sample collection has been edited and is currently available as a digital database.(4) This database consists of the musical scores, MIDI files, and analytical, formal and philological commentaries, as well as slots for relevant information (sources, designations, date(s), authorship(s)) for each fado. The creation of this new digital object is relevant for archival and patrimonial purposes. We have applied music information retrieval techniques, followed by statistical procedures, to the sample collection in order to identify some patterns and rules for shaping its characteristics. This allowed us to retrieve information regarding form, harmonic progressions and rhythmic patterns, among many other features (Videira, 2015).

3 Algorithmic Composition

During the most recent decade, the references and discussions around the core problems of the field of algorithmic composition grew exponentially. The main references for what constitutes the field of algorithmic composition and its primary achievements rapidly become outdated or redundant. The research and books by Curtis Roads (1996), Eduardo Reck Miranda (2001) and David Cope (1996; 2000; 2004; 2005; 2008) will always be historically important and full of relevant information, taking into account their historical context, but Nierhaus’ (2009; 2015) books on algorithmic composition provide new updates to the field and are quickly becoming references as modern state-of-the-art surveys. Jose David Fernández and Francisco Vico (Fernández & Vico, 2013) also present a very complete and up-to-date survey for the field, and therefore we will use their paper as a point of departure to present a brief overview.

The use of algorithms in music composition precedes computers by hundreds of years. However, Fernández and Vico use a more restricted version and present algorithmic composition as a subset of a larger area: computational creativity. They claim that the problem of computational creativity in and of itself is still difficult to define and formalise. It can essentially be said to be “the computational analysis and/or synthesis of works of art, in a partially or fully automated way” (Fernández & Vico, 2013, p. 513). Fernández and Vico also mention that the most common approaches and problems algorithmic composition had to solve are now systematised and well-defined.

Colton and Wiggins note that much of mainstream AI practice is characterised as “being within a problem solving paradigm: an intelligent task, that we desire to automate, is formulated as a particular type of problem to be solved. The type of reasoning/processing required to find solutions determines how the intelligent task will then be treated” (Colton & Wiggins, 2012, p. 22). On the other hand, “in Computational Creativity research, we prefer to work within an artifact generation paradigm, where the automation of an intelligent task is seen as an opportunity to produce something of cultural value” (Colton & Wiggins, 2012, p. 22). In their view, computational creativity research is a subfield of artificial intelligence research, and they define it as “The philosophy, science and engineering of computational systems which, by taking on particular responsibilities, exhibit behaviors that unbiased observers would deem to be creative” (Colton & Wiggins, 2012, p. 21).

Although these visions seem to differ on whether the goal of modelling particular processes is to generate specific artefacts or if artefacts will emerge as a consequence of such processes, they share the relevant idea that some processes are being automated or modelled. The kind of processes used and how they are used seems, to us, a relevant starting point to survey the field.

Figure 2 – A Taxonomy of Algorithmic Composition Methods, as originally found in (Fernández & Vico, 2013, p. 519) and reproduced with their permission.

In the classification system Fernández and Vico adopted (Fig. 2) there are two primary areas representing two radically different approaches. One can either observe how intelligent beings typically create and then model their behaviour or reasoning (artificial intelligence), or simply focus on getting good musical results even if a human would never be able to do it that way. These last approaches usually represent processes found in nature and are assumed to be the result of chaos or complex interactions without any intelligence behind them. Among these complex systems one can find self-similarity processes (fractals, for instance) and cellular automata.

Within artificial intelligence, one finds processes that intelligent beings use. Those are divided into three main areas: symbolic processes, optimisation and machine learning. Symbolic artificial intelligence relies mainly on modelling the process of creation by the use of simple rules. These rules usually represent the creation of symbols and their subsequent manipulation and transformation. These approaches depend on the idea that the creative processes can be described in formal terms and reduced to a theory. The problems must be well-defined and their boundaries established, as well as their constants, variables and sets of operations. In this case, the rules must be conveyed by the programmer, who obtained them through the previous observation and analysis of the processes involved.

Optimisation processes derive mainly from the idea that one can start from very simple units or imperfect samples (members of a representative population) and combine, recombine and eventually transform them through successive trial-and-error operations. The strongest and fittest combinations are selected, while the weakest and most unsuitable ones are discarded, until results start to improve and eventually become optimal. These processes resemble the Darwinist idea of evolution.
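The select-and-discard loop just described can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a method taken from the surveyed literature: the fitness measure (how many notes belong to an A minor chord), the pitch range and all function names are invented for the example.

```python
import random


def fitness(melody, chord_tones):
    """Count notes whose pitch class belongs to the chord."""
    return sum(1 for note in melody if note % 12 in chord_tones)


def mutate(melody, rng, low=57, high=76):
    """Replace one randomly chosen note with a random pitch in range."""
    child = list(melody)
    child[rng.randrange(len(child))] = rng.randint(low, high)
    return child


def evolve(chord_tones, length=8, population_size=20, generations=200, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(57, 76) for _ in range(length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the fittest half, discard the rest.
        population.sort(key=lambda m: fitness(m, chord_tones), reverse=True)
        survivors = population[: population_size // 2]
        # Variation: refill the population with mutated copies of survivors.
        population = survivors + [mutate(m, rng) for m in survivors]
    return max(population, key=lambda m: fitness(m, chord_tones))


A_MINOR = {9, 0, 4}  # pitch classes A, C, E
best = evolve(A_MINOR)
```

Because the survivors are carried over unchanged, the best fitness in the population can never decrease from one generation to the next, which is the sense in which results “start to improve”.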

Machine learning processes, on the other hand, focus on analysing the outcomes of previous creations. A representative sample is constituted, processed and analysed, and then new predictive or prescriptive rules are inferred by the machine. Using those rules the computer will then be able to generate new works.
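A first-order Markov chain is one of the simplest instances of this idea: transition rules are inferred from a sample and then used to generate new sequences. The Python sketch below assumes a tiny invented corpus of chord progressions; the corpus, the function names and the choice of a Markov chain are illustrative assumptions rather than anything reported in this article.

```python
import random
from collections import defaultdict


def learn_transitions(corpus):
    """Record which symbol follows which across all sequences in the corpus."""
    table = defaultdict(list)
    for sequence in corpus:
        for current, following in zip(sequence, sequence[1:]):
            table[current].append(following)
    return table


def generate(table, start, length, seed=0):
    """Walk the learned transitions, starting from a given symbol."""
    rng = random.Random(seed)
    output = [start]
    while len(output) < length:
        options = table.get(output[-1])
        if not options:  # dead end: no continuation was observed
            break
        output.append(rng.choice(options))
    return output


# Toy corpus of chord progressions, invented for illustration.
corpus = [["i", "V", "i", "iv", "V", "i"], ["i", "iv", "i", "V", "i"]]
table = learn_transitions(corpus)
progression = generate(table, "i", 8)
```

Storing repeated continuations in a list means that `rng.choice` reproduces the relative frequencies observed in the sample without any explicit probability table.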

Following the example of David Cope (Cope, 2005), we tried to teach the computer to compose as we ourselves would. In order to achieve that, we tried to learn how some people compose music and vocal sounds associated with fado, then we tried to do it ourselves. We spent a long period learning and perfecting the craft of composing and producing music using computers and virtual instruments. We had to understand not only more analogical tasks, such as how to create melodies from lyrics having just a viola or a guitarra in our hands, but also the process of creating a score using a music notation program and then editing it in a DAW (Digital Audio Workstation), recording, mixing and mastering it, up to the point of obtaining a convincing sound result. Derived from all that experience, we now have a new conception of how to perform and complete the compositional process from scratch. If someone asks us to compose music for a fado, we have a way of doing it. And now we are teaching the computer the most effective way to replicate the way we do it.

By trying to imitate a style and creating an endless repertoire of songs, several goals are met: not only is a relevant problem being addressed (the formalisation of a specific style of composition process), but also an attempt is being made at creating something to fulfil a cultural and social need: the endless desire for music. At present there are numerous software applications based on algorithmic composition for people to try out and generate music at will, like PG Music Band-in-a-Box(5) or Dunn’s Artwonk,(6) as well as leisure and relaxation environments that generate music in real time with interactive components, not only on computers but also on smartphones, for instance Brian Eno’s Bloom or Trope(7) or the more academic ANTracks (Schulz et al., 2009).

4 The Model

Following the ideas of Brian Eno, it seemed to us that the most elegant model is the one that takes the least amount of information possible and is able to extract the maximum output and still retain the core ingredients of the process, the necessary steps in composing a song. That translates into a generative process like the one of planting a seed and then seeing how an entire tree grows out of it (Eno, 1996). This seems to imply strong hierarchical structures and a good degree of self-similarity. Our prior ethnographic and historiographic work and observations of the practice suggest that given a lyrical text (what would be literally a fado) the entire musical work might derive from it using a set of predefined conventions, shared grammars and transformations. This is a starting hypothesis for our model. The idea of a text shaping several other musical parameters is not without precedent (Nketia, 2002; Pattison, 2010). However, there are songs without words, and composers sometimes start from textless melodies (DeMain, 2004; Zak III, 2001). This might imply that, in the compositional process, the semantic content of a text could be less relevant than its structure. As such, one could work with only some structural information, namely the number of stanzas, lines and syllables. Having this seminal information, and the correct rules and transformations defined, then, one would be able to, just through following those rules, obtain a suitable musical work complying with the genre modelled.

Figure 3 – A fado model.

The model (Fig. 3) assumes the sequential order through which fados are typically performed, and simulates the various agents and decision processes involved. The point of departure is a text (a fado, in the form of a poem). Fado narratives are usually styled and improvised on top of stock musical structures. At this stage, we are not working with actual lyrics or textual content, although that would be ideal in future endeavours. Therefore, the first module only pretends that there is a text and, based on a seed, creates a list of symbols in numeric format. Those numbers represent the relevant information an original text would carry: namely the number of stanzas, lines per stanza and syllables per line. One could argue that in a real text all information is relevant (namely semantic content); however, fado practitioners claim that as long as the metrical structure is the same, the lyrics are easily interchangeable between fados. So, while it can still be argued that accents and vowel quality might affect the prosody (and therefore the melody to some extent), the core values that define the importance of a text, in structural terms, are the ones we are indeed retrieving/generating. This is because the number of stanzas determines the number of sections the work will have (hence, the overall form), and the number of lines determines the length of each section (number of melodic phrases per section). The number of syllables of each line determines the rhythms of the melody (it constrains the number of notes of each melodic phrase).
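As a rough illustration of what such a “pretend text” could look like, the Python sketch below turns a seed into stanza, line and syllable counts and then reads the structural consequences off those numbers. It is a minimal sketch under assumptions: the actual system is built in Symbolic Composer, and the function names, the value ranges and the one-note-per-syllable reading are hypothetical.

```python
import random


def pseudo_text(seed, stanza_range=(2, 4), line_options=(4, 5, 6),
                syllable_options=(7, 10, 12)):
    """Return a nested list [stanza][line] -> syllable count, standing in for a poem."""
    rng = random.Random(seed)
    stanzas = rng.randint(*stanza_range)
    lines = rng.choice(line_options)          # stanzas of equal length
    syllables = rng.choice(syllable_options)  # same metre for every line
    return [[syllables] * lines for _ in range(stanzas)]


def derive_structure(text):
    """Map the numeric 'text' onto the musical decisions it constrains."""
    return {
        "sections": len(text),                # overall form
        "phrases_per_section": len(text[0]),  # length of each section
        "notes_per_phrase": text[0][0],       # assumes one note per syllable
    }


text = pseudo_text(seed=1890)
structure = derive_structure(text)
```

Using a seeded generator means that the same seed always yields the same “poem”, so a generated piece can be reproduced later.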

A second module simulates the agents involved in the instrumental harmonic accompaniment (either a piano, or, more commonly, a viola and a bass). This module retrieves information from the “text” module (the form and general structure of the sections) and generates suitable harmonic progressions and scales to fit those progressions. It also generates ostinati and bass lines based on those progressions and scales. A third module simulates the singer. This module retrieves the information from the “text” module (namely the number of syllables of each line) to generate the rhythms of the vocal melody. It combines this information with the harmonic information retrieved from the second module to assign suitable pitches, therefore simulating the creation of melodic lines on top of the previously known stock-accompaniment.
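The way the third module combines syllable counts with harmonic information can be pictured as a small constraint problem: one note per syllable, each pitch drawn from the chord sounding at that moment. The following Python sketch is an assumption-laden illustration, not the implemented module; the chord voicings, the maximum-leap rule and the function name are hypothetical.

```python
import random

# Hypothetical chord tones in A minor, as MIDI note numbers.
CHORD_TONES = {"i": [57, 60, 64, 69], "V": [56, 59, 64, 68]}


def melodic_phrase(syllables, chords, rng, max_leap=7):
    """One note per syllable; each pitch must be a tone of the current chord
    and, where possible, lie within max_leap semitones of the previous note."""
    phrase = []
    for i in range(syllables):
        # Spread the chord sequence evenly over the syllables of the line.
        chord = chords[i * len(chords) // syllables]
        candidates = CHORD_TONES[chord]
        if phrase:
            near = [p for p in candidates if abs(p - phrase[-1]) <= max_leap]
            candidates = near or candidates  # relax the rule if nothing fits
        phrase.append(rng.choice(candidates))
    return phrase


rng = random.Random(7)
phrase = melodic_phrase(syllables=7, chords=["i", "V"], rng=rng)
```

With seven syllables over two chords, the first four notes fall on the tonic and the last three on the dominant.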

A fourth module simulates the guitarra. This module retrieves information on all previous modules and is then able to generate suitable counter-melodies or melodic figurations that simultaneously fit in the harmonic progressions, and at the same time, simulate a reaction to the melodies generated by the singer.

All four of these modules generate what we would call an “Urtext” score, similar to the ones in the corpus. This is a quantised ideal to be performed. This performance implies the repetition of sections and styling. A MIDI file exported at this stage could be printed as a music score and representation of the output from the first four modules.

A fifth module, a pseudo-humaniser, consists of a series of functions that will transform some of the generated parameters to originate variants. The transformations, when applied to the melodies, will simulate the styling of fadistas, therefore slightly changing rhythms, pitches and adding ornamentation. When applied to the accompaniments they might originate pauses, and small rhythmic and melodic variations as well. This will simulate a real performance practice, guaranteeing that virtually any repetition of a section will sound slightly different each time. A MIDI file exported at this stage reflects the performed fado, not suitable to be printed as a score, but perfect to be imported into a digital audio workstation to be recorded with virtual instruments.
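A minimal sketch of the humanising idea, in Python, is given below. It covers only timing, dynamics and occasional pauses, not pitch changes or ornamentation, and every name and parameter value in it is an assumption made for illustration rather than a description of the implemented module.

```python
import random


def humanise(notes, rng, timing_jitter=0.03, velocity_jitter=10,
             skip_probability=0.05):
    """notes: list of (onset_in_beats, midi_pitch, velocity) tuples."""
    performed = []
    for onset, pitch, velocity in notes:
        if rng.random() < skip_probability:
            continue  # occasionally drop a note, leaving a pause
        onset = max(0.0, onset + rng.uniform(-timing_jitter, timing_jitter))
        velocity = velocity + rng.randint(-velocity_jitter, velocity_jitter)
        velocity = min(127, max(1, velocity))  # stay in the MIDI velocity range
        performed.append((round(onset, 3), pitch, velocity))
    return performed


# A quantised four-beat accompaniment pattern in sixteenth-notes.
score = [(i * 0.25, [57, 60, 64][i % 3], 80) for i in range(16)]
take_one = humanise(score, random.Random(1))
take_two = humanise(score, random.Random(2))
```

Applying the function to the same quantised section with different random states yields different “takes”, which is how a repeated section could be made to sound slightly different each time.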

A sixth module is envisaged as an extension of the fifth module, to be implemented at a later stage. It will involve micro-transformations to enhance the humaniser results, namely by controlling “groove” and syncopation in a non-random fashion, and also by controlling pitch bend messages in order to generate portamentos, glides, slides and more convincing tremolos and mordents.

5 Methodology

The observation of the performance practice, the processes it encompasses, and the model obtained determine the methodology to be followed. Since the problem of song composition, in general, and of the musics and vocal sounds of fado, in particular, has been modelled as a hierarchical linear generative process, a symbolic artificial intelligence is the logical choice. Each step of the process involves either the generation of symbolic material, its retrieval from another source (a table, a list), or its transformation via a set of rules. It is a chain of choices or orders, each one having boundaries and a probability of happening or not. Therefore, it is mainly a consecutive series of constraint-satisfaction problems.

An inspirational figure in this area, David Cope (1991; 1996; 2000; 2002; 2004; 2005; 2008), uses a borderline approach based on grammars and transition networks; however, he also relies on direct feeding from corpora for style imitation. The strong aspect of his work is the actual results, since most of the works displayed seem well-formed and structurally coherent. They could be the manual labour of any aspiring composition student imitating the style of past masters. Although Cope’s arguments and descriptions are clear, the actual computational processes seem obscure and his algorithms are never made explicit enough for replication. On the other hand, the more recent approach of François Pachet, Pierre Roy and Fiammetta Ghedini, the Flow Machines project (Pachet & Roy, 2014; Pachet et al., 2013), seems promising and more of an actual source of reliable inspiration. Still, their integration with lyrics is not yet done and one is not able to discern how the text influences or shapes the music. Also, their examples of jazz lead-sheets lack the depth of a fully developed song in terms of form and variation. Moreover, their content sometimes resembles just a collage in which the individual components lack the semantic context from which they were originally derived, so that the end result loses coherence. The work of Jukka Toivanen, Hannu Toivonen and Alessandro Valitutti (Toivanen et al., 2013), however, seems a good example of what is pursued here: they aim at the automatic composition of lyrical songs, and create a module that actually generates text, and this text directly influences the musical content. This approach, theoretically, makes perfect sense. The mapping of the textual parameters onto the musical ones seems consistent with the actual processes and practices observed.
They also imply that the semantic content of the text should affect the mood of the music (according to some arbitrary Western-European conventions), by mapping it to tonality, which also seems logical within many genres. However, their program also lacks the formal coherence that would be expected in a typical song. There is neither a clear phrase structure nor a complete form defined over time (regarding repetition and variations, the songs just seem run-on), and there is no performative layer: the program merely generates quantised MIDI scores.

These detailed examples, two of them still in progress, act as direct inspirations for the model, and all of them rely on hybrid symbolic approaches mixing several techniques to solve the constrained choices along the way, or to generate specific sets of material. Therefore, it has been decided to pursue the same path, learning from their research and trying to integrate into the model the best ideas one could find among them, while discarding, changing or improving the ones thought not to be adequate. Hence, a hybrid constraint-satisfaction method has been applied in order to build the main structures of the desired instrumental music.

6 Implementation of the Model

Taking into account prior projects of the experts and references in the area (Cope, 2004, 2008; Nierhaus, 2009), it was decided to implement the model using a Lisp-based language. We started practising and implementing some algorithms in Common Music, an open source software developed by Heinrich Taube.(8) However, after some incipient, yet important, developments we soon reached the conclusion that Common Music was not powerful enough for the desired purposes because it lacked predefined functions ready to deal with an implementation of a model of tonal music. It would have taken several months to create those functions from scratch. That work, however, had already been done by Tonality Systems in their software Symbolic Composer,(9) also Lisp-based, but with numerous predefined musical functions, which allowed us to save time and resources, concentrate on the concrete problem and work in higher-level detail, thus greatly enhancing the final result. According to the Tonality Systems website:

Symbolic Composer uses two grammars to define the section and instrumentation structures: The section grammar and the instrumentation grammar. Both are tree-like structures with as many elements and levels as needed. Each node contains definitions of class properties. Class properties are inherited with the inheritance trees of the section and instrumentation grammars …. Inheritation mechanism allows to define common properties among instruments and sections.… A section consist of any number of instruments. Each instrument is defined by inherited musical properties called classes. Classes are: symbol, length, tonality, zone, velocity, channel, program, controller, tempo, duration, tempo-zones and groove. Obligatory classes are symbol, length, tonality, zone and velocity. The other classes are optional. Classes can be cross-converted to other classes, or cloned between instruments and sections, and global operations can be distributed over sections. Section has default class properties, which are used if they are not specified for instruments.

During the conception of the model, we constantly struggled to reconcile the emulation of the performance practice with the practice reflected in the representative corpus of musical scores. The initial idea was to build an artificial intelligence that mimicked and represented this corpus, generating new scores similar to the existing ones. To do that, we built a program that mainly recombined weighted lists of fragments and specific parameter values extracted from that same corpus, with little to no variation. It was a simple approach that gave us controlled and predictable results within the desired scope, but it lacked inventiveness, the ability to go further. Moreover, it neither expressed algorithmically any new idea about the methods of composing with computers, nor was it challenging to code given the current state of the art. Consequently, after the first round of programming we decided to explore a second approach and build a second artificial intelligence that, instead of just using lists with weights and data derived from the corpus, actually generates lists from scratch using functions. This second approach was much more challenging, because we had to formalise the sub-processes of musical composition. The result was a series of abstractions able to generate a whole range of parameters, which were later combined with one another and nested in the previously designed architecture. By the very nature of an abstraction, a function lets users either stay within strict confines, thus obtaining predictable results, or pass arguments outside the expected range, obtaining totally unexpected ones. The results can be even more unpredictable if one lets a randomiser pick the possible range and the parameters to randomise. The point is that this approach yields a very flexible artificial intelligence, and anyone can decide how far to go in exploring it.

Each module comprises a series of constraint-satisfaction problems to be solved. They can be solved either by a symbolic artificial intelligence based on the statistical data retrieved from the corpus (the weighted lists and the fragments) or by a second symbolic artificial intelligence working with formalisations of the sub-processes involved (translated into grammars, rule-based algorithms or other iterative procedures). Having arrived at this point, we decided that the best solution would be to combine both approaches. The combination is more flexible and less predictable with respect to the results users may have in mind, and it increases the possibilities for interactivity by allowing them to customise and manipulate more parameters. To make this manageable, we designed a coefficient of representativeness (CR) that varies between 0 and 100, 100 being total representativeness of the corpus and 0 total inventiveness. When the user sets this coefficient to 100, only the artificial intelligence dealing with the corpus generates results. When it is 0, all the parameters are retrieved from the lists generated by the second artificial intelligence. For any value in between, the coefficient is the percentage probability that each calculated parameter is picked from the corpus-based artificial intelligence rather than from the second one. Ideally, the default should be around 70 if one wants a general, predictable score with a touch of novelty, but it can be changed if the user is willing to take some risks and wants a piece that strays further from the corpus.
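The role of the coefficient can be sketched in a few lines. The fragment below is written in Python for illustration only and is not the generator's Symbolic Composer code; the function name `pick_source` and the two source labels are hypothetical, and we assume that the CR is read as the percentage probability of taking a parameter from the corpus-based artificial intelligence.

```python
import random


def pick_source(cr, rng):
    """Say which generator supplies one parameter.

    cr is the coefficient of representativeness (0 to 100), read here as
    the percentage probability of using the corpus-based value.
    """
    if not 0 <= cr <= 100:
        raise ValueError("cr must lie between 0 and 100")
    # rng.random() lies in [0, 1), so cr=100 always selects the corpus
    # and cr=0 never does.
    return "corpus" if rng.random() * 100 < cr else "generative"


rng = random.Random(2017)  # fixed seed so the run can be repeated
choices = [pick_source(70, rng) for _ in range(1000)]
print(choices.count("corpus") / len(choices))  # a value close to 0.7
```

Because the decision is taken independently for each parameter, a single score can mix corpus-derived and generated material.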

7 First Module

The first module of the model corresponds to the triggering of a seed that will cause the information retrieval of a simulated lyrical text, in order to constrain subsequent choices and structures.

7.1 Seed

A generative program like the one coded relies heavily on random functions and constrained decisions that depend on seeds to run. There is no true randomness inside the chaotic and random functions, as they all depend on a seed to perform their calculations. Consequently, if every instance of the program is to be reproducible, all of these functions must be attached to the same seed. We therefore placed, right at the beginning of the generator, a function that sets a variable called “my-random-seed”. Every time the program is called, it generates a different result. At the same time, if the user likes any of the outcomes and wants to replicate it, they can do so by choosing exactly the same seed. In other words, each seed identifies one instance of a given score, like its own ID number, and the user has total control over it:

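    ;; derive a seed from the clock, scaled by a random factor
    (setq my-random-seed
          (* (float (/ (get-universal-time) 1000000000000))
             (rnd-in 100)))

    ;; initialise the random generators with that seed
    (init-rnd my-random-seed)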

The block of code shown derives the seed from the universal clock of the computer, scaled by a random factor. In standard Common Lisp, get-universal-time counts whole seconds, so the clock value itself changes once per second rather than every millisecond. Virtually every time the program is run it will nevertheless generate a different score, because the time will be different. If one likes the result, one just has to take note of the number that appears in the inspector window and keep that seed for later use. The entire score depends on this seed, which is the initial trigger from which everything that follows is derived.
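The principle of seed-based reproducibility can be illustrated outside Symbolic Composer. The following Python sketch is an assumption-laden stand-in, not the generator's code: `make_seed` and `generate` are hypothetical names, and the toy “score” is just a list of scale degrees.

```python
import random
import time


def make_seed():
    """Derive a fresh seed from the clock (a stand-in for my-random-seed)."""
    return time.time_ns()


def generate(seed, n=8):
    """Toy 'score': n pseudo-random scale degrees fully determined by seed."""
    rng = random.Random(seed)
    return [rng.randint(0, 6) for _ in range(n)]


seed = make_seed()
print("seed:", seed)       # note this number to replay the score later
first = generate(seed)
replay = generate(seed)    # same seed, same score
print(first == replay)     # True
```

Keeping every random decision on one generator object, initialised once, is what makes the whole output a function of a single number.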

7.2 Text

The text is in fact just a simulation of what would be a complete lyrical text and of the relevant numerical information about its structure. It corresponds to three parameters: number of stanzas, lines per stanza and syllables per line. These parameters are positive integers and are stored in variables. They can either be produced by a constrained random generation or be defined by the user.

Those numbers can, in theory, be whatever the user envisions, but in actual practice they are constrained to specific ranges. The number of stanzas is usually a small number between two and six, four being the most common by far; the number of lines per stanza usually varies between three and ten, four being the most common and five and six being rather common as well, especially since the beginning of the twentieth century; the number of syllables per line varies between four and twelve, seven being the most common (Gouveia, 2010). Given the strophic nature of most fados, in our model these numbers are fixed for the entire composition.

For the purpose of demonstration, the version presented in this article assumes four stanzas with four lines and seven syllables, but all other values can be implemented with some future refinement.
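A constrained random generation of these three parameters could look like the Python sketch below. The ranges and the most common values follow the description above; the weights themselves, and the names `weighted_pick` and `simulate_text`, are illustrative assumptions and not values measured on the corpus.

```python
import random

# Ranges and most common values come from the text; the weights are
# illustrative guesses, not frequencies measured on the corpus.
STANZAS = {2: 1, 3: 1, 4: 6, 5: 1, 6: 1}
LINES = {3: 1, 4: 6, 5: 3, 6: 3, 7: 1, 8: 1, 9: 1, 10: 1}
SYLLABLES = {4: 1, 5: 1, 6: 1, 7: 6, 8: 1, 9: 1, 10: 1, 11: 1, 12: 1}


def weighted_pick(table, rng):
    """Pick one key of `table` with probability proportional to its weight."""
    values = list(table)
    weights = [table[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def simulate_text(rng, stanzas=None, lines=None, syllables=None):
    """Return (stanzas, lines per stanza, syllables per line).

    A value supplied by the user overrides the constrained random choice.
    """
    if stanzas is None:
        stanzas = weighted_pick(STANZAS, rng)
    if lines is None:
        lines = weighted_pick(LINES, rng)
    if syllables is None:
        syllables = weighted_pick(SYLLABLES, rng)
    return stanzas, lines, syllables


rng = random.Random(7)
print(simulate_text(rng))           # a constrained random structure
print(simulate_text(rng, 4, 4, 7))  # the structure assumed in this article
```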

7.3 Form

The overall form imposes a hierarchical superstructure containing and constraining everything else that follows. The form of a fado in particular, and of any song in general, is decided by the lyrical content: namely, the number of stanzas and lines. At this stage we are not dealing with concrete lyrical content; however, one can deal with its consequences in an abstract way by following its trends. A text with four quatrains in heptasyllabic lines, for instance, can be extrapolated into how one would typically sing such a text. Following Ernesto Vieira’s description of fado, one would sing the first line over a two-bar antecedent melodic phrase (a1), and the second line over a two-bar consequent melodic phrase (a2). These two lines of text would form a typical A section, four bars long, on top of a harmonic progression. This A section could be immediately repeated (representing the textual repetition of the same two lines) or it could be followed by a contrasting B section representing the following two lines of the quatrain. So, departing from this simple quatrain one could end up with several possible combinations – AB, AAB, AABB, ABB, ABAB. At this point, knowing that there are still three quatrains to sing, several options arise. They could be exact replications of the first one, giving a typical strophic fado as described by Ernesto Vieira: ABAB ABAB ABAB ABAB or AABB AABB AABB AABB, for instance. Alternatively, the first two quatrains could be one fado in the minor mode and the last two quatrains could have different music in the major mode, representing a doubled version of the same model: something like ABAB ABAB CDCD CDCD or AABB AABB CCDD CCDD.

With this in mind, one realises that even if the concrete text (its semantic content) is not known, one can try to model the form of the music based on its structure (number of stanzas, lines per stanza and syllables per line). The simplest way to formalise this is to define a list containing the distribution of the sections, each section represented by an alphabetic character. Going one step further, we assume in the model different versions of each section, representing slightly different performances of the same section whenever it is repeated. A different version is obtained by applying transformations to some parameters in order to obtain what can be called variations. The concrete implementation of these variations will be discussed later. This increases variety and models what often happens in performance. We append an integer to each section symbol to identify its variation. There is no clear boundary on how many variations one could have. In theory, each instance of any given section is sung uniquely and could therefore be a variation, resulting in at least as many variations as the total number of stanzas. Therefore, the resulting variables assume the form:

\begin{equation*} \text{Section}=Xn, X \in \{ A,B,C,D, \dotsc \}, n \in \mathbb{N}_0 \end{equation*}

A form list is then the chronologically ordered sequence of these variables, reflecting the idiosyncrasies of the lyrical text, completed with the uniquely titled sections “introduction” (Intro), optional “intermezzos” and “coda”. Our example of the four heptasyllabic quatrains following the simplest Ernesto Vieira’s model, without redundancy, would result in the list:

\begin{equation*} F=\langle \text{Intro}, A0, B0, A1, B1, A2, B2, A3, B3, \text{coda} \rangle \end{equation*}

In practice such a list might be doubled, since in many fados the singers repeat the lines. This list represents the superstructure of the piece, its overall form. It is, in fact, the formalisation of the time dimension in a linear fashion.

There are at least three ways of generating such a list. The first one is to let it literally be defined by the user. In a fully developed application this could be a decision mediated by a graphic user interface, in order to simplify the process, based on the structure of the text. The user would just need to provide the number of stanzas, the number of lines and the number of syllables, at their will.

The second one is using the retrieved simplified forms from the corpus and building a weighted list of all reasonable structures. There are 53 different simplified forms in the corpus, and a table with the most common of them is presented below. These are the simplified forms that appear at least three times.

| Simplified form | # |
| --- | --- |
| AABB | 23 |
| AABC | 7 |
| ABAC DEDE | 5 |
| AABB CCDD | 4 |
| ABAB | 4 |
| ABCC | 3 |
| ABCD | 3 |

Table 1 – Most common forms in the corpus.

These simplified forms can be implemented simply by letting the computer randomly pick one according to its weight, and then transforming it in order to obtain a full well-constructed form. The first transformation is the decision of how many times the chosen simplified form is to be repeated, which can be done by appending the list to itself the desired number of times (usually no more than four if one does not want an endless song): $T_1(F) \rightarrow F \cup F \cup F \dotsc$

The second transformation is to concatenate each symbol of the resulting list with an integer corresponding to its variation. This integer is randomly picked from another list. To keep the program within a reasonable degree of simplicity, we generate only three versions of each section, which is more than enough for demonstrative purposes:

\begin{equation*} T_2(F) \rightarrow \forall X \in F, X \Rightarrow Xn: n \in \{ 0,1,2 \}. \end{equation*}

However, technically one could have as many variations as there are unique elements in the list to be transformed, so the general formula would be:

\begin{equation*} T_2(F) \rightarrow \forall X \in F, X \Rightarrow Xn: n \in \{ n \in \mathbb{N}_0 \mid 0 \leq n \leq i : i = \# \langle \forall j, k \in F : j \neq k \rangle \}. \end{equation*}

The third transformation consists in appending the Intro section to the beginning of the list, and the coda to the end, which is just a trivial list manipulation operation:

\begin{equation*} T_3(F) \rightarrow \langle \text{Intro} \rangle \cup F \cup \langle \text{coda} \rangle \end{equation*}

The result could look something like this:

\begin{equation*} F =\langle \text{Intro},A2,A1,B1,B1,A0,A2,B1,B0,A0,A1,B1,B2,A0,A0,B2,B0,\text{coda} \rangle \end{equation*}

Notice that the variation indexes are random and do not follow any ordered criterion. The point is that there is no hierarchy among variations, since they are all equally valid versions of a given material; the users are therefore oblivious to which one is the presumed “original” and which ones are the variations.
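The three transformations applied to a corpus-derived form can be sketched as follows. This is a Python illustration under stated assumptions, not the Symbolic Composer implementation: the function names are hypothetical, the weights are the counts of Table 1, and the two-stanza forms are written without the separating space.

```python
import random

# Counts from Table 1 (forms that occur at least three times in the corpus).
CORPUS_FORMS = {
    "AABB": 23, "AABC": 7, "ABACDEDE": 5, "AABBCCDD": 4,
    "ABAB": 4, "ABCC": 3, "ABCD": 3,
}


def pick_simplified_form(rng):
    """Pick one simplified form at random, weighted by its corpus count."""
    forms = list(CORPUS_FORMS)
    weights = [CORPUS_FORMS[f] for f in forms]
    return list(rng.choices(forms, weights=weights, k=1)[0])


def t1_repeat(form, times):
    """T1: append the list to itself the desired number of times."""
    return form * times


def t2_variations(form, rng, n_variations=3):
    """T2: attach a randomly picked variation index to every symbol."""
    return [f"{symbol}{rng.randrange(n_variations)}" for symbol in form]


def t3_frame(form):
    """T3: add the Intro at the beginning and the coda at the end."""
    return ["Intro"] + form + ["coda"]


def corpus_form(rng, times=4):
    simplified = pick_simplified_form(rng)
    return t3_frame(t2_variations(t1_repeat(simplified, times), rng))


print(corpus_form(random.Random(1)))
```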

The third way to generate a well-constructed form is formalising an abstraction that randomly generates a simplified form. A well-constructed simplified form can be implemented as:

  1. Define the total number of sections as $i$, which is a positive integer: $i \in \mathbb{N}$
  2. Create the empty list $F$ and the empty set $S$: $F = \langle \rangle ; S = \{ \}$
  3. Define $p$ as the length of the unique elements of $F$ plus 1: $p = 1 + \# \langle \forall j, k \in F : j \neq k \rangle$
  4. Append the alphabetic character in position $p$ to set $S$: $S = S \cup \{ X_p \} : X \in \{ A, B, C, D,\dotsc \}$
  5. Append a randomly picked element from $S$ to $F$: $F = F \cup \langle s_n \rangle : s \in S$
  6. Repeat 3 to 5 $i$ times

This will generate a simplified form that can be subsequently transformed into a well-constructed form using the same procedures exemplified in the simplified form derived from the corpus.
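The abstraction that generates a simplified form from scratch translates almost line by line into code. The Python sketch below follows the numbered steps; the function name `random_simplified_form` is hypothetical, and the limit of 26 sections is an assumption imposed only by the Latin alphabet used for the labels.

```python
import random
import string


def random_simplified_form(i, rng):
    """Generate a simplified form with i sections, following the steps above."""
    if not 1 <= i <= 26:
        raise ValueError("i must lie between 1 and 26")
    form = []     # the list F
    letters = []  # the set S, kept in order so that picks are repeatable
    for _ in range(i):                             # repeat steps 3 to 5
        p = len(set(form)) + 1                     # step 3
        candidate = string.ascii_uppercase[p - 1]  # character in position p
        if candidate not in letters:               # step 4: S is a set
            letters.append(candidate)
        form.append(rng.choice(letters))           # step 5
    return form


print("".join(random_simplified_form(8, random.Random(3))))
```

A consequence of step 3 is that a new letter can only appear once all the previous ones have been used, so every generated form starts with A and introduces its sections in alphabetical order.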

In the generator we implemented both solutions and one or the other is triggered, depending on the coefficient of representativeness.

7.4 Section

Each section can be seen as a hierarchical class that has one relevant time-domain property: length. This length can be one single value, but often it is a set of them, each one corresponding to a bar unit, mirroring the typical divisions of notated music. Therefore, its implementation takes the form of a list of lengths:

\begin{equation*} \forall \, \# \text{Section}_{Xn} = \langle \text{bar unit}_1 , \text{bar unit}_2 , \text{bar unit}_3 , \dotsc \rangle \end{equation*}

This definition formalises the whole form of the song, encompassing its total time span and constraining all the other time-dependent elements hierarchically below this level. The values these lengths assume depend, of course, on the song and style to be modelled, and can be fixed or variable. They can be randomly generated or defined by the user; their implementation depends largely on the problem to be solved. In the specific case of fado, and following the empirical data retrieved, it is assumed that each section is four bars long, each bar with a length of 2/4, and this has been implemented as such:

\begin{equation*} \forall \, \# \text{Section}_{Xn} = \langle \frac{2}{4} , \frac{2}{4} , \frac{2}{4} , \frac{2}{4} \rangle \end{equation*}

Modifying these values could radically change the final results and the entire perception of the practice.

7.5 Example of a Single Section

At this point, and while we may be getting ahead of the formalisation of the other structural elements, we must stress that in terms of organisation of the narrative it is important to explain how the hierarchical structures were actually implemented in code within Symbolic Composer. While the formalisation of the model allows it to be implemented in other languages and probably using other kinds of nesting, the software Symbolic Composer that we have used to code the generator has its own architecture of predefined objects and classes, simplifying some of the low-level operations. After the overall form is decided and one has all the sections needed, it is possible to build a modular frame based on this superstructure that can be used as a template for any other different superstructure in the future. We assumed each section as a basic module to support all other parameters inside.

Therefore, we will exemplify how we have made a template defining one single section representing four bars of a dummy song and how it can be similar for all subsequent sections and variants.

Inside this template section we will explain the global parameters that will apply to all instruments in this section. They are called “default” parameters and in this case the parameters are “zone” and “tempo”.

“Zone” is defined as the total length of the section. In this example, we are assigning four 2/4 bars for the A section, but it can be any length one wishes, depending on whatever genre one is modelling.

“Tempo” is the number of beats per minute at which the song will be played. One can choose a strict tempo, or let the computer pick it randomly within a range acceptable for that section. “Tempo” is a list of values, each corresponding to a zone. Given a list with four zones and a list with four different tempos, the computer will use a different tempo for each zone; if only one tempo is assigned, it is used for all four zones.

Then we need to define the instruments. In this example we initially just used the right hand of a piano, which we have called “voice” (it creates the melody) and left hand that we have simply called “ostinato” (it creates the accompaniment), very much like the scores from the corpus. One can create as many instruments as one desires.

Inside each of these instruments we have defined our local parameters; this means the parameters that only affect a particular instrument or voice.

“Tonality” stands for the scales and chords that were used in this section. In this example we have decided to base everything in scales and chords without accidentals and then use a variable as a transposition factor to contemplate the other possibilities.

“Length” is the rhythm of each voice, and is basically a list of durations. Ideally the sum of these durations should match the length of the previously defined zone. If it is shorter, the rhythm loops until the zone is filled; if it is longer, it is truncated.

“Symbol” is a list of alphabetic characters and accidental symbols that will be associated both with the chords and scales defined in tonality (thus becoming pitches) and with the durations defined in length, becoming notes. As with the previous parameters, the number of elements in the symbol list should match the number of elements in the length list for ideal correspondence. In case of a mismatch, the shorter list will be repeated and mapped onto the other until it exhausts all of its elements.

“Velocity” is a list of velocities, ranging from 0 to 127, associated with each note generated by the previous parameters. In order to obtain maximum correspondence and efficiency, the number of elements of this list should also match the number of elements both in symbol and length.

“Program” is the timbre of the instrument. Usually it is a value between 0–127 that corresponds to a list of the available sounds according to the synthesisers available. In case of a generic MIDI set, the value 0 stands for a piano.

“Channel” is the MIDI channel associated with that particular instrument, ranging from 1 to 16, and number 10 is typically reserved for percussion.

So a template in Symbolic Composer, like a blank canvas for generating one section, is something that is coded like this:

    ;; section A
    (def-section A
      default
        zone
        tempo
      voice
        tonality
        length
        symbol
        velocity
        program
        channel
      ostinato
        tonality
        length
        symbol
        velocity
        program
        channel
    )

This template was the same for any section or sub-section we ever needed to compose. We found it very reasonable, because it employs pretty much the same parameters one would use to compose with pen and paper or with a notation program.

After this stage, one just needs to know exactly what is desired inside each section and either provide or generate the material for every possible parameter listed. We can fill in the values for an archetypical basic section A.

Inside section A, we were trying to generate our first four bars of music, meaning we were in fact combining sub-sections a1 and a2 regarding some of the parameters. Therefore the zone has to be a list that adds up to four bars of 2/4. It can be any length one wants, so it can easily be adapted to any other genre or variation in the future. Notice that a list is coded simply as a parenthesised sequence of numbers preceded by an apostrophe ('):

'(2/4 2/4 2/4 2/4)

Then, there were many options for tempo. For the sake of simplicity we could have specified a single value. Another option was to make an entire interval available, in which case the best way was to tell the computer to generate a random integer and add a base value to it. Because tempo needs to be a list of values (even when it has only one value), we also needed to tell the computer to make it a list.

(list (+ (rnd-in 12) 72))

Then we started to deal with the instrument “voice”.

For the sake of simplicity, this example assumes a tonal melody built on top of an archetypical T|D|D|T harmonic progression and on a major scale. To be in a range compatible with a voice, it is best to centre it around octave 5 (in Symbolic Composer the first symbol of octave 5 is one octave above middle C, MIDI pitch 60). So we have assigned:

(activate-tonality (0 major c 5))

Note that it can be any tonality or tonalities one wants in any range desired. It can be even detuned tonalities. One can further explore this at one’s will or even change the score later for any other scale one wants and see the differences.

The next step is to define the rhythm of the melody. There are at least two approaches. The first is to assign a list of specific durations, either totally determined or picked from a set of different choices. The second is to have a generative function that, based on algorithms, creates a list of suitable durations from scratch. For this example we just assign an archetypical fado rhythm; other options are explored further ahead:

    '(1/8 1/8 1/16 1/8 1/16 1/8 1/8 -1/8 1/8
      1/8 1/8 1/16 1/8 1/16 1/8 1/8 -1/8 1/8)

Notice that the durations fill four entire measures: the same two-bar scheme is stated twice. Notice also the negative values, which represent pauses. Finally, notice that, given the way the structure was built, the typical pick-up note is missing because it belongs to a previous section (an introduction, for instance), and that the last notes, following the pause, are actually pick-up notes for the following sections.
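The bookkeeping between “length” and “zone” can be checked with exact fractions. The Python sketch below is an illustration, not Symbolic Composer code: `fit_to_zone` is a hypothetical name, and we assume that truncation shortens the last event rather than dropping it.

```python
from fractions import Fraction
from itertools import cycle


def fit_to_zone(lengths, zone):
    """Loop or truncate a list of durations so that it fills the zone.

    Negative durations are rests. Truncation is assumed to shorten the
    last event; the actual behaviour of the software may differ.
    """
    if not lengths or any(Fraction(d) == 0 for d in lengths):
        raise ValueError("lengths must be non-empty and non-zero")
    total = sum(Fraction(bar) for bar in zone)
    fitted, used = [], Fraction(0)
    for item in cycle(lengths):
        if used >= total:
            break
        duration = Fraction(item)
        size = min(abs(duration), total - used)
        fitted.append(size if duration > 0 else -size)
        used += size
    return fitted


zone = ["2/4"] * 4
two_bars = "1/8 1/8 1/16 1/8 1/16 1/8 1/8 -1/8 1/8".split()
melody = fit_to_zone(two_bars, zone)
print(len(melody), sum(abs(d) for d in melody))  # 18 2
```

Looping the two-bar scheme over the four-bar zone reproduces exactly the eighteen durations listed above, whose absolute values add up to two whole notes, that is, four bars of 2/4.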

The following step was to generate the pitches of the melody, so symbols were needed. Each symbol is basically a letter representing an element of the previously activated scale by its ordinal position relative to the defined tonic. In this case, the scale was C major starting in octave 5. As such, the first symbol “a” represents the first element of that scale, which is the pitch C5. The second symbol “b” represents the second element of that scale, which is the pitch D5. Negative symbols work the same way in the opposite direction: a symbol “-b” represents the pitch B4 and a symbol “-c” represents the pitch A4. As such, one can define a melodic contour using a string of symbols that can then be imposed on any scale desired.
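The correspondence between symbols and pitches described here can be made explicit with a small Python sketch. The function name `symbol_to_pitch` is hypothetical, and the octave arithmetic assumes a seven-note scale whose first element is C, so that octave numbers change at the tonic.

```python
C_MAJOR = ["C", "D", "E", "F", "G", "A", "B"]


def symbol_to_pitch(symbol, scale=C_MAJOR, base_octave=5):
    """Map a symbol such as 'a', 'b' or '-c' to a pitch name.

    'a' is the first element of the scale in the base octave; a leading
    minus sign counts the same number of steps downwards.
    """
    letter = symbol.lstrip("-")
    step = ord(letter) - ord("a")
    if symbol.startswith("-"):
        step = -step
    # Floor division keeps the octave correct for steps below the tonic.
    octave_shift, degree = divmod(step, len(scale))
    return f"{scale[degree]}{base_octave + octave_shift}"


for symbol in ["a", "b", "-b", "-c"]:
    print(symbol, symbol_to_pitch(symbol))
# a C5
# b D5
# -b B4
# -c A4
```

Passing a different `scale` list imposes the same contour on another set of pitches, which is the point of separating symbols from tonalities.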

There are countless ways of generating a list of symbols, which will make more or less sense depending on the processes one is trying to model. In this dummy example, just to spark ideas, we have used a random-walk generation around the centre of the scale:

    (find-minimal
     (vector-to-symbol -d d
      (gen-noise-Brownian 4 my-random-seed 0.5)))

We essentially used a mathematical function that comes with the program, “gen-noise-Brownian” (there are many others), which generates a random walk, a singable contour, based on Brownian noise. Using another preset function, “vector-to-symbol”, we mapped this numeric contour into symbols within the range “-d” up to “d” (that is, around the centre of our scale, which is “a”), and used another preset function called “find-minimal” to remove all repetitions while still preserving the general shape of the curve. Running this set of nested functions once, we have obtained the following result:

--> (a -c -b -d -c -b d c d b c a b a)

This is a perfectly singable undulating line. One can try out with different values, seeds and nested combinations and see what happens. One can try out some of the other mathematical functions that come with the program, like “gen-pink-noise”, etc. More possibilities and a discussion about the ones that make more sense and are actually used in the generator are explored further ahead.
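The chain of operations (random walk, mapping to symbols, removal of repetitions) can be imitated in Python to make the idea concrete. The sketch below does not reproduce the internals of “gen-noise-Brownian”, “vector-to-symbol” or “find-minimal”, which we do not describe here; the three functions are hypothetical stand-ins that assume a Gaussian random walk, a linear rescaling of the walk onto the symbol range, and the removal of consecutive repeats only.

```python
import random

SYMBOLS = ["-d", "-c", "-b", "a", "b", "c", "d"]  # range -d .. d, centre a


def brownian_walk(n, rng, step=0.5):
    """Random walk: each value is the previous one plus Gaussian noise."""
    value, walk = 0.0, []
    for _ in range(n):
        value += rng.gauss(0.0, step)
        walk.append(value)
    return walk


def to_symbols(walk, symbols=SYMBOLS):
    """Rescale the walk linearly onto the symbol range (an assumption)."""
    low, high = min(walk), max(walk)
    span = (high - low) or 1.0  # avoid dividing by zero on a flat walk
    top = len(symbols) - 1
    return [symbols[round((v - low) / span * top)] for v in walk]


def drop_repeats(sequence):
    """Remove consecutive repetitions, keeping the shape of the contour."""
    out = []
    for item in sequence:
        if not out or out[-1] != item:
            out.append(item)
    return out


rng = random.Random(5)
contour = drop_repeats(to_symbols(brownian_walk(32, rng)))
print(contour)
```

Since each step of a random walk is small relative to its overall range, neighbouring symbols tend to be close to each other, which is what makes such contours singable.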

Then we just needed to define the velocities. A velocity is a number between 0 and 127 that represents the rate at which a MIDI note is sounded, representing how loud a note sounds. For this dummy example, we have opted for another kind of randomisation, using again a Brownian-noise generator:

(vector-round 64 100 (gen-noise-Brownian 5 0.75 0.75))

Again, there are countless ways of doing this, some making more sense than others. This is just one of many. One can play with these values and see what happens. We were just happy with these results after trial and error. The generated values will be all between 64 and 100, so, all in the range of mezzo-forte. A more sophisticated algorithm for dynamics used in the generator will be explored in its own section.

Then, keeping the example simple, we assigned the default instrument, piano, to the voice, which is program 1, and told it to play on channel 1 as well. Any other program number can be assigned; 74, for instance, is the flute in the General MIDI listing, which numbers programs from 1 (acoustic grand piano) to 128.

program 1

channel 1

This concludes the definition of the first four bars of melody. As one can see, it is fairly free within the confines of some constraints: the result will be a tonal-sounding melody within a range of one octave around C5 and with an archetypical rhythm.

Then we needed to define an ostinato to accompany the melody. For the ostinato we decided exactly what we wanted. We defined the same zone so that the two layers match. The tonalities employed (since we needed the T|D|D|T movement in the major mode) were the appropriate chords – C major and G dominant seventh (G7) – complemented with the appropriate voice-leading inversions. Therefore:

    (activate-tonality (2 ch-maj c 4) (0 ch-7 g 4)
                       (0 ch-7 g 4) (2 ch-maj c 4))

We have centred the ostinato around octave 4 so that it does not clash with the melody and sits in the appropriate lower register. We have provided a defined length, since it is a stock march/fox rhythmic pattern. This list represents the durations as we would write them on paper or in a notation program. Notice that we only need to write out two bars of the ostinato, even though the zone is four bars long: when the list is shorter than the zone, it loops until the time is exhausted.

'(1/8 1/8 1/8 1/8 1/8 1/16 1/16 1/8 1/8)

The same is true also for the symbols, which in this case are all pre-determined.

'((-12 a) bcd -b bcd -c bcd bcd -b bcd)

Notice the logic of construction around the symbols – the “-12” attached to the first symbol indicates that it is to be played twelve semitones below, forcing it to be a bass note, as expected in this kind of ostinato. The cluster “bcd” means that these three symbols are played as simultaneously as possible.

We kept the same solution for the velocities as we did in the melody: let them be somewhat random:

(vector-round 64 100 (gen-noise-Brownian 5 0.75 0.75))

For the program we kept the piano, but we wanted this hand to be played on another channel. So we assigned it to channel 2.

program 1

channel 2

At this point the coding of a four-bar section A is finished. It looks like this:

    (def-section A
      default
        zone     '(2/4 2/4 2/4 2/4)
        tempo    (list (+ (rnd-in 12) 72))
      voice
        tonality (activate-tonality (0 major c 5))
        length   '(1/8 1/8 1/16 1/8 1/16 1/8 1/8 -1/8 1/8 1/8
                   1/8 1/16 1/8 1/16 1/8 1/8 -1/8 1/8)
        symbol   (find-minimal
                  (vector-to-symbol -d d
                   (gen-noise-Brownian 4 my-random-seed 0.5)))
        velocity (vector-round 64 100
                  (gen-noise-Brownian 5 0.75 0.75))
        program  1
        channel  1
      ostinato
        tonality (activate-tonality (2 ch-maj c 4) (0 ch-7 g 4)
                                    (0 ch-7 g 4) (2 ch-maj c 4))
        length   '(1/8 1/8 1/8 1/8 1/8 1/16 1/16 1/8 1/8)
        symbol   '((-12 a) bcd -b bcd -c bcd bcd -b bcd)
        velocity (vector-round 64 100
                  (gen-noise-Brownian 5 0.75 0.75))
        program  1
        channel  2
    )

As one can easily understand, if we wanted to proceed and define a section B, all we had to do was copy and paste the same template and change the desired parameters, repeating the process for as many sections as were needed to complete the piece.

An efficient way to automate this process is to treat this example as an archetype and to define an abstraction that generates as many sections as desired, from scratch, changing just the relevant parameters.

Having presented a way of implementing sections, nested inside the superstructure of form, one can now understand the hierarchical dimension of the model. While it has a vertical dimension of six modules representing all the agents involved, in their order of dependency, it also has a horizontal dimension of one section at a time, in which each section appears to be independent of the others. The implementation of the following modules consists of generating the materials for these agents in a modular fashion: generating the material for a hypothetical section A, then repeating the entire process for a section B, again for a section C, and so on.

8 Second Module

In this section we will detail how we have formalised the harmonic layer and which assumptions and algorithms were used.

In order to formalise any accompaniment or melodic process, the harmonic environment has to be defined, and represented in symbolic terms. Fado musics are tonal and, as such, tonality has to be modelled. That kind of formal exercise has already been done several times, namely in the work of Elaine Chew, who presented a mathematical model for tonality and its description in formal terms (Chew, 2014). We have also drawn inspiration from the work of Martin Rohrmeier (2011), who proposed a generative syntax of tonal harmony.

The predefined objects and classes in the software that we have used shaped how the problem was formalised, in the sense that we were building our reasoning on top of them. Tonality, in Symbolic Composer, is a class defined as a foundational set of audible frequencies: $Tonality_x=\{freq_1 , freq_2 , freq_3 , \dotsc , freq_n \}$. Therefore, a tonality is often equated to a scale from which the pitches will be derived. These sounds can be virtually anything, and are not constrained by any boundaries. Users can literally build artificial sonic worlds by defining their own tonalities. While it is possible to define a tonality in terms like:

\begin{equation*} Tonality_{example}=\{245\,hz , 277\,hz , 389\,hz , 411\,hz , 447\,hz \} \end{equation*}

For the most part, this is not convenient. Alternatively, by using a morphism it is possible to map frequencies into a pair of symbols such that the first one is an alphabetic character corresponding to a pitch and the second one is an integer corresponding to an octave. An octave establishes a principle of equivalence such that an increase of one unit corresponds to the doubling of the frequency. Within an octave there is room for twelve equidistant pitches. The 128 available MIDI pitches are defined this way, with a reference frequency of A4 = 440 Hz. At present, this convention is widely used and there is no need for further exploration and formalisation in the scope of this article. It suffices to say that a tonality can also be defined, using this convention, in terms of a scale, such as:

\begin{equation*} Tonality_{example}=\{C3, D\#3, E3, F\#3, G3, A\#3, B3\}. \end{equation*}

In the particular case of fado, as observed in the practice and from the examples of the corpus, there are two tonalities involved and their possible transpositions within the twelve tone system. They correspond to the major and minor modes, since fado is a part of a tonal tradition in the Western sense of the term. Software limitations prevent us from actually defining different ascending and descending versions of a scale, as a melodic minor mode would require. Since scales, in this case, are merely sets of frequencies (or pitch classes) in abstract, melodic movement can only be dealt with locally and contextually, requiring more information. As such, three different minor sets are defined so we can switch between them, locally, as needed:

\begin{equation*} Tonality_{Major}=\{C3, D3, E3, F3, G3, A3, B3\} \end{equation*} \begin{equation*} Tonality_{Minor}=\{C3, D3, Eb3, F3, G3, Ab3, Bb3\} \end{equation*} \begin{equation*} Tonality_{Harmonic}=\{C3, D3, Eb3, F3, G3, Ab3, B3\} \end{equation*} \begin{equation*} Tonality_{Melodic}=\{C3, D3, Eb3, F3, G3, A3, B3\} \end{equation*}

Two modifier functions were also considered to account for all possible transpositions, and to allow the definition of either local or global modulations:

\begin{equation*} f:\forall n \in Tonality_x \rightarrow n = (n+t+m):\{t, m \in \mathbb{Z} | [-6,6]\} \end{equation*}

In this way, the sonic domain for fado is defined and constrained in a modular fashion. It could be argued that not all transpositions have the same weight, and indeed within the corpus there are some degrees that are never present. Therefore, the value of $t$, formalised above, could be further constrained to reflect these weights. At present we do not feel that need, since in the current practice, and among skilled performers, transpositions are often chosen based on the vocal range of the singer and do not reflect the original tonality the composer has chosen. Moreover, the lack of use of some tonalities often reflected either the ease of playing them on the chosen instruments or the mere inability of amateurs to perform them; it did not reflect any intrinsic aesthetic decision regarding the musical content. Since these works are to be performed by the computer, and the computer has no difficulty in performing them in whatever transposition is assigned, it makes no sense to artificially constrain them. The modifier $m$, however, is constrained, and only makes sense in very specific cases. Not all modulations occur in the practice, and often this modifier is contextually attached to the harmonic progressions used, namely to generate secondary dominants, which is something to be developed further ahead.
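The four tonalities and the two modifiers can be illustrated with a minimal Python sketch. It is not the Symbolic Composer implementation: it assumes MIDI note numbers (with C3 = 48, following the A4 = 440 Hz = note 69 convention), and the names `TONALITIES` and `scale` are hypothetical.

```python
# Semitone offsets from the tonic, read off the four sets defined above.
TONALITIES = {
    "major":    [0, 2, 4, 5, 7, 9, 11],
    "minor":    [0, 2, 3, 5, 7, 8, 10],
    "harmonic": [0, 2, 3, 5, 7, 8, 11],
    "melodic":  [0, 2, 3, 5, 7, 9, 11],
}

C3 = 48  # assumption: MIDI numbering in which C4 = 60 and A4 = 69


def scale(tonality, t=0, m=0, base=C3):
    """Return the MIDI pitches of a tonality shifted by t (transposition) and m (modulation)."""
    if not (-6 <= t <= 6 and -6 <= m <= 6):
        raise ValueError("t and m are expected to lie in [-6, 6]")
    return [base + step + t + m for step in TONALITIES[tonality]]


c_major = scale("major")        # C D E F G A B starting on C3
d_minor = scale("minor", t=2)   # the natural minor set moved up a whole tone
```

Keeping `t` and `m` as separate arguments mirrors the text: `t` can be fixed once for a whole piece, while `m` can change from one progression to the next.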

While it is now possible to derive all possible pitches for the harmonies and melodies from this foundational base, it is clear that this structure, by itself, is a huge determinant in shaping the final sounds and in the characterisation of the practice. It seems also obvious that a simple change in these definitions can radically alter the final result. After some experimentation we dare to say that entirely different practices and genres might be achieved simply by changing this parameter alone.

The problem of automatic harmonisation is now considered a well-defined problem, especially when seen from the typical viewpoint of generating logical, yet inventive, chords to accompany a given melody. One of the most recent papers by Pachet and Roy has shown that “standard Markov models, using various vertical viewpoints, are not adapted for such a task, because the problem is basically overconstrained” (Pachet & Roy, 2014). Their proposed solution consists in implementing their technique of Markov constraints to fioriture (melodic ornamentation, defined as random walks with unary constraints). The end results are claimed to be more effective and musically interesting.

While these approaches seem promising, they all rely on the idea that there is a base melody per se to be harmonised. However, in the case of fado, what one usually sees in performance is the reverse: there is a prior harmonic structure, on top of which the text is sung. So, while the rhythm of the melody and its general contour derive from constraints emerging from the lyrical structure, the pitches are mostly derived from the scalar structure of the underlying harmonic progression. Hence, one needs another kind of approach to solve this problem.

If the overall form is a foundation to structure the span of time and the internal recurrences of patterns inside a fado, the harmonic foundation reflected in the scalar structures, is the foundation of each section of fado regarding the way the pitches behave. By choosing a scale, derived from the tonalities assigned, one is able to derive pitches that will be coherent with the melody, accompaniment and counter-melodies, within context, and will make the existence of the piece possible.

8.1 Harmonic Progressions

We made a list containing all the relevant harmonic progressions found in the musical scores and their respective weights.

A chord is formalised as an object corresponding to a set of symbols that is defined using four attributes (or modifiers): quality, root, octave and inversion.

\begin{equation*} Ch_x = \langle Q, r, o, i \rangle \end{equation*}

The quality is a set that specifies the amount and relation between the symbols that compose the chord. A major chord is predefined as a triad in which the interval between the root and the second symbol is of four semi-tones, while the interval between the root and third symbol is of seven semi-tones, for instance:

\begin{equation*} Q_{maj} = \{0,4,7\} ; Q_{min} = \{0,3,7\} ; Q_7 = \{0,4,7,10\} , \text{etc.} \end{equation*}

There are numerous possible qualities for chords (diminished, augmented, extensions …) and all may be formalised in the same fashion.

The root of a chord is a modifier that assumes the form of an alphabetic symbol relative to the pitch of the defined tonality that is active at the time the chord is called. Hence, in the case of a scale C major in octave 3, a root “a” or “-a” would correspond to pitch C3, a root “b” would correspond to pitch D3, and a root “c” to E3. Negative values work in the opposite direction, as such, a root “-b” would correspond to pitch B2 and a root “-c” to a pitch A2, etc.

The octave of a chord is also another modifier, which is an integer corresponding to the octave in which the root is to be placed. The inversion is yet another modifier (a movable rotation), corresponding to an integer, which will determine which element from the ones available in the quality set will correspond to the first note. As such, in a C major scale in octave 3, a root “b”, and a quality “minor” (defined as $\{0,3,7\}$), would give back pitches “D3, F3, A3”. So, an inversion “0” would mean “D3, F3, A3”; an inversion “1”, “F3, A3, D4”; “2”, “A3, D4, F4”; “3”, “D4, F4, A4”. As always, negative values work in opposite directions, so, an inversion “-1” would mean “A2, D3, F3”; “-2”, “F2, A2, D3”; “-3”, “D2, F2, A2”, etc.
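The worked example above (root “b”, minor quality, inversions from -3 to 3) can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: pitches are MIDI numbers with C3 = 48, the octave modifier is treated as an offset relative to octave 3, and the function names are hypothetical rather than Symbolic Composer functions.

```python
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]
C3 = 48  # assumption: MIDI numbering in which C4 = 60
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(pitch):
    """Name a MIDI pitch, e.g. 50 -> 'D3'."""
    return f"{NOTE_NAMES[pitch % 12]}{pitch // 12 - 1}"


def root_pitch(symbol, steps=MAJOR_STEPS, base=C3):
    """Map a root symbol ('a', 'b', '-b', ...) to a pitch of the active scale."""
    sign = -1 if symbol.startswith("-") else 1
    degree = sign * (ord(symbol[-1]) - ord("a"))
    octave_shift, index = divmod(degree, len(steps))
    return base + 12 * octave_shift + steps[index]


def chord_pitches(quality, root, octave=3, inversion=0, steps=MAJOR_STEPS):
    """Build a chord from quality, root, octave and inversion."""
    lowest = root_pitch(root, steps) + 12 * (octave - 3)
    pitches = [lowest + interval for interval in quality]
    for _ in range(abs(inversion)):
        if inversion > 0:
            # Rotate upwards: the lowest note moves up one octave.
            pitches = pitches[1:] + [pitches[0] + 12]
        else:
            # Rotate downwards: the highest note moves down one octave.
            pitches = [pitches[-1] - 12] + pitches[:-1]
    return pitches


Q_MIN = [0, 3, 7]
first_inversion = [note_name(p) for p in chord_pitches(Q_MIN, "b", 3, 1)]
```

Treating the inversion as a repeated rotation is what makes an inversion of 3 on a triad return the root position one octave higher, as described in the text.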

A harmonic progression can be thus formalised as a list of chords:

\begin{equation*} HP_x = \langle Ch_1 , Ch_2 , Ch_3 , Ch_4 , \dotsc \rangle \end{equation*}

As an example, the formalisation of a typical Tonic | Dominant | Dominant | Tonic progression can be done as:

\begin{equation*} \begin{aligned} HP_{T|D|D|T} = \langle \langle \{0,4,7\}, c, 3, 2\rangle, \langle \{0,4,7,10\}, g, 3, 0\rangle, \\ \langle \{0,4,7,10\}, g, 3, 0\rangle, \langle \{0,4,7\}, c, 3, 2\rangle \rangle \end{aligned} \end{equation*}

This specific case reflects the formalisation of a progression of I6/4 | V7 | V7 | I6/4, in the third octave. In this example, the tonic chord is presented in the second inversion, so that its bass note is a common note with the following dominant.

Most of these modifiers, instead of being fixed, can be assigned to variables. The modifier “inversion” can be further modified by an offset variable, to increase variability. As an example, attaching a fixed offset to the inversion of every chord could easily be done as follows:

\begin{equation*} \begin{aligned} HP_{T|D|D|T} = \langle \langle \{0,4,7\}, c, 3, (2+i)\rangle, \langle \{0,4,7,10\}, g, 3, i\rangle, \\ \langle \{0,4,7,10\}, g, 3, i\rangle, \langle \{0,4,7\}, c, 3, (2+i)\rangle \rangle : i \in \mathbb{Z} \end{aligned} \end{equation*}

Defining the variable $i$ as “+ 1”, would cause all chords to be offset by one inversion, so that the tonic chord would be presented in “third inversion”. In this case, that would be the fundamental state again (one octave higher though), while the dominant chord would go to the first inversion, which would still create a very acceptable voice-leading context with minimum finger movement (approaching what a human player would do in a real instrument).
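The effect of the offset $i$ on the Tonic | Dominant | Dominant | Tonic progression can be checked with a small Python sketch. To keep it self-contained, roots are given directly as MIDI numbers (C3 = 48, G3 = 55, an assumption of this sketch) instead of symbols, with the octave folded into the root; `invert` and `realise` are hypothetical names.

```python
def invert(pitches, inversion):
    """Rotate a chord upwards (positive) or downwards (negative) by octave displacement."""
    pitches = list(pitches)
    for _ in range(abs(inversion)):
        if inversion > 0:
            pitches = pitches[1:] + [pitches[0] + 12]
        else:
            pitches = [pitches[-1] - 12] + pitches[:-1]
    return pitches


# (quality, root as MIDI pitch, base inversion) for I6/4 | V7 | V7 | I6/4 in C.
TONIC = ([0, 4, 7], 48, 2)
DOMINANT = ([0, 4, 7, 10], 55, 0)
HP_TDDT = [TONIC, DOMINANT, DOMINANT, TONIC]


def realise(progression, i=0):
    """Turn a progression into pitch lists, adding the offset i to every inversion."""
    return [
        invert([root + interval for interval in quality], base_inversion + i)
        for quality, root, base_inversion in progression
    ]


plain = realise(HP_TDDT)         # tonic in second inversion, dominant in root position
shifted = realise(HP_TDDT, i=1)  # tonic back in root position (an octave up), dominant in first inversion
```

In both realisations the two chords share their lowest or highest G, which is the kind of minimal movement the offset is meant to preserve.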

The implementation of the harmonic progressions, as seen in the corpus, can be done as weighted lists. To optimise resources, all possible progressions are defined as a sequence, in which the position in the list identifies the progression.

\begin{equation*} Possible\; HPs = \{HP_1 , HP_2 , HP_3 , HP_4 , \dotsc \} \end{equation*}

Each defined progression has scales associated with it, reflecting both the tonality that is active and a modifier offset for local modulations (concrete values for the parameter $m$). For instance, if in a given progression there is a secondary dominant, this means that this progression involves two different versions of a given scale, therefore the modifier has to change accordingly. If a progression has borrowed chords from the relative, then it involves two different scales. As such, parallel to the sequence of possible harmonic progressions a sequence of lists of scales is also defined, respecting the same order, so that their position matches.

\begin{equation*} Scale_x = \langle Tonality_x, t, m \rangle \end{equation*} \begin{equation*} S_x = \langle Scale_1, Scale_2, Scale_3, \dotsc \rangle \end{equation*} \begin{equation*} Possible\; Scales = \{ S_1 , S_2 , S_3 , S_4 , \dotsc \} \end{equation*}

Then, further sets for each section are built containing the weights, in the same order the progressions and scales were defined.

\begin{equation*} Weights\; for\; Section\; Xn = \{w_1 , w_2 , w_3 , w_4 , \dotsc \} \end{equation*}

As such, when one wants to assign a progression and a scale for a section $Xn$ there is a probability $w_n$, constraining both the choice of $HP_n$ and $S_n$ simultaneously.

\begin{equation*} \forall \; w_n\; \exists \; HP_n , S_n : (HP_n , S_n) \Rightarrow Xn \end{equation*}

Implementing the progressions in this modular fashion means one only has to define them once, and can alter them at any stage. It is also easy to change the weights for each section, alter them, nullify some of their elements (meaning that in any given section, some progressions will never be picked, even when they are defined) or force them by choosing very high values. This also means one can actually make them vary with time, if one decides to define the list of weights as variables instead of fixed values. The same can be said about the progressions and scales themselves, since they are defined as symbolic data. In this way, instead of needing to create a complex generative function to act as a second artificial intelligence to deal with the coefficient of representativeness, we just had to make a second set of weights with a more unpredictable behaviour. This way, the program might either pick progressions and scales from the set of weights that fully mirrors the corpus, or pick progressions and scales drawing values from the unpredictable set of weights, hence creating more adventurous musical results.
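A minimal Python sketch of this selection mechanism follows. The progression and scale names are placeholders, the weights are invented for illustration (real values would come from the corpus), and modelling the "unpredictable" weight set as freshly drawn random numbers is an assumption of this sketch, not a description of the actual system.

```python
import random

# Placeholders standing in for the parallel sequences of progressions and scale lists.
PROGRESSIONS = ["HP_1", "HP_2", "HP_3", "HP_4"]
SCALES = ["S_1", "S_2", "S_3", "S_4"]

# Invented weights for two hypothetical sections; a zero means "never pick this one here".
CORPUS_WEIGHTS = {"A": [50, 30, 20, 0], "B": [10, 10, 40, 40]}


def pick(weights, rng):
    """Draw one index and return the progression and scale list sharing that position."""
    index = rng.choices(range(len(PROGRESSIONS)), weights=weights, k=1)[0]
    return PROGRESSIONS[index], SCALES[index]


def pick_for_section(section, representativeness, rng):
    """Use the corpus weights with probability `representativeness`, else unpredictable ones."""
    if rng.random() < representativeness:
        weights = CORPUS_WEIGHTS[section]
    else:
        # Assumption: the second weight set is simply random and strictly positive.
        weights = [rng.random() + 0.01 for _ in PROGRESSIONS]
    return pick(weights, rng)


rng = random.Random(2017)
faithful = [pick_for_section("A", 1.0, rng) for _ in range(200)]
adventurous = [pick_for_section("A", 0.0, rng) for _ in range(200)]
```

Because a single index selects both lists, a progression can never be paired with the scales of another one, which is the point of keeping the two sequences in the same order.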

The option to use weighted lists instead of other methods relies on the fact that we believe this best represents the process that actually happens in the practice. People do not generate progressions out of thin air; instead, they seem to rely on previous implicit knowledge, in a kind of mental scheme of already known progressions that work within the tradition. This is done either by copying the records, other musicians or by resorting to other songs they already know. It is not by chance that harmonic progressions in the musics of fado also depend on the education of the musicians. Schooled ones like Reynaldo Varela, Alain Oulman or Jorge Fernando(10) often employ more complex progressions, since they are borrowing them from other traditions, namely from the erudite music or jazz universes. Therefore, by using weighted lists this previous knowledge of which progressions work best and which not (also according to their temporal context within the repertoire), its distribution among the overall population, and the choice to pick them, is modelled.

8.2 Ostinati

The analysis of the corpus and the observation of players have shown that the accompaniment layer in fados is mainly composed of ostinati: stock rhythmic patterns that repeat over and over during a certain period. Sometimes an ostinato runs through the entire fado, but other times it spans only some sections, with contrasting ostinati in the others. In order to mimic this layer, we have studied the kinds of ostinati that are most often used. Most of these ostinati are recurrent figurations widely used in other performative traditions; they already have designations and categories by which they are known, and there are established rules regarding the way players should approach them. One can find popular march, Alberti-bass-like figurations, habanera, Cuban bolero, fox, and hybrids of these, among several others.

The first approach regarding this problem was to make a weighted list of all the relevant patterns observed and then let the computer pick one, because this models what happens in the actual practice: players just choose one figuration from a previously known group. The formalisation of these stock patterns was done by literally expressing the archetypical stock figurations as music fragments, as shown (Fig. 4).

Figure 4 – Ostinati.

In order to facilitate the process, each ostinato was broken into two parts, representing what happens in the real practice: a bass layer and a harmonic chordal layer. Often, the bass and chordal layers are played by the same instrument and they do not overlap each other, but other times they can be separated, especially when there is a bass instrument present. Moreover, the chordal layer usually has notes deriving from the chords while the bass layer may present additional scalar material. Both layers were expressed as lists containing the explicit durations and symbols of the patterns in their atomic form. The atomic form is the minimum fragment that needs to be provided so it can be looped in order to generate the complete accompaniment. In order to be looped correctly, the total duration of this fragment should match the total duration of a section or at least divide it without a remainder. In the case of fado, since the sections are four binary bars long, equivalent to eight quarter notes, this implies fragments with a length equivalent to two, four, or eight quarter notes. Both the durations and symbols are represented in lists in which the position matters, since it is this linearity and correspondence that allows each value to be matched with the others at a later stage.

\begin{equation*} \begin{aligned} Ostinato\,Chord\,Durations_x = \langle dur_1 , dur_2 , dur_3 , dur_4 , \dotsc \rangle , \\ m(\#Ostinato\,Chord\,Durations_x)=\#Xn : m \in \mathbb{N} \end{aligned} \end{equation*} \begin{equation*} Ostinato\,Chord\,Symbols_x = \langle Ch_1 , Ch_2 , Ch_3 , Ch_4 , \dotsc \rangle : Ch_n \in HP_n \Leftarrow Xn \end{equation*} \begin{equation*} \begin{aligned} Ostinato\,Bass\,Durations_x = \langle dur_1 , dur_2 , dur_3 , dur_4 , \dotsc \rangle , \\ m(\#Ostinato\,Bass\,Durations_x)=\#Xn : m \in \mathbb{N} \end{aligned} \end{equation*} \begin{equation*} Ostinato\,Bass\,Symbols_x = \langle s_1 , s_2 , s_3 , s_4 , \dotsc \rangle : s_n \in S_n \Leftarrow Xn \end{equation*}

These lists comprise the rhythms and symbols, abstracted from these patterns. They have to be mapped into any given section $Xn$, so they will retrieve the information regarding which harmonic progressions and scales are active at that point, and that will allow the symbols to be relatively converted into suitable pitches.
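The looping condition, $m(\#Ostinato\,Durations_x)=\#Xn$ with $m \in \mathbb{N}$, can be sketched in Python as follows. Durations are expressed in quarter notes, the section length of eight quarter notes is taken from the text, and the march-like fragment and the name `loop_fragment` are illustrative assumptions.

```python
from fractions import Fraction


def loop_fragment(durations, symbols, section_length=8):
    """Repeat an atomic ostinato fragment until it fills a section exactly."""
    if len(durations) != len(symbols):
        raise ValueError("durations and symbols must have the same length")
    fragment_length = sum(Fraction(d) for d in durations)
    repeats, remainder = divmod(Fraction(section_length), fragment_length)
    if remainder:
        raise ValueError("the fragment does not divide the section without a remainder")
    return list(durations) * int(repeats), list(symbols) * int(repeats)


# A hypothetical two-quarter-note fragment: bass note, then chord, one quarter note each.
durations, symbols = loop_fragment([1, 1], ["bass", "chord"])
```

The symbols stay abstract at this stage; they would only be converted into pitches once the section supplies its active progression and scale.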

To be implemented at a later stage, we considered a function that would, somewhat randomly, generate pauses within the ostinati, thus approaching a live performance and the freedom the players have when they actually perform them. One of the main concerns in increasing the realism of this function is to make sure the pauses will neither happen on the strong beats of each section, nor be so many that they will prevent the listener from actually recognising the ostinati as such. This approach concerning the accompaniment has the advantage of being totally representative of the corpus and coherent within the tradition. It also handles the problems of voice-leading very well, since by having concrete lists mapping symbols to pitches, one usually assigns automatically the most commonly used chord inversions at each stage, which are closer to the way a player would approach them. The model of Toivanen et al. (2013) creates accompaniments in many styles, like Alberti bass and many other chord patterns, which, we believe, can be similar to our own. They have used a model of voice-leading that chooses chord inversions based on a minimal total movement (smallest sum of intervals of simultaneous voices).

Our second approach to this issue was to consider building a second artificial intelligence, triggered according to the coefficient of representativeness, consisting of some functions that would generate ostinati and rhythmic patterns from scratch. The symbols derived from the harmonic progression would then be assigned to those durations and new and inventive ostinati would be obtained. This approach has the advantage of generating out-of-the-box rhythmic patterns and expands the possibilities for diversity and for unexpected results making the outcome more interesting. However, this approach also has the problem that it might generate patterns that lack coherence regarding voice-leading, the way a player usually performs them. It also may present the problem that, even within the designed constraints, it may generate ostinati that might not be recognisable within the tradition, therefore jeopardising the common expectations of what a fado should sound like.

9 Third Module

In this section we will detail how we have coded the melody and which assumptions and algorithms were used.

There are several ways to approach the problem of generating a melody. According to previous research on the topic, in most cases the authors adopt a modular strategy in which the main focus is on two parameters: rhythm and pitch. Usually the first task is to generate a rhythm, a pattern, and then assign pitches to that same rhythm. Since the melody in fado is intimately connected with lyrics and prosody, our main references were the state of the art references dealing with the generation of melodies in the context of songs. Therefore, we studied the approach proposed by a team of Finnish researchers who developed the prototype M.U. Sicus-Apparatus (Toivanen et al., 2013). This program generates art songs, writing lyrics first and then composing music to match the lyrics. The program consists of two modules: one that writes the lyrics based on user input of a theme and a second one that receives an emotional valence and composes the music. In order to generate the melody, the program first generates a rhythm based on the prosody of the already written lyrics. It breaks down each word into syllables and then assigns a rhythmic element with as many notes as there are syllables. Those rhythmic elements come randomly from a set of commonly found patterns in art songs. Then, the program randomly chooses a scale coherent with the emotional valence previously provided and generates a harmonic progression based on a second-order Markov chain with common progressions found in diatonic Western classical music. It also assigns time values, in a probabilistic manner, for each chord generated so that the overall length of the accompaniment structure will be consistent with the rhythm of the melody. After this stage, pitches are assigned to the rhythm of the melody. The underlying harmony defines a probability distribution for pitches which can be used, and the subsequent note is found by the means of a random-walk favoring small intervals (Toivanen et al., 2013, p. 89).

Another approach to melody generation and editing was the one presented by Young-Woo Jeon, In-Kwon Lee and Jong-Chul Yoon using a noise function, specifically Perlin noise, because it “is bounded, band-limited, non-periodic, stationary, and isotropic” (Jeon et al., 2006, p. 164). These traits would make the noise function ideal because, while it preserves a decent amount of randomness, it is still controllable and presents somewhat predictable results to the user. Mainly, Jeon et al. (2006) present the idea that Perlin noise can generate a melody from scratch (if mapped into pitches) or be mapped into an already existing melody to change some of its elements while still preserving its original shape. This concept can also be extended to any other parameter, like timing information, tempo or dynamics. The smoothness of the curve that noise can produce, under certain values, is ideal for altering any parameter, producing what one would call a “humaniser”, because in the final outcome it produces small deviations similar to the ones a human would introduce when performing a certain task, since human behaviour is slightly erratic and imperfect. Jeon et al. (2006) also present the idea that after the noise is applied it can be further processed to revise elements that might fall out of place or that the user wishes to constrain, for instance, fixing certain melody notes at key points instead of leaving them purely random. They conclude that:

noise function makes the modification of an existing melody easy and fast. The modified melody shares the basic form of the original melody, but the details are randomly different. The extreme changes that can be caused by white noise are not found in the modification because of the ideal band-limited feature of the [Perlin] noise function. By noise editing, it is possible to control the composition or modification process precisely to reflect a composer’s intent in the work. (Jeon et al., 2006, pp. 167-168)

In our own explorations with the software Symbolic Composer we have noticed that several functions based on noise come predefined and ready to use, namely generators of white, pink, red or Brownian noise. There is also another set of generators based on other types of curves and waves, such as triangular ones, or even on real time readings of astronomic patterns and walks like solar wind or flux particles. We consider that the same ideas Jeon et al. (2006) present in their paper are suitable for our own use; it is just a matter of, again, picking the suitable generator and deciding its seed and steepness, and one is able to achieve desirable results. We have found that Brownian noise and its graphic curves are suitable for melodic generation because they generate contours similar to those a human voice can produce. A number of combinations of more than one noise are able to generate virtually all the melodic contour archetypes found in European folksongs and described by Huron (1996; 2008), and can thus be successfully applied in algorithms with the intent of either generating melodies or creating variations on previous melodies. We have also pushed noise exploration further to make these generators suitable for creating variations or simple humanisation on several parameters like rhythms, tempo, dynamics, etc.
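As a rough illustration of why a Brownian walk yields voice-like contours, the following Python sketch generates a bounded random walk over scale degrees. It is not the Symbolic Composer generator: the function name, the mapping of "steepness" to a maximum step size, and the clamping to a range of ten degrees are all assumptions made for this example.

```python
import random


def brownian_contour(length, seed, max_step=1, low=0, high=9):
    """Bounded random walk over scale degrees; each value depends on the previous one."""
    rng = random.Random(seed)          # the seed makes the contour reproducible
    degree = (low + high) // 2         # start in the middle of the available range
    contour = []
    for _ in range(length):
        degree += rng.randint(-max_step, max_step)
        degree = max(low, min(high, degree))   # keep the walk inside the range
        contour.append(degree)
    return contour


contour = brownian_contour(length=8, seed=42)
```

Because each value is obtained from the previous one by a small step, the curve has no large leaps, unlike white noise, in which every value would be drawn independently.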

Our approach to the problem of melody generation was to model what we have found in the data analysed and to try to replicate those results. Basically, when one is composing a melody for a fado, one is really setting up a lyrical line on top of a harmonic progression set over the span of two bars. The first task was to decide how to divide and break the total span into smaller figures, thus creating a rhythm. According to the data, almost any combination is suitable; there is no clear pattern to what might happen. Admittedly, the quantised melodic phrases found in musical scores tend to favour regular and simple divisions, like beats with quarter notes, a group of two eighth notes, or a highly recurrent syncopated figure of two sixteenth notes with an eighth note in between them, found mostly in the second beat of the odd bars. However, triplets are frequent and dotted figures and other combinations also arise. Furthermore, even when they are not notated, they eventually show up later in the performance and can effectively be heard. The conclusion is that one can expect virtually any subdivision of a measure into smaller figures as long as it consists of a reasonable number corresponding roughly to the number of syllables of the lyrical line. And even that is just a rough estimate, because sometimes a singer can prolong the same syllable over two or more different notes. Therefore, taking into account the typical heptasyllabic line, it would make sense to hear any subdivision consisting of seven to ten notes. Since in this version of our model there is no concrete textual line to which one is referring, there is no prosodic reference to be coherent with in terms of accents, for instance. Therefore, the issue is not so much the apparent chaotic randomness of such freedom, but rather the consistency of the recurrence of this same pattern once generated. This is so because what the lines have in common is their metric regularity. As such, even if one does not know exactly how a particular melodic rhythm was generated, because of the lack of a concrete text, one knows that the quantised version of such rhythm has to be repeated throughout the sections, simulating a consistent stanza.

Having this in mind, our first thought was to have an actual weighted list of the most common rhythms of the melodies found in the corpus. Having these lists and weights, the artificial intelligence would simply pick a list comprising a rhythm that we knew made sense because it had been used successfully in the past. This solution (data-based) would eventually model what some recent amateur performers do when they literally imitate the records. However, this lacks inventiveness. Moreover, contrary to what happens with harmonic progressions and ostinati, such an approach actually does not reflect a good practice. In a good practice, fadistas style their melodies on the spot, each performance being unique, this being one of the most defining traits (Nery, 2010b, p. 451; Gray, 2013, pp. 144–145). As such, using quantised versions of past performances is not an elegant solution. Hence, we took another approach and decided to code an abstraction that would generate an acceptable list from scratch (thus, a rule-based approach), trying to model the behaviour when the rhythm of the melodies is generated on the spot.

This particular list is important because its quantised version represents a base pattern that will give consistency to the entire work in a cross-sectional fashion. As discussed before, the rhythm of the melody is important because of its constant recurrence with small variation (the non-quantised versions), within the same work. Therefore, after we have generated it once, we have a structural foundation that we can use to build upon or further manipulate in order to create variations, and in this particular case, extending beyond one simple section.

Each line from the text results in a melodic phrase to be distributed along two binary bars (Ernesto Vieira’s model discussed in Section 2). The singer starts by singing a line of the text, with the syllables of each word resulting in notes contained within the bars. The textual line invariably runs out of syllables and is completed in the first half of the second bar. The singer has to expressively cue the end of each line, stop, breathe, and then begin another line of text. This process is repeated ad nauseam. The process of singing lines of narrative text in sequence in European Portuguese and distributing them over two bars in a way that an audience understands the narrative and its segmentation leads to the even bars containing fewer notes than the odd bars. Therefore, in order to efficiently generate the rhythm of one line, we broke down each line into its component parts. A typical line comprises pick-up notes, followed by what we will call “rhythm-head” (the first bar) followed by a “rhythm-tail” (the second bar). Typically the “rhythm-tail” contains longer rhythmic durations in order to end the line.

The second beat of the second bar is, in fact, the pick-up of the following bar:

\begin{equation*} Rhythm_{line} = \langle \langle Rhythm_{pick-up} \rangle , \langle Rhythm_{head} \rangle , \langle Rhythm_{tail} \rangle \rangle \end{equation*}

Each of these rhythm components can be generated by slicing a predetermined total duration (bar length, single beat, multiple beats) by a random integer. Usually, smaller numbers are preferred, since one is usually dealing with already short durations. Two, three, four and eight will yield more realistic results stylistically, while five, six and seven generate more complex rhythmic structures. Integers above nine often yield unrealistic results. The second step involves taking the individual rhythmic slices and concatenating some of them into larger units in order to obtain durations of unequal length. Obviously, the sum of the durations of all resulting slices remains equal to the original available length of roughly 1.5 bars.

\begin{equation*} Rhythm_n = \sum \{ [\#(Rhythm_n)/j] \times k_i\} : (\sum k_i) = j; j,k \in \mathbb{N} \end{equation*}

This process of slicing durations and redistributing the slices can be refined if applied to smaller hierarchical units when desired. For instance, since “rhythm-head” represents an entire bar, which in the case of fado comprises two beats, it can make sense to actually apply the process separately to each beat, instead of the whole bar. Also, no matter which integer $j$ has been used to slice the “rhythm-head”, it makes sense for it to be a smaller value when applied to “rhythm-tail” so it generates relatively longer durations, and one actually obtains a tail effect.
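The two steps, slicing a span into $j$ equal parts and then merging adjacent slices into groups of $k_i$ slices with $\sum k_i = j$, can be sketched in Python. Durations are in quarter notes, so a 2/4 bar lasts 2 and the tail occupies the first beat of the second bar; the coin-flip used to decide where groups end and the candidate values for $j$ are assumptions of this sketch.

```python
import random
from fractions import Fraction


def slice_and_merge(total, j, rng):
    """Slice `total` into j equal parts, then merge adjacent parts into unequal durations."""
    unit = Fraction(total) / j
    # Keep each of the j-1 internal boundaries with probability 1/2 (an assumption).
    cuts = [c for c in range(1, j) if rng.random() < 0.5]
    bounds = [0] + cuts + [j]
    groups = [b - a for a, b in zip(bounds, bounds[1:])]   # the k_i, summing to j
    return [unit * k for k in groups]


def head_and_tail(rng):
    """Rhythm-head: two beats sliced separately; rhythm-tail: one beat with a smaller j."""
    j = rng.choice([2, 3, 4])
    head = slice_and_merge(1, j, rng) + slice_and_merge(1, j, rng)
    tail = slice_and_merge(1, rng.randint(1, j - 1), rng)
    return head, tail


rng = random.Random(3)
head, tail = head_and_tail(rng)
```

Using exact fractions rather than floating-point numbers keeps triplet divisions exact, so the durations of a component always add up to the span they were sliced from.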

Once all those sub-rhythms have been generated and joined into a single list comprising what would be the rhythm of a line, one might notice that all durations are positive values and represent notes. This is not realistic, since every line has to contain a few pauses (as also seen in the corpus). One of them, in the case of fado, is always determined and occurs at the beginning of the second beat of the second bar, corresponding to the end of the tail and the beginning of the pick-up notes. Therefore, for the sake of simplicity, the pick-up rhythm is implemented as a quarter-note duration slot, including an eighth-note pause, followed by the actual pick-up note or notes. Experience demonstrated that in most cases only a single eighth note is used as a pick-up into the subsequent line. A small weighted list suffices in this particular case. All other rhythms might include additional pauses that can be implemented using a random function to switch any of the available durations to a negative value.
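The pick-up slot and the random pauses can be sketched in Python as follows, with negative values standing for pauses as in the text. Durations are in quarter notes; the weights of the small list and the pause probability are invented for illustration, and the function names are hypothetical.

```python
import random
from fractions import Fraction

EIGHTH = Fraction(1, 2)
SIXTEENTH = Fraction(1, 4)

# A small weighted list: a single eighth note is by far the most common pick-up.
PICKUP_OPTIONS = [[EIGHTH], [SIXTEENTH, SIXTEENTH]]
PICKUP_WEIGHTS = [8, 2]   # invented values, not measured from the corpus


def pickup_slot(rng):
    """One quarter-note slot: an eighth-note pause followed by the pick-up note(s)."""
    notes = rng.choices(PICKUP_OPTIONS, weights=PICKUP_WEIGHTS, k=1)[0]
    return [-EIGHTH] + list(notes)


def add_pauses(durations, rng, probability=0.15):
    """Randomly turn some durations into pauses by switching their sign."""
    return [-d if rng.random() < probability else d for d in durations]


rng = random.Random(11)
pickup = pickup_slot(rng)
line = add_pauses([EIGHTH, EIGHTH, SIXTEENTH, SIXTEENTH, EIGHTH, 1], rng) + pickup
```

Since a pause only changes the sign of a value, the absolute durations still add up to the full two bars and the list can be looped or varied like any other rhythm.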

Once one has obtained the rhythm of the melody of one line (a list of durations and pauses adding up to two 2/4 bars, in fact): $Rhythm_{Xn} = \langle d_1 , d_2 , d_3 , \dotsc , d_n \rangle$, the next step is to assign pitches to each one of those durations. Again, there are several ways to approach this problem. The most obvious and consistent way of assigning pitches to durations is to rely on the underlying harmonic progression. The harmonic progressions for the ostinati are already defined in the second module, so the most logical step would be to assign pitches derived from the chords used in each beat by means of a weighted function. By doing this one obtains a melody that will sound consonant but lacks fluidity and contour. The second approach is to apply this principle only to hierarchically relevant durations (those which stand on the first tick of each beat, for instance) and then fill in the gaps with other notes from the scale, but not necessarily from the chord. These notes will behave as passing tones. The problem with this approach is that it might simply generate non-realistic intervals and strange contours between the notes, so one has to refine this reasoning by means of a constraint: use tones from the scale that are a step away from the previously assigned chord tone, in a random-walk fashion.

Another approach is to simply rely on templates and formulas. Based on the data collected from the corpus, there are already a series of melodies known to work, that sound good and realistic and that, in fact, represent the corpus. So it is possible to obtain a supply of reliable melodies simply by having an archive with sequences of pitches that work together and represent different types of melodic contours and then using a weighted function to randomly pick some. Then, they can be transformed using a second weighted function to add slight variety to the final result. The outcome of this method has the enormous advantage of being somewhat predictable, aesthetically pleasing and stylistically consistent with the context of fado. On the other hand, it has the extreme disadvantage of being a small remix of already existing material, thus not increasing the span of possibilities by that much.

However, we believe the best approach is to try to mimic what happens in real performances with real singers, in which a melody is improvised over a harmonic progression. With this approach the singers are relying on the knowledge of the underlying harmonic progression and (primarily) the scale associated with it, respecting the mode to which it pertains. The precise way in which the pitches are chosen among those constraints is incredibly complex and derives from an entanglement not yet fully understood by us. There are obvious relations with the semantic and syntactic content of the lyrical text, since the speech contours should influence the melodic contour. Moreover, the language in which the text is written also influences the pitch. It is known that the pitch of speech is a cultural construct based on the pitches people hear around them when they are infants. It is a function of the linguistic community (Deutsch et al., 2009; Dolson, 1994). Those dimensions are impossible to model in our current state. In addition, the physical constraints of the singers also limit the available choices, since some performers have wider ranges with more theatrical performances while others prefer to sing in an almost spoken form, thus narrowing the overall range of their melodies. Acknowledging these difficulties and limitations, we have opted, in this case, to try to simulate this complex behaviour using the ideas proposed by Jeon et al. (2006) and using noise as a central concept in shaping the melodic contour.

For each of the chord progressions listed previously used to generate the ostinati, we have built a second list of all the possible associated harmonic scalar progressions, indexed in the same position. Therefore, it is possible to use them to retrieve the pitches for the melody. These scales, however, should also be modified and further constrained by an offset variable in order to build coherent contours and avoid strange intervals, influencing the voice-leading.

\begin{equation*} \begin{aligned} Melody_x = \langle \langle Scale_1 , o_1 \rangle, \langle Scale_2 , o_2 \rangle, \langle Scale_3 , o_3 \rangle, \dotsc \rangle: \\ \forall\, s_n \in Scale_x, s_n = s_{n+o} : o \in \mathbb{Z} \end{aligned} \end{equation*}

The offset variables $o_i$ are local transformational devices in the form of a rotation implying that all elements on a scale change their positions back or forward accordingly. This transformation is used to force the first symbol in any given scale to be the one desired, in order to condition the future mapping into pitches.

As an example, take the progression T|D|D|T in the minor mode, “i|V7|V7|i”. The first chord, “i”, is best associated with the “natural-minor” scale, starting on the first symbol, with an offset of 0. The second chord, “V7”, however, is best associated with the “harmonic-minor” scale. This is because the second note of the chord “V7” does not exist in the “natural-minor” scale; it belongs to the harmonic-minor scale or is borrowed from the major mode. It would also be better to start with an offset of either 1 or -1, influencing the voice-leading so that the first symbol of the scale is the nearest one matching a chord tone. Following this reasoning, and considering how a melody typically evolves (arch-shaped, moving up rather than down), the third chord, again “V7”, could be best associated with the same “harmonic-minor” scale but with an offset of 3 or 4, the nearest notes of the scale that are chord tones.

\begin{equation*} \begin{aligned} Melody_{T|D|D|T} = \langle \langle \langle Tonality_{minor} , t, m \rangle, 0 \rangle, \langle \langle Tonality_{harmonic} , t,m \rangle, 1 \rangle, \\ \langle \langle Tonality_{harmonic}, t, m \rangle, 4 \rangle, \langle \langle Tonality_{minor}, t, m \rangle, 0 \rangle \rangle : \\ \forall\, s_n \in Scale_x, s_n = s_{n+o} : o \in \mathbb{Z} \end{aligned} \end{equation*}
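
The rotation by an offset can be illustrated with a short sketch. It assumes, purely for the example, that a scale is a list of semitone distances from the tonic; the names `rotate`, `NATURAL_MINOR` and `HARMONIC_MINOR` are hypothetical and do not correspond to Symbolic Composer tonality definitions.

```python
def rotate(scale, offset):
    """Rotate a scale so that the element at index `offset` comes first."""
    n = len(scale)
    return [scale[(i + offset) % n] for i in range(n)]

# Semitone distances from the tonic (an assumption of this sketch).
NATURAL_MINOR = [0, 2, 3, 5, 7, 8, 10]
HARMONIC_MINOR = [0, 2, 3, 5, 7, 8, 11]

# T|D|D|T in the minor mode: i | V7 | V7 | i, with offsets 0, 1, 4, 0.
progression = [
    (NATURAL_MINOR, 0),
    (HARMONIC_MINOR, 1),
    (HARMONIC_MINOR, 4),
    (NATURAL_MINOR, 0),
]
rotated = [rotate(scale, offset) for scale, offset in progression]
```

With these values the second scale starts on the second degree and the third on the fifth degree, both of which are tones of “V7”; an offset of -1 would start the harmonic-minor scale on the leading tone instead.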

Of course, some of these are subjective decisions and in a way influence the voice-leading and the general melodic contour, but none of these decisions are final. They all can be refined at any moment, and any of these offsets (or all of them) can be set to variables. The point is that, having a harmonic progression based on scales constrained by the chords, one is now fully apt to apply a contour to it to generate suitable pitches.

In order to generate such a contour, a function that generates a vector based on Brownian noise ($PSD \propto 1/{f^2}$) with 257 samples was used, in order to have a considerable shape while avoiding repetition. Then, the vector is scaled to the length of the actual “rhythm-head” list, and mapped to symbols within the desired range.(11) This allows having the precise length needed, while retaining the overall shape of the contour. The operation is repeated to generate a second vector that is applied to the “rhythm-tail” list. Both vectors are appended and the result is a list of symbols, whose length is the same as the list of durations, and that has a realistic contour:

\begin{equation*} Melody_{Xn} = \langle s_1, s_2, s_3, \dotsc , s_n\rangle : n=\#(Rhythm_{Xn}) \end{equation*}
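
One way to sketch this contour generation is shown below. It is not the Symbolic Composer implementation: the Brownian vector is approximated by a cumulative sum of Gaussian steps, the scaling is done by picking evenly spaced samples, and the helper names (`brownian`, `resample`, `to_symbols`, `contour`) and the symbol range 0 to 7 are assumptions of the example.

```python
import random

def brownian(n_samples, rng):
    """Random walk: cumulative sum of Gaussian steps (spectrum roughly 1/f^2)."""
    value, walk = 0.0, []
    for _ in range(n_samples):
        value += rng.gauss(0.0, 1.0)
        walk.append(value)
    return walk

def resample(vector, length):
    """Pick `length` evenly spaced samples, keeping the overall shape."""
    if length == 1:
        return [vector[0]]
    step = (len(vector) - 1) / (length - 1)
    return [vector[round(i * step)] for i in range(length)]

def to_symbols(vector, low, high):
    """Map values linearly onto integer symbols in [low, high]."""
    vmin, vmax = min(vector), max(vector)
    span = (vmax - vmin) or 1.0
    return [low + round((v - vmin) / span * (high - low)) for v in vector]

def contour(length, low, high, rng, n_samples=257):
    return to_symbols(resample(brownian(n_samples, rng), length), low, high)

rng = random.Random(7)
# One vector for a five-note "rhythm-head", a second one for a three-note "rhythm-tail".
melody = contour(5, 0, 7, rng) + contour(3, 0, 7, rng)
```

The resulting list has one symbol per duration, so it can be paired position by position with the rhythm list of the same line.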

When that contour is applied on top of the corresponding harmonic scalar progression the symbols are converted to pitches that actually make sense within the context of the chords being played in the ostinato. Therefore, the melodies will always sound consonant, respecting the tradition involved.

Modifying the range of the symbols or the samples of the vectors can greatly influence the result and the quality and realism of the melodies. Using other noise functions and hybrid functions that generate graphical contours can have similar effects. The limit is one’s imagination, and the code is flexible enough to allow very easily the refinement of the abstraction.

10 Fourth Module

In this section we will detail how we have coded the counter-melody and bass layers and which assumptions and algorithms were used.

The counter-melody is a layer that often appears in the practice of fado. However, it is often omitted in early examples from the corpus. The more complex sample scores include similarities to counter-melodies and it roughly corresponds to what a modern guitarra player does, by filling in the spaces in between the melody. In order to mimic this layer, we have prepared a set of stock figurations based on the figures found in literature, but also on a set of observations we did on our own, observing players and listening to records. These stock figurations are based on arpeggiations, small scalar figures or even motifs drawn out of the melody. If one uses Ernesto Vieira’s model as a base, then they usually occur in the middle of the second bar, right when the voice is beginning to end a line, and end in the downbeat of the following bar, when a new line has just started (Fig. 5).

Figure 5 – Typical guitarra counter-melodic figurations.

In terms of formalisation, since they apply mostly scalar figures, we had to prepare a weighted list of possible stock figurations, much like the one we have done for the ostinati, just with the difference that, instead of associating them with the chordal harmonic progressions, we have associated them with the scalar harmonic progressions used for the melody, to allow passing tones and other non-chord tones. In a way, they are fragments of pre-composed music selected more or less at random and having variations. A way to refine this process is, once again, to code an abstraction that generates short motives based on small vectors with Brownian (and similar) noises. Of course, this latter approach may produce more unexpected results and sometimes results that do not conform with the tradition. The best approach, in our opinion, is a weighted combination of both solutions.

The bass line is covered in this same section because the principle is the same. The bass line in fado practice is usually redundant, since most of the time one does not have a bass present and the bass is contained in the ostinati performed by the viola (already described in the second module). However, the solution found for ostinati only allows chordal material and, although this is enough for most of the earliest scores, and probably reflects the simpler forms of the practice, the truth is that often the bass also performs scalar material, sometimes performing like a walking bass when connecting sections or even in the same ostinati patterns like marches and foxes. In these cases, one can really say that the bass presents counter-melodic material. So we have decided to take this into account and prepare a layer that most of the time performs the same bass notes as the viola, but is linked to the scalar harmonic progressions and also performs stock scalar figurations, namely alternating ascending and descending motives.

11 Fifth Module

Every musical score omits several parameters that are to be completed by the performer within the cultural practice, context, society and audience expectations. Fado is no exception, and for a musical score to be a convincing display of this practice, that human factor ought to be present, so the document generated should be played neither too randomly nor strictly and mechanically, as is often the case with MIDI realisations. The automatic performance of quantised MIDI scores has been studied at the Royal Institute of Technology (KTH) and models for human performance are presented in the studies of Friberg and colleagues (Friberg et al., 2006; 2000). These models present and implement a series of rules (hereafter referred to as the KTH performance rules) inferred from the empirical study of real performances, namely how the performer shapes all audible parameters such as tempo, sound level, or articulations. It has been found that each note undergoes micro-deviations relative to the notated value, not only due to human imprecision, but also as systematic variations: long notes tend to be lengthened and played louder, and short notes shortened and softened; the higher the pitch, the louder the note; notes in uphill motion have a decreasing duration, etc.

We have thought to include in the generated musical scores something that would model the major characteristics inherent to human performance, such as articulations that our observation has shown to be important cues to perceive the genre, and that we can define as “gestures” – some particular legatos, mordentes, tenutos, suspensions, portamentos, and so on. This can be done with an extra layer of transformative functions built on top of the quantised musical score generated. We see them as a “humaniser” that adds some kind of swing and deflection on the exact values provided by the MIDI score. We will explain with examples how the humaniser was formalised and implemented and how it can be refined at any time during the process.

The primary function of the program is to generate fragments of music, semiographic schemes, much like a normal score generated by pen and pencil would present. The problem with this approach is that each instance of the scheme would be presented exactly as the previous one, thus becoming dull and repetitive. In human performance the same does not happen, as each instance of a semiographic scheme is varied. Therefore, and in order to have the best of all worlds, the way to have consistency and coherence among the score is to effectively have these semiographic schemes repeated over and over again in each section, instead of constantly generating new bits of music. However, each time they are repeated something about them also has to change. Therefore, we formalised a series of abstractions, inspired by the KTH performance rules, that generate a little variety in each scheme and thus allow us to create variations regarding the same scheme.

For any given section $X0$ in the form, transformations are applied originating slightly varied sections $Xn$. Further transformations built on top of $Xn$ might be applied, originating more varied sections:

\begin{equation*} f:T_{variation}(X0) \rightarrow Xn \ \end{equation*} \begin{equation*} f:T_{variation}(Xn) \rightarrow Xm: n \neq m \end{equation*}

The best way to take advantage of the implementation of these functions is to use them as transformations to create alternative sections for each base section created. Therefore, when a section $A0$ is generated, it has corresponding lists of rhythms and lists of symbols. The section $A1$ would have the same lists affected by some of these gestural functions. The section $A2$ could have the same base lists affected again by the same or other gestural functions to randomly distort them in a different way. The ornamental functions can also be used cumulatively in order to create more variety.

11.1 Rhythmic Variations

If one observes the text, the prosody and the defining base of the sections, one understands that the rhythm of the melody is one of the anchors present in every section of every work. Each rhythm is a list of values that has a fixed total length. Therefore, the only way to increase variety within this parameter is to slightly skew the values within the constraint of this fixed length by stealing some duration from some notes and adding it to others. We have created three functions to do so. The first one, “length-emphasise”, emphasises a rhythm by a factor, keeping the length intact within resolution: a factor of 0% gives almost no distortion, while 100% gives the maximum practical distortion that still avoids the smaller values becoming void. This function is implemented as follows: the absolute value of each duration is raised to the power of one plus the factor (a variable between 0 and 100) divided by 33.4. This means that each duration is raised to an exponent varying between 1 and approximately 4. In addition, a random micro-value varying between 0 and one tenth of the duration itself is added to each result:

\begin{equation*} f: \forall \, r_i \in Rhythm_{melody},\, r_i \rightarrow q_i = |r_i|^{[1+(n/33.4)]}+x : x \in [0, |r_i|/10], \, n \in [0,100] \end{equation*}

Then all values are multiplied by a common factor, namely the sum of all original durations divided by the sum of all the new durations. This guarantees that the total duration of the transformed list is the same as that of the original list:

\begin{equation*} f:\forall \,q_i, q_i \rightarrow q_i \times (\frac {\sum |r_i|}{\sum q_i}) \end{equation*}

Depending on the factor and the original durations, micro-values might sometimes arise in this process. Therefore, it is suggested that the final values be quantised to a minimum of a 32nd note, for instance, and that the micro-remainders of such a quantisation be added to or subtracted from the largest value present in the list, to guarantee that the total duration of the final list equals that of the original list. This algorithm creates distortions in each duration in such a way that, in practical terms, the longer values become even longer and the shorter values become shorter.
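
The whole procedure (power, random micro-value, rescaling to the original total, quantisation and remainder correction) might be sketched as follows. This is an illustrative reading of the description, not the authors' “length-emphasise” code; the function name, the use of fractions of a whole note, and the choice to keep every note at least one grid unit long are assumptions.

```python
import random
from fractions import Fraction

def length_emphasise(rhythm, factor, rng, grid=Fraction(1, 32)):
    """Exaggerate long/short contrasts while keeping the total duration."""
    exponent = 1 + factor / 33.4                  # between 1 and roughly 4
    # Power each absolute duration and add a micro-value in [0, |r| / 10].
    raw = [abs(float(r)) ** exponent + rng.uniform(0, abs(float(r)) / 10)
           for r in rhythm]
    total = sum(abs(Fraction(r)) for r in rhythm)
    scale = float(total) / sum(raw)               # restores the original total
    # Quantise to the grid; no note may vanish (assumption of this sketch).
    quantised = [max(grid, round(Fraction(q * scale) / grid) * grid) for q in raw]
    # Give the quantisation remainder to the largest value.
    largest = quantised.index(max(quantised))
    quantised[largest] += total - sum(quantised)
    # Pauses (negative durations) stay pauses.
    return [q if r > 0 else -q for q, r in zip(quantised, rhythm)]

rhythm = [Fraction(1, 8), Fraction(-1, 8), Fraction(1, 4)]
varied = length_emphasise(rhythm, 50, random.Random(3))
```

Because the durations here are smaller than one, raising them to a power shrinks them all; it is the rescaling step that turns this into a redistribution of the same total length.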

Secondly, we have created the “length-homogenise” function that performs the opposite procedure, meaning that the larger values become smaller and the smaller ones become longer, thus making the lists more “uniform”. Finally, in a similar fashion, “length-distort”, randomly steals durations from some values adding them to others, distorting a rhythm by a certain factor, in a non-uniform way, keeping the length intact within the desired resolution. With a wise combination of these three functions, we have a useful resource to obtain variations of the semiographic rhythmic schemes.

11.2 Melodic Variations

Another way to obtain variation is to slightly change some of the symbols from a list that pertains to a melody. Symbolic Composer already has such built-in functions, so there was no need to build them. Mainly the idea is to use noise to modulate any list, by slightly increasing or decreasing some of its elements at random:

\begin{equation*} \exists \;s_n \in Melody_{Xn} : s_n \rightarrow (s+t)_n : t \in [-2,2] \end{equation*}

Thinking idiomatically about the piano, and how some fados might be played on a piano, one of the most common resources in the literature is the repetition of the same melodic motif played in octaves as a reinforcement. Such transformational function can be implemented as well, simply by concatenating each symbol, from a list of melodies, with itself displaced by a variable:

\begin{equation*} \forall \;s_n \in Melody_{Xn}, s_n \rightarrow s_n * (s+t)_n : t \in \mathbb{Z} \end{equation*}

In this case, if the variable is twelve, then this will generate a list of dyads in which each element is an octave apart from the original. Modifying the value of the variable originates different kinds of dyads resulting in other interesting effects.
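
Both melodic transformations are simple enough to sketch directly. The example assumes that symbols are integers counted in semitones, as the octave case suggests; `jitter`, `double_at` and the probability parameter are hypothetical names, and Symbolic Composer's built-in functions are not reproduced here.

```python
import random

def jitter(melody, probability, rng, max_shift=2):
    """Shift some symbols at random by an amount t in [-max_shift, max_shift]."""
    return [s + rng.randint(-max_shift, max_shift) if rng.random() < probability else s
            for s in melody]

def double_at(melody, interval):
    """Pair every symbol with itself displaced by `interval` (12 gives octaves)."""
    return [(s, s + interval) for s in melody]

melody = [60, 62, 63, 65]
varied = jitter(melody, 0.5, random.Random(4))
octaves = double_at(melody, 12)
```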

11.3 Mixed Variations

Another class of variations can be obtained by simultaneously changing pitches and durations. Although more difficult to implement, this is not impossible. We decided to start by creating an abstraction for one of the most common gestures heard in fado, namely as performed by Amália Rodrigues: the mordente. We define the mordente as the quick alternation between a note and the diatonic note immediately above. Sometimes, in highly melismatic singing, a chain of mordentes is performed. The algorithm consists in dividing the duration of the note into four parts: the first quarter is the original note, the second quarter is assigned to the diatonic note above, and the third and fourth quarters are again the original note. This specific function is already implemented in Symbolic Composer as “mord”, but only for assignment to a specific note. We have extended it so that it can be assigned at random throughout a whole phrase. Since the rhythm of the melodies and the symbols of the melodies have been defined as paired lists with the same length, such a task can be performed by randomly choosing specific positions and applying the algorithm to those positions in both lists:

\begin{equation*} \begin{aligned} f:Mordenter = \exists\; d_n \in Rhythm_{Xn}\,, s_n \in Melody_{Xn} : \\ d_n \rightarrow \langle \frac {d_n}{4}, \frac{d_n}{4}, \frac{d_n}{2} \rangle, s_n \rightarrow \langle s_n, (s+1)_n, s_n\rangle \end{aligned} \end{equation*}

The abstraction coded, “mordenter”, receives as an input a list of rhythms and a list of symbols and outputs a mixed list. Then the auxiliary functions “rhymord” and “melmord” allow the rhythm and symbols lists to be split. Both lists are equal to the original ones, except they now have random mordentes assigned to them.
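
A compact sketch of the same idea follows. It returns the two lists directly instead of a mixed list, treats symbols as integer scale-degree indices so that adding one gives the diatonic note above, and leaves pauses untouched; these choices and the probability parameter are assumptions of the example rather than features of the original “mordenter”.

```python
import random
from fractions import Fraction

def mordent(duration, symbol):
    """Split one note into note, upper neighbour, note (1/4, 1/4, 1/2 of its length)."""
    quarter = duration / 4
    return [quarter, quarter, duration / 2], [symbol, symbol + 1, symbol]

def mordenter(rhythm, melody, probability, rng):
    """Apply mordents at random positions of two paired lists."""
    new_rhythm, new_melody = [], []
    for d, s in zip(rhythm, melody):
        if d > 0 and rng.random() < probability:   # pauses are never ornamented
            ds, ss = mordent(d, s)
        else:
            ds, ss = [d], [s]
        new_rhythm += ds
        new_melody += ss
    return new_rhythm, new_melody

rhythm = [Fraction(1, 4), Fraction(-1, 8), Fraction(1, 8)]
melody = [0, 0, 2]
r, m = mordenter(rhythm, melody, 1.0, random.Random(0))
```

The two output lists keep equal length and the total duration is unchanged, so they can replace the originals in any later step.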

12 Global Parameters

12.1 Dynamics

Dynamics in MIDI files are usually implemented as a combination of velocities and volume and they are associated with a given note (therefore a pitch and a duration). Since the durations and pitches have been implemented as lists of equal length, the velocities follow the same principle and therefore are implemented as such: $Velocity_{Xn} = \langle v_1, v_2, v_3, v_4, \dotsc \rangle$

The model of Toivanen et al. (2013) creates dynamic markings in the music based on a coefficient of arousal retrieved from the lyrical content. This coefficient is calculated according to the proportion of substituted words in a given line (p. 90). We have decided that, although one does not have a text, one has the conventions from the practice associated with emotional prosody, from which it is possible to extrapolate the dynamics as a function of durations and symbols, following the reasoning presented in the KTH performance rules (Friberg et al., 2006; 2000). Therefore, the velocities are defined as a function of the product of durations and symbols.

The conversion is not direct, since some mapping or scaling has to occur first. Symbols are expressed as alphabetic characters; therefore, one has to guarantee that all symbols are first mapped into an integer between 1 and 100. This operation is trivial and in Symbolic Composer can be done by applying the predefined function “symbol-to-vector” and then specifying the desired range:

\begin{equation*} f:\forall\; s_n \in Melody_{Xn} \rightarrow N_{Melody_{Xn}} : N \in [1,2,3,\dotsc, 100] \end{equation*}

Furthermore, all durations have to be positive, so their absolute value is taken instead:

\begin{equation*} f:\forall\; d_n \in Rhythm_{Xn}, \exists\; v_n \in Velocity_{Xn} : v_n=|d_n| \times N_{Melody_{Xn}} \end{equation*}

And finally, one can also constrain the range of results to realistic velocities by scaling the possible range to values between 64 and 100 instead of 0 to 127, for instance:

\begin{equation*} f:\forall\; v_n \in Velocity_{Xn} \rightarrow Vel_{Velocity_{Xn}} : Vel \in [64,100]. \end{equation*}

This last step largely depends on taste and on the genre to be modelled and can be modified at will. These values we decided for the melody (all within the mezzo-forte range) can be, of course, modified at any time.
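
The three steps can be sketched as below. The sketch assumes that symbols have already been turned into integers and uses a generic linear `rescale` helper in place of Symbolic Composer's “symbol-to-vector”; both names are illustrative.

```python
def rescale(values, low, high):
    """Map values linearly so that the minimum lands on `low`, the maximum on `high`."""
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1
    return [low + (v - vmin) * (high - low) / span for v in values]

def velocities(rhythm, melody, low=64, high=100):
    """Velocity as the product of |duration| and pitch level, scaled to [low, high]."""
    levels = rescale(melody, 1, 100)                        # symbols -> 1..100
    products = [abs(d) * n for d, n in zip(rhythm, levels)]
    return [round(v) for v in rescale(products, low, high)]

vel = velocities([0.125, 0.25, -0.125], [0, 4, 2])
```

In this small example the longest and highest note receives velocity 100 and the shortest and lowest note receives 64.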

12.2 Tempo

Unlike all other parameters discussed so far, which are generally local parameters and have to be generated and assigned for each melodic line, in each possible instrument, tempo is a global parameter. This means that in each section it affects all instruments and lines being performed. The first and most obvious solution is to simply assign one single value for tempo that affects the entire piece or section. This is what often happens in quantised scores. A more realistic approach, also based on the KTH performance rules, however, is to deal with tempo locally.

In the program, tempo is not fixed and its fluctuation is defined as a function of other factors, essentially trying to emulate what happens in a real performance. According to the conventions of emotional prosodic styles, relying on text, and as a general rule of thumb, the higher the pitch is, the longer the duration and, consequently, the louder the velocity and the slower the tempo. Conversely, the shortest rhythms and the lower pitches are usually sung quieter and also slightly rushed. So, having this general idea, the tempo fluctuation was formalised as the inversion of the product of durations and pitches:

\begin{equation*} T_{max} - T_{min} \Leftarrow (Duration_{Xn} \times Pitch_{Xn} )^{-1} \end{equation*}

For the implementation of this idea a base tempo is set up in the initial settings. Since 72–84 bpm is given as the acceptable tempo range in Ernesto Vieira’s dictionary (Vieira, 1890, p. 55), we decided to remove one third of the top value to find a minimum tempo of 56 as a base. Using the same 12 bpm range, our minimum tempo will vary between 56 and 68 bpm: $T_{min} = 62\pm6$. The maximum tempo is defined as four-thirds times this value (reverse-engineering the one-third fluctuation ratio): $T_{max}=4/3\times T_{min}$.

Having these references as a fluctuation threshold, then inside each section the tempo is mapped according to the inversion of symbols and durations. Each symbol is converted and mapped into an integer between 100 and 1. This can be done in Symbolic Composer using the “symbol-to-vector” function:

\begin{equation*} f:\forall \; Symbol_{Xn} \rightarrow N_{Symbol_{Xn}} : N \in [100,99,98,\dotsc, 1] \end{equation*}

Each duration is converted to an absolute ratio (to avoid negative values), and the product of these two values is then proportionally mapped into a value between our previously defined minimum and maximum tempos. This mapping can be coded in Symbolic Composer using the “vector-round” function:

\begin{equation*} f:\forall\; (|Duration_{Xn}| \times N_{Symbol_{Xn}}) \rightarrow T_{Xn} : T \in [T_{min}, \dotsc, T_{max}] \end{equation*}

The outcome is a list of tempo values, one for each note. Having such a detailed tempo on the note level, and reflecting the direct effect of the other parameters, makes the performance fluid and organic. Notice how the formula applied is exactly the inverse of the one used in dynamics. We consider that dynamics and tempo have an inverse relationship in this practice. The exact coefficients are yet to be determined with more study and experimentation, however, this is not within the scope of this article.
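
Read literally, the mapping described in this section can be sketched as follows. The helper `rescale` and the function `tempo_curve` are illustrative names, symbols are assumed to be integers already, and final rounding to whole bpm values stands in for the “vector-round” step.

```python
def rescale(values, low, high):
    """Map values linearly so that the minimum lands on `low`, the maximum on `high`."""
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1
    return [low + (v - vmin) * (high - low) / span for v in values]

def tempo_curve(rhythm, melody, t_min):
    """One tempo value per note, between t_min and 4/3 * t_min."""
    t_max = 4 * t_min / 3
    inverted = rescale(melody, 100, 1)            # highest symbol -> 1, lowest -> 100
    products = [abs(d) * n for d, n in zip(rhythm, inverted)]
    return [round(t) for t in rescale(products, t_min, t_max)]

# t_min would be drawn from 56..68 bpm; 60 is used here as a fixed example.
curve = tempo_curve([0.25, 0.125, 0.25], [0, 7, 3], 60)
```

With a base of 60 bpm the curve stays between 60 and 80 bpm, and the highest note of the example is the slowest one.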

12.3 Groove

Cumulatively with the tempo and velocities that are derived from the durations and symbols (and symbols in a certain way are already derived from the rhythms), we also consider that groove can be defined as a function of the inverse of the velocities, much like tempo.

The principle is similar to the one applied earlier – the list with the velocities is mapped onto values, in this case a range between -1 and 2, which are divided by an extremely high value, in this case 256, in order to obtain really small durations:

\begin{equation*} f:\forall\; v_n \in Velocity_{Xn} \rightarrow g_{Xn} : g \in \frac{[-1,2]}{256} \end{equation*}

This resulting list of really small durations can in fact be considered a pattern of small deviations, to be applied within the section:

\begin{equation*} Groove_{Xn}=\langle g_1, g_2, g_3, \dotsc \rangle. \end{equation*}
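
A minimal sketch of this mapping is given below. Interpreting “inverse of velocities” as an inverted linear mapping (the loudest note receives the lowest value) is an assumption of the example, as is the function name `groove`.

```python
def groove(velocities, low=-1, high=2, divisor=256):
    """Map velocities inversely onto [low, high] and shrink them by `divisor`."""
    vmin, vmax = min(velocities), max(velocities)
    span = (vmax - vmin) or 1
    # Inverted mapping: the softest note gets `high`, the loudest gets `low`.
    return [(high - (v - vmin) * (high - low) / span) / divisor for v in velocities]

deviations = groove([64, 100, 82])
```

Each resulting value is a tiny fraction of a whole note that can be added to the corresponding onset or duration within the section.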

We have experimented with several values and found, for instance, that dividing by 64 or 128 gave the performance a hiccuping or uncanny feel. The formula for groove can be refined, as can all the others, given more time, since studies on the subject exist (see Naveda et al., 2011; Sioros et al., 2012).

13 Templates for Rendering the MIDI Files

In order to listen to and record the generated music, several options are available. The most recent version of Symbolic Composer has a system that integrates the software MIDI Trail, which allows following the score in almost real time with an interesting graphic user interface, and using internal soft synths. It also has a connection to LilyPond, so it generates a score for immediate visualisation(12). The score poses problems, since the gestural parameters rendered by the humaniser can create strange rhythmic symbols due to lack of quantisation and constant tempo changes, rendering the score unreadable. It must be noted that a score is meant to be read as a simple semiographic scheme; therefore, it is only suitable if the gestural parameters are not applied to it, which is only possible with a reduced version of the generator (only the first four modules active).

Another option was to rewire the software to Mainstage (Apple) and create a template with the number of MIDI tracks needed. We find timbre to be an extremely important parameter, one that is crucial in defining the practice of fado, as seen in all relevant literature. Unfortunately MIDI scores are very poor in dealing with this particular parameter. Therefore, taking advantage of recent sample libraries released, namely Alfama(13) by Adamastor Virtual Instruments, we have rewired both the melody and counter-melody tracks to be played by a sampled guitarra, with the harmony and bass tracks to be played by a sampled viola. This sampled guitarra allows us to recreate the iconic timbre that most of the time is a valuable cue in identifying the practice. We have also pre-mixed and pre-mastered the tracks, setting up the equalisers, compressors, reverbs and the additional plug-ins we desired. The file can then be saved as a template. The final step in the process is for Symbolic Composer to generate new MIDI files. We can listen to them immediately and save them as high-quality mp3s. If we are not happy with this MIDI file, we can make changes. Or we can just select the tracks and delete them, switch to Symbolic Composer and evaluate the program again. It will generate a new MIDI file that replaces the previous one.

14 Evaluation

An important issue when considering the production of any system is its evaluation process. The most suitable way, since music is a human phenomenon, is to present the final result to human listeners. Cope (2005), however, discusses the problems that arise when listeners know, a priori, that they are hearing computer-generated music, which can influence their judgement of the results (pp. 345–359). To avoid this issue, a kind of Turing test could be a suitable solution. The music produced could be presented to random listeners who, in principle, would not be aware that the music they are hearing is automatically generated, and their feedback taken into account. If the listeners recognise what they are listening to as genuine examples of tunes within the intended practice and are pleased or positively surprised by the results, then one may be able to say that the system accomplished relevant goals. However, we acknowledge the limitations of any aesthetic evaluation system, because each listener will ostensibly have different values. Even from within the same culture, tastes can differ greatly, and so we aspired to also have a more objective means of evaluation than to just rely on human ears. As such, it occurred to us that we could try to evaluate the output of our system using a supervised classification system. If we are modelling the musics present in a practice and we have an encoded representative corpus, one possible goal is to know how similar the newly generated recordings are to the original ones. Hence, the devised method translates into training the classifier with the entire corpus of original (human) composition recordings plus a large sample of the world’s musics. Next, we ask the trained software to classify the newly generated fados. If most of the newly generated recordings are classified as fado, and not as any of the other classes, then one has evidence that the generator has produced music that clusters with the music of the original fado corpus rather than with the other musics representing the world. Of course, this still does not tell us anything about its closeness to many other classes not present in the database, but it is a promising start. By applying this procedure, one is able to have an evaluation of the system based on objective and reproducible standards.

14.1 Automatic Evaluation

We generated 200 files with our system, in the MIDI file format.(14) Of these, 100 were generated using only the four-module version; we call these “Piano fados”, as they resemble the ones in the database. The other 100 were generated using all modules; we call these ensemble fados “for viola and guitarra”. An edited musical score drawn from one of the Piano fados is presented in Fig. 6.

Figure 6 – A trimmed excerpt of an automatically generated MIDI file.

This score was trimmed to ensure legibility while still preserving the influence of the humaniser. The actual file repeats the formal sections several times and presents additional variations of the same sections. In this condensed version, the introduction is presented in the first system, two versions of a section $A$ in the second and third systems and two versions of a section $B$ in the fourth and fifth systems, all segmented by double barlines for clarity. The sections are well delineated, following Ernesto Vieira’s model. Comparing the sections with one another, one can see how the generated material consistently observes logical patterns of recurrence. Compare, for instance, the melodic material in bb. 5–6 and 9–10; 7–8 and 11–12; 13–14 and 17–18; 15–16 and 19–20; and the rhythm of the melodies between sections. The harmonic progressions are consonant with the melodies. Furthermore, the melodic material constantly presents slight changes in both durations and pitches, to simulate what would be a human styling of a fixed semiographic scheme. The melody of section $B$ sometimes appears doubled at the octave, simulating an idiomatic rendition on a piano (compare $B$ with $B'$). If such a piece were to be played by a pianist, the score would have to be quantised and further edited to resemble Fig. 1.

To control for bias, we searched the internet for recorded MIDI files based on piano reductions. We found the site of Doug McKenzie,(15) from which a random sample of 25 recordings was downloaded, all supposedly from “jazz-related” genres.

We decided to use Cory McKay’s software Bodhidharma(16) as our classification tool, based on its usability, gentle learning curve, flexibility, feature selection, the completeness and diversity of the corpora already present, and its ability to import data in MIDI format and export results instantly into spreadsheets. Judging by the results obtained in a Symbolic Genre Classification competition, the 2005 MIREX Contest,(17) Bodhidharma also seemed the best option among the available systems. Bodhidharma uses a hybrid classification system “that makes use of hierarchical, flat and round robin classification. Both k-nearest neighbour [kNN] and neural network-based [NN] classifiers are used, and feature selection and weighting are performed using genetic algorithms” (McKay, 2004, p. 3). The classification taxonomy comprises nine root categories, eight intermediary categories and several child categories, for a total of 38 unique classes, to which we added the fado class. This set of world musics has 25 recordings for each class, leading to a total of 950 MIDI files, plus the 100 fado MIDI files we added. This classification taxonomy is presented in Tab. 2.

Country
  Bluegrass
  Contemporary
  Traditional Country

Rap
  Hardcore Rap
  Pop Rap

Western Classical
  Baroque
  Classical
  Early Music
    Renaissance
    Medieval
  Modern Classical
  Romantic

Jazz
  Bop
    Bebop
    Cool
  Fusion
    Bossa Nova
    Jazz Soul
    Smooth Jazz
  Ragtime
  Swing

Rhythm and Blues
  Blues
    Chicago Blues
    Blues Rock
    Soul Blues
    Country Blues
  Funk
  Jazz Soul
  Rock and Roll
  Soul

Western Folk
  Bluegrass
  Celtic
  Country Blues
  Flamenco
  Fado

Modern Pop
  Adult Contemporary
  Dance
    Dance Pop
    Pop Rap
    Techno
  Smooth Jazz

Rock
  Classic Rock
    Blues Rock
    Hard Rock
    Psychedelic
  Modern Rock
    Alternative Rock
    Hard Rock
    Punk
    Metal

World Beat
  Latin
    Tango
    Salsa
    Bossa Nova
  Reggae

Table 2 – Classification taxonomy with 39 classes.

According to Cory McKay:

the expanded taxonomy developed here … made use of an amalgamation of information found in scholarly writings on popular music, popular music magazines, music critic reviews, taxonomies used by music vendors, schedules of radio and video specialty shows, fan web sites and the personal knowledge of the author. Particular use was made of the All Music Guide, an excellent on-line resource, and of the Amazon.com online store. These sites are widely used by people with many different musical backgrounds, so their systems are perhaps the best representations available of the types of genres that people actually use. These two sites are also complimentary, in a sense. The All Music Guide is quite informative and well researched, but does not establish clear relationships between genres. Amazon.com, in contrast, has clearly structured genre categories, but no informative explanations of what they mean. (McKay, 2004, pp. 81–82)

The discussion of musical categories vastly transcends the scope of this work, so we use these classes as they have been presented. For a more in-depth discussion of musical categories we refer the reader to the work of Fabian Holt (Holt, 2007).

After training Cory McKay’s software with the 1050 MIDI files and 39 classes, we supplied the 100 newly generated piano fado recordings, the 100 ensemble fado recordings and the 25 piano jazz recordings, and asked Bodhidharma to classify all of them. The results give a very positive evaluation of the output of our system: 99 of the 100 piano fado recordings were classified as fado (99% precision). Of the ensemble fado recordings, 77 were classified as fado (77% precision), 16 were classified as “unknown”, and three produced errors and could not be classified at all. Only two of the jazz recordings from the control group were (wrongly) classified as fado; most were classified into jazz-related genres (cool, bebop). Across the 200 generated recordings, this gives an overall precision of 88%.
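The figures above follow directly from the reported counts. The short sketch below only reproduces that arithmetic; the variable and function names are illustrative, and the term “rate” is used here simply for the share of recordings labelled as fado.

```python
# Counts reported in the text for each generated set (100 files each).
results = {
    "piano": {"total": 100, "as_fado": 99},
    "ensemble": {"total": 100, "as_fado": 77},
}

def rate(count, total):
    """Share of recordings with a given label, as a percentage."""
    return 100 * count / total

per_set = {name: rate(r["as_fado"], r["total"]) for name, r in results.items()}
overall = rate(
    sum(r["as_fado"] for r in results.values()),
    sum(r["total"] for r in results.values()),
)

# Control group: 2 of the 25 jazz recordings were labelled as fado.
control_as_fado = rate(2, 25)

# Training set size: 38 classes x 25 files, plus the 100 fado files.
training_files = 38 * 25 + 100

print(per_set, overall, control_as_fado, training_files)
```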

These results show that the output of the generator clusters well within the original corpus, instead of dispersing among the other recordings sampling the world’s musics. Moreover, they show that even the generated fados in ensemble form still retain more characteristics of fado than of other classes. On the one hand, the 16 recordings classified as “unknown” and the three recordings that could not be classified at all show that refinement is possible; on the other hand, we consider this to be a good thing, since we also intended the system to innovate in some way. Hence, some fados not being recognised as such, while also not being recognised as any other available class, can be seen as a possible creative deviation or a pathway for innovation. The control group results also suggest that very little bias was induced either by the over-representation of fados in the training process or by the fact that they are all in the form of piano reductions. The fact that the vast majority of recordings labelled as jazz piano solos by an external source were indeed classified into jazz-related genres reinforces the idea that fado piano solos in particular, and fado recordings in general, carry enough features, per se, to be identified as exemplars of a distinct genre.

14.2 Human Feedback

Five random samples were drawn from the 200 computer-generated MIDI files (C1, C2, C3, C4 and C5). Another five random samples were drawn from the original corpus of human-composed fados (H1, H2, H3, H4 and H5). Excerpts of roughly 30 seconds (corresponding to coherent sections, with no cuts within thematic material) were recorded to mp3 using a fixed template with Alfama guitarra and viola samples and the same effects, to prevent bias from differences in sound quality or production. These excerpts were ordered randomly. We obtained a final track with 10 coherent excerpts of the same sound quality, in which the only variable was that five were generated by the program and five were composed by humans.(18) The listeners were not informed of this fact.

An online quiz was created in which we asked the subjects to listen to the track in its entirety, so that they had an idea of the overall sound environment they were judging. They were then asked to give each excerpt a score from 0 to 10, according to their personal, subjective, aesthetic criteria. There was also an open text box in which they could leave general comments if they wished. There was no further guidance, and no personal data was collected apart from their relationship with music (listener, amateur or professional performer) and level of music education. Although we recognise that the sample is small, there was no other practical way to do this in a timely manner, since most people have neither the time nor the disposition to answer long or complex enquiries.

In total, 65 subjects answered the enquiry and their answers were then analysed. Most excerpts received a wide range of scores, showing the enormous variety of opinions among different people, although some tendencies emerged. Overall, the distributions were approximately normal and there were neither exceptionally unpleasant nor particularly pleasant excerpts, with global means ranging between 4 and 7.

A one-way ANOVA test revealed that there were differences among the scores not explained by chance (Fig. 7). A Tukey’s HSD (Honest Significant Difference) test revealed that C1 was rated significantly higher than the other four computer-generated excerpts. It also revealed that H3, H4 and H5 were rated higher than H1 and H2. C1 was rated higher than the two lowest human-composed excerpts (H1, H2) and on a par with the other three (H3, H4, H5). However, the other four computer-generated excerpts (C2, C3, C4, C5) were merely comparable with the two lowest and significantly worse than the three highest human-composed excerpts.
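For readers unfamiliar with the test, the following is a minimal sketch of how the F statistic of a one-way ANOVA is computed, written in plain Python. The ratings are invented toy values, not the data collected in this study, and the function name `one_way_anova_f` is hypothetical. The Tukey HSD post hoc step is not reproduced here, since it additionally requires the studentised range distribution.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy scores on the 0-10 scale for three hypothetical excerpts.
ratings = {
    "excerpt_a": [4, 5, 6],
    "excerpt_b": [6, 7, 8],
    "excerpt_c": [5, 6, 7],
}

f_stat, df_between, df_within = one_way_anova_f(list(ratings.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F relative to the critical value for the given degrees of freedom indicates that the differences between group means are unlikely to be due to chance alone.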

Figure 7 – Model evaluation with human subjects.

Overall (Fig. 8), the cluster of computer-generated excerpts was rated lower than the cluster of human-composed excerpts, with a significant difference of 1.06 points of pleasantness on the scale of 0 to 10.

Figure 8 – Overall human versus computer-composed samples.

These results indicate that the generator succeeded in creating output of moderate pleasantness (global mean of 5), with the ability to produce some high-quality samples once in a while, while avoiding garbage. However, they also show that there is room for improvement, since most of the time its results are perceived as slightly worse than human-composed music (global mean of 6). The difference of only 1.06 points of pleasantness, although significant, is encouraging for future research, as this discrepancy could have been greater.

We can also say, based on our knowledge of the samples, that the most out-of-the-box, innovative fados generated by the program polarised the subjects. While it could be argued that a successful system is one that mimics the practice as it is, a creative system is one that might also anticipate or suggest actual change, and thus provides alternative results by going beyond the usual limits in some parameters. As such, the more one adjusts the CR, the more unexpected the results one might get. Some of our samples sounded peculiar due to unusual harmonic progressions mixed with more skewed melodies. While some of the subjects applauded the results and preferred the difference and innovation, and thus gave better grades to these recordings, others greatly penalised these excerpts in favour of more traditional ones. The subjective taste of the audience and the general level of music education of the subjects have always played, and will always play, a role in their preferences. An output that may be seen as atrocious by many will be seen as pleasant by others. Therefore, the fact that there was no negative consensus, and that even the lowest-rated samples obtained a relevant number of high scores, leads us to believe that the generator has good prospects.

15 Conclusions

We consider that we have accomplished our goals, in the sense that we were able to deal with the problems posed by David Cope and at the same time respect the ideas of Brian Eno,(19) two important precursors in these kinds of systems. First, we have built a generator that, with the click of a button and without further human interaction, generates a new piece from scratch with (arguably) clear stylistic leanings. In this sense it works like a unique organic seed from which a whole plant grows. Because all abstractions are indexed to the same seed, the outcomes are reproducible by anyone who wishes to do so. The work is also customisable, in the sense that a user who wishes to impose some constraints or make some decisions on their own is able to do so. The architecture of the program, being modular, flexible and customisable, acts as a case study for fado but can be easily adapted to other contexts. This workflow shows how automating the composition process is very useful when one wants to produce a large number of instrumental songs in a short span of time. It performs best when the songs are all within the same style and maintain the same instrumentation, as when generating a thematic album. Still, the modular structure of the model allows for relatively easy modelling of other styles and ensembles by changing the relevant variables. In short, we believe we have fulfilled our own expectations: we have not only built a proof-of-concept generator of music usually associated with fado practice, but also developed a set of skills and templates that allow us, and any user who follows our steps and algorithms, to develop generators for other kinds of music as well.
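The reproducibility that comes from indexing every random decision to a single seed can be illustrated with a minimal sketch. It assumes a toy generator that draws scale degrees and durations; the function `generate_piece` and its parameters are hypothetical and do not reproduce the actual generator.

```python
import random

def generate_piece(seed, length=8):
    """Draw a toy sequence of (scale degree, duration) pairs from one seed."""
    rng = random.Random(seed)  # every random choice derives from this seed
    degrees = [1, 2, 3, 4, 5, 6, 7]
    durations = [0.5, 1.0, 2.0]
    return [(rng.choice(degrees), rng.choice(durations)) for _ in range(length)]

# The same seed always yields the same piece; a different seed gives a new draw.
first = generate_piece(seed=2017)
second = generate_piece(seed=2017)
other = generate_piece(seed=2018)

print(first == second)
```

Keeping a private `random.Random` instance per piece, rather than relying on the global random state, means that a piece can be regenerated from its seed regardless of what else the program has done.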

We think the project falls short in terms of the melodic component, mainly because of the lack of a real text. The more we worked, the more we understood the importance of the semantic content and of the patterns of stresses and vowel lengths in shaping the actual rhythms and pitches governing a melody. Moreover, when coding the gestural performance and paying attention to the KTH performance rules, we realised that these parameters were constrained by the same factor. So, while we acknowledge the hierarchical importance of this parameter in the whole process, the lack of its ideal implementation does, in fact, affect the results. Therefore, in order to refine this or similar projects, we cannot stress enough the importance of implementing the text to its full extent: building a database for a given linguistic community, a lyrical generator, and extensive rules on how the words shape the subsequent parameters in a more realistic way.

We have also developed a set of tools for humanising the generated scores, and explained ways to easily render and record them. This makes the set of tools potentially valuable for the music industry, namely for the composer who wishes to work for the music library industry, allowing the composition, production and recording of large sets of culturally important music in much less time.

16 Future Work

We believe the corpus should be enlarged. There are several more fados to be transcribed into digital versions and data to be retrieved. The implementation of the lyrical module to its full extent should be done in order to optimise this model. Such work would require the coordination with other researchers already working with the lyrics of fado, in order to have more empirical studies regarding their data. The work of Toivanen et al. (2013) is a good inspiration in order to accomplish this task.

A proof-of-concept modular generator that can recreate other styles was programmed. We have also demonstrated how a setup can be used to quickly generate content suitable to be used in the music library industry in general, or allow the individual composer to produce general abstractions, instead of individual compositions, thus reducing the overall time. We believe the next step would be to test the market and develop applications, namely generative standalone applications, such as the ones devised by Brian Eno, already on the market, but this time devoted to other markets. We believe the association of the generator with the sample engine Alfama and a friendly graphic user interface can easily be used in the market of background music in many Portuguese contexts where it makes sense: monuments, restaurants and public places. We also believe the code and architecture in itself is suitable to be replicated in open source software, like Common Music, to be taught in schools. We believe a new paradigm for the teaching of music composition and music applications is now open.

We also believe that this same base setup can and should be used as a base to formalise and model many other musical practices.

17 Supplemental Material

Supplemental material for the article can be accessed here.


Acknowledgements

We wish to acknowledge the valuable contributions of Fábio Serranito and Jon Fielder, who proofread and edited the text.


Funding

This work was funded by FCT – Fundação para a Ciência e Tecnologia in the framework of UT Austin-Portugal Digital Media program, grant [SFRH / BD / 46136 / 2008].


References


Footnotes

  1. http://www.unesco.org/culture/ich/en/RL/fado-urban-popular-song-of-portugal-00563, accessed 11 May 2016.
  2. https://www.youtube.com/channel/UCJlUZ6RIRgbL7iHfbNdfJhA, accessed 11 May 2016.
  3. This collection comprises more than 1,000 songs published in Portugal since the second half of the nineteenth century, by several publishers. These scores are individual folios that were previously scanned by researchers from the Instituto de Etnomusicologia – Centro de estudos em Música e Dança (INET-md), FCSH/NOVA.
  4. http://fado.fcsh.unl.pt.
  5. http://www.bandinabox.com/, accessed 11 May 2016.
  6. http://algoart.com/, accessed 11 May 2016.
  7. http://www.generativemusic.com/index.html, accessed 11 May 2016.
  8. http://commonmusic.sourceforge.net/, accessed 11 May 2016.
  9. http://www.symboliccomposer.com/page_main.shtml, accessed 11 May 2016.
  10. Notable fado composers recognised and legitimised by the community.
  11. A realistic range could be -D to F, since most amateur singers do not use an extension of more than an octave and a half.
  12. The latest version of the generator as well all the necessary auxiliary functions can be retrieved at: http://tiagovideira.com/2015/06/07/instrumental-fado/.
  13. https://www.youtube.com/watch?v=VUYPI2se-WA, accessed 11 May 2016.
  14. They can be retrieved at: http://tiagovideira.com/2015/06/07/instrumental-fado/.
  15. http://www.bushgrafts.com/jazz/midi.htm, accessed 11 May 2016.
  16. http://jmir.sourceforge.net/Bodhidharma.html, accessed 10 May 2016.
  17. http://www.music-ir.org/evaluation/mirex-results/sym-genre/index.html, accessed 11 May 2016.
  18. The track with the 10 excerpts is provided in Section 17. The examples 1 (C1), 4 (C2), 6 (C3), 7 (C4) and 10 (C5) are computer generated while 2 (H1), 3 (H2), 5 (H3), 8 (H4) and 9 (H5) are retrieved from the corpus.
  19. http://www.inmotionmagazine.com/eno1.html, accessed 11 May 2016.
