A description of the implementation of the theory of emergent music.

Foundational Decisions

A theory may be interesting in itself, but if one is to actually create music, a method of employing that theory must be devised. To do this, several decisions were made.

First: while it may seem strange to bother saying so in the 21st century, the first decision made was that the generative implementation of the theory would be made via software. This affords a speed, convenience, and flexibility that would otherwise be difficult to achieve.

In previous times, when composers decided to explore process-driven, generative, or emergent musics, they didn't have this option available to them. Instead, they employed simpler technologies, ranging from pen and paper and tape decks to the I Ching. Interesting musics at times arose, but many of those technologies and approaches suffered from limitations, some practical and some theoretical.

On the practical side, many of these approaches were time-consuming and labor-intensive, which made it difficult to explore the possibilities they presented. Much time and effort could pass between an idea and its musical elaboration, which hampered the composer's ability to truly explore the possibilities inherent in his or her approach.

On the theoretical side, while many of these approaches allowed music to emerge from the interaction of separate processes, they suffered from a centralization of intention and control. Processes could interact in sound, but often there were no ways for these processes to interact with one another in any deep way (a notable exception occurred when these processes were musicians who could hear what others were doing and react according to some agenda). A long delay tape deck may play many independent parts (from which an interesting sound-scape may arise), but it is the tape deck that acts as the point of control, and the components--the tape loops--act independently.

This theory of emergent musics overcomes these theoretical limitations by allowing processes to create and transform other processes without a centralized point of control, but this flexibility all but necessitates the use of modern computing technology. This, along with a desire to explore the possibilities in an empirical way, dictated that the implementation should be software-based.

Having decided that a software implementation would be used, the next decision was that the software would itself act in a transformative way. I decided that a grammar should be created to describe musics created according to the theory, and that the software would transform these grammars into a form that was convenient to work with in the creation of music. For this the Musical Instrument Digital Interface (MIDI) standard was chosen. The MIDI standard is widely understood by both hardware and software systems, and can be used in many ways, from creating standard-notation musical scores to driving banks of synthesizers.

Having decided these things, the remaining task was to design the grammar that would be used to specify the musics. For this, the formalism of context-free grammars was chosen.

Context-free Grammars

In formal language theory, a grammar is a collection of expressive elements (strings) and rules that transform those expressive elements. A grammar defines the set of valid expressions within a language. In linguistics the expressive elements are strings composed of letters and the grammar defines how these elements can be combined into valid expressions (sentences, parts of speech, etc.). Formal grammars are also heavily used in computer science for defining programming languages. When a program is written, its injunctive meaning must be determined, and the grammar that defines the language is used for this.

Grammars are specified as a collection of symbols and production rules. Symbols are considered either terminal or non-terminal. The terminal symbols form the base elements of the statements in the language; the non-terminal symbols are ultimately transformed into those base elements. To give a concrete example, in a natural language like English the terminal symbols are what is ultimately written in the language--the letters--whereas the non-terminal symbols can be things like "<sentence>" or "<subject>."

In a musical grammar, the terminal symbols are the sounds played. When all is said and done, what is left is a series of notes.

Production rules describe the mapping and transformation process between symbols.

Extending the example of the English language, a valid production rule could be

"the sequence <subject> <verb> <object> can comprise a <sentence>."

In this example each symbol is non-terminal (you never see "<object>" when reading; you see letters making words).

(In the notation used by those studying grammars this rule would be written something like:

<sentence> -> <subject> <verb> <object>

<sentence> is called the left-hand side of the production rule, <subject> <verb> <object> is called the right-hand side.)

A language (or grammar) is a collection of these rules, such that ultimately everything can be transformed or mapped to terminal symbols (in the case of English, letters, but in the case at hand, sound). Note that a grammar defines the syntax of a language, but does not address its semantics.

A context-free grammar (first formalized by Noam Chomsky in the 1950s) is a special case of a formal grammar where the left-hand side of every production rule consists of a single, non-terminal symbol. The above example, in which <sentence> is a single non-terminal symbol, would be a valid production rule in a context-free grammar.
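
As a minimal sketch of how such rules can be applied mechanically, the following Python fragment rewrites <sentence> until only terminal symbols remain. The rule table and its vocabulary are hypothetical, and whole words are treated as terminals for brevity (the text above uses letters). Because every key in the table is a single non-terminal, the toy grammar is context-free.

```python
# Hypothetical toy grammar: each non-terminal maps to exactly one right-hand side.
RULES = {
    "<sentence>": ["<subject>", "<verb>", "<object>"],
    "<subject>": ["the", "composer"],
    "<verb>": ["writes"],
    "<object>": ["a", "grammar"],
}


def expand(symbol):
    """Recursively rewrite a symbol until only terminal symbols remain."""
    if symbol not in RULES:          # terminal: nothing left to rewrite
        return [symbol]
    result = []
    for part in RULES[symbol]:       # rewrite each right-hand-side symbol in order
        result.extend(expand(part))
    return result


sentence = " ".join(expand("<sentence>"))
```

Here the grammar has one rule per non-terminal, so the derivation is deterministic; a grammar with several rules per symbol would need a way to choose among them.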

Music--structured sound--can be thought of as a language, having grammar and syntax. As mentioned above, notes (or more generally sounds) can be thought of as terminal symbols. Through a series of transformations a starting point can be shifted and transformed until an expression results, a musical expression. Within this theory a musical composition is created by defining a starting point and a grammar. The music is no longer captured as a document of notes in pitch and time; the grammar is the music.

One of the inspirations for the decision to employ context-free grammars was Chris Coyne's Context Free Design Grammar (CFDG), a language and implementation that uses a context-free grammar in the production of visual art. When I saw his work I immediately saw how such grammars could be employed in the field of generative music, and in fact several of the specific design decisions made in the creation of this grammar were directly inspired by the CFDG.

Language Design

Symbols and Attributes

As stated above, the design of a formal grammar is the process of selecting symbols and their transformation via production rules.

In music, the selection of the terminal symbols is easy; they are the sounds or notes that will ultimately be sounded. In this grammar the note is the only terminal, and this symbol has several attributes that control its sounding: the temporal attributes of time and duration (which control when and how long the note is sounded), the frequency attributes (the pitch), and the voice attributes (an enumeration of voice and the note's volume).

These attributes are controlled respectively by the following keywords: time, duration, pitch, track, and volume. Temporal quantities can be expressed as integers or floating-point numbers; pitch, volume, and track are expressed as integers. To afford easy mapping to the MIDI specification, volumes are limited in range to 0 through 127. Track is a zero-based attribute limited to the steps defined in the voice space.

The following are all valid examples of note attribute specifications:

note pitch 1
note time 1.5 duration 0.5 volume -10
note time -1.0 duration 0.5 track 2 volume 10 pitch -1

Please remember that the attributes applied to notes can also be applied to the execution of processes.
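
To make the keyword/value structure of these specifications concrete, here is a minimal Python sketch that reads one note line into a dictionary. The keywords and their numeric types come from the description above; the function name parse_note and the parsing scheme are assumptions for illustration, not the actual parser.

```python
# Attribute keywords taken from the text; the parsing scheme itself is an assumption.
FLOAT_ATTRS = {"time", "duration"}          # temporal quantities
INT_ATTRS = {"pitch", "volume", "track"}    # integer quantities


def parse_note(line):
    """Parse a line such as 'note time 1.5 duration 0.5' into a dict of attributes."""
    tokens = line.split()
    if not tokens or tokens[0] != "note":
        raise ValueError("expected a line starting with 'note'")
    rest = tokens[1:]
    if len(rest) % 2 != 0:
        raise ValueError("attributes must come in keyword/value pairs")
    attrs = {}
    for key, value in zip(rest[0::2], rest[1::2]):
        if key in FLOAT_ATTRS:
            attrs[key] = float(value)
        elif key in INT_ATTRS:
            attrs[key] = int(value)
        else:
            raise ValueError(f"unknown attribute: {key}")
    return attrs
```

For example, parse_note("note time 1.5 duration 0.5 volume -10") yields a dictionary with the three attributes given and leaves the others unset.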

Expression Structure

As detailed in the documentation of the theory, three parts comprise a musical expression: the specification of the initial processes; the placement of the process in space (and the definitions of those spaces where appropriate); and the definition of processes. The structure of the language mirrors this.

Starting Process

All well-formed expressions begin with a specification of the initial process. This is accomplished with the start command. If our initial process is named "process1," the command would be:

start process1

Spatial Placement

Following the start command, the placement of the initial process is specified, as is the structure of space. These are specified via the set command.

In general, the attributes that could be assigned to note or process can be set with this command. The following are all valid expressions:

set volume 75
set pitch 1
set duration 0.25
set track 0
set time 10

If these are not set, default values are assumed. For example, pitch, time, and duration all default to zero.

The set command can also be used to define the geometry of the music's space, but this uses special reserved keywords. The following are all examples of valid expressions (note: within the language, the pound sign (#) can be used to insert comments into a language specification file. The pound sign and everything following it is ignored by the parser):

set wrapnotes True      # Pitch is circular.
set tempo 80            # 80 bpm
set maxnotes 5000       # Produce at most 5000 notes.
set pitchrange (-11 16) # Valid pitches. Will either wrap or truncate,
                        # depending upon wrapnotes.
set wraptime False      # Time is linear.
set timerange (0 1000)  # Extent of time
set numtracks 8         # Eight quanta of voice
set wraptracks True     # The dimension of voice is circular.
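
The difference between a circular and a linear dimension can be sketched in Python as follows. This is an illustration under two assumptions that the text does not confirm: the range bounds are inclusive, and "truncate" means that an out-of-range value is discarded. The function name place is hypothetical.

```python
def place(value, low, high, wrap):
    """Return the value fitted to [low, high], or None if it is discarded."""
    if low <= value <= high:
        return value
    if wrap:
        span = high - low + 1            # number of steps in the range
        return low + (value - low) % span
    return None                          # assumed meaning of "truncate"
```

With the pitch range (-11 16) from the listing, a pitch of 17 wraps around to -11 when wrapping is on and is dropped when it is off.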

Cycles and Scales

Great flexibility in the specification of frequency quanta is afforded through the use of two commands: set scale and set tuning.

A scale is defined by the set scale command. As an example, the following command defines a C-major scale:

set scale (2 2 1 2 2 2 1) 60

The numbers in the parentheses denote the number of MIDI note steps between successive notes of the scale. The final number denotes the zero-point of the scale (where pitch 0 sounds). In this case it is MIDI note 60, which happens to be a C in the middle registers. MIDI note numbers are defined between 0 and 127.
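
The mapping from a pitch in the scale to a MIDI note number can be sketched in Python as below. The function name degree_to_midi is hypothetical, and the treatment of negative pitches (continuing the same step pattern downward) is an assumption.

```python
def degree_to_midi(degree, steps, zero_point):
    """Map a scale degree (may be negative) to a MIDI note number."""
    # Floor division makes negative degrees continue the pattern downward.
    octave, index = divmod(degree, len(steps))
    return zero_point + octave * sum(steps) + sum(steps[:index])


C_MAJOR = (2, 2, 1, 2, 2, 2, 1)
notes = [degree_to_midi(d, C_MAJOR, 60) for d in range(8)]
```

Under these assumptions, pitches 0 through 7 give MIDI notes 60, 62, 64, 65, 67, 69, 71, and 72: one octave of C major.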

With just the set scale command one can specify many scales and cycles, but without the set tuning command these are limited to the standard, equal-tempered notes defined in Western music. With the set tuning command, any note number can be mapped to any frequency:

set tuning 0 10.0
set tuning 1 11.0
set tuning 2 12.0

The above re-maps the first three MIDI notes to 10 Hz, 11 Hz, and 12 Hz, respectively.
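
One way to picture the effect of set tuning is as a table of overrides consulted before a default tuning. The Python sketch below assumes that untuned notes fall back to twelve-tone equal temperament with MIDI note 69 at 440 Hz; the function name frequency and the dictionary representation are illustrative only.

```python
def frequency(midi_note, tuning=None):
    """Return the frequency in Hz for a MIDI note, honoring any tuning overrides."""
    if tuning and midi_note in tuning:
        return tuning[midi_note]
    # Assumed default: twelve-tone equal temperament, A4 (MIDI note 69) = 440 Hz.
    return 440.0 * 2 ** ((midi_note - 69) / 12)


tuning = {0: 10.0, 1: 11.0, 2: 12.0}   # mirrors the three set tuning commands
```

Notes 0 through 2 then sound at the re-mapped frequencies, while every other note keeps its conventional frequency.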

The combination of set tuning and set scale affords great flexibility. Recall that a cycle is defined as a repeating pattern of quantized frequency (the normal concept of scale is a subset of this, one chosen from a limited set of patterns that repeats at a doubling of frequency). With these two commands one can specify any arbitrary cycle.

Process Definition

Processes are defined with the process command, and contain actions (the sounding of notes and execution of other processes). Perhaps the simplest process is the sounding of one note:

process p1 {
  note
}

When processes call other processes (or themselves), complexity can arise quickly. As an example, by adding a line to the above process we can create an infinite row of increasing pitches:

process p1 {
  note
  p1 pitch 1 time 1
}

The above process sounds a note, and then calls itself translated in pitch and time by 1. This recursive definition will keep producing notes until an exit criterion is met (perhaps the notes fall out of the allowed range, or perhaps the maximum number of notes is exceeded).
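
The behavior of this self-calling process, including its two exit criteria, can be sketched in Python. The sketch assumes that pitch wrapping is off, so a note outside the pitch range ends the expansion, and it borrows the range (-11 16) and the limit of 5000 notes from the set examples earlier. The function name run_p1 is hypothetical.

```python
def run_p1(pitch=0, time=0, pitch_range=(-11, 16), max_notes=5000):
    """Expand the self-calling process into a list of (time, pitch) notes."""
    notes = []
    low, high = pitch_range
    # A loop stands in for the recursion so Python's recursion limit is not an issue.
    while low <= pitch <= high and len(notes) < max_notes:
        notes.append((time, pitch))          # "note": sound at the current position
        pitch, time = pitch + 1, time + 1    # "p1 pitch 1 time 1": translated call
    return notes
```

Starting from pitch 0, the expansion stops after the note at pitch 16, the top of the assumed range.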

When a process is defined multiple times, one of its definitions is selected at random each time the process is executed. The following two versions of process p1 would be chosen with equal probability:

process p1 {
  note pitch 1
}
process p1 {
  note pitch -1
}

Weights can be added to these definitions. Extending the example above, in the following the second version of p1 will be executed, on average, twice as often as the first:

process p1 1 {
  note pitch 1
}
process p1 2 {
  note pitch -1
}
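
Weighted selection among several definitions of p1 can be sketched with Python's standard random.choices. The representation of a definition as a (weight, body) pair and the function name choose_definition are assumptions for illustration.

```python
import random

# Each definition of p1 is a (weight, body) pair; bodies are plain strings here.
DEFINITIONS = [
    (1, "note pitch 1"),
    (2, "note pitch -1"),
]


def choose_definition(definitions, rng=random):
    """Pick one definition with probability proportional to its weight."""
    weights = [weight for weight, _ in definitions]
    bodies = [body for _, body in definitions]
    return rng.choices(bodies, weights=weights, k=1)[0]


rng = random.Random(0)   # seeded only to make the demonstration repeatable
counts = {body: 0 for _, body in DEFINITIONS}
for _ in range(3000):
    counts[choose_definition(DEFINITIONS, rng)] += 1
```

Over many executions the second body is chosen roughly twice as often as the first, matching the 1:2 weights.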


Consider the following, in which a row of five notes is played:

process row {
  note time 0 pitch 0
  note time 1 pitch 1
  note time 2 pitch 2
  note time 3 pitch 3
  note time 4 pitch 4
}

This works as expected, but it is tedious and error-prone. To make defining these kinds of patterns easier, some syntactic sugar was added to the language. The following definition is equivalent, but easier to use:

process row {
  5 * (time 1 pitch 1) note
}

This syntax tells the interpreter to execute note five times, incrementing time and pitch by 1 on each successive execution. Repetitions can also be used with process calls:

process p1 {
  4 * (time 16 track 1) p2
}
process p2 {
  note
}
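
The repetition syntax can be read as a loop that executes its target at an accumulating offset. The Python sketch below is an illustration of that reading, not the interpreter's actual code; the function name repeat and the dictionary representation of a position are assumptions.

```python
def repeat(count, step, start=None):
    """Yield `count` positions, each offset from the previous one by `step`."""
    position = dict(start or {})
    for _ in range(count):
        yield dict(position)                 # copy so later updates don't leak out
        for key, delta in step.items():
            position[key] = position.get(key, 0) + delta


# Equivalent of "5 * (time 1 pitch 1) note", starting at time 0 and pitch 0.
row = list(repeat(5, {"time": 1, "pitch": 1}, start={"time": 0, "pitch": 0}))
```

The first execution happens at the starting position and each later one is shifted by one more step, which reproduces the five explicit note lines of the longer row definition.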

A Simple, Complete Example

Interesting, complex, and/or beautiful musics can be created with this language, but for the sake of clarity I will give an example that is none of these. The following example plays a simple C-major scale:

start scaleprocess
set scale (2 2 1 2 2 2 1) 60


set volume 100
set duration 1.0
set tempo 80


process scaleprocess {
  8 * (pitch 1 time 1) note
}
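
To show what this example amounts to once expanded, the following Python sketch lists the resulting note events with their timing in seconds. It assumes that time and duration are measured in beats, which the text does not state explicitly; the event dictionary layout is illustrative only.

```python
STEPS = (2, 2, 1, 2, 2, 2, 1)   # set scale (2 2 1 2 2 2 1) 60
ZERO_POINT = 60
TEMPO = 80                      # set tempo 80 (beats per minute)
DURATION = 1.0                  # set duration 1.0 (assumed to be in beats)
VOLUME = 100                    # set volume 100

seconds_per_beat = 60.0 / TEMPO

events = []
for i in range(8):              # 8 * (pitch 1 time 1) note
    octave, index = divmod(i, len(STEPS))
    midi_note = ZERO_POINT + octave * sum(STEPS) + sum(STEPS[:index])
    events.append({
        "midi_note": midi_note,
        "start": i * seconds_per_beat,
        "length": DURATION * seconds_per_beat,
        "volume": VOLUME,
    })
```

Under these assumptions the eight events run from MIDI note 60 up to 72, each lasting 0.75 seconds and starting 0.75 seconds after the previous one.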


Miscellaneous Language Elements

This essay is not intended to provide a complete language reference; rather, it is meant to illustrate the basics of the language and to give the reader some of its flavor. Many options and values have not been discussed. As an example, the language also contains the following elements:

  • A macro-processor that allows files to be included in other files, and allows the creation of simple substitution variables.
  • Commands that are convenient in the creation of music, allowing one to specify pan, volume, and program settings for each MIDI track.
  • Etc.

Language Implementation

The software that implements this language (and executes expressions written in it) was written in Python. Both command-line and simple GUI versions were created. The implementation was pure Python (although a cross-platform GUI toolkit was used for the GUI), so the software runs on any platform to which Python has been ported. A screenshot of the GUI follows.



Creating Sound

Of course, a musical score--be it a conventional score, a MIDI file, or the language specification of an emergent music--is not an instantiated musical expression. A score must be translated into sound to be heard.

This is done by mapping the quanta of the voice dimension into sonic sources, and on a practical level this can take more time and effort than the creation of the score. Sonic design is heavily guided by one's own sense of aesthetics, and different people would probably come up with entirely different musical expressions from the same score.

Once the mapping of voice to sound is complete, the music can be captured. This too can be a time-intensive process.


An analytical example of a real score can be found here.