Formalizing nuance in classical music

David P. Anderson (with input from Rich Kraft)

1 May 2022

Introduction

This essay is concerned with nuance in western classical piano music. In this context, nuance can be loosely defined as the differences between the score for a piece and an audio rendition of the piece, such as a human performance. These differences can be roughly divided into

  • Timing: tempo change, rubato, pauses, rolled or other non-simultaneous chords, etc.
  • Dynamics: crescendos and diminuendos, accents, chord voicing, etc.
  • Articulation: legato, staccato, portato, etc.
  • The use of pedal (sustain, soft, sostenuto).

We focus on piano. Other instruments (and voice) are more complex because a) their notes have additional parameters (such as attack and timbre), and b) the parameters may change continuously. The ideas presented here apply to such instruments, but would have to be extended to encompass these factors.

For most classical music, nuance is a critical component of rendition. An example: "Wasserklavier" by Luciano Berio. The score is here.

Notice that in Grimaud's performance, no two beats have exactly the same duration, and no two notes have exactly the same volume. The nuance of the performance unlocks the beauty and expression of the piece.

Where does nuance come from? Some scores have indications of nuance: tempo markings, slurs, crescendo marks, fermatas, pedal markings, etc. However:

  • These indications are imprecise: e.g. a fermata mark doesn't say how long the sound lasts, or how much silence there is afterward.
  • These indications are incomplete: they describe the broad strokes of the composer's intended nuance, but not the details. Indeed, western music notation is unable to express basic aspects of nuance such as chord voicing. A computer rendition of a score with conventional nuance indications still sounds sterile.

Some musical styles have associated conventions for nuance. In many styles, for example, upbeats are softer, and pieces end with a ritardando. Performers learn these conventions by osmosis (and in many cases the modern conventions differ from those of the composition's period).

But score markings and stylistic conventions are just guidelines. In the end, nuance is left up to the performer(s). Some nuance may be planned in advance. Some may be spontaneous during a particular performance. Some may be unintentional artifacts of the performer's technique.

I think we need a "formalism" to describe nuance: a language with precisely-defined syntax and semantics. This formalism should have these properties:

  • It can describe nuance at arbitrary levels of detail.
  • It can do so compactly - for example, a crescendo is represented by a single "primitive" rather than lots of per-note volume adjustments.
  • It allows nuance to be "layered" - for example, a long accelerando can be superimposed on measure-level rubato.

I have developed a formalism with these properties: Music Nuance Specification (MNS). Its details are described here, and a Python implementation of it is here.
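
To make the layering property concrete, here is a minimal sketch, in Python, of what a layered nuance specification might look like. The names and structure are illustrative only - they are not MNS's actual API - but they show how a long accelerando can be superimposed on a measure-level rubato pattern, with both expressed compactly and the per-note timing derived rather than entered by hand.

    # Illustrative sketch only; these names are not MNS's actual API.
    # A nuance spec here is a list of tempo "layers". Each layer maps a
    # score position (in beats) to a multiplicative stretch factor on
    # seconds-per-beat: factors > 1 slow down, factors < 1 speed up.

    def linear_ramp(t0, t1, f0, f1):
        # Stretch factor that changes linearly from f0 to f1 over [t0, t1].
        def f(t):
            if t <= t0:
                return f0
            if t >= t1:
                return f1
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
        return f

    def measure_rubato(beats_per_measure, shape):
        # Per-measure stretch pattern, repeated: shape[i] applies to beat i.
        def f(t):
            return shape[int(t) % beats_per_measure]
        return f

    def perf_time(t, sec_per_beat, layers, dt=0.01):
        # Performance time of score position t: integrate seconds-per-beat,
        # scaled by the product of all layers (this is where layering happens).
        perf, s = 0.0, 0.0
        while s < t:
            factor = 1.0
            for layer in layers:
                factor *= layer(s)
            perf += sec_per_beat * factor * dt
            s += dt
        return perf

    # A 32-beat accelerando (stretch 1.0 -> 0.7) layered on a rubato that
    # lingers on beat 1 of each 4/4 measure and pushes through beat 3.
    layers = [linear_ramp(0, 32, 1.0, 0.7),
              measure_rubato(4, [1.15, 1.0, 0.9, 1.0])]
    print(perf_time(8, sec_per_beat=0.75, layers=layers))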

Why formalize nuance?

To many musicians, nuance is ineffable - it's something magical that happens during performances, and to analyze or formalize it would break that magic spell.

This viewpoint is understandable. But as music evolves, and as computers are increasingly important tools for composition, pedagogy, and performance, there are reasons to expand our ability to represent and manipulate nuance: to make it a first-class citizen, along with scores and sounds. Doing so will not replace the human component of nuance, or the spontaneity of performance; rather, it will provide tools that enhance these processes, and that enable new ways of making music.

Let's assume that we have a formalism describing nuance, and that we have software tools that make it easy to create and edit "nuance specifications" for pieces. These capabilities would have several applications:

Composition

As a composer writes a piece, using a score editor such as MuseScore or Sibelius, they could also develop a nuance specification for the piece. The audio rendering function of the score editor could use this to produce nuanced renditions of the piece. This would facilitate the composition process and would convey the composer's intentions more clearly to prospective performers.

Virtual performance

Performers could create virtual performances of pieces, in which they play the piece using a computer rather than a traditional instrument.

Pedagogy

A teacher's instruction to a student could be represented as a nuance specification which guides the student's practice. This could be done in various ways. For example, as a student practices a piece, they could be shown a "virtual conductor" that expresses (graphically, on a computer display) a simplified representation of the target nuance.

Ensemble rehearsal and practice

When an ensemble (say, a piano duo) rehearses together, they could record their interpretive decisions as a nuance specification. They could then use this to guide their individual practice (perhaps using the "virtual conductor" described above).

Sharing and archival

IMSLP lets people share musical scores and recordings of renditions. It could also include nuance descriptions for pieces. This would provide a framework for sharing and discussing the interpretation of pieces.

User interfaces for editing nuance

What kind of UI (user interface) would facilitate creating and editing a nuance specification - in particular, for transcribing one's mental model of a performance?

This generally involves changing every parameter - start time, duration, volume - of every note. We can imagine a GUI that shows a piano-roll representation of the score and lets you click on notes to change their parameters. This low-level approach would let you do whatever you want, but it would be impossibly tedious.

Desirable properties of a UI for editing nuance:

  • You can describe nuance at a high level: if you want an accelerando from 80 to 120 bpm over measures 8 to 13, you can express this directly rather than moving individual notes.
  • You can express repetition. E.g., if you want to emphasize the strong beats in each measure, you can define a pattern of emphases, and then apply it to multiple measures (see the sketch after this list).
  • You can make an adjustment and hear the effect quickly and with a minimum of clicks.
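
As a sketch of the second property, here is how a single "apply emphasis pattern" action might expand into per-note volume changes behind the scenes. The helper and the note format are hypothetical - they are not taken from any existing tool - but they show that the user specifies one pattern and a measure range and never touches individual notes.

    # Hypothetical helper: scale note volumes by a per-beat emphasis
    # pattern over a range of measures. Notes are simple dicts with a
    # start time in beats and a volume in 0..1.

    def apply_emphasis(notes, pattern, beats_per_measure, first_meas, last_meas):
        for n in notes:
            meas, beat = divmod(n["start"], beats_per_measure)
            if first_meas <= meas < last_meas:
                n["vol"] *= pattern[int(beat)]
        return notes

    # Emphasize beats 1 and 3 of each 4/4 measure, in measures 0-7.
    notes = [{"start": float(b), "vol": 0.6} for b in range(32)]
    apply_emphasis(notes, pattern=[1.2, 0.9, 1.1, 0.9],
                   beats_per_measure=4, first_meas=0, last_meas=8)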

Some general approaches:

  • Integrate nuance editing with score editing. I discuss this here. You'd need to devise ways of graphically representing nuance in scores (color, note-head size, etc.). This would be good from the user experience point of view, but a) it's not clear how to represent layered nuance graphically, and b) it would be a lot of work to implement (e.g. in MuseScore).
  • A special-purpose GUI where you can use the mouse to drag and drop nuance primitives, adjust their parameters, and hear the results. This GUI could sit alongside (or above) the score-editing GUI. I think this might be the best approach.
  • Express nuance in a programming language. I've done this in a Python-based system called Numula, and it's quite powerful. But the user experience isn't great. You specify nuance by writing a Python program. The process of developing and refining nuance involves editing code, re-running the program, then (in the synthesizer program) advancing to the relevant part of the piece. It's slow and awkward: the cycle of changing something and hearing the result involves lots of clicks and takes tens of seconds. It's nonintuitive: you're forced to think in terms of numbers. And it involves programming; not all musicians know Python.

A research agenda for musical nuance

Having a formalism for nuance opens up a huge range of possible research in musicology.

The most basic issue is what I'll call the "primitive selection problem". A nuance formalism (like MNS) provides a set of "primitives". Some of these define fluctuations in tempo or volume that affect lots of notes. Others apply random "jitter" to timing and volume. Others affect sets of notes based on attributes such as position within a measure. Others apply to individual notes.

The goal in designing the set of primitives is to find a small "basis set" of transformations, each with a small number of parameters, that together can express the nuance we care about - for example, that can closely approximate typical human performances.

MNS, for example, has a primitive for linear tempo change. This was easy to implement - but is it a good approximation of ritardandos and accelerandos in practice? There may be better choices.
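
For concreteness, here is what a linear tempo change implies under one reading (my own arithmetic, not taken from the MNS docs): if tempo varies linearly with score position, the performance time of a score position is the integral of seconds-per-beat, which has a closed form. Whether this logarithmic time map matches how performers actually shape ritardandos and accelerandos is exactly the open question.

    import math

    def linear_tempo_time(t, total_beats, bpm0, bpm1):
        # Elapsed performance time (seconds) at score position t (beats),
        # when tempo ramps linearly from bpm0 to bpm1 over total_beats beats:
        #   time(t) = 60 * total_beats / (bpm1 - bpm0) * ln(bpm(t) / bpm0)
        if bpm0 == bpm1:
            return 60.0 * t / bpm0
        bpm_t = bpm0 + (bpm1 - bpm0) * t / total_beats
        return 60.0 * total_beats / (bpm1 - bpm0) * math.log(bpm_t / bpm0)

    # An accelerando from 80 to 120 bpm over 24 beats (6 measures of 4/4):
    print(linear_tempo_time(24, 24, 80, 120))  # ~14.6 s total
    print(linear_tempo_time(12, 24, 80, 120))  # ~8.0 s elapsed at the midpoint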

I can imagine a research program to study this, by calculating the nuance in human performances, and finding the primitives that approximate it best.

The first step is to automatically extract nuance from human performances:

  • Get a corpus of performances as MIDI files. Audio recordings could be converted to MIDI files by software (though I'm not sure how well this works). For each performance you'd also need a representation of the score, e.g. as MusicXML or MIDI.
  • Write software that finds the correspondence of notes between performance and score (the performance may contain wrong notes, extra notes, or other noise); one approach is sketched after this list.
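
One standard way to approach the alignment step is an edit-distance-style dynamic program over pitch sequences, which tolerates wrong, missing, and extra notes. The sketch below is a minimal version of that idea, not how any particular system does it; real performances (chords, dense polyphony, repeated notes) need something more robust.

    # Minimal sketch: align score and performance note sequences by pitch
    # with an edit-distance dynamic program, allowing substitutions
    # (wrong notes) and gaps (missed or extra notes).

    def align(score_pitches, perf_pitches, gap=1, sub=1):
        S, P = len(score_pitches), len(perf_pitches)
        cost = [[0] * (P + 1) for _ in range(S + 1)]
        for i in range(1, S + 1):
            cost[i][0] = i * gap
        for j in range(1, P + 1):
            cost[0][j] = j * gap
        for i in range(1, S + 1):
            for j in range(1, P + 1):
                m = 0 if score_pitches[i-1] == perf_pitches[j-1] else sub
                cost[i][j] = min(cost[i-1][j-1] + m,    # match or wrong note
                                 cost[i-1][j] + gap,    # score note not played
                                 cost[i][j-1] + gap)    # extra performed note
        # Trace back to recover matched (score_index, perf_index) pairs.
        pairs, i, j = [], S, P
        while i > 0 and j > 0:
            m = 0 if score_pitches[i-1] == perf_pitches[j-1] else sub
            if cost[i][j] == cost[i-1][j-1] + m:
                pairs.append((i - 1, j - 1))
                i, j = i - 1, j - 1
            elif cost[i][j] == cost[i-1][j] + gap:
                i -= 1
            else:
                j -= 1
        return list(reversed(pairs))

    # MIDI pitches: the performer adds an extra note (61) and drops one (67).
    print(align([60, 62, 64, 65, 67], [60, 61, 62, 64, 65]))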

We can then use software to find a transformation that maps the score to the performance. This transformation would typically have multiple levels. A first level would model large-scale fluctuations. The second level would take the residue from this and fit it, possibly with different types of primitives. At some point the residue presumably would be noise-like, and its statistical properties could be measured.

Each level would consist of a set of primitives. The software would consider various families of primitives: in the case of continuous fluctuations this might include linear, polynomial, exponential, logarithmic, etc. The software would use data-fitting techniques to find an optimal basis set.
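
As a sketch of the first level of such a fit: given per-beat durations measured from an aligned performance, fit a large-scale trend - here just a straight line via ordinary least squares, though real work would compare the families listed above - and pass the residue to the next level. The data below are placeholder values, not from a real performance.

    # beat_durs[i] = measured seconds-per-beat for beat i, taken from the
    # aligned MIDI onsets (placeholder values for illustration).
    beat_durs = [0.74, 0.75, 0.73, 0.78, 0.76, 0.77, 0.75, 0.80,
                 0.79, 0.81, 0.80, 0.84, 0.83, 0.86, 0.88, 0.95]

    def fit_line(xs, ys):
        # Ordinary least squares for y = a*x + b.
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
            / sum((x - mx) ** 2 for x in xs)
        return a, my - a * mx

    beats = list(range(len(beat_durs)))
    a, b = fit_line(beats, beat_durs)                   # large-scale trend
    residue = [d - (a * x + b) for x, d in zip(beats, beat_durs)]
    # The next level (e.g. per-measure rubato patterns, then note-level
    # jitter) would be fit to 'residue'; when what remains looks like
    # noise, its statistical properties can be measured.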

It may turn out that the optimal set of primitives depends on

  • the performance period;
  • the period and style of music being played;
  • the individual performer;
and so on.

There has been some research in this general area. I've looked at some of it, but haven't found anything usable.

Copyright © 2024 David P. Anderson