MNS: a formalism for musical nuance

David P. Anderson
May 2022

In another essay I argue the need for a formalism for musical nuance. Here I sketch the design of such a formalism. Let's call it MNS (Musical Nuance Specification). We borrow some ideas from Cascading Style Sheets (CSS), a formalism for specifying the appearance of web pages. A web page is described by a tree of nodes, specified by an HTML document or generated by a JavaScript program. The nodes correspond to HTML elements and enclosed data. Nodes can be assigned "class" and "ID" attributes. A CSS file is a list of rules, each of which has

a pattern specifying the tags, classes, and IDs of the nodes to which it applies;
a set of "style" attributes (color, font size, alignment) to be applied to those nodes.

When a web page is displayed, the browser processes the associated CSS file(s) and applies the styles to the nodes matching the rules. CSS files can be "layered": later files modify or overwrite the effects of earlier files.

Score documents

We also take the approach of separating document and annotation. In our case, a "score document" describes a set of notes. This could take the form of a MusicXML file (which most score editors can generate), a music21 object hierarchy, or a MIDI file. The attributes of a note include:

Its start time and duration in units of "score time". The scale is arbitrary, but we find it convenient to adopt the convention that the unit is a 4-beat measure. Thus 1/4 is a quarter note and so on.
Its pitch index (typically a MIDI pitch number; 60=middle C).
A set of "tags" (character strings). For example, "rh" and "lh" could be used to tag notes in the right and left hand parts. In a fugue, tags could indicate that a note is part of the fugue theme, or a particular instance of the theme. Grace notes could have a "gr" tag.

As part of processing a document, some attributes of a note N are generated automatically:

Tags "top" or "bottom" are added if N has the highest or lowest pitch of notes sounding at its start time.
"N.nchord" is the number of notes with the same start time as N, and "N.nchord_pos" is N's pitch order in this set (0 = lowest, 1 = 2nd lowest, etc.).

In addition, the score document can specify a set of "measures". Each is described by its start time, its duration, and a "type" tag, a string representing its duration and structure (e.g. "2+2+3/8"). If measures are specified, each note N has two additional attributes:

N.measure_offset: the time offset from the last measure start.
N.measure_type: the type of the measure.
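
The automatically generated attributes described above can be sketched in Python. This is only an illustration under assumptions of mine: the Note class, the "pitch" field name, the annotate() helper, and the representation of measures as (start, dur, type) tuples are all hypothetical. It is quadratic in the number of notes, which is fine for a sketch.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Optional


@dataclass
class Note:
    time: Fraction                      # start, in score time (1 = a 4-beat measure)
    dur: Fraction                       # duration, in score time
    pitch: int                          # MIDI pitch number, 60 = middle C
    tags: set = field(default_factory=set)
    nchord: int = 1
    nchord_pos: int = 0
    measure_offset: Optional[Fraction] = None
    measure_type: Optional[str] = None


def annotate(notes, measures=()):
    """Fill in the derived attributes of every note, in place.

    measures: (start, dur, type) tuples sorted by start time (assumed format).
    """
    starts = [m[0] for m in measures]
    for n in notes:
        # Notes that start together form a chord, ordered by pitch.
        chord = sorted(m.pitch for m in notes if m.time == n.time)
        n.nchord = len(chord)
        n.nchord_pos = chord.index(n.pitch)
        # Notes sounding at n's start time, including n itself.
        sounding = [m.pitch for m in notes
                    if m is n or m.time <= n.time < m.time + m.dur]
        if n.pitch == max(sounding):
            n.tags.add("top")
        if n.pitch == min(sounding):
            n.tags.add("bottom")
        # Offset from the most recent measure start, if measures are given.
        i = bisect_right(starts, n.time) - 1
        if i >= 0:
            start, _dur, mtype = measures[i]
            n.measure_offset = n.time - start
            n.measure_type = mtype


Q = Fraction(1, 4)
notes = [Note(Fraction(0), Q, 60), Note(Fraction(0), Q, 64),
         Note(Fraction(0), Q, 67), Note(Fraction(5, 4), Q, 72)]
measures = [(Fraction(0), Fraction(1), "4/4"),
            (Fraction(1), Fraction(7, 8), "2+2+3/8")]
annotate(notes, measures)
```

Here the middle note of the C major triad gets nchord 3 and nchord_pos 1, and the lone note in the second measure is tagged both "top" and "bottom".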

Note selectors

MNS provides a powerful way of identifying sets of notes within a score document. A "note selector" is a Boolean expression involving the attributes of a note N. We use Python syntax for these expressions. For example,

'rh' in N.tags and N.dur == 1/2

selects all half notes in the right hand. We could select notes in a particular time range, at a particular measure offset, and so on.
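
One way to evaluate such a selector, sketched under assumptions: the selector arrives as a string, notes are simple objects with "time", "dur" and "tags" attributes, and the select() helper is hypothetical. It uses Python's eval() for brevity; even with builtins disabled this is not safe for untrusted input, so a real implementation would want a proper parser or plain Python callables.

```python
from fractions import Fraction
from types import SimpleNamespace


def select(notes, selector):
    """Return the notes for which the selector expression is true.

    The expression sees a single name, N, bound to the note being tested.
    """
    code = compile(selector, "<selector>", "eval")
    return [n for n in notes if eval(code, {"__builtins__": {}}, {"N": n})]


notes = [
    SimpleNamespace(time=Fraction(0), dur=Fraction(1, 2), tags={"rh"}),
    SimpleNamespace(time=Fraction(0), dur=Fraction(1, 4), tags={"lh"}),
    SimpleNamespace(time=Fraction(1, 2), dur=Fraction(1, 2), tags={"rh", "top"}),
]

# Fraction(1, 2) compares equal to the float 1/2; durations such as 1/3
# have no exact float form and would need Fraction literals instead.
half_notes_rh = select(notes, "'rh' in N.tags and N.dur == 1/2")
```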

Note: CSS also has expressions.

Piecewise functions of time

In MNS, changes in tempo and dynamics can be described as functions of (score) time. These functions are specified as a sequence of "primitives", each of which represents a parameterized function defined over a time interval with a given duration. A function defined in this way is called a "piecewise function of time" (PFT). For example,

linear(80, 50, 2/1),
linear(50, 90, 4/1)

defines a tempo function that slows linearly from 80 BPM to 50 BPM over 2 measures, then speeds up to 90 BPM over 4 measures.
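
To make the idea concrete, here is a minimal sketch of evaluating a PFT built from linear primitives. The Linear class and pft_value() are hypothetical names, and the boundary rule (each segment owns its start time, the last one also owns its end) is an assumption made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Linear:
    y0: float     # value at the start of the segment
    y1: float     # value at the end of the segment
    dur: float    # duration in score time (1 = one measure)

    def value(self, t):
        """Value at local time t, where 0 <= t <= dur."""
        return self.y0 + (self.y1 - self.y0) * t / self.dur


def pft_value(pft, t):
    """Value of a piecewise function at time t, measured from its start."""
    start = 0
    for i, seg in enumerate(pft):
        end = start + seg.dur
        is_last = i == len(pft) - 1
        if start <= t < end or (is_last and t == end):
            return seg.value(t - start)
        start = end
    raise ValueError("t is outside the domain of the PFT")


# The example above: 80 -> 50 BPM over 2 measures, then 50 -> 90 over 4.
tempo = [Linear(80, 50, 2), Linear(50, 90, 4)]
halfway_through_slowdown = pft_value(tempo, 1)   # 65 BPM
```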

There are many possible PFT primitives: polynomial, exponential, trigonometric, etc. Research into nuance in actual performance will hopefully shed light on what primitive types are most useful.

In MNS, PFTs are used for two purposes:

A "tempo PFT" describes a changing tempo. Its value is integrated, so the value at segment boundaries isn't relevant.
A "value PFT" is used for dynamic control. Its value at segment boundaries is relevant, so the primitives have additional "closed_start" and "closed_end" parameters indicating whether the domain includes the start and end times.
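
The effect of the two flags on a value PFT can be sketched as follows. ValueSegment and value_at() are hypothetical names, and returning None for a time that no segment covers is my assumption; the essay does not say what should happen there.

```python
from dataclasses import dataclass


@dataclass
class ValueSegment:
    y0: float
    y1: float
    dur: float
    closed_start: bool
    closed_end: bool

    def covers(self, t):
        """True if local time t is in this segment's domain."""
        if t == 0:
            return self.closed_start
        if t == self.dur:
            return self.closed_end
        return 0 < t < self.dur

    def value(self, t):
        return self.y0 + (self.y1 - self.y0) * t / self.dur


def value_at(pft, t):
    """Value of the first segment whose domain includes t, else None."""
    start = 0
    for seg in pft:
        if seg.covers(t - start):
            return seg.value(t - start)
        start += seg.dur
    return None


# A crescendo whose end point belongs to the first segment, followed by
# a diminuendo that starts from a different value.
dynamics = [
    ValueSegment(0.3, 0.8, 0.5, closed_start=True, closed_end=True),
    ValueSegment(0.7, 0.3, 0.25, closed_start=False, closed_end=True),
]
at_boundary = value_at(dynamics, 0.5)   # taken from the first segment
```

A note exactly at the boundary gets the peak of the crescendo (about 0.8) rather than the starting value of the diminuendo (0.7).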

For tempo PFTs, an additional primitive is available:

delta(dt, after=True)

This inserts a pause of dt seconds. If after is True, the pause occurs after notes at the current score time.

Dynamic control

Note volume is represented as 0..1. Notes initially have volume 0.5. Volume adjustments are multiplicative factors.

The primitive

vol_adjust_pft(t₀, pft, selector=None)

adjusts the volume of a set of notes according to a function of time. "pft" is a PFT, and "selector" is a note selector. The volume of a selected note N in the domain of the PFT is adjusted by the factor pft(t), where t is N.time - t₀.

This can be used to set the overall volume of the piece. It can be used to shape the dynamics of an inner voice by selecting the tag used for that voice.
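
A rough sketch of what this primitive does. The assumptions here are mine: the notes are passed in explicitly, each note has "time", "vol" and "tags" attributes, the PFT is modeled as a plain function that returns None outside its domain, and the selector is a Python callable.

```python
from types import SimpleNamespace


def vol_adjust_pft(notes, t0, pft, selector=None):
    """Multiply the volume of selected notes by pft(N.time - t0)."""
    for n in notes:
        if selector is not None and not selector(n):
            continue
        factor = pft(n.time - t0)
        if factor is not None:          # None: note is outside the PFT domain
            n.vol *= factor


def swell(t):
    """Stand-in for a PFT: linear from 0.5 to 1.5 over one measure."""
    return 0.5 + t if 0 <= t <= 1 else None


notes = [
    SimpleNamespace(time=2.0, vol=0.5, tags={"alto"}),
    SimpleNamespace(time=2.5, vol=0.5, tags={"soprano"}),
    SimpleNamespace(time=3.0, vol=0.5, tags={"alto"}),
    SimpleNamespace(time=4.0, vol=0.5, tags={"alto"}),
]

# Shape only the inner voice, starting at measure 2.
vol_adjust_pft(notes, 2.0, swell, selector=lambda n: "alto" in n.tags)
volumes = [n.vol for n in notes]
```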

Other primitives adjust volume explicitly (not necessarily as a function of time).

vol_adjust(factor, selector=None)
vol_adjust(func, selector=None)

These adjust the volumes of the selected notes. If the 1st argument is a function, its argument is a note and it returns an adjustment factor. Otherwise the 1st argument is an adjustment factor. For example,

vol_adjust(lambda n: 1 + random.normal()*.01)

makes a small normally-distributed adjustment to the volume of all notes.

vol_adjust(ns, .9, lambda n: n.measure_offset == 2)
vol_adjust(ns, .8, lambda n: n.measure_offset in [1,3])
vol_adjust(ns, .7, lambda n: n.measure_offset not in [0,1,2,3])

emphasizes the strong beats of 4/4 measures.
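
The dispatch on the first argument can be sketched like this. It is a simplified stand-in: the notes are passed explicitly, they are plain objects with "vol" and "measure_offset" attributes, and measure offsets are counted in beats as in the example above.

```python
from types import SimpleNamespace


def vol_adjust(notes, adjust, selector=None):
    """Scale the volume of selected notes.

    adjust is either a number or a function mapping a note to a factor.
    """
    for n in notes:
        if selector is None or selector(n):
            factor = adjust(n) if callable(adjust) else adjust
            n.vol *= factor


# One note on each beat of a 4/4 measure, plus one off the beat.
notes = [SimpleNamespace(vol=0.5, measure_offset=k) for k in (0, 1, 2, 3, 0.5)]

vol_adjust(notes, 0.9, lambda n: n.measure_offset == 2)
vol_adjust(notes, 0.8, lambda n: n.measure_offset in [1, 3])
vol_adjust(notes, 0.7, lambda n: n.measure_offset not in [0, 1, 2, 3])
accented = [n.vol for n in notes]
```

The downbeat keeps its volume, beat 3 is slightly softer, beats 2 and 4 softer still, and the off-beat note is the softest.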

Timing control

The timing of a note (its start time and duration) is described in both "score time" (in which the unit is a 4/4 measure) and "performance time" (in which the unit is one second). Initially, performance time is score time times 4, so that 1 beat (i.e. a quarter note) equals 1 second. The score time of notes is fixed. Timing primitives manipulate their performance time.

tempo_adjust_pft(pft, selector=None, normalize=False, bpm=True)

This adjusts the performance time of the selected notes according to a function F specified by pft.

If bpm is False, the value of F is the rate of change of performance time with respect to score time. The performance duration of a score-time interval is the integral of F over that interval. We call this an "inverse tempo function" because larger values mean slower: 2.0 means go half as fast, 0.5 means go twice as fast.

If "bpm" is True, the value of F is in beats per minute. For example, 120 means go twice as fast as the initial 60 BPM. In this case F represents tempo rather than inverse tempo.

If "normalize" is set, F is scaled so that its average value is one. This can be used, for example, to apply rubato to a particular voice over a given period, and have it sync up with the other voices at the end of that period.
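
The arithmetic behind these options is simple enough to sketch. The helper names are hypothetical, segments are represented as (y0, y1, dur) tuples of linear inverse-tempo pieces, and the 60 BPM baseline follows from the initial convention that one beat lasts one second.

```python
def bpm_to_inverse(bpm, base_bpm=60):
    """Convert a tempo in BPM to an inverse-tempo factor (larger = slower)."""
    return base_bpm / bpm


def normalize(segments):
    """Scale linear inverse-tempo segments so their average value is one.

    segments: list of (y0, y1, dur). The integral of a linear segment is
    its mean value (y0 + y1) / 2 times its duration.
    """
    total_dur = sum(dur for _, _, dur in segments)
    integral = sum((y0 + y1) / 2 * dur for y0, y1, dur in segments)
    avg = integral / total_dur
    return [(y0 / avg, y1 / avg, dur) for y0, y1, dur in segments]


# A two-measure rubato that, unnormalized, would take 20% longer than written.
rubato = [(0.8, 1.2, 1), (1.2, 1.6, 1)]
normalized = normalize(rubato)
new_integral = sum((y0 + y1) / 2 * dur for y0, y1, dur in normalized)
```

After normalization the integral equals the total score duration (2 measures), so the voice ends the passage in step with the others.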

The semantics of this function:

Make a list of all "events" (note start/end, pedal start/end) ordered by score time. Each event has a score time and a performance time.
Scan this list, processing events that satisfy the note selector (if any) and that lie within the domain of the PFT.
For each pair of consecutive events E1 and E2, compute the average A of the inverse tempo function between the score times of E1 and E2 (i.e. the integral of the function over this interval divided by the interval size).
Let T be the difference in original performance time between E1 and E2. Change the performance time of E2 to be T·A seconds after the (new) performance time of E1, so that a larger inverse tempo stretches the interval.
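
Here is a sketch of this procedure for the inverse-tempo case (bpm=False), where by the definition given earlier the performance duration of an interval is the integral of F, so each original gap is multiplied by the average of F over it. It is deliberately simplified: the PFT is a single linear segment, there is no selector, and the event objects with "score" and "perf" attributes are hypothetical.

```python
from types import SimpleNamespace


def linear_avg(y0, y1, dur, a, b):
    """Average over [a, b] of a line running from y0 at 0 to y1 at dur.

    For a linear function this is just its value at the midpoint.
    """
    mid = (a + b) / 2
    return y0 + (y1 - y0) * mid / dur


def tempo_adjust(events, y0, y1, dur, t0=0):
    """Re-time events using a linear inverse tempo over [t0, t0 + dur]."""
    events = sorted(events, key=lambda e: e.score)
    old_perf = [e.perf for e in events]
    for i in range(1, len(events)):
        prev, cur = events[i - 1], events[i]
        gap = old_perf[i] - old_perf[i - 1]      # original gap in seconds
        if t0 <= prev.score and cur.score <= t0 + dur:
            gap *= linear_avg(y0, y1, dur, prev.score - t0, cur.score - t0)
        # Gaps outside the domain are kept, so later events simply shift.
        cur.perf = prev.perf + gap
    return events


# Five quarter-note events, initially one second apart.
events = [SimpleNamespace(score=k / 4, perf=float(k)) for k in range(5)]
# Ritardando: inverse tempo rises from 1.0 to 2.0 over one measure.
tempo_adjust(events, 1.0, 2.0, 1.0)
new_times = [e.perf for e in events]
```

The measure now lasts 6 seconds instead of 4, which matches the average inverse tempo of 1.5.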

pause_before(t, dt)

Add a pause of dt seconds before score time t. Earlier notes that end at or after t are elongated.

pause_after(t, dt)

Add a pause of dt seconds after score time t. Notes that start at t are elongated.

roll(t, offsets, is_up=True, is_delay=False)

Roll a chord. "offsets" is a list of time offsets (typically negative). These offsets are added to the performance start times of notes that start at score time t. If "is_up" is true, they are applied from bottom pitch upwards; otherwise from top pitch downward. If "is_delay" is True, the range of the offsets is added to subsequent notes, so that the roll adds a delay.
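
A sketch of the basic roll, leaving out "is_delay". The note objects and their "perf_time" attribute are assumptions, and the notes are passed in explicitly.

```python
from types import SimpleNamespace


def roll(notes, t, offsets, is_up=True):
    """Add offsets to the performance start times of the chord at score time t.

    With is_up the first offset goes to the lowest pitch; otherwise to
    the highest.
    """
    chord = sorted((n for n in notes if n.time == t),
                   key=lambda n: n.pitch, reverse=not is_up)
    for n, offset in zip(chord, offsets):
        n.perf_time += offset


notes = [SimpleNamespace(time=0, pitch=p, perf_time=0.0) for p in (64, 48, 72, 55)]
notes.append(SimpleNamespace(time=1, pitch=60, perf_time=4.0))

# Upward roll: the bass note sounds 0.3 s early, the top note on time.
roll(notes, 0, [-0.3, -0.2, -0.1, 0.0])
by_pitch = {n.pitch: n.perf_time for n in notes}
```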

t_adjust_list(ns, offsets, selector)

"offsets" is a list of time offsets (seconds). They are added to the start times of notes satisfying the selector, in time order.

t_adjust_notes(ns, offset, selector)

The given time offset (seconds) is added to the start times of all notes satisfying the selector.

t_adjust_func(ns, func, selector)

For each note satisfying the selector, the given function is called with that note, and the result is added to the note's start time.

Articulation control

perf_dur_rel(factor, pred)

Multiply the duration of the selected notes by the given factor.

perf_dur_abs(t, pred)

Set the duration of the selected notes to the given value (seconds).

perf_dur_func(f, pred)

Set the duration of a selected note N to the value f(N).

Pedal control

An application of a pedal is represented by

pedal(start, end, type)

where "start" and "end" are in score time, and "type" is sustain, soft, or sostenuto.

Timing control primitives affect pedal events as well as notes.

Layering

For the most part, MNS primitives don't set note parameters; rather they adjust the parameters. This means that nuances can be "layered" as mentioned earlier. We could imagine having several MNS specifications, each in a separate file, applied in sequence to a score, analogously to CSS stylesheets.
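
Because adjustments are relative, layers compose naturally. A toy illustration, in which a "layer" is modeled as a list of operations applied in order; scale_volume() and apply_layers() are hypothetical helpers, not part of MNS:

```python
from types import SimpleNamespace


def scale_volume(factor, selector=None):
    """Build an operation that multiplies the volume of selected notes."""
    def op(notes):
        for n in notes:
            if selector is None or selector(n):
                n.vol *= factor
    return op


def apply_layers(notes, layers):
    """Apply each layer's operations in order; later layers refine earlier ones."""
    for layer in layers:
        for op in layer:
            op(notes)


base_layer = [scale_volume(0.8)]                                   # softer overall
voicing_layer = [scale_volume(1.25, lambda n: "top" in n.tags)]    # bring out the melody

notes = [SimpleNamespace(vol=0.5, tags={"top"}),
         SimpleNamespace(vol=0.5, tags=set())]
apply_layers(notes, [base_layer, voicing_layer])
layered = [n.vol for n in notes]
```

The melody note ends up back at about 0.5 while the accompaniment stays at 0.4: the second layer modifies the result of the first rather than replacing it.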

Extending and refining MNS

MNS as described here is version 0. It's a good framework, but it's fairly crude and low-level. I find that each new piece I work on requires new MNS features. There's lots of room for improvement.

One area of work is PFT primitives. Linear primitives - all I've implemented so far - are not quite convincing for either tempo or volume control.

Another important area of work, I think, is the way notes are selected. What we have now is basic: notes are tagged automatically based on chord position and metric position, and you can also tag them explicitly. I can imagine more sophisticated ways of selecting notes, based on musical semantics:

Harmony: tag various types of cadences and estimates of harmonic distance.
Phrase structure: suppose you want to add a small ritardando at the end of each phrase, or accent the high points of phrases. It would be good to express this at a high level.

... and so on.

It might be useful to parameterize MNS primitives, so that you can adjust the overall level of nuance by turning a single "knob".

Representing MNS specifications

The primitives described here have been implemented in a Python library called Numula.

For interoperability with other music software, we need a file format for MNS specifications. I lean toward JSON as a basis for this; one could also use XML, but it's unnecessarily verbose. An MNS specification in JSON might look like:

{
    "title": "MNS spec for Chopin's 1st Nocturne",
    "operators": [
        {
            "type": "tempo_adjust_pft",
            "start": "3/4",
            "segments": [
                {
                    "type": "linear",
                    "start": 90,
                    "end": 110,
                    "dur": "8/4"
                },
                {
                    "type": "delta",
                    "value": 0.15
                },
                {
                    "type": "linear",
                    "start": 110,
                    "end": 80,
                    "dur": "7/4"
                }
            ],
            "normalize": true,
            "selector": "'rh' in tags"
        },
        {
            "type": "vol_adjust_pft",
            "start": 0,
            "segments": [
                {
                    "type": "linear",
                    "start": 0.3,
                    "end": 0.8,
                    "dur": "2/4",
                    "closed_end": true
                },
                {
                    "type": "linear",
                    "start": 0.7,
                    "end": 0.3,
                    "dur": "1/4"
                }
            ]
        }
    ]
}
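
A sketch of reading such a specification in Python. It assumes the file is strict JSON (no trailing commas, numbers written with a leading zero, booleans as true/false) and that score times are given either as numbers or as fraction strings such as "8/4". The helper names are hypothetical, and the embedded spec is a shortened version of the example.

```python
import json
from fractions import Fraction

SPEC = """
{
    "operators": [
        {
            "type": "tempo_adjust_pft",
            "start": "3/4",
            "segments": [
                {"type": "linear", "start": 90, "end": 110, "dur": "8/4"},
                {"type": "delta", "value": 0.15},
                {"type": "linear", "start": 110, "end": 80, "dur": "7/4"}
            ],
            "normalize": true
        }
    ]
}
"""


def score_time(x):
    """Convert 0, 2, "3/4" or "8/4" to an exact Fraction of a measure."""
    return Fraction(x)


def operator_span(op):
    """Return the (start, end) score times covered by an operator's segments.

    Segments without a "dur" (such as a delta pause) take no score time.
    """
    start = score_time(op.get("start", 0))
    length = sum((score_time(s["dur"]) for s in op["segments"] if "dur" in s),
                 Fraction(0))
    return start, start + length


spec = json.loads(SPEC)
span = operator_span(spec["operators"][0])
```

Keeping score times as fraction strings avoids the rounding that decimal values such as 0.333 would introduce for triplets.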