Spatializing counterpoint

David P. Anderson
November 2021

Note: you'll need headphones for the audio examples below.

To fully appreciate counterpoint (e.g. most Bach) one must experience it as a set of independent melodies, or voices.

For ensemble music, this happens naturally: each voice is played by a separate musician, and the instruments usually have different timbres, making it easy to distinguish the voices. For example:

Various trio sonatas

For keyboard works, it's harder. A single performer plays all the voices; the performer must mentally "hear" them, and convey them to the listener.

On multi-manual organs, voices can be played on different manuals, each with its own stops. The same applies, to a lesser extent, to two-manual harpsichords.

But on a piano it can be hard to separate the voices, especially if there are more than two, or if some of them are in the same register or even overlapping in pitch. You must use nuances of timing, dynamics, and articulation to make the voices evident to the listener, and ideally to give them distinct personalities. A skillful performer can do amazing things; here's one of my favorite examples, played by Chiara Bertoglio:

An Wasserflüssen Babylon, arr. Perrachio

Listening to this, I can distinguish the voices most but not all of the time, and it takes some mental effort to do so.

Spatialization

Another way to distinguish voices is to separate them in space. Make it sound like each voice is coming from a distinct point in space: to the right or left, above or below, close or far. Make it sound like you're standing in front of, or in the middle of, a group of musicians, each playing a single melody. Maybe the musicians are moving around you, circling each other, getting closer and farther. Listening to Bach is already a wonderful experience. Spatialization, it seems to me, could take it to a new level, immersive and overwhelming.

Note: spatialization happens naturally in live multi-instrument performances, especially if you're sitting close to the musicians and the room acoustics are good. Here I'm interested in recorded music.

The psychoacoustics of spatialization are well understood. Something produces sound waves. They hit your head, and diffract around your skull, through your hair, around the contours of your outer ears, and to your eardrums. The diffraction varies with frequency, so what reaches your eardrum has different frequency content than the source. And because the speed of sound is finite, sounds arrive at the two ears at slightly different times.
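To get a feel for how small these arrival-time differences are, here is a rough sketch in Python. It uses Woodworth's classic spherical-head approximation, which is my choice for illustration and not something the measurements below depend on. The head radius and speed of sound are assumed round numbers, and the formula is only meant for sources between straight ahead (0 degrees) and directly to one side (90 degrees).

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C (assumed)
HEAD_RADIUS = 0.0875    # m; an assumed "average" head, not a measured one


def interaural_time_difference(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Woodworth's spherical-head estimate of the arrival-time gap, in seconds.

    azimuth_deg: source angle, 0 = straight ahead, 90 = directly to one side.
    """
    theta = math.radians(azimuth_deg)
    # The far ear's extra path is an arc around the head (radius * theta)
    # plus a straight segment (radius * sin(theta)).
    return (radius / c) * (theta + math.sin(theta))


for az in (0, 30, 60, 90):
    itd_us = interaural_time_difference(az) * 1e6
    print(f"{az:2d} degrees: {itd_us:4.0f} microseconds")
```

Under these assumptions the difference is zero for a source straight ahead and grows to roughly two thirds of a millisecond for a source directly to one side.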

So what each ear hears is different. Your brain uses these differences, as well as its a priori knowledge of the actual frequency content of common sounds, to estimate where in 3-space the sound originated. Not just right and left - you can distinguish between sounds coming from above and below you, or from behind.

The transformation from the original sound to what reaches the eardrum is called the head-related transfer function (HRTF). The HRTF depends to some extent on the listener: the shape of their head, length of hair, etc. You can figure out the HRTF for a given person by putting tiny microphones in their ear canals, then playing sounds coming from various points in space and comparing them with what the mics pick up. Acoustics researchers have done this with lots of subjects, and with thousands of points per subject. See the links here.

Note: This doesn't take into account room acoustics, but it can be extended to do so.
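In the time domain, applying an HRTF amounts to convolving the source signal with a pair of impulse responses, one per ear (usually called head-related impulse responses, or HRIRs). The sketch below shows the idea in plain Python. The function names are mine, and the two impulse responses are made-up toy values, not measured data: they only mimic a source on the listener's left, whose sound reaches the right ear later and quieter.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def apply_hrir(mono, hrir_left, hrir_right):
    """Filter one mono signal into a (left, right) pair of ear signals."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (invented for illustration, NOT measured):
# the left ear hears the sound immediately, the right ear hears a
# copy that is three samples late and attenuated.
hrir_left = [1.0, 0.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.0, 0.6]

click = [1.0, 0.0, 0.0]
left, right = apply_hrir(click, hrir_left, hrir_right)
print(left)   # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(right)  # [0.0, 0.0, 0.0, 0.6, 0.0, 0.0]
```

A measured HRIR is a few hundred samples long and also reshapes the spectrum, which is what carries the above/below and front/back cues. For real audio you would use an FFT-based convolution rather than this double loop.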

It's also possible to build an artificial head, with average skull dimensions, realistic hair and outer-ear structure, and put microphones where the eardrums would be. These are called "binaural microphones", and are commercially available:

Neumann head mic

Such microphones are used for various purposes, such as measuring the effective road noise in cars.

If you make a stereo recording using a binaural mic, and listen to it on headphones (or better, earbuds, which bypass the outer ear) what you hear is a good approximation of what you would have heard if you had been there, and your head had the position and orientation of the mic. If sounds come from different places, you hear them as such.

Examples of binaural recordings (listen to these on headphones, otherwise you won't hear the effect):

Virtual Barber Shop
Rocket launch

Note: you can also get limited spatialization by making recordings with carefully placed and oriented directional microphones, and listening to these recordings on headphones or stereo speakers. But this is crude compared to the binaural approach. You can distinguish sounds only in a limited horizontal range, and they don't seem to originate from points.

Spatializing music

Let's get back to music, and the goal of making it easier to hear counterpoint by separating voices in space.

For ensembles, you can do this using binaural recordings. For example, you can put a binaural microphone close to a string ensemble:

Vivaldi in 3-D

or an orchestra:

Mahler 9 (also 360° video)

... and the voices are separated fairly well. Here's an interesting example, where the microphone is moved among the performers. The first part is in mono to show the difference.

Peter and Kerry in 3-D

It's a mystery to me why there aren't more binaural recordings.

Back to the piano. How can we spatialize recorded contrapuntal piano pieces? Various possibilities:

Put a binaural microphone close to or inside a piano and record a performance. Depending on how the mic is oriented, bass might be on the left and treble on the right. That could be interesting, but the balance would be messed up (registers closer to the mic would be louder). And it's not clear how effective it would be at separating closely-spaced voices.
Have the performer record one voice at a time, producing a mono recording of each voice. Assign each voice to a point in space - possibly varying over time. Using a computer, apply the appropriate HRTF to each sound, and combine these into a final stereo sound file.
This approach has obvious problems associated with recording the voices separately. You'd lose rhythmic coherence and tightness.
Put N pianos and N pianists in a room, have each pianist play one of the voices, and record them separately. Then proceed as in the previous approach: assign each voice a position, apply HRTFs, and combine. Analogous problems.
Record a performance as a MIDI file, using a piano equipped with MIDI sensors. Manually separate this into voices. Convert each voice into a mono sound file using a digital piano sample and the appropriate software. Then (as above) assign a spatial position to each voice, apply HRTFs, and combine them into a stereo file.
Note: a digital piano sample is actually stereo. With a little DSP magic, we could have each voice appear to emanate not from a single point but from a line the width of a piano.
Remove the performer altogether and manually create the per-voice MIDI files, using software like Numula to add nuances of dynamics, rhythm, and articulation (that's a whole 'nother topic). Then do the remaining steps (render each voice to sound, assign positions, apply HRTFs, combine) as in the MIDI approach above.
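The last steps shared by the per-voice approaches above (assign each voice a position, filter it for each ear, sum into a stereo file) can be sketched as follows. This is a toy under loud assumptions, not a working spatializer: `toy_hrir_pair` is a hypothetical stand-in for looking up a measured HRIR, and it models only an arrival-time difference and an invented level difference, with none of the outer-ear filtering. The sample rate, head radius, and the three sine "voices" are likewise placeholders.

```python
import math

SAMPLE_RATE = 44100     # Hz (assumed)
SPEED_OF_SOUND = 343.0  # m/s (assumed)
HEAD_RADIUS = 0.0875    # m (assumed)


def toy_hrir_pair(azimuth_deg):
    """Hypothetical stand-in for a measured HRIR lookup.

    azimuth_deg: -90 (hard left) to +90 (hard right), 0 = straight ahead.
    Returns (hrir_left, hrir_right) modelling only delay and level.
    """
    theta = math.radians(abs(azimuth_deg))
    delay_s = (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))
    delay = round(delay_s * SAMPLE_RATE)  # far-ear delay in samples
    near = [1.0] + [0.0] * delay
    far = [0.0] * delay + [1.0 - 0.4 * math.sin(theta)]  # 0.4 is invented
    return (near, far) if azimuth_deg < 0 else (far, near)


def convolve(signal, impulse_response):
    """Direct-form convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def mix_into(bus, samples):
    """Add samples into bus in place, growing bus if needed."""
    if len(samples) > len(bus):
        bus.extend([0.0] * (len(samples) - len(bus)))
    for i, x in enumerate(samples):
        bus[i] += x


def spatialize(voices):
    """voices: list of (mono_samples, azimuth_deg). Returns (left, right)."""
    left, right = [], []
    for mono, azimuth_deg in voices:
        hrir_left, hrir_right = toy_hrir_pair(azimuth_deg)
        mix_into(left, convolve(mono, hrir_left))
        mix_into(right, convolve(mono, hrir_right))
    return left, right


def tone(freq_hz, seconds=0.05):
    """A short sine tone standing in for one rendered voice."""
    n = int(SAMPLE_RATE * seconds)
    return [0.3 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Three placeholder voices: one to the left, one centered, one to the right.
voices = [(tone(262), -60), (tone(330), 0), (tone(392), 60)]
left, right = spatialize(voices)
print(len(left), len(right))
```

Letting a voice move over time would mean switching or interpolating impulse responses block by block as the position changes, which this sketch leaves out. Writing the two channels to a stereo sound file is also omitted.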

Conclusion

Enjoying counterpoint as a listener depends critically on distinguishing the voices. With conventional stereo recordings - especially for the piano - this can be hard to do. There's a wall of sound, and it's up to the listener to pry the voices out of it. Spatializing the voices could greatly improve the listening experience.

If anyone wants to discuss this - or work with me to make a demo - let me know.