random trip report
Note: you'll need headphones for the audio examples below.
J.S. Bach's music is almost entirely contrapuntal, and to fully appreciate counterpoint one must hear it as a set of independent melodies, or voices.
For ensemble music, this happens naturally: each voice is played by a separate musician, and the instruments usually have different timbres, making it easy to distinguish the voices. For example:
For keyboard works, it's harder. A single performer plays all the voices, and must mentally "hear" them, and convey them to the listener.
On multi-manual organs, voices can be played with different stops. Similarly, to a lesser extent, on two-manual harpsichords.
But on a piano it can be hard to separate the voices, especially if there are more than two, or if some of them are in the same register or even overlapping in pitch. You must use nuances of timing, dynamics, and articulation to make the voices evident to the listener, and ideally to give them distinct personalities. A skillful performer can do amazing things; here's one of my favorite examples, played by Chiara Bertoglio:
Listening to this, I can distinguish the voices most but not all of the time, and it takes a certain mental effort to do so. I find myself asking: how could we separate the voices more?
One way: separate the voices in space. Make it sound like each voice is coming from a distinct point in space: to the right or left, above or below, close or far. Make it sound like you're standing in front of, or in the middle of, a group of musicians, each playing a single melody. Maybe the musicians are moving around you, circling each other, getting closer and farther. If two voices have the same note at the same time, you'll hear two notes rather than one.
This - it seems to me - could take what is already a wonderful experience, and move it to a new level, immersive and overwhelming.
The psychoacoustics of spatialization are well understood. Something produces sound waves. They hit your head, and diffract around your skull, through your hair, around the contours of your outer ears, and to your eardrums. The diffraction varies with frequency, so what reaches your eardrum has different frequency content than the source. And because the speed of sound is finite, sounds arrive at the two ears at slightly different times.
So what each ear hears is different. Your brain uses these differences, as well as its a priori knowledge of the actual frequency content of common sounds, to estimate where in 3-space the sound originated. Not just right and left - you can distinguish between sounds coming from above and below you, or from behind.
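To give a feel for the magnitudes involved, here's a small sketch of the interaural time difference (the arrival-time cue mentioned above), using Woodworth's classic spherical-head approximation. The head radius and speed of sound here are typical textbook values, not measurements:

```python
import math

def itd_woodworth(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Approximate interaural time difference (seconds) for a distant
    source at the given azimuth (0 = straight ahead, 90 = directly to
    one side), per Woodworth's spherical-head model:
    ITD = (a / c) * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / c) * (theta + math.sin(theta))

# A source directly to one side arrives roughly 0.66 ms earlier
# at the near ear - a tiny difference, but the brain uses it.
print(round(itd_woodworth(90) * 1e6), "microseconds")
```

The brain resolves differences far smaller than this maximum, which is part of why localization is so precise.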
The transformation from the original sound to what reaches the eardrum is called the head-related transfer function (HRTF).
The HRTF depends to some extent on the listener: the shape of their head, length of hair, etc. You can figure out the HRTF for a given person by putting tiny microphones in their ear canals, then playing sounds from various points in space and comparing them with what the mics pick up. Acoustics researchers have done this with lots of subjects, and with thousands of points per subject. See Resources below.
Note: the above doesn't take into account room acoustics, but you can extend it to do so.
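In the time domain, applying an HRTF is just convolving the source with the measured impulse response (HRIR) for each ear. A minimal sketch in Python with numpy - the HRIR pair below is a toy placeholder (a pure delay plus attenuation), standing in for a real measured pair from one of the research databases:

```python
import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Render a mono signal at the position encoded by an HRIR pair:
    convolve with each ear's impulse response, stack into stereo."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)

# Toy stand-in for a measured HRIR pair: the right ear hears the
# sound 20 samples later and quieter, as if the source were off to
# the left. Real HRIRs also encode the frequency-dependent filtering
# of the head and outer ear.
fs = 44100
hrir_l = np.zeros(64); hrir_l[0] = 1.0
hrir_r = np.zeros(64); hrir_r[20] = 0.6

tone = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1 s of A440
stereo = spatialize(tone, hrir_l, hrir_r)
print(stereo.shape)  # (44163, 2)
```

Listened to on headphones, even this crude delay-and-attenuate version pulls the tone off-center; a measured HRIR makes it a point in space.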
It's also possible to build an artificial head, with average skull dimensions, realistic hair and outer-ear structure, and put microphones where the eardrums would be. These are called "binaural microphones", and are commercially available:
Such microphones are used for various purposes, such as measuring the effective road noise in cars.
If you make a stereo recording using a binaural mic, and listen to it on headphones (or better, earbuds, which bypass the outer ear) what you hear is a good approximation of what you would have heard if you had been there, and your head had the position and orientation of the mic. If sounds come from different places, you hear them as such.
Example of binaural recordings (listen to these on headphones, otherwise you won't hear the effect):
Note: you can also get limited spatialization by making recordings with carefully placed and oriented directional microphones, and listening to these recordings on headphones or stereo speakers. But this is crude compared to the binaural approach. You can distinguish sounds only in a limited horizontal range, and they don't seem to originate from points.
Let's get back to music, and the goal of making it easier to hear counterpoint by separating voices in space.
For ensembles, you can do this using binaural recordings. For example, you can put a binaural microphone close to a string ensemble:
or an orchestra:
... and the voices are separated fairly well. Here's an interesting example, where the microphone is moved among the performers. The first part is in mono to show the difference.
It's a mystery to me why there aren't more binaural recordings.
Back to the piano. How can we spatialize contrapuntal piano pieces? Various possibilities:
1) Put a binaural microphone inside a piano and record a performance. Depending on how the mic is oriented, bass might be on the left and treble on the right. That could be interesting, but the balance would be messed up (the register closest to the mic would be louder). And it's not clear how effective it would be at separating closely-spaced voices.
2) Have the performer record one voice at a time, producing a mono recording of each voice. Assign each voice to a point in space - possibly varying over time. Using a computer, apply the appropriate HRTF to each sound, and combine these into a final stereo sound file.
This approach has the obvious problems associated with recording the voices separately: you'd lose rhythmic coherence and tightness.

3) Put N pianos and N pianists in a room, have each one play one of the voices, and record them separately. Proceed as in 2). Analogous problems.
4) Record a performance as a MIDI file, using a piano equipped with MIDI sensors (Yamaha makes them). Manually separate this into voices. Convert each voice into a mono sound file using digital piano samples and the appropriate software. Then (as above) assign a position to each voice, apply HRTFs, and combine them into a stereo file.
Note: a digital piano sample is actually stereo. With a little DSP magic, we could have each voice appear to emanate not from a single point but from a line the width of a piano.
5) Remove the performer altogether and manually create the per-voice MIDI files, using software mechanisms to add nuances of dynamics, rhythm, and articulation (that's a whole 'nother topic). Then do the remaining steps as in 4).
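Whichever way the per-voice mono tracks are produced, the final mixdown step is the same: pick a position for each voice, look up the HRIR pair for that position, convolve, and sum into one stereo file. A sketch of that last step, assuming the HRIR lookup (from a measured database) has already happened elsewhere:

```python
import numpy as np

def mix_voices(voices, hrir_pairs):
    """Combine per-voice mono signals into one binaural stereo track.

    voices: list of 1-D numpy arrays, one per contrapuntal voice.
    hrir_pairs: list of (hrir_left, hrir_right) arrays, one pair per
    voice, chosen for each voice's assigned position in space. (The
    position-to-HRIR lookup is outside this sketch.)"""
    n = max(len(v) + len(hl) - 1 for v, (hl, _) in zip(voices, hrir_pairs))
    out = np.zeros((n, 2))
    for v, (hl, hr) in zip(voices, hrir_pairs):
        left = np.convolve(v, hl)
        right = np.convolve(v, hr)
        out[:len(left), 0] += left
        out[:len(right), 1] += right
    return out
```

Moving a voice over time - the circling musicians above - means crossfading between HRIR pairs as the position changes, which is exactly what libraries like 3DTune-in handle for you.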
Enjoying Bach as a listener depends critically on distinguishing the voices. With current recordings - especially for the piano - this can be hard to do. There's a wall of sound, and it's up to the listener to pry the voices out of it. Spatializing the voices could greatly improve the experience of hearing Bach (and possibly lots of other music).
If anyone wants to discuss this - or work with me to make a demo - let me know.
Resources:

3DTune-in: a C++ library for 3D audio, with a nice demo program. This looks extremely good.
OpenAL: another 3D audio library, targeted at video games, though possibly useful here.
Ambisonics: an approach to 3D sound using speakers rather than headphones.
Chris Chafe at Stanford