David P. Anderson
16 October 2023
When you hear or play a piece of music, you react in some way: love, hate, indifference, etc. The reaction depends on - among other things - your 'taste'. It seems to me that taste in classical music has these properties:
I think of music as having a sort of multidimensional normal distribution.
Most people start off listening to 1-sigma music. As time goes by, your ear expands: unusual harmonies and sonorities start to make sense. And, after hearing warhorses for the 10th time, you may get a bit tired of them. So you start listening to more 2-sigma music. Tastes start to differentiate.
This process continues. You discover 3-sigma music (by word of mouth or Internet exploration). Your taste moves into the fringes of the distribution, and becomes even more individual.
Aside: one hears phrases like "curated by experts", implying that there's an "ideal" musical taste. I think this is nonsense; it's a necessary position for centralized sources like radio stations and large-venue concerts, since they can only offer one program.
If we can infer the taste of a user, we can provide discovery tools that are more efficient. We can show them items they'll probably like, and might not have discovered otherwise.
Data sources for inferring taste
Suppose we want to infer the musical taste of an online user. What sources of data can we use?
This notion of rating - often one to five stars - is pretty standard on the web. Amazon lets you rate products and shows you the average ratings; Hotels.com shows you average ratings of hotels, and so on. I've found this to be hugely valuable.
Music apps could collect ratings of all item types: compositions, recordings, performances, composers, etc.
To be useful for inferring taste, ratings need to be associated with a user. I.e. you have to be logged in to rate things. An app could collect anonymous ratings, but they'd be useful only for showing rating stats (e.g. average rating), not for anything personalized.
When requesting a rating, it's important to be clear about what's being rated. For example, people on Amazon sometimes give low ratings because their package arrived late, not because the item was bad.
When you listen to a recording, the following are potentially different:
A music rating system could separate these. And there are other factors. Maybe you like the 1st movement but not the 2nd; maybe you like the singer but not the pianist; maybe you don't like the sound quality. It would be a challenge (probably not worth it) to separate all of these.
Finer-granularity ratings have more predictive value. A rating of a performer has less value than separate ratings of a lot of recordings by that performer.
From the data science perspective, the more ratings the better. When a person attends a concert, it would be nice to get their ratings of each piece on the program. That way - for example - we could give listeners better recommendations for future concerts, and we could help performers create programs that people will like more.
But there's the danger of "rating overload". Marketers have discovered the value of ratings. When I rent a car, I get an email asking me to rate the experience on various axes. This is tedious, so I ignore it.
So collecting ratings has to be done judiciously. Maybe after attending a concert I get an email asking me to rate the pieces and the performers. It's all in the email - it doesn't take me to a web site. And maybe there's an incentive. This should be studied.
And we should keep in mind the problem of fake ratings. Composers might pad the ratings of their own compositions. Apps could discourage this by 1) making it a little hard to create a new identity (reCAPTCHA, responding to an email, etc.) and 2) using AI to identify fake ratings.
Features such as "Like", "Collect", "Follow" etc. are a low-impact binary rating. It's better than nothing, but:
Taste can potentially be inferred from user interactions:
Comments and reviews
Sites like Amazon and Hotels.com let users write reviews of items. This is a popular feature; people like to express their opinions to the world. I find reviews quite valuable in many cases.
Reviews may provide implicit ratings: natural-language AI techniques can tell whether a review is favorable, unfavorable, or indifferent.
Sites may let users rate reviews; for example, Amazon has "Helpful" and "Report" buttons.
Reviews, and ratings of reviews, could be used in various ways. Reviews can be used for 'social linkage' discovery features: E.g. when I read a review of an item I can look at other reviews by the same person, perhaps of items I already know about. That gives me some idea of whether I trust the reviewer.
We may be able to infer preference properties of compositions and recordings directly from their digital descriptions:
MPS could potentially compute these attributes, and use them for both collaborative filtering and similarity estimates.
Modeling and inferring individual musical taste
Ratings (explicit and implicit) can be used for various discovery features. For example, an interface could say "people who like composition A also like B, C and D". This can be computed from ratings: find people who rated A highly, and see what else they rated highly. This isn't personalized; all users see the same thing.
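Here's a minimal sketch of the "people who like A also like..." computation. The data, the 1-5 star scale, the "liked" threshold of 4 stars, and the function name also_liked are all made up for illustration.

```python
from collections import Counter

# Hypothetical data: (user, item, rating on a 1-5 star scale).
RATINGS = [
    ("ann", "A", 5), ("ann", "B", 5), ("ann", "C", 2),
    ("bob", "A", 4), ("bob", "B", 4), ("bob", "D", 5),
    ("cat", "A", 1), ("cat", "C", 5),
    ("dan", "A", 5), ("dan", "B", 5), ("dan", "D", 4),
]

def also_liked(ratings, item, threshold=4, n=3):
    """Return up to n items most often rated >= threshold
    by the users who rated `item` >= threshold."""
    # Step 1: find the people who rated the given item highly.
    fans = {u for (u, i, r) in ratings if i == item and r >= threshold}
    # Step 2: count what else those people rated highly.
    counts = Counter(
        i for (u, i, r) in ratings
        if u in fans and i != item and r >= threshold
    )
    return [i for i, _ in counts.most_common(n)]

print(also_liked(RATINGS, "A"))  # ['B', 'D']
```

Note that the result depends only on the item, not on who is asking, which is why every user sees the same list.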
A more powerful approach is collaborative filtering, a class of techniques for predicting an individual's rating of an item, based on the ratings of a large user population.
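To make this concrete, here's a sketch of one simple variant, user-based collaborative filtering with Pearson correlation. It's not necessarily what MPS would use; the ratings, the 1-5 scale and the function names are assumptions for illustration.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "C": 1, "D": 5},
    "cat": {"A": 1, "B": 2, "C": 5, "D": 1},
    "dan": {"A": 4, "C": 2, "D": 4},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def similarity(ratings, u, v):
    """Pearson correlation of two users over the items both have rated.
    Returns 0.0 when it is undefined (fewer than 2 common items, or no variance)."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0
    mu = mean(ratings[u][i] for i in common)
    mv = mean(ratings[v][i] for i in common)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    den = (sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
           * sqrt(sum((ratings[v][i] - mv) ** 2 for i in common)))
    return num / den if den else 0.0

def predict(ratings, user, item):
    """Predict user's rating of item: the user's own mean rating, plus the
    similarity-weighted average of how far other users' ratings of the item
    deviate from their own means."""
    base = mean(ratings[user].values())
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        s = similarity(ratings, user, other)
        num += s * (theirs[item] - mean(theirs.values()))
        den += abs(s)
    # With no usable neighbors, fall back to the user's mean.
    return base + num / den if den else base

print(predict(RATINGS, "ann", "D"))
```

In this toy data ann agrees with bob and dan and disagrees with cat. Bob and dan rated D above their own averages and cat rated it below hers, so the prediction for ann comes out above her own average. Unlike the "also like" list, the prediction is different for each user.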
There are many collaborative filtering algorithms. For example:
I know about collaborative filtering because in the 1990s I used it to develop web sites that recommended movies and music. Collaborative filtering isn't widely used today. Netflix created a big ruckus about improving algorithms, but I'm not sure they actually use them.
The Music Preference Service
Music apps could collect ratings in various ways:
Collaborative filtering works best if all rating data, for all users, are pooled and analyzed together. If I rate a recording of Scarbo on YouTube and a performance of Scarbo on Groupmuse, these should end up in the same database, marked as ratings of the same piece.
Also, collaborative filtering works best if ratings of different item types (compositions, performances etc.) are pooled and analyzed together; that way the system can exploit correlations between types.
So the best way to do collaborative filtering is to establish a 'Music Preference Service' (MPS): a consortium of music apps that pool their rating data.
The apps in the consortium would give MPS all data relevant to preference: explicit and implicit ratings, comments and reviews, etc. MPS would define a way to link user identities across apps.
This requires that music apps standardize on the types of items being rated, and the way items are identified. This would use the Classical Music Index.
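As a rough sketch of what pooling might look like: each app reports ratings using its own local item IDs, and a shared catalog maps them to common IDs so that ratings of different renditions of a piece can be tied together. Everything here is hypothetical (the record layout, the ID strings, the catalog, the function name pool); the actual identifiers would come from the Classical Music Index.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PooledRating:
    user: str         # cross-app user key (not a real-world identity)
    item_type: str    # "recording", "performance", ...
    item_id: str      # shared ID of the rated item
    composition: str  # shared ID of the piece being played
    rating: float     # 0..1

# Hypothetical catalog:
# (app, app-local ID) -> (item type, shared item ID, shared composition ID)
CATALOG = {
    ("youtube", "yt-abc"): ("recording", "rec:0001", "comp:scarbo"),
    ("groupmuse", "gm-42"): ("performance", "perf:0007", "comp:scarbo"),
}

def pool(app, user_key, local_id, rating):
    """Translate an app-local rating into a pooled record."""
    item_type, item_id, composition = CATALOG[(app, local_id)]
    return PooledRating(user_key, item_type, item_id, composition, rating)

ratings = [
    pool("youtube", "u1", "yt-abc", 0.9),
    pool("groupmuse", "u1", "gm-42", 0.7),
]
# Both ratings end up marked as ratings of the same piece.
print({r.composition for r in ratings})  # {'comp:scarbo'}
```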
BTW, in the very early days of Amazon, I approached Jeff Bezos about using this approach to combine movie ratings (from my movie-recommendation web site, rare.com) with his book ratings. He liked this idea but didn't pursue it.
The MPS API
The Music Preference Service provides a web-based API.
result = predict(user, item, purpose)
user identifies a user.
item identifies an item of some type (composition, recording, concert, venue, person).
purpose describes the user's proposed use of the item: to play it, to hear a streamed recording, to hear a live performance, etc.
result is the pair of the predicted rating (0..1) and the confidence in that prediction (0..1). Or it could be a confidence interval.
results = top_predict(user, type, purpose, n)
Return a list of the n items of the given type for which the predicted rating is greatest.
result = similarity(item1, item2, mode)
Returns an estimate of the 'taste similarity' between a pair of items of the same type. If the type is 'person', there are two modes.
results = top_similarity(item, mode, n)
Return a list of the n items whose similarity to the given item is greatest.
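To illustrate the shape of the prediction calls, here's a toy in-memory stand-in written in Python. It's not the web API itself. The class names, the stored predictions, the neutral default for unknown items, and the assumption that top_predict ranks by predicted rating are all mine; the similarity calls are left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Prediction:
    rating: float      # predicted rating, 0..1
    confidence: float  # confidence in that prediction, 0..1

class ToyMPS:
    """In-memory stand-in for the service; the stored predictions are made up."""

    def __init__(self, table, item_types):
        self.table = table            # (user, item, purpose) -> Prediction
        self.item_types = item_types  # item -> type

    def predict(self, user, item, purpose):
        # Assumption: unknown combinations get a neutral rating, zero confidence.
        return self.table.get((user, item, purpose), Prediction(0.5, 0.0))

    def top_predict(self, user, item_type, purpose, n):
        # Assumption: rank items of the given type by predicted rating.
        items = [i for i, t in self.item_types.items() if t == item_type]
        scored = [(i, self.predict(user, i, purpose)) for i in items]
        scored.sort(key=lambda pair: pair[1].rating, reverse=True)
        return scored[:n]

mps = ToyMPS(
    table={
        ("u1", "scarbo", "listen"): Prediction(0.9, 0.7),
        ("u1", "bolero", "listen"): Prediction(0.4, 0.8),
    },
    item_types={"scarbo": "composition", "bolero": "composition", "hall": "venue"},
)
print(mps.predict("u1", "scarbo", "listen"))
print(mps.top_predict("u1", "composition", "listen", 1))
```

A real service would probably want to take confidence into account when ranking, not just the predicted rating.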
The data given to MPS by an app is private to that app; it's not shared with other apps other than by its effect on predictions. The apps trust MPS to enforce this privacy. Therefore there must be an organizational 'firewall' between MPS and the apps; for example there should be no common staff.
MPS needs to link user identities across apps, but should not store data from which real-world user identities could be inferred. This could be done using hashed versions of identifiers like email address and phone number.
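A sketch of how the hashing could work, assuming each app normalizes the email address and applies a keyed hash (HMAC-SHA256) before sending it to MPS. The keyed hash and the shared secret are my assumptions: a plain unkeyed hash of an email address can be reversed by hashing a list of known addresses. The function name link_key and the secret value are hypothetical.

```python
import hashlib
import hmac

def link_key(identifier: str, secret: bytes) -> str:
    """Return a keyed hash of a normalized identifier (e.g. an email address).
    The same identifier always yields the same key, so accounts can be linked
    across apps without MPS storing the identifier itself."""
    normalized = identifier.strip().lower()
    return hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical secret, shared by the apps in the consortium.
SECRET = b"example-secret-not-for-real-use"

# Two apps that know the same user under slightly different spellings
# of the address derive the same key.
key_from_app_1 = link_key("Alice@Example.com ", SECRET)
key_from_app_2 = link_key("alice@example.com", SECRET)
print(key_from_app_1 == key_from_app_2)  # True
```

Phone numbers would need their own normalization (country code, punctuation) before hashing.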
Ratings and predictions involving listening are complicated by the fact that context can make a big difference:
It would be overkill to ask users what mood they're in when they rate something. But it may be possible to use contextual information to make better predictions:
How can the MPS bootstrap a new composition (perhaps by an unknown composer) or a new recording? I guess it can recommend the item to random people and try to get some ratings.
Even if there are no ratings of the composition, we may know something about the composer. For example, if they've rated things themselves, we can guess that their composition is similar to the music they like.
We could use the objective attributes of the item (see below); these don't involve ratings.
We can encourage composers to create synthesized sound files of their compositions (easy if they use a score editor); this is likely to get ratings faster.
We want to avoid a situation where a new composition gets a few bad ratings and then no one ever sees it again. On the other hand, if a composition is truly bad, we don't want people to waste time on it.
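One standard way to balance these two concerns is a smoothed ("Bayesian") average: treat every item as if it started with a few neutral ratings, so that a handful of real ratings can't sink it, while a large number of bad ratings still can. This is my suggestion of a possible technique, not a worked-out design; the prior mean of 0.5 and the prior weight of 5 are arbitrary assumptions.

```python
def smoothed_rating(ratings, prior_mean=0.5, prior_weight=5):
    """Average of the ratings (each 0..1), pulled toward prior_mean as if
    prior_weight extra ratings with that value had been received."""
    return (prior_mean * prior_weight + sum(ratings)) / (prior_weight + len(ratings))

# A new piece with two bad ratings: raw average 0.15, smoothed 0.4.
print(smoothed_rating([0.1, 0.2]))

# A piece with forty bad ratings: the smoothed value stays low.
print(smoothed_rating([0.15] * 40))
```

With no ratings at all, the function returns the prior mean, so a brand-new item is ranked as neutral instead of being hidden.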