David P. Anderson
1 January 2024
Musical taste

When you hear or play a piece of music, you react in some way: love, hate, indifference, etc. The reaction depends on - among other things - your 'taste'. It seems to me that taste in classical music has these properties:
I think of music as having a sort of multidimensional normal distribution.
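To make the 'sigma' idea used below concrete: represent each piece as a point in a feature space, and measure its distance from the center of the distribution. A minimal sketch in Python; the features and the data are hypothetical:

    import numpy as np

    # Hypothetical feature vectors for a corpus of pieces
    # (e.g., dimensions for dissonance, rhythmic complexity, ...).
    rng = np.random.default_rng(0)
    corpus = rng.normal(size=(10000, 8))

    mean = corpus.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(corpus, rowvar=False))

    def sigma_level(piece):
        """Mahalanobis distance of a piece from the center of the
        distribution: ~1 for '1-sigma' music, ~3 for '3-sigma' music."""
        d = piece - mean
        return float(np.sqrt(d @ cov_inv @ d))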
Most people start off listening to 1-sigma music. As time goes by, your ear expands: unusual harmonies and sonorities start to make sense. And, after hearing warhorses for the 10th time, you may get a bit tired of them. So you start listening to more 2-sigma music. Tastes start to differentiate. This process continues. You discover 3-sigma music (by word of mouth or Internet exploration). Your taste moves into the fringes of the distribution, and becomes even more individual.

Aside: one hears phrases like "curated by experts", implying that there's an "ideal" musical taste. I think this is nonsense; it's a necessary position for centralized sources like radio stations and large-venue concerts, since they can only offer one program.

If we can infer the taste of a user, we can provide discovery tools that are more efficient. We can show them items they'll probably like, and might not have discovered otherwise.

Data sources for inferring taste

Suppose we want to infer the musical taste of an online user. What sources of data can we use?

Explicit ratings

The notion of rating is pretty standard on the web. Amazon lets you rate products and shows you the average ratings; Hotels.com shows you average ratings of hotels; and so on. I've found this to be hugely valuable. Music platforms could collect ratings of all item types: compositions, recordings, performances, composers, etc.

To be useful for inferring taste, ratings need to be associated with a user; i.e. you have to be logged in to rate things. A platform could collect anonymous ratings, but they'd be useful only for showing rating statistics (e.g. average rating), not for anything personalized.

When requesting a rating, it's important to be clear about what's being rated. For example, people on Amazon sometimes give low ratings because their package arrived late, not because the item was bad. When you listen to a recording, the following are potentially different:
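A rating record could carry a separate score for each of these facets. A minimal sketch; the facet names are illustrative assumptions, not a fixed list:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class RecordingRating:
        user_id: str
        recording_id: str
        # Separate, optional scores (0..1) for different facets
        # of the listening experience; all names are hypothetical.
        composition: Optional[float] = None    # the piece itself
        performance: Optional[float] = None    # this performance of it
        sound_quality: Optional[float] = None  # the recording/engineering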
A music rating system could separate these. And there are other factors. Maybe you like the 1st movement but not the 2nd; maybe you like the singer but not the pianist; maybe you don't like the sound quality. It would be a challenge (probably not worth it) to separate all of these.

Finer-granularity ratings have more predictive value. A rating of a performer has less value than separate ratings of several recordings by that performer. From the data science perspective, the more ratings the better. When a person attends a concert, it would be nice to get their ratings of each piece on the program. That way - for example - we could give listeners better recommendations for future concerts, and we could help performers create programs that people will like more.

But there's the danger of "rating overload". Marketers have discovered the value of ratings. When I rent a car, I get an email asking me to rate the experience on various axes. This is tedious, so I ignore it. So collecting ratings has to be done judiciously. Maybe after attending a concert I get an email asking me to rate the pieces and the performers. And maybe there's an incentive. This should be studied.

And we must keep in mind the problem of fake ratings. Composers might pad the ratings of their own compositions. Platforms could discourage this by 1) making it hard to create a new identity (require reCAPTCHA, a response to an email, etc.), and 2) using AI to identify fake ratings.

Rating scales

There are various scales for ratings:
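Common examples are thumbs up/down, 1-5 stars, and 0-100 scores. Whatever the set, each scale can be mapped onto a common internal range, say 0..1. A minimal sketch; the particular scales shown are my examples, not a fixed list:

    def normalize_rating(value, scale):
        """Map a rating in one of several (hypothetical) scales
        onto a common 0..1 range."""
        if scale == "thumbs":        # -1 = down, +1 = up
            return (value + 1) / 2
        if scale == "stars_5":       # 1..5 stars
            return (value - 1) / 4
        if scale == "percent":       # 0..100
            return value / 100
        raise ValueError(f"unknown scale: {scale}")

    assert normalize_rating(5, "stars_5") == 1.0
    assert normalize_rating(-1, "thumbs") == 0.0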
A system for analyzing ratings should be able to use data in any of these scales.

Implicit ratings

Taste can potentially be inferred from user interactions:
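For example, listening behavior can be turned into an implicit score. A minimal sketch; the signals and the weights are assumptions:

    def implicit_rating(listened_seconds, track_seconds, replays, skipped):
        """Guess a 0..1 rating from listening behavior (hypothetical
        signals: completion fraction, replays, early skips)."""
        completion = min(listened_seconds / track_seconds, 1.0)
        score = 0.6 * completion + 0.4 * min(replays / 3, 1.0)
        if skipped and completion < 0.25:
            score *= 0.2  # an early skip is a strong negative signal
        return score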
Comments and reviews

Sites like Amazon and Hotels.com let users write reviews of items. This is a popular feature; people like to express their opinions to the world. I find reviews quite valuable in many cases.

Reviews may provide implicit ratings: natural-language AI techniques can tell whether a review is favorable, unfavorable, or indifferent. Sites may let users rate reviews; for example, Amazon has "Helpful" and "Report" buttons. Reviews, and ratings of reviews, could be used in various ways.

Reviews can be used for 'social linkage' discovery features: e.g. when I read a review of an item, I can look at other reviews by the same person, perhaps of items I already know about. That gives me some idea of whether I trust the reviewer.
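For instance, an off-the-shelf sentiment classifier can map review text to an implicit rating. A sketch using the Hugging Face transformers sentiment pipeline; treating the classifier's confidence as a rating is my assumption, not an established mapping:

    from transformers import pipeline

    # Downloads a default English sentiment model on first use.
    sentiment = pipeline("sentiment-analysis")

    def review_to_rating(text):
        """Map review text to an implicit 0..1 rating."""
        result = sentiment(text)[0]  # {'label': 'POSITIVE'|'NEGATIVE', 'score': p}
        p = result["score"]
        return p if result["label"] == "POSITIVE" else 1.0 - p

    print(review_to_rating("A luminous, deeply felt performance."))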
Objective attributes

We may be able to infer properties of compositions and recordings directly from their digital descriptions:
MPS (the Music Preference Service, described below) could potentially compute these attributes, and use them both for collaborative filtering and for similarity estimates.
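As one example, tempo and a rough spectral profile can be estimated from an audio file. A sketch using the librosa library; treating these particular attributes as taste-relevant is an assumption:

    import numpy as np
    import librosa

    def audio_attributes(path):
        """Estimate simple objective attributes of a recording."""
        y, sr = librosa.load(path, mono=True)
        tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
        # Spectral centroid: a rough 'brightness' measure.
        brightness = float(np.mean(librosa.feature.spectral_centroid(y=y, sr=sr)))
        return {"tempo_bpm": float(np.atleast_1d(tempo)[0]),
                "brightness_hz": brightness}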
Descriptive tags

All Music Guide (AMG) has a scheme where albums are assigned English 'tags'. There are two classes of tags: 'Moods' (Dreamy, Dark, Airy, Soothing, ...) describe the music itself; 'Themes' (Dance Party, Street Life, Protest, ...) describe the contexts in which one might most enjoy the music. AMG's editors decide on the set of tags, and the assignment of tags to albums. A system like this would be problematic if the public could assign tags; people might have very different ideas of the meaning of a tag.

AMG has discovery tools that involve tags. If you click on a tag, you get a list of albums with that tag. Their 'Advanced Search' feature lets you include tags in a query along with other attributes like genre, release date, and artist name. Tags could be used in other, less explicit, ways; for example, collaborative filtering (see below) could consider two people to have affine taste if they like music with similar sets of tags.

This feature is interesting because it leverages 'experts' to (potentially) aid in personalized discovery. Other platforms use 'experts' to create (non-personalized) 'curated playlists'. The implication is that the experts have 'ideal' musical tastes that the rest of us should aspire to.

Modeling and inferring individual musical taste

The above sources of data can be used as a basis for various discovery features. Some of these are non-personalized; all users see the same thing. For example, an interface could say "people who like composition A also like B, C and D". This can be computed from ratings or usage data: find people who like A, and see what else they like (a sketch of this computation follows below). Simple personalized features are possible:
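The "people who like A also like B" feature mentioned above is a co-occurrence count over likes. A minimal sketch, assuming a simple list of (user, item) 'like' pairs:

    from collections import Counter

    def also_liked(likes, item_a, n=3):
        """Given (user, item) like pairs, return the n items most often
        liked by the users who like item_a."""
        fans = {u for (u, i) in likes if i == item_a}
        counts = Counter(i for (u, i) in likes
                         if u in fans and i != item_a)
        return [item for item, _ in counts.most_common(n)]

    likes = [("u1", "A"), ("u1", "B"), ("u2", "A"), ("u2", "B"),
             ("u2", "C"), ("u3", "D")]
    print(also_liked(likes, "A"))  # ['B', 'C']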
Usage-based collaborative filtering

Collaborative filtering is the general idea of using data about a large population of users to make predictions about one user. A simple form of collaborative filtering is based on usage data, such as the recordings that users have listened to. Suppose a user U has listened to recordings {R1...Rn}. For each recording R not in this set, compute the number N(U,R) of pairs (Ri, U') where U' is a user who has listened both to R and to Ri ∈ {R1...Rn}. If N(U,R) is large, we know that people who listened to the same things as U also listened to R, and therefore (perhaps) U will like R.

Rating-based collaborative filtering

More sophisticated algorithms are possible if we have rating data (explicit or implicit). For example:
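Below, a sketch of both: the usage-based count N(U,R) defined above, and one standard rating-based approach - predicting a rating as a similarity-weighted average of other users' ratings. The rating-based variant is a generic textbook technique, offered as an example rather than the specific algorithm intended here:

    import math
    from collections import Counter

    def usage_scores(listens, u):
        """N(U,R): for each recording R that user u hasn't heard, count
        pairs (Ri, u') where u' listened to both R and some Ri of u's."""
        heard = listens[u]
        scores = Counter()
        for u2, heard2 in listens.items():
            if u2 == u:
                continue
            overlap = len(heard & heard2)  # number of shared Ri
            if overlap == 0:
                continue
            for r in heard2 - heard:
                scores[r] += overlap
        return scores

    listens = {"u": {"R1", "R2"},
               "a": {"R1", "R2", "R3"},
               "b": {"R2", "R3"},
               "c": {"R4"}}
    print(usage_scores(listens, "u"))  # Counter({'R3': 3})

    def predict_rating(ratings, u, item):
        """Predict u's rating of item as an average of other users'
        ratings, weighted by rating-overlap similarity (cosine here;
        Pearson correlation is also common)."""
        num = den = 0.0
        for u2, r2 in ratings.items():
            if u2 == u or item not in r2:
                continue
            shared = set(ratings[u]) & set(r2)
            if not shared:
                continue
            dot = sum(ratings[u][i] * r2[i] for i in shared)
            norm = (math.sqrt(sum(ratings[u][i] ** 2 for i in shared)) *
                    math.sqrt(sum(r2[i] ** 2 for i in shared)))
            sim = dot / norm if norm else 0.0
            num += sim * r2[item]
            den += abs(sim)
        return num / den if den else None

    ratings = {"u": {"A": 0.9, "B": 0.8},
               "v": {"A": 1.0, "B": 0.7, "C": 0.9},
               "w": {"A": 0.2, "C": 0.1}}
    print(predict_rating(ratings, "u", "C"))  # ~0.5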
I know about collaborative filtering because in the 1990s I used it to develop web sites that recommended movies and music. I don't think collaborative filtering is widely used today. Netflix created a big ruckus about improving recommendation algorithms (the Netflix Prize), but I'm not sure they actually use them.

The Music Preference Service

Music platforms can collect the various types of data listed above. In particular, they could collect ratings in various ways:
The preference-prediction methods listed above work better with more data. They work best if all data, from all platforms and for all users, are pooled and analyzed together. If I rate a recording of Scarbo on YouTube and a performance of Scarbo on Groupmuse, these should ideally end up in the same database, marked as ratings of the same piece by the same person. Also, collaborative filtering works best if ratings of different item types (compositions, performances etc.) are pooled and analyzed together; that way the system can exploit correlations between types. So the best way to enable personalized discovery is to establish a 'Music Preference Service' (MPS): a consortium of music platforms that pool their data.
The platforms in the consortium would give MPS all data relevant to preference: explicit and implicit ratings, comments and reviews, etc. MPS would define a way to link user identities across platforms. This requires that music platforms standardize on the types of items being rated, and on the way items are identified. This would use the Classical Music Index.
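One plausible linking mechanism, anticipating the hashed-identifier idea in the Data privacy section below: each platform submits salted hashes of a user's contact identifiers, and MPS joins accounts whose hashes match. A sketch; the salt handling and the identifier set are assumptions:

    import hashlib

    SALT = b"mps-shared-salt"  # hypothetical consortium-wide salt

    def identity_keys(email=None, phone=None):
        """Hashed identifiers a platform would submit to MPS, so that
        accounts can be linked without revealing raw identifiers."""
        keys = []
        if email:
            norm = email.strip().lower().encode()
            keys.append(hashlib.sha256(SALT + norm).hexdigest())
        if phone:
            digits = "".join(c for c in phone if c.isdigit()).encode()
            keys.append(hashlib.sha256(SALT + digits).hexdigest())
        return keys

    # Two platforms submitting the same email produce the same key,
    # so MPS can link the accounts.
    assert identity_keys(email="Pat@example.com") == identity_keys(email="pat@example.com ")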
BTW, in the very early days of Amazon, I proposed to Jeff Bezos that we use this approach to combine movie ratings (from my movie-recommendation web site, rare.com) with his book ratings. He liked this idea but didn't pursue it; instead he offered me a job, which I declined.

The MPS API

The Music Preference Service would provide a web-based API.
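For concreteness, a sketch of how a client might invoke such an API over HTTP; the endpoint URL, transport, and JSON shapes are assumptions (the calls themselves are specified below):

    import json
    import urllib.request

    MPS_URL = "https://mps.example.org/api"  # hypothetical endpoint

    def predict(user, item, purpose):
        """Call the (hypothetical) MPS predict endpoint; returns
        (predicted_rating, confidence), both in 0..1."""
        req = urllib.request.Request(
            MPS_URL + "/predict",
            data=json.dumps({"user": user, "item": item,
                             "purpose": purpose}).encode(),
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            r = json.load(resp)
        return r["rating"], r["confidence"]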
    result = predict(user, item, purpose)

user identifies a user. item identifies an item of some type (composition, recording, concert, venue, person). purpose describes the user's proposed use of the item: to play it, to hear a streamed recording, to hear a live performance, etc. result is a pair: the predicted rating (0..1) and the confidence in that prediction (0..1). Or it could be a confidence interval.
    results = top_predict(user, type, purpose, n)

Return a list of the n items of the given type for which predict(user, item, purpose) is greatest.
    results = top_predict_group(users, type, purpose, n)

Return a list of the n items of the given type for which the minimum (or average, or RMS) of predict(user, item, purpose), taken over the given users, is greatest. This can be used, for example, to recommend compositions that a soloist and accompanist will both like, or to find concerts that everyone in a group will like.
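A sketch of the aggregation this call implies, built on a per-user predict function; exposing the choice among min, average, and RMS as a parameter is for illustration only:

    import math

    def top_predict_group(users, items, predict, n, agg="min"):
        """Rank items by an aggregate of per-user predicted ratings.
        agg: 'min' (no one will hate it), 'avg', or 'rms'."""
        def score(item):
            preds = [predict(u, item) for u in users]
            if agg == "min":
                return min(preds)
            if agg == "avg":
                return sum(preds) / len(preds)
            if agg == "rms":
                return math.sqrt(sum(p * p for p in preds) / len(preds))
            raise ValueError(agg)
        return sorted(items, key=score, reverse=True)[:n]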
    result = similarity(item1, item2, mode)

Returns an estimate of the 'taste similarity' between a pair of items of the same type. If the type is 'person', there are two modes.
    results = top_similarity(item, mode, n)

Return a list of the n items whose similarity to the given item is greatest.

Data privacy

The data given to MPS by a platform would be private to that platform, and not shared with other platforms other than by its effect on predictions. The platforms would trust MPS to enforce this privacy. Therefore there must be an organizational 'firewall' between MPS and the platforms; for example, there should be no common staff.

MPS needs to link user identities across platforms, but should not store data from which real-world user identities could be inferred. This could be done using hashed versions of identifiers like email address and phone number (as in the hashing sketch above).

Listening context

Ratings and predictions involving listening are complicated by the fact that context can make a big difference:
It would be overkill to ask users what mood they're in when they rate something. But it may be possible to use contextual information to make better predictions:
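For example, a prediction request could carry whatever context is available, and predictions could be adjusted by context-specific offsets learned from context-tagged ratings. A minimal sketch; the context fields and the bucketing are assumptions:

    from dataclasses import dataclass

    @dataclass
    class ListeningContext:
        """Context that might shift a prediction; every field here
        is a hypothetical example, not a fixed schema."""
        hour: int                    # local time of day
        device: str                  # 'earbuds', 'home stereo', 'car', ...
        at_live_event: bool = False
        companions: int = 0          # listening alone or with others?

    def predict_in_context(user, item, ctx, base_predict, offsets):
        """Adjust a base prediction by a per-(user, context) offset.
        'offsets' would be learned from context-tagged ratings; here
        it is just a lookup keyed by a coarse context bucket."""
        rating, confidence = base_predict(user, item)
        bucket = (ctx.device, ctx.hour // 6)  # crude 6-hour buckets
        rating = min(1.0, max(0.0, rating + offsets.get((user, bucket), 0.0)))
        return rating, confidence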
Bootstrapping new items

How can the MPS bootstrap a new composition (perhaps by an unknown composer) or a new recording? It could recommend the item to random people and try to get some ratings. Even if there are no ratings of the composition, we may know something about the composer. For example, if they've rated things themselves, we can guess that their composition is similar to the music they like. We could use the objective attributes of the item (see above); these don't involve ratings. We can encourage composers to create synthesized sound files of their compositions (easy if they use a score editor); this is likely to get ratings faster.

We want to avoid a situation where a new composition gets a few bad ratings and then no one ever sees it again. On the other hand, if a composition is truly bad, we don't want people to waste time on it.
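One standard way to balance these two concerns is to smooth an item's score toward a prior, so that a handful of early ratings - good or bad - can't dominate. A minimal sketch; the prior and its weight are assumptions:

    def smoothed_score(ratings, prior=0.5, prior_weight=10):
        """Bayesian-style average: with few ratings the score stays
        near the prior, so a few early bad ratings don't bury a new
        item; with many ratings the data dominates."""
        return (prior * prior_weight + sum(ratings)) / (prior_weight + len(ratings))

    print(smoothed_score([0.1, 0.2]))   # ~0.44: two bad ratings barely move it
    print(smoothed_score([0.1] * 100))  # ~0.14: many bad ratings do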