David P. Anderson
1 January 2024
Sub-essay: A database schema for classical music

Describing classical music in a general way involves various item types:
Each item has various attributes, including links to items of other types (e.g. a recording to a composition and its performers, a composition to its composer, etc.). This information is called 'metadata'.

Metadata for classical music is more complex than, say, for popular music. Many music platforms (e.g. Spotify) were designed for popular music, and ignore many aspects of classical music (such as multi-movement works). This limits the search mechanisms that these platforms can offer; see an essay on NPR. As far as I can tell, current music platforms (IMSLP, YouTube, Spotify, MuseScore, etc.) don't share a standard for musical metadata; each platform has its own schema. CDs and MP3 files have a provision for metadata, but it's limited and not standardized.

Standardizing classical music metadata would have a number of advantages. For example:
I propose forming an entity - the Classical Music Index (CMI) - to create and maintain a database of classical music metadata. The CMI would be an independent non-profit organization, funded by a consortium of music platforms and other music-related organizations.

Goals

Expressive power

CMI would provide Web-based interfaces and APIs for querying the database. It should allow queries like "Show arrangements, for piano 4 hands, of string quartets composed by 19th-century French women" and similarly complex queries. Such queries should run in a reasonable amount of time (seconds, not hours).
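To make the expressive-power goal concrete, here is a rough sketch of how that query might look against a hypothetical relational schema. The table and column names (and the reading of '19th-century' as birth year) are my own illustration, not a proposed CMI design.

```python
# Illustrative only: the table and column names below are assumptions,
# not a proposed CMI schema.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    name TEXT,
    gender TEXT,
    nationality TEXT,
    birth_year INTEGER
);
CREATE TABLE composition (
    id INTEGER PRIMARY KEY,
    title TEXT,
    composer_id INTEGER REFERENCES person(id),
    instrumentation TEXT          -- e.g. 'string quartet'
);
CREATE TABLE arrangement (
    id INTEGER PRIMARY KEY,
    composition_id INTEGER REFERENCES composition(id),   -- the original work
    arranger_id INTEGER REFERENCES person(id),
    instrumentation TEXT          -- e.g. 'piano 4 hands'
);
""")

# "Show arrangements, for piano 4 hands, of string quartets composed by
# 19th-century French women."
rows = con.execute("""
    SELECT a.id, c.title, p.name
      FROM arrangement a
      JOIN composition c ON a.composition_id = c.id
      JOIN person p      ON c.composer_id = p.id
     WHERE a.instrumentation = 'piano 4 hands'
       AND c.instrumentation = 'string quartet'
       AND p.nationality = 'French'
       AND p.gender = 'female'
       AND p.birth_year BETWEEN 1800 AND 1899   -- one reading of '19th-century'
""").fetchall()
print(rows)   # [] here, since no rows were inserted
```

The point is that once compositions, people, and arrangements are separate, linked item types, this kind of faceted query reduces to a few joins.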
Open editing

Data models

The relational model

In the 'relational model' there are object types (tables), and each object has a fixed set of attributes, some of which can be references to other objects.

The semantic network model

Much of the work in metadata is based on the 'semantic network model', in which data consists of (subject, predicate, object) triples. This model was first used in AI (in the 1960s) to represent knowledge. It was later (in the 1990s) used as the basis for the 'semantic web'. In this context there are W3C standards for representing and querying such data (e.g. RDF and SPARQL).

You can convert a relational schema into a semantic network by making each attribute into a triple. In this way, two different relational schemas can be unified, sort of, if you can agree on predicate names. And the network model is more flexible because you can add connections without changing the schema.
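As a small, hedged illustration of that conversion, here is how one relational row could be flattened into triples (all identifiers are invented):

```python
# A sketch of the (subject, predicate, object) model: one relational row,
# flattened into triples. All identifiers are invented for illustration.
row = {
    "id": "composition/123",
    "title": "String Quartet No. 1",
    "composer": "person/45",             # a reference to another object
    "instrumentation": "string quartet",
}

# Each non-key attribute becomes one triple whose subject is the row's ID.
triples = [(row["id"], predicate, value)
           for predicate, value in row.items() if predicate != "id"]

for t in triples:
    print(t)
# ('composition/123', 'title', 'String Quartet No. 1')
# ('composition/123', 'composer', 'person/45')
# ('composition/123', 'instrumentation', 'string quartet')
```

Two relational schemas that disagree on table layout can then be combined by concatenating their triples, as long as the predicate names line up.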
My take: the semantic network world is populated by academics, not engineers. Academics tend to make things abstract and complex, and to describe them obtusely. We end up with lots of research papers and few usable and scalable systems. As far as I can tell this direction is dying. Web sites are full of broken links and haven't been updated in years. So I prefer to use the relational model.

Initializing the database

IMSLP

Currently the broadest and deepest source of classical music metadata is IMSLP, which has data about published composers (including obscure composers whose works are out of print) as well as contemporary unpublished composers, who can upload their scores to IMSLP and enter the metadata. The IMSLP data has some problems:
However, it's the best current data source that I know of. So I propose that CMI start with a cleaned and structured version of the IMSLP data. We can then merge other existing databases into this.

MusicBrainz

Another possible starting point is MusicBrainz (see below). This is a volunteer-based database of music metadata. They use a relational DB (PostgreSQL) and their schema is rich and well-documented. They have permanent IDs (MBIDs) for everything. For our purposes, MusicBrainz has a couple of shortcomings: it focuses on recordings (not scores) and on popular music, not classical. Their database can't answer the '4-hands arrangements' query above. However, assuming we start with IMSLP data, it would be good to import the MusicBrainz data, add items that aren't in IMSLP, and add MBIDs to the CMI tables where possible.
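As a rough sketch of what 'adding MBIDs to the CMI tables' could look like, here is a hypothetical table with an optional MBID column; 'cmi_composition' and its columns are invented for illustration, not a real CMI design.

```python
# Hypothetical sketch: a CMI-style table carrying an optional MusicBrainz ID.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE cmi_composition (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        composer TEXT,
        mbid TEXT UNIQUE     -- MusicBrainz work ID (a UUID); NULL if unmatched
    )
""")

# An import pass would load IMSLP-derived rows first...
con.execute(
    "INSERT INTO cmi_composition (title, composer) VALUES (?, ?)",
    ("String Quartet No. 14 in D minor", "Franz Schubert"),
)
# ...and later fill in mbid for items it can confidently match to MusicBrainz.
con.execute(
    "UPDATE cmi_composition SET mbid = ? WHERE id = ?",
    ("<matching MusicBrainz work ID>", 1),
)
```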
DB updates and access control

CMI would offer both web interfaces and APIs for adding and modifying items. These interfaces would encourage data consistency; for example, entering a person's name would use auto-complete. Anyone (not just consortium members) would be able to add content to CMI. In particular, musicians would be able to create 'person' entries for themselves, composers would be able to add info on their compositions, and so on. This creates some issues:

I think that (as in Wikipedia) curation is the best way to address these issues: CMI staff (or volunteers) would review and 'vet' new items and edits. Unvetted items would not be visible.
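One simple way the vetting rule could be enforced at the database level - a sketch under my own assumptions, not a CMI design - is a per-item flag that the public query path filters on:

```python
# Hypothetical sketch: items carry a 'vetted' flag, and the public-facing
# query path only sees vetted rows. Table and column names are illustrative.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE item (
    id INTEGER PRIMARY KEY,
    kind TEXT,                            -- 'person', 'composition', ...
    payload TEXT,                         -- the item's attributes (simplified)
    vetted INTEGER NOT NULL DEFAULT 0     -- 0 = awaiting review, 1 = approved
);
-- Public interfaces and APIs would read through a view like this one.
CREATE VIEW public_item AS
    SELECT id, kind, payload FROM item WHERE vetted = 1;
""")

con.execute("INSERT INTO item (kind, payload) VALUES ('person', 'newly added composer')")
print(con.execute("SELECT COUNT(*) FROM public_item").fetchone())  # (0,) until vetted

con.execute("UPDATE item SET vetted = 1 WHERE id = 1")             # a curator approves it
print(con.execute("SELECT COUNT(*) FROM public_item").fetchone())  # (1,)
```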
Platforms and other users would interact with CMI as shown here:

This would introduce some complexity for platforms. Each platform has an existing database, whose schema presumably is a subset of CMI's. They'd need
Related work

Music libraries

The Music Library Association has links to various things related to music metadata; see Music Discovery Requirements and Metadata for music resources. Much of the following is based on following those links.

Ontologies

The U.S. Library of Congress has ontologies (schemas) for things like person names, media types, and music notation forms. FOAF is an RDF ontology for describing people. The Getty Research Institute has 'vocabularies' for things like geographical names. The Music Library Association site has:
The Music Ontology is an RDF ontology for music. Example: the class 'MusicArtist' is described as "A person or a group of people (or a computer :-) ), whose musical creative work shows sensitivity and imagination". It's an academic project, and seems to be dead.

Databases

MusicBrainz

MusicBrainz is a volunteer-based effort to collect and export musical metadata. It offers a GUI app called Picard that allows volunteers to add or edit metadata for CDs in a way that encourages DB consistency. Their schema is relational (they use PostgreSQL) and well-documented. The entire database is exported as JSON and as PostgreSQL dumps.
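For concreteness, here is a minimal sketch of looking up a work's MBID through the MusicBrainz web service (the /ws/2 endpoint is real; the helper name, search terms, and contact address are placeholders):

```python
# A sketch of querying the MusicBrainz web service (https://musicbrainz.org/ws/2/).
import json
import urllib.parse
import urllib.request

def musicbrainz_work_search(title, composer):
    """Search MusicBrainz for works matching a title and a composer name."""
    query = urllib.parse.urlencode({
        "query": f'work:"{title}" AND artist:"{composer}"',   # Lucene-style search
        "fmt": "json",
    })
    req = urllib.request.Request(
        f"https://musicbrainz.org/ws/2/work/?{query}",
        # MusicBrainz asks clients to send an identifying User-Agent.
        headers={"User-Agent": "cmi-import-sketch/0.1 (someone@example.org)"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("works", [])

for work in musicbrainz_work_search("Quartettsatz", "Schubert")[:3]:
    print(work["id"], work.get("title"))   # work["id"] is the MBID
```

An import pass could use lookups like this (or, at scale, the database dumps) to fill in MBIDs on CMI items.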
There are some related projects:

MusicBrainz comes close to fulfilling the goals of CMI and MPS. But there are some gaps:
Doremus

Doremus is a French academic project that is both a schema (an RDF ontology) and (I think) a database of some French archives. Its schema is detailed (you can say what specific instrument was used in a performance, and who tuned it) and correspondingly ornate; see this example. Their search interface doesn't work.

YouTube

YouTube offers a Web API for querying their database of videos. You can search on title, duration, etc. It lacks music-specific features.
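For reference, here is a minimal sketch of a search against the YouTube Data API v3; it assumes you have an API key, and the query terms are placeholders:

```python
# A sketch of a search against the YouTube Data API v3
# (https://developers.google.com/youtube/v3). YOUR_API_KEY is a placeholder.
import json
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    "part": "snippet",
    "q": "Schubert string quartet 14",   # free-text search only
    "type": "video",
    "videoDuration": "long",             # coarse buckets: short / medium / long
    "maxResults": 5,
    "key": "YOUR_API_KEY",
})
url = f"https://www.googleapis.com/youtube/v3/search?{params}"

with urllib.request.urlopen(url) as resp:
    results = json.load(resp)

for item in results.get("items", []):
    print(item["id"]["videoId"], item["snippet"]["title"])
```

This shows the limitation mentioned above: you can filter on free text and coarse duration buckets, but there is no way to ask for, say, a particular movement of a particular work.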
Movie/TV metadata

Movie and TV metadata is related to music metadata: there are works, people (cast, crew), ratings, etc. It's interesting to look at existing data sources.