The Future is Here: Query by Humming as an example of Content-Based Music Information Retrieval

Samantha Sinanan

Library Student Journal: The future is here: Query by humming as an example of content-based music information retrieval (2010)

Samantha Sinanan
MLIS Candidate, SLAIS
University of British Columbia
Vancouver, British Columbia, Canada

Library Student Journal,
April 2010

Abstract

The following article discusses state-of-the-art technology in a type of content-based music information retrieval known as query by humming (QbH). Systems that support QbH use complex sets of algorithms to match a sung or hummed query to a piece of music contained in the system’s database. The article discusses the intentions, uses, and challenges of QbH and provides a literature review, a discussion of four current QbH systems, and a brief commentary on the implications of QbH for library use. Of the four systems examined in detail, the first two are intended to benefit the research community, while the latter two are aimed at a commercial audience.

Glossary of Terms

Content-based information retrieval:: This kind of information retrieval is based on the content of a sought-after item rather than on metadata, which is the basis of traditional, text-based searching (Birmingham, Dannenberg and Pardo, 2006). In the case of music this means that a sample of a piece of sought-after music is used as a query instead of the name of the piece or the composer. Content-based information retrieval can be used to retrieve several different kinds of media including music, images and video.
Dynamic Time Warping (DTW):: This is a matching technique that takes into account the difference in timing and/or tempo between a query and an item sought. In QbH systems it is used to address the fact that sung/hummed queries will rarely be consistent in tempo and will rarely be sung/hummed at the same tempo as the original piece. DTW was originally used in speech recognition, and accounts for timing differences through a complex set of algorithms (Kosugi, Morimoto and Sakurai, 2004).
Hidden Markov Models (HMM):: This is a type of search algorithm that has a high tolerance for error in a sung query. HMMs allow a flawed query to “achieve a high similarity score against its target” (Birmingham, Dannenberg, Hu, Meek, Pardo, and Tzanetakis, 2007) and distinguish between two types of errors: local (momentary, isolated) and cumulative (those which follow a trend, for example getting progressively flatter). “Johnny Can’t Sing” is one type of HMM that takes both local and cumulative errors into account. HMMs use matching algorithms to pair queries and targets together (Birmingham et al., 2007).
Music Fingerprinting:: This is a type of content-based music information retrieval where the query is based on part of a sought-after recording. This is a more exact technique than QbH, as elements from the actual recording are used to find the piece of music in question. These elements form a “fingerprint” from which the piece of music can be found. This only works when the query is taken from the exact piece of music being sought. Any alternate form of the piece, such as a hummed sample, will not work in this kind of system (Birmingham et al., 2006).
Musical Instrument Digital Interface (MIDI):: MIDI is a symbolic representation of musical notes. A piece of music is stored in a file which contains instructions for which notes to play, including the order, tone, length and volume (Birmingham et al., 2006). In MIDI, music notes are represented by numbers from 0 to 127. A melody is stored as a sequence of numbers (Kankanhalli and Zhu, 2003).
N-grams:: This is a method of indexing in QbH systems. It takes “N” consecutive notes and represents them as a single symbol. The number of N-grams is proportional to the length of the piece of music in question. This can result in large indexes for large music databases, which in turn will slow down query processing time (Kim, Park and You, 2008).
Query by Humming (QbH):: This is a type of content-based music information retrieval whereby music is retrieved based on queries that include part of a song, whether it be sung, hummed or whistled (Birmingham et al., 2006). The QbH system will retrieve songs similar to the sung/hummed query. Most QbH systems have built-in mechanisms to deal with a certain degree of error in sung/hummed queries (Kim et al., 2008).
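
To make the Dynamic Time Warping entry above more concrete, the following is a minimal sketch of the textbook DTW recurrence applied to two pitch sequences. It is not taken from any of the systems discussed in this article; the function name, the frame-by-frame pitch lists and the use of absolute pitch difference as the local cost are all assumptions made for illustration.

```python
def dtw_distance(query, target):
    """Textbook dynamic time warping distance between two pitch sequences."""
    n, m = len(query), len(target)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning query[:i] with target[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(query[i - 1] - target[j - 1])  # local pitch mismatch
            # A frame may be stretched, compressed or matched one-to-one.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical data: one pitch value (MIDI note number) per time frame.
target = [60, 60, 62, 62, 64, 64]
slow_query = [60, 60, 60, 62, 62, 62, 64, 64, 64]  # same tune, sung more slowly
other_tune = [60, 59, 57, 55, 53, 52]
```

In this toy data the slower rendition aligns with the target at zero cost, because the warping absorbs the tempo difference, while the unrelated tune does not.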

Introduction

Music Information Retrieval (MIR) is a growing field in a number of disciplines including Library and Information Studies. Within MIR, there are several subfields, all subjects of current research and cutting-edge technology. University of Illinois scholar J. Stephen Downie has been an MIR pioneer, responsible for a large body of research in the field. This report will focus on a type of content-based music information retrieval known as Query by Humming (QbH). Content-based information retrieval uses a piece of the sought-after item as a query instead of using metadata such as name or composer. This is useful when information such as creator, title or publisher is not known. Content-based information retrieval offers an alternative to traditional, semantic searching that uses text-based queries. In the context of MIR, content-based retrieval systems use a piece of the music, whether it is a sample of the original recording or a hummed tune, to search for the full item. QbH systems use complex algorithms and innovative technology to convert sung or hummed queries into melodies that can be searched in a music database. The name Query by Humming is slightly misleading, as many such systems are not restricted to humming but also support sung or whistled queries. This report will take an in-depth look at the state of the art of QbH, examining intentions, uses, and challenges, as well as providing a review of the current research, several examples of existing systems and a brief discussion on implications of QbH for library use.

Query by Humming: Intentions, Uses and Challenges

QbH is but one type of content-based information retrieval. With the move towards Web 2.0 and its associated interactive technologies, information retrieval as a field is experimenting with new ways to locate and retrieve various types of information. Video and image searching are also making use of content-based retrieval systems. As far as music information retrieval is concerned, QbH offers a solution to the problem of being able to search for a piece of music without knowing anything but the tune itself. Most people can relate to the frustration associated with having an unknown tune stuck in their head. QbH systems now provide a way to search for such tunes.

There are a variety of contexts in which a QbH system is useful for music retrieval. The first and most obvious is for music retrieval in a digital music library. Additionally, Birmingham et al. (2007) suggest several commercial uses for QbH systems including searching for music on the Internet, on a portable mp3 player, or at a kiosk selling music. In fact, search engines which use QbH are now starting to emerge on the Internet. Two examples of such systems are Midomi and Sloud, which are discussed in greater detail later in the report. Kosugi et al. (2004) cite several other uses for QbH, including using it as a game or for personal challenge, using it for training purposes to test pitch, and using it to find a specific point in a song. Birmingham et al. (2006) also suggest QbH use for personal music libraries and for song retrieval in karaoke systems. Public and academic libraries with a music collection may also be interested in employing QbH systems as a service for patrons. Hence, QbH systems are useful within the music community and beyond.

While the idea of QbH is appealing and relatively easy to understand, the inner workings of these systems involve complex sets of algorithms and calculations that are difficult to grasp. There are a number of different techniques used in the various QbH systems discussed in the literature below. Some of these techniques include the use of hidden Markov models, dynamic time warping or N-grams. These terms are defined in ordinary language in the glossary above. The terminology and processes associated with these concepts are highly technical, and a true understanding can only come with a background in this kind of research. This report seeks to discuss the basic technical nature of some of the current QbH systems.

Though each QbH system is slightly different, the end goal is the same: accurate and efficient music retrieval. Most of the literature agrees that having one or more indices that can be searched prior to entering the full database behind the system is useful for faster processing of requests. Kim, Park and You (2008) discuss a two-step approach whereby a request is sent to one index before entering the full database, as well as a three-step approach in which a request is sent to multiple indices before going to the database behind a given system. If the result can be returned from scanning an index, which contains just a portion of the song often called a theme or target, the retrieval time will be significantly reduced. Kim et al. cite the three-step approach as the most efficient. In most QbH systems there is a back end where the majority of the computation is happening, and a user interface which relays information between the user and the system. While it is important to have an attractive and user-friendly interface, the focus in most of the literature is on the back end and the calculations that take place in order to match a hummed or sung query to a piece of stored music.
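
The index-before-database idea described above can be sketched in a few lines. This is an illustration only and not the method of Kim et al.: the toy distance measure, the threshold value, the function names and the example melodies are all assumptions. The point is simply that a short theme index is scanned first and the full songs are searched only when no theme is close enough.

```python
def distance(a, b):
    """Toy melody distance: mismatched positions plus the length difference."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches + abs(len(a) - len(b))


def best_window_distance(query, song):
    """Smallest distance between the query and any same-length window of the song."""
    n = len(query)
    if len(song) <= n:
        return distance(query, song)
    return min(distance(query, song[i:i + n]) for i in range(len(song) - n + 1))


def two_step_search(query, theme_index, full_database, threshold=1):
    """Step 1: scan the short themes. Step 2: fall back to the full songs."""
    title = min(theme_index, key=lambda t: distance(query, theme_index[t]))
    if distance(query, theme_index[title]) <= threshold:
        return title, "index"
    title = min(full_database, key=lambda t: best_window_distance(query, full_database[t]))
    return title, "database"


# Hypothetical data: melodies as lists of MIDI note numbers.
theme_index = {"Song A": [60, 62, 64, 65]}
full_database = {
    "Song A": [55, 57, 60, 62, 64, 65, 67],
    "Song B": [67, 65, 64, 62, 60, 59, 57],
}
```

A query that matches an indexed theme is answered from the index alone; a query taken from a song with no indexed theme falls through to the slower scan of the full database.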

The idea behind content-based information retrieval is “similarity searching” (Kim et al., 2008). In the context of QbH, this means that a system will search for similarity between a sung or hummed query and a piece of stored music. The query is transcribed into a searchable melody, often expressed in symbols such as MIDI, which is a coding system that assigns musical notes to numbers between 0 and 127 (Kankanhalli and Zhu, 2003). Once transcribed, the system can search for the song in question. The way in which sung queries and original music are matched varies depending on the QbH system. There is debate within the MIR community as to the most accurate “melody matching” (Kankanhalli and Zhu, 2003) technique.
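
As an illustration of the transcription step, the sketch below converts detected pitch frequencies into MIDI note numbers using the standard equal-temperament convention in which the A above middle C (440 Hz) is note 69. The function names are hypothetical, and pitch detection itself, which is the hard part of transcribing a hummed query, is assumed to have already taken place.

```python
import math


def frequency_to_midi(freq_hz):
    """Map a frequency in Hz to the nearest MIDI note number (0 to 127)."""
    note = round(69 + 12 * math.log2(freq_hz / 440.0))
    return max(0, min(127, note))  # clamp to the valid MIDI range


def transcribe(frequencies):
    """Turn a list of detected pitch frequencies into a melody of note numbers."""
    return [frequency_to_midi(f) for f in frequencies]


# Hypothetical detected pitches for three sung notes (A4, B4, C5).
melody = transcribe([440.0, 493.88, 523.25])
```

Because each sung pitch is rounded to the nearest note, a slightly sharp or flat note still maps to the intended number, while larger errors produce a wrong note that the matching stage must tolerate.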

There are quite a number of challenges that make the design of a robust QbH system difficult. The biggest challenge is how to deal with inaccurate queries. Users come from a variety of backgrounds and degrees of musical training. This, along with other factors such as familiarity with the sought after song, ambient noise, and key and time signatures of the sung query, will affect the accuracy of retrieval. The theme of built-in error calculation and other ways of dealing with inaccuracies came up repeatedly in the literature. Chew, Narayanan and Unal (2004) did a series of experiments to collect statistical data on the uncertainty factor in sung queries. They found retrieval accuracy for people with musical training to be 94% while it was only 72% for non-trained users (Chew et al., 2004). With a gap this large, accuracy of sung queries continues to be a challenge for system developers and researchers. Pardo and Shamma (2006) have devised an innovative way of dealing with the user training issue. They have developed a game called Karaoke Callout that is accessible via a downloadable application for compatible cell phones. This application turns sung query accuracy into a game, whereby a user sings up to 10 seconds of a song into his or her phone, and the performance is rated. The user can then choose to challenge a friend and send the challenge to a contact via text message (Pardo and Shamma, 2006). The rationale behind this game is to train users to improve their ability to sing queries as well as to collect data that can be used to help a QbH system deal with errors and inaccuracies. Each performance is collected and stored on the Karaoke Callout server (Pardo and Shamma, 2006).

Other challenges for QbH systems include, as mentioned above, the processing time for requests. Many researchers including Birmingham et al. (2007) and Kim et al. (2008) cite slow retrieval time as a key challenge for QbH systems. Birmingham et al. (2007) did a comparative analysis of different “melodic similarity algorithms”, and though their results were inconsistent, they did note that some forms of similarity searching were slower than others, and that slow processing time is a problem for QbH systems in general. As mentioned above, Kim et al. (2008) suggest using one or more indices to reduce processing wait times. The size of the database will also affect the speed with which a query can be addressed, with bigger databases taking longer to search. One last challenge is the difficulty posed by polyphonic music, or music that has harmony lines in addition to a melody line. QbH systems must be able to pick out and search for a melody line. Birmingham et al. (2006) discuss MIDI as a possible solution to this problem. Since MIDI is a series of numeric representations of notes, it can break apart a polyphonic piece of music into a number of searchable monophonic (single melody) lines (Birmingham et al., 2006).
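
A rough sketch of how a symbolic representation allows a polyphonic piece to be separated into monophonic lines is given below. It assumes, purely for illustration, that the piece has already been read into simple (onset time, channel, pitch) tuples and that each part sits on its own channel; real MIDI files are not always organized this neatly, and no particular MIDI library is implied.

```python
from collections import defaultdict


def split_into_lines(events):
    """Group (onset_time, channel, pitch) events into one pitch sequence per channel."""
    lines = defaultdict(list)
    for onset, channel, pitch in sorted(events):  # process in time order
        lines[channel].append(pitch)
    return dict(lines)


# Hypothetical two-part piece: channel 0 carries the melody, channel 1 the bass.
events = [(0, 0, 60), (0, 1, 48), (1, 0, 62), (1, 1, 43), (2, 0, 64)]
lines = split_into_lines(events)
```

Each resulting sequence can then be searched as if it were a single melody.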

Literature Review

In his 2003 article Music Information Retrieval, MIR expert J. Stephen Downie examines some of the challenges facing the content-based music information retrieval research community and discusses the growing importance of this field and implications for the future of music information retrieval. Downie’s discussion gives a broad overview of some of the general challenges faced by the MIR community, and he mentions query by humming throughout. Downie has identified a number of “multifaceted challenges” that face MIR researchers. Factors mentioned throughout the literature such as pitch and duration of notes in sung queries as well as the fact that music is “multicultural, multi-experiential and multidisciplinary” all add to the difficulty in creating efficient MIR systems (Downie, 2003). Downie mentions the MELDEX system developed at the University of Waikato in New Zealand in the late 1990s as an example of how a robust and comprehensive MIR system will look in the future. MELDEX is part of the New Zealand Digital Library and contains a variety of searching methods including QbH. At the time of writing, MELDEX had a database containing 100,000 songs, prompting Downie to name it the “gold standard” in content-based MIR systems (Downie, 2003). In his concluding remarks, he observes that MIR, including QbH, is only in its infancy and predicts that new search systems may surpass the big web search engines and change the way we perceive and interact with music (Downie, 2003).

While Downie speaks of MIR in general, there is a large degree of specific research happening in the field of QbH. The last decade has seen a growing amount of scholarship, experimentation and pilot projects to test the accuracy of various matching techniques for QbH. These systems are continually being improved and adapted to deal with rapidly changing technology. As noted above, most of the researchers agree that the biggest challenge is in dealing with errors and inaccuracies in sung queries. Chew, Kuo, Narayanan, Shih and Unal (2003) and Chew et al. (2004) published two reports that deal with the gathering of statistical data to evaluate user uncertainty in hummed queries. The goal in their research was to analyze the variability between users with and without musical training in order to assess differing levels of errors that a QbH system will encounter (Chew et al., 2003). In both reports, they found a significant increase in retrieval accuracy for those with some background in music. They hope to be able to use their work to create a user-centric system that accounts for both variability in musical background and uncertainty in humming. They also plan to make their findings available to other researchers in the field so that their work may contribute to the overall improvement of current QbH systems.

In 2003, Kankanhalli and Zhu published an article detailing an innovative way to deal with the problem of hummed queries being in a different key than the sought-after piece of music. A user will rarely be able to sing a query in the same key in which the original piece was written. The researchers refer to this issue as key transposition. They propose a complex system whereby the key of the hummed query is calculated and the root note is extracted and placed into a music scale (most often a major or minor scale) which is then transcribed into the root note that will be used to match the query with the original piece of music (Kankanhalli and Zhu, 2003). In this way, the music scale is being used as a tool to help resolve the key transposition issue. At the time of publishing, this was a novel technique (Kankanhalli and Zhu, 2003). The proposed system uses a set of calculations to estimate the similarity between the root of the sung query and the root of the original piece of music. The authors cite improved retrieval accuracy using this method and hope that in the future their system can integrate a variety of scale types beyond the major and the minor, such as the Chinese music scale (Kankanhalli and Zhu, 2003).
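
To show why key transposition is a problem and how it can be neutralized, the sketch below uses a different and much simpler device than the scale-and-root method of Kankanhalli and Zhu: it represents each melody by the intervals between successive notes rather than by the notes themselves. This is a common alternative, offered here only as an illustration; the function name and the example melodies are assumptions.

```python
def to_intervals(notes):
    """Represent a melody by the semitone steps between consecutive notes."""
    return [b - a for a, b in zip(notes, notes[1:])]


# Hypothetical data: the same tune as stored, and as hummed three semitones higher.
original = [60, 60, 67, 67, 69, 69, 67]
hummed = [63, 63, 70, 70, 72, 72, 70]

same_notes = original == hummed
same_intervals = to_intervals(original) == to_intervals(hummed)
```

The note-by-note comparison fails because every pitch differs, while the interval sequences are identical, so a match can be found regardless of the key in which the user sings.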

Another challenge that has been addressed by current research in the field is that of slow retrieval times. As mentioned above, there appears to be a consensus around the use of single or multiple indices to cut down on processing wait times. If a song can be found from a sample in an index, results will be produced much faster than if the query has to wade through entire pieces of music in a sizable database. Kim et al. (2008) have designed an efficient system that uses a three-step approach to music retrieval. First, a matching algorithm converts a sung or hummed query into a melody string. This string is what will be used to search for the piece of music. In order to reduce processing time, they decided to build a series of indices to be searched before entering the full database. The indices were made up of samples from the database’s most popular tunes. Their database consisted of 3000 songs (Kim et al., 2008). They performed a series of experiments with one-, two- and three-step methods. The one-step method immediately entered the database to search for music, while the two-step method scanned an index before entering the database. The three-step method sent the query out to multiple indices before entering the database. The three-step approach proved to be the most efficient (Kim et al., 2008). In an age where users expect instantaneous results, wait times may discourage users from returning to the system in question. The researchers’ aim in this experiment was to create an effective indexing system to reduce query processing time.

Another system that uses indexing to increase efficiency is SoundCompass, designed by Kosugi et al. to retrieve karaoke songs. SoundCompass was originally released in Japan in 2000, and the developers are constantly working on improvements and releasing updated versions. SoundCompass is connected to a search engine with “distributed indices” called Keren (Kosugi et al., 2004). Keren consists of one global manager and several database managers. This is in fact a four-step process. A user will hum or sing a query, which will be sent to the SoundCompass server. The request will then be sent to Keren, where the global manager will distribute it among the various database managers. The global manager will merge the results before returning them to the user via the SoundCompass server (Kosugi et al., 2004). Though this sounds complicated, Kim et al.’s results show that a three- or four-step method involving the use of multiple indices does in fact speed up processing times in QbH systems. The use of samples in these indices does present its own challenge, however: deciding which part of the song to include in the sample. The entire process of creating a robust QbH system comes with a unique set of difficulties at each level.
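
The distribute-then-merge pattern described for Keren can be sketched as follows. Nothing here reflects the actual SoundCompass or Keren implementation: the partitions, the toy distance measure and the function names are assumptions, and the "database managers" are simulated by ordinary function calls rather than separate servers.

```python
import heapq
from itertools import islice


def distance(a, b):
    """Toy melody distance: mismatched positions plus the length difference."""
    return sum(1 for x, y in zip(a, b) if x != y) + abs(len(a) - len(b))


def search_partition(query, partition):
    """One 'database manager': score its own songs and return them best first."""
    return sorted((distance(query, melody), title) for title, melody in partition.items())


def global_search(query, partitions, top_k=2):
    """The 'global manager': send the query to every partition, then merge."""
    partial_results = [search_partition(query, p) for p in partitions]
    # heapq.merge combines already-sorted lists without re-sorting everything.
    return list(islice(heapq.merge(*partial_results), top_k))


# Hypothetical data: two partitions of a small song collection.
partitions = [
    {"Song A": [60, 62, 64]},
    {"Song B": [60, 62, 65], "Song C": [50, 52, 53]},
]
results = global_search([60, 62, 64], partitions)
```

Because each partition returns its own results already ranked, the merging step is inexpensive, which is one reason a multi-step design of this kind need not be slower than a single search.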

Birmingham et al. (2006) discuss some of the challenges inherent in designing a working QbH model, but remain positive as they believe in the utility of these systems. They compare QbH to music retrieval that uses Music Fingerprints, which are simply pieces of an original recording used to find the recording in its entirety. They argue in favour of QbH, as it allows for a greater degree of flexibility and uncertainty. QbH, though perhaps less precise than Music Fingerprinting, is more of a general search tactic that can result in greater recall. It is also less restrictive, accounting for the fact that there may be multiple versions of one song. Music Fingerprinting is limited in that it can only return results that are exact matches with the sample used in the query (Birmingham et al., 2006). Birmingham et al. echo many of the concerns of their peers, including the difficulties associated with searching for polyphonic music and slow search times. They recommend a system with searchable themes to cut down on the time it takes to process a request. They also note, however, that deciding on suitable themes can be a challenging process, pointing to multiple themes present in classical music as an example of such difficulty (Birmingham et al., 2006). Their system of choice is VocalSearch, designed at the University of Michigan and Carnegie Mellon University. In spite of the various obstacles QbH designers must overcome, Birmingham et al. argue that existing systems are fairly accurate and contain exciting possibilities for the future of music libraries, music education, and karaoke systems (Birmingham et al., 2006).

While most of the research focuses on a specific system or a matching technique of choice, there appears to be little comparative work done across systems. In 2007, a group of researchers came together to compare a number of searching algorithms using a common test bed called the MUSART project. They cited the lack of shared databases and queries as the reason why there is a gap in comparative literature. The MUSART project allowed the researchers to test five different types of searching algorithms, including N-grams and hidden Markov models, and compare results (Birmingham et al., 2007). While the details and nuances of each type of similarity searching are highly technical, a few general results can be drawn from their work. Previous research has found that retrieval accuracy is highly dependent on the quality of the sung query. The two most important factors affecting query quality are the pitch and duration of notes. The existing melodic similarity algorithms using different techniques are all highly competitive, vying for the most efficient calculation of error probability (Birmingham et al., 2007). Different algorithms will incorporate accuracy of pitch and duration in different ways. MUSART compiled two databases on which to run its tests: one composed of Beatles songs and one of various popular and traditional songs. Two foci of the research were to study how different algorithms deal with errors in sung queries and how the size of a database affects retrieval time (Birmingham et al., 2007). In their comparison, they found that hidden Markov models, which have a high tolerance for error, outperformed the use of N-grams, a method of indexing in which the size of the index is proportional to the length of a piece of music (Birmingham et al., 2007). They found that systems employing N-grams seemed to take longer to process requests than systems using other types of matching algorithms. Kim et al. (2008) also discuss some of the limitations of N-grams in their work. As far as the size of the database is concerned, Birmingham et al. found that while processing time increases with database size, the increase is minimal. Though some of their results were inconsistent, this type of experiment is important as it allows for direct comparison of multiple types of similarity searching using a common test bed.
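
For readers unfamiliar with N-gram indexing, a minimal sketch follows. It is not the implementation tested in the MUSART project or by Kim et al.; the choice of N = 3, the function names and the example melodies are assumptions. It shows both how an N-gram index narrows the search to candidate songs and why the index grows in proportion to the length of the music.

```python
from collections import defaultdict


def ngrams(seq, n=3):
    """All runs of n consecutive notes; a sequence of length L yields L - n + 1 of them."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def build_index(songs, n=3):
    """Map every N-gram to the set of song titles that contain it."""
    index = defaultdict(set)
    for title, notes in songs.items():
        for gram in ngrams(notes, n):
            index[gram].add(title)
    return index


def candidates(query, index, n=3):
    """Rank songs by how many of the query's N-grams they share."""
    votes = defaultdict(int)
    for gram in ngrams(query, n):
        for title in index.get(gram, ()):
            votes[title] += 1
    return sorted(votes, key=lambda t: (-votes[t], t))


# Hypothetical data: melodies as lists of MIDI note numbers.
songs = {"Song A": [60, 62, 64, 65, 67], "Song B": [67, 65, 64, 62, 60]}
index = build_index(songs)
```

Note that an exact N-gram lookup of this kind has no tolerance for a wrong note inside the N-gram, which is consistent with the finding that more error-tolerant models performed better on sung queries.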

A new and exciting direction for QbH in recent years has been its use in mobile technology. Two of the articles studied in this report discuss how QbH is being used through cell phones. Han, Hwang, Kim, and Rho presented a prototype of mobile music retrieval at the 15th International Conference on Multimedia in Augsburg, Germany in 2007. Their system, called Mobile Music Semantic Indexing and Content-Based Retrieval (M-MUSICS for short), allows users to search for music using QbH via their cell phone. Such a system is based on “client-server architecture”, as the phone itself does not have the processing power to handle retrieval requests (Han et al., 2007). The client accesses a user-friendly interface from their phone, into which they sing a query, which is then sent to the mobile server to be processed. If the results returned are unsatisfactory, a user has the option of selecting feedback and trying again. M-MUSICS works on a system of relevance feedback, where feedback is collected from the client and returned to the server which uses an algorithm to reformulate the query and try again for the correct song (Han et al., 2007). The system will first scan an index for the result, and then go into the full database if necessary. Preliminary research and experimentation with M-MUSICS has returned satisfactory results (Han et al., 2007).

The second system using QbH with mobile technology is Karaoke Callout, mentioned in some detail in the previous section. In 2006, Pardo and Shamma, who are also involved in researching the VocalSearch system, devised a downloadable application that helps to train users to sing better queries as well as allows for the collection of data to help a QbH system deal with error probability calculation. Karaoke Callout is currently available for download free with certain cell phones from http://music.cs.northwestern.edu/karaoke/. As with M-MUSICS, this system works on client-server architecture. Pardo and Shamma have used social networking to achieve their dual purposes of training users and gathering data for error calculation. Karaoke Callout is an interactive game that allows cell phone users to challenge friends in their contact lists. They also collect data on the locations of their users, hoping that in the future, they can create challenges by location, and encourage users to become the best singer in their neighbourhood (Pardo and Shamma, 2006). This is an innovative approach to some of the challenges posed by current QbH systems, and Pardo and Shamma have done well at targeting the interactive nature of technology use among potential clients.

As much of the current literature suggests, researchers seem to be in agreement as to the biggest challenges that stand in the way of creating robust QbH systems. There is much competition and some disagreement as to the best and most efficient melody matching algorithms and techniques, but as this is a new and rapidly evolving field, there is still much work to be done and future research will likely continue to tackle error calculation and query processing speed.

Review of Current QbH Systems

Without delving too deeply into technical details, this section will examine the following four QbH systems: VocalSearch, M-MUSICS, Midomi and Sloud.

VocalSearch

VocalSearch is a QbH system that was developed by researchers at the University of Michigan and Carnegie Mellon University with assistance from the National Science Foundation (Birmingham et al., 2006). Birmingham et al. chose this system to study for its relative simplicity in design and user interface, as well as its performance record (Birmingham et al., 2006). VocalSearch contains a database composed of Beatles songs, and has themes (samples) from songs as a starting point for each search. As with other indexed systems, this allows for shorter query processing time. Each song has a theme from the verse and a theme from the chorus (Birmingham et al., 2006). In order to match the sung query with songs in the database, VocalSearch transcribes the query into musical notation and uses a probabilistic string alignment algorithm to compare it to a number of themes in the database. The theme that is determined to be the most similar will be chosen as the best match (Birmingham et al., 2006). VocalSearch has a built-in mechanism for calculating error probability. The system is “trained” on a set of sung melodies for each song in the database. Integrating the errors in these sung melodies helps to train the system on how to take into account potential future errors. Sung melodies are transcribed into musical notation and are recorded against the original pieces in order to build an error probability model (Birmingham et al., 2006). Birmingham et al. note that VocalSearch has enjoyed relative success in comparison with other current QbH systems.
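
The general idea of aligning a transcribed query against stored themes can be sketched with an ordinary edit-distance alignment. This is a simplified stand-in and not VocalSearch's probabilistic algorithm: the fixed costs below (a half point for a note that is one semitone off, a full point for anything else) are invented for illustration, whereas VocalSearch is described as learning its error probabilities from sung training melodies. The function names and themes are likewise hypothetical.

```python
def substitution_cost(sung, stored):
    """Hypothetical costs: near misses are cheaper than clearly wrong notes."""
    gap = abs(sung - stored)
    if gap == 0:
        return 0.0
    return 0.5 if gap == 1 else 1.0


def alignment_cost(query, theme, skip=1.0):
    """Edit-distance alignment of two note sequences (lower means more similar)."""
    n, m = len(query), len(theme)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * skip  # extra sung notes
    for j in range(1, m + 1):
        cost[0][j] = j * skip  # missing notes
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + substitution_cost(query[i - 1], theme[j - 1]),
                cost[i - 1][j] + skip,
                cost[i][j - 1] + skip,
            )
    return cost[n][m]


def rank_themes(query, themes):
    """Order theme names from most to least similar to the query."""
    return sorted(themes, key=lambda name: alignment_cost(query, themes[name]))


# Hypothetical themes (a verse and a chorus per song) and a query with one flat note.
themes = {
    "Song A verse": [60, 62, 64, 65],
    "Song A chorus": [67, 65, 64, 62],
    "Song B verse": [60, 60, 67, 67],
}
ranking = rank_themes([60, 62, 63, 65], themes)
```

Even with one note sung flat, the intended theme ranks first, which is the behaviour an error model is meant to produce.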

Figure 1. Query processing in VocalSearch (Birmingham et al., 2006)

M-MUSICS

As mentioned briefly above, the Mobile Music Semantic Indexing and Content-Based Retrieval (M-MUSICS) system has taken MIR technology to the next level by creating a prototype for QbH searching that can be used via a cell phone. As cell phones are now often used to access the Internet, information retrieval systems must adapt accordingly to stay on the cutting edge of search technology and deliver effective and convenient services to their users. M-MUSICS was developed by a group of researchers from Korea University and Ajou University. They identified a niche in searching and cell phone technology and developed their system accordingly. As the phone itself is not able to process QbH requests, the information is sent to the client’s cell phone server for processing. Han et al. (2007) sought to provide a user-friendly interface to enable searching in a wireless environment. Built on “client-server architecture”, the client side is responsible for interacting with the user through a friendly interface, and the server side is responsible for the back end computation and processing of the query (Han et al., 2007). Users can enter a query not only by using QbH, but also by using other search modes known as query by example (QBE) and query by musical notation (QBMN) (Han et al., 2007). When a user enters a query, it is transcribed into a musical notation string expressed in symbols. The first step involves scanning an index for a possible match. If this fails, the database is then searched. A list of ranked results is returned to the user’s phone via the M-MUSICS server. If none of the results are satisfactory, the user has the option of selecting feedback, which uses an algorithm to re-formulate the query and try again for the right match. This process can be repeated until the correct piece of music is found (Han et al., 2007). Han et al. found their experiment returned satisfactory results, and are hopeful for system improvements as cell phone and MIR technologies continue to evolve. A sample of the M-MUSICS system can be seen in Figure 2 below.

Figure 2. The M-MUSICS System (Han et al., 2007)

Midomi

Midomi (http://www.midomi.com/) is an example of a commercial search engine that uses QbH. Developed by the Melodis Corporation, it is only briefly mentioned in the literature, as it was developed for commercial rather than research purposes. In the true spirit of Web 2.0, Midomi web and the new Midomi Mobile encourage collaboration and user contribution. The Midomi web search engine allows users to search for popular tunes using QbH and also to perform their own renditions of favourite songs to add to Midomi’s database (Midomi — About, 2008). Users are encouraged to sign up for a free account to store their recorded music, share music with others and communicate with friends through instant messaging.

Users also have the option of using QbH to search for music for sale in Midomi’s virtual store (Midomi — About, 2008). Modeled after social networking sites such as Facebook, Midomi combines QbH searching with the ability to be a star and record music, communicate with other users, and purchase music. According to Melodis’ website, Midomi, which is available in ten languages, was named one of the top global innovations of 2007 by Popular Science magazine (Melodis — Products, 2008). Melodis claims that Midomi is the world’s largest searchable music database (Melodis — Products, 2008). Midomi Mobile is an extension of Midomi web and, much like the M-MUSICS prototype, allows users to search for music via their cell phones. Queries are entered by singing, humming or playing samples of a recording, and Midomi Mobile will return results, giving users the option to buy music or view relevant videos on YouTube. Midomi Mobile also provides links to further information about individual musicians and bands (Melodis — Products, 2008). It is not known what type of similarity searching or matching algorithms Midomi and Midomi Mobile use. However, because both speed and accuracy are important in web and mobile environments, it is likely that Melodis is working to utilize the latest developments in QbH technology.

Sloud

Sloud Query by Humming (http://www.sloud.com/) is one of several products developed by the Sloud (Search out Loud) company. Like Midomi, Sloud QbH allows users to search for a piece of music by humming or singing, using an online search interface. Unlike Midomi, however, Sloud does not appear to have a commercial purpose at this time. An examination of Sloud’s website indicates a heavier focus on MIR than Midomi, which is as much about social networking and commercial interest as it is about music retrieval. Sloud uses an ActiveX applet to enable the web browser to accept sung or hummed queries, and Sloud QbH is presently only available for use with Internet Explorer (Sloud, 2009). According to their website, Sloud uses “fuzzy search algorithms” to allow for a certain degree of error in sung queries (Sloud — Query by Humming, 2009). They liken the successful retrieval of search results to a text based system in which a number of common spelling mistakes are tolerated; the system will, however, ultimately search for what the user types, not what they meant to type (Sloud — Query by Humming, 2009). The Sloud interface is user friendly, with indicators that pick up sound and let the user know if they are not singing loudly enough. It provides some useful tips for singing accurate queries and notes that someone with musical ability is likely to meet with greater success than an untrained user. There is an option for advanced searching, which allows the user to tell the system to either regard or disregard the rhythm of the sung query, and to limit results by a number of factors including composer or band (Sloud — Query by Humming, 2009).
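
Sloud does not document its “fuzzy search algorithms” in detail, so the following Python sketch only illustrates the general idea of tolerating a bounded number of errors, with a switch for regarding or disregarding rhythm. The use of edit distance, the interval and duration tokens, the `max_errors` threshold, and the function names are all assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of tokens."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        cur = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def tokens(notes, use_rhythm):
    """Turn (midi_pitch, duration) notes into key-independent tokens.

    Each token is the pitch interval to the next note, optionally paired
    with whether that next note is longer, shorter or the same length.
    """
    out = []
    for (p0, d0), (p1, d1) in zip(notes, notes[1:]):
        interval = p1 - p0
        if use_rhythm:
            change = "longer" if d1 > d0 else "shorter" if d1 < d0 else "same"
            out.append((interval, change))
        else:
            out.append(interval)
    return out


def fuzzy_match(query, candidate, use_rhythm=True, max_errors=2):
    """Accept the candidate if the query is within max_errors edits of it."""
    distance = edit_distance(tokens(query, use_rhythm), tokens(candidate, use_rhythm))
    return distance <= max_errors


# Hypothetical stored melody and a query with correct pitches but poor rhythm.
stored = [(60, 1), (62, 1), (64, 1), (65, 1), (67, 2)]
hummed = [(60, 1), (62, 2), (64, 1), (65, 2), (67, 1)]

print(fuzzy_match(hummed, stored, use_rhythm=True))
print(fuzzy_match(hummed, stored, use_rhythm=False))
```

In this toy example the hummed query fails when rhythm is regarded, because every duration change is wrong, but matches once rhythm is disregarded. This is the kind of trade-off the advanced search option exposes to the user.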

As with M-MUSICS and Karaoke Callout, Sloud works on a client-server system, including a user interface which recognizes and indexes sung notes, and a database which contains stored music (Gordeyev, Lobaryev, and Sokolov). The search system operates on a two-step process whereby an index is scanned before the full database is searched. Queries are converted and matched either in the index or the database, and results are ranked and delivered back to the user via the search interface (Gordeyev et al.). The indicator that picks up sound on the interface also gives the user visual feedback: it operates in real time and uses colour to assess pitch and show the user how well they are singing. This immediate feedback allows the user to correct or improve their singing before the query is complete (Gordeyev et al.). Gordeyev et al. from Sloud Inc. conducted a series of experiments comparing Sloud QbH to some of the more than twenty experimental QbH systems currently in existence. Their results showed Sloud’s performance to be among the best (Gordeyev et al.). As the experiment was performed by three employees of Sloud Inc., the results should be treated with caution. The use of ActiveX technology and visual feedback do, however, work in Sloud’s favour.
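
The colour-based pitch feedback described above could work along the following lines. This Python sketch is only an assumption about the general approach. It uses the standard conversion from frequency to MIDI note number (A4 = 440 Hz = note 69), while the 20-cent tolerance, the colour names, and the function names are hypothetical and are not taken from Gordeyev et al.

```python
import math


def nearest_note(freq_hz):
    """Return the nearest MIDI note number and the deviation from it in cents."""
    midi = 69 + 12 * math.log2(freq_hz / 440.0)
    nearest = round(midi)
    cents = 100 * (midi - nearest)
    return nearest, cents


def feedback_colour(freq_hz, tolerance_cents=20):
    """Map a sung frequency to a colour: in tune, sharp, or flat.

    The colour scheme is an illustrative assumption, not Sloud's.
    """
    _, cents = nearest_note(freq_hz)
    if abs(cents) <= tolerance_cents:
        return "green"  # close enough to the nearest semitone
    return "red" if cents > 0 else "blue"  # sharp vs. flat


for freq in (440.0, 452.0, 428.0):
    note, cents = nearest_note(freq)
    print(freq, note, round(cents, 1), feedback_colour(freq))
```

Run on each short frame of incoming audio, a function like `feedback_colour` would let the interface change colour while the user is still singing. The pitch detection step that produces the frequency estimate is outside the scope of this sketch.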

Implications of QbH for Library Use

QbH presents exciting opportunities for music libraries, or for any type of library that houses a music collection. As digital libraries become more commonplace, technologies such as QbH will complement other content-based information retrieval systems in improving and expanding the ways in which patrons can formulate queries and perform searches. Having a variety of search techniques including semantic searching and content-based searching shows a willingness to recognize the fact that patrons express their information needs in a variety of ways. Though QbH has the potential to revolutionize the way people search for music, such systems will likely never replace the human element of a librarian. Rather than a replacement, QbH and other new ways of searching can be seen as valuable tools to help librarians better serve patron needs.

QbH will likely become part of the larger movement in the information profession to bring library services to the user. The move towards content-based searching, not only in music information retrieval but also in video and image retrieval is but one instance in which new technologies allow users to search for information in non-traditional ways. Mobile technology is an increasingly popular means of bringing information to the user. WorldCat Mobile is an example of how library services are being marketed to patrons via a mobile interface. QbH systems such as M-MUSICS, Karaoke Callout and Midomi Mobile have taken advantage of mobile web browsing to offer their services. As mobile searching is a relatively new phenomenon, it is likely that future research in QbH will involve this method of engaging information seekers. Libraries offering QbH systems would do well to make such services available to patrons in a mobile environment.

Libraries have embraced a multitude of Web 2.0 technologies and services, including social networking utilities such as Facebook and Twitter, to reach their users. In order to make QbH palatable and user friendly, complex systems designed for libraries and research communities might model themselves after commercial entities such as Midomi and Midomi Mobile. As discussed above, Midomi is not just about searching for music, but encourages users to participate in a number of ways including recording and uploading their own music. The interactive nature of Web 2.0 is wildly popular, as is evident from the millions who use social software on a regular basis. Libraries and library services have to compete with commercial outfits to capture the loyalty of their user base. QbH systems aimed at library users will likely benefit from mimicking commercial enterprises in their attempts to reach information seekers and deliver quality products and services in an electronic environment.

Concluding Thoughts

Research in the field of query by humming is still relatively new; however, it is rapidly evolving. In many ways the future is here, and new QbH systems allow users to enter the next level of searching, making text-based information retrieval seem old fashioned. The research seems to indicate that improved web user interfaces and the ability to use wireless and mobile technology to search for music are the direction for future QbH systems. Midomi is a good example of what the future of MIR may look like. Both Midomi and Midomi Mobile integrate several different types of technology, resulting in a product that not only retrieves music, but also encourages user input, contribution and communication. Libraries that house music collections will likely want to take advantage of non-traditional ways of searching and employ systems that use QbH to deliver the highest quality of service to their patrons. Perhaps search systems of the future will not be limited to a specific type of media, such as music, but will allow users to search for music, video, images and text simultaneously using a combination of content-based and text-based search techniques. With large web search engines in existence today, such systems are not difficult to envision. The ongoing challenges for researchers in the field will be to improve matching algorithms and similarity searching in order to reduce query processing times, and to design systems that allow for the greatest degree of error recognition possible. Following the example set by Karaoke Callout, researchers may also want to find more innovative ways of training users to improve their sung or hummed queries. A few short years ago, no one would have been able to imagine a search system that retrieved a piece of music based on a hummed query. In the coming years, we will likely be amazed at advancements made in content-based music information retrieval. As J. Stephen Downie predicted in his 2003 article, MIR systems of the future may outshine the big web search engines and will almost certainly change the way we interact with music (Downie, 2003).

References

Birmingham, W., Dannenberg, R., and Pardo, B. (2006). Query by humming with the VocalSearch system. Communications of the ACM, 49(8), 49-52. doi: http://doi.acm.org/10.1145/1145287.1145313

Birmingham, W. P., Dannenberg, R. B., Hu, N., Meek, C., Pardo, B., and Tzanetakis, G. (2007). A comparative evaluation of search techniques for query by humming using the MUSART testbed. Journal of the American Society for Information Science & Technology, 58(5), 687-701. doi: 10.1002/asi.20532

Chew, E., Kuo, C. C., Narayanan, S. S., Shih, H. H., and Unal, E. (2003). Creating data resources for designing user-centric frontends for query by humming systems. Proceedings of the 5th ACM SIGMM International Workshop on Multimedia Information Retrieval, 116-121. doi: http://doi.acm.org/10.1145/973264.973284

Chew, E., Narayanan, S. S., and Unal, E. (2004). A statistical approach to retrieval under user-dependent uncertainty in query-by-humming systems. Proceedings of the 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, 113-118. doi: http://doi.acm.org/10.1145/1026711.1026731

Downie, J. S. (2003). Music information retrieval. Annual Review of Information Science and Technology, 37(1), 295-340. doi: http://dx.doi.org/10.1002/aris.1440370108

Gordeyev, A., Lobaryev, V., and Sokolov, G. (2005-2009). Sloud query-by-humming search music engine. Retrieved April 16, 2009 from http://www.sloud.com/download/Sloud_QBH_Search_Music.pdf

Han, B., Hwang, E., Kim, M., and Rho, S. (2007). M-MUSICS: mobile content-based music retrieval system. Proceedings of the 15th International Conference on Multimedia, 469-470. doi: http://doi.acm.org/10.1145/1291233.1291345

Kankanhalli, M., and Zhu, Y. (2003). Music scale modeling for melody matching. Proceedings of the Eleventh ACM International Conference on Multimedia, 359-362. doi: http://doi.acm.org/10.1145/957013.957091

Kim, I., Park, S., and You, J. (2008). An efficient frequency melody indexing method to improve the performance of query by humming systems. Journal of Information Science, 34(6), 777-798. doi: 10.1177/0165551507087712

Kosugi, N., Morimoto, M., and Sakurai, Y. (2004). SoundCompass: a practical query-by-humming system; normalization of scalable and shiftable time-series data and effective subsequence generation. Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, 881-886. doi: http://doi.acm.org/10.1145/1007568.1007677

Melodis — Break the Silence — Products — Midomi. (2008). Retrieved April 16, 2009 from http://www.melodis.com/products/midomi

Melodis — Break the Silence — Products — Midomi Mobile. (2008). Retrieved April 16, 2009 from http://www.melodis.com/products/midomi-mobile

Midomi — About. (2008). Retrieved April 16, 2009 from http://www.midomi.com/index.php?action=main.about_us

Pardo, B. and Shamma, D.A. (2006). Karaoke callout: using social and collaborative cell phone networking for new entertainment modalities and data collection. Proceedings of the 1st ACM Workshop on Audio and Music Computing Multimedia, 133-136. doi: http://doi.acm.org/10.1145/1178723.1178743

Sloud — Query by Humming (2009). Retrieved April 16, 2009 from http://www.sloud.com/technology/query_by_humming/

Author's Bio

Samantha Sinanan is a second year MLIS student at the School of Library, Archival and Information Studies at the University of British Columbia in Vancouver, B.C. Samantha has a keen interest in new information retrieval technologies using interactive and user based approaches. She is encouraged by new methods and modes of instruction and is interested in researching innovative and effective ways of engaging information seekers.

