The "F" word: Federated search engines (2011)
Federated search engines, or discovery tools, are a popular new technology for library websites. But are they worth the trouble and money? Federated search engines require a lot of maintenance and cost thousands of dollars a year. They were created to search multiple databases at one time, but they are not always accurate and have many flaws. Oftentimes they leave out valuable sources because they cannot accurately and equally search all databases, and thus give skewed results. For very specialized libraries, such as medical and health science libraries, this can be detrimental to researchers who rely on finding all possible materials on a subject.
Patrons may know federated search engines as a search box on the library's homepage that claims to search like magic through everything the library owns—e-books, databases, e-journals—as well as content available on the open web. The federated search engine attempts to be the Google of your library's resources by making it possible for searchers to find resources from multiple databases with one search, instead of having to search several databases individually. The Maguire Medical Library at the Florida State University College of Medicine uses the federated search engine WebFeat. As a graduate assistant at this library, I was chosen to evaluate WebFeat and to research other federated search tools. This essay will focus mostly on my findings regarding WebFeat; however, the issues discussed apply to many, if not most, federated search engines. If you have ever done similar research, you already know that my findings were unpleasant.
The effectiveness of federated search engines is the focus of this essay. De Groote & Appelt (2007) conducted a study on the effectiveness of searches via WebFeat versus direct database searches, focusing specifically on searches in the health sciences. I discovered that here at the Maguire Medical Library, four years later, WebFeat was having the same problems described in the study. I decided to conduct my own test on our WebFeat tool. I started by entering a simple term, such as "gout," in our WebFeat federated search engine box. Then I performed the same search individually in several of the databases that WebFeat was set up to search. In this fashion I found that WebFeat retrieved less than 1% of the results of the same search performed directly in the databases. I furthered my research on federated search engines by running similar tests on other academic health and medical libraries' federated search engines—Velocity (Vivisimo), 360 Search (Serials Solutions), Summon (Serials Solutions), and Explorit (Deep Web Technologies) (see Appendix).
As researchers have argued for the last several years, patrons have grown to expect one-box searching. Spoiled by Internet search engines like Google, people are "accustomed to Google and other Web search engines [that] tend to weigh results in favor of simplicity and ease of use rather than usefulness" (Vaughn & Callicot, 2004). Indeed, the convenience of one-box searching is familiar and would seem to make searching multiple databases much quicker, but federated search engines that make this possible are not as magical as we'd like to believe. Federated search engines (FSEs) often retrieve millions of irrelevant or unorganized results; they do not handle advanced or complicated searches well; they do not offer useful search features that can be found in many databases; and they take a lot of time and money to maintain.
Because FSEs usually return results in real-time order, not by relevance, a patron who searches for something simple in a library's federated search engine, like "dogs," will find all of the library's books, journals, e-books, and e-journals with the word "dogs" somewhere in the text or metadata, in no particular order of relevance. "Real-time" means that the results a federated search engine brings back are ordered from the items most recently added to their source database down to the oldest. FSEs also inflate result counts by duplicating records that appear in more than one source, despite the incorporation of features designed to limit duplication. So a search often yields millions of irrelevant results in no particular order.
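The merging behavior described above can be sketched in a few lines. This is an illustrative sketch of how a hypothetical FSE might combine per-database result lists—newest-first ordering and naive exact-match de-duplication—not the actual code of WebFeat or any real product; all record data is invented.

```python
def federated_merge(result_lists):
    """Merge per-database result lists the way many FSEs do:
    newest-first ("real-time") order, not relevance order."""
    merged = [r for results in result_lists for r in results]
    # Sort by the date each record was added to its source database
    # (ISO date strings sort correctly as plain strings).
    merged.sort(key=lambda r: r["date_added"], reverse=True)
    # Naive de-duplication by exact title match misses records whose
    # metadata differs slightly between databases, so duplicates slip through.
    seen, deduped = set(), []
    for r in merged:
        if r["title"] not in seen:
            seen.add(r["title"])
            deduped.append(r)
    return deduped

pubmed = [{"title": "Gout therapy", "date_added": "2011-03-01"}]
cinahl = [{"title": "Gout Therapy", "date_added": "2010-07-15"},  # same article, different casing
          {"title": "Dogs in therapy", "date_added": "2011-01-09"}]

results = federated_merge([pubmed, cinahl])
# "Gout therapy" and "Gout Therapy" both survive the merge: exact-match
# de-duplication does not recognize them as the same record.
```

Note that nothing in this merge considers what the patron actually searched for: ordering is purely by recency, which is why a broad search returns a mountain of results with the relevant ones scattered throughout.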
Going back to our "dogs" search scenario, seeing this veritable onslaught of results, a patron would probably then refine her or his search by adding words using advanced search options and/or Boolean operators. But therein lies another problem: when the searcher starts complicating the search (as one might in another search engine like Google), he or she runs the risk of confusing the FSE and ending up with too few or irrelevant results. According to De Groote and Appelt, WebFeat did not successfully handle a search that included both the Boolean operators AND and OR:
When "heart attack" was entered in the first text box and the Boolean operator OR was selected before entering "myocardial infarction" and "Viagra" in the second and third text boxes of WebFeat, the search was run as "heart attack or (myocardial infarction and Viagra)" and not "(heart attack or myocardial infarction) and Viagra," resulting in highly irrelevant retrieval (2007).
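The grouping problem De Groote and Appelt describe comes down to operator precedence. The sketch below is not WebFeat's actual code; it simply demonstrates, on three invented records, how the same three terms grouped two different ways match very different sets of results.

```python
records = [
    "viagra and risk of myocardial infarction",
    "heart attack recovery in older adults",
    "heart attack risk after viagra use",
]

def matches(text, phrase):
    # Simple substring match stands in for a database's search.
    return phrase in text

# Intended grouping: (heart attack OR myocardial infarction) AND viagra
intended = [r for r in records
            if (matches(r, "heart attack") or matches(r, "myocardial infarction"))
            and matches(r, "viagra")]

# Grouping reported for WebFeat: heart attack OR (myocardial infarction AND viagra)
actual = [r for r in records
          if matches(r, "heart attack")
          or (matches(r, "myocardial infarction") and matches(r, "viagra"))]

# "heart attack recovery in older adults" mentions no drug at all,
# yet the second grouping retrieves it anyway.
```

With the intended grouping, only the two records that mention Viagra are retrieved; with WebFeat's grouping, every record containing "heart attack" is swept in regardless of the other terms, which is exactly the irrelevant retrieval the study observed.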
Other advanced search commands, such as field searching and truncation, are rarely possible in an FSE. De Groote & Appelt's findings confirm that, for whatever reason, the WebFeat search engine either failed to search databases or changed the search strategy being run behind the scenes "in a way that often resulted either in zero results or large and irrelevant results." This matches the findings of my own test run with the tool described above.
Concept boxes are multiple search boxes provided in a search engine in order to search multiple phrases or words with varied qualifiers ("search this exact phrase," "author," etc.) at one time. Many databases offer advanced search options, such as unlimited concept boxes; FSEs, however, do not. This lack of limits means an increase in irrelevant results, which lengthens the search rather than expediting it. This is especially harmful to searchers in specialized libraries, such as health sciences and law libraries, which particularly need timely and accurate information. The limits FSEs do have, like search field limiters (see Figure 1), can sometimes be useless, and offering them as options can be misleading. For instance, the "keyword" search field limiter in WebFeat—its default search field option—typically returned the fewest results. And, as I've noticed, patrons rarely change the default search option, opting instead for "safe" settings such as Keyword or Full-Text.
In addition, most FSEs don't offer other useful features, such as searching by standardized subject headings or mapping, that individual databases provide. Many databases, such as PubMed, have their own unique standardized subject headings. For instance, PubMed's subject heading for the term "heart attack" is Myocardial Infarction. If you type "heart attack" into the search box, PubMed will automatically search for all sources cataloged with the term "myocardial infarction" as well as sources containing the term "heart attack." PubMed also has automatic mapping features that show connections to other related subject headings, because you can't expect searchers to know the entire subject heading vocabulary of any given database. WebFeat tended to perform well if the correct subject heading was used. Unfortunately, each database has its own unique set of subject headings, so WebFeat could theoretically only perform the best search for one database at a time. To find the correct subject headings for every database the FSE searches, you'd have to go into each database anyway. As De Groote and Appelt concluded: "The searcher would need to search each of the databases directly in order to perform a thorough search on [the] topic." Hmm, that's like not having an FSE at all.
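The subject-heading expansion PubMed performs can be sketched as a simple lookup. The mapping table below is a tiny invented sample, not the real MeSH vocabulary, and the function is an illustration of the general idea, not PubMed's actual implementation.

```python
# Invented sample of a controlled vocabulary mapping, in the spirit of MeSH.
SUBJECT_HEADINGS = {
    "heart attack": "Myocardial Infarction",
    "gout": "Gout",
    "high blood pressure": "Hypertension",
}

def expand_query(term):
    """Search both the user's term and its controlled heading, the way a
    database with subject mapping does. An FSE that passes the raw term
    unchanged to every database loses this expansion."""
    heading = SUBJECT_HEADINGS.get(term.lower())
    if heading and heading.lower() != term.lower():
        return f'"{term}" OR "{heading}"'
    return f'"{term}"'

print(expand_query("heart attack"))  # "heart attack" OR "Myocardial Infarction"
```

The catch the essay describes is that each database would need its own `SUBJECT_HEADINGS` table, so a federated tool broadcasting one raw query cannot apply the right expansion for every database at once.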
Finally, after spending thousands of dollars of your library's budget on this magic tool, you, the librarian, will still have to set most of it up yourself, which can be complicated and time-consuming (it has taken me 11 months... and we still don't have connections to all of the databases we want our tool to search!). Additionally, when inputting the databases you subscribe to into the federated search engine, you may find that your FSE is not compatible with all of them. For example, the Maguire Medical Library's FSE can only search 17 databases—a far cry from the many to which we subscribe. Furthermore, databases can change their data and search algorithms, and the FSE must take those changes into account for its connection to the database to continue working properly. And because all databases store and present data differently, it takes constant behind-the-scenes effort to make and keep results compatible—effort that FSEs stake their claim to fame on, but that you may end up doing yourself, because FSEs do not continually revise their connections with all of the databases they search. Basically, unless all databases are created equal (i.e., built with the same metadata structures and search algorithms) and never changed, there is no faultless way for an FSE to search through multiple ones at once.
For the libraries to which an FSE is a necessary evil, here is a checklist I have come up with for libraries evaluating new FSEs, or reevaluating their existing FSEs:
After an FSE is implemented:
Woods (2010) points out that "issues of relevancy ranking, de-duplication, cost, statistics, and issues involving latency and connectivity remain unresolved, and the author believes they cannot be overcome by federated search as it is implemented today." Perhaps federated search engine vendors should recognize that they must start catering to each library or kind of library as a separate entity with separate information needs, rather than mass-producing FSEs that are ill-suited in one way or another to all libraries.
Although it looks pretty and sounds high-tech, the FSE may actually harm searchers because it is a crude tool, still in its infancy, whose shortcomings have not been treated. An FSE may be of more use to public libraries than to academic or specialized libraries: public libraries cover such varied information that a discovery tool is genuinely useful. But the specificity and already very narrow categories used in medical and other specialized libraries make federated searching less a necessity and more a hindrance.
I was born and raised in South Florida (Ft. Lauderdale). I attended the Florida State University where I earned my undergraduate degree in English Literature with minors in Business and Spanish, and my Master of Science in Library Studies. I currently reside in San Diego, California.
Copyright 2013, Library Student Journal