National public library statistics: A literature and methodology review, 1999-2009
Public libraries gather a large amount of quantitative data every year. Some statistics are reported to governing agencies, such as the Texas State Library, or as required for membership in regional library systems, but much of the data goes under-analyzed or even unanalyzed. This review examines the literature and methodologies concerning national public library data. With awareness of national public library data storehouses, basic statistical know-how, and the many ways these data can be used, public libraries are better placed to identify gaps, analyze their impact, and focus future marketing, advocacy, and administrative efforts.
Public libraries gather a large amount of quantitative data every year. Some statistics are reported to governing agencies, such as the Texas State Library, or as required for membership in regional library systems, but much of the data goes under-analyzed or even unanalyzed. This review examines the literature and methodologies concerning national public library data.
Librarianship is both a qualitative and a quantitative field. With emerging concepts such as outcome assessment and evidence-based practice, the discipline is showing an inclination toward the quantitative (Dilevko, 2007). Libraries gather data, but how are those data used? Little statistical analysis is done at the local level; most data are passed on to national storehouses of statistics. State and national agencies either file the data away, occasionally after publishing summary reports, or make them available electronically through online data storehouses. Those storehouses are sometimes used by ranking indexes to generate lists of the supposedly best libraries in various categories.
Attitudes have changed since the 1960s, when an American Library Association “Library Statistics” handbook urged libraries not to gather data about patron registrations because they “bear little relation to library use” (Lear, 2006, p. 476). Although the attributes gathered and ranked by the various national data stores and rating systems overlap considerably, no comprehensive list of them exists. In 2006, on behalf of the Public Library Association, Richard Boss compiled a list of “required data elements” that public libraries should collect at a minimum. Its sixty-nine items cover population and registered borrowers, budget, facilities, collections, and usage, and the recommendations also include tips on collecting, manipulating, and processing the data. The extent to which public libraries follow these recommendations is not known.
Also not well known is what public libraries do with the information they collect and report (Liu & Zweizig, 2000; Liu, 2001). Although various handbooks and textbooks cover inferential and descriptive statistics for libraries, few empirical studies published since 2000 examine the analysis done with the data reported to state and national agencies. This review examines the literature about public libraries and statistics: what is collected, and how that information is used.
Research was located using various combinations of the following keywords: library, libraries, statistics, quantitative, qualitative, research, science, public, evaluation, measurement, analysis, assessment, services, survey, standards, performance, and metrics. The review includes articles about statistical techniques, usage, and rating systems published from 1999 to 2009. The databases searched were LISTA (Library, Information Science & Technology Abstracts with full text), Library Literature, Academic Search Premier, and ERIC.
Where are the data?
There are two principal national public library data clearinghouses (Davis, 2008; Lynch, 1999): the Public Library Data Service report (PLDS) and the National Center for Education Statistics (NCES) public library data compilation. PLDS is published by the Public Library Association and is an annual compilation of data voluntarily self-reported by over 800 public libraries nationwide. In 2010, the project’s Advisory Board will revise how it collects the data and will mail survey instructions to over 9,000 public libraries (S.G. Waxter, personal communication, October 19, 2009). The PLDS data are available as an annual print report or through an online database, and for a fee, the survey coordinator will prepare a customized report with consultation and data analysis.
The Institute of Museum and Library Services (IMLS) took over the NCES public library data compilation in 2007 (Davis, 2008). This nationwide system collects data from all public libraries. Formerly called the Federal-State Cooperative System for Public Library Data (FSCS), it is considered by some to be the best data collection effort available today (Liu & Zweizig, 2001; Molyneux, 2005). This federal data collection system started as a Department of Education effort in 1988.
How do the data differ?
As stated above, the PLDS report currently includes around 800 public libraries. In 2007, these accounted for only 9.6% of all public libraries nationwide; in 2008, the response rate was 9.2% (Davis, 2008). Such a small share makes it more difficult to find all of one’s peer institutions, make worthwhile comparisons, and identify trends. In past years, the PLDS survey was not sent to all public libraries nationwide; the first attempt to reach over 9,000 public libraries will take place in 2010.
A benefit, however, is that the PLDS report is issued far ahead of the IMLS data (Davis, 2008), about six months after the data are gathered, whereas IMLS data can take up to two years to appear (Liu & Zweizig, 2001). The relatively fast turnaround of the PLDS allows for quick initial comparison, even though the number of participants is fairly small. Timely publication is essential for useful analysis; as Molyneux put it, “We need now data” (2005, p. 11).
The small response rate is not the only statistical issue with the PLDS report. Because it is a voluntary-response survey, its data set is a non-random sample (Molyneux, 2005), and without random selection the sample is likely to be biased. Davis (2008) showed that the bias favors large public libraries (those serving over 100,000 people), so the PLDS report is a less reliable source for smaller public libraries. IMLS data have a response rate of over 97% (Liu & Zweizig, 2001), which makes them, although slow to appear, far more representative of public libraries nationwide.
From time to time, the figures reported by PLDS and IMLS differ. One noted discrepancy involves budgets and expenditures: public libraries sometimes report budgetary planning figures and sometimes actual expenditures. Standardizing input and output measures would make it easier to identify true peer libraries through these national storehouses (Spindler, 2009). Lyons (2008a) also pointed out that differences in attribute definitions, reporting errors, and sampling issues may lead to inaccurate figures in IMLS data. For example, one library might count renewals of library materials as circulation while another does not.
How could data be more effectively analyzed?
Although some feel that few people in librarianship do pure scientific data analysis (Molyneux, 2005), there are various basic statistical analyses that libraries could perform at the local level on data drawn from national storehouses. Cross-tabulation, chi-square tests, correlations, and variance testing (Byrne, 2007) are some methods that would yield descriptive and inferential conclusions about the data. It is important to apply the appropriate technique and to interpret the results properly, in order to draw correct inferences and avoid reading undue meaning into correlations. Librarians should also be able to determine whether their results are statistically “significant,” a scientific way of saying a result is “unlikely to have occurred by chance” (Byrne, 2007, p. 47).
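To make the chi-square test and correlation concrete, here is a minimal sketch in Python using only the standard library. The cross-tabulated counts and per-capita figures are hypothetical, invented purely for illustration, and the function names are our own; a statistics package would normally also report an exact p-value. The cutoff 3.841 is the standard chi-square critical value for one degree of freedom at the 0.05 level.

```python
# Cross-tabulation, chi-square test of independence, and Pearson correlation
# in pure Python. All counts and values are HYPOTHETICAL, for illustration only.

# Critical chi-square value for 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_DF1_ALPHA05 = 3.841


def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Rows: large libraries (over 100,000 served) vs. smaller libraries.
# Columns: submitted a voluntary survey vs. did not.
crosstab = [[60, 40],
            [20, 80]]
chi2, dof = chi_square(crosstab)
print(f"chi-square = {chi2:.2f}, df = {dof}")
if dof == 1 and chi2 > CHI2_CRITICAL_DF1_ALPHA05:
    print("Association is significant at the 0.05 level (unlikely to be chance).")

# Correlation describes co-movement only; it does not establish causation.
program_attendance = [0.15, 0.22, 0.30, 0.41, 0.50]  # per capita
circulation = [5.1, 6.0, 6.8, 7.9, 8.4]              # per capita
print(f"r = {pearson_r(program_attendance, circulation):.3f}")
```

Comparing the statistic with a critical value is the simplest way to judge significance by hand; a high r, by contrast, only shows that two measures move together, not why.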
Benchmarking analysis and ratio metrics (Spindler, 2009), along with differentiating input and output measures (Lyons & Kaske, 2008), are a few of the simple statistical methods public libraries could employ to analyze local and national data. Liu and Zweizig (2001) reported even more basic methods: per capita figures, percentage analysis, and averages. These are minimal statistical processes with which public libraries should become familiar.
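The basic methods named above (per capita figures, percentage change, averages, a simple ratio metric, and a benchmark against peers) amount to a few lines of arithmetic. The sketch below assumes hypothetical figures for one library and four peers; benchmarking against the peer median is one reasonable choice among several, not a method prescribed by the sources.

```python
from statistics import mean, median

# HYPOTHETICAL annual figures; not drawn from PLDS or IMLS.
our_library = {"population": 120_000, "circulation": 960_000, "expenditures": 4_200_000}
peers = [
    {"population": 100_000, "circulation": 700_000},
    {"population": 150_000, "circulation": 1_350_000},
    {"population": 110_000, "circulation": 660_000},
    {"population": 130_000, "circulation": 1_040_000},
]


def per_capita(value, population):
    """Normalize a raw count by the population served."""
    return value / population


def percent_change(previous, current):
    """Year-over-year change, as a percentage of the earlier value."""
    return (current - previous) / previous * 100


def benchmark(own_value, peer_values):
    """Our value as a percentage of the peer median (100 = at the median)."""
    return own_value / median(peer_values) * 100


our_circ_pc = per_capita(our_library["circulation"], our_library["population"])
peer_circ_pc = [per_capita(p["circulation"], p["population"]) for p in peers]
# A ratio metric: dollars spent per item circulated.
cost_per_circ = our_library["expenditures"] / our_library["circulation"]

print(f"Circulation per capita: {our_circ_pc:.2f} (peer average {mean(peer_circ_pc):.2f})")
print(f"Benchmark vs. peer median: {benchmark(our_circ_pc, peer_circ_pc):.1f}%")
print(f"Cost per circulation: ${cost_per_circ:.2f}")
print(f"Change from a prior-year circulation of 900,000: {percent_change(900_000, 960_000):.1f}%")
```

Per capita figures put libraries of different sizes on a common footing, which is what makes the averages and the peer benchmark meaningful.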
Software packages can also aid in analysis (Byrne, 2007). SirsiDynix unveiled its Normative Data Project in 2005, a subscription database that combines Census data with IMLS data attributes for deeper analysis (Imholz & Arns, 2007; Molyneux, 2005). Baker & Taylor’s Bibliostat Connect offers a similar platform, except that it combines PLDS data with Census data (Molyneux, 2005).
How can the analyzed data be used?
Numerous articles describe uses of the data beyond satisfying governmental bodies, and a few trends emerge. Peer comparison is the most common use (Dilevko, 2007; Lance & Lyons, 2008; Liu, 2001; Liu & Zweizig, 2000; Molyneux, 2005). These comparisons take the form of finding libraries of similar size, portraying a library’s supposed standing against its peers, and claiming bragging rights.
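Finding libraries of similar size can be as simple as filtering by population served. In this hypothetical sketch, peers are libraries whose service population falls within 25% of ours; the library names, populations, and the 25% tolerance are all arbitrary assumptions, and a real peer search would likely add criteria such as budget or region.

```python
# Select peer libraries by population served. Names, populations, and the
# 25% tolerance are HYPOTHETICAL choices for illustration.
candidates = [
    ("Library A", 95_000),
    ("Library B", 118_000),
    ("Library C", 240_000),
    ("Library D", 131_000),
    ("Library E", 40_000),
]


def find_peers(own_population, libraries, tolerance=0.25):
    """Return names of libraries whose population is within +/- tolerance of ours."""
    low = own_population * (1 - tolerance)
    high = own_population * (1 + tolerance)
    return [name for name, population in libraries if low <= population <= high]


print(find_peers(120_000, candidates))
```

For a library serving 120,000 people, this keeps Library A, Library B, and Library D and drops the much larger and much smaller systems.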
There are many other practical uses for public library data. Management, budgetary forecasts and defenses (Molyneux, 2005), and marketing and advocacy decisions (Lyons, 2008a, 2008b) are but a few. These can be done on a local, state, and national level. Staff size comparisons, ratio of professional staff to paraprofessional staff, and personnel policy contrasts can also be compiled from data collected. Planning for services improvement is another potential use for public library data. Circulation output measures, reference desk statistics, customer assists, and other indicators can show public libraries where their strengths and weaknesses may lie.
Public library data also have intrinsic and historical value. The data can show the impact of major local events on a community, such as the effect of economic changes or immigration on usage statistics (Lear, 2006). They also give administrators and directors a deeper understanding of their communities, operations, and services. Classes attended, programs preferred, and materials circulated all indicate to community leaders what their constituents are interested in and value. National library data also support funding requests by public libraries.
Statistical analysis can lead to trend identification and future planning (Dilevko, 2007; Lear, 2006; Lyons, 2008a; Molyneux, 2005; Spindler, 2009). Evidence-based decision making is also an evolving movement. By studying changes over time, public libraries can make more accurate predictions for the future.
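One simple form of trend identification is fitting a least-squares straight line to several years of a measure and extending it one year forward. The sketch below assumes hypothetical annual circulation totals; a linear projection assumes the past trend continues, which economic or local events can easily break.

```python
# Least-squares linear trend fitted to a HYPOTHETICAL series of annual
# circulation totals, then extended one year ahead.

def linear_trend(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


years = [2004, 2005, 2006, 2007, 2008]
circulation = [812_000, 826_000, 845_000, 858_000, 879_000]  # hypothetical

slope, intercept = linear_trend(years, circulation)
forecast_2009 = intercept + slope * 2009
print(f"Trend: {slope:,.0f} more items per year; 2009 projection: {forecast_2009:,.0f}")
```

Refitting the line each year as new figures arrive keeps the projection tied to the most recent evidence.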
In 1999, Liu and Zweizig surveyed public library directors to learn whether, and if so how, they used national data storehouses. Their study is notable because it identifies the least common uses of storehouse data: program planning, interlibrary loan improvement, technical services improvement, and salary negotiation. Their findings also agree with other literature identifying peer comparison and future planning as the most common uses (Liu & Zweizig, 2001).
What about national ranking systems?
The idea of using public library data to determine quality or value has been noted in the literature (Dilevko, 2007), and national ratings are one way of comparing libraries against each other on these measures. Such ratings should be judged critically, however: although they may be designed with statistically sound methods, the attributes chosen for comparison can be subjective; in other words, attributes that matter to one library may not matter to another. Also, the national systems provide data for analysis that supposedly represent the totality of public libraries nationwide.
Rating systems can help libraries understand their operations and services and can guide decision making, but they do not inherently define a library’s excellence or uniqueness (Lance & Lyons, 2008; Lyons & Kaske, 2008). As Lyons noted in his 2008 primer on national ratings, each set of ratings “cannot satisfy all possible viewpoints…on quality” (Lyons, 2008a, ¶ 3). Further, rankings may be of limited value for comparison: libraries are so distinct in their communities, and their values differ so widely, that comparison criteria might be better set at the local level (Lance & Cox, 2000).
For years, the most recognized national public library rating index was Hennen’s American Public Library Ratings (HAPLR) (Lyons, 2008b). Starting with 1996 data from NCES, Thomas Hennen has compared input and output variables (such as operating expenditures, number of volumes, and reference transactions) against each other, assigned weights to these comparisons, and combined them into a total score. Libraries with the top 100 scores are rated as the best in their class. Hennen continues to produce the ratings annually, now using data from the IMLS storehouse.
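How weighting drives a composite score can be seen in a small sketch. This is not Hennen’s actual formula: the three measures, the weights, and the use of percentile ranks within a group are assumptions chosen only to illustrate a weighted total score of the general kind described above.

```python
# Generic weighted composite score (NOT Hennen's actual HAPLR formula).
# Measures, weights, and figures are HYPOTHETICAL.

def percentile_rank(value, group_values):
    """Percentage of group values at or below `value` (0-100)."""
    at_or_below = sum(1 for v in group_values if v <= value)
    return at_or_below / len(group_values) * 100


def weighted_score(library, group, weights):
    """Sum of weight * percentile rank across the weighted measures."""
    total = 0.0
    for measure, weight in weights.items():
        values = [lib[measure] for lib in group]
        total += weight * percentile_rank(library[measure], values)
    return total


group = [
    {"name": "A", "circ_per_capita": 8.0, "visits_per_capita": 4.5, "spend_per_capita": 35.0},
    {"name": "B", "circ_per_capita": 7.0, "visits_per_capita": 5.0, "spend_per_capita": 30.0},
    {"name": "C", "circ_per_capita": 9.0, "visits_per_capita": 4.0, "spend_per_capita": 40.0},
    {"name": "D", "circ_per_capita": 6.0, "visits_per_capita": 3.5, "spend_per_capita": 25.0},
]
# Mixing an input (spending) with outputs (circulation, visits), each weighted.
weights = {"circ_per_capita": 3, "visits_per_capita": 2, "spend_per_capita": 1}

scores = {lib["name"]: weighted_score(lib, group, weights) for lib in group}
ranking = sorted(scores, key=scores.get, reverse=True)
for place, name in enumerate(ranking, start=1):
    print(place, name, scores[name])
```

Changing the weights can reorder the list even though no library’s figures change, which is why critics scrutinize how such weights are chosen.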
One way to improve a rating system is to narrow the focus of analysis. In 2008, Lance and Lyons proposed a new rating system to counteract perceived statistical errors in the HAPLR, including biased weighting and informal surveys. They also claimed that Hennen uses extraneous statistics that show little to no correlation with each other, such as circulation per staff hour and circulation per visit (Lance & Cox, 2000; Lyons, 2008b; Lyons & Kaske, 2008). By using only four per-capita output measures (library visits, circulation, program attendance, and public Internet computer use), their “LJ Index” (named for Library Journal, which supports the new system) avoids comparisons of expenditures and staffing that can distort results (Lance & Lyons, 2008). In a 2009 letter to the editor of Library Journal, Hennen defended his ranking system, arguing that weighting variables is necessary, that input measures are useful in comparative study, and that the scoring system of the newer LJ Index is not clear. He conceded, however, that having two major ranking systems is beneficial because it will further the search for a better one.
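The idea of combining only per-capita output measures can be illustrated with standard scores (z-scores). The sketch below is not the published LJ Index formula; it simply converts each library’s four per-capita outputs to z-scores within a hypothetical peer group and sums them, so that measures on very different scales contribute comparably. All libraries and figures are invented.

```python
from statistics import mean, pstdev

# Standardized composite of four per-capita output measures.
# NOT the published LJ Index formula; all figures are HYPOTHETICAL.
MEASURES = ["visits", "circulation", "program_attendance", "computer_uses"]

libraries = {
    "X": {"population": 50_000, "visits": 250_000, "circulation": 400_000,
          "program_attendance": 15_000, "computer_uses": 50_000},
    "Y": {"population": 80_000, "visits": 320_000, "circulation": 480_000,
          "program_attendance": 16_000, "computer_uses": 120_000},
    "Z": {"population": 20_000, "visits": 120_000, "circulation": 200_000,
          "program_attendance": 8_000, "computer_uses": 10_000},
}


def z_scores(values):
    """Standard scores: distance from the group mean in standard deviations."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # all libraries identical on this measure
    return [(v - mu) / sigma for v in values]


names = list(libraries)
composite = {name: 0.0 for name in names}
for measure in MEASURES:
    # Convert raw totals to per-capita rates before standardizing.
    rates = [libraries[n][measure] / libraries[n]["population"] for n in names]
    for name, z in zip(names, z_scores(rates)):
        composite[name] += z

for name in sorted(composite, key=composite.get, reverse=True):
    print(f"{name}: {composite[name]:+.2f}")
```

Because no expenditure or staffing figure enters the score, a library is rewarded only for use of its services relative to the population it serves.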
Researchers studying the use of public library statistics have used different methodologies. Each of the studies in this review used at least one of the following: survey analysis, comparative study, content analysis, audit, theory, or historical documentation. Some articles on national public library statistics, and on statistical analysis for libraries, contain no experimental research at all. Several reports build on earlier work in the field, some recommend future practices, and a few describe the current state of affairs.
Yan Liu (2001), sometimes in collaboration with Douglas Zweizig (2000, 2001), has done empirical research on the use of national public library statistics by public library directors. Their research was based on analysis of surveys given to directors: after receiving responses, they drew a random sample on which to conduct statistical tests. The aspects they explored include the date of last use of national statistics, ways of use, frequency of using certain attributes, satisfaction with the statistics, and statistics used other than PLDS and NCES data (Liu & Zweizig, 2001). To extend this research on American public library directors, Liu then used those responses in a comparative study with Chinese library directors.
In examining national ranking systems, Ray Lyons (2008b) conducted a content analysis of Hennen’s American Public Library Ratings (HAPLR). Lyons deconstructed the statistical methods used in the system, reviewed criticisms, and investigated trends in national public library data. His articles (Lance & Lyons, 2008; Lyons, 2008a, 2008b; Lyons & Kaske, 2008), published in multiple journals, offer a detailed assessment of Hennen’s approach to public library rankings and have shown public librarians how to approach such rankings skeptically and with caution.
In 2007, Juris Dilevko published a study of how inferential statistics were used in library science research articles. His audit, or review of processes, supports the argument that librarians need to learn about inferential statistics in order to understand performance measures and analyze data. Byrne’s 2007 article, a primer on inferential and descriptive statistical methods, gives a basic starting point for librarians wishing to learn more about data analysis; her thorough but simple introduction clearly explains commonly used techniques. Because they aim to improve the process of data collection and analysis, these studies could be seen as a form of action research.
Some theories have been developed regarding national public library data. Through an examination of the people involved in public library data collection and dissemination, Bob Molyneux (2005) concluded that public librarians’ knowledge of data analysis is not sufficient for making operational decisions. Molyneux proposed new methods of data usage and explained novel ways to encourage the development of library data analysts, suggesting that the field welcome the help of private firms and that trained faculty in library schools “fan embers” of interest in statistics. Imholz and Arns agreed in their 2007 piece about the growing field of library evaluation, citing examples of library valuation studies to encourage further research with analytical tools borrowed from economics. These broad reports acknowledge, though, that their findings may not be immediately applicable.
Another form of research on statistical use is historical documentation. Both Sumison (2001) and Lear (2006) wrote historical treatments of the use of public library data, though for different countries and with slightly different emphases. Their balanced reports help explain the events leading up to the current state of statistical use and offer findings for the future. Lynch’s 1999 treatise on where to find statistical data for all types of libraries is brief, informative, and straightforward.
Although there are many more types of research methods, those that appear most often in the discussion of national public library statistical data are historical summaries and audits (see Figure 1). The growing use of mixed methods research could complement this reliance on historical summaries and audits by combining the two, or by pairing either with other methods. For example, a researcher could repeat a historical study of directors’ use of public library data every five to ten years to reveal trends or possible relationships between economic conditions and use of the data.
With the wealth of statistical information available, it also seems that there is room for much more empirical research in the form of surveys and usability testing. A cohort study could measure usage of PLDS data after training in the online version of the report. Case studies could also be used to determine individual libraries’ use of data, or library school directors’ opinions of statistical instruction in their institution.
Library science is not unique in its inquiry into statistical use. The fields of education, business, and medicine also gather large quantities of data and use this information in varied ways (see Figure 2). Although they are not the only disciplines with similar research, the fields analyzed here are comparable in their national data gathering, their evaluation techniques, and the statistical analyses and inferential errors possible with their data.
The field of education, for example, also has a federal agency that collects national statistics. The United States Department of Education’s Institute of Education Sciences houses the National Center for Education Statistics (NCES), the “primary federal entity for collecting and analyzing data related to education” (NCES, 2009, ¶ 2). This agency provides national rankings, such as the “Nation’s Report Card,” and analysis tools for peer comparison and evaluation. There is also a push toward standardizing terminology and data collection, and the agency provides an online tool, the Data Analysis System, that gives the public access to the data. Occasionally these data are used for ranking purposes, as in Borden’s 2008 comparison of community college enrollment growth. At other times, national data are used for correlation studies, such as research into connections between public library programs and early literacy success (Lance & Marks, 2008).
Business, and especially its subfield of management, places analogous value on its data. Business data are gathered for annual reports and used extensively in process evaluation, identifying best practices, benchmarking, and forecasting. Although business is further advanced in applying statistical analysis tools, these purposes are the same as libraries’ purposes for data usage. Numerous research articles use national business statistical data, for example from the U.S. Bureau of Labor Statistics, for such purposes, including a March 2009 article in the Journal of Public Health Management and Practice that used such data to describe the public health workforce. The authors noted the lack of definition standards and complete data sets, a situation similar to the current state of national public library data.
As in library science, books and research are dedicated to the interpretation and use of medical and public health statistics. Examples include Bailar’s Medical Uses of Statistics and the entire journal Biostatistics, published by Oxford University Press. There are also national storehouses of medical data, such as the United States Department of Health and Human Services’ Integrated Data Repository and the Centers for Disease Control and Prevention’s National Center for Health Statistics. Users of these data sets are not confined to directors of institutions but include the public as well. Some of the data allow comparisons between states and countries, and the information gained through evaluation helps shape policies and planning.
Avenues for additional research
There are many implications for future research, given the limited amount of recent literature on the use of national public library statistics. Specific topics include valuation, response rates, and ranking systems. Statistical analysis would provide a good basis for the growing field of valuation studies. In their 2007 article, Imholz and Arns noted that research into the valuation of public libraries benefits from analytical tools. They acknowledged the evolution of library technology and suggested that, with education, libraries could analyze their own data.
Also of interest is the low response rate for data stores such as the PLDS. Are directors not receiving information on how to submit statistics? Are they delegating this task to someone else and then not following up? Do some respondents find the form confusing? As mentioned above, efforts are being made to state the definitions and standards used in the survey more clearly (S.G. Waxter, personal communication, October 19, 2009). This is important, as others acknowledge the lack of standard terminology (Molyneux, 2005). Another issue is the slow reporting time of IMLS data. These data are highly valuable to public libraries, so how can their reporting time be improved?
Another route of study would be to determine how much of a competitive spirit these national library rankings and data stores instill in public library directors. Some of the above studies mention using these services for comparison and to locate peers, but, taken further, do the rankings drive libraries to improve services and increase their statistics? What actions could public libraries take given their placement? There is no nationally accepted standard defining the “best” library, so further investigation into an ultimate rating system is warranted (Lyons, 2008b).
What data do libraries collect locally that are not reported to national storehouses? Some of these data might be evaluated to give a fuller picture of library services than national rating systems provide, which is part of the bigger picture of public library data. Although some may argue that the primary intent of national public library rankings is commercial (Lyons, 2008b), the rankings provide valuable information for everyone, insofar as those gathering, reporting, and interpreting the data understand what they are doing and strive for improvement.
In 2001, Sumison acknowledged criticism that library statistics had not kept pace with the times. Relying merely on unanalyzed local data does not give a deep understanding of public library work. With awareness of national public library data storehouses, basic statistical know-how, and the many ways these data can be used, public libraries are better placed to identify gaps, analyze their impact, and focus future marketing, advocacy, and administrative efforts.
Boss, R. W. (2006). Rethinking library statistics in a changing environment. Retrieved from http://www.ala.org/ala/mgrps/divs/pla/plapublications/platechnotes/rethinking.pdf
Lear, B. A. (2006). ’Tis better to be brief than tedious: The evolution of the American public library annual report, 1876-2004. Libraries and the Cultural Record, 41(4), 462-486. doi:10.1353/lac.2006.0060
Lyons, R. (2008a). National ratings basics. Retrieved from http://www.libraryjournal.com/article/CA6566452.html
National Center for Education Statistics (NCES). (2009). About us. Retrieved from http://nces.ed.gov/about/
Sian Brannon is a doctoral student at Texas Woman’s University, studying public libraries, management, and statistics. She works full-time as the North Branch Manager of the Denton Public Library. In the future, she hopes to spread the love of statistics to more librarians.
Copyright 2013, Library Student Journal