Embedded metadata: friend or foe to our digital collections?

Abigail L. Dansiger
San José State University, School of Library & Information Science
San José, California, United States

Library Student Journal,
January 2011

Abstract

This paper examines the emerging trend of embedded metadata with regard to the successful preservation of digital collections, particularly those composed of non-textual works. Since the long-term effects of using embedded metadata are not yet fully understood, it is considered either an important tool for digital preservation and the interoperability of resources or an unstable process that will only lead to data corruption. As a result, the use of embedded metadata is far from widely accepted or standardized as a best practice in the United States.

Various working groups and initiatives across professions and industries are exploring the value and reliability of embedded metadata with the goal of improving interoperability and preservation of digital collections, including the Visual Resources Association's Embedded Metadata Working Group (EMWG); Metadata Working Group (MWG); Picture Licensing Universal System (PLUS) Coalition; and the Universal Photographic Digital Imaging Guidelines (UPDIG). The professional photography community is especially noteworthy as active users and early adopters of the software and technology often utilized by libraries, archives, and museums. Additionally, embedded metadata in documents as a legal issue in the United States is also considered in juxtaposition to the specific needs of the cultural heritage community.

Information professionals working with digital collections have much to gain by following the progress of the various entities working with embedded metadata while sharing our own expertise and experiences. Through collaboration across professions, we can better understand embedded metadata and contribute to the knowledge base necessary for the survival of our digital collections.

Introduction

Metadata is all around us and has been for many years in several forms, including newspaper photo captions, library card catalogs, and online databases. However, it has only recently entered into daily discussions of copyright, preservation, and the interoperability of digital resources. With the introduction of the Dublin Core Metadata Element Set in 1995 (Caplan, 2003), the term became part of the library and information science community's vernacular. For librarians and archivists, the importance of metadata becomes practically overwhelming upon looking past the surface issues.

During this research, the author discovers several uses of embedded metadata that could be considered either welcome or troublesome. In short, embedded metadata is either being hailed as a boon to saving our collective digital heritage or feared to be a contributor of data corruption and information overload.

What Exactly is Metadata?

Metadata is often simply defined as data about data, but it plays a very complex role in the life cycle of a digital object. To avoid further confusion, the author seeks a definitive answer to the question: what is metadata? NISO (2004), the National Information Standards Organization, offers the following definition:

Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information. (p. 1)

Different types of metadata exist that are responsible for various tasks, including tracking provenance or history, providing descriptions for indexing and retrieval, and documenting physical characteristics to enable future migration or transfer of files. These varieties are usually referred to as preservation, administrative, technical, descriptive, or structural metadata. As such, NISO (2004) also suggests that when these subsets are combined, "metadata is key to ensuring that resources will survive and continue to be accessible into the future" (p. 2).

Metadata and Interoperability

The study and creation of metadata today is a dynamic field. It is constantly changing and evolving, and it experiences growing pains as the technology and infrastructure supporting it also improve. This means that to share metadata and achieve interoperability, common standards must be agreed upon and adhered to, and appropriate systems must be created to support the harvesting, sharing, and reuse of metadata.

Libraries, archives, and museums are the institutions that serve as our cultural gatekeepers. They are also the most active users and creators of metadata. However, since libraries, archives, and museums traditionally serve unique functions as well as provide vastly different services, their needs vary. This has led each group to create and use its own tools and standards independently. As Elings and Waibel (2007) point out, "The historic record shows that communities often experimented extensively with each other's specifications, only to find them wanting" (Conclusion section, para. 1).

In recent decades, the cultural heritage community as a whole has begun to understand the scholarly and economic advantages of sharing resources, metadata, and access. As a result, cultural heritage institutions now aim to provide access to their collections via the Web by providing user-friendly online catalogs, joining consortiums, and creating digital exhibitions. Still, the root of the problem is how to organize and aggregate resources from various organizations for retrieval, both effectively and economically.

One example of this is the growing acceptance of Extensible Markup Language (XML) in recent years as the front-runner among data exchange formats. XML also received recognition when it became a World Wide Web Consortium (W3C) Recommendation in 1998. As another example, many of the important metadata schemas widely used by the library and information science profession, such as the Dublin Core Metadata Initiative's Dublin Core and the Visual Resources Association's VRA Core (both of which work with XML), have been reassessed since their creation to include both qualified and unqualified, and restricted and unrestricted versions, respectively. Although ease of use for the masses was a priority in the initial implementation stages, the focus then turned to the need for more granularity and specificity to further the goal of interoperability.

Tools of the Trade

(What a Wonderful) World Wide Web

It could be argued that perhaps the greatest tool for sharing metadata electronically is the invention of the World Wide Web by Tim Berners-Lee in 1989. This was the year he submitted a paper titled Information Management: A Proposal. In this paper, he details his plans for a global hypertext system while working for CERN, the European Organization for Nuclear Research (Cailliau, 1995). Since then, we have relied on the Internet and the World Wide Web for entertainment, such as social networking, but also for our daily information gathering and communication needs.

Electronic mail (e-mail) has been around for over thirty years and has now supplanted all traditional forms of communication such as regular mail and telephone conversations, and even face-to-face meetings (Smithsonian Institute Archive Center, 2007). Electronic commerce (e-commerce) has become a crucial part of our global economy and given rise to powerful companies such as Amazon and eBay. A perfect example of electronic learning (e-learning) is the School of Library and Information Science program at San José State University of San José, California, which now operates on an all-online platform. Classes are held asynchronously in a virtual learning environment by utilizing online-based tools, including a course management system and Web conferencing software.

In terms of the overall dissemination and organization of information, the reach of is unprecedented. The work of this one entity has already generated many important conversations in the library and information science profession alone, as well as shifts in our collective culture. Recently, health care in the United States has undergone a major change with the use of electronic medical records, which are becoming the de facto standard over paper. This has raised concern over the security and privacy of patient information, but the convenience of remote access is also considered lifesaving when major displacements occur during emergencies and disasters such as Hurricane Katrina (Kozat, Vlachos, Lucchese, van Herle, & Yu, 2009).

The Web offers seemingly endless possibilities for global collaboration and sharing of information. To that end, what follows is a sampling of the promising tools available to librarians and archivists today. These tools enable successful preservation of digital resources, and achieve interoperability across institutions and collections.

Controlled Vocabularies

Controlled vocabularies effectively lead users to the correct resources by grouping all search results together under an established individual concept; additionally, the presence of "see" and "see also" references for related terms can alleviate the problems of synonyms and homonyms (Riecks, n.d.). However, controlled vocabularies can be a point of contention in the realm of information retrieval. Some argue they are not necessary to keep maintaining, since many of today's users only perform keyword, or free-text, searches with natural language.

Perhaps the most widely used controlled vocabulary is the Library of Congress Subject Headings (LCSH), which is now over one hundred years old (Stone, 2000). A leader in the recent development of specialized controlled vocabularies is the Getty Research Institute's Vocabulary Program, which has produced the Union List of Artist Names (ULAN), the Getty Thesaurus of Geographic Names (TGN), and the Art & Architecture Thesaurus (AAT). A new vocabulary, the Cultural Objects Name Authority (CONA), is currently being developed.
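The way "see" and "see also" references steer a search toward one established concept can be illustrated with a minimal sketch. The terms and the function name below are hypothetical and are not drawn from LCSH or the Getty vocabularies; a production vocabulary would be far larger and hierarchically structured.

```python
# Hypothetical mini-vocabulary; these terms are illustrative only and are
# not taken from LCSH or the Getty vocabularies.
HEADINGS = {"automobiles"}  # established (preferred) headings

SEE = {  # non-preferred term -> preferred heading ("see" reference)
    "cars": "automobiles",
    "autos": "automobiles",
    "motorcars": "automobiles",
}

SEE_ALSO = {  # preferred heading -> related headings ("see also" references)
    "automobiles": ["motor vehicles", "trucks"],
}


def resolve(term):
    """Map a search term to (preferred heading, related headings), or None."""
    key = term.strip().lower()
    heading = SEE.get(key, key)  # follow a "see" reference if one exists
    if heading not in HEADINGS:
        return None  # the term is not covered by this vocabulary
    return heading, SEE_ALSO.get(heading, [])


print(resolve("Cars"))
```

A free-text search for any of the synonyms is thus gathered under the single heading, which is the grouping behavior described above.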

XML

As a way to share information electronically, XML is not a cure-all, but it is very powerful. A subset of the international open encoding standard for documents, Standard Generalized Markup Language (SGML), XML is independent of any particular software, which alleviates migration issues and allows it to work with many popular metadata schemas. As previously mentioned, XML has already triggered a multitude of developments in metadata schemas for libraries, archives, and museums, such as MARCXML, MODS, Dublin Core, VRA Core, and CDWA Lite.
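As a rough illustration of this software independence, the following sketch writes a small record using Dublin Core element names and then reads it back with a generic XML parser. The wrapper element name "record", the function names, and the sample values are assumptions made for the example; only the Dublin Core element set namespace is standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Serialize a dict of Dublin Core element names and values as XML text."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


def read_record(xml_text):
    """Recover the Dublin Core elements from XML text as a dict."""
    prefix = f"{{{DC_NS}}}"
    root = ET.fromstring(xml_text)
    return {el.tag[len(prefix):]: el.text for el in root if el.tag.startswith(prefix)}


# Sample values are invented for illustration.
xml_text = build_record({
    "title": "Harbor at dawn",
    "creator": "Example, Pat",
    "date": "1932",
    "format": "image/tiff",
})
print(xml_text)
print(read_record(xml_text))
```

Any other XML-aware application could read the same text, since nothing in the record depends on the program that wrote it.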

METS

The Metadata Encoding and Transmission Standard (METS) is a standard known as a wrapper, which means different types of metadata expressed in XML can be encoded and essentially attached to another digital file. Developed by the Digital Library Federation and maintained by the Library of Congress, METS can be used to store, manage, and share a complete package of information about any type of digital object (Library of Congress, 2010).
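The wrapper idea can be sketched as follows: a descriptive metadata section carries a Dublin Core element, a file section points to the digital object, and a structural map ties the two together. This is a simplified sketch that has not been validated against the METS schema; the identifiers, function names, and sample URL are hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
DC = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("mets", METS), ("xlink", XLINK), ("dc", DC)):
    ET.register_namespace(prefix, uri)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"


def wrap(title, file_url):
    """Build a minimal METS-style wrapper around one file and one DC title."""
    root = ET.Element(m("mets"))

    # Descriptive metadata section wrapping Dublin Core expressed in XML.
    dmd = ET.SubElement(root, m("dmdSec"), {"ID": "DMD1"})
    md_wrap = ET.SubElement(dmd, m("mdWrap"), {"MDTYPE": "DC"})
    xml_data = ET.SubElement(md_wrap, m("xmlData"))
    ET.SubElement(xml_data, f"{{{DC}}}title").text = title

    # File section pointing to the digital object itself.
    file_sec = ET.SubElement(root, m("fileSec"))
    group = ET.SubElement(file_sec, m("fileGrp"))
    file_el = ET.SubElement(group, m("file"), {"ID": "FILE1"})
    ET.SubElement(file_el, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK}}}href": file_url})

    # Structural map linking the description to the file.
    struct_map = ET.SubElement(root, m("structMap"))
    div = ET.SubElement(struct_map, m("div"), {"DMDID": "DMD1"})
    ET.SubElement(div, m("fptr"), {"FILEID": "FILE1"})
    return root


# The title and URL are placeholders for illustration.
package = wrap("Harbor at dawn", "https://repository.example.org/files/harbor.tif")
print(ET.tostring(package, encoding="unicode"))
```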

XMP

Extensible Metadata Platform (XMP) is Adobe's answer to the call for a way to enable easy transfer of embedded metadata across platforms. Because XMP is an open source, non-proprietary infrastructure, files that use it are no longer dependent on specific software, and users can create custom schemas.

An example of XMP's flexibility is the IPTC Core Schema for XMP, an artwork and image-specific metadata schema produced through Adobe's recent collaboration with the International Press Telecommunications Council (IPTC), a consortium of worldwide news entities. XMP is also gaining wider acceptance among hardware and software developers, including Apple and Microsoft (Reser & Bauman, 2009). Additionally, Creative Commons, a nonprofit organization that provides legal tools for establishing copyright, also recommends XMP to its users as the preferred format for embedding metadata within files and even offers an XMP template as part of its licensing process (Creative Commons, 2008).
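Because an XMP packet is stored as XML text inside the host file, a simple program can often locate it by scanning the file's bytes. The sketch below takes that naive approach on a fabricated byte string standing in for an image file. It assumes the conventional "x:xmpmeta" wrapper with the "x" prefix and a single, unsplit packet, which will not hold for every real file; the function names and the rights statement are invented for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
CLOSE_TAG = b"</x:xmpmeta>"


def extract_xmp(data):
    """Return the parsed XMP tree found in raw file bytes, or None."""
    start = data.find(b"<x:xmpmeta")
    end = data.find(CLOSE_TAG, start)
    if start == -1 or end == -1:
        return None
    return ET.fromstring(data[start:end + len(CLOSE_TAG)])


def first_value(xmp, name):
    """Return the first value of a Dublin Core property such as 'rights'."""
    prop = xmp.find(f".//{{{DC}}}{name}")
    if prop is None:
        return None
    item = prop.find(f".//{{{RDF}}}li")  # values are usually wrapped in rdf:li
    return item.text if item is not None else prop.text


# Fabricated stand-in for an image file: a few binary bytes around a packet.
PACKET = (
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b'<dc:rights><rdf:Alt><rdf:li xml:lang="x-default">'
    b'(c) Example Archive</rdf:li></rdf:Alt></dc:rights>'
    b'</rdf:Description></rdf:RDF></x:xmpmeta>'
)
fake_file = b"\xff\xd8\xff\xe1" + PACKET + b"\xff\xd9"

print(first_value(extract_xmp(fake_file), "rights"))
```

Dedicated tools handle the many cases this sketch ignores, but the example shows why a rights statement embedded this way can travel with the file regardless of the software that opens it.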

RDF

Also a W3C Recommendation, Resource Description Framework (RDF) is a method to represent various Web resources in a common language so the metadata can be processed by and shared among different applications without loss of meaning (Manola & Miller, 2004).

OAI

The Open Archives Initiative (OAI) focuses on the development of standards and the sharing of digital resources through projects such as the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and the Open Archives Initiative Object Reuse and Exchange (OAI-ORE). OAI-PMH enables digital repositories to achieve interoperability by promoting the use of structured metadata that can then be easily harvested for reuse. OAI-ORE supports the aggregation of multimedia Web resources with standards that facilitate the identification and exchange of this variety of data.
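A harvesting exchange under OAI-PMH can be sketched in two steps: forming a ListRecords request and reading the Dublin Core records in the reply. The sketch below makes no network calls. The repository address, the sample response, and the function names are hypothetical, and a real harvester would also need to handle errors, deleted records, and repeated elements.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a repository's base URL."""
    if resumption_token:
        # A resumption token replaces the other arguments on follow-up requests.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, resumption token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
        })
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token


# Hypothetical, heavily abbreviated response used only for illustration.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Harbor at dawn</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
""".strip()

url = list_records_url("https://repository.example.org/oai")
records, token = parse_list_records(SAMPLE_RESPONSE)
print(url)
print(records, token)
```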

OAIS Reference Model

As reported by Lavoie (2004), the Consultative Committee for Space Data Systems (CCSDS) in partnership with the International Organization for Standardization (ISO) set to work in 1990 developing formal standards to ensure the long-term preservation of the digital data produced by space missions, but discovered an even greater need for a framework to perform this task. In response, CCSDS put forth a proposal for the development of an open source, public framework that enables an archive to preserve and provide access to digital information to a designated community of users. The result of their actions is a reference model for an open archival information system, known today as the OAIS Reference Model or OAIS model.

Now also recognized as an official international standard, ISO 14721:2003, the OAIS model is the most widely used framework by digital repositories seeking to establish themselves as trustworthy. Full compliance with the OAIS model is hard to achieve. However, institutions can follow the Trusted Repositories Audit & Certification (TRAC) checklist to at least provide a self-assessment based on certain criteria, and at best receive certification as a trustworthy repository (Steinhart, Dietrich, & Green, 2009). Another option for self-assessment is the Digital Repository Audit Method Based on Risk Assessment (DRAMBORA) toolkit, developed by the Digital Curation Centre (DCC) and Digital Preservation Europe (DPE). Institutions can also acquire additional guidance through the UK National Archives' free technical format registry PRONOM, as well as JSTOR/Harvard Object Validation Environment (JHOVE) for assistance with format identification and validation.
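The basic principle behind format identification, matching the opening bytes of a file against known signatures, can be shown in a few lines. This is a simplified sketch of the general technique only; it does not reproduce the interface or the far more thorough checks of JHOVE or of PRONOM-based tools, and the function name and signature list are chosen for the example.

```python
# A handful of well-known file signatures ("magic numbers").
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"%PDF-", "PDF"),
]


def identify(data):
    """Guess a format from the first bytes of a file; 'unknown' if no match."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    return "unknown"


print(identify(b"%PDF-1.4 ..."))
```

Identification of this kind says nothing about whether a file is well formed, which is why validation is treated as a separate step.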

PREMIS

Preservation Metadata: Implementation Strategies (PREMIS) began as a joint effort between the United States' Online Computer Library Center (OCLC) and Research Libraries Group (RLG) to research current developments in digital preservation, and provide a common groundwork for repositories' processes by identifying the most overarching concepts and functions of preservation metadata. Two of the main goals of PREMIS were to develop a core metadata set with a coinciding data dictionary, the purpose of the latter being the identification of "the 'core' information most repositories will need in order to preserve digital content over the long term" (Caplan, 2009, p. 16). The most recent developments of this project can be found on the PREMIS Web page with the Library of Congress.

Digital Repositories

The future of digital preservation and metadata interoperability is becoming increasingly dependent on digital repositories utilizing the aforementioned tools. Not only is it more efficient and economical to centralize various institutions' resources, but doing so also improves the chance that best practices and standards will emerge and, as a result, that preservation and retrieval will be reliable. HathiTrust, SHERPA Project, and Stanford University's LOCKSS are among the current leaders in developing systems and partnerships capable of establishing trustworthy digital repositories.

Embedded Metadata

Embedded metadata is emerging as a promising tool to achieve interoperability and enable successful long-term digital preservation. Supporters of embedding metadata into digital files believe it will ensure continuous access to the digital object over time since everything will be in one package that travels securely and permanently together. However, opponents of embedded metadata feel it could actually cause data corruption and lead to migration issues. Due to this disagreement, there is a lack of established standards, software applications, schemas, and file formats. These are all major obstacles preventing embedded metadata from becoming completely streamlined and standardized. As a result, embedded metadata is still far from being fully understood and accepted as a best practice of digital preservation.

Embedded Metadata and Its Uses

Metadata is intentionally embedded to ensure that the information about a digital object travels safely with it at all times and does not become separated at any point. In this capacity, metadata refers to the vital technical, descriptive, administrative, and preservation information about an object that is embedded into digital files using various formats, programs, schemas, and standards. A digital file is any resource that exists in a digital format, such as a Word document, PDF, JPEG, MOV, or WAVE file, just to name a few.

Certain types of metadata are already pre-populated in some files without any additional effort by the creator or user. Much of the technical metadata (date of creation, size of file) of a Word document or PDF can be found under Properties in the File menu. Additionally, this technical information is now automatically embedded in digital image files at the point of imaging by equipment such as cameras and scanners. However, this is not the case with descriptive metadata (subjects, creator), preservation metadata (provenance or history, and copyright), or structural metadata (information about the technology used for creation or rendering). Embedding metadata can provide the means for documentation of the creation of the digital object, provide identification and description of the content, and preserve the technical details to support rendering in future environments (Arms & Fleischhauer, 2005).

The crux of the matter at hand is determining if embedded metadata, with its promise of providing a secure and streamlined package of information, will be an improvement over current practices. As of now, most common formats of text files such as DOC and PDF, as well as JPEG and TIFF files for images, can be transferred between most systems and viewed without issues, but the corresponding metadata is not always perpetually linked. Particularly with non-textual based works such as images, this metadata is typically stored in a sidecar file, which is a separate file that stores information about the digital object. Many times the software being used in conjunction with the digital object file cannot interpret the information stored in this separate file and as a result, the sidecar file's data must often be manually transferred between databases (Reser, 2009a).
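The fragility of the sidecar arrangement can be made concrete with a small check that reports images that have lost their sidecar and sidecars that have lost their image. The sketch assumes one possible naming convention, in which a sidecar shares the image's base name and carries an .xmp extension; the suffix list and function name are likewise assumptions for illustration.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff"}
SIDECAR_SUFFIX = ".xmp"  # assumed convention: same base name as the image


def find_unpaired(directory):
    """Return (images lacking a sidecar, sidecars lacking an image) by base name."""
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    image_stems = {p.stem for p in files if p.suffix.lower() in IMAGE_SUFFIXES}
    sidecar_stems = {p.stem for p in files if p.suffix.lower() == SIDECAR_SUFFIX}
    return sorted(image_stems - sidecar_stems), sorted(sidecar_stems - image_stems)
```

Calling find_unpaired on a folder after a migration would list the base names whose metadata and content have become separated, a failure that cannot occur in this form when the metadata is embedded in the file itself.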

Embedded Metadata in Still Images

Libraries, archives, and museums are at the forefront as both creators and users of metadata. These institutions have found a need for multiple metadata schemas to provide access to the variety of formats that are now in their collections. With the increase in importance of an online presence, many of the items that can utilize metadata are not only digital but also non-textual. In a 2007 metadata workflow survey of eighteen Research Libraries Group (RLG) partner institutions, the top three types of material described within their collections were reported to be still images, followed by textual works and then moving images/video (Smith-Yoshimura, 2010).

As part of this trend, many cultural heritage institutions are embracing the use of embedded metadata as an important tool in linking the necessary technical, administrative, and descriptive information to digital objects, particularly image files. Lui (2007) tells us "one of the most important problems cultural heritage professions face in the early twenty-first century is the long-term preservation of information in digital form" (p. 62). With regard to non-textual works such as images, this problem becomes increasingly challenging. In other fields, embedding metadata in images is receiving attention from various parties, such as digital asset managers, professional photographers, federal agencies, and lawyers. These groups have identified embedding metadata as the best way to ensure copyright protection and avoid future litigation issues or orphan works, which result when copyright holders cannot be identified.

Many different entities are currently producing and working with digital images and metadata. While their reasons for involvement may differ, these disparate groups all recognize that having metadata easily accessible is a fundamental necessity for interoperability and the survival of a digital image. This understanding is further exemplified by joint projects across professions, such as the Metadata Working Group (MWG), Picture Licensing Universal System (PLUS) Coalition, and the Universal Photographic Digital Imaging Guidelines (UPDIG). The following table (Table 1) reflects the most active participants organized by communities:

Community | Most active participants
Cultural heritage organizations | The Getty Research Institute, the Library of Congress, and the Visual Resources Association (VRA)
Digital camera manufacturers | Canon, Nikon, and Sony
Professional photographers' associations | American Society of Media Photographers (ASMP), the International Press Telecommunications Council (IPTC), and the Stock Artists Alliance (SAA)
Proprietary software developers | Adobe, Apple, and Microsoft
United States' government-directed standards developers | American National Standards Institute (ANSI), the Federal Agencies Digitization Guidelines Initiative, and the National Information Standards Organization (NISO)

Table 1. Entities working with embedded metadata and images.

Overall, these groups demonstrate strong support for embedding metadata in images. Gilliland (2008) warns that, "in any instance in which it is critical that metadata and content coexist, it is highly recommended that the metadata become an integral part of the information object, that is, that it be 'embedded' in the object and not stored or linked elsewhere" (p. 12). Gilliland also points out that embedding is a better way to preserve and migrate metadata so it does not get lost, and as a result will ensure that the digital object continues to be accessible and intelligible over time.

The professional commercial photography community is also utilizing embedded metadata. It is used as a way to improve workflow through bulk editing features in software such as Adobe Photoshop and Bridge, and helps guarantee intellectual property rights by allowing full copyright and contact information to travel consistently with the image file. These users are an important community to look at because they are very interested in and aware of the latest developments in the technology and software that support image production. Much like the proprietary manufacturers, the end goals of many commercial photographers differ from those of the cultural heritage community. The former is ultimately looking for income and a successful career, and the latter wishes to provide long-term preservation and interoperability of historical resources. However, they do complement each other well from a research perspective, and work well together for generating workflow procedures and standards with today's technology.

Even those who strongly advocate for embedding metadata in images acknowledge problems that need to be resolved. Metadata Deluxe, the public forum for the Visual Resources Association's Embedded Metadata Working Group (EMWG), lists the following as the primary risks and concerns with using embedded metadata: the maintenance and synchronization of embedded data with external sources; data corruption of the image file itself; and accidental deletion of the metadata, resulting in an orphan work (Reser, 2009b). An example of the last is the problem discovered with the Save for Web setting in Adobe Photoshop CS2 and earlier, in which embedded copyright metadata was being stripped out during the transfer of files between systems (Pro-Imaging, n.d.).

The DAM Forum, a blog and discussion list run by Peter Krogh, a contributor to UPDIG and author of The DAM Book: Digital Asset Management for Photographers, is an up-to-date source of what tools digital photographers are using and offers valuable troubleshooting advice. For example, a popular discussion topic on The DAM Forum revolves around the use of the still relatively new Digital Negative (DNG) format. The DNG format, even though it requires an extra step, will wrap a RAW image file (named for the initial "raw" and unprocessed data) with all of the added metadata and any color corrections along with a thumbnail preview. However, DNG is not yet an established or widely adopted format. This means not all camera and software manufacturers support it, so there is no way of knowing for sure if it will continue to be offered and developed (Krogh, 2009).

Active discussions and announcements regarding embedded metadata can also be found on VRA-L, the listserv of the Visual Resources Association (VRA), and Metadata Deluxe. Like The DAM Forum, both of these venues are valuable resources for learning about new tools both currently under development and already available to read, edit, import, and extract embedded metadata. Recent recommendations by Reser (2010, April 26) on VRA-L include Phil Harvey's ExifTool, the Metadata Extraction Tool from the National Library of New Zealand, and Jeffrey Friedl's Exif Viewer. Metadata Deluxe also shares valuable resources and news, such as recent introductions of FileMind and MetadataTouch as products to explore for managing embedded metadata in various types of files. In addition, Metadata Deluxe announced that a Photo Metadata Toolkit compatible with Adobe CS3 through CS5 products has been released by IPTC and the PLUS Coalition (Reser, 2010).

IPTC is involved with many partnerships and projects focused on working toward interoperability. One of the most notable is the previously mentioned development of the IPTC Core schema by a working group including IPTC and Adobe. The IPTC Core has five fields that can be shared with similar fields in Dublin Core, the popular metadata schema used widely in the library community (Stock Artists Alliance, 2006). Another recent collaborative effort is the IPTC Extension Schema for XMP, a supplemental schema to the IPTC Core approved in June 2009. Some of the new fields in this version are specific to cultural heritage images and allow a title, a creator, a creation date, and source information to be entered (International Press Telecommunications Council, n.d.).

On a national level in the United States, the Federal Agencies Digitization Guidelines Initiative has created a subgroup under the Still Image Working Group to focus specifically on the creation of guidelines for embedded metadata. Composed of participants from federal agency partners as well as metadata experts from various professional backgrounds, the group states in its charter that it "identified embedded metadata as a high priority based on the important role of metadata in the management, use and sustainability of digital assets, and the lack of clear, comprehensive, and uniform guidelines in this area" (Federal Agencies Digitization Guidelines Initiative Still Image Working Group, 2008, p. 3).

Embedded Metadata in Text Documents

Embedded metadata within documents has recently received a great deal of attention in the United States regarding how it can be used legally, especially whether it should be admitted as factual evidence. As already discussed, technical metadata is often automatically created (albeit not entirely visible) when a document is created or different versions are saved. Additionally, information generated by the Track Changes, Fast Saves, Comments, or Versions features in Word is also part of the document's history and can be recovered at a later time (Hricik & Scott, 2008). As a result, if a receiving lawyer accesses information that was not intended to be seen or was not known about in the first place by the sending lawyer, an ethical dilemma may arise and the attorney-client privilege can be compromised (Babaeva, 2007).
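How readily such hidden information can be recovered is suggested by the sketch below, which inspects a Word file for author names, tracked insertions and deletions, and comments. It applies only to the newer XML-based .docx format, which is a ZIP package of XML parts, and not to the older binary .doc files discussed in the sources cited here. The function name and report fields are hypothetical, and the counts are a rough indication rather than a forensic analysis.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC = "http://purl.org/dc/elements/1.1/"


def hidden_metadata_report(docx_bytes):
    """Summarize metadata in a .docx package that a reader may not see on screen."""
    report = {
        "creator": None,
        "last_modified_by": None,
        "tracked_changes": 0,
        "has_comments": False,
    }
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        names = set(archive.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(archive.read("docProps/core.xml"))
            report["creator"] = core.findtext(f"{{{DC}}}creator")
            report["last_modified_by"] = core.findtext(f"{{{CP}}}lastModifiedBy")
        if "word/document.xml" in names:
            body = ET.fromstring(archive.read("word/document.xml"))
            # Tracked insertions and deletions are recorded as w:ins and w:del.
            insertions = sum(1 for _ in body.iter(f"{{{W}}}ins"))
            deletions = sum(1 for _ in body.iter(f"{{{W}}}del"))
            report["tracked_changes"] = insertions + deletions
        report["has_comments"] = "word/comments.xml" in names
    return report
```

A sending party could run a check of this kind before sharing a file to learn whether anything beyond the visible text would be disclosed.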

Currently, there is no federal ruling in the United States on the legality of using embedded metadata in documents as factual evidence. Instead, judges are referring to case law to see how this issue evolves. The closest official federal opinion is the December 2006 e-discovery amendments to the Federal Rules of Civil Procedure, which attempt to "address the myriad issues associated with the discovery and production of information in digital form—what the amendments call 'electronically stored information'" (Withers, 2006, para. 7). In other words, embedded metadata is not specifically addressed, which leaves it open to interpretation on a case-by-case basis as to what constitutes its legal use. In contrast, the American Bar Association's (ABA) Standing Committee on Ethics and Professional Responsibility issued Formal Ethics Opinion 06-442 on metadata in August 2006, stating that "if an attorney receives a document with confidential information embedded as metadata, then that attorney can both review and use such information" (Babaeva, 2007, p. 14).

Without a federal ruling, states can interpret the 2006 e-discovery amendments differently. This becomes especially problematic when sending and receiving lawyers are in different states. Babaeva (2007) suggests that sending lawyers should assume the responsibility of using clean templates before creating documents, or scrub documents before sharing them. She offers suggestions of tools and methods in her article. Scrubbing each document individually does not seem realistic in practice and also risks spoliation, the "destruction, significant alteration, or non preservation of evidence relevant to pending or reasonable foreseeable litigation" (Grenig & Gleisner, as cited in Bradberry, 2010, p. 4). Other suggestions made by Babaeva (2007) include having both counsels agree not to seek the metadata in transferred documents, printing out hard copies, or arranging a transfer of information via telephone or fax. Unfortunately, these do not seem like viable options given the sheer volume of born-digital information today, and a real solution is dependent on the federal courts' ability to produce a timely and firm ruling. As noted by Bradberry (2010), "Production of such vast and often incomprehensible stores of information requires comprehensive discovery rules" (p. 2).

In October 2009, the Arizona Supreme Court heard the case of Lake v. City of Phoenix. The plaintiff, Phoenix police officer David Lake, was seeking the release of the electronic version of his performance review so the metadata with the history of his supervisor's notes could be read and included as part of the public record. The court held that under public records law, any embedded metadata in the electronic version is subject to disclosure (Lake v. City of Phoenix, 2009).

This landmark ruling overturned the original ruling by the Superior Court of Maricopa County, which denied the officer's request. Lake appealed, and his request for a new hearing was granted. Judge Norris argued that Maricopa was incorrect in viewing the metadata as separate from the record itself when in fact, "metadata is not an 'electronic orphan,' but is instead part of the requested electronic document" (Lake v. City of Phoenix, 2009, p. 4). Unfortunately, most clients and their lawyers are not even aware of metadata until it is too late. This is no longer acceptable in today's world of digital communication, and the American legal profession now has an obligation to its clients and the general public to stay informed both about technology and the progress of our right to access metadata (Babaeva, 2007; Bennett & Cloud, 2010).

Embedded Metadata in Moving Image and Audio Files

Much like the activity surrounding embedded metadata in still images and text documents, the uses and tools for embedding metadata into moving image and audio files are still in their infancy. While audio-visual file formats such as MP3, MOV, MPEG, and WAVE are becoming increasingly common, there are few industry standards currently in place for either production or preservation. Additionally, as with the variety of still image files, these formats have different technical specifications and functionality.

One major issue with audio media is that it involves electronic signals. Embedding metadata into these signals can introduce "noise" or distortion, which disrupts the original signal and, as a result, creates a greater risk of damaging the original content. A study by Kozat et al. (2009) found that the researchers' methods of embedding patient information within electrocardiograms (ECG) succeeded without altering the significant properties that ensure the proper diagnosis of the patient. The team was also concerned with being able to extract the embedded metadata successfully and to prevent illegal tampering. Based on the results of the experiment, the group's methods are deemed viable for these purposes as well.
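To make the concern about "noise" concrete, the following is a minimal sketch of one simple way metadata can be hidden inside a sampled signal: overwriting the least significant bit of each sample. This is not the method used by Kozat et al. (2009); it is a deliberately basic stand-in, and the function names `embed_bits` and `extract_bits` are hypothetical. It assumes the signal is a list of integer samples.

```python
def embed_bits(samples, payload):
    """Hide payload bytes in the least significant bit of successive samples.

    Illustrative only: an assumed, simplified technique, not the approach
    described in the study cited above.
    """
    # Flatten the payload into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("payload too large for this signal")
    marked = list(samples)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the sample, then set it to the payload bit.
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_bits(samples, n_bytes):
    """Recover n_bytes of payload from the lowest bit of each sample."""
    recovered = bytearray()
    for b in range(n_bytes):
        byte = 0
        for sample in samples[b * 8:(b + 1) * 8]:
            byte = (byte << 1) | (sample & 1)
        recovered.append(byte)
    return bytes(recovered)


# A made-up test signal and a made-up identifier.
signal = [((i * 37) % 200) - 100 for i in range(64)]
marked = embed_bits(signal, b"ID42")
distortion = max(abs(a - b) for a, b in zip(signal, marked))
```

In this sketch each altered sample moves by at most one unit, which suggests why embedding in the signal is a trade-off: the payload travels with the content, but the content is no longer bit-for-bit identical to the original.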

The Audio-Visual Working Group of the United States' Federal Agencies Digitization Guidelines Initiative is also a participant in the Embedded Metadata subgroup, and has recently received approval to explore proposed interim guidelines for embedding metadata into historical and cultural heritage digital audio files (Federal Agencies Digitization Guidelines Initiative Audio-Visual Working Group, 2009). This group's Web page features a section for Resources and Industry Standards, which offers a variety of information from international sources on the progress of audio-visual digitization and preservation guidelines.
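As a rough illustration of what embedding metadata in a WAVE file involves, the sketch below walks the top-level chunks of a RIFF/WAVE stream and reports their identifiers and sizes. In the Broadcast WAVE format, descriptive metadata is carried in a chunk identified as `bext`, stored alongside the audio `data` chunk. The function name `list_riff_chunks` is hypothetical, the example stream is synthetic, and the sketch does not interpret the fields inside any chunk.

```python
import io
import struct


def list_riff_chunks(stream):
    """Return (chunk_id, size) pairs for the top-level chunks of a RIFF/WAVE stream."""
    header = stream.read(12)
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    chunks = []
    while True:
        head = stream.read(8)
        if len(head) < 8:
            break  # no further complete chunk header
        chunk_id, size = struct.unpack("<4sI", head)
        chunks.append((chunk_id.decode("ascii", "replace"), size))
        # Chunk bodies are padded to an even number of bytes.
        stream.seek(size + (size & 1), io.SEEK_CUR)
    return chunks


# Build a tiny synthetic stream with placeholder chunk contents.
body = (
    b"fmt " + struct.pack("<I", 16) + bytes(16)
    + b"bext" + struct.pack("<I", 3) + b"abc" + b"\x00"
    + b"data" + struct.pack("<I", 4) + bytes(4)
)
wav = b"RIFF" + struct.pack("<I", len(body) + 4) + b"WAVE" + body
chunks = list_riff_chunks(io.BytesIO(wav))
has_bext = any(chunk_id == "bext" for chunk_id, _ in chunks)
```

Because the metadata lives in its own chunk rather than in the audio samples, a tool that does not recognize the chunk can still skip over it and play the audio.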

Finally, in an effort to help protect intellectual property and copyright, Creative Commons has presented an option for its users to embed metadata in the form of a verification link so copyright licenses can travel with different files around the Web. At the present time, however, the compatibility is contingent on the software used by the creator of the file.

Final Thoughts

As mentioned previously, one of the primary obstacles with embedded metadata is sustainability: there is no guarantee that the proprietary companies currently making the technical equipment and software will continue supporting and producing certain formats (Krogh, 2009). Ultimately, these companies are driven by profit, and if users don't adopt the formats, the formats will be discontinued and replaced. According to Arms and Fleischhauer (2005):

Adoption refers to the degree to which the format is already used by the primary creators, disseminators, or users of information resources. A format that is widely adopted is less likely to become obsolete rapidly, and tools for migration and emulation are more likely to emerge from industry without specific investment by archival institutions. (p. 3)

Mitchell and Surratt (2005) agree that the more institutions invest in and use a given format, the higher the probability that the format will be maintained, and they also highly recommend adopting common file formats with broad support and well-defined standards.

Understanding and working with metadata is already a vital tool for 21st century information professionals, but we are not on a paved road yet. As Hillmann and Westbrooks (2004) suggest: "The current metadata environment seems like the Wild West as seen from the point of view of a Boston Brahmin—very messy, and with armed cowboys behind every rock" (p. xiii). Even though this statement was made over six years ago and much has been learned since that time, it still rings true for our challenges today. Metadata itself is not new but it has renewed importance for the library and information science profession's core services, specifically the preservation, retrieval, and access of information.

The research conducted for this paper was revelatory in that it uncovered different professions collaborating to make embedded metadata a viable digital preservation tool for various users. The importance of standards and of adoption across communities seems to be more widely recognized in this process than it was in the past, when libraries, archives, and museums each created their own bibliographic-based data content standards: AACR, DACS, and CCO, respectively.

While right now it seems that using embedded metadata could significantly improve workflows, preservation, and interoperability, there are still important concerns. The primary concern is the lack of established standards, schemas, and formats, which persists because local policies and specific needs often obscure the "big picture" of enabling the sharing of resources across institutions. Another major barrier to implementing embedded metadata across institutions is the difficulty of ensuring that there will be no loss of authenticity or file corruption. Despite these many challenges, the developments happening across professions and industries mentioned throughout this paper can certainly be described as promising.

The continual changes that arise when working with metadata always provide valuable learning experiences. With embedded metadata in particular, we are still at the point where not enough time has passed to understand how it behaves over the long term. However, if the library and information science community continues on a cooperative path with the various entities also working on these tasks, we will be able to determine if embedded metadata can be relied upon as a digital preservation tool and ultimately ensure the survival of our collective digital heritage.

References

Arms, C. R., & Fleischhauer, C. (2005). Digital formats: Factors for sustainability, functionality, and quality [PDF document]. Retrieved November 17, 2010, from http://www.digitalpreservation.gov/formats/intro/papers.shtml

Babaeva, E. V. (2007). Keep it clean: The ethical challenge of managing metadata in documents. Tennessee Bar Journal, 23(12), 14-21.

Bennett, S. C., & Cloud, J. (2010). Coping with metadata: Ten key steps. Mercer Law Review, Winter 2010.

Bradberry, K. (2010). Electronic discovery in Georgia: Bringing the state out of the typewriter age. Georgia State University Law Review, Winter 2010.

Cailliau, R. (1995). A little history of the World Wide Web. Retrieved November 17, 2010, from http://www.w3.org/History.html

Caplan, P. (2003). Metadata fundamentals for all librarians. Chicago: ALA Editions.

Caplan, P. (2009). Understanding PREMIS [PDF document]. Retrieved November 17, 2010, from http://www.loc.gov/standards/premis/bibliography.html

Creative Commons (2008). XMP. Retrieved November 17, 2010, from http://wiki.creativecommons.org/XMP

Elings, M. W., & Waibel, G. (2007). Metadata for all: Descriptive standards and metadata sharing across libraries, archives, and museums. First Monday, 12(3). Retrieved November 17, 2010, from http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/issue/view/225

Federal Agencies Digitization Guidelines Initiative Audio-Visual Working Group (2009). Broadcast WAVE metadata. Retrieved November 17, 2010, from http://www.digitizationguidelines.gov/audio-visual/documents/wave_metadata.html

Federal Agencies Digitization Guidelines Initiative Still Image Working Group (2008). Still Image Working Group Embedded Metadata Sub-group charter version 1.0 [PDF document]. Retrieved November 17, 2010, from http://www.digitizationguidelines.gov/stillimages/sub-embeddedcharter.html

Gilliland, A. J. (2008). Setting the stage. In M. Baca (Ed.), Introduction to metadata (Online Ed. Version 3.0) [PDF document]. Retrieved November 17, 2010, from http://www.getty.edu/research/publications/electronic_publications/intrometadata/index.html

Hillmann, D. I., & Westbrooks, E. L. (2004). Metadata in practice. Chicago: American Library Association.

Hricik, D., & Scott, C. E. (2008). Metadata: The ghosts haunting e-documents. FindLaw. Retrieved November 17, 2010, from http://articles.technology.findlaw.com/2008/Mar/25/11138.html

International Press Telecommunications Council (n.d.). IPTC Core & Extension = The IPTC photo metadata standard. Retrieved November 17, 2010, from http://www.iptc.org/cms/site/index.html?channel=CH0099

Kozat, S. S., Vlachos, M., Lucchese, C., van Herle, H., & Yu, P. S. (2009). Embedding and retrieving private metadata in electrocardiograms. Journal of Medical Systems, 33(4), 241-259.

Krogh, P. (2009). The DAM book: Digital asset management for photographers (2nd ed.). Sebastopol, CA: O’Reilly.

Lake v. City of Phoenix. 222 Ariz. 547, 218 P.3d 1004 (Ariz. 2009).

Lavoie, B. F. (2004). Technology watch report: The open archival information system reference model: Introductory guide. DPC Technology Watch Series Report 04-01 [PDF document]. Retrieved November 17, 2010, from http://www.dpconline.org/advice/technology-watch-reports

Library of Congress (2010). METS: Metadata encoding & transmission standard. Retrieved November 17, 2010, from http://www.loc.gov/standards/mets/mets-home.html

Lui, J. (2007). Metadata and its applications in the digital library. Westport, CT: Libraries Unlimited.

Manola, F., & Miller, E. (Eds.) (2004). RDF primer: W3C recommendation 10 February 2004. Retrieved November 17, 2010, from http://www.w3.org/TR/rdf-primer/

Mitchell, A. M., & Surratt, B. E. (2005). Cataloging and organizing digital resources: A how-to-do-it manual for librarians. New York: Neal-Schuman Publishers.

National Information Standards Organization (2004). Understanding metadata [PDF document]. Retrieved November 17, 2010, from http://www.niso.org/publications/press/

Pro-Imaging (n.d.). Metadata and copyright. Retrieved November 17, 2010, from http://www.pro-imaging.org/content/view/157/131/

Reser, G. (2009a). What problems are we solving. Retrieved November 17, 2010, from http://metadatadeluxe.pbworks.com/What-problems-are-we-solving

Reser, G. (2009b). Disadvantages to/concerns with embedding complete metadata. Retrieved November 17, 2010, from http://metadatadeluxe.pbworks.com/w/page/20792231/Disadvantages

Reser, G. (2010). News. Retrieved November 17, 2010, from http://metadatadeluxe.pbworks.com/w/page/20792254/News

Reser, G. (2010, April 26). Re: camera metadata [Msg 244]. Message posted to http://listserv.uark.edu/scripts/wa.exe?A2=ind1004&L=vra-l&T=0&P=28014

Reser, G., & Bauman, J. (2009). Embedded metadata, part I: The basics and a history. Images, the newsletter of the VRA, 6(6). Retrieved November 17, 2010, from http://www.vraweb.org/publications/imagestuff/vol6no6.html

Riecks, D. (n.d.). What is a controlled vocabulary, and how is it useful? Retrieved November 17, 2010, from http://www.controlledvocabulary.com/

Smith-Yoshimura, K. (2010). New directions for metadata workflow. In Scene setting: New directions for metadata workflows across libraries, archives, museums [PowerPoint slides]. Retrieved November 17, 2010, from http://www.slideshare.net/RLGPrograms/calgary-scene-setting-final

Smithsonian Institute Archive Center (2007). Responsible record keeping: Email records [PDF document]. Retrieved November 17, 2010, from http://www.siarchives.si.edu/research/main_pubs.html

Steinhart, G., Dietrich, D., & Green, A. (2009). Establishing trust in a chain of preservation: The TRAC checklist applied to a data staging repository (DataStaR). D-Lib Magazine, 15(9/10). Retrieved November 17, 2010, from http://www.dlib.org/dlib/september09/steinhart/09steinhart.html

Stock Artists Alliance (2006). Metadata manifesto. Retrieved November 17, 2010, from http://www.stockartistsalliance.org/metadata-manifesto-3

Stone, A. (2000). The LCSH century: A brief history of the Library of Congress subject headings and introduction to the centennial essays. Cataloging & Classification Quarterly, 29(1-2). Retrieved November 17, 2010, from http://catalogingandclassificationquarterly.com/ccq29nr1-2ed.htm

Withers, K. J. (2006). Electronically stored information: The December 2006 amendments to the federal rules of civil procedure. Northwestern Journal of Technology and Intellectual Property, 4(2). Retrieved November 17, 2010, from http://www.law.northwestern.edu/journals/njtip/v4/n2/3/

Author's Bio

Abigail Dansiger is a December 2010 MLIS candidate at San José State University's School of Library & Information Science (SJSU SLIS) in San José, California, where she has focused her studies primarily on information organization and retrieval. She is the recipient of the 2010 SLIS NewsBank Scholarship, endowed to support students interested in fields related to digital resources and information retrieval.

Copyright, 2011 Library Student Journal