On 5-6 October 2009, the 6th International Conference on Preservation of Digital Objects was held in San Francisco. Below we list the abstracts of the 31 papers presented at the conference and point to their full text.
In order to better meet the needs of its diverse University of California constituencies, the California Digital Library UC Curation Center is re-envisioning its approach to digital curation infrastructure by devolving function into a set of granular, independent, but interoperable micro-services. Since each of these services is small and self-contained, they are more easily developed, deployed, maintained, and enhanced; at the same time, complex curation function can emerge from the strategic combination of atomistic services. The emergent approach emphasizes the persistence of content rather than the systems in which that content is managed; thus the paradigmatic archival culture is not unduly coupled to any particular technological context. This results in a curation environment that is comprehensive in scope, yet flexible with regard to local policies and practices and sustainable despite the inevitability of disruptive change in technology and user expectation.
Undoubtedly, long-term preservation has attracted a great deal of attention worldwide, but in a broader perspective the attention is out of proportion to the number of real operational solutions, not to mention the big picture of a comprehensive digital preservation infrastructure. Up to now, the existing digital preservation infrastructure mainly consists of a small number of scattered trusted long-term archiving data repositories, which have control of and responsibility for our digital heritage. The ongoing discussion on e-Infrastructure points out that we are still lacking reliable structures which support the expected integrated digital preservation infrastructure provided jointly by cultural heritage organizations, data centers and data producers. The slow development towards a global integrated digital preservation infrastructure, in combination with adjacent pressing questions for the information infrastructure in general, has led to an extended strategic discussion in recent years in Europe and the US: in studies, position papers and evaluation approaches we find elaborated building blocks for a roadmap and definitions for a more advanced landscape. With a focus especially on the German situation and on existing and ongoing practical experiences, the paper discusses reasons and strategic aspects which may shed light on the factors inhibiting progress.
The National Digital Stewardship Alliance Charter: Enabling Collaboration to Achieve National Digital Preservation
The Library of Congress proposes extending the success of the NDIIPP (National Digital Information Infrastructure and Preservation Program) network by forming a national stewardship alliance of committed digital preservation partners.
The Human Face of Digital Preservation: Organizational and Staff Challenges, and Initiatives at the Bibliothèque nationale de France
The process of setting up a digital preservation repository in compliance with the OAIS model is not only a technical challenge: libraries also need to develop and maintain appropriate skills and organizations. Digital activities, including digital preservation, are nowadays moving into the mainstream activity of the Library and are integrated in its workflows. The Bibliothèque nationale de France (BnF) has been working on the definition of digital preservation activities since 2003. This paper aims at presenting the organizational and human resources challenges that have been faced by the library in this context, and those that are still awaiting us. The library has been facing these challenges through a variety of actions at different levels: organizational changes, training sessions, dedicated working groups and task forces, analysis of skills and processes, etc. The results of these actions provide insights on how a national library is going digital, and what is needed to reach this longstanding goal.
For millions of legacy documents, correct rendering depends upon resources such as fonts that are not generally embedded within the document structure. Yet there is significant risk of information loss due to missing or incorrectly substituted fonts. In this paper we use a collection of 230,000 Word documents to assess the difficulty of matching font requirements with a database of fonts. We describe the identifying information contained in common font formats, font requirements stored in Word documents, the API provided by Windows to support font requests by applications, the documented substitution algorithms used by Windows when requested fonts are not available, and the ways in which support software might be used to control font substitution in a preservation environment.
TIPR, Towards Interoperable Preservation Repositories, is a project funded by the Institute of Museum and Library Services to create and test the Repository eXchange Package (RXP). The package will make it possible to transfer complex digital objects between dissimilar preservation repositories. For reasons of redundancy, succession planning and software migration, such repositories must be able to exchange copies of archival information packages with each other. Every different repository design, however, describes and structures its archival packages differently. Therefore each type produces dissemination packages that are rarely understandable or usable as submission packages by other repositories. The RXP is an answer to that mismatch. Other solutions for transferring packages between repositories focus either on transfers between repositories of the same type, such as DSpace-to-DSpace transfers, or on processes that translate a specific dissemination format into a specific submission package. Rather than build translators between many dissimilar repository types, the TIPR project has defined a standards-based package of metadata files that can act as an intermediary information package, the RXP, a lingua franca all repositories can read and write. In this paper we present the assumptions and principles underlying the TIPR concept of repository-to-repository exchange, and proceed to describe three aspects of the TIPR project: the RXP format itself; the tests we are conducting to prove and improve the use of the RXP; and finally, issues that have arisen in the course of the project so far.
The challenge of digital preservation of scientific data lies in the need to preserve not only the dataset itself but also the ability it has to deliver knowledge to a future user community. A true scientific research asset allows future users to reanalyze the data within new contexts. Thus, in order to carry out meaningful preservation we need to ensure that future users are equipped with the necessary information to re-use the data. This paper presents an overview of a preservation analysis methodology which was developed in response to that need on the CASPAR and Digital Curation Centre SCARP projects. We intend to place it in relation to other digital preservation practices discussing how they can interact to provide archives caring for scientific data sets with the full arsenal of tools and techniques necessary to rise to this challenge.
Effective digital preservation depends on a set of preservation services that work together to ensure that digital objects can be preserved for the long-term. These services need digital preservation metadata, in particular, descriptions of the properties that digital objects may have and descriptions of the requirements that guide digital preservation services. This paper analyzes how these services interact and use this metadata and develops a data dictionary to support them.
Long-term preservation is a responsibility to be shared with other organizations, even those adopting different preservation methods and tools. Overcoming interoperability issues, by achieving a flawless exchange of the digital assets to be preserved, makes it feasible to apply distributed digital preservation policies. The Archives Ready To AIP Transmission a PREMIS Based Project (ARTAT-PBP) aims to experiment with the adoption of a common preservation metadata standard as an interchange language in a network of cooperating organizations that need to exchange digital resources with the mutual objective of preserving them in the long term.
The term "Significant Properties" has been given a variety of definitions and used in various ways over the past several years. The relationship between Significant Properties and the OAIS term Representation Information has been a puzzle. This paper proposes a definition of Significant Properties which provides a way to clarify this relationship and indicates how the concept can be used in a coherent way. We believe that this approach is consistent with the actual use of the concept and does not invalidate the previous pieces of work but rather provides a clear and consistent view of the concept. It also links together Authenticity and Provenance which are also key concepts in digital preservation.
This paper will describe the tools and infrastructure components which have been implemented by the CASPAR project to support repositories in their task of long term preservation of digital resources. We address also the capture and preservation of digital rights management and evidence of authenticity associated with digital objects. Moreover examples of ways to evaluate a variety of preservation strategies will be discussed as will examples of integrating the use of these infrastructure components and tools into existing repository systems. Examples will be given of a rich selection of digital objects which encode information from a variety of disciplines including science, cultural heritage and also contemporary performing arts.
Integrating Metadata Standards to Support Long-Term Preservation of Digital Assets: Developing Best Practices for Expressing Preservation Metadata in a Container Format
This paper explores the purpose and development of best practice guidelines for the use of preservation metadata as detailed in the PREMIS Data Dictionary for Preservation Metadata within documents conforming to the Metadata Encoding and Transmission Standard (METS). METS is an XML schema that provides a container format integrating various forms of metadata with digital objects or links to digital objects. Because of the flexibility of METS to serve many different functions within digital systems and to support many different metadata structures, integration guidelines will facilitate common practices among institutions. There is constant tension between tighter control over the METS package to support object exchange versus each implementation's unique preservation metadata requirements given the different contexts and implementation models among PREMIS implementers. The PREMIS in METS Guidelines serve primarily as a standard for submission and dissemination information packages. This paper details the issues encountered in using the standards together, and how the METS document changes as events pertaining to the lifecycle of digital assets are recorded for future preservation purposes. The guidelines have enabled the implementation of an exchange format and creation/validation tools based on the PREMIS in METS guidelines.
Specimens of early computer systems stop working every day. One storage medium that was popular for home computers in the 1980s was the audio tape. The first home computer systems allowed the use of standard cassette players to record and replay data. Audio tapes are more durable than old home computers when properly stored. Devices playing this medium (i.e. tape recorders) can be found in working condition or can be repaired, as they are made out of standard components. By re-engineering the format of the waveform, the data on such media can then be extracted from a digitized audio stream. This work presents a case study of extracting data created on an early home computer system, the Philips G7400. Results show that with some error correction methods parts of the tapes are still readable, even without the original system. It also becomes clear that it is easier to build solutions now, while the original systems are still available.
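To make the idea of recovering bits from a digitized waveform more concrete, the following is a minimal sketch only. It does not reproduce the Philips G7400 tape format, which the abstract does not describe. It assumes a hypothetical encoding in which every bit lasts 40 samples, a 0 is one sine cycle and a 1 is two sine cycles, and it assumes the bit boundaries are already known. The names `synthesize`, `decode` and `BIT_SAMPLES` are illustrative.

```python
import math

BIT_SAMPLES = 40  # assumed samples per bit (hypothetical, not the G7400 value)


def synthesize(bits):
    """Build a test waveform: one sine cycle per 0 bit, two cycles per 1 bit."""
    samples = []
    for bit in bits:
        cycles = 2 if bit else 1
        for i in range(BIT_SAMPLES):
            samples.append(math.sin(2.0 * math.pi * cycles * i / BIT_SAMPLES))
    return samples


def decode(samples):
    """Recover bits by counting rising zero crossings in each bit window."""
    bits = []
    for start in range(0, len(samples) - BIT_SAMPLES + 1, BIT_SAMPLES):
        window = samples[start:start + BIT_SAMPLES]
        crossings = 0
        previous = -1.0  # treat the sample before each window as negative
        for value in window:
            if previous < 0.0 <= value:
                crossings += 1
            previous = value
        # Two or more cycles in the window means the higher frequency, i.e. a 1.
        bits.append(1 if crossings >= 2 else 0)
    return bits


if __name__ == "__main__":
    original = [1, 0, 1, 1, 0, 0, 1, 0]
    print(decode(synthesize(original)) == original)
```

A real tape would additionally need synchronization on a leader tone, tolerance for speed variation and noise, and the kind of error correction the abstract mentions; none of that is attempted here.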
The Danish Ministry of Culture is currently funding a project to set up a model for costing preservation of digital materials held by national cultural heritage institutions. The overall objective of the project is to provide a basis for comparing and estimating future financial requirements for digital preservation and to increase cost effectiveness of digital preservation activities. In this study we describe an activity based costing methodology for digital preservation based on the OAIS Reference Model. In order to estimate the cost of digital migrations we have identified cost critical activities by analysing the OAIS Model, and supplemented this analysis with findings from other models, literature and own experience. To verify the model it has been tested on two sets of data from a normalisation project and a migration project at the Danish National Archives. The study found that the OAIS model provides a sound overall framework for cost breakdown, but that some functions, especially when it comes to performing and evaluating the actual migration, need additional detailing in order to cost activities accurately.
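As an illustration of what activity based costing against the OAIS functional entities might look like, here is a minimal sketch. It is not the Danish project's model: the `Activity` structure, the activity names, the hours and the hourly rates are all hypothetical figures chosen for the example. Only the entity names (Ingest, Preservation Planning, and so on) come from the OAIS Reference Model.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    oais_function: str   # OAIS functional entity the activity belongs to
    name: str            # hypothetical activity label
    hours: float         # hypothetical staff time
    hourly_rate: float   # hypothetical cost per hour


def cost_by_function(activities):
    """Sum hours * rate for each OAIS functional entity."""
    totals = {}
    for activity in activities:
        cost = activity.hours * activity.hourly_rate
        totals[activity.oais_function] = totals.get(activity.oais_function, 0.0) + cost
    return totals


# Hypothetical figures for a small migration project.
EXAMPLE = [
    Activity("Ingest", "Receive submission", 10, 50),
    Activity("Ingest", "Quality assurance", 15, 40),
    Activity("Preservation Planning", "Develop migration plan", 20, 40),
]

if __name__ == "__main__":
    for function, cost in cost_by_function(EXAMPLE).items():
        print(function, cost)
```

Breaking a function down into finer activities, as the study recommends for the migration itself, would simply mean adding more `Activity` rows under the same functional entity.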
This paper addresses a particular domain within the sphere of activity that is coming to be known as personal digital papers or personal digital archives. We are concerned with contemporary writers of belles-lettres (fiction, poetry, and drama), and the implications of the shift toward word processing and other forms of electronic text production for the future of the cultural record, in particular literary scholarship. The urgency of this topic is evidenced by the recent deaths of several high-profile authors, including David Foster Wallace and John Updike, both of whom are known to have left behind electronic records containing unpublished and incomplete work alongside of their more traditional manuscript materials. We argue that literary and other creatively-oriented originators offer unique challenges for the preservation enterprise, since the complete digital context for individual records is often of paramount importance—what Richard Ovenden, in a helpful phrase (in conversation) has termed “the digital materiality of digital culture.” We will therefore discuss preservation and access scenarios that account for the computer as a complete artifact and digital environment, drawing on examples from the born-digital materials in literary collections at Emory University, the Harry Ransom Center at The University of Texas at Austin, and the University of Maryland.
Mainstreaming Preservation through Slicing and Dicing of Digital Repositories: Investigating Alternative Service and Resource Options for ContextMiner Using Data Grid Technology
A digital repository can be seen as a combination of services, resources, and policies. One of the fundamental design questions for digital repositories is how to break down the services and resources: who will have responsibility, where they will reside, and how they will interact. There is no single, optimal answer to this question. The most appropriate arrangement depends on many factors that vary across repository contexts and are very likely to change over time. This paper reports on our investigation and testing of various repository "slicing and dicing" scenarios, their potential benefits, and implications for implementation, administration, and service offerings. Vital considerations for each option are (1) efficiencies of resource use, (2) management of dependencies across entities, and (3) the repository business model most appropriate to the participating organizations.
In this paper, I consider whether virtual worlds are history in two senses of the word. The first explores the implications of the life-cycle of virtual worlds, especially of their extinction, for thinking about the history of computer-based technologies, as well as their use. The moment when a virtual world “is history” – when it shuts down – reminds us that every virtual world has a history. Histories of individual virtual worlds are inextricably bound up with the intellectual and cultural history of virtual world technologies and communities. The second sense of the virtual world as history brings us directly to issues of historical documentation, digital preservation and curation of virtual worlds. I consider what will remain of virtual worlds after they close down, either individually or perhaps even collectively.
Ingest and its preparation are crucial steps of strategic importance for digital preservation. If we want to move digital preservation into the mainstream we have to make them as easy as possible. The aim of the NESTOR guide "Into The Archive" is to help streamline the planning and execution of ingest projects. The main challenge for such a guide is to provide help for a broad audience with heterogeneous use cases and without detailed background knowledge on the producer side. This paper will introduce the guide, present first experiences and discuss the challenges.
Only a small part of the research which has been carried out to date on the preservation of digital objects has looked specifically at the preservation of software. This is because the preservation of software has been seen as a less urgent problem than the preservation of other digital objects, and also because the complexity of software artifacts makes the problem of preserving them a daunting one. Nevertheless, there are good reasons to want to preserve software. In this paper we consider some of the motivations behind software preservation, based on an analysis of software preservation practice. We then go on to consider what it means to preserve software, discussing preservation approaches, and developing a performance model which determines the adequacy of a software preservation method. Finally we discuss some implications for preservation analysis for the case of software artifacts.
The Chronopolis Digital Preservation Initiative, one of the Library of Congress' latest efforts to collect and preserve at-risk digital information, has completed its first year of service as a multi-member partnership to meet the archival needs of a wide range of cultural and social domains. In this paper we will explore the major themes within Chronopolis.
Blog archiving and preservation is not a new challenge. Current solutions are commonly based on typical web archiving activities, whereby a crawler is configured to harvest a copy of the blog and return the copy to a web archive. Yet this is not the only solution, nor is it always the most appropriate. We propose that in some cases, an approach building on the functionality provided by web feeds offers more potential. This paper describes research to develop such an approach, suitable for organisations of varying size and which can be implemented with relatively little resource and technical know-how: the ArchivePress project.
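To illustrate the difference from crawling, the following minimal sketch harvests posts from an RSS 2.0 feed and stores only the ones not yet archived. It is not the ArchivePress implementation, which the abstract does not detail. The sample feed, the `harvest` function and the in-memory dictionary used as an "archive" are assumptions made for the example; a real harvester would fetch the feed over HTTP and persist the entries.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the example runs without network access.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://blog.example.org/first</link>
      <guid>http://blog.example.org/?p=1</guid>
      <pubDate>Mon, 05 Oct 2009 09:00:00 GMT</pubDate>
      <description>Hello</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example.org/second</link>
      <guid>http://blog.example.org/?p=2</guid>
      <pubDate>Tue, 06 Oct 2009 09:00:00 GMT</pubDate>
      <description>More</description>
    </item>
  </channel>
</rss>"""


def harvest(feed_xml, archive):
    """Add feed items that are not yet in the archive; return how many were added."""
    root = ET.fromstring(feed_xml)
    added = 0
    for item in root.iter("item"):
        # Use the guid as the identifier, falling back to the link.
        key = item.findtext("guid") or item.findtext("link")
        if key is None or key in archive:
            continue
        archive[key] = {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published": item.findtext("pubDate"),
            "content": item.findtext("description"),
        }
        added += 1
    return added


if __name__ == "__main__":
    archive = {}
    print(harvest(SAMPLE_FEED, archive))  # first run stores both posts
    print(harvest(SAMPLE_FEED, archive))  # second run finds nothing new
```

Because each entry arrives as structured fields rather than as a rendered page, the harvester keeps the post content and its metadata separate from the blog's presentation, which is the trade-off against a crawler-based copy.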
Most digital objects are created solely in the interactive graphical user interfaces that were available at the particular time period. Archiving and preservation organizations are faced with large amounts of such objects of various types. At some point they will need to process these automatically, if possible, to make them available to their users or convert them to a valid format. A substantial problem in creating an automated process is the availability of suitable tools. We suggest a new method, which uses an operating system and application independent interactive workflow for the migration of digital objects using an emulated environment. Success criteria for the conception and functionality of emulation environments are therefore devised, which should be applied to future long-term archiving methods.
The Planets project is developing a service-oriented environment for the definition and evaluation of preservation strategies for human-centric data. It focuses on the question of logically preserving digital materials, as opposed to the physical preservation of content bit-streams. This includes the development of preservation tools for the automated characterization, migration, and comparison of different types of digital objects as well as the emulation of their original runtime environment in order to ensure long-term access and interpretability. The Planets integrated environment provides a number of end-user applications that allow data curators to execute and scientifically evaluate preservation experiments based on composable preservation services. In this paper, we focus on the middleware and programming model and show how it can be utilized in order to create complex preservation workflows.
nestor, the German network of expertise in digital preservation, started as a time-limited project in 2003. Besides the establishment of a network of expertise with an information platform, working groups, and training opportunities, a central goal of the project phase was to prepare a sustainable organization model for the network's services. In July 2009, nestor transformed into a sustainable organization with 6 of the 7 project partners and 2 additional organizations entering into a consortium agreement. The preparation of the sustainable organization was a valuable experience for the project partners because the vision and mission of the network were critically discussed and refined for the future organization. Some further aspects were identified that also need refinement in order to make nestor fit for the future. These aspects will be discussed in the paper.
In the last few years digital preservation has started to transition from a theoretical discipline to one where real solutions are beginning to be used. The Planets project has analyzed the readiness of libraries, archives and related organizations to begin to use the outputs of various digital preservation initiatives (and, in particular, the outputs of the Planets project). This talk will discuss the outcomes of this exercise, which have revealed an increasing understanding of the problem. It has also shown concerns about its scale (in terms of data volumes and types of data) and about the maturity of existing solutions (most people are only aware of piecemeal solutions). It also shows that there is a new challenge emerging: moving from running digital preservation projects to embedding the processes within their organisations.
Preserving the Digital Memory of the Government of Canada: Influence and Collaboration with Records Creators
Library and Archives Canada has a wide mandate to preserve and provide access to Canadian published heritage and records of national significance, as well as to acquire the records created by the Government of Canada that are deemed to be of historical importance. To address this mandate, Library and Archives Canada has undertaken the development of a digital preservation infrastructure covering policy, standards and enterprise applications which will serve requirements for ingest, metadata management, preservation and access. The purpose of this paper is to focus on the efforts underway to engage digital recordkeeping activities in the Government of Canada and to influence and align those processes with LAC digital preservation requirements. The LAC strategy to implement preservation considerations early in the life cycle of the digital record is to establish a mandatory legislative and policy framework for recordkeeping in government. This includes a Directive on Recordkeeping, Core Digital Records Metadata Standard for archival records, Digital File Format Guidance, as well as Web 2.0 and Email Recordkeeping Guidelines. The expected success of these initiatives and of the collaborative approach should provide a model for other digital heritage creators in Canada.
The Web is increasingly becoming a platform for linked data. This means making connections and adding value to data on the Web. As more data becomes openly available and more people are able to use the data, it becomes more powerful. An example is file format registries and the evaluation of format risks. Here the requirement for information is now greater than the effort that any single institution can put into gathering and collating this information. Recognising that more is better, the creators of PRONOM, JHOVE, GDFR and others are joining to lead a new initiative, the Unified Digital Format Registry. Ahead of this effort a new RDF-based framework for structuring and facilitating file format data from multiple sources including PRONOM has demonstrated it is able to produce more links, and thus provide more answers to digital preservation questions - about format risks, applications, viewers and transformations - than the native data alone. This paper will describe this registry, P2, and its services, show how it can be used, and provide examples where it delivers more answers than the contributing resources.
DANS (Data Archiving and Networked Services), the Dutch scientific data archive for the social sciences and humanities, is engaged in the MIXED project to develop open source software that implements the “smart migration” strategy concerning the long-term archiving of file formats. Smart migration concerns the conversion upon ingest of specific kinds of data formats, such as spreadsheets and databases, to an intermediate XML formatted file. It is assumed that the long-term curation of the XML files is much less problematic than the migration of binary source files and that the intermediate XML file can be converted in an efficient way to file formats that are common in the future. The features of the intermediate XML files are stored in the so-called SDFP schema (Standard Data Formats for Preservation). This XML schema can be considered as an umbrella, as it contains existing formal descriptions of file formats developed by others. SDFP also contains schemas developed by DANS, e.g. a schema for file oriented databases. It can be used, for example, for the binary "DataPerfect" format that was used on a large scale about twenty years ago and for which no existing XML schema could be found. The software developed in the MIXED project has been set up as a generic framework, together with a number of plug-ins. It can be considered as a repository of durable file format conversions. The MIXED project is in its final phase and this paper contains an overview of the results.
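The idea of converting tabular data to an intermediate XML file upon ingest can be sketched as follows. This is not the MIXED software and the output does not follow the SDFP schema, whose structure is not given here. The element names `table`, `columns`, `column`, `row` and `cell`, and the function `table_to_xml`, are assumptions made purely for illustration, with CSV standing in for a spreadsheet or database export.

```python
import csv
import io
import xml.etree.ElementTree as ET


def table_to_xml(csv_text, table_name):
    """Convert CSV text with a header row into a simple, self-describing XML string."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)

    table = ET.Element("table", name=table_name)

    # Record the column names once, so the structure is explicit in the file.
    columns = ET.SubElement(table, "columns")
    for name in header:
        ET.SubElement(columns, "column", name=name)

    # Store every value as text, labelled with the column it belongs to.
    for row in reader:
        row_element = ET.SubElement(table, "row")
        for name, value in zip(header, row):
            cell = ET.SubElement(row_element, "cell", column=name)
            cell.text = value

    return ET.tostring(table, encoding="unicode")


if __name__ == "__main__":
    source = "id,title\n1,First record\n2,Second record\n"
    print(table_to_xml(source, "records"))
```

The point of the intermediate form is that a later conversion to whatever format is common in the future only needs to read this XML, not the original binary file.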
Representatives from a variety of distributed digital preservation initiatives will serve as panel members and discuss the technical adaptability, economics, and functionally compelling benefits of using cooperative distributed digital preservation networks to preserve the vast array of at-risk digital content produced by our societies and their institutions.
This paper will provide an overview of developments from the two phases of the LIFE (Lifecycle Information for E-Literature) project, LIFE1 and LIFE2, before describing the aims and latest progress from the third phase. Emphasis will be placed on the various approaches to estimate preservation costs including the use of templates to facilitate user interaction with the costing tool. The paper will also explore how the results of the Project will help to inform preservation planning and collection management decisions with a discussion of scenarios in which the LIFE costing tool could be applied. This will be supported by a description of how adopting institutions are already utilising LIFE tools and techniques to analyse and refine their existing preservation activity as well as to enhance their collection management decision making.
Important legal and economic motivations exist for the design and engineering industry to address and integrate digital long-term preservation into product life cycle management (PLM). Investigations revealed that it is not sufficient to archive only the product design data which is created in early PLM phases, but preservation is needed for data that is produced during the entire product lifecycle including early and late phases. Data that is relevant for preservation consists of requirements analysis documents, design rationale, data that reflects experiences during product operation and also metadata like social collaboration context. In addition, also the engineering environment itself that contains specific versions of all tools and services is a candidate for preservation. This paper takes a closer look at engineering preservation use case scenarios as well as PLM characteristics and workflows that are relevant for long-term preservation. Resulting requirements for a long-term preservation system lead to an OAIS (Open Archival Information System) based system architecture and a proposed preservation service interface that respects the needs of the engineering industry.