A vast and rapidly growing quantity of scientific, corporate, government, and crowd-sourced data is published on the emerging Data Web. Open Data are expected to act as a catalyst for the way structured information is exploited on a large scale. This offers great potential for building innovative products and services that create new value from data that have already been collected. Open Data are also expected to foster active citizenship (e.g., around journalism, greenhouse gas emissions, food supply chains, and smart mobility) and world-wide research in line with the “fourth paradigm of science”.

 

Published datasets are openly available on the Web. The traditional view of digital preservation, “pickling them and locking them away” for future use, conflicts with the fact that these datasets keep evolving. A number of approaches and frameworks, such as the Linked Data Stack, manage the full life-cycle of the Data Web. More specifically, these techniques are expected to tackle major issues such as the synchronisation problem (how to monitor changes), the curation problem (how to repair data imperfections and add value over time), the appraisal problem (how to assess the quality of a dataset), the citation and provenance problem (how to cite a particular version of a linked dataset and how to keep the lineage/provenance of the data), the archiving problem (how to retrieve the most recent or a particular version of a dataset), and the sustainability problem (how to support preservation at scale, ensuring long-term access).
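
To make the synchronisation and archiving problems more concrete, the following is a minimal sketch, not a method prescribed by any of the frameworks mentioned above. It assumes that a dataset version can be modelled as a set of ground RDF triples (no blank nodes, which would require graph-isomorphism handling), and all names (`Triple`, `Delta`, `diff`, `apply_delta`, `materialise`) as well as the `ex:` identifiers are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, NamedTuple, Sequence, Tuple

# Assumption: a triple is a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]
Version = FrozenSet[Triple]


class Delta(NamedTuple):
    """Changes between two consecutive versions of a dataset."""
    added: FrozenSet[Triple]
    removed: FrozenSet[Triple]


def diff(old: Version, new: Version) -> Delta:
    """Synchronisation: detect which triples were added or removed."""
    return Delta(added=frozenset(new - old), removed=frozenset(old - new))


def apply_delta(version: Version, delta: Delta) -> Version:
    """Apply a delta to a version, yielding the next version."""
    return frozenset((version - delta.removed) | delta.added)


def materialise(base: Version, deltas: Sequence[Delta], n: int) -> Version:
    """Archiving: rebuild version n (0 is the base) by replaying deltas."""
    if not 0 <= n <= len(deltas):
        raise IndexError(f"version {n} does not exist")
    current = base
    for delta in deltas[:n]:
        current = apply_delta(current, delta)
    return current


# Hypothetical example data: one literal value changes between versions.
v0: Version = frozenset({
    ("ex:Athens", "ex:country", "ex:Greece"),
    ("ex:Athens", "ex:population", "660000"),
})
v1: Version = frozenset({
    ("ex:Athens", "ex:country", "ex:Greece"),
    ("ex:Athens", "ex:population", "640000"),
})

delta = diff(v0, v1)
print(sorted(delta.added))
print(sorted(delta.removed))
```

Storing a base version plus deltas is only one possible archiving policy; storing full snapshots of every version trades storage space for faster retrieval of a particular version.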

 

Managing the evolution and preservation of Linked Open Data (LOD) datasets poses a number of challenges, mainly related to the nature of the Linked Data principles and the RDF data model. Since resources are globally interlinked, effective citation measures are required. Another challenge is to determine the implications that changes to one LOD dataset may have for other datasets linked to it. The distributed, dynamic nature of LOD datasets introduces additional complexity, since the external sources being linked to may change or become unavailable. Finally, means must be identified to provide ongoing access to such dynamic datasets and to continuously assess their quality.