Ligue des Bibliothèques Européennes de Recherche, Groupe des Cartothécaires de LIBER




DIGITAL METADATA, STANDARDS FOR COMMUNICATION AND PRESERVATION Ref. 1
Jan Smits, Koninklijke Bibliotheek, The Netherlands

© Jan Smits
Published from: The LIBER Quarterly, the journal of European research libraries, ISSN 1435-5205, Vol. 6(1996)4.


Abstract
Digital datasets need deeper access than the current bibliographic record provides if we want to serve the wishes of users better. Because digital spatial datasets are already being produced in overwhelming quantities, the creation of metadata is more urgent here than in any other field. This led to the creation of American standards for producing the desired metadata. With these standards as an example, the library community is urged to make proposals to political bodies to create International Content Standards for Digital Metadata, which would cover all kinds of digital datasets. These proposals should enlist the producers of the digital datasets, who are the designated party to produce the metadata. These metadata are a prerequisite for collection development. They are a source for bibliographic descriptions and might also be used as a tool in understanding the problems of preserving digital datasets.

Trials and tribulations of knowledge
For millennia people have gathered information to help understand and govern the functioning of society and its natural environment. To make this knowledge more permanent they have recorded it on, among other media, stone, clay, bark, papyrus and paper, and in this recent age on electronic media.
The written/printed knowledge can be divided into two distinct classes: raw data and processed data. Many administrative (parts of) organisations have records of raw data, which, if not (purposefully) destroyed, found their way into archives. Through the ages much raw data has been processed in one or more stages into intermediate and final forms and been deposited in archives, libraries, museums, private households and so on.
Libraries contain many written and printed records which illustrate a process of selection, editing, renewal of sources, reworking of data, gaps of lost data leading to speculation, and so on. Sometimes the intermediate stages become so overwhelming over time that only ghosts of the original data are discernible. Raw data records, however, are the lesser part of a library's holdings.

From hardcopy to digital: changes in sources?
As with so many developments, the digital age started out by substituting: the ultimate form in which knowledge was reproduced remained the same, but the processes for arriving at that form were different and faster. Input was, and still is, governed by output. This means that derivative processes like cataloguing have not undergone much significant change.
But the digital age also creates its own innovative developments. Where manipulation of data was a very tedious process in the past, digital technologies can overcome many of the difficulties. They not only bring the manipulation of the aforementioned intermediate and final stages of data within our reach. Because of the many available algorithms, the processing speed and the availability of enormous memory banks, they can also help to reprocess the original or raw data. This will not happen often, but (re)processing can get closer to the original source than we are used to in libraries. Because of the available technology the value of the final result of processed information becomes less permanent. Andrew Tatham (Keeper of the Map Collection of the Royal Geographical Society) consequently sees our future as follows:

"We shall no longer provide the users with someone else's selection and presentation of data, but with the data itself and with the means by which the user can make their own selection and presentation of this data to inform or to mould their own or other people's image of the world" Ref. 2
Of course the older media will still exist, keeping all the inherent functions of libraries intact, but if his words come true this means an enormous addition to our functions.

The case of spatial data
Compared with the information most departments of research libraries collect, map departments are past the point of no return where digital data are concerned.
The greater part of spatial data, that is all geo-referenced data including statistical data, is produced by governmental agencies: federal, state, provincial, municipal, etc. This is mainly because private organisations cannot bear the financial burden of maintaining a permanent framework in which these data are gathered. This is especially true for geophysical, meteorological and demographic data, including aerial photographs and remote sensing images. For these governmental organisations the digital age came in the nick of time, because the imbalance between the amount of data produced at all levels and the personnel and technology available to process them was growing ever larger. Now almost all governmental agencies are creating or have created digital spatial databases which are the basis for their products. As Patrick McGlamery of the University of Connecticut, U.S.A., said:

"We have reached the point in spatial information revolution where the amount of spatial information available outstrips the ability to represent it cartographically". Ref. 3
Not that the agencies will no longer produce hard copy, but what they produce is a selection of all the data available. Even if we wanted to, we could never put these vast amounts of data on paper. And when we talk of vast amounts, we are closer to terabytes than gigabytes. But these unused data may have potential for other or future users. It also means that we, map curators, must adapt quickly to digital practices or opt out and become museums, as some of us think will happen.
Digitizing means more than creating the mental ability to manipulate the raw data time and again. At the last map curators' conference one of the issues was to what extent map collections are willing to offer cartographic software to their clients and support its use Ref. 4. Chris Perkins has arranged cartographic software packages in order of increasing difficulty and increasing functionality. They range from fixed, pre-defined electronic views of data with limited interaction to complete Geographical Information Systems (GIS) Ref. 5. This presupposes the availability of digital infrastructure (software and hardware) to realize this manipulation. It raises not only the problem of continuing education for map curators, but also the issue of preservation.
"By rigorously reducing the complexity of the matter in hand, we can say that the [digital] system we need consists of data, software and hardware. Each of these elements has its own life-cycle." Ref. 6

Figure 1: Succession of hardware and software generations
(From: BÜTIKOFER, Niklaus: Archiving electronic information: some aspects. In: The LIBER quarterly, Vol. 5, 1995, No.3, p. 276)

This poses the problem that the (cultural) value of digital data cannot be viewed independently of the environment in which they function. But the linchpin is the purveyor, in our case the map librarian.

"Probably the most important factor for the map library is the complexity of the software and the level of interaction it allows. This is important because of the degree of library staff input required, and the nature of tasks which can be performed." Ref. 7

We are constrained by our present knowledge and practices. The problem is, as Andrew Tatham stated:

"Does the map curator, as an individual, and does his or her institution, have the confidence to help bring about the future?" Ref. 8
That means: are we prepared to advance our knowledge, and do we want to think digital, without losing that special understanding of spatial relations and patterns which has characterized map curators until now? But if we want to change, we have to build on our strengths: collecting, describing and providing access. And then we have to live with the uncommon practice that we may not own part or the whole of datasets.

Interacting problems
The digital age confronts us with (at least) two problems: how to access digital data and, if we own them, how to preserve them and their functionality. In library practice up till now these were two distinct tasks of the institution, though the one supported the other.
In order to be able to access and preserve digital data we have to understand their functionality, their use and the technical infrastructure in which they function Ref. 9. This is only possible when we know more about their quality, their technical functions, their availability, etc. It means we have to think digital. But are we able to do so? Can we acquire this knowledge in due time or are there better options?
If we do not act soon I'm afraid that, because of digital technology evolution, a lot of valuable digital data will fade away from the collective memory and leave a gap in our history.

We are not an island
Is the situation sketched in the paragraphs above true only for geo-referenced data, or will it also come true for the other fields of science that libraries cater for? We see that libraries are trying to adapt to accessing CD-ROMs, other digital end-products and on-line sources, mainly from the Internet. The library community is aware that the purveyance of information is changing, especially information with a high added value.
But we are building only partly on our strengths, i.e. selection and access. What about preserving? Can we trust the purveyors to think about how to preserve the data they produce? The American Commission on Preservation and Access states in one of its reports that

"Without ... a fail-safe mechanism, preservation of the nation's cultural heritage in digital form will likely be overly dependent on marketplace forces, which may value information for too short a period and without applying broader, public interest criteria" Ref. 10
Maybe we can help them by creating structures in which selection, access and preservation have equal value. Since the Paris Principles we have created the ISBD and the MARC formats to cope with the problems of ever rising amounts of information. Can we not carry this knowledge forward into the digital age? By that I do not mean the ISBD(CF) and the resulting changes in MARC formats.

But ideas are sooner posed than realized. Though we might like to think that the ISBDs and MARC formats were created by the sheer will of librarians to make valuable information more accessible, the underlying drive was economics. By creating these standards libraries were able to enter the age of digital cataloguing and to work cost-efficiently enough to keep up with the ever increasing amount of information being produced, and at the same time to persuade governments, or as the case may be politicians, to grant ever bigger subsidies to keep them functioning. In this we did not differ from any other market player, though our prime goal was and is as much free public access to information as possible.
However, times are a-changing, and libraries are being pushed to become more competitive market players than ever before. This means not only that we have to be more cost-efficient, but also that we have to treat information as a market commodity and keep it available to our clients and the public in general at the lowest possible cost.
If producers, public and private, are to be willing to cooperate in solving the problems of accessing and preserving digital information, we must ensure that they can see a profit in cooperating with the library field.

In the field of spatial digital data we think we can get a basic insight if we ask producers to aid us. They are the ones who have created the data, with their options and applicability, using their own highly sophisticated technologies. They are the ones who know best the ins and outs and what is valuable and what is not.

Metadata
But how can they best help us in solving the digital problems we are confronted with? To understand the value of the data and the way they are structured we need a blueprint of the way they are designed and processed. It is already possible to make bibliographic descriptions which can function in the present catalogues Ref. 11. But a mere ISBD description is not enough, as its primary goal is to identify information. Neither will an abstract do, as this describes only the core of the matter.
We need information about information which identifies it, describes it, and gives information about its structure, functions, fitness for use, quality and authenticity. We call this metadata.

But I would first like to prevent confusion concerning the term metadata. All data about data is metadata; this includes bibliographic data. However, only since digital datasets have been included in library holdings has the term been used in a library context. In a report by the Dutch IWI Ref. 12 under the title The library breaking new ground Ref. 13 there is talk of a metacatalogue. As I read it this means a bibliographic database with descriptions of digital datasets. Though this is not concerned with hardcopy books, I would prefer that these kinds of descriptions, which are used to identify objects or sets of information, retain the designation bibliographic. (We also call descriptions of cartographic materials, non-book materials, sheet music, etc. bibliographic, so why the change?) I presume thinking on this subject in other European countries is no different Ref. 14. Creating special bibliographic catalogues also confuses researchers, who would prefer to search a single database containing all kinds of bibliographic data, independent of the information carrier.
I prefer to define the term metadata as

"... data that describe the content, data definition and structural representation, extent (both geographic and temporal), spatial reference, quality, availability, status, and administration of a geographic dataset." Ref. 15
In the following paragraphs I hope to illustrate what is meant by this.

For geospatial digital data, including processed remote sensing images, we fortunately have a good example from the U.S.A. On April 11, 1994, President William Clinton signed Executive Order 12906, "Coordinating Geographic Data Acquisition and Access: The National Spatial Data Infrastructure". Section 3, Development of a National Geospatial Data Clearinghouse, paragraph (b) states:

"Standardized Documentation of Data. Beginning 9 months from the date of this order, each agency shall document all new geospatial data it collects or produces, either directly or indirectly, using the standard under development by the FGDC Ref. 16, and make that standardized documentation electronically accessible to the Clearinghouse network. Within 1 year of the date of this order, agencies shall adopt a schedule, developed in consultation with the FGDC, for documenting, to the extent practicable, geospatial data previously collected or produced, either directly or indirectly, and make that data documentation electronically accessible to the Clearinghouse network." Ref. 17
Soon afterwards, on June 8, 1994, the FGDC produced the Content Standards for Digital Geospatial Metadata Ref. 18.
The objectives of the standards are to provide a common set of terminology and definitions for the documentation of digital geospatial data. The standards establish the names of data elements and compound groups (groups of data elements) to be used for these purposes, the definitions of these compound elements and data elements, and information about the values that are to be provided for the data elements.
The major uses of metadata are:
The standard was developed from the perspective of defining the information required by a prospective user to determine the availability of a set of geospatial data, to determine the fitness of the set for an intended use, to determine the means of accessing the set, and to successfully transfer the set.
The standards do not provide instructions or techniques for their implementation and accordingly do not concern themselves with the construction of databases for holding metadata Ref. 19.

Classes of metadata
Thus metadata in this context are data about the contents, quality, condition and other characteristics of data.
In a nutshell the Content Standards for Digital Geospatial Metadata are concerned with the following kinds of information:

  1. Identification information
  2. Data quality information
  3. Spatial data organization information
  4. Spatial reference information
  5. Entity and attribute information
  6. Distribution information
  7. Metadata reference information
  8. Citation information
  9. Time period information
  10. Contact information
Metadata structure and applicability is visualized through this image map.

Except for fields 3, 4 and possibly 5, these fields can be applied to any kind of digital information. Those who want a better insight into the use of this Standard are referred to the FGDC.
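
The point about which fields carry over to non-spatial data can be made concrete with a minimal sketch. It only encodes the ten section names listed above and the remark that fields 3, 4 and possibly 5 are specific to spatial data. The function name and its parameter are hypothetical and are not part of the FGDC standard.

```python
# Section numbers and names as listed in the text above.
SECTIONS = {
    1: "Identification information",
    2: "Data quality information",
    3: "Spatial data organization information",
    4: "Spatial reference information",
    5: "Entity and attribute information",
    6: "Distribution information",
    7: "Metadata reference information",
    8: "Citation information",
    9: "Time period information",
    10: "Contact information",
}

# Fields the text singles out as specific to spatial data.
SPATIAL_ONLY = {3, 4}
# Field the text calls "possibly" spatial-specific.
POSSIBLY_SPATIAL = {5}


def general_sections(exclude_possible=True):
    """Return the section names that could apply to any digital dataset.

    exclude_possible is a hypothetical switch: when True, field 5 is
    treated as spatial-specific as well.
    """
    excluded = set(SPATIAL_ONLY)
    if exclude_possible:
        excluded |= POSSIBLY_SPATIAL
    return [name for number, name in sorted(SECTIONS.items())
            if number not in excluded]


if __name__ == "__main__":
    for name in general_sections():
        print(name)
```

Under the stricter reading seven of the ten sections remain available for a general standard; under the looser reading, eight.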

Descriptions of digital metadata are available on the Internet through the Webpage Metadata? or the Geospatial Support Staff Metadata Tutorial Ref. 20. These pages also offer many supporting papers on the use and creation of digital metadata and on the functions of the American clearinghouse system.

Though we are not yet developing in this field as the USA is, it might be good to take to heart the advice of Patrick McGlamery, who points out that the USGS Ref. 21 is running grants for cooperative projects which stimulate exchanges between data producers and, among others, libraries. Though governments are retreating, they sometimes have to create financial means to take up new challenges.

ICSDM (International Content Standards for Digital Metadata)
The American FGDC initiative was soon followed by the European Committee for Standardisation (CEN), Technical Committee (TC) 287, which published a draft of a European Standard for Geographic metadata in January 1996, to be finalized in 1997 Ref. 22.
However, both standards are restricted to spatial information. That they were developed so soon was due to the fact that the vast amounts of digital geospatial data being produced could not otherwise be managed.

But it would be wiser to create metadata standards which encompass all digital data in the same way as the ISBD was produced to create a general bibliographic framework: a general framework including special definitions for special properties of special kinds of digital data. The standards probably have to be open-ended as digital technology is still evolving and we do not know what new kinds of metadata are called for in future. It probably means also that a new kind of thinking is called for concerning the different kinds of digital data, which will probably differ from our current thinking in non-serial, serial, non-book-materials, music, cartographic materials, and antiquarian materials. At the same time we have to keep in mind that we want to extract data for use in bibliographic databases.
If no other body is at the moment proposing to design such standards, why not libraries, which have a long history in documentation? I urge LIBER, maybe in cooperation with RGL and other organizations, to start planning and to make proposals to political bodies (e.g. European Commission, Unesco, ISO, etc.) concerning this matter. But this time active cooperation is needed from producers, governmental as well as non-governmental, and also from information specialists, as these standards will go beyond the stage of mere identification and access.
Because of the inside information we get from producers we may also get a better grip on the problems of collection development and of preserving vital digital data. However, be assured that it will not be an easy way, as these standards will make large inroads on finances of governmental and non-governmental organisations.
One of the tasks the American Commission on Preservation and Access has set itself is to use metadata for digital preservation through migration Ref. 23. Though this Commission primarily focuses on document-like objects (i.e. documents which can be represented in a print format, which in my opinion excludes interactive digital spatial data), it may find functional structures in the FGDC standards.
How the data have to be formatted and which kinds of organizations will archive dynamic digital datasets are questions which are not the subject of this paper and will therefore not be answered for the moment.
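
How metadata might support preservation through migration can be illustrated with a minimal sketch. It assumes, purely for illustration, that each metadata record carries a label for the hardware/software configuration the dataset depends on. The class, the field names and the sample values are hypothetical placeholders and are not taken from the FGDC standard or the Task Force report.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    title: str
    # Hypothetical label for the hardware/software configuration
    # the dataset needs in order to function.
    environment: str


def needs_migration(records, supported_environments):
    """Return the titles of datasets whose environment is no longer supported."""
    return [record.title for record in records
            if record.environment not in supported_environments]


# Placeholder records, not real holdings.
records = [
    DatasetRecord("Census extract (placeholder)", "system-A"),
    DatasetRecord("Satellite scene (placeholder)", "system-B"),
]

if __name__ == "__main__":
    # Only "system-B" is assumed to be still supported here.
    print(needs_migration(records, supported_environments={"system-B"}))
```

The sketch only shows that such a check depends on the producer having recorded the technical environment in the first place. It says nothing about how the migration itself would be carried out.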

Conclusion
The most important result of these standards will be that producers will have an obligation to deliver the metadata concerning their databases. As said before, they are the ones with the best knowledge of how these data were produced and how they can be used. But the blade cuts both ways. When the right metadata are provided the producer can market the digital datasets more efficiently, which will probably be an incentive for them to co-operate with libraries. On the other hand this information provides collection developers with a means for easier and better selection of digital datasets, and at the same time offers more handles for coping with the problems of preserving these datasets.
We as librarians can also use the metadata to create catalogue-records, which will fit in our general bibliographic catalogues in order that users will be able to search a continuum from written and printed to digital material and will be enabled to pick from them what they can use best. This will put a greater strain on the competencies of librarians, but we shall have to live with that if we are to continue playing a vital role in the world of informatics. I can envisage a future in which bibliographic records are hyperlinked to records in metadata-databases, in this way enabling access on different levels for different users.
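
The idea of a catalogue record that is extracted from the metadata and hyperlinked back to it can be sketched as follows. This is only an illustration under stated assumptions: the dictionary keys, the function name and the example.org address are hypothetical, and the originator and date values are placeholders. Only the dataset title is taken from note 20.

```python
def to_catalogue_record(metadata, metadata_url):
    """Extract a brief bibliographic record and keep a link to the full metadata."""
    citation = metadata["citation"]
    return {
        "title": citation["title"],
        "originator": citation["originator"],
        "date": citation["date"],
        # The link gives users who need more depth access to the
        # full metadata record.
        "metadata_link": metadata_url,
    }


# Hypothetical metadata record; only the title comes from note 20.
metadata = {
    "citation": {
        "title": "Aquatic ecoregions of the conterminous United States",
        "originator": "(placeholder originator)",
        "date": "(placeholder date)",
    },
    "data_quality": "(placeholder: stays in the metadata database only)",
}

record = to_catalogue_record(metadata, "https://example.org/metadata/ecoregion")

if __name__ == "__main__":
    print(record)
```

The catalogue record stays short enough for a general bibliographic catalogue, while the quality and structure information remains one link away for the user who needs it.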
We may not be able to maintain free access to current information, as this becomes more and more an economic commodity, but we may be able to maintain at least the possibility that anybody can find the information that is needed. But time is short if we want to play a significant role in it.

AFTERWORD
This paper was read by Susan Vejlsgaard at the LIBER Annual General Conference in May 1996 in Malta. Since then the author of this paper, as observer of the IFLA Geography & Map Library Section, has attended a 4-day meeting, August 17-20 1996, of the ICA Ref. 24 Commission on Standards for the Transfer of Spatial Data. The main topic of this commission during its 1995-1999 cycle is "[to produce a book] which will examine and assess national and international metadata standards". To that end some 15 national metadata standards (some existing, some in the process of being developed) and transfer standards will be examined, together with the metadata standards of CEN/TC 287 Ref. 25, to which many European countries will adhere, and of ISO/TC 211 Ref. 26. The CEN and ISO standards will probably be finalized in 1997. Knowing the backgrounds of the ICA Commission members, I wonder how many of my colleagues are involved in this work, how map collections and/or cartographic information centres are preparing themselves to use these standards, and how they plan to develop the resulting clearinghouses for metadata and spatial data.

Acknowledgement
I am grateful to Pablo Garcia i Garcia and Anna Lluch i Galera for providing the environment in which I could first concentrate my thoughts upon this subject. I would like to thank my colleagues Susan Vejlsgaard (Det Kongelige Bibliotek, Denmark), Patrick McGlamery (University of Connecticut, USA), Chris Perkins (Manchester University, UK) and Tony Campbell (The British Library, UK) for their critical contributions, many of which I took to heart and incorporated in this paper.


Notes and references

  1. The author is long-serving Secretary (1984-1998) and President (1998-....) of the Groupe des Cartothécaires de LIBER (European Map Curators Group). This article is based on the Proceedings of the 9th Conference of the Groupe des Cartothécaires de LIBER, held in 1994 in Zürich, Switzerland (The LIBER Quarterly, Vol. 5/1995/3, pp. 225-347) and on a paper read at the 1996 Conference of the Dutch Cartographic Society. It gives, however, the author's personal view of the matter and cannot be read as the opinion of the Groupe des Cartothécaires de LIBER.

  2. TATHAM, Andrew (1995). Can the map curator adapt?
    In: The LIBER Quarterly, Vol. 5, 1995, No. 3, pp. 330-336.

  3. MCGLAMERY, Pat (1995). Maps and spatial information: changes in the map library.
    In: The LIBER Quarterly, Vol. 5, 1995, No. 3, pp. 229-234.

  4. SMITS, Jan (1994). Mapcuratorship in transition : report on the 9th conference of the Groupe des Cartothécaires de LIBER, 26-29 September 1994, Zürich, Switzerland.
    In: The LIBER Quarterly, Vol. 4, 1994, No. 3, pp. 345-362.

  5. PERKINS, C.R. (1995). Leave it to the labs? Options for the future of map and spatial data collections.
    In: The LIBER Quarterly, Vol. 5, 1995, No. 3, pp. 312-329, Figure 1. Types of cartographic software.

  6. BÜTIKOFER, Niklaus (1995). Archiving electronic information: some aspects.
    In: The LIBER Quarterly, Vol. 5, 1995, No. 3, pp. 274-279.

  7. PERKINS, C.R. (1995). Leave it to the labs? Options for the future of map and spatial data collections.
    In: The LIBER Quarterly, Vol. 5, 1995, No. 3, pp. 312-329.

  8. TATHAM, Andrew (1995). Can the map curator adapt?
    In: The LIBER Quarterly, Vol. 5, 1995, No. 3, pp. 330-336.

  9. We have already seen many examples where digital census and satellite data have become irretrievable because the technical infrastructure in which they functioned has become obsolete.

  10. See 'Executive summary' in: Preserving digital information : draft report of the Task Force on Archiving Digital Information, commissioned by the Commission on Preservation and Access and the Research Libraries Group. Version 1.0 August 23, 1995.

  11. SMITS, Jan (1995). Describing geomatic datasets with ISBD and UNIMARC: problems and possible solutions.
    In: The LIBER Quarterly, Vol. 5, 1995, No. 3, pp. 292-311.

  12. IWI = Committee for the Innovation of Academic Information Services. This is a managerial platform of the Dutch universities (VSNU), the Royal Library (KB), the Royal Netherlands Academy of Arts and Sciences (KNAW) and the Netherlands Organization for Scientific Research (NWO), which aims to coordinate the activities in the field of information services innovation, primarily with respect to education and research.

  13. UKB/CVDUR (1995). De grensverleggende bibliotheek : de innovatie van de Nederlandse wetenschappelijke informatievoorziening : een verkenning tot het jaar 2000. Utrecht, IWI, 56 p.

  14. OCLC/NCSA's Dublin Core descriptions are also called metadata, which according to them is something between indexes generated by general search engines and bibliographic data. Element description clearly points to bibliographic data enriched with access data, though why they do not follow the ISBD/AACR2 scheme is something I do not understand. Unfortunately only document-like objects are concerned.
    Information can be found in the OCLC/NCSA metadata workshop report.

  15. Working definition adopted by the ICA Commission on Standards for the Transfer of Spatial Data at their meeting in Den Haag, August 17-20 1996.

  16. Federal Geographic Data Committee. Besides the many government departments concerned, it also includes the Library of Congress and the National Archives and Records Administration.

  17. Fortunately this order uses the verb to document and not the verb to identify, which differentiates it from common library practices.

  18. FEDERAL GEOGRAPHIC DATA COMMITTEE (1994). Content Standards for Digital Geospatial Metadata (June 8). Washington, D.C., FGDC, 1994.
    The standard was superseded in 1998 by the Content Standard for Digital Geospatial Metadata (CSDGM), officially known as FGDC-STD-001-1998, dated June 1998. Another recently available full metadata standard is the Core Metadata Elements for Land and Geographic Directories in Australia and New Zealand. The standards must be unzipped.

  19. This holds for the American as well as the European standards (for the latter see note 22).

  20. Ecoregion, retrieved through this URL, shows a metadata description of the digital dataset Aquatic ecoregions of the conterminous United States, at the moment the shortest description I could find. Metadata descriptions of two to three times the size are possible.

  21. United States Geological Survey.

  22. WORKING GROUP 2 of CEN/TC 287 (1996). Geographic Information - Data description - Metadata [English version] : Draft V2 - for 2nd informal vote by WG2. Brussels : CEN, 1996. 42 p.
    The draft is created with the EXPRESS-G model of ISO standard 10303. Unfortunately no examples could be located on the Internet.

  23. See 'Executive summary' in: Preserving digital information : draft report of the Task Force on Archiving Digital Information, commissioned by the Commission on Preservation and Access and the Research Libraries Group. Version 1.0 August 23, 1995.
    "Migration is a set of organized tasks designed to achieve the periodic transfer of digital materials from one hardware/software configuration to another, or from one generation of computer technology to a subsequent technology".

  24. International Cartographic Association.

  25. Comité Européen de Normalisation.

  26. International Organization for Standardization.

