Ligue des Bibliothèques Européennes de Recherche, Groupe des Cartothécaires de LIBER
Abstract
Digital datasets need deeper access than the current bibliographic record provides if we want to cater better to the wishes of users. Because digital spatial datasets are already produced in overwhelming quantities, the creation of metadata is more urgent here than in any other field. This led to the creation of American standards for producing the desired metadata. With these standards as an example, the library community is urged to make proposals to political bodies to create International Content Standards for Digital Metadata, which would cover all kinds of digital datasets. These proposals should provide for enlisting the producers of the digital datasets, who are the designated parties to produce the metadata. These metadata are a prerequisite for collection development. They are a source for bibliographic descriptions and might also be used as a tool in understanding the problems of preserving digital datasets.
Trials and tribulations of knowledge
For millennia people have gathered information to help understand and govern the functioning of society and its natural environment. To make this knowledge more permanent they have recorded it on, among other things, stone, clay, bark, papyrus and paper, and in the present age on electronic media.
Written and printed knowledge can be divided into two distinct classes: raw data and processed data. Many administrative organisations, or parts of organisations, have kept records of raw data which, if not (purposefully) destroyed, have found their way into archives. Through the ages much raw data has been processed in one or more stages into intermediate and final forms and been deposited in archives, libraries, museums, private households and so on.
Libraries contain many written and printed records which illustrate a process of selection, editing, renewal of sources, reworking of data, gaps of lost data leading to speculation, and so on. Sometimes the intermediate stages become, over time, so overwhelming that only ghosts of the original data are discernible. Raw data records, however, are the lesser part of a library's holdings.
From hardcopy to digital: changes in sources?
As with so many developments, the digital age started out with substitution: the ultimate form in which knowledge was reproduced remained the same, but the processes to arrive at this stage were different and faster. Input was, and still is, governed by output. This means that derivative processes like cataloguing have not undergone much significant change.
But the digital age also creates its own innovative developments. Where manipulation of data was a very tedious process in the past, digital technologies can overcome many of the difficulties. They not only bring the manipulation of the aforementioned intermediate and ultimate stages of data within our reach: thanks to the many available algorithms, the processing speed and the availability of enormous memory banks, they can also help to reprocess the original or raw data. This will not happen often, but (re)processing can get closer to the original source than we are used to in libraries. Because of the available technology, the final result of processed information becomes less permanent in value. Andrew Tatham (Keeper of the Map Collection of the Royal Geographical Society) consequently sees our future as follows:
"We shall no longer provide the users with someone else's selection and presentation of data, but with the data itself and with the means by which the user can make their own selection and presentation of this data to inform or to mould their own or other people's image of the world" Ref. 2
Of course the older media will still exist, keeping all the inherent functions of libraries intact, but if his words come true this means an enormous addition to our functions.
The case of spatial data
Compared with the information most departments of research libraries collect, map departments have passed the point of no return where digital data are concerned.
The greater part of spatial data, that is, all geo-referenced data including statistical data, is produced by governmental agencies at federal, state, provincial, municipal and other levels, mainly because private organisations cannot bear the financial burden of maintaining a permanent framework in which these data are gathered. This is especially true for geophysical, meteorological and demographic data, including aerial photographs and remote sensing images. For these governmental organisations the digital age came in the nick of time, because the imbalance between the amount of data produced on all levels and the number of personnel and the technology available to process them was getting bigger and bigger. Now almost all governmental agencies are creating or have created digital spatial databases which are the basis for their products. As Patrick McGlamery of the University of Connecticut, U.S.A., said:
"We have reached the point in spatial information revolution where the amount of spatial information available outstrips the ability to represent it cartographically". Ref. 3
Not that the agencies will no longer produce hard copy, but what they produce is a selection of all the data available. Even if we wanted to, we could never put these vast amounts of data on paper. And when we talk of vast amounts, we are closer to terabytes than gigabytes. But these unused data may have potential for other or future users. It also means that we, map curators, must adapt quickly to digital practices or opt out and become museums, as some of us think will happen.
"By rigorously reducing the complexity of the matter in hand, we can say that the [digital] system we need consists of data, software and hardware. Each of these elements has its own life-cycle." Ref. 6
Figure 1: Succession of hardware and software generations
(From: BÜTIKOFER, Niklaus: Archiving electronic information: some aspects. In: The LIBER quarterly, Vol. 5, 1995, No.3, p. 276)
This poses the problem that the (cultural) value of digital data cannot be viewed independently of the environment in which they function. But the linchpin is the purveyor, in our case the map librarian.
"Probably the most important factor for the map library is the complexity of the software and the level of interaction it allows. This is important because of the degree of library staff input required, and the nature of tasks which can be performed." Ref. 7
We are constrained by our present knowledge and practices. The problem is, as Andrew Tatham stated:
"Does the map curator, as an individual, and does his or her institution, have the confidence to help bring about the future?" Ref. 8
That means: are we prepared to advance our knowledge, and do we want to think digital, without losing that special understanding of spatial relations and patterns which has characterised map curators until now? But if we want to change, we have to build on our strengths: collecting, describing and providing access. And then we have to live with the unfamiliar practice that we may not own part or the whole of datasets.
Interacting problems
The digital age confronts us with (at least) two problems: how to access digital data and, if we own them, how to preserve them and their functionality. In library practice up to now these were two distinct tasks of the institution, though the one supported the other.
In order to access and preserve digital data we have to understand their functionality, their use and the technical infrastructure in which they function (Ref. 9). This is only possible when we know more about their quality, their technical functions, their availability, etc. It means we have to think digital. But are we able to do so? Can we acquire this knowledge in due time or are there better options?
If we do not act soon I'm afraid that, because of the evolution of digital technology, a lot of valuable digital data will fade from the collective memory and leave a gap in our history.
We are not an island
Is the situation sketched in the paragraphs above only true for geo-referenced data, or will it also come true for the other fields of science that libraries cater for? We see that libraries are trying to adapt themselves to accessing CD-ROMs, other digital end-products and on-line sources, mainly from the Internet. The library community is aware that the purveyance of information is changing, especially for information with a high added value.
But we build only partly on our strengths, i.e. selection and access. What about preserving? Can we trust that the purveyors think about how to preserve the data they produce? The American Commission on Preservation and Access states in one of its reports that
"Without ... a fail-safe mechanism, preservation of the nation's cultural heritage in digital form will likely be overly dependent on marketplace forces, which may value information for too short a period and without applying broader, public interest criteria" Ref. 10
Maybe we can help them by creating structures in which selection, access and preservation have equal value. Since the Paris Principles we have created the ISBD and the MARC formats to cope with the problems of ever-rising amounts of information. Can we not carry this knowledge forward into the digital age? And by that I do not mean the ISBD(CF) and the resulting changes in MARC formats.
But ideas are sooner posed than realized. Though we might like to think that the ISBDs and MARC formats were created by the sheer will of librarians to make valuable information more accessible, the underlying drive was economics. By creating these standards libraries were able to enter the age of digital cataloguing and to work cost-efficiently enough to keep up with the ever-increasing amount of information being produced, and at the same time to persuade governments, or politicians as the case may be, to grant ever bigger subsidies to keep them functioning. In this we did not differ from any other market player, though our prime goal was and is as much free public access to information as possible.
However, times are a-changing, and libraries are pushed to become more competitive market players than ever before. This means not only that we have to be more cost-efficient, but also that we have to treat information as a market commodity and keep it available to our clients and the public in general at the lowest possible cost.
For producers, public and private, to be willing to cooperate in solving the problems of accessing and preserving digital information, we must ensure that they can see a profit in cooperating with the library field.
In the field of spatial digital data we think we can get a basic insight if we ask producers to aid us. They are the ones who have created the data, with their options and applicability, using their own highly sophisticated technologies. They are the ones who know best the ins and outs and what is valuable and what is not.
Metadata
But how can they best help us in solving the digital problems we are confronted with? To understand the value of the data and the way they are structured we need a blueprint of the way they are designed and processed. It is already possible to make bibliographic descriptions which can function in the present catalogues (Ref. 11). But a mere ISBD description is not enough, as its primary goal is to identify information. Neither will an abstract do, as this covers only the core of the matter.
We need information about information: information that identifies it, describes it, and tells us about its structure, functions, fitness for use, quality and authenticity. We call this metadata.
But I would first like to prevent confusion concerning the term metadata. All data about data is metadata; this includes bibliographic data. However, the term has been used in a library context only since digital datasets were included in library holdings. In a report by the Dutch IWI (Ref. 12) under the title The library breaking new ground (Ref. 13) there is talk of a metacatalogue. As I read it, this means a bibliographic database with descriptions of digital datasets. Though this is not concerned with hardcopy books, I would prefer that these kinds of descriptions, which are used to identify objects or sets of information, retain the designation bibliographic. (We also call descriptions of cartographic materials, non-book materials, music-sheets, etc. bibliographic, so why the change?) I presume philosophies about this subject in other countries in Europe are not different (Ref. 14). Creating special bibliographic catalogues also confuses the researcher, who would prefer to search a single database containing all kinds of bibliographic data independent of the information carrier.
I prefer to define the term metadata as
"... data that describe the content, data definition and structural representation, extent (both geographic and temporal), spatial reference, quality, availability, status, and administration of a geographic dataset." Ref. 15
In the following paragraphs I hope to illustrate what is meant by this.
For geospatial digital data, including processed remote sensing images, we fortunately have a good example from the U.S.A. On April 11, 1994, President William Clinton signed Executive Order 12906, "Coordinating Geographic Data Acquisition and Access: The National Spatial Data Infrastructure". Section 3, Development of a National Geospatial Data Clearinghouse, paragraph (b) states:
"Standardized Documentation of Data. Beginning 9 months from the date of this order, each agency shall document all new geospatial data it collects or produces, either directly or indirectly, using the standard under development by the FGDC (Ref. 16), and make that standardized documentation electronically accessible to the Clearinghouse network. Within 1 year of the date of this order, agencies shall adopt a schedule, developed in consultation with the FGDC, for documenting, to the extent practicable, geospatial data previously collected or produced, either directly or indirectly, and make that data documentation electronically accessible to the Clearinghouse network." Ref. 17
Soon after, on June 8, 1994, the FGDC produced the Content Standards for Digital Geospatial Metadata. Ref. 18
Classes of metadata
Thus metadata in this context are data about the contents, quality, condition and other characteristics of data.
In a nutshell the Content Standards for Digital Geospatial Metadata are concerned with the following kinds of information:
Except for fields 3, 4 and possibly 5, these fields can be applied to any kind of digital information. Those who want a better insight into the use of this Standard I refer to the FGDC.
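The distinction between general and spatial-specific fields can be sketched in a few lines of Python. This is only an illustration: the section names are assumed from the seven main sections of the FGDC standard itself rather than taken from the text here, and the `Section` class, the `spatial_only` flag and the `general_sections` function are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Section:
    number: int
    name: str
    # True  = specific to spatial data
    # False = applicable to any kind of digital dataset
    # None  = "possibly" specific to spatial data
    spatial_only: Optional[bool]


# Assumption: names follow the main sections of the FGDC Content Standards;
# the flags encode the judgement "except for fields 3, 4 and possibly 5".
FGDC_SECTIONS = [
    Section(1, "Identification Information", False),
    Section(2, "Data Quality Information", False),
    Section(3, "Spatial Data Organization Information", True),
    Section(4, "Spatial Reference Information", True),
    Section(5, "Entity and Attribute Information", None),
    Section(6, "Distribution Information", False),
    Section(7, "Metadata Reference Information", False),
]


def general_sections(sections: List[Section]) -> List[Section]:
    """Return the sections that could describe any kind of digital dataset."""
    return [s for s in sections if s.spatial_only is False]


if __name__ == "__main__":
    for section in general_sections(FGDC_SECTIONS):
        print(section.number, section.name)
```

A general standard for all digital data could keep the sections flagged `False` as its common core and treat the others as extensions for one special kind of data.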
Descriptions of digital metadata are available on the Internet through the web page Metadata? or the Geospatial Support Staff Metadata Tutorial (Ref. 20). These sites also offer many supporting papers on the use and creation of digital metadata and on the functions of the American clearinghouse system.
Though we are not yet developing in this field as the USA is, it might be good to take to heart the advice of Patrick McGlamery, who points out that the USGS (Ref. 21) is running grants for cooperative projects which stimulate exchanges between data producers and libraries, among others. Though governments are retreating, they sometimes have to create financial means to take up new challenges.
ICSDM (International Content Standards for Digital Metadata)
The American FGDC initiative was soon followed by the European Committee for Standardisation (CEN), Technical Committee (TC) 287, which published a draft of a European Standard for Geographic metadata in January 1996; this should be finalized in 1997 (Ref. 22).
However, both standards are restricted to spatial information. That they were developed so soon is due to the fact that the vast amounts of digital geospatial data being produced could not otherwise be managed.
But it would be wiser to create metadata standards which encompass all digital data, in the same way as the ISBD was produced to create a general bibliographic framework: a general framework including special definitions for special properties of special kinds of digital data. The standards probably have to be open-ended, as digital technology is still evolving and we do not know what new kinds of metadata will be called for in future. It probably also means that a new kind of thinking is called for concerning the different kinds of digital data, which will probably differ from our current thinking in terms of non-serial, serial, non-book materials, music, cartographic materials and antiquarian materials. At the same time we have to keep in mind that we want to extract data for use in bibliographic databases.
If no other body is currently proposing to design such standards, why not libraries, which have a long history in documentation? I urge LIBER, maybe in cooperation with RGL and other organizations, to start planning and to make proposals to political bodies (e.g. European Commission, Unesco, ISO, etc.) concerning this matter. But this time active cooperation is needed from producers, governmental as well as non-governmental, and also from information specialists, as these standards will go beyond the stage of mere identification and access.
Because of the inside information we get from producers we may also get a better grip on the problems of collection development and of preserving vital digital data. However, be assured that it will not be an easy road, as these standards will make large inroads on the finances of governmental and non-governmental organisations.
One of the tasks the American Commission on Preservation and Access has set itself is to use metadata for digital preservation through migration (Ref. 23). Though this Commission primarily focuses on document-like objects (i.e. documents which can be represented in a print format, which in my opinion excludes interactive digital spatial data), it may find functional structures in the FGDC standards.
How the data have to be formatted and which kinds of organizations will archive dynamic digital datasets are questions which are not the subject of this paper and therefore will not be answered for the moment.
Conclusion
The most important result of these standards will be that producers will have an obligation to deliver the metadata concerning their databases. As said before, they are the ones with the best knowledge of how these data were produced and how they can be used. But the blade cuts more than one way. When the right metadata are provided the producer can market the digital datasets more efficiently, which will probably be an incentive for them to co-operate with libraries. On the other hand this information provides collection developers with a means for easier and better selection of digital datasets, and at the same time offers more handles for coping with the problems of preserving these datasets.
We as librarians can also use the metadata to create catalogue records which fit into our general bibliographic catalogues, so that users can search a continuum from written and printed to digital material and pick from it what they can use best. This will put a greater strain on the competencies of librarians, but we shall have to live with that if we are to continue playing a vital role in the world of informatics. I can envisage a future in which bibliographic records are hyperlinked to records in metadata databases, in this way enabling access on different levels for different users.
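A minimal sketch of this idea, in Python, might look as follows. It assumes a much simplified metadata record; the field names, the `to_bibliographic` function and the base address `https://example.org/metadata/` are hypothetical and are not drawn from the FGDC standard, ISBD or any MARC format.

```python
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    record_id: str          # hypothetical identifier in a metadata database
    title: str
    originator: str
    publication_date: str
    abstract: str
    data_quality: str       # detail kept in the metadata record only
    spatial_reference: str  # detail kept in the metadata record only


@dataclass
class BibliographicRecord:
    title: str
    author: str
    date: str
    note: str
    metadata_link: str      # hyperlink to the full metadata record


def to_bibliographic(md: MetadataRecord,
                     base_url: str = "https://example.org/metadata/") -> BibliographicRecord:
    """Extract the identifying elements and keep a link to the full record."""
    return BibliographicRecord(
        title=md.title,
        author=md.originator,
        date=md.publication_date,
        note=md.abstract,
        metadata_link=base_url + md.record_id,
    )


if __name__ == "__main__":
    md = MetadataRecord(
        record_id="ds-0001",
        title="Example land-use dataset",
        originator="Example Survey Agency",
        publication_date="1996",
        abstract="Illustrative record only.",
        data_quality="Not assessed (placeholder).",
        spatial_reference="Placeholder reference system.",
    )
    print(to_bibliographic(md))
```

The catalogue record carries only what is needed to identify the dataset, while quality and spatial reference details stay in the metadata database and are reached through the link. That is one way to offer access on different levels for different users.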
We may not be able to maintain free access to current information, as this becomes more and more an economic commodity, but we may be able to maintain at least the possibility that anybody can find the information that is needed. But time is short if we want to play a significant role in it.
Afterword
This paper was read by Susan Vejlsgaard at the LIBER Annual General Conference in May 1996 in Malta. Since then the author of this paper, as observer of the IFLA Geography & Map Library Section, has attended a 4-day meeting, August 17-20 1996, of the ICA (Ref. 24) Commission on Standards for the Transfer of Spatial Data. The main topic of this commission during its 1995-1999 cycle is "[to produce a book] which will examine and assess national and international metadata standards". Therefore some 15 national metadata standards (some existing, some in the process of being developed) and transfer standards will be examined, together with the metadata standards of CEN (Ref. 25) /TC 287, to which many European countries will adhere, and of ISO (Ref. 26) /TC 211. The CEN and ISO standards will probably be finalized in 1997. Knowing the backgrounds of the ICA Commission members, I wonder how many of my colleagues are involved in this work, how map collections and/or cartographic information centres are preparing themselves to use these standards, and how they intend to develop the resulting clearinghouses for metadata and spatial data.
Acknowledgement
I am grateful to Pablo Garcia i Garcia and Anna Lluch i Galera for providing the environment in which I could first concentrate my thoughts upon this subject. I would like to thank my colleagues Susan Vejlsgaard (Det Kongelige Bibliotek, Denmark), Patrick McGlamery (University of Connecticut, USA), Chris Perkins (Manchester University, UK) and Tony Campbell (The British Library, UK) for their critical contributions, many of which I took to heart and incorporated in this paper.
Notes and references