Can a map be a geographic information retrieval tool?
Ligue des Bibliothèques Européennes de Recherche, Groupe des Cartothécaires de LIBER
Jan Smits, Koninklijke Bibliotheek, The Netherlands
© LIBER and author
Published from: LIBER Quarterly, the Journal of European Research Libraries, ISSN 1435-5205, Vol. 10(2000), No 4. With permission from K.G. Saur Verlag, Munich, Germany
E-mail: jan.smits@kb.nl
Abstract
This paper launches a not yet fully developed idea for co-operation, offered for discussion. The aim is to enable participants with unequal resources and technologies to take part, within their own limits, in a pan-European project. Within this project digital maps function not only as geographic resources and educational tools, but may simultaneously serve as an interface to metadata databases containing descriptions of maps and spatial databases, as well as locational or georeferenced resources in books, periodicals and the Internet.
Introduction
The Groupe des Cartothécaires (GdC) functions mainly as a platform for communication and information exchange. Our first channel is LIBER Quarterly, which LIBER kindly puts at our disposal for publishing the papers and, in the past, also the progress reports of our conferences. We hope that when LIBER decides to publish it in a hybrid form we can still make use of the paper form or have access to an integrated Internet version. Although we are partly happy with our "Status Aparte", we think it is important that our collections and the developments within our field are integrated into overall library developments.
The second channel is our own homepage, hosted by the Koninklijke Bibliotheek (Royal Library) of The Netherlands, which contains full-text versions of all papers, progress reports and other relevant documents published since 1992. It contains some 120 documents and is presently visited some 45,000-55,000 times annually.
A third channel consists of the communications of the Secretary of the GdC, among them a bi-annual newsletter, number 6 of which is published in 2000.
The last channel available to us is a discussion list hosted by the National Library of Scotland, for the sole use of those correspondents who have e-mail. It does not generate heavy traffic yet, but we hope that our correspondents will use it more for discussions and for exchanging ad hoc information.
Another function, which is slowly starting to materialise, is that of co-operation. Our Working Group for Central and Eastern Europe is putting together a training programme for our Eastern European colleagues. So far this has resulted only in some bilateral training, as we have not yet found the means to finance the larger part. We hope we can do so with the help of the LIBER board, or that the programme can be incorporated into a larger programme. Meanwhile the Annual General Meeting 2000 of LIBER has given a mandate to the LIBER Executive Board to set up a Task Force on Training, aimed at the library community as a whole.
Also, since 1998 our Working Group for Education has been busy building up an Internet site, the gateways to "Education and Literature for Map Curatorship", hosted by the Eidgenössische Technische Hochschule in Zürich. With incredible speed they have created a large body of information for education in map curatorship.
Co-operation can be effected in many ways, but it would be great if the documents we conserve were themselves instruments in it.
I would first like to report on a market study, which shows the willingness of institutions to participate. I then describe a possible project in The Netherlands concerning a visual geographical interface, to provide a theoretical framework. Finally I present an idea for a possible pan-European project based on co-operation.
Market research
Between December 1999 and March 2000 the Alterra Library of the Agricultural University of Wageningen, The Netherlands, carried out market research for a Union Map Catalogue and electronic delivery of cartographic materials (SLIJKHUIS, H. (2000). Alterra: catalogus voor kaarten : rapport over de marktverkenning in het project Alterra: catalogus voor kaarten, in opdracht van Alterra, Wageningen UR.). Electronic delivery was almost a precondition. Of the 380 institutions addressed some 140 responded, enough to draw conclusions. Most responding institutions work in the earth sciences and need maps as professional tools for their daily work, projects or policies. About half of them have a staff of more than 100. More than 50% of these 140 institutions reported that all their personnel had Internet access; the rest had more limited access. The response to the question whether they wanted access to an on-line catalogue was overwhelmingly positive. This is no surprise, given the heavy use of on-line library catalogues.
When asked how they wanted to access this catalogue, 98% responded that they wanted to search by area and 95% that they wanted to search thematically. Only 40% would search by title or year of publication, and almost all other responses concerned inherent properties of cartographic materials, such as scale, accuracy, technical data, format and source.
When asked whether they were willing to pay for the development of and access to such a catalogue, 57% responded affirmatively, though this depended partly on the quality/price ratio. As a follow-up they were asked whether they wanted to view the item described at the same time, and whether this might be a low-resolution image. Both questions received an affirmative response far above 50%. Most did not want to pay for viewing a low-resolution item. Whether they are willing to pay depends mainly on whether the search is successful, or on the rate of use, the latter being a so-called truncated pricing system. Most respondents want a cost-free search system, but are willing to pay for positive results (i.e. those resulting in ordering or buying specific datasets).
Most institutions want the option of electronic delivery of map files by means of standardised transfer formats, so that receivers can load the files directly into their own systems, mainly GIS (Geographic Information Systems).
Another result of this market research is that many institutions see benefit in co-operating to create this union catalogue and data-delivery system.
In the meantime this system has been realised and can be viewed on the homepage.
I am of the opinion that if this kind of market research is done in other areas or countries among groups of professional users of geographic information, the results will not differ much from those described above.
An atlas as a geographical interface
As an example I would like to sketch a project which was drafted in co-operation with some university departments of cartography in The Netherlands. The idea is to have maps function not only as geographical information sources, but also to investigate whether the same maps can function as a visual interface for metadata databases. For this project we looked at three databases which could function together within this framework.
- National Clearinghouse Geo Information (NCGI): The metadata database of the NCGI presently contains more than 1,500 descriptions of digital spatial datasets, with visual examples, from some 17 producers of geospatial data. The datasets range from several hundred MB to tens of GB.
- AvN: The second analogue edition of the Scientific Atlas of The Netherlands (AvN, published 1984-1990) contained some 1,000 maps, which give a comprehensive view of the socio-economic, physical and ecological situation of The Netherlands. These maps have now been digitised, but the information they give is not up to date.
- Dutch Depositary Collection in the Koninklijke Bibliotheek (Royal Library): There is a database of some 30,000-40,000 descriptions of cartographic documents concerning The Netherlands from 1975 onwards, within the framework of the Dutch Depositary Collection in the Royal Library. The Royal Library phased out the CCK system (see: VELDEN, G.J.K.M., P.J.M. Douma and J.G. Zandstra (1990): CCK : making cartographic materials accessible. In: The LIBER Quarterly 2(1992)2, pp. 192-208) in 1999 and will convert these descriptions in 2003 to the database.
The NCGI mainly concerns large-scale datasets with a high economic value. The owners of these datasets, usually (semi-)governmental or academic bodies, are generally obliged to recoup part or most of the costs of producing these data. At the same time these producers aggregate the data into the middle- and small-scale data needed to create the maps used in the Scientific Atlas of The Netherlands. In the age of analogue products they provided this service for free, but in the digital age it is hard to get the same service from them. This means we must find a way to entice them to create and provide these data for free. We think we can reach this goal by offering the maps of the Scientific Atlas of The Netherlands as a visual geographical interface for the NCGI.
Geographical interface for The Netherlands
The geographical interface would provide several search strategies:
- When clicking on a map of The Netherlands one can search the underlying metadata-databases for datasets which also cover the whole of The Netherlands.
- When making a cutout of a part of The Netherlands one can search the underlying metadata-databases for datasets which also cover that part of The Netherlands.
- When one zooms in, the interface will query the underlying metadata-databases for datasets which cover that specific part of The Netherlands.
The queries will mainly use bounding-box co-ordinates. There must, however, also be possibilities for searching on point locations, within a radius of a point, within a bounding polygon, etc. When the computer can read the index-map, it must also be possible to search on named area, administrative sub-division or locality description, where necessary using a geographic thesaurus. But it must also be possible to use traditional geographic and thematic dictionaries to arrive at the same result.
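The search strategies listed above can be illustrated with a minimal sketch. It assumes that every metadata record carries a bounding box in decimal degrees; the `BBox` class, the three sample records, their approximate co-ordinates and the function names are all hypothetical, and the radius search uses the centre of each box as a crude stand-in for the dataset's location.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Bounding box in decimal degrees."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap unless one lies wholly to one side of the other.
        return not (
            self.east < other.west or other.east < self.west
            or self.north < other.south or other.north < self.south
        )

    def contains_point(self, lon: float, lat: float) -> bool:
        return self.west <= lon <= self.east and self.south <= lat <= self.north


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Hypothetical metadata records with approximate, illustrative co-ordinates.
RECORDS = [
    ("Soil map of The Netherlands", BBox(3.3, 50.7, 7.3, 53.6)),
    ("Land use, province of Utrecht", BBox(4.8, 51.9, 5.6, 52.3)),
    ("Geology of Limburg", BBox(5.6, 50.7, 6.2, 51.8)),
]


def search_bbox(query: BBox):
    """Titles whose coverage overlaps a cut-out or zoomed view."""
    return [title for title, box in RECORDS if box.intersects(query)]


def search_point(lon, lat):
    """Titles whose coverage contains a clicked point."""
    return [title for title, box in RECORDS if box.contains_point(lon, lat)]


def search_radius(lon, lat, radius_km):
    """Titles whose box centre lies within radius_km of the point."""
    hits = []
    for title, box in RECORDS:
        c_lon = (box.west + box.east) / 2
        c_lat = (box.south + box.north) / 2
        if haversine_km(lon, lat, c_lon, c_lat) <= radius_km:
            hits.append(title)
    return hits
```

A production system would more likely delegate these tests to a spatial index or a GIS; the point here is only that a bounding box per description is enough to support the area, point and radius searches.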
We try to sell the idea to the producers as follows. Is there a better way than the one described above to advertise the economically interesting datasets and at the same time serve education and the general good? The Scientific Atlas of The Netherlands will function not only as a visual geographical interface, but at the same time as an informational and educational resource. The educational functions will be enhanced and become interactive when an Online Mapping Application (OMA) is incorporated, with which users can manipulate certain aspects of the maps offered. Maps produced with this application can be selected by an editor and added to an archive to serve as examples for future users.
At the same time some scanned samples of newspaper maps have been added to the present database to give some insight into the way mass media use scientific mapping data to inform the general public. In future, digital maps from newspaper archives could be downloaded to a sub-database of the Scientific Atlas of The Netherlands and have the same function as maps added through the OMA activities.
But why should we restrict ourselves to spatial metadata databases for digital materials and not try to include metadata databases for analogue materials? As long as the descriptions include geographical bounding-box co-ordinates, the same kinds of queries can be made on bibliographic databases as on the NCGI database. Because researchers require disparate sources for their research, and because not all necessary information will be available on the Internet or in ready digital form, it would be a missed opportunity if bibliographic databases were left out of this scheme. The 30,000-40,000 descriptions of maps which will be loaded into the PICA database contain all the geometric or mathematical data needed to use the query functions envisaged for the NCGI.
Furthermore the idea of one-stop-shop information gathering is so prevalent that we must do our utmost to realise this concept with the tools and data at hand.
The problem is best described in a paper by Ray R. Larson (LARSON, Ray R. (1996). Geographic information retrieval and spatial browsing. In: GIS and Libraries: Patrons, Maps and Spatial Information (ed. by Linda Smith and Myke Gluck), pp. 81-124), which discusses geographic information retrieval (GIR) within the Alexandria Digital Library project for both map-like spatial objects and georeferenced material. "As with traditional print libraries, […] information can be indexed and retrieved in a variety of ways, ranging from purely descriptive cataloging of items in the database and topical analysis of content, to more specialized methods of classification and description that exploit the characteristics of digital information". The paper not only describes the background but also discusses some tools for automated geo-referencing of text.
In discussions with the Netherlands Council for Geographic Information (RAVI), an advisory ministerial body, we have gained the impression that a project as described above might be realisable within some years.
A changing metadata creation and retrieval environment
This Dutch project is concerned in the first instance with modern material, for which the maps for the interface have yet to be created. In a European context I would like to start from a different premise.
One of the problems which might keep us awake is what the library will be in future. Will we become pure information brokers for the content we own, or will we be able to deliver something more than our own content?
At the moment we are building metadata systems which are aimed at the traditional library functions of "find", "identify", "select" and "obtain". At the same time we are creating more kinds of manifestations (in the wording of IFLA's report Functional requirements for bibliographic records (UBCIM publications - new series, Vol. 19)) of the content we hold, like photographs, microfilm and digitised objects. This does not so much create new services as make the delivery of certain objects faster or easier.
In the case of metadata some work might yet be done to enhance our databases. I do not think the minimal level of description which Functional requirements for bibliographic records advises will lead us to paradise, but cost-effectiveness seems to give us no choice. I think that we should make computers and software do more of the work.
For instance in The Netherlands some 2 million pages of daily newspapers are being microfilmed (Koninklijke Bibliotheek laat twee miljoen krantenpagina's op microfilm zetten).
The reason for this project is of course conservation. But as soon as the pages are microfilmed we can enhance our metadata system. Many clients ask for specific information from newspapers but have no real idea where to look for it, except that the event took place in a certain area at a certain time. Should the library world decide to digitise this microfilmed content, I think it is possible to index its contents rather easily. Newspapers are published in a certain layout, using a specific character set and specific heights for different kinds of headings. It must be possible to develop software which can read the contents and recognise the function of certain pieces of text by their form and layout. In this way the computer can automatically create an index to the digitised contents, which makes access much more satisfactory. The technology could be similar to the Pharos architecture (see: DOLIN, R. (1998). Using automated classification for summarizing and selecting heterogeneous information sources. In: D-Lib Magazine (ISSN 1081-9873), January 1998) used in the Alexandria Digital Library, though the first subject of investigation there was newsgroups.
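The layout-based indexing suggested above can be sketched as follows. The sketch is purely illustrative and is not the Pharos architecture: it assumes that a character-recognition step has already delivered text blocks with a measured character height, and the sample blocks, the point-size thresholds and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical recognition output: (page, text, character height in points).
BLOCKS = [
    (1, "Dike breach near Tiel", 36.0),
    (1, "Hundreds evacuated overnight", 18.0),
    (1, "The river rose steadily during the night.", 9.0),
    (2, "Council approves new bridge", 30.0),
    (2, "Work is expected to start next year.", 9.0),
]


def classify(height, headline_min=24.0, subhead_min=14.0):
    """Guess the function of a text block from its character height."""
    if height >= headline_min:
        return "headline"
    if height >= subhead_min:
        return "subheading"
    return "body"


def build_index(blocks):
    """Map each page to its headings, skipping body text."""
    index = defaultdict(list)
    for page, text, height in blocks:
        role = classify(height)
        if role != "body":
            index[page].append((role, text))
    return dict(index)
```

In practice the thresholds would have to be set per newspaper and per period, since each title has its own typographic conventions.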
This has not yet done anything for the information value of the content we provide. These are no more than objects which are linked either through metadata or through references included in the object. I think that libraries should not only provide the relations of "work", "expression", "manifestation" and "item" as sketched in the report Functional requirements for bibliographic records, which is aimed at metadata, but also find a way to deliver the documents described in this chain themselves. That is, we should look at ways in which we can remould the content we own into more value-added content. Naturally the content we own is limited to copyright-free material and material on which the owner waives its rights.
What we have to try to imagine is a future library, where its data and metadata will function as an active and dynamic component of the digital information world.
Multi-speed pan-European co-operation
Let us now turn to maps. Cartographic materials are knowledge sources in themselves, as they present graphically, and in a uniquely aggregated form, information about phenomena directly or indirectly associated with a location and time relative to the surface of the earth. The topic here is topographic maps, which represent concrete objects or phenomena visible on the earth's surface: physical or man-made features like waterways, vegetation, settlements, roads, railways and canals, and orography or representations of height. As we can in the first instance use only copyright-free material, I limit myself to 19th century topographic maps. We do not have the situation of the United States, where contemporary topographical maps are freely available and access is provided to every USGS map at the scales 1:100,000, 1:63,360, 1:25,000 and 1:24,000 for the entire United States.
Why 19th century maps? In the 19th century state- and regional governments started for the first time to publish large scale map-series (scale 1:25,000 to 1:300,000) based on geodetic triangulation, including geographic co-ordinate-grids, and covering the whole of the country or area with the same information content.
Being knowledge sources in themselves they can be used at the same time as digital geographical interfaces thanks to their underlying mathematical properties.
The purpose of this paper is to launch the idea that in a Europe without borders and without loss of regional cultural contexts these topographical map-series can be used to foster a multi-phased pan-European project in which unequal partners with unequal resources can co-operate. The availability, lack or divergence of certain technologies does not have to be a hindrance to achieving common goals. And of course the Internet will be the carrier technology.
To create a framework, a first stage could be to make available the index-maps to the map-series, as this mainly involves scanning the paper index-maps. This will not create a seamless index-map, as the series sometimes cross borders or overlap, differ in time coverage and information depicted, and may have different mathematical properties. A super-index could be created by scanning a map of Europe with national boundaries and links to the separate island index-maps. This will show that many topographic map series cross current national borders, but this does not have to be a hindrance.
In a second stage these index-maps should be linked to the scanned or digitised versions of the sheets of the map-series. The link should not be created by incorporating the addresses of the scanned maps, but by geographical co-ordinate search (point-in-polygon, region, distance and buffer, path), using the geometrical properties of the index-maps to query bibliographic databases which describe, among other things, the geometric properties and mathematical data of the scanned images. This part of the project could be realised as island projects which slowly evolve into an overall patchwork map in many parts and possibly layers. Development at different regional speeds may have a very good educational aspect, as areas lagging in certain technologies may learn from those at the front. And it must be possible to use different standards of quality, as maps of countries with a high proportion of built-up area will have a different information content from those of countries with a large proportion of rural area.
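A point-in-polygon lookup of the kind mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the sheet outlines are simple polygons in decimal degrees, and the sheet numbers, outlines and record identifiers are hypothetical. The ray-casting test used here is a standard technique; it is not prescribed by the proposal.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical index-map: sheet number -> outline and catalogue record id.
SHEETS = {
    "31": {
        "outline": [(4.6, 52.0), (5.1, 52.0), (5.1, 52.25), (4.6, 52.25)],
        "record": "rec-0031",
    },
    "32": {
        "outline": [(5.1, 52.0), (5.6, 52.0), (5.6, 52.25), (5.1, 52.25)],
        "record": "rec-0032",
    },
}


def records_at(lon, lat):
    """Catalogue record ids of all sheets whose outline contains the point."""
    return [
        sheet["record"]
        for sheet in SHEETS.values()
        if point_in_polygon(lon, lat, sheet["outline"])
    ]
```

Because the lookup returns catalogue records rather than fixed addresses, overlapping series simply yield several records for the same point, and a scanned sheet can move without breaking the index-map.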
The history I have lived through has taught me that a dirigiste approach to such a project is sure to fail, as the highest-standard practices will exclude potential participants which can only attain lower-standard practices. It will be acceptable that certain areas are finished soon while others may have to wait some years before they can participate.
The images may be delivered in GIF, JPEG, TIFF or any other image format accepted by Internet protocols, and in many resolutions, as single, multiple or wavelet images. But they can also be made into interactive maps by including their geometric properties as a dynamic basis.
There's no end to possibilities
But this could be the beginning of a larger idea.
Locational information can be found in many more sources than cartographic materials alone. Documents concerning statistics, history, biology, ecology, travel, etc. are often concerned with a certain location on earth. In traditional library catalogues these are mainly accessed by subject matter, not by geographical area. It is my idea that a researcher is always searching for a body of interrelated information which will provide knowledge for the subject he/she is working on. And we would like maps to be studied in context, which means textual and statistical resources should be available at the same time.
Of course we can use the traditional geographical and subject dictionaries and thesauri for this purpose, but why not use the unique mathematical properties of maps, in an age where visual information and the possibility of aggregating information in graphical representations are becoming more and more predominant? When we can translate subject-thesaurus terms for locations (like Amsterdam, The Netherlands, U.S.A., etc.) through a conversion table into bounding-box co-ordinates, the same interface might be used for directing users to these non-map resources.
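The conversion table described above might look like the following sketch. It assumes that each non-map catalogue record carries a geographic subject term; the `GAZETTEER` table, its approximate co-ordinates, the sample catalogue and the function names are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical conversion table: thesaurus term -> (west, south, east, north),
# in decimal degrees, with approximate illustrative values.
GAZETTEER = {
    "amsterdam": (4.73, 52.28, 5.08, 52.43),
    "the netherlands": (3.3, 50.7, 7.3, 53.6),
}

# Hypothetical non-map records: (title, geographic subject term or None).
CATALOGUE = [
    ("Statistical yearbook of Amsterdam", "Amsterdam"),
    ("Flora of The Netherlands", "The Netherlands"),
    ("History of printing", None),
]


def term_to_bbox(term):
    """Look up a subject term, ignoring case and surrounding spaces."""
    return GAZETTEER.get(term.strip().lower())


def overlaps(a, b):
    """True when two (west, south, east, north) boxes share any area or edge."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])


def resources_in_view(view):
    """Titles of records whose subject place overlaps the current map view."""
    hits = []
    for title, term in CATALOGUE:
        box = term_to_bbox(term) if term else None
        if box is not None and overlaps(box, view):
            hits.append(title)
    return hits
```

With such a table in place, the map view that drives a search for maps can drive a search for books and periodicals in the same way, without changing the existing subject headings.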
In the paper by Ray R. Larson the GIPSY model is described and summarised. GIPSY, the Geo-referenced Information Processing System, was developed as a new model of automatic geographic indexing for text documents. In the GIPSY model, words and phrases containing geographic place names or geographic characteristics are extracted from documents and used to provide evidence for probabilistic functions using elementary spatial reasoning and statistical methods to approximate the co-ordinates of the location being referenced in the text. The actual "index terms" assigned to a document are a set of co-ordinate polygons that describe an area on the Earth's surface in a standard geographical projection system. The GIPSY method for automatic geo-referencing is described in detail by Woodruff and Plaunt (WOODRUFF, A.G. & C. Plaunt (1994). GIPSY: Geo-referenced Information Processing System. In: Journal of the American Society for Information Science, 45, pp. 645-655). Later this method was evaluated, together with the POSTGRES method and the TextTiling method, in a paper for the Sequoia 2000 project (LARSON, Ray R. et al. (1995?). The Sequoia 2000 Electronic Repository. In: Digital Technical Journal. 7(1995)3, pp. 50-65) in the framework of the Global Change programme.
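The general idea of geo-referencing text by accumulating evidence from place names can be shown with a much-simplified sketch. It is not the GIPSY algorithm itself: it assumes a tiny hypothetical gazetteer of single-word names with boxes in whole degrees, adds one vote per mention to every one-degree cell a place covers, and returns the box around the cells with the most votes, where GIPSY assigns co-ordinate polygons.

```python
import re
from collections import Counter

# Hypothetical gazetteer: name -> (west, south, east, north) in whole degrees.
# Values are coarse and illustrative only.
PLACES = {
    "netherlands": (3, 50, 8, 54),
    "utrecht": (5, 52, 6, 53),
}


def estimate_area(text):
    """Return the box around the grid cells with the most place-name evidence."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    weights = Counter()
    for name, (west, south, east, north) in PLACES.items():
        hits = words[name]
        if hits == 0:
            continue
        # Every one-degree cell covered by the place receives its mention count.
        for lon in range(west, east):
            for lat in range(south, north):
                weights[(lon, lat)] += hits
    if not weights:
        return None  # no known place name found in the text
    top = max(weights.values())
    cells = [cell for cell, weight in weights.items() if weight == top]
    lons = [cell[0] for cell in cells]
    lats = [cell[1] for cell in cells]
    return (min(lons), min(lats), max(lons) + 1, max(lats) + 1)
```

A text that mentions both Utrecht and the Netherlands is thus assigned the Utrecht cell, because that is where the evidence from both names coincides.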
On the Internet several (proto)types of geographic information browsers which can perform the kind of function we are looking for are available, e.g.
- The Environmental Resources Information Network [ERIN] Unit of Australia has developed a generic WWW map interface using a collection of simple map images and a standard lookup table to provide visual interactive access to geographically-related information (CROSSLEY, David & Tony Boston (1995). A generic map interface to query geographic information using the World Wide Web.)
- MERI (Meadowlands Environmental Research Institute, New Jersey, U.S.A.) built a WWW-based interface to the database, integrating web server, database server and GIS server technologies. Through a map interface, a user can obtain a list of documents that studied a particular area. Conversely, through a text interface, a user can obtain a list of documents that report on, for example, a land use or cover type or a specific water body, with sampling/analysis locations displayed on a map (BARRETT, Kirk R., Richard Holowczak & Francisco J. Artigas (1999). A database of environmental documents about an Urban Estuary, with a WWW-based, geographic interface.)
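A map-image interface of the kind these browsers offer needs a way to turn a click on an image into geographic co-ordinates. The sketch below shows one simple way to do so and is not the ERIN or MERI implementation: it assumes an image in which longitude and latitude scale linearly with pixel position, and the lookup table, image name and extent are hypothetical.

```python
# Hypothetical lookup table: image name -> (width px, height px,
# (west, south, east, north) extent in decimal degrees).
MAP_IMAGES = {
    "netherlands.gif": (400, 300, (3.0, 50.0, 7.0, 54.0)),
}


def pixel_to_lonlat(x, y, width, height, extent):
    """Convert a pixel position (origin top-left) to longitude and latitude."""
    west, south, east, north = extent
    lon = west + (x / width) * (east - west)
    # Pixel rows count downwards, latitude counts upwards.
    lat = north - (y / height) * (north - south)
    return lon, lat


def click_to_lonlat(image_name, x, y):
    """Look up the image in the table and convert the click position."""
    width, height, extent = MAP_IMAGES[image_name]
    return pixel_to_lonlat(x, y, width, height, extent)
```

The resulting co-ordinates can then be passed to a point or bounding-box query against the metadata, which keeps the map image itself a simple scanned picture.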
Conclusion
The newest search-engine of Netscape, called
, is a combination of a graphic display of the information found and lists of hits. It is my belief that this is the first of a new generation of search engines, which have the advantage that they can visualise the overall supply of digital objects for a given topic search.
[This is realized, among other things, by e.g.
in 2002 with its 'image search', though in the beginning of 2003 this covered only about 15% of the text-based web. Update 2003-02-18]
The same can be done using maps in a historical context. Converting the descriptions to, or creating them in, modern catalogue systems does not seem to create much more effective access. But using the maps as interfaces might give them more value as access tools, and would use their information content far more efficiently.