Stephen M. Richard

Columbia University Lamont-Doherty Earth Observatory

2018 Outstanding Contributions in Geoinformatics Award

Presented to Stephen M. Richard

Citation by Xiaogang (Marshall) Ma

Dr. Stephen Miller Richard is an internationally known geoinformatician who has developed and used software to automate the process of geoscience data capture, management and dissemination. Steve is a pioneer in the joint field of geoscience and computer science, and his work has inspired and been applied in various national and international efforts toward geoscience data standardization and interoperability. His enthusiasm and active engagement and participation have promoted the formation of the geoinformatics community in the United States and across the world.

Steve's undergraduate training was at the Massachusetts Institute of Technology, where he obtained degrees in Earth and Planetary Science and in Electrical Engineering. Later, he received Master's and PhD degrees in Geology from the University of Arizona and the University of California, Santa Barbara, respectively. In 1992, Steve was appointed as research geologist at the Arizona Geological Survey (AZGS). In 2016, he retired from the AZGS to work as a technical director with the Interdisciplinary Earth Data Alliance (IEDA) until January 2018. Steve remains an adjunct research scientist with IEDA.

Steve's work at AZGS has addressed the urgent needs of managing and disseminating geologic information on the Web. Starting in the 1990s, he was involved in the development of standards to support digitized geologic mapping. From 2000 to 2007 he worked with the USGS on the National Geologic Map Database. Later, as chief of the geoinformatics section at AZGS, Steve was engaged in geologic data management and web services, including the US Geoscience Information Network and the National Geothermal Data System. Steve made significant contributions to international standards for geoscience data exchange, including GeoSciML and the XML implementation of ISO 19115-1. Steve has been an active member and leader in various organizations and groups. In particular, he served as chair of the Geoinformatics Division of the Geological Society of America (GSA) for 2010-2011, and led the healthy growth of the division.

In recognition of his exceptional merits, the GSA Geoinformatics Division is pleased to select Dr. Stephen Miller Richard as the recipient of the 2018 'Outstanding Contributions in Geoinformatics Award'. Congratulations, Steve!

2018 Outstanding Contributions in Geoinformatics Award: Response by Stephen M. Richard

First, I would like to thank the Geoinformatics Division and Marshall Ma for this unexpected honor. I was disappointed not to be able to attend the award ceremony at the meeting in Indianapolis, and apologize for my tardiness in providing this response. I am going to take this opportunity to review the development of geoinformatics as I have seen things since my 'awakening' in the 1990s. This is a subjective and somewhat rambling history; please forgive any errors or omissions. Some of the developments I discuss here have happened since the awards ceremony, but are relevant!

I became interested in the possibilities for geologic knowledge representation and exploration using computer technology when I was introduced to ESRI's Arc/Info software as a post-doc at UC Santa Barbara in 1991. It was a small project, using some simple algorithms to produce a seismic hazard assessment for southern California; in the end it was too naïve for practical application at the time, but a great learning experience for me.

When I started work at the Arizona Geological Survey in 1992, we were producing geologic maps with rapidograph pens on mylar and making copies on a blue line ozalid machine. Exploring possibilities for creating digital versions of these geologic datasets led to my engagement with Dave Soller, Boyan Brodaric, Bruce Johnson, Gary Raines, Jerry Weisenfluh, Jordan Hastings and many others involved in developing database schema for geologic map data in a collaboration between the USGS, US State Geological Surveys and the Canadian Geological Survey. The initial thought was to develop a relational database schema that everyone could use, thus making sharing data transparent. Under the aegis of the USGS National Geologic Map Database project, this group released a draft data model in 1999 (version 4.3, https://ngmdb.usgs.gov/www-nadm/prd/Model43a.pdf) for evaluation by the North American geoscience community. It quickly became apparent that the various requirements and practices of different user groups led to different adaptations of the database implementation, with resulting loss of interoperability.

My participation really began with the next step in the evolution of thought: the development of an implementation-independent conceptual model for representing geologic information, and concomitant development of a set of vocabularies to categorize geologic phenomena. A steering committee, the North American Geologic Map Data Model (NADM) Steering Committee, was formed in early 1999, and workgroups were assembled to work on the conceptual model and the science language vocabularies. A series of workgroup and steering committee meetings between 2000 and 2006 supported development of a conceptual model (NADM Conceptual Model 1.0 -- A Conceptual Model for Geologic Map Information, USGS Open-File Report 2004-1334, https://doi.org/10.3133/ofr20041334), and a series of informal publications documenting the science language workgroup products (see https://ngmdb.usgs.gov/www-nadm/sltt/products.html). My particular interests were in development of the conceptual model (Richard, 2006), and a hand-sample scale classification for metamorphic rocks (https://pubs.usgs.gov/of/2004/1451/sltt/appendixB/appendixB.pdf).

At this point, I'd like to acknowledge and thank Dave Soller, manager of the USGS National Geologic Map Database project, for his support of my participation in these activities, without which I would have stayed at home in Tucson drawing maps with rapidograph pen on mylar.

With Dave Soller's support I worked on the USGS National Geologic Map Database project to develop an implementation of the NADM Conceptual Model, with scope and granularity to support the many mapping requirements of the USGS. This work led to a comprehensive and very expressive relational database implementation (e.g. Richard et al., 2004; Richard et al., 2005). A multiple-year effort to implement this design, with tools to enable working geologists to use the database, failed. Some hard lessons were learned, foreseen by some colleagues with more experience but less youthful exuberance. Research geologists are interested in details and edge cases; trying to account for all their possible information desires leads to complex data design; complex data design makes usability problematic, and slows performance. The available technology at the time was not ready for the multi-step linking between observation procedures, results, metadata, and descriptive characterization of the heterogeneities in the Earth. The performance of a test multiple-map database implementation using ESRI geodatabase, with fifteen 1:24,000-scale quadrangles in the Tonto Forest of Arizona (Richard, 2015), implemented by the Arizona Geological Survey, was unacceptable. The project team realized that implementation, training, tools, and support for the kind of database we had implemented was going to require resources far beyond what was available, and the business case for making that investment had not been demonstrated.

Learning from this development and testing work led to new developments. First came the realization that a single database design was not going to meet the needs of all the various users. What might solve the interoperability/data interchange problem is an interchange format that is expressive enough to account for most use cases, and not bound to a particular database implementation. The scope of such an interchange format needed to be international. At an international meeting in Scotland in 2003, the GeoSciML project was initiated under the auspices of the IUGS Commission for the Management and Application of Geoscience Information (CGI). A working group was formed to develop an XML interchange format, with a first meeting in Perth, Australia in December 2004. Models from the British Geological Survey, Geological Survey of Canada, the NADM CA, USGS NGMDB, New South Wales (AU), Geological Survey of Japan, Geological Survey of Victoria (AU), and the broader Open Geospatial Consortium (OGC) community were reviewed, and used as the basis for developing an XML implementation for data interchange, with versions released between 2008 and 2015. This work brought me in contact with a new cast of influential characters, including Simon Cox, Bruce Simons, Alasdair Ritchie, Eric Boisvert, John Laxton, Tim Duffy, Jouni Vuollo, and Francois Robida, to name a few.

The initial releases of GeoSciML suffered from the same problem that had impeded adoption of the NGMDB database design - it was designed for what geologists dreamed of doing, not what they were actually doing in their day-to-day work. Off-the-shelf web service data interchange tools were not available, or too complex for non-technical users.

This is where the world of data interchange intersected the world of USGS National Cooperative Geologic Mapping Program (NCGMP) data delivery. For years, the USGS program had required the delivery of map images, e.g. a Portable Document Format (PDF) rendering of the geologic map data acquired under an NCGMP contract. A vision of the original NGMDB effort in 2004-2007 was that data would be delivered in a standard SQL database format. As early as 2008, simplifications of the information-rich, complex designs of GeoSciML v. 1 or the NGMDB database design were being proposed (Richard and Soller, 2008), focused on the information that was actually being collected and utilized in geologic map data. In the NCGMP community, this led to a proposal for 'NGMDB lite', a simplified geologic map data model (USGS NCGMP, 2010). This simplified design has subsequently evolved into the 'Geologic Map Schema' (GeMS) (USGS NCGMP, 2020), which is being adopted by the NCGMP for geologic data delivery. Similar thinking in the GeoSciML data modeling group led to the GeoSciML portrayal schema (2011, 2013), which has seen better adoption than the full GeoSciML model.

In 2006, Lee Allison became the State Geologist and Director of the Arizona Geological Survey, and brought a new level of energy and commitment to the geoinformatics program at the AZGS. Once again, I owe a great debt of gratitude to Lee for his support and guidance; his loss in 2016 was a blow to me and to the community. In 2008, the AZGS received funding to develop architecture for a Geoscience Information Network (USGIN), with collaboration between the Association of American State Geologists and the US Geological Survey (NSF EAR-0753154, 2008). Through this project, we developed a metadata profile based on ISO 19115 (Richard and Grunberg, 2009), set up a catalog for aggregating metadata about geoscience datasets, prototyped the use of shared 'content models' for interoperable data delivery, and implemented prototype Open Geospatial Consortium Web services for delivery of data on the web. This work was proceeding in parallel with the simplification efforts for the NGMDB and GeoSciML, building on the idea that data delivery should use simple formats and data schema compatible with the spreadsheet and GIS software capabilities in common use for data analysis and exploration.

Through Lee's motivation and networking, and based on the USGIN architecture concepts, the AZGS was selected as the prime contractor for the development of a National Geothermal Data System (NGDS), funded by the U.S. Department of Energy (DOE) Geothermal Program under the American Recovery and Reinvestment Act of 2009. This project engaged 46 state geological surveys, the USGS and several academic partners in compiling new and legacy data relevant to geothermal energy exploration and development. It provided an unparalleled opportunity to produce standardized datasets using simple flat-file data schema, linking data with resolvable URIs, publishing data using OGC Web Services, and cataloging datasets for discovery and access. As the AZGS section leader for geoinformatics, I was tasked with development and implementation of the architecture. By 2013 we had an operational system (Anderson et al., 2013). Major project funding ended in 2014; the system is still operational, but suffers from a ubiquitous problem. Maintenance of online data systems requires ongoing effort to keep up with technology advances, data updates, and personnel turnover. Building a user base requires continuous marketing, community outreach and training. Funding for these operational costs is difficult to obtain, and requires significant effort. Currently (2022), NGDS data is being archived for long-term preservation, and options for updating services are being explored.

Development of the data catalogs for USGIN and the NGDS motivated engagement with the ISO 19115 metadata standards process, and I worked on the XML implementation of the ISO 19115-1 metadata standard between 2010 and 2016, working closely with Ted Habermann and David Danko among others. This work attempted to bring some of the simplification concepts developing in the geoscience data interchange community into the metadata world. If you're familiar with the original ISO19139 XML implementation of the ISO19115 metadata standard, you probably recognize that it suffers from the same problems that hindered adoption of the first NGMDB database designs or GeoSciML v3: trying to do too much in one model, resulting in a complex schema and a steep learning curve for new users. As an international standard adopted by many government agencies with authority to mandate usage, ISO19115/ISO19139 certainly achieved a wider user base than the geoscience data schemes, but complaints about its complexity were common. The ISO19115-3 implementation attempted to alleviate this problem by breaking the monolithic ISO19139 schema into modules to support development of profiles that could range in complexity from quite simple to very expressive (and complex). Adoption of ISO19115-3 has been slow. There is a significant installed base of metadata management and catalog systems based on the original 19139 XML implementation, and technology migration to the new scheme requires a significant investment. In many ways, the XML implementation has been overtaken by technology developments.

In 2011, the National Science Foundation initiated a series of meetings to plan for development of EarthCube, a cyberinfrastructure to support Earth Science Research. From my perspective, this began a long collaboration with Ilya Zaslavsky and Dave Valentine at the San Diego Supercomputer Center. We worked on a series of white papers and road maps for EarthCube architecture and implementation, focusing on cross-domain interoperability (Zaslavsky et al., 2012), and conceptualizing the cyberinfrastructure as a marketplace for software tools and data (Zaslavsky et al., 2016).

This collaboration led to implementation of the DataDiscoveryStudio (http://datadiscoverystudio.org/), a metadata aggregator based on the Geoportal open source software (https://github.com/CINERGI/geoportal-server-catalog), and using the ISO19115/ISO19139 metadata schema and implementation. This catalog contains almost 1.7 million items (as of 2023-01), harvested from many different sources of interest to EarthCube. The project developed tools that use text analytics to provide additional keywords, tools for building collections of resources to share and recall, spatial search, metadata conventions for linking a discovered dataset to software tools that can operate on data in the offered distribution format, and various other upgrades to the UI presentation from the base open-source community project.

It was at this point, in July 2016, that I retired from the Arizona Geological Survey, and accepted a position with the Interdisciplinary Earth Data Alliance (IEDA) at Lamont-Doherty Earth Observatory, under the direction of Kerstin Lehnert. My engagement with EarthCube continued, with projects to develop a generic data entry portal to support researchers submitting datasets for archive and accessibility. The portal was designed to help users create metadata to support discovery and reuse of submitted datasets, and to direct data to the most appropriate repository, based on data type and domain relevance. A prototype was implemented, but funding was insufficient to move into a production mode with all the necessary documentation, training, marketing, and interface refinement required. The concepts developed in the project have recently (2021) been implemented in production systems for Hydroshare. Another major activity during my time at Lamont was migration of the PetDB geochemical database to a new data schema using the ODM2 data model (https://www.odm2.org/). Mapping the data content to the new schema was completed, but the migration was hampered by the complexity of updating user interfaces and workflows to the new backend data design. During this period I was commuting between Tucson, AZ and Palisades, NY; this proved to be both taxing for my family and financially unsuccessful. In Jan. 2018 I resigned my position there and shifted my base to Tucson as a contractor.

To enable the marketplace concepts presented in the Zaslavsky et al. (2016) conceptual design, the EarthCube office provided funding to a working group (of which I was a member) to develop an EarthCube Resource Registry for registration and discovery of information artifacts useful for research; these include semantic resources like vocabularies or ontologies, software, interface specifications and interchange formats for data interoperability (Duerr et al., 2019).

In 2019, a new approach to metadata harvesting and aggregation was adopted by EarthCube management, using the schema.org vocabulary developed by Google, Microsoft, Yahoo and Yandex, and a JSON-LD encoding of the dataset documentation embedded in web pages listed using sitemap XML documents registered by data providers for harvesting. The idea of updating the DataDiscoveryStudio harvest process to index metadata from these embedded JSON-LD scripts and build on the existing user interface was rejected. New software developed by Doug Fils and the EarthCube team was adopted to extract the JSON content, convert it to RDF triples, and access it via SPARQL queries against a triple store. A new user interface has been implemented and is still in development and refinement (2022-11). The future benefits of a linked data approach to metadata and resource linkage are apparent to me, but the current cost has been a step back in functionality and content. This tradeoff is another dilemma posed in system design, and raises an important question: should cyberinfrastructure implementation be a research project, or thought of as a more mundane business process using proven technology? The development of infrastructure in the EarthCube context as research projects has produced many interesting prototypes, but few actively used production systems. In the world market for software and user attention, geoscience information is a tiny niche. Large commercial enterprises might find our problems to be interesting, but only insofar as the solutions might be applicable to wider target users generating some kind of cash flow.

I'll leave off here, even though there are lots of interesting current development threads in vocabulary harmonization, geochemical data interoperability, and data discovery (to name a few). Looking back, the dreams and ambitions I had 30 years ago are not met, but progress has been incremental and continuous. Painfully slow it seems sometimes, but the reality is that the technology is not the limiting factor; social engineering is the challenge, and it takes time.

In closing, I'd like to thank Marshall and the GSA Geoinformatics Division again for this honor, and say that whatever contributions I've made are the product of collaboration, and built on the shoulders of many others.

Thank you.

References

Richard, S.M., Craigue, J.A., Soller, D.R., 2004, Implementing NADM C1 for the National Geologic Map Database, https://pubs.usgs.gov/of/2004/1451/pdf/richard.pdf.

Richard, S.M., Soller, D.R., Craigue, J.A., 2005, NGMDB geologic map feature class model, https://pubs.usgs.gov/of/2005/1428/richard/.

Richard, S.M., 2006, Geoscience concept models: Geological Society of America Special Paper 397, p. 81-107, https://doi.org/10.1130/2006.2397(07).

Richard, S.M. and Soller, D.R., 2008, Geologic content specification for a single-map database, https://pubs.usgs.gov/of/2009/1298/pdf/usgs_of2009-1298_richard2.pdf.

NSF EAR-0753154, 2008, INTEROP - Geoscience Information Network, https://nsf.gov/awardsearch/showAward?AWD_ID=0753154.

Richard, S.M., and Grunberg, W., 2009 (rev 2016), USGIN Metadata Profile: Use of ISO metadata specifications to describe geoscience information resources: https://www.researchgate.net/profile/Stephen-Richard/publication/258470777_USGIN_ISO_metadata_profile/links/5b2494450f7e9b0e374b0fb4/USGIN-ISO-metadata-profile.pdf.

Richard, S.M., Soller, D.R., and Percy, D.C., 2010, NGMDB-Lite-Database design for the National Geologic Map Database data portal, https://pubs.usgs.gov/of/2010/1335/pdf/usgs_of2010-1335_Richard.pdf.

U.S. Geological Survey National Cooperative Geologic Mapping Program [USGS NCGMP], 2010, NCGMP09-Draft standard format for digital publication of geologic maps, version 1.1, in Soller, D.R., ed., Digital Mapping Techniques '09-Workshop Proceedings: U.S. Geological Survey Open-File Report 2010-1335, p. 93-146, 4 appendixes, https://pubs.usgs.gov/of/2010/1335/pdf/usgs_of2010-1335_NCGMP09.pdf.

Zaslavsky, Ilya; Altintas, Ilkay; Arctur, David; Brotzge, Jerry; Couch, Alva; Domenico, Ben; Hooper, Rick; Lehnert, Kerstin; Murphy, Philip; Nativi, Stefano; Plale, Beth; Richard, Steve; Stocks, Karen; Valentine, David, 2012, EarthCube RoadMap: Cross-Domain Interoperability, https://doi.org/10.5281/zenodo.7502074.

Anderson, A., Blackwell, D., Chickering, C., Boyd, T., Horne, R., MacKenzie, M., Moore, J., Nickull, D., Richard, S., and Shevenell, L., 2013, National Geothermal Data System (NGDS) Geothermal Data Domain: Assessment of Geothermal Community Data Needs: in Proceedings, Thirty-Eighth Workshop on Geothermal Reservoir Engineering, Stanford University, Stanford, California, February 11-13, 2013, SGP-TR-198, https://pangea.stanford.edu/ERE/pdf/IGAstandard/SGW/2013/Anderson.pdf.

Richard, S.M. (compiler), 2015, Geologic map of the southwestern part of the Tonto National Forest, Gila, Maricopa, Pinal and Yavapai Counties, Arizona: Arizona Geological Survey Digital Geologic Map, DGM-76 & DI-41, map scale 1:130,000: http://repository.azgs.az.gov/uri_gin/azgs/dlio/1615.

Zaslavsky, Ilya, Richard, S. M., Gupta, Amarnath, Malik, Tanu, Valentine, David, 2016, Conceptual Design for EarthCube: Market Manifesto for Science: https://doi.org/10.5281/zenodo.7502122.

Duerr, R.E., Richard, S.M., and Zaslavsky, Ilya, 2019, Final Report EarthCube Resource Registry Implementation: https://doi.org/10.5281/zenodo.3840743.

U.S. Geological Survey National Cooperative Geologic Mapping Program [USGS NCGMP], 2020, GeMS (Geologic Map Schema)-A Standard Format for the Digital Publication of Geologic Maps: USGS Techniques and Methods 11-B10, https://doi.org/10.3133/tm11B10.
