In my previous post, Another Step Toward Lifting Library Metadata into the Cloud, I offered a partially developed MODS ontology and an approach intended to assist others who might have an interest in critiquing, extending, modifying, or simply commenting on such an ontology. However, one of the first comments on my posting brought to my attention another library-oriented ontology, Bibliontology, and the poster later essentially asked, “Why a MODS ontology?” Without directly answering that question, I would like to reiterate that my
Primary Motive is to:
Promote further discussion on identification and eventually implementation of a community acceptable strategy for migrating existing MARC based metadata into a form more universally accessible to consumers and producers of Linked Data.
So what are some of the paths we might take for migrating MARC to RDF, presuming that RDF is the preferred way to express Linked Data?
Well, there is already significant support for converting MARC to MODS (see the “Tools & Utilities” section of the Library of Congress Document: MARCXML) and a Draft RDA to MODS mapping. So MODS is at least one natural intermediary candidate to target for promotion into the Linked Data cloud. Furthermore, establishing a community acceptable “standard” approach for expressing MODS in RDF formally benefits from the existence of a MODS ontology, which will define a common vocabulary for inserting MODS triples into triplestores, and which can aid in the formulation of “natural” (hopefully relatively simple and easy to understand) SPARQL queries.
In fairness, it should also be noted that:
- There is also already a way to go straight from MARC to a limited Dublin Core based RDF implementation
- A Google search on MARC + ontology produces many results
- Significant work has been done on MarcOnt
- And a Google search on MODS + ontology does produce some results (now including some references to my own postings).
Still, after some searching, it has not been clear to me that a full MODS ontology yet exists. By that I mean one that fully captures all the details of the MODS XML schema.
The above is a bit of a global perspective on why others (especially other MARC producers from the library community) might be interested in a MODS ontology. The following is more of a local perspective on the point of developing a MODS ontology.
In my work context we are required to produce MODS. I won’t go into total detail about all the reasons for this, but essentially they include:
* UC San Diego and the UCSD Libraries, where I work, are part of the larger University of California system.
* The libraries of the UC system are centrally served by the California Digital Library (CDL)
* CDL provides a shared Digital Preservation Repository (DPR) service
* DPR requires deposited content to be accompanied by METS
* Deposited METS files must conform to predefined “profiles”
* Acceptable candidates for the “descriptive metadata” component of the METS profiles are primarily either MODS or Dublin Core
* Our local cataloging staff has already invested significant work in full MARC cataloging of hundreds of thousands of objects that we want to send to the DPR, and they don’t want to “dumb down” to Dublin Core. MODS therefore becomes the preferred choice of metadata expression for incorporation in METS.
* Then, because there are automated mechanisms for generating MODS from our existing MARC encoded data, producing MODS is something that we’ve known how to manage and have been able to implement on a mass scale for several years now. So, to state it simply, like it or not, we already have lots of MODS data that we need to work with.
Just to tell a little more of our local story:
A couple of years ago we started to work on building a digital library. We looked at open source products like DSpace, Fedora, and others, but one of the limitations we encountered was lack of support for metadata with the richness of MARC or MODS. Also, we were not aware of any obvious established, extensible relational database schemas for dealing with the complexity of MARC or MODS, and XML database products didn’t seem to perform well at the time. So, we took a leap and began exploring ways to encode our MARC data in RDF for access from a triplestore via SPARQL. Since we already had MODS, it was natural for us to try to find a way to express that in RDF. Starting only with the notion that “subject” and “predicate” should always have URIs (URLs), and because we:
- Couldn’t find an already existing MODS ontology
- Were not sophisticated enough to create our own
- And (in a way unfortunately) didn’t really think just to reference the existing Library of Congress MODS XML schema
we created individual files for each of the predicates we needed. We used the same CDL-defined, ARK-based file naming convention for these predicate URLs as we did for our actual digital content files, and then created a mapping between the ARKs and MODS vocabulary elements.
Thus, for example, the URL for “mods:title” for us is:
http://libraries.ucsd.edu/ark:/20775/bb72705143
where the “http://libraries.ucsd.edu/ark:/20775/” prefix component is constant for all other MODS predicates.
Note: Unfortunately, our system is not available to the public, so this link will not generally work for everyone.
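The mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code actually running at UCSD: the function names and the example.org subject URL are hypothetical, and only the mods:title ARK and the constant prefix are taken from this post. A real mapping table would carry one entry per MODS element in use.

```python
# Constant prefix shared by all predicate URLs (taken from the post).
ARK_PREFIX = "http://libraries.ucsd.edu/ark:/20775/"

# Mapping from MODS vocabulary elements to ARK identifiers.
# Only the mods:title entry comes from the post; other entries are omitted
# here rather than invented.
MODS_TO_ARK = {
    "mods:title": "bb72705143",
}


def predicate_url(mods_element: str) -> str:
    """Return the ARK-based predicate URL for a MODS element (hypothetical helper)."""
    return ARK_PREFIX + MODS_TO_ARK[mods_element]


def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes, and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriple(subject_url: str, mods_element: str, value: str) -> str:
    """Serialize one statement about a digital object as an N-Triples line."""
    return f'<{subject_url}> <{predicate_url(mods_element)}> "{escape_literal(value)}" .'


def title_query() -> str:
    """Build a simple SPARQL query that lists every object with its title."""
    return "SELECT ?object ?title WHERE { ?object <%s> ?title }" % predicate_url("mods:title")


if __name__ == "__main__":
    # The subject URL is a placeholder, not a real ARK.
    print(to_ntriple("http://example.org/object/1", "mods:title", "An Example Title"))
    print(title_query())
```

The query string also hints at why a shared ontology matters: with opaque ARK predicates, anyone writing SPARQL needs the lookup table first, whereas a published MODS vocabulary would let the predicate speak for itself.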
Armed with this approach, we encoded data for several hundred thousand MARC –> MODS records to RDF and loaded on the order of 15 million triples into AllegroGraph, which, thanks to vendor licensing terms, we were allowed to use at no cost as long as we were working with less than 50 million triples.
The following are some examples of what the user interface for our system displays: (download and zoom in on the files for closer viewing, if you like)
RDF triples
JSON view of data
JSON manifestation of data for processing by client-side JavaScript
RDF XML file
RDF graph
All this works for us and we are able to do SPARQL queries on the results. We have had thoughts about sharing more of our work with others, but have been painfully aware that we are missing anything like a candidate for a community shareable MODS ontology that would enable others to generate RDF for their MODS records in a way that would potentially allow us all (i.e. those starting with MARC data) to make our catalog records available as consistently encoded linkable MODS data.
We have wanted to fill that gap by beginning to encourage some community discussion about a MODS ontology that we could eventually migrate our own data and software towards.
So, again with that end in mind, my previous posting offers a partially complete MODS ontology candidate along with a visual aid assisted methodology to help in the validation of that ontology as it is assembled in a sequential layer-like fashion from increasingly large subsets of the complete body of statements which define the full ontology.


#1 by Bruce D’Arcus - July 31st, 2009 at 14:18
The problem from my standpoint is that MODS has some really odd, library-specific design choices that I don’t think map very well to the wider world. A central concept like mods:name, with mods:role as a child of that, really makes no sense, and conflicts with more common modeling you see in DC, FRBR, etc.
Its semantics are also really loose.
So you have to ask yourself, just how linked could a MODS view in RDF really be?
#2 by Clay Redding - August 11th, 2009 at 21:16
Hi Chris,
I’ve enjoyed what you’ve posted here on starting work on a MODS ontology. I work at LC in the Network Development and MARC Standards Office, although standards work is not my primary duty.
Nothing is set in stone, but I can tell you with certainty my co-workers and (to a lesser extent) I have started some effort toward a MODS ontology. We’ve been asked by several organizations to provide this, and we’re working on it. However, at present more effort has gone into first completely re-doing the MADS model and expressing it in OWL.
The MADS work has largely grown out of requests we’ve heard from users who want to augment the subject headings descriptions in id.loc.gov to see the subdivisions identified, and we’ve had trouble doing that with SKOS alone. Also influencing this path is that, in terms of the progression of XML metadata standards (MODS/MADS, EAD/EAC), it seems the emphasis has always favored bibliographic standards preceding their authority counterparts, making it hard to truly incorporate authority retrospectively. We want to reverse that trend here.
From that work, we hope to then have MODS rely heavily on MADS, such that parts of a bibliographic record that normally make use of authority are described using instances of MADS classes.
Regarding what Bruce points out, for both MODS and MADS, we’ve thought a lot more about the semantics, to the point of e.g. incorporating MARC relators as properties that could help bypass the mods:name/mods:role issue.
I’d welcome a chat sometime to discuss ideas further, and I plan to plug in your ontology into my TopBraid Composer tomorrow to take a spin.
Clay