Category Archives: The Registry!

Anything related to the NSDL Metadata Registry development

Password reset fixed

One of the side effects of our server move was the need to reconfigure our locally-hosted transactional email services, which were always a little flaky — if you have tried to reset your password lately you will no doubt have noticed that the promised email never arrived.

That’s fixed now. We switched from self-hosted email to Postmark, which works very well and should be far more stable.

Still here, folks

When we first designed the NSDL Registry, part of the requirements was that it be able to run in Lynx (seriously) and be usable without JavaScript in all browsers including all versions of IE. As you will probably have noticed, the world of web development has changed a bit in the last 8 years.

Lately it has come to our attention that the Open Metadata Registry looks a lot like abandonware. The reality is that we’re hard at work on a long-planned, often delayed, and absolutely necessary update.

Stay tuned.


Until lately we’ve been pretty happy with our ISP, Dreamhost. But a few months ago, after several during-the-presentation meltdowns of the Registry, we determined that we needed to move to a higher-performing, more reliable server. We could have done the easy thing and moved to a Virtual Private Server at Dreamhost. Instead, we set up an entirely fresh server in the Rackspace Cloud and, very carefully and with much testing, created a fresh instance of the Registry with greatly expanded data capacity, some updated code, and considerably more speed. So far, so good.

We had a self-imposed deadline of several weeks before the DCMI 2011 Conference in The Hague and completely missed it. This left us with the choice of waiting until after the conference to redirect our domain to the new server or taking the risky step of switching domains just a few days before the conference. Of course, we didn’t wait. We then discovered that we couldn’t simply redirect our main domain to the new server but needed to redirect the subdomains as well, which broke our wiki and blog. We had a great deal of difficulty restoring them while on the road to DCMI.

But everything’s back to normal now, and even updated. We now resume our regular programming.

SPARQL queries

We’re using Benjamin Nowack’s excellent ARC libraries for a tiny bit of our RDF management. It may surprise you to know that we don’t use a triple store as our primary data store, but we do too many things with the data that we think are cumbersome at best when managed exclusively in a triple store (a subject for another post someday). Still, last year we started nightly importing of the full Registry into the ARC RDF store and enabled a SPARQL endpoint, thinking that it might be useful.

Lately, we’ve heard a few folks wishing for better searching in the Registry, and since we’re actively building an entirely new version of the Registry in Drupal (a subject for another post someday), we’re loath to spend time doing any serious upgrading of the current Registry. But we have SPARQL!

Yesterday, as part of another conversation, a colleague helped me figure out what I think is a fairly useful search query. If you follow the link, you’ll be taken to the Registry’s SPARQL endpoint, which will display inline a list of all of the skos:Concepts in the RDA vocabularies which have no definitions. Well, 250 of them anyway, since that’s the arbitrary limit we’ve set on the endpoint. They’re not hyperlinked (which would be really useful) but it’s still good info.

The SPARQL query used to create the list:

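PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
# The SELECT clause is missing from this copy of the post; selecting ?s is an assumption.
SELECT DISTINCT ?s
WHERE { GRAPH ?g { ?s ?p ?o . }
# OPTIONAL plus !bound keeps only the subjects that have no skos:definition.
OPTIONAL { ?s skos:definition ?Thing . }
FILTER (!bound(?Thing))
# The vocabulary URI prefix that followed "^" was lost from this copy of the post.
FILTER regex(str(?s), "^") }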

…can be used to find any missing property (see ‘optional’), and the regex used in the filter can be modified to limit the search to any vocabulary, group of vocabularies, or a domain. I’m not enough of a SPARQL expert (meaning I’m completely clueless) to know how to filter by attribute, but it should be possible, if not easy, to find skos:Concepts that have an English definition, but no German definition (I look forward to your comments).
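One way the English-but-no-German case might be approached is to filter on the language tag of each definition, using the same OPTIONAL plus !bound trick as the query above. The following is a minimal sketch, not something tested against the Registry’s endpoint: the helper name, the example.org URI prefix, and the assumption that the ARC store supports lang() in filters are all illustrative. It is wrapped in a small Python function only so that the languages and the vocabulary prefix are easy to swap.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def missing_translation_query(have_lang, lack_lang, uri_prefix,
                              prop="skos:definition", limit=250):
    """Build a SPARQL query for subjects that have `prop` in one language
    but not in another. Hypothetical helper, for illustration only."""
    # Note: uri_prefix is dropped into a regex as-is, so a "." in it
    # matches any character; that is usually harmless for a URI prefix.
    return f"""PREFIX skos: <{SKOS}>
SELECT DISTINCT ?s
WHERE {{
  ?s {prop} ?have .
  FILTER (lang(?have) = "{have_lang}")
  OPTIONAL {{ ?s {prop} ?lack . FILTER (lang(?lack) = "{lack_lang}") }}
  FILTER (!bound(?lack))
  FILTER regex(str(?s), "^{uri_prefix}")
}}
LIMIT {limit}"""


# Placeholder prefix; substitute the URI prefix of the vocabulary of interest.
query = missing_translation_query("en", "de", "http://example.org/vocab/")
print(query)
```

The inner FILTER restricts the optional match to German-tagged literals, so ?lack stays unbound exactly when no German definition exists. Passing prop="skos:prefLabel" would look for missing labels instead.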

Announcing the New Open Metadata Registry

As of this week, the familiar NSDL Registry has a new name, the Open Metadata Registry, and a new logo. The name change reflects the fact that we’re no longer receiving funding from the National Science Foundation on behalf of the National Science Digital Library (NSDL), but also recognizes that the Registry has become one of the leaders in providing open, stable tools for those building infrastructure for the Semantic Web.

As part of this change, we’re joining our colleagues at JES & Co. as a project under their umbrella, as well as bringing current users and partners together in an Open Metadata Registry Consortium to build a sustainable plan for moving the Open Metadata Registry forward. Please watch for additional announcements and an expansion of the new look for our pages. If you’d like more detail on the Consortium, please contact Diane Hillmann at metadata dot maven at gmail dot com.

July 20, 2010

Readable URIs

Over the years we’ve been engaged in a number of discussions in which the ‘readability’ of URIs was raised, either as an issue with non-readable URIs or as a requirement in new URI schemes.

At the Registry, we understand and are sensitive to the desire for human readability in URIs. However, embedding a language-specific label in the URI identifying concepts in multilingual vocabularies has the side effect of locking the concept into the language of the creator. It also unnecessarily formalizes the particular spelling variant of the language of the creator, ‘colour’ vs. ‘color’ for instance.

When creating the URIs for the RDA vocabularies, we acceded to requests to make the URIs ‘readable’, specifically to make it easier for programmers to create software that could guess the URI from the prefLabel. We have come to regret that decision as the vocabularies gained prefLabels in multiple languages. It also creates issues for people extending the vocabulary and adding concepts that have no prefLabel in the chosen language of the vocabulary creator.

That said, the case is much less clear for URIs identifying ‘things’, such as Classes and Properties, in RDFS and OWL, since these are less likely to have a need to be semantically ‘understood’ independent of their label and are less likely to be labeled and defined in multiple languages. In that case the semantics of the Class or Property is often best communicated by a language-specific, readable URI.

In the end I personally lean heavily toward non-readable identifiers because of the flexibility in altering the label in the future, especially in the fairly common case of someone wishing to change the label even though the semantics have not changed. This becomes much more problematic when the label applied to the thing at a particular point in time has been locked into the URI.
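The trade-off can be made concrete with a small sketch. Everything here is hypothetical (the example.org base, the numeric identifier, the slug rule) and is not how the Registry mints URIs; it only illustrates why a label baked into the URI is harder to change than a label attached to an opaque identifier.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    pref_labels: dict = field(default_factory=dict)  # language tag -> label

    def relabel(self, lang, label):
        # Changing or adding a label never touches the identifier.
        self.pref_labels[lang] = label


def readable_uri(base, label):
    # Hypothetical 'readable' scheme: the URI is derived from the label.
    return base + label.strip().lower().replace(" ", "-")


BASE = "http://example.org/vocab/"  # placeholder namespace

# Opaque identifier: labels can be corrected or translated freely.
concept = Concept(BASE + "1001", {"en": "colour"})
concept.relabel("en", "color")
concept.relabel("de", "Farbe")
print(concept.uri, concept.pref_labels)

# Readable identifier: the same spelling change yields a different URI,
# so every existing reference to the old one would have to be migrated.
print(readable_uri(BASE, "colour"), readable_uri(BASE, "color"))
```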

I’m not trying to start a non-readable URIs campaign, just pointing out that the Registry, in particular, is designed to support vocabulary development by groups of people who are creating and maintaining multilingual vocabularies and whose collective agreement on labeling things may change over the course of the development cycle. Our non-literal-label URI default is designed to support the understanding we’ve developed of that environment over time.

SKOS updated for Vocabularies

Just a quick note that today we updated the version of SKOS that we provide for describing value vocabularies. This deprecates the properties that were removed from the final SKOS release and adds the many new ones. We’ve also restricted the non-mapping relation properties (skos:broader, skos:narrower, skos:related) to the ‘containing’ scheme while providing cross-scheme mapping for the mapping relations.
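As a rough illustration of that restriction (a sketch under stated assumptions, not the Registry’s actual code), a check along these lines would accept skos:broader, skos:narrower and skos:related only between concepts in the same scheme, while letting the SKOS mapping properties cross schemes. The function name and scheme identifiers are made up, and allowing mapping properties within a single scheme is an assumption the post does not address.

```python
# Properties restricted to concepts in the same ('containing') scheme.
SAME_SCHEME_ONLY = {"skos:broader", "skos:narrower", "skos:related"}

# Mapping properties from the final SKOS release, usable across schemes.
MAPPING = {
    "skos:broadMatch",
    "skos:narrowMatch",
    "skos:relatedMatch",
    "skos:closeMatch",
    "skos:exactMatch",
}


def relation_allowed(prop, subject_scheme, object_scheme):
    """Return True if `prop` may link concepts from the two given schemes."""
    if prop in SAME_SCHEME_ONLY:
        return subject_scheme == object_scheme
    if prop in MAPPING:
        # Assumption: mapping relations are accepted wherever the target lives.
        return True
    raise ValueError(f"not a relation handled by this sketch: {prop}")


print(relation_allowed("skos:broader", "scheme-a", "scheme-b"))
print(relation_allowed("skos:exactMatch", "scheme-a", "scheme-b"))
```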

We don’t yet provide a useful interface for building collections, but that’s coming real soon now.

Oh, and we added a SPARQL endpoint.

The German National Library: translating and registering RDA elements and vocabularies

A prerequisite for the registering of our terms in the NSDL Registry and one of the greatest challenges for the German National Library at the moment is the translation of the RDA elements and vocabularies. Since bibliographic description is executed with a highly specialised vocabulary, we are finding that the process of pinpointing the appropriate terms is interesting but also very involved. Although the existing German rules for bibliographic description (RAK) and the authority files for subject headings (Schlagwortnormdatei, or SWD) have plenty of vocabulary to offer as equivalents to Anglo-American cataloguing terminology, RDA does include concepts relatively new to bibliographic description.

Before resorting to “inventing” words, always a last resort, we launch comprehensive vocabulary mining efforts, in the process of which, beyond checking already existing translations (FRBR, MARC 21), we consult the expertise of such institutions as art libraries and film institutes to get the most up-to-date descriptive terms available in the German language. If we deem a word previously used in a translation suboptimal, we may deviate from its use and in particular cases forgo the advantages of standardisation in the interest of our primary criteria: consistency, currency, usability, and precision. A quick and general Google search can also be helpful to learn how terms are being (in)formally circulated. In the case that we should find it necessary to create a new term in German, as we are experiencing with such an example as the type unmediated, we have to weigh up what sort of etymological root we would like to lean towards, Latin or Germanic. If we translate it with unmediatisiert, it can ease communication around cataloguing between nations because of its morphological similarity to many European languages. However, leaning on Germanic roots may sometimes be necessary in the interest of standardisation and aligning with existing descriptive language or with the strengths and realities of the German language. In that case, we may be better off choosing nicht mediatisiert or ohne Hilfsmittel zu benutzende Medien, which seems awkward but conforms to types of uses already in existence in the subject headings. The option of the “new-proposed” status in the Registry for the concepts therefore suits our needs perfectly, since for the reasons just mentioned and outlined in Diane’s blog entry about multiple languages and RDA, none of the translations we have entered are as of yet official.

Once our small team of librarians from the Office for Library Standards has followed these processes and developed a pool of equivalent German terms which we deem worthy of proposing initially for the Registry and subsequently for our official translation of RDA, we make them available to groups of colleagues specialised in bibliographic description or subject headings at the German National Library for comment in a Wiki and working meetings. Our experience with translation has shown us that the translations of descriptive bibliographic elements and vocabulary into German must be handled by librarians (professional translators can potentially pick up from there) and peer-reviewed through the above-mentioned process to ensure accuracy and acceptance in the library community.

Beyond motivating us to begin our RDA translations early, our participation in the Registry has also given us an opportunity to dabble in the semantic web through the process of assigning URIs to our German translations of RDA element and value vocabulary. As a test run, it therefore allows us to toy with the idea of linked data by setting descriptive bibliographic vocabulary up with its prerequisite domain. The lessons learned and questions raised through this experience put us in a better position for strategic planning regarding the nature of the presentation and sharing of bibliographic data in the future.

What has particularly attracted us about the Registry and its connection with the RDA tool is that, provided that we do decide to provide linked bibliographic data in the future as an institution, the Registry makes it possible to do so in our national language. This is a condition for its wide-spread usability and acceptance in the German-speaking library and internet community and therefore of primary importance to us, provided of course that the Committee for Library Standards takes the decision to introduce RDA as the official rules for description and access in Germany and Austria.

Multiple languages and RDA

We’ve been thinking for some time about how to implement multi-lingual (and multi-script) vocabularies in the Registry. Some Registry users have been experimenting with language and script capability for some time (see Daniel Lovins’ Sandbox Hebrew GMD’s). But it was really when we started working with the RDA vocabularies that we got serious about multi-linguality.

At DC-2008 in Berlin, we started talking to the librarians at the Deutsche Nationalbibliothek about adding German language versions of RDA vocabularies into the Registry. I knew how eager the German libraries were to participate more actively in the RDA development, and had been talking to German librarians for some time about their frustrations with the notion that they had to wait until “later” to become involved. Christine Frodl and Veronika Leibrecht have been our primary contacts at the Deutsche Nationalbibliothek on this work, and they’ve been a real pleasure to work with.

We decided collectively to start with some of the value vocabularies, in particular Content Type, Media Type and Carrier Type. We enabled Veronika to become a maintainer on those vocabularies, and she worked within her library and associated German-speaking libraries to translate and develop labels and definitions in German for the existing terms. As she describes the challenge:

“Because RDA was not developed simultaneously in various languages (that would be an even more daunting task!), we are looking for ways to adapt German to English language/cataloguing concepts and must get agreement on the terms in our community. The search for terminology to translate RDA will therefore be an ongoing process in the short term for us. … Now I am looking forward to seeing French and Spanish come along, and would be happy to share a few resources I found which could help people in their search for terminology.”

Those of you who know German (or have an interest in multilingual vocabularies in general) might want to take a look at some of the work done already:

Content Type Vocabulary (you can see that for now, all concepts display in English)

Detail for concept of “computer program”: (the German translation for the label appears in the list of properties of the concept)

Veronika points out that the process behind this effort is a complex one, but solidly based on existing relationships in the German-speaking world:

“[B]ecause of the federal system in Germany, the DNB works very closely with all library consortia in the country and Austria and decisions about cataloguing rules and data formats are reached through consensus with them. The reason for this is that the consortia include and represent libraries which existed long before the German state as such (or the DNB, for that matter) and therefore have traditionally and independently held the written cultural heritage of their individual counties, duchies, kingdoms etc.”

We have had some additional interest from other language communities in this effort, and Jon has added some detail on our wiki to describe how we plan to improve the software to make both building and maintenance of other language versions simpler, and easier to configure at the output end. Do note that this isn’t implemented yet, but is instead a blueprint for moving ahead in this critical area.

Updated Step-by-Step Instructions

Those of you who have actually discovered the Registry and tried to add stuff to it have (I hope) already realized that we had Step-by-step Instructions for doing so. They were old, and we’d added new things (mostly Jon added new things; I just rant, nag and test), so I finally re-did the instructions. They can be found here:

Looking at the old instructions was, for me at least, a reminder that we have made progress, much as it sometimes seems like we’re moving at a glacial pace. The interface has changed, we’ve added versioning and history, as well as schema registration (read Jon’s posts for more details). There’s still lots more to come, and believe me we have a seemingly endless list of what’s still missing. But writing documentation, even basic stuff like these instructions, is a humbling experience. Trying to do things more linearly than I usually do reminds me yet again where the gaps are.

One of the issues, which I’m not sure I’ve papered over very well in the instructions, is something I call the “eating our own dog food” problem. Those of you who know me personally have heard me use that phrase before—it’s a favorite. It basically means that, if you’re just preaching about how to do something, and not doing it, you’re not eating your own dog food. Not a good thing, and likely as not it will affect your credibility in ways that aren’t very comfortable, because SOMEBODY will call you on it.

Where we managed to step in it (the natural product created from said dog food, that is), was when we extended the registry from value vocabularies only to value vocabularies and schemas. Then, our model of concepts and properties of concepts started getting a little funky. When you’re registering schemas, you’ve got an aggregation of schema properties, and then, um, properties of properties? Uh oh. You can see the problem, I think—it’s about identifying and defining terms (among other things), and isn’t that what we’re supposed to be doing?

So, for the moment, until we’ve figured out how to hold our noses and eat that unappetizing dog food, we’re making a distinction in the schema instructions between “schema properties” and “specific properties.” Not elegant, but until inspiration strikes, somewhat helpful, I hope.

If any of you have occasion to use the instructions or stumble upon them and want to provide some helpful (or not) comments, just send them along to me: