Factsheet
Managing Data Relationships Using DOI® Resolution
Version 1.0
 
Data as relationships
Managing data implies managing relationships between entities: "A has the relationship B to C" (e.g.: "Daniel Defoe is the author of Robinson Crusoe"; "Article Y is chapter of Book X"; "this book has dollar price 25..."): more concisely, "AisBofC"[1]. The DOI® System can ensure such data are persistent and interoperable.
A, B and C all need to be precisely identified for automation[2], though relationships can be expressed loosely ("my car is blue") or precisely ("the car registration number ABC1234 has colour paint code BG45678"), depending on the need: any automated or interoperable application needs more precision.
  • A and C in AisBofC may or may not already have identifiers in one or more registries: existing identifiers can be used in a DOI name[3], or a DOI name can be minted as a de novo identifier string.
  • Standard relationship types (values of B in AisBofC) from all the main content-related identifier and metadata schemes are part of the Vocabulary Mapping Framework[4], so a relationship may be specified precisely and interoperably.
A, B and C may all be assigned DOI names if required. In a simpler case, A has a DOI name, and B and C (not necessarily uniquely identified) are part of the resolved record of that DOI name.
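As a minimal sketch (not part of the DOI System itself), an "AisBofC" statement can be modelled as a three-part record. The class name Relationship, the relation labels isChapterOf and hasAuthor, and the DOI names under the 10.5555 prefix are all illustrative assumptions; real relation terms would come from a scheme such as the Vocabulary Mapping Framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """One statement of the form 'A is B of C' (hypothetical model)."""
    a: str  # the entity the statement is about
    b: str  # the relationship type
    c: str  # the related entity or value

    def inverted(self, inverse_b: str) -> "Relationship":
        """Restate the same fact from C's point of view."""
        return Relationship(self.c, inverse_b, self.a)


# Illustrative data only: these DOI names and relation labels are invented.
statements = [
    Relationship("10.5555/example.chapter3", "isChapterOf", "10.5555/example.book"),
    Relationship("10.5555/example.book", "hasAuthor", "Daniel Defoe"),
]


def statements_about(entity: str, known: list) -> list:
    """Return every statement whose A is the given entity."""
    return [s for s in known if s.a == entity]
```

Here A carries a DOI name in both statements, while C is a DOI name in the first and a plain string in the second, which mirrors the simpler case in which only A is uniquely identified.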
Relationships between entities are often called metadata[5]. In managing content, as an example, metadata includes any of authorship, provenance, rights positions, pricing, ownership, distributor, aggregator and licensee data, production information, and identification of how, where, when and in what context content is used. Some metadata may be confidential and proprietary (companies may make a business from providing it); other metadata may be usefully made public to provide "hooks" for access.
Relationships can be static or dynamic. Static relationships can be published without further concern. Dynamic relationships (where the current value of C may vary, or the number or type of relationships B may vary) need support to be persistent: this can be provided by value-added services. A classic example is "item A has URL C" (which if not managed leads to "404 linkrot", lack of persistence). Once the relationship is made public, the assigner cannot always control its use: one of the current main uses of the DOI System is to provide persistence in this URL relationship through managed redirection (the assigner does not need to patrol every mention of the URL).
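The idea of managed redirection can be illustrated with a small in-memory sketch. RedirectTable, its methods, the DOI name and the example URLs are hypothetical stand-ins, not the DOI System's actual interface; the point is only that the cited identifier stays fixed while the assigner updates the "item A has URL C" relationship in one place.

```python
class RedirectTable:
    """Hypothetical in-memory stand-in for a managed resolution service."""

    def __init__(self) -> None:
        self._current_url = {}

    def assign(self, doi_name: str, url: str) -> None:
        """Record (or replace) the current URL for a DOI name."""
        self._current_url[doi_name] = url

    def resolve(self, doi_name: str) -> str:
        """Return the URL currently on record; raises KeyError if unassigned."""
        return self._current_url[doi_name]


table = RedirectTable()
citation = "10.5555/example.article"  # invented DOI name, cited once and never edited

table.assign(citation, "https://old-host.example/articles/42")
before_move = table.resolve(citation)

# The content moves; only the assigner's record changes, not the citation.
table.assign(citation, "https://new-host.example/archive/42")
after_move = table.resolve(citation)
```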
Resolution can be used to express relationships.
  • Resolution is the process in which an identifier is the input – a request – to a network service to receive in return a specific output of one or more pieces of current information (state data) related to the identified entity: e.g. a location (URL). In the DOI System, the data is structured in type-value pairs.
  • Multiple resolution is the return as output of several pieces of current information related to a DOI referent: at least one URL (though possibly several), and defined data structures assisting in management; the ability to "get metadata about" the DOI referent in structured interoperable form provides considerable added value.
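The two kinds of resolution described above can be sketched with state data held as type-value pairs. This is an assumption-laden illustration: STATE_DATA, resolve, first_url, the type name hasAuthor and the DOI name are invented for the example and do not reproduce any real resolver interface.

```python
from typing import List, Optional, Tuple

TypeValue = Tuple[str, str]

# Hypothetical state data for one invented DOI name.
STATE_DATA = {
    "10.5555/example.book": [
        ("URL", "https://publisher.example/books/crusoe"),
        ("URL", "https://mirror.example/books/crusoe"),
        ("hasAuthor", "Daniel Defoe"),
    ],
}


def resolve(doi_name: str, wanted_type: Optional[str] = None) -> List[TypeValue]:
    """Return the type-value pairs for a DOI name, optionally filtered by type."""
    pairs = STATE_DATA[doi_name]  # KeyError for an unknown DOI name
    if wanted_type is None:
        return list(pairs)  # multiple resolution: everything on record
    return [(t, v) for (t, v) in pairs if t == wanted_type]


def first_url(doi_name: str) -> str:
    """Single resolution: pick the first URL on record."""
    return resolve(doi_name, "URL")[0][1]
```

Calling resolve with no type returns the whole record, while first_url reduces it to a single location, which is the behaviour a plain redirect would give.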
URL resolution locates an item or arrives at a managed destination page with further links to be selected. DOI resolution can provide more information, which clients can process, so that the relationship links are managed in the resolution rather than at the destination. URL resolution is a one-to-one relationship; DOI resolution offers the option of one-to-many (multiple resolution). This may be:
  • One DOI name to many URLs. When the entity A is available at several URLs, a DOI name can record all, and provide all or the most appropriate of these. This is currently in use with some DOI applications.[6]
  • One DOI name to many other data types[7]. One or more of the entries in a DOI handle record could be used to express relationships (simplified examples: data type = URL, value = http; or data type = "relationship", value = "Chapter of"). Since a value might also be a DOI name, these can be nested to express any level of complex relationships AisBofC (e.g. this DOI name is a chapter of that DOI name).
  • A standard grouping mechanism for treating similar DOI names and similar DOI System services as classes is available through the DOI data model of Application Profiles and Services[8]. A standard way of expressing relationships is available in the Vocabulary Mapping Framework, also denoted by DOI names.
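Nesting can be sketched as follows: when the value of a relationship entry is itself a DOI name, a client can follow it to the next record. The record layout, the relation label isPartOf, the function containers_of and all DOI names and URLs are hypothetical choices made for this example.

```python
# Hypothetical records: each DOI name maps to a list of type-value entries.
RECORDS = {
    "10.5555/example.section2": [
        {"type": "URL", "value": "https://publisher.example/crusoe/ch3/s2"},
        {"type": "isPartOf", "value": "10.5555/example.chapter3"},
    ],
    "10.5555/example.chapter3": [
        {"type": "URL", "value": "https://publisher.example/crusoe/ch3"},
        {"type": "isPartOf", "value": "10.5555/example.book"},
    ],
    "10.5555/example.book": [
        {"type": "URL", "value": "https://publisher.example/crusoe"},
        {"type": "hasAuthor", "value": "Daniel Defoe"},
    ],
}


def containers_of(doi_name: str, records: dict, relation: str = "isPartOf") -> list:
    """Follow one relation from record to record and list each DOI name reached."""
    chain = []
    seen = {doi_name}  # guards against records that point back at each other
    current = doi_name
    while True:
        targets = [e["value"] for e in records.get(current, []) if e["type"] == relation]
        if not targets or targets[0] in seen:
            break
        current = targets[0]
        seen.add(current)
        chain.append(current)
    return chain
```

Starting from the section, the walk reaches the chapter and then the book, and stops there because the book record carries no isPartOf entry.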
Application considerations
Where relationships are managed in a silo application, controlled by one managing body, considerations of interoperability may be irrelevant. But when that application needs to link to others, or be exposed so as to enable requests from other applications to provide a service, interoperability becomes important.
Interoperability is the ability of independent systems to exchange meaningful information and initiate actions from each other, in order to operate together to mutual benefit. The context and assumptions made on assignment of an identifier may not be known to someone else encountering and using an identifier, so data about the referent needs to be easily available.
The majority of DOI applications do not currently use metadata relationships in this structured interoperable way. There will always be a need for simpler implementations that don't carry the full interoperability load (silo applications); they may be in the majority for a long time yet. But the increasing use of linked applications, and the value of a comprehensive rights and permissions management infrastructure, implies it would be prudent to manage data so as to allow a ready transition to such an interoperable common framework.
Separation of internal data and systems and the exposure of that data to the outside world is standard information management practice. DOI names offer a way of exposing data and associated relationships to others in a standard form, based on granularity analysis (what assets get separately identified) and interoperability in a standard fashion. Just as the use of DOIs as persistent pointers provides a value-added layer on top of changeable URLs, so the use of DOI names to connect assets managed by multiple organizations can provide a value-added layer on top of information management silos built by individual organizations, to reduce transaction costs of mobilizing and using assets[9].
Application design therefore requires decisions on identifier granularity and on the commercial value of expressing relationships; these may be business decisions. With e-books, for example, it might be useful to express the relationships between an original work and all its published versions: the problem is not whether to use an ISBN or a DOI name, but to agree on the level of granularity at which e-books need to be identified, and on who should provide the identifiers and manage the data, given that many of the large publishers are only prepared to assign ISBNs to the generic .epub file.
 
References
[1] Each relationship of the form "A is B of C" can also be expressed as "C is B of A" (e.g. Robinson Crusoe has author Daniel Defoe): i.e., any piece of data may be "metadata" for another piece of data. There will be multiple relationships about any entity (AisBofC, AisDofE, etc.).
[3] DOI System and Standard Identifier Schemes: http://www.doi.org/factsheets/DOIIdentifiers.html.
[4] Vocabulary Mapping Framework: http://www.doi.org/VMF/index.html.
[5] For simplification, we have omitted in this discussion the provenance of the relationship statement ("who says that AisBofC?"). For most purposes it is sufficient to allow this to be implicit; in the DOI System, it is implied by the right to manage the DOI record, but for an application where this was a direct concern it could be made explicit as another relationship.
[6] 'Resolution of Multiple URLs': http://www.doi.org/multiple-url-resolution.html.
[7] This builds on the extensible data typing mechanism of the Handle System®.
[8] DOI Handbook, Chapter 5, Applications.
[9] As with physical resources: "all standard formal property documents are crafted in such a way as to facilitate the easy measurement of an asset's attributes. If standard descriptions of assets were not readily available, anyone who wanted to buy, rent, or give credit against an asset would have to expend enormous resources comparing and evaluating it against other assets – which also would lack standard descriptions. By providing standards, Western formal property systems have significantly reduced the transaction costs of mobilizing and using assets." (H de Soto, The Mystery of Capital, 2000).
 
Updated 23 September 2009