Friday, February 04, 2011

HL7 Clarifies Everything

In my “Beyond Concepts”, published in 2004, I draw attention to a series of problems associated with the widespread use of the term ‘concept’ in knowledge representation circles. I show that the term is rarely defined by those who use it, with the result that it is used with a variety of conflicting and sometimes intrinsically incoherent interpretations, sometimes within one and the same sentence. The resultant problems affect not only human beings but also computer systems, since the latter must rely on artifacts afflicted with the same incoherence. These problems are now, increasingly, being acknowledged in other circles, for example by the IHTSDO, whose most recent (2010) release of SNOMED CT provides the following definition for ‘Concept’:

An ambiguous term. Depending on the context, it may refer to:
1. A clinical idea to which a unique ConceptId has been assigned.
2. The ConceptId itself, which is the key of the Concepts Table (in this case it is less ambiguous to use the term ‘concept code’).
3. The real-world referent(s) of the ConceptId, that is, the class of entities in reality which the ConceptId represents (in this case it is less ambiguous to use the term ‘meaning’ or ‘code meaning’).
As Solbrig and Chute point out:
The term “concept” continues to be used in models to reference categories, classes, universals, individuals and other less well defined artifacts. The fact that this obfuscates the purpose and usefulness of the model itself has already been well documented. Here, we show how the use of the term “concept” as a class name in a model can introduce serious confusion and propose a simple way that such confusion can be avoided. (2009, p. 123) 
It is thus heartening that HL7, too, with its vision of creating ‘the best and most widely used standards in healthcare’ in a way that exhibits ‘timeliness, scientific rigor and technical expertise,’ should have turned its attention to its own use of the term ‘concept’. In a new draft document on the representation of concepts that has been posted by the HL7 Vocabulary Group we are told that the concept is ‘The fundamental unit of meaning in HL7 with respect to vocabulary’. We are told further that:
As defined in HL7, “a concept is a unitary mental representation of a real or abstract thing – an atomic unit of thought.”  As such, we have to do some work to make concepts computationally tractable.
A concept, as used in HL7, has two fundamental characteristics (it often has other characteristics that are immaterial to this discussion, such as relationships to other concepts):
    1.  A concept can be identified
    2.  A concept can be represented
The HL7 model of concept covers #1 by having 1..m identifiers each one of which uniquely identifies the concept.  These are required to be easily machine-processable with simple recognition and indexing algorithms.
The HL7 model of concept covers #2 by having 1..m representations, each of which represents the concept in some way.  These are usually human-readable, and may or may not be easily machine processable, but are almost always machine renderable.  For instance, an image, or video, that represents a particular concept would be a representation of that concept that can be consumed by a human being, and likely rendered by a machine, but generally not easily indexed or matched and recognized with 2011 technology.
... The HL7 model defines a group of concepts with their associated identifiers and representations that are handled as a managed collection.  HL7 calls such managed collections Code Systems.  The organizations that publish these often use different words to describe the identifiers and representations in their collection, but they all publish identifiers and representations of the concepts in their Code Systems.
HL7 has been using nomenclature for these for years, and some of it is also widely used in the informatics community – but often used slightly differently.  It has been shown to be unclear how the different words used related to these fundamental characteristic items are precisely defined, and we have not had an easy way to ensure consistency of use.  To make things more challenging, some words that we often use in everyday discussion of vocabulary may be used to mean either or both an identifier or a representation.
The following table attempts to disambiguate this nomenclature we commonly use in HL7:

| HL7 Nomenclature Used | Is it an Identifier? | Is it a Representation? | Should it be our preferred nomenclature? |
|---|---|---|---|
| Code | Yes | Sometimes | |
| Term | Sometimes | Yes | |
| Designation | Generally Not | Yes | |
| Concept Representation | Generally Not | Yes | |
| Concept Identifier | Yes | No | |
| Display Name | No | Yes | |
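
For readers who prefer to see the quoted model as a data structure, here is a minimal Python sketch of what the draft seems to describe: a concept carrying one or more identifiers and one or more representations, and a Code System as a managed collection of such concepts. The class names, field names, and the example identifiers are hypothetical and are not taken from any HL7 specification; the uniqueness check is one possible reading of "each one of which uniquely identifies the concept".

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept as characterized in the quoted draft (names are hypothetical)."""
    identifiers: list[str]       # 1..m identifiers, meant to be machine-processable
    representations: list[str]   # 1..m representations, usually human-readable

    def __post_init__(self):
        # Enforce the "1..m" cardinality: at least one of each is required.
        if not self.identifiers:
            raise ValueError("a concept needs at least one identifier")
        if not self.representations:
            raise ValueError("a concept needs at least one representation")


@dataclass
class CodeSystem:
    """A managed collection of concepts, indexed by every identifier."""
    name: str
    by_identifier: dict[str, Concept] = field(default_factory=dict, init=False)

    def add(self, concept: Concept) -> None:
        # Assumption: an identifier may point at only one concept per code system.
        for identifier in concept.identifiers:
            if identifier in self.by_identifier:
                raise ValueError(f"identifier {identifier!r} is already in use")
        for identifier in concept.identifiers:
            self.by_identifier[identifier] = concept

    def lookup(self, identifier: str) -> Concept:
        return self.by_identifier[identifier]


# Usage with made-up identifiers and a made-up display string.
example_system = CodeSystem("Example Code System")
example_concept = Concept(
    identifiers=["EX-001", "urn:example:001"],
    representations=["Example display name"],
)
example_system.add(example_concept)
same_concept = example_system.lookup("urn:example:001")
```

Note that in this sketch a "concept" is nothing over and above its identifiers and representations; whatever the "unitary mental representation" is supposed to be, it has no field of its own.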


We conclude from the above that HL7 has a long way to go on the vocabulary front if its existing resources are to advance the sort of precision in use of language that would be required to support the ambitious goal of semantic interoperability in the complex domain of  healthcare. We remain convinced that a helpful step towards achieving this goal would be to abandon entirely the use of the term ‘concept’. 

1 comment:

Matt Leo said...

I did some research in the application of ontology technologies to environmental data sharing, and much of the literature I read related to developing shared medical ontologies. That literature led me to the conclusion that ambitious attempts to free data's semantics from its application context are impractical at best.

Developing a shared ontology is a natural thing to imagine when you look at the idea of ontologies, but that impulse stems from a flawed notion of how communication works -- or at least an impractical notion. It assumes that people have to share a common conceptualization in order to communicate. That is true as far as it goes, but the problem is constructing that common conceptualization across the transitive closure of all the parties who might get their hands on a piece of data. In practice people communicate all the time about things while having fundamental disagreements about them. They merely have to agree about the role a term plays in the situation of interest.

Far from automatically making all communication easy and safe, trying to shoehorn data into a universal foundation ontology has the potential to make routine communications difficult and unreliable.

Attempting to paper over natural and reasonable differences in viewpoint introduces lots of ad hoc terminology and modeling. Ontologies of large scope tend to become complex, and I suspect a complex ontology is necessarily an unreliable basis for communication. Users are called upon to make distinctions that are meaningless to them or violate their conceptual models.

Sometimes differences in conceptualization exist that just aren't bridgeable, yet the differences have no practical significance until you try to bridge them and fail.

The upshot is that while ontologies have significant potential in making data sharing safer and more convenient, we have to be judicious about our expectations and use of ontology technology.

I'd be very interested in seeing what HL7 has done along these lines, but the only way to get a comprehensive set of HL7 standards seems to be shelling out $700 for a membership. That seems to be a flaw in the standards process right there, since the only way to examine the body of standards critically is to literally buy into the organization. That's almost a conflict of interest. Other organizations like IEEE do this, but not at that price level.