Thursday, October 09, 2008

Barriers, approaches and research priorities for integrating biomedical ontologies

Alan Rector has published an impressive survey of what we can expect for terminologies and ontologies in healthcare in the next ten years. Summarizing a long document, Rector believes:
  • HL7 (a mix of versions) will dominate the standards for messaging.
  • The standard for EHRs is likely to be a combination or amalgam of the HL7 CDA and the Archetype-based standards from OpenEHR and CEN EN 13606.
  • Terminologies from biomedicine, particularly the Gene Ontology and the associated ontologies in the Open Biomedical Ontologies consortium, will become increasingly important to clinical medicine.
  • Following the example of the bioinformatics community, open systems “owned” by their community are likely to make an increasing contribution.

Rector echoes a number of what one might be tempted to call 'philosophical' points addressed in this blog concerning the relations between ontologies and information models. He nicely summarizes these points as follows:

The relationship between knowledge representation and ontologies remains controversial and plagued by confusion of substance compounded by loose use of language. A second closely related notion is that of an “information model” or “model of data structures”. Both Archetypes and HL7 V3 Messages are examples of data structures. Formalisms for data structures bear many resemblances to formalisms for ontologies. ... However, there is a clear difference.

  • Ontologies are about the things being represented – patients, their diseases. They are about what is always true, whether or not it is known to the clinician. For example, all patients have a body temperature (possibly ambient if they are dead); however, the body temperature may not be known or recorded. It makes no sense to talk about a patient with a “missing” body temperature.
  • Data structures are about the artefacts in which information is recorded. Not every data structure about a patient need include a field for body temperature, and even if it does, that field may be missing for any given patient. It makes perfect sense to speak about a patient record with missing data for body temperature.

A key point is that “epistemological issues” – issues of what a given physician or the healthcare system knows – should be represented in the data structures rather than the ontology. This causes serious problems for terminology coding systems, which often include notions such as “unspecified” or even “missing”. This practice is now widely deprecated but remains common.
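
To make the distinction concrete, here is a minimal Python sketch. It is not taken from Rector's document; the class and field names (PatientRecord, BodyTemperatureObservation, describe) are hypothetical and chosen only for illustration. The point is that "not recorded" is expressed as an absent field in the record, rather than as a special "missing" or "unspecified" code in the terminology.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BodyTemperatureObservation:
    """A recorded measurement: an artefact in the record, not the temperature itself."""
    celsius: float


@dataclass
class PatientRecord:
    """Hypothetical data structure; a given record may or may not carry a temperature."""
    patient_id: str
    # Absence of a value is a fact about the record (what is known),
    # not a fact about the patient, who always has some body temperature.
    body_temperature: Optional[BodyTemperatureObservation] = None


def describe(record: PatientRecord) -> str:
    """Report what the record says, keeping 'not recorded' out of the coded values."""
    if record.body_temperature is None:
        return f"{record.patient_id}: body temperature not recorded"
    return f"{record.patient_id}: body temperature {record.body_temperature.celsius:.1f} C"


if __name__ == "__main__":
    print(describe(PatientRecord("p1")))
    print(describe(PatientRecord("p2", BodyTemperatureObservation(37.0))))
```

Under this sketch, the ontology side would only ever need a single notion of body temperature; the question of whether a value is known lives entirely in the optional field.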

Under 'desirable outcomes', he lists:

The methods will become increasingly formal. The conflict between the scaling problems presented by pre-coordinated terminologies and the difficulty of maintaining consistency with post-coordinated terminologies will be overcome. To this end, the formal structure of SNOMED-CT will be radically revised to take advantage of its purported underpinnings in description logic. HL7 v3 and/or Archetypes will likewise be reformulated to take advantage of modern technologies to ensure their mutual consistency and consistent binding to the new terminologies. Common links to terminologies from OBO and others used in molecular biology will be forged.

And under 'outcomes to be avoided':

enormous resources will be spent on over-ambitious plans for semantic interoperability that inevitably fail. In either case, communication will take place by going around rather than via the clinical information systems. In countries where it is mandated, SNOMED and HL7 V3 will become taxes on healthcare, absorbing significant resources while returning no, or in some cases negative, benefits.

The document as a whole contains a wealth of important material on HL7 V3 and SNOMED CT, and on the problems associated with each. It draws special attention to HL7's 15-year-long planning process, designed to produce a '“version 3” that is not yet in routine use'.
