Sunday, November 27, 2005

HL7 and Object Oriented Programming

In the HL7 V3 Introduction/Backbone document (Published 12/30/2004) it is claimed that:

"HL7 V3 adopts an Object Oriented (OO) approach using Unified Modeling Language (UML) principles. HL7 employs a completely new development approach with V3. HL7 V3 uses an OO approach that is model and repository-based. This OO approach is supported with trained facilitators, formal education classes, and computerized tools. This approach leads to increased detail, clarity and precision of the specification and finer granularity in the ultimate message. This helps functional committees meet the challenges of de novo interface design, and increases functional breadth and evolution of assumptions. It helps new members become productive in fewer meetings. This is an enormous aid to providing institutional memory and sharing work in progress across committees and to the membership at large. "

We know of only one peer-reviewed publication addressing the degree to which HL7 V3 does in fact correspond to OO modeling principles. This is:

Eduardo Fernandez and Tami Sorgente, "An analysis of modeling flaws in HL7 and JAHIS", Proceedings of the 2005 ACM symposium on Applied Computing, Santa Fe, New Mexico, 2005, pp. 216 - 223

whose authors (F&S) assert:

"Instead of using the Unified Modeling Language (UML), the standard notation for object-oriented software development, these two organizations (HL7 and JAHIS) have developed specialized object-oriented models. This has resulted in languages which are incompatible with the current use of UML. The consequences of this choice are the loss of the possible use of a large variety of existing models and patterns. What is worse, it will be difficult to add security specifications in their models, a critical aspect in the electronic interchange of medical records. We discuss here the shortcomings of HL7 and JAHIS as modeling languages and as languages in which to add security specifications."

"HL7 is complex and designed for implementation, not for conceptual modeling. We show below some of the sources of complexity. Although the designers deny having a software orientation, it is clear that the model contains artifacts normally used in the design stage of software systems. A measure of their complexity is that their documentation is hard to understand."

Specific problems identified by F&S can be summarized as follows:

  1. Because of its ad hoc variation of UML, HL7 V3 becomes incompatible with the conceptual models already developed for medical systems.

  2. V3 has an explicit concept of entity, which is not appropriate for a UML conceptual model, where everything is an entity; thus adding an entity class does not add anything to the semantics of the model. The class is present in V3 because 'entity' is used there idiosyncratically to mean entities of a specific type (people, places, organizations, ...).

  3. V3 treats associations in an idiosyncratic way, giving them no name and no semantic value, and placing their semantics in a separate class. UML associations have semantic value and association classes can be used to add details to links.

  4. In V3 states are hidden as mood codes: UML uses state diagrams to describe the stages in the lifecycle of an object. In HL7 these stages are implicitly encoded using mood codes. This precludes any analysis of state correctness and correlation with sequence (scenario) diagrams.

  5. V3 is cumbersome and inefficient: As a result of its misuse of associations the models are unnecessarily complex. Models that in UML take a few classes and associations require a much larger number of classes in HL7 V3.

  6. V3 uses arbitrary software-dictated names to distinguish the different types of classes in its models, e.g. by using a prefix letter, as in 'P_patient', to indicate a class of type Participation. In this way the benefits of compositionality are lost. The same effect can be achieved much more naturally in UML by using stereotypes, e.g., of the form '[Participation] Patient', which allow easy extension to new cases, so that one does not need to predefine all the class names one will need from the very start.

  7. Loss of the possibility of using security patterns and models: Given the importance of security when medical records are fully computerized and exchanged through the Internet, this may be V3's most serious flaw.

As F&S point out,

"Some of these deficiencies were already noted by Mead (see C. Mead, "HL7: Challenges, Benefits, and Applications", HL7 UK, December 2003), who said that the following may be true about HL7:

HL7 has assembled a considerable amount of process and number of artifacts without too much concern to UML. [F&S: Exactly, they invented their own notation.]

HL7 is not (in general) interested in software systems. [F&S: While they do use software-oriented aspects in the model, their approach does not follow the rules of software engineering.]

HL7 does not have extensive internal modeling/UML expertise. [F&S: Clearly]"

1 comment:

Gunther Schadow said...

The F&S paper is easily one of the worst papers that ever slipped through peer review. Basically it shoots a whole bunch of allegations against HL7 without really demonstrating them. It also mingles the HL7 discussion with some Japanese model which seems to have little to do with HL7 and says that it's bad (without really showing why). Basically the argument goes like this: they show a little toy UML diagram that they drew up themselves (sloppily, with no consideration of different use cases, with many obvious sore spots) and then wonder that it's different from what they see in the other model and HL7. From that they conclude that HL7 must be stupid. It is as if one kindergarten toddler argues that their painting is better because it uses more blue than the next toddler's painting.

But let's address the specific allegations one by one.

1. Because of its ad hoc variation of UML, HL7 V3 becomes incompatible with the conceptual models already developed for medical systems.

What other "conceptual model" are they talking about? "Already developed"? Are they talking about the little toy model that they show in the figure? Again, every serious model developed by independent people will look different; there is no compatibility from the mere fact that UML is used in a certain way. It's like saying that writing in cursive with a felt-pen makes Japanese, Hindi and English "compatible". Of course not.

Besides, none of the "variations" to UML which HL7 uses are "ad hoc". The HL7 RIM consensus process started before UML and all its self-proclaimed inquisitors even existed. We began in 1994 with Coad & Yourdon notation. We drew a huge enterprise data model, informed by collating several such enterprise data models from large healthcare enterprises (Kayser, Mayo, and others, plus the hugely successful HL7 v2 messaging standard, reverse-engineered). Then, over a period of 5 years, we distilled the model down to what it is today. It is now far more general but also far more abstract. It sits in the middle between the E-A-V style models (which, e.g., Protege people create) and a detailed data model. The UML rendition of the RIM does not violate anything in UML, and the authors do not demonstrate that it does.

2. V3 has an explicit concept of entity, which is not appropriate for a UML conceptual model, where everything is an entity;

So, they see a class with a name, and the name happens to be used somewhere in the UML specification itself, and they think that all names must have a globally identical meaning between different packages? Have they never heard about namespaces? The UML specification also uses the word "file"; does that mean I cannot create a class called "File" in a UML design of an operating system? Java has a class called "File"; does that mean Java isn't UML compliant?
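The namespace argument can be illustrated with a minimal Python sketch. The two namespaces `metamodel` and `rim` below are hypothetical stand-ins invented for this example (they are not taken from the UML or HL7 specifications); the only point is that the same short class name can carry two different meanings without conflict once each is scoped to its own namespace.

```python
# Hypothetical namespaces, modeled here as plain Python classes.
# Neither is taken from the UML or HL7 specifications.
class metamodel:
    class Entity:
        """Generic notion: anything that can appear in a model."""


class rim:
    class Entity:
        """Domain notion: a person, place, organization, and so on."""


# Same short name, two distinct classes, each qualified by its namespace.
print(metamodel.Entity.__qualname__)
print(rim.Entity.__qualname__)
print(metamodel.Entity is rim.Entity)
```

The qualified names differ and the two classes are distinct objects, which is all a namespace is meant to guarantee.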

3. V3 treats associations in an idiosyncratic way, giving them no name and no semantic value, and placing their semantics in a separate class. UML associations have semantic value and association classes can be used to add details to links.

Associations do not need names in the UML specification, nor need association role-names be assigned if the association is unique between two classes. So, if we draw an association between the class "Participation" and the class "Act" and it is the only association between them, we do not need any names. They would simply create redundancy. And since when is a name a source of "semantics"? How do you "place semantics" anywhere? How do we "place semantics in a separate class"? F&S leave little impression that they even know what they are talking about on this point.

4. In V3 states are hidden as mood codes: UML uses state diagrams to describe the stages in the lifecycle of an object. In HL7 these stages are implicitly encoded using mood codes. This precludes any analysis of state correctness and correlation with sequence (scenario) diagrams.

Not true. In HL7, mood codes are expressly not states. This is made very clear. Anyone pretending to criticize the RIM must have at least read it. If F&S had taken the effort to analyze the RIM before writing about it, they would have noted that the Act class has the attributes moodCode and statusCode. They would also have noted that statusCode is what is related to the UML state diagram.

That said, F&S sound ignorant about what the "state" of an object actually is when they bemoan that "these stages are implicitly encoded". So, let's get it straight: An object has identity and state. State is anything that can change during the life-cycle of an object, while identity is what remains the same. There is no requirement in UML that state be encoded in any one single attribute, and if F&S had even attempted to carefully read what they're quick to criticize, they would have noted that moodCode is expressly precluded from being changed in an Act object. Mood is not state.
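To make the distinction concrete, here is a minimal Python sketch of an Act-like object. It is a toy under stated assumptions, not the normative RIM definition: only the attribute names moodCode and statusCode come from the discussion above (rendered here as `mood_code` and `status_code`), while the identifier field and the code values such as "INT", "EVN", "new" and "active" are illustrative placeholders.

```python
class Act:
    """Toy stand-in for an Act: mood is fixed at creation, status may change."""

    def __init__(self, act_id, mood_code, status_code="new"):
        self._act_id = act_id        # hypothetical identifier field
        self._mood_code = mood_code  # fixed for the lifetime of the object
        self.status_code = status_code  # the part that tracks the life-cycle

    @property
    def act_id(self):
        return self._act_id

    @property
    def mood_code(self):
        # Read-only: no setter is defined, so assignment raises AttributeError.
        return self._mood_code


order = Act("act-1", mood_code="INT")  # placeholder code values
order.status_code = "active"           # a state transition is allowed

try:
    order.mood_code = "EVN"            # changing the mood is rejected
except AttributeError:
    print("mood_code cannot be changed on an existing Act")
```

In this sketch a different mood requires a different object, whereas the status of one and the same object moves through its life-cycle, which is the distinction the paragraph above draws between mood and state.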

5. V3 is cumbersome and inefficient: As a result of its misuse of associations the models are unnecessarily complex. Models that in UML take a few classes and associations require a much larger number of classes in HL7 V3.

One can make unnecessarily complex models in UML as in any other language, and the RIM is a UML model. But the RIM and RIM-based models are as complex as they need to be. What proof do F&S have for their statement? The only thing they present is their dinky little toy model, which simply exposes F&S's naivete about healthcare models. F&S have no proof for their ramblings.

6. V3 uses arbitrary software-dictated names to distinguish the different types of classes in its models, e.g. by using a prefix letter, as in 'P_patient', to indicate a class of type Participation. In this way the benefits of compositionality are lost. The same effect can be achieved much more naturally in UML by using stereotypes, e.g., of the form '[Participation] Patient', which allow easy extension to new cases, so that one does not need to predefine all the class names one will need from the very start.

This is a moot point. We haven't done this "P_" prefixing for many years. But even if we did, F&S do not define what "the benefits of compositionality" even are, much less how they are somehow "lost". They obviously do not understand the purpose of HL7, so how can they say how "the same effect" could be achieved in another way? How does a name prefix prevent "easy extension to new cases"? And how do F&S think that HL7 "predefines all class names"? F&S show none of that.

7. Loss of the possibility of using security patterns and models: Given the importance of security when medical records are fully computerized and exchanged through the Internet, this may be V3's most serious flaw.

This "may be" F&S's most serious flaw: that they finish their paper off with what looks like their favorite topic, security, suggesting that that's really all they know or care about. But like the rest of the F&S paper, this point too is pulled out of the blue, without a shred of a rational method. Their only method seems to be that of a kindergarten toddler looking at another toddler's painting: "yours looks different from mine, I don't like it, so it can't be right."

As a conclusion, the F&S article is not worth the paper on which it is printed because it is bare of any method by which it arrives at its conclusions. Without a method, it's nothing but the opinion of its authors. The example UML model from their own hands reveals that they don't really know the healthcare field. Since HL7 is open to anyone who wants to join the discussion, how come F&S never contributed? The answer probably is that F&S have no lasting interest in healthcare IT and no one has found the need to support their travels to HL7. And why would they? Fernandez seems like an operating system security type and Sorgente his student, and neither has any track record of any healthcare IT involvement. If they have anything more than a drive-by interest in this fascinating field, I'd welcome them to show up at HL7 and learn what it is all about.