Thursday, November 13, 2008

Why Make It Simple If It Can Be Complicated?

In a document entitled

Ten good reasons why an HL7-XML message is not always the best solution as a format for a CDISC standard

Jozef Aerts, an XML expert at XML4Pharma, has examined the ongoing attempts to format new CDISC standards as HL7-XML messages.

These attempts reflect the desire for integration, given that HL7 is well-established in the healthcare world, that XML is itself a global standard of growing importance, and that the FDA is already using a number of HL7-XML-based standards.

As Aerts points out, however, there are a number of problems: 1. the HL7 that is well-established is v2.x, which is XML-free; 2. HL7-XML is an embarrassingly complex non-standard version of XML, with little in the way of software tool support; 3. HL7-XML causes problems which standard versions of XML avoid; 4. the movement to HL7-XML seems to be especially supported by people who have never been actively involved in XML development (and to be supported often for reasons that are political rather than computational, clinical or economic).

HL7-XML messages take years to develop and are nearly always overcomplicated:

Those who have ever inspected an HL7 aECG-XML file in detail, may have been surprised by the enormous complexity of the XML. One of the reasons for this is that the XML structure (as defined in its XML-Schema) is not developed by XML specialists, but is derived from UML diagrams. I have been teaching a lot of XML in the last ten years and have experienced that CDISC ODM can be learned in a one day course. No chance however to accomplish this with aECG. Therefore, the amount of people that really understand aECG-XML is very limited, this in contradiction to the amount of people that understand ODM-XML.

Similarly, though XML is usually defined as being “as well machine-readable as human-readable”, one may question whether the latter is applicable to some HL7-XML messages such as aECG: not only the complexity is overwhelming, but it also uses a lot of code ununderstandable for the human reader.

In 2006, Gartner issued a Note entitled "HL7 V3 Messages Need a Critical Midcourse Correction", stating that “HL7 must act vigorously to make Version 3 messages easier to use and more compact” (See e.g. here.)
The direct consequence of this overcomplexity is that it is much harder (and thus much more expensive) to develop software to read and write HL7-XML than it is for ODM-XML. From my personal experience (30 years) in software development, I estimate that the cost of developing software for a complex HL7-XML message is at least the twentyfold than it is for ODM-XML.
Once again, therefore, HL7 chooses an idiosyncratic approach to development that is at odds with the approaches that have been tried and tested elsewhere -- with results which might have been anticipated (some of which were indeed anticipated on this blog, for example here).

As Aerts continues:

HL7-XML messages are developed in a somewhat curious way: first of all one or more UML diagrams are developed, and then the XML-Schema is derived from the UML. The UML is derived from the RIM (HL7 Reference Information Model) which currently has over 70 versions (!). Though the use of UML may be the perfect and well-established way for translating a software design to software classes, it is considered bad practice by XML specialists. Transformation of UML to XMLSchema in general leads to “spaghetti XML”, introducing unnecessary complexity. Of course it is an “easy” way: the world has much more UML specialists than it has XML-Schema specialists.
Personally, I would consider transformation of UML to XML-Schema the “lazy man's way”. The result can however be catastrophical. By the way, none of the most popular XML-based standards, such as MathML, VoiceXML, XHTML or XForms etc. have ever been developed using UML.
Aerts provides a series of examples to illustrate the problems and costs caused by the use of HL7-XML, problems which are avoided if one uses XML in the standard way recommended by XML experts.

Addendum (April 10, 2009):

Some comments on Hacker News:

I don't know anything about HL7 or HL7-XML, but this sounds like letting loose people that don't know zilch about the implementation side of things. In this case HL7 is translated into UML because the people involved know UML, not XML. Then the UML is translated into XML by the push of a button, generating monstrous XML. Rant: don't let your tools substitute for personal knowledge of the domain.

How can someone not know XML when it's actually relevant to their job? It's just a tree with a fairly simple structure. Anybody who avoids learning XML because they already know UML is just not even trying. Seriously, learning it takes like half an hour at most.

HL7... what a nightmare! I remember having to work with it, and it was a convoluted solution where every provider and vendor had a different interpretation. As bad as 2.3.1 is, it's still worlds better than 3. The best thing that can happen is for 3 to be scrapped. The worst part is its model. I worked with it for the purposes of PHIN-LDM, and I've never seen a worse clusterfuck. It made dailywtf look positively logical.

I've written my own HL7 (pre-XML v2.x) message parser and generator in Java for work. I'd really like to not have to touch that code again, if possible. My code is easy enough to understand, but I don't want to have to rewrite it to support this non-standard XML. Just putting XML on the name of something doesn't instantly make it all easier.
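
For readers who have never seen the pre-XML v2.x format that the last commenter refers to, here is a minimal sketch of how such a message can be read. It is not the commenter's Java code and not a complete HL7 parser: it assumes the default delimiters (carriage return between segments, "|" between fields, "^" between components), ignores repetitions and escape sequences, and uses an invented two-segment sample message. The names SAMPLE and parse_hl7_v2 are made up for this illustration.

```python
# Invented sample message, for illustration only (not real patient data).
SAMPLE = "\r".join([
    r"MSH|^~\&|LAB|HOSP|EHR|HOSP|200811131200||ADT^A01|MSG0001|P|2.3.1",
    "PID|1||12345||DOE^JOHN",
])


def parse_hl7_v2(message):
    """Map each segment name to a list of occurrences.

    Each occurrence is a list of fields, and each field is a list of
    components. Assumes the default HL7 v2.x delimiters.
    """
    segments = {}
    # Segments are normally separated by carriage returns; tolerate newlines too.
    for line in message.replace("\n", "\r").split("\r"):
        if not line.strip():
            continue
        fields = line.split("|")
        name, rest = fields[0], fields[1:]
        parsed = []
        for i, field in enumerate(rest):
            if name == "MSH" and i == 0:
                # The first MSH field lists the encoding characters themselves,
                # so it must not be split on "^".
                parsed.append([field])
            else:
                parsed.append(field.split("^"))
        segments.setdefault(name, []).append(parsed)
    return segments


msg = parse_hl7_v2(SAMPLE)
message_type = msg["MSH"][0][7]   # components of the message type field
patient_name = msg["PID"][0][4]   # components of the patient name field
print(message_type, patient_name)
```

A real-world parser has to deal with much more than this (repeating fields, subcomponents, escape sequences, and the vendor-specific interpretations the commenters complain about), but the basic structure of a v2.x message is no more than delimited text.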