Thursday, October 09, 2008

Flavors of Null

Interesting post from Ananda Mohan concerning the problems created by HL7 v3's lack of support for any kind of optionality. One result of this policy is that codes need to be provided in advance to cover all cases where information is missing -- hence the multiple 'flavors of null', which Mohan lists, on the basis of the latest HL7 v3 ballot pack, as follows:
1. NI: "no information" - this is the most general and default exceptional value. There is no information which can be inferred from this exceptional value.

2. MSK: "masked" - this particular item has a known proper value, but it cannot be released in a given context due to security, privacy or other reasons.

3. OTH: "other" - there is a value, but it is not an element in the value domain of a variable, with particular cases:
- NINF: "negative infinity of numbers"
- PINF: "positive infinity of numbers"

4. UNK: "unknown" - a proper value is applicable, but not known. In particular:
- ASKU: "asked but unknown" – information was sought from the source but not known (e.g., patient was asked but didn't know)

- NAV: "temporarily not available" - information is not available at this time but it is expected that it will be available later.

- NASK: "not asked" - the information was not requested from the patient.

- QS: "sufficient quantity" – The actual quantity is not known, but it is sufficient to achieve a specific goal. For example, the advice can be: add a sufficient quantity of water to 10 mg of medicine.
- TRC: "trace" – The content is too small to measure but still a non-zero value.

5. NA: "not applicable" – There is no proper value for this data item for this patient; for example, the date of the last menstrual period is not applicable for a male.
All of these “flavors” are provided for every data type. Thus, for example, “null” is a possible value of an integer or real, alongside actual integer and real values. As Mohan points out, this might lead to reasoning problems: the HL7 definition of 'Boolean', in contrast to every working formalism, implies a 3-valued logic. The null flavors are also causing problems for the (surely in any case premature) attempts by ISO to model its health data types standard (ISO 21090) on HL7 v3 data types. Organizations such as CEN have, it seems, opposed the current ISO draft, in part because of the problems generated by the usage of null flavors.
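To see why a nullable Boolean implies a 3-valued logic, here is a minimal Python sketch. It assumes Kleene-style semantics and uses `None` as a stand-in for a null Boolean; the names `Tri`, `tri_and`, `tri_or` and `tri_not` are hypothetical and are not taken from the HL7 specification.

```python
from typing import Optional

# Assumption: None stands in for a null (unknown) Boolean value.
Tri = Optional[bool]


def tri_and(a: Tri, b: Tri) -> Tri:
    """Kleene conjunction: a definite False wins, otherwise null propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def tri_or(a: Tri, b: Tri) -> Tri:
    """Kleene disjunction: a definite True wins, otherwise null propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def tri_not(a: Tri) -> Tri:
    """Negation leaves null as null."""
    return None if a is None else (not a)


# With a null operand, "x or not x" is no longer guaranteed to be True.
unknown: Tri = None
law_of_excluded_middle = tri_or(unknown, tri_not(unknown))
```

Under these assumptions `law_of_excluded_middle` comes out null rather than true, which is the kind of reasoning problem at issue: code that relies on ordinary two-valued identities can no longer do so.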
These problems were already described in detail five years ago in a document on the openEHR Data Types Information Model, the latest version of which is here:
All HL7 data types inherit from the ANY class (equivalent to the DATA_VALUE class in openEHR) which contains the attributes:
BL nonNull;
CS nullFlavor;
BL isNull;
The purpose of these attributes is to indicate whether a datum is Null, and for what reason. Since some data type classes also appear as the attributes of other data types, the Null markers also indicate whether any part of a datum is null. ... this allows an interval with missing ends and width to exist as a structured type. The consequence of the approach is that the entire model is essentially a model of "partial" data types; any attribute and any function call may return a Null value, as well as the true values of its type (in fact, in the specification, Null values are defined to be valid values of all data types).
This design decision was taken in HL7 so that any datum, no matter how unknown, would be structurally representable in the same way as completely known data, enabling it to be processed in the same way as all other instances of the same type. However, an important object-oriented design principle has been ignored in this approach. In the proper design of classes, properties and class invariants are stated. Invariants are statements which describe the correctness conditions of instances of the class; the general rule is that the post-condition of a creation routine (constructor) of a class must be that the invariants are satisfied. For example, an invariant of the HL7 IVL class could be:
(exists(low) and exists(high)) or else
(exists(low) and exists(width)) or else
(exists(width) and exists(high))
When an instance of this class is created, this condition should be satisfied, and remain satisfied for the life of the instance. To do otherwise is to create instances of data which other software can make no assumptions about, and is forced to check every single field, and then determine what to do in an ad hoc way. ... Possible consequences of the built-in Null marker design approach include:
• since even HL7’s basic types ST, INT, REAL, LIST<>, SET<> include null markers, processing of null values will be pervasive at the lowest level;

• software will be more complex, both implementations of the data types, and of software which handle them. This is because the software always has to deal with the possibility of calls to routines and attributes returning Null values. Most clinical information systems to date have taken the approach that a datum is either represented as an instance of a formal type if fully known, or else as narrative text if only partial;

• data may not always be safely processable, since some software may not properly handle the null values associated with attributes of partially known data items.
Essentially, all software which processes the data has to be “null-value aware”, and make no assumptions at all about whether a particular data instance is valid or not.
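The alternative described in the quoted passage, where the constructor establishes the class invariant, can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Interval` class, its float-valued fields and the `bounds` helper are hypothetical and are not the HL7 IVL or openEHR definitions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Interval:
    """Hypothetical interval type; at least two of the three fields must be known."""
    low: Optional[float] = None
    high: Optional[float] = None
    width: Optional[float] = None

    def __post_init__(self) -> None:
        has_low = self.low is not None
        has_high = self.high is not None
        has_width = self.width is not None
        # Mirrors the quoted invariant term by term.
        ok = (
            (has_low and has_high)
            or (has_low and has_width)
            or (has_width and has_high)
        )
        if not ok:
            raise ValueError("Interval needs at least two of low, high, width")

    def bounds(self) -> Tuple[float, float]:
        """Derive (low, high); this is safe only because the invariant holds."""
        low = self.low if self.low is not None else self.high - self.width
        high = self.high if self.high is not None else self.low + self.width
        return low, high


# A valid instance can be used without any further null checks.
example_bounds = Interval(low=1.0, width=2.0).bounds()
```

Because an under-specified instance such as `Interval(low=1.0)` is rejected at creation time, code that receives an `Interval` can call `bounds()` without being "null-value aware". The sketch deliberately does not check that all three fields agree when all are supplied.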
For all of these reasons the HL7 data type model is in stark contrast with the much simpler approaches used in CEN and in openEHR.


Postscript May 23, 2011

See also now here: http://wolandscat.net/2011/05/18/the-hl7-null-flavor-debate-part-2/
