Thursday, November 13, 2008

Why Make It Simple If It Can Be Complicated?

In a document entitled

Ten good reasons why an HL7-XML message is not always the best solution as a format for a CDISC standard

Jozef Aerts, an XML expert at XML4Pharma, has examined the on-going attempts to format new CDISC standards as HL7-XML messages.

These attempts reflect the desire for integration, given that HL7 is well-established in the healthcare world, that XML is itself a global standard of growing importance, and that the FDA is already using a number of HL7-XML-based standards.

As Aerts points out, however, there are a number of problems: 1. it is the XML-free HL7 v2.x that is well-established; 2. HL7-XML is an embarrassingly complex non-standard version of XML, with little in the way of software tool support; 3. HL7-XML causes problems which standard versions of XML avoid; 4. the movement to HL7-XML seems to be supported especially by people who have never been actively involved in XML development (and often to be supported for reasons that are political rather than computational, clinical or economic).

HL7-XML messages take years to develop and are nearly always overcomplicated:

Those who have ever inspected an HL7 aECG-XML file in detail, may have been surprised by the enormous complexity of the XML. One of the reasons for this is that the XML structure (as defined in its XML-Schema) is not developed by XML specialists, but is derived from UML diagrams. I have been teaching a lot of XML in the last ten years and have experienced that CDISC ODM can be learned in a one day course. No chance however to accomplish this with aECG. Therefore, the amount of people that really understand aECG-XML is very limited, this in contradiction to the amount of people that understand ODM-XML.

Similarly, though XML is usually defined as being “as well machine-readable as human-readable”, one may question whether the latter is applicable to some HL7-XML messages such as aECG: not only the complexity is overwhelming, but it also uses a lot of code ununderstandable for the human reader.

In 2006, Gartner issued a Note entitled "HL7 V3 Messages Need a Critical Midcourse Correction", stating that “HL7 must act vigorously to make Version 3 messages easier to use and more compact” (See e.g. here.)
The direct consequence of this overcomplexity is that it is much harder (and thus much more expensive) to develop software to read and write HL7-XML than it is for ODM-XML. From my personal experience (30 years) in software development, I estimate that the cost of developing software for a complex HL7-XML message is at least the twentyfold than it is for ODM-XML.
Once again, therefore, HL7 chooses an idiosyncratic approach to development that is at odds with the approaches that have been tried and tested elsewhere -- with results which might have been anticipated (some of which were indeed anticipated on this blog, for example here).

As Aerts continues:

HL7-XML messages are developed in a somewhat curious way: first of all one or more UML diagrams are developed, and then the XML-Schema is derived from the UML. The UML is derived from the RIM (HL7 Reference Information Model) which currently has over 70 versions (!). Though the use of UML may be the perfect and well-established way for translating a software design to software classes, it is considered bad practice by XML specialists. Transformation of UML to XMLSchema in general leads to “spaghetti XML”, introducing unnecessary complexity. Of course it is an “easy” way: the world has much more UML specialists than it has XML-Schema specialists.
Personally, I would consider transformation of UML to XML-Schema the “lazy man's way”. The result can however be catastrophical. By the way, none of the most popular XML-based standards, such as MathML, VoiceXML, XHTML or XForms etc. have ever been developed using UML.
Aerts provides a series of examples to illustrate the problems and costs caused through use of HL7-XML, problems which are avoided if one uses XML in the standard way recommended by XML experts.

Addendum (April 10, 2009):

Some comments on Hacker News:

I don't know anything about HL7 or HL7-XML, but this sounds like letting loose people that dont know zilch about the implementation side of things. In this case HL7 is translated into UML because the people involved know UML, not XML. Then the UML is translated into XML by the push of a button, generating monstrous XML. Rant: dont let your tools substitute for personal knowledge of the domain.

How can someone not know XML when it's actually relevant to their job? It's just a tree with a fairly simple structure. Anybody who avoids learning XML because they already know UML is just not even trying. Seriously, learning it takes like half an hour at most.

HL7... what a nightmare! I remember having to work with it, and it was a convoluted solution where every provider and vendor had a different interpretation. As bad as 2.3.1 is, it's still worlds better than 3. The best thing that can happen is for 3 to be scrapped. The worst part is its model. I worked with it for the purposes of PHIN-LDM, and I've never seen a worse clusterfuck. It made dailywtf look positively logical.

I've written my own HL7 (pre-XML v2.x) message parser and generator in Java for work. I'd really like to not have to touch that code again, if possible. My code is easy enough to understand, but I don't want to have to rewrite it to support this non-standard XML. Just putting XML on the name of something doesn't instantly make it all easier.

Tuesday, October 28, 2008

Information and Communication Technologies Standards in the Health Sector

A new study has been released on behalf of the Enterprise & Industry Directorate General of the European Commission on the topic of ICT standards in the health sector: current situation and prospects.

The study contains a number of interesting remarks on HL7, for example welcoming the current initiative towards greater collaboration between HL7, ISO and CEN.

But there are also critical remarks, for example to the effect that:

Small or medium-sized ICT manufacturers may not be willing to adopt commonly used standards because these are very complex and thus difficult and expensive to implement. This applies for example to HL7 version 3. It may be less costly to develop proprietary standards on their own. (p. 21)

And also this:

HL7’s v2.x standards were important steps towards standardising clinical messaging. However, several issues caused difficulties, above all different options to implement the standard. To correct this issue, the RIM was developed for v3.0, eliminating most of the implementation options. The concept behind HL7 v3.0 has been generally well received. However, the RIM caused new problems. Firstly, it is unlikely that the defined RIM classes and attributes could be applied to every domain in healthcare – which is what they are intended to do. Secondly, the RIM documentation is described as being “disastrously unclear”, poorly integrated with HL7 v3.0 documentation, and inconsistent.

Under these circumstances, it may be difficult for HL7 v3.0 to establish a large user base. ... HL7’s involvement in the joint initiative with ISO and CEN may have the objective to move faster to international adoption of HL7 standards. The outcome of this convergence work as well as the organisation’s ability to create a satisfactory RIM may determine the future importance of HL7. (p. 37f.; emphasis added)

Thursday, October 09, 2008

Does the emperor have clothes?

This set of slides from a recent presentation by Eric Browne at an Australian national e-health conference provides some reassurance that the arguments I have been making on this blog are perhaps not simply the product of my own ignorance. Summary: "Basing clinical information interoperability on the HL7 v3 RIM is severely flawed. It is too complex; the underlying principles are unproved and highly suspect. The complexity leads to bad models, glacial progress, compromised quality and safety."

Postscript: Feb. 15, 2009
A more recent post by Eric Browne documenting problems with HL7 Clinical Document Architecture (CDA) is here.

Barriers, approaches and research priorities for integrating biomedical ontologies

Alan Rector has published an impressive survey of what we can expect for terminologies and ontologies in healthcare in the next ten years. Summarizing a long document, Rector believes:
  • HL7 (a mix of versions) will dominate the standards for messaging.
  • The standard for EHRs is likely to be a combination or amalgam of the HL7 CDA and Archetype based standards from OpenEHR, CEN EN 13606.
  • Terminologies from biomedicine, particularly the Gene Ontology and the associated ontologies in the Open Biomedical ontologies consortium will become of increasing importance to clinical medicine.
  • Following the example of the bioinformatics community, open systems “owned” by their community are likely to make an increasing contribution.

Rector echoes a number of what one might be tempted to call 'philosophical' points addressed in this blog as concerns the relations between ontologies and information models. He nicely summarizes these points as follows:

The relationship between knowledge representation and ontologies remains controversial and plagued by confusion of substance compounded by loose use of language. A second closely related notion is that of an “information model” or “model of data structures”. Both Archetypes and HL7 V3 Messages are examples of data structures. Formalisms for data structures bear many resemblances to formalisms for ontologies. ... However, there is a clear difference.

  • Ontologies are about the things being represented – patients, their diseases. They are about what is always true, whether or not it is known to the clinician. For example, all patients have a body temperature (possibly ambient if they are dead); however, the body temperature may not be known or recorded. It makes no sense to talk about a patient with a “missing” body temperature.
  • Data structures are about the artefacts in which information is recorded. Not every data structure about a patient need include a field for body temperature, and even if it does, that field may be missing for any given patient. It makes perfect sense to speak about a patient record with missing data for body temperature.

A key point is that “epistemological issues” – issues of what a given physician or the healthcare system knows – should be represented in the data structures rather than the ontology. This causes serious problems for terminologies coding systems, which often include notions such as “unspecified” or even “missing”. This practice is now widely deprecated but remains common.
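Rector's distinction can be made concrete with a small sketch. The following is only an illustration in Python, not anything taken from Rector's document: the class name `TemperatureRecord` and the field `reason_missing` are hypothetical. The point it tries to show is that "missing" is a property of the record (the data structure), never of the patient.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemperatureRecord:
    """A data structure: an artefact in which information is recorded."""
    patient_id: str
    # The *record* may lack a value; the patient always has a temperature.
    body_temperature_celsius: Optional[float] = None
    # Hypothetical epistemological note: why the record has no value.
    reason_missing: Optional[str] = None

    def has_temperature_on_file(self) -> bool:
        """True if a value was recorded in this particular record."""
        return self.body_temperature_celsius is not None


# Placeholder identifiers and values, for illustration only.
known = TemperatureRecord("p-001", 37.2)
unrecorded = TemperatureRecord("p-002", reason_missing="not measured")
```

On this reading, a code such as "unspecified" belongs in a field like `reason_missing`, not in the terminology that describes patients and their diseases.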

Under 'desirable outcomes', he lists:

The methods will become increasingly formal. The conflict between the scaling problems presented by pre-coordinated terminologies and the difficulty of maintaining consistency with post-coordinated terminologies will be overcome. To this end, the formal structure of SNOMED-CT and will be radically revised to take advantage of its purported underpinnings in description logic. HL7 v3 and/or Archetypes will likewise be reformulated to take advantage of modern technologies to ensure their mutual consistency and consistent binding to the new terminologies. Common links to terminologies from OBO and others used in molecular biology will be forged.

And under 'outcomes to be avoided':

enormous resources will be spent on over-ambitious plans for semantic interoperability that inevitably fail. In either case, communication will take place by going around rather than via the clinical information systems. In countries where it is mandated, SNOMED and HL7 V3 will become taxes on healthcare, absorbing significant resources while returning no, or in some cases negative, benefits.

The document as a whole contains a wealth of important material on HL7 V3 -- and SNOMED CT -- and on the problems associated with each. It draws special attention to HL7's 15-year-long planning process, designed to produce a '“version 3” that is not yet in routine use'.

Flavors of Null

Interesting post from Ananda Mohan concerning the problems created by HL7 v3's lack of support for any kind of optionality. One result of this policy is that codes need to be provided in advance to cover all cases where information is missing -- hence the multiple 'flavors of null', which are listed by Mohan on the basis of the latest HL7 v3 ballot pack as follows:
1. NI: "no information" - this is the most general and default exceptional value. There is no information which can be inferred from this exceptional value.

2. MSK: "masked" - this particular item has a known proper value, but it cannot be released in a given context due to security, privacy or other reasons.

3. OTH: "other" - there is a value, but it is not an element in the value domain of a variable, with particular cases:
- NINF: "negative infinity of numbers"
- PINF: "positive infinity of numbers"

4. UNK: "unknown" - a proper value is applicable, but not known. In particular:
- ASKU: "asked but unknown" - information was sought from the source but not known (e.g., patient was asked but didn't know)
- NAV: "temporarily not available" - information is not available at this time but it is expected that it will be available later.
- NASK: "not asked" - the information was not requested from the patient
- QS: "sufficient quantity" - the actual quantity is not known but sufficient enough to achieve a specific goal. For example the advice can be: add a sufficient quantity of water to 10 mg of medicine.
- TRC: "trace" - the content is too small to measure but still a non-zero value.

5. NA: "not applicable" - there is no proper value for this data item for this patient; for example, the date of the last menstrual period is not applicable for a male.
All of these “flavors” are provided for every data type. Thus for example “null” is a possible value of an integer or real alongside actual integer and real values. As Mohan points out, this might lead to reasoning problems: the HL7 definition of 'Boolean', in contrast to every working formalism, implies a 3-valued logic. The null flavors are causing problems also for the (surely in any case premature) attempts by ISO to model its health data types standard (ISO 21090) on HL7 v3 data types. Organizations such as CEN have, it seems, opposed the current ISO draft, in part because of the problems generated by the usage of null flavors.
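To see why a Boolean that admits null flavors forces a 3-valued logic on every consumer, consider the following sketch in Python. It is an illustration under stated assumptions, not HL7's own definition: the enumeration simply transcribes Mohan's list, and the conjunction rule (`and3`, a hypothetical name) follows Kleene's three-valued logic, with the further arbitrary choice that the first null flavor encountered is the one propagated.

```python
from enum import Enum
from typing import Union


class NullFlavor(Enum):
    """The null flavors as listed in Mohan's post."""
    NI = "no information"
    MSK = "masked"
    OTH = "other"
    NINF = "negative infinity of numbers"
    PINF = "positive infinity of numbers"
    UNK = "unknown"
    ASKU = "asked but unknown"
    NAV = "temporarily not available"
    NASK = "not asked"
    QS = "sufficient quantity"
    TRC = "trace"
    NA = "not applicable"


# A "Boolean" in this style is either a real truth value or a null flavor.
NullableBool = Union[bool, NullFlavor]


def and3(a: NullableBool, b: NullableBool) -> NullableBool:
    """Kleene-style AND (an assumption): False dominates, else null propagates."""
    if a is False or b is False:
        return False
    if isinstance(a, NullFlavor):
        return a
    if isinstance(b, NullFlavor):
        return b
    return True
```

Every caller of `and3` now has to handle a third kind of result, which is the reasoning problem in miniature.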
These problems were described in detail already five years ago in a document on the openEHR Data Types Information Model, the latest version of which is here:
All HL7 data types inherit from the ANY class (equivalent to the DATA_VALUE class in openEHR) which contains the attributes:
BL nonNull;
CS nullFlavor;
BL isNull;
The purpose of these attributes is to indicate whether a datum is Null, and for what reason. Since some data type classes also appear as the attributes of other data types, the Null markers also indicate whether any part of a datum is null. ... this allows an interval with missing ends and width to exist as a structured type. The consequence of the approach is that the entire model is essentially a model of "partial" data types; any attribute and any function call may return a Null value, as well as the true values of its type (in fact, in the specification, Null values are defined to be valid values of all data types).
This design decision was taken in HL7 so that any datum, no matter how unknown, would be structurally representable in the same way as completely known data, enabling it to be processed in the same way as all other instances of the same type. However, an important object-oriented design principle has been ignored in this approach. In the proper design of classes, properties and class invariants are stated. Invariants are statements which describe the correctness conditions of instances of the class; the general rule is that the post-condition of a creation routine (constructor) of a class must be that the invariants are satisfied. For example, an invariant of the HL7 IVL class could be:
(exists(low) and exists(high)) or else
(exists(low) and exists(width)) or else
(exists(width) and exists(high))
When an instance of this class is created, this condition should be satisfied, and remain satisfied for the life of the instance. To do otherwise is to create instances of data which other software can make no assumptions about, and is forced to check every single field, and then determine what to do in an ad hoc way. ... Possible consequences of the built-in Null marker design approach include:
• since even HL7’s basic types ST, INT, REAL, LIST<>, SET<> include null markers, processing of null values will be pervasive at the lowest level;

• software will be more complex, both implementations of the data types, and of software which handle them. This is because the software always has to deal with the possibility of calls to routines and attributes returning Null values. Most clinical information systems to date have taken the approach that a datum is either represented as an instance of a formal type if fully known, or else as narrative text if only partial;

• data may not always be safely processable, since some software may not properly handle the null values associated with attributes of partially known data items.
Essentially, all software which processes the data has to be “null-value aware”, and make no assumptions at all about whether a particular data instance is valid or not.
For all of these reasons the HL7 data type model is in stark contrast with the much simpler approaches used in CEN and in openEHR.
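The alternative design described in the openEHR document, in which a class invariant is established by the constructor, can be sketched as follows. This is a minimal Python illustration, not code from openEHR or HL7: the class name `Interval` and its float-valued fields are assumptions. The invariant is the one quoted above, which amounts to requiring that at least two of low, high and width exist.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Interval whose invariant is checked once, at creation time."""
    low: Optional[float] = None
    high: Optional[float] = None
    width: Optional[float] = None

    def __post_init__(self) -> None:
        # Invariant: (low and high) or (low and width) or (width and high),
        # i.e. at least two of the three attributes must be present.
        present = [v is not None for v in (self.low, self.high, self.width)]
        if sum(present) < 2:
            raise ValueError("need at least two of low, high, width")


ok = Interval(low=1.0, high=3.0)   # satisfies the invariant
# Interval(low=1.0) would raise ValueError instead of yielding a partial datum
```

Because the instance is frozen, the invariant holds for its whole life, so downstream code does not need to re-check every field.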


Postscript May 23, 2011

See also now here: http://wolandscat.net/2011/05/18/the-hl7-null-flavor-debate-part-2/

Saturday, March 08, 2008

How does one refer to an organism in a microbiology report?

Is an organism an entity? Or an observation of an entity (thus, presumably, an observation of an organism)? Can it really be true that, after ten years of HL7 RIM development, the answer to this question is still not clear?

As the useful Resources page of HL7 Australia makes clear:

At first sight the RIM is quite simple. The RIM backbone has just five core classes and a number of permitted relationships between them. In HL7 V3, every happening is an Act, which is analogous to a verb in English. Each Act may have any number of Participations, in Roles, played by Entities. These are analogous to nouns. Each Act may also be related to other Acts, via Act-Relationships. Act, Role and Entity classes also have a number of specialisations. For example, Entity has a specialisation called Living Subject, which itself has a specialisation called Person. Person inherits the attributes of both Entity and Living Subject.

Organism, too, is a specialization of Entity, we might reasonably suppose. Thus an organism is not a Role, not a Participation, not a Relationship, and also, we presume, not an Act. That an organism is an Entity is indeed the view embraced by advocates of HL7 in their oral discussions with me over the question whether the RIM can be taken seriously as a representation of the healthcare domain.
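The backbone just described can be rendered as a handful of classes. The following Python sketch is only an illustration of the description quoted above, not the RIM itself: the attribute names are invented for the example, and `Organism` as a direct specialization of Entity reflects the supposition made in this post rather than a confirmed RIM class.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    name: str


@dataclass
class LivingSubject(Entity):
    pass


@dataclass
class Person(LivingSubject):
    pass


@dataclass
class Organism(Entity):
    """Hypothetical: the specialization supposed in this post."""


@dataclass
class Role:
    name: str
    player: Entity            # Roles are played by Entities


@dataclass
class Participation:
    type_code: str            # illustrative attribute name
    role: Role


@dataclass
class Act:
    description: str
    participations: List[Participation] = field(default_factory=list)
    related_acts: List["ActRelationship"] = field(default_factory=list)


@dataclass
class ActRelationship:
    type_code: str            # illustrative attribute name
    target: Act
```

On this picture an organism can take part in an Act only by way of a Role and a Participation; it is never itself an Act.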

Not so for everyone in the world of HL7, however – at least not according to what we can infer from this:

Hi,
I have been working with people at CDC on using V3 messaging to convey microbiology reports among other things. In discussions today, the question came up of where in the Microbiology specification was the observation that identified the organism for which susceptibility results were being passed. I said, well no, the organism was indicated as an entity playing the role of isolate and participating in the "specimen observation cluster". But, I was told, the CDA hospital acquired infection report carried this as an observation, and indeed it does. Is this a problem to be addressed? Or a characteristic of V3 to be managed? It does seem clear that the two specifications have been underway in parallel [*], so it is, if not pointless at least difficult, to say which should be allotted precedence. What ideas do people have?
Mead

*This is exactly the thesis defended here.

Sunday, March 02, 2008

News from Stockholm

More from Stockholm County Council, and its ambitious healthcare IT system, the GVD, sometimes advanced as a success story of HL7 V3:

We chose Oracle Healthcare Transaction Base because it complies with the worldwide HL7 standard for clinical data, and because it comes from a major international company, committed to supporting, developing, and refining the product over time. When we conducted a market evaluation, Oracle also came in at the right price. – Jack Robinson, IT Manager, Stockholm County Council
In an article entitled "Missarna som knäckte GVD" (roughly: "The Mistakes That Broke GVD"), Madeleine Bäck reports on the recent history of the GVD project, which continues to move from crisis to crisis:

Heavy criticism is directed towards the choice of storage system for GVD, the so-called HTB database, which was acquired from WM-Data and its partner Oracle in 2004. 'Our pilot tests point to catastrophic performance when loading data to the system. We also observed that it would be incredibly complicated and expensive to adapt HTB to the GVD', explains one involved person (who however chose to remain anonymous). The suppliers who built the GVD are aware of the criticism, but they do not agree with all aspects: Pia Kullstrom, head of Public Sector and Healthcare at WM-Data, pushes back specifically as regards criticism of the HTB system. This is not based on facts, she claims, but rather on people having a different product-religion.

I am told that features of the GVD marked out as problematic include:

1. The poorly functioning BAT&Portal (for authentication and authorisation services), which uses HL7's CCOW (Clinical Context Object Workgroup) standard protocol, and is supposed to be a web-based, single point of entry and single sign-on access route to the different parts of the system.

2. For writing data to HTB the performance is 'still horrendous', even though Oracle re-wrote the whole implementation for reading data through their API after Stockholm had already accepted the HTB product.

3. GVD has a strongly centralized architecture, but its protagonists did not address the question of how to handle the legacy systems during the transition period. Many of the latter are fully functional, mission critical, clinical systems. Centralized architectures, in which the attempt is made to consolidate semantically non-interoperable data from hundreds of databases into one, are show-stoppers.
As a whole, GVD is a classical "big bang" project, where the thinking has been quantitative, not qualitative, and the Stockholm political leadership has admitted that GVD is an "IT-fiasco".

But the responsible civil servants remain in denial, and there have not as yet been any signals to the effect that they are going to back down from HTB. This raises one further problem: GVD has Oracle HTB, and thus HL7 V3, as its central component. One can state with high confidence that HL7 V3 is not going to be the standard at national level for interchange of clinical data in Sweden. So what is Stockholm going to do?

Sunday, February 17, 2008

The weight of the baby

HL7 RIM, as we have pointed out on too many occasions, confuses observations with the entities observed. To illustrate this confusion once again, we provide the following scenario, from Werner Ceusters (WC), with reactions from Dan Russler (DR), as they appeared on the HL7 vocabulary list. (Dan played a key role in the development of the HL7 Reference Information Model.)  We added some small clarifications and corrected some spelling errors (and perhaps, by trying to work our way through the numerous levels of comments on comments, we might have overlooked some dependencies). Dan and Werner are free to suggest corrections. (To access the Archives of HL7 lists one can go to: http://www.hl7.org/listservice)

Here we go:

WC to DR: I am in a delivery room, and there is that baby and his weight. When I want to register something about that baby's weight, I will use the symbol "#w-1234" to denote that baby's weight. The numbers are there to differentiate that baby's weight from some other baby's weight. The baby's weight is something that has different values at different times. The baby's weight endures through time. It is a continuant.

I want to obtain a value for the baby's weight, for which I have to perform an act of measurement, for which I will use the symbol "#m-5678". The ID numbers are there to differentiate that measurement from other measurements I might perform, even from the measurement which I might do a few minutes later when I weigh the same baby for a second time. The act of measuring occurs at a time. It is an occurrent.

The act of measuring gives me a magnitude, which in this case is something we have been taught to register as "4.7 kg". Now registering that (entering "4.7" into a computer, for instance) is an act in its own right, and when I want to refer to that registering act, I will use the symbol #r-881 (you get the picture, I hope).

Now you (Dan) seem to argue that I am not allowed to assign some of these symbols, though I can't figure out from your comments which one(s) precisely you object to. You should clarify. Here are the symbols again, for easy reference:

#w-1234 : THAT baby's weight
#m-5678 : THAT process (which occurred during THAT time) of measuring THAT baby's weight
4.7 kg : the value obtained for THAT baby's weight through THAT process
#r-881 : the further process of entering the obtained value in some record

DR: The baby is a collection of molecules ...

WC: Sure; but that is not relevant here.

DR: Why isn't the fact that the baby is made of molecules relevant? ... The baby wouldn't have a weight without being made of molecules.

WC: If the task is to register the magnitude of that baby's weight in a record, there is no reason to mention his molecules. We also don't talk about those other molecules on the side of the Earth that attract the baby's molecules, do we?

DR: The weight is the measurement of the force of attraction between the baby's molecules and the earth at that location and at that point in time.

WC: "No" on several things. First, the weight of the baby is not a measurement at all.

DR: You will need to go back to the physics definition of weight. Objects have mass, which create a mutual force of attraction.

WC: Thanks for this lesson, again; although I said in my mail that I knew that. It is simply not relevant.

DR: Weight is simply the concept of putting a scale in between two masses and measuring the force of attraction between two masses, the baby and the earth. You could abstract out the idea of measurement, but then we would just say "force of attraction," not weight. Of course the force of attraction between the baby and the earth is less in Denver than at sea level. So perhaps you could alter your discussion to match the real physics of the situation?

WC: That doesn't change anything to the simple task at hand: putting a baby on the scale, and reading what the instrument gives as weight. Dan, stick to the topic.

DR: Now this illustrates the problem with comparing "your reality" to "my reality."

WC: There is only one reality, but we can describe it in different ways. Your way seems to be to lump important things together (e.g. that baby's weight and my activity of measuring that weight), and to add irrelevant things (such as the molecules of the Earth).

DR: You invent things in your mind in the discussion below that I have not invented in my mind.

WC: I didn't invent anything. I gave a description of a simple scenario and I introduced 4 symbols. You started to mix and confuse the symbols, and bring in others.

DR: When you communicate what you invent, you use words that mean something different to me (and to many other people). Although I respect the inventions of your mind, I don't understand them, and when you explain them, I don't always agree with them.

WC: Then, if you still don't understand the scenario I described, and what the four symbols stand for, give me a language in which I can describe it so that you will understand it.

There is something that you can measure (the baby's weight), and an act of measurement, which is a different thing. The value for the weight that you obtain by measuring it is yet another thing. The word "measurement" is often used ambiguously to denote the last two things: the measuring act and the value obtained through the measuring act. It would be good for everybody's understanding not to use language ambiguously in this way. (Compare it with the noun phrase "the cut" which is used for both the act of cutting and the gap that results from this act.)

[Ceusters here tries to describe what at first might seem hard to swallow: the weight of this baby is the same entity from one time to the next. It is an enduring attribute of the baby. Certainly the baby's weight changes over time, but then so also does the baby. But just as the baby stays the same individual from one time to the next as it changes, so also does its weight. The baby's weight is a continuant. At any point in time, that weight has a precise magnitude, but which magnitude this is changes from one time to the next. - BS]
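The four-way distinction Ceusters is drawing can be written down as four separate types. The sketch below is only an illustration in Python, with class and field names invented for the purpose. The symbols #w-1234, #m-5678 and #r-881 and the value 4.7 kg come from the scenario; the second weighing, with its symbol, dates and value, is a labeled placeholder.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Weight:
    """Continuant: one particular baby's weight, not its magnitude."""
    symbol: str   # e.g. "#w-1234"
    bearer: str   # the baby whose weight this is


@dataclass(frozen=True)
class Magnitude:
    """The value obtained for a weight through some act of measuring."""
    value: float
    unit: str


@dataclass(frozen=True)
class MeasurementAct:
    """Occurrent: an act of measuring, which happens at a time."""
    symbol: str
    measured: Weight
    occurred_at: datetime
    result: Magnitude


@dataclass(frozen=True)
class RegistrationAct:
    """Occurrent: the further act of entering a value in a record."""
    symbol: str
    registers: MeasurementAct


weight = Weight("#w-1234", bearer="that baby")
# The timestamps are placeholders; the scenario gives no dates.
first = MeasurementAct("#m-5678", weight, datetime(2008, 2, 17), Magnitude(4.7, "kg"))
# Hypothetical later weighing: symbol, date and value are invented.
second = MeasurementAct("#m-5679", weight, datetime(2008, 2, 18), Magnitude(4.6, "kg"))
entry = RegistrationAct("#r-881", first)
```

Both measuring acts point at the very same `weight` object while reporting different magnitudes, which is what it means to say that the weight endures while its magnitude changes.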

The discussion continues:

WC: All that gravity stuff is irrelevant here, because if I or a nurse put a baby on a scale to measure its weight, I know what I'm measuring. I was and I am not talking about other stuff you can measure.
Again, I will use the symbol "#w-1234" to denote that baby's weight.

DR: Werner has in his mind the idea of "the force of attraction" and created a symbol #w-1234 to communicate the force of attraction between the baby's molecules and the earth at that point in time.

WC: No on several things, again: I had nothing regarding forces of attraction in my mind. In fact, I was describing a concrete delivery room situation.

DR: Doesn't gravity exist in the delivery room? Gravity sounds pretty concrete to me.

WC: Sure, but this fact is irrelevant. There are at least a billion other things in that room, such as the molecules in the door knob. I used explicitly demonstrative particles to make it clear: There is "THAT" baby. I'm not talking about conceptual representations of babies and delivery rooms, but simple things: babies and rooms.

"#w-1234" is the symbol that I use to denote that baby's weight in a description, NOT the magnitude of the weight.

DR: Now this symbol "#w-1234" has attributes associated with the symbol, such as:
time (since the weight will vary with time);
location (since the weight will vary with the location, e.g. altitude);
identity of the baby (weight will vary with different babies).

WC: Many more "no's". The symbol "#w-1234" may have attributes associated with it, but I did not talk about that.

DR: On the contrary, you gave the symbol attributes "the symbol #w-1234 ... to denote that baby's weight". To transform to a formal propositional grammar:
the symbol #w-1234 denotes the weight of that baby
object of the predicate: "that baby"
predicate: "denotes the weight"

WC: Dan, the symbol I introduced here is "#w-1234". I told you what the symbol stands for: that baby's weight. You can decompose that in as many ways as you want and I can give fifteen other grammars and NLU paradigms for you to analyse the way I said things; but you must at some stage get back to the topic. We are not analysing the lines of text that I produced, we are analysing what the lines of text try to convey. But the fact that you insist on analysing it this way, demonstrates that you are not able to get past the language level.

Let's try it this way: "#w-1234" is a symbol. It is composed of the characters "#", "w", etc. I use it in a language to refer to / to stand for that particular baby's weight, a particular entity in reality, not an element of a language. I could use another symbol for that baby's weight. Thus I could use the symbol "that baby's weight". I could then say "that baby's weight" stands for that baby's weight.

Attributes of the symbol may be the number of characters, the font used, etc. I am not studying symbols for the sake of this discussion. But interestingly, Dan, it is becoming clear at this point that you (perhaps because of the inadequacy of the representation language you choose to use) confuse the symbol with what the symbol stands for, a common mistake made by people who misunderstand semiotics, and a confusion which pervades the entire RIM, as we and others have shown.

Dan, it is surely the case that the words I used in the language of that paragraph are symbols, but we are not talking about the language in that paragraph. We are talking about the state of affairs in reality of which that paragraph is a linguistic representation. And you darned well know it, Dan, come on.

DR: I just tried to understand what you mean (perhaps not always in the same way as what is in your mind) or what the relationship is in your mind to what is occurring in the delivery room.

WC: I told you: that is irrelevant. That is "Wusteria". If I am talking to you about my mother, then I am not talking to you about some neuronal blurb in my brain. I am talking about my flesh and blood mother.

DR: If there is a problem with me understanding what you created in your mind, is that your problem, my problem or both our problems?

WC: If I was obscure or ambiguous, it would be my problem. But I was, repetitititititively, quite specific.

DR: Surely, my difficulty in understanding what you invented doesn't change what happened in the delivery room!

WC: The weight of the baby will change over time, the symbol will not. You may perhaps not like my symbol and would use another one. That is fine with me, as long as we make it clear for each other that these symbols denote the weight of THAT baby, not the magnitude of THAT baby's weight.

Thus, again, I use the symbol ONLY for THAT baby's weight, not for any other baby's weight. I made that quite clear, but you ignore it, and I am interested in knowing why you ignore it. I repeat thus about that symbol: "The numbers are there to differentiate that baby's weight from some other baby's weight."

DR: I am happy to be corrected on what you meant to say. Can you explain how I know how to figure out the baby's identity and the name of the force of attraction (not the magnitude of the force as you suggest) from the symbol #w-1234?

WC: Excuse me? Do you not know what a baby looks like? If I am in a delivery room, and they ask me to take the weight of that baby, I don't think I would grasp some bucket and measure its diameter. Or do you mean literally what the name is of that baby? I don't see what that has to do with that baby's weight. It has to do of course with putting the value in the right record. But again, that adds nothing here to the discussion.

DR: Symbol #w-1234 has common name: "weight"

WC: No! that symbol denotes the weight of THAT baby. The symbol itself certainly does not have the common name "weight". It might be given the common name "symbol" though. You can use the common name "weight" to denote that baby's weight, but that is imprecise and may lead to exactly the sort of confusions that you exhibit.

DR: I can agree to narrowing down my understanding of #w-1234 if you can teach me how to make sure the symbol unambiguously represents the baby's identity and other things that affect the force of attraction between the baby and the earth.

WC: Because I told you. "#w-1234" is the symbol that I use to denote that baby's weight. If you were there, I would have pointed to the baby. If not, I could show you a picture, or you would get other information related to the parents, etc. All that information could be put in some symbol dictionary or look-up table. And no, the idea is NOT to infer it from the form of the symbol itself. I think that the notion of no meaning IN the code is broadly accepted.
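
The look-up table idea can be made concrete with a small sketch. This is only an illustration under stated assumptions: the `Referent` class, the `symbol_table` dictionary and the helper functions are hypothetical names that do not come from the exchange, and the descriptions are for human readers only. The point it tries to show is that what a symbol stands for is looked up, never inferred from the form of the symbol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Referent:
    """Stand-in for an entity in reality (hypothetical helper class)."""
    description: str


# Three distinct entities from the scenario.
BABY_WEIGHT = Referent("that baby's weight")
MEASURING = Referent("that process of measuring that baby's weight")
RECORDING = Referent("the process of entering the obtained value in some record")

# The symbols are opaque keys: nothing is read out of their spelling.
symbol_table = {
    "#w-1234": BABY_WEIGHT,
    "#m-5678": MEASURING,
    "#r-881": RECORDING,
}


def denotes(symbol: str) -> Referent:
    """Return what a symbol stands for by looking it up in the table."""
    return symbol_table[symbol]


def add_alias(new_symbol: str, existing_symbol: str) -> None:
    """Let a second symbol stand for the same entity as an existing one."""
    symbol_table[new_symbol] = symbol_table[existing_symbol]
```

Calling `add_alias("that baby's weight", "#w-1234")` adds a second symbol for the same entity. Both symbols then resolve to the identical `Referent`, which is one way to picture the claim that a different symbol could be used without changing what is denoted.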

DR: Symbol #w-1234 has definition: "the force of attraction between the baby's molecules and the earth at that location and at that point in time"

Symbol #w-1234 has location: delivery room "latitude-longitude-altitude"

Symbol #w-1234 has time: TS

Symbol #w-1234 has baby: identity of baby

WC: No ! I did not give a definition. And if I would have done so, it would have been a quite different one.

DR: Since a physicist, or a physician, would give the definition I inferred from your use case, which would you supply?

WC: That is irrelevant in this case. We are talking about the simple notion of weight.

DR: Once the symbol "#w-1234" is created in Werner's mind and written down and re-created in my mind ...

WC: The symbol is on some bearer medium. Whatever happens in my brain is irrelevant here. The symbol is for sure not "created" there. Something will happen there, of course, some state of affairs involving neurons and neuro-transmitters and so forth, and there is some relationship between the symbol on the medium and the state of affairs in my brain. If I would wish to say something about that particular state, I would use another symbol for that. At that point, I could imagine that you would say: "Ah, you see, you assign another symbol to that symbol", but if you did, then clearly you did not get the point.

DR: We can communicate using the symbol.

WC: Right, and independently of whatever our brain does in this case, because we agreed (I hope, finally) that we use "#w-1234" ONLY to refer in descriptions to THAT baby's weight (NOT its magnitude, NOT the act of measuring in order to determine this magnitude, ...)

DR: However, the weight is still the force of attraction between the baby's molecules and the earth.

WC: probably, but irrelevant.

DR: Not irrelevant to me, because the force of attraction, the earth, the baby, the people like you and me, are the only things really existing in my reality. Everything else is made up in your mind. What is in your mind is important to me, but I don't confuse what is in your mind with my reality.

...

DR: When you say "obtain a value for the baby's weight, for which I have to do a measurement" you add a new attribute to the symbol for weight: "value"

WC: no ! In the sentence above, I didn't mention or introduce another symbol at all. The symbols I introduced were:
#w-1234 : THAT baby's weight
#m-5678 : THAT process (which occurred during THAT time) of measuring THAT baby's weight
4.7 kg : the value obtained for THAT baby's weight through THAT process
#r-881 : the process of entering the obtained value in some record
I did not introduce the symbol "value".

DR: Your whole paragraph is made up of symbols. Above, you just pick out several symbols from the paragraph and throw away the rest.

WC: Try not to confuse the readers by throwing in another level of symbols. I was quite specific about what the symbols I was talking about are. All, except the "4.7 kg" started explicitly with #.

DR: In any case, I now see you added to your story the term "process".

WC: ... in a humble attempt to make you see the difference between (1) what is to be measured, (2) measuring itself, (3) and the value obtained through the measuring.

DR: ... and defined 2 processes:

1) #m-5678 : THAT process (which occurred during that time) of measuring THAT baby's weight
2) #r-881 : the process of entering the obtained value in some record

WC: I didn't define these processes. In the scenario, these are two processes which are relevant to our discussion (because you confused them) and for which I introduced two different symbols.

DR: It is helpful for me to know that you feel these are two processes, which represent the movement between three states. Thank you for the extra detail. In my mind, that communicates the standard state transition model where process describes the movement from state to state.

WC: I am not responsible for the limitations in your language of choice, i.e. the standard language of state transition models. If you don't get the right results by using that state transition stuff when addressing this scenario, then don't blame me; blame your language.

DR:
State 1: pre-condition to Process #1
Process #1: #m-5678 : THAT process (which occurred during that time) of measuring THAT baby's weight

4.7 kg : the value obtained for THAT baby's weight through THAT process
State 2 is both the post-condition of Process #1 and the pre-condition to Process #2
Process #2: #r-881 : the process of entering the obtained value in some record
State 3: is the post-condition of Process #2.

DR: You said, "When I want to register something about that particular measurement, I will use the symbol #m-5678".

Here you identify the measurement with a symbol and fill in the value with a symbol:
#m-5678{symbol #w-1234
has common name: "weight"
has definition: "the force of attraction between the baby's molecules and the earth at that location and at that point in time ...
The act of measurement, which gives a magnitude which in this case is something that we have been taught to register as '4.7 kg' , he identifies with the symbol
r-881
has location: delivery room "latitude-longitude-altitude"
has time: TS
has baby: identity of baby
measurer: Werner
value: #r-881
WC: No ! Again, you confuse the measurement (the act of measuring) with the weight of the baby. It is for THAT measurement act, that was performed to get a value for THAT baby's weight, that I use the symbol "#m-5678". I can then use that symbol to document, for instance, that the measuring act took 55 seconds to perform.

Furthermore, I didn't fill in any value.

Furthermore, you erroneously equate #r-881 with that weight, whereas I clearly (or so I thought) explained #r-881 to be the act of registering the obtained value in a record. I wrote that:
The measurement gives me a magnitude which in this case is something that we have been taught to register as '4.7 kg'. Now that act of registering (entering "4.7" into a computer, for instance) is an act in its own right, and when I want to refer to that act, I use the symbol #r-881 (you get the picture, I hope).
But clearly, you didn't get the picture, confusing now THREE things.

DR: I must apologize for not clearly discovering what is in your mind. Again, is that my problem or our problem? If what you want to achieve is good communication, you have to be very clear for those of us who can't read your mind.

WC: You didn't have to read my mind. You only had to read what appeared on your screen after opening my message. Again, if you are not able to distinguish the symbols that I introduced from the natural language that I used to try to describe to you what they stand for, then suggest a better language.

DR: I see from your explanation of process above, that you meant to use the symbol "#r-881" to represent "the process of entering the obtained value in some record".

WC: Yes !!!!!!!! Great !!!!!!!!!

DR: ... and not the magnitude of the force of attraction. I would ask however, other than communicating the symbol "#r-881", what gets communicated along with the symbol in process #2, the process of entering the value in the record?

WC: I didn't say anything about that, as yet. But stuff that might go there is, for example, how long it takes to enter weight values in a record (interesting for comparing user interfaces from an ergonomic and effort required perspective) or who entered the data (that gives you the culprit in case of mistyping) or when the data was entered (the weight was taken at time t, but the registration at t+1).

DR: Summary: I believe these are the symbols you assigned in your own mind and communicated in your paragraph.

WC: As I explained above, your belief is wrong.

DR: Here is how I created the communication in my mind and communicated it back -- is my creation in my mind wrong from your point of view?

WC: yes, indeed. You confused several different entities.

DR:

Act.ii = #m-5678
Act.code = "3-1234 (weight--the force of attraction between the baby's molecules and the earth at that location and at that point in time)
Act.effectiveTime = TS
Act.observation.value = #r-881
Act.participantMeasurer = Werner
EntityPatientRole = identity of baby
Entity.location = "latitude-longitude-altitude"

WC: On the basis of what I described above, you must identify at least 2 acts in HL7-speak, and not just one as you came up with: one is for measuring the weight of that baby, the other for registering the value obtained through the former in a record. Furthermore, there must be at least three non-act entities: the baby, its weight and the magnitude of this weight at the time of the measurement (i.e., first act). Now I can accept that this detail is not relevant for many purposes and that therefore you don't want to register these things (although then I don't understand why you consider the baby's molecules and the gravity of the earth to be of relevance here - and please, don't take the latter statement as an indication that I don't know what weight physically is about, it is simply irrelevant). But you should NOT come up with a representation as you did - in HL7-speak, I believe - that CONFUSES the different elements.
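
A rough sketch of the representation WC asks for, with two acts and three non-act entities kept apart, might look as follows. It is not HL7 and not anything either participant wrote: the class and field names are assumptions made for illustration, while the symbols, the value 4.7 kg and the 55-second duration are taken from the exchange.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baby:
    label: str


@dataclass(frozen=True)
class Weight:
    symbol: str
    bearer: Baby  # the weight is the weight OF that baby


@dataclass(frozen=True)
class Magnitude:
    value: float
    unit: str


@dataclass(frozen=True)
class MeasuringAct:
    symbol: str
    measured: Weight       # what is to be measured
    result: Magnitude      # the value obtained through the measuring
    duration_seconds: int  # a fact about the act, not about the weight


@dataclass(frozen=True)
class RecordingAct:
    symbol: str
    recorded: Magnitude
    source: MeasuringAct


baby = Baby("THAT baby")
weight = Weight("#w-1234", baby)
magnitude = Magnitude(4.7, "kg")
measuring = MeasuringAct("#m-5678", weight, magnitude, 55)
recording = RecordingAct("#r-881", magnitude, measuring)

full_record = [baby, weight, magnitude, measuring, recording]

# Dropping detail judged irrelevant (here the recording act) leaves every
# remaining entry exactly as it was: nothing else has to be rewritten.
trimmed_record = [e for e in full_record if not isinstance(e, RecordingAct)]
```

Because each entity has its own type and its own symbol, removing one of them from `full_record` does not touch the others, which is the property the objection turns on. In the single-act rendering above, the measuring act, the weight and the recording act share one structure, so no part can be cut out cleanly.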

Compare with this analogy: if you take a picture of some scene (say of your best friend and his wife), and parts of the scene are irrelevant (say his wife), you can cut out these parts of the picture. The remaining parts (the picture of your friend) still depict faithfully the corresponding parts of the scene. One should not, in contrast, use some analogous technique of removing irrelevant parts if this means that the relevant parts will get distorted as well.

Your analysis of my use case was clearly wrong: I pointed out precisely where you went off track. The $60M question is now: WHY ? I can come up with several possibilities, but prefer, as Anthony asked and we agreed to, to keep the discussion germane to the issue. Thus I argue here (as before and as a few others have done earlier) that the semiotic and speech act theories and architecture of the "HL7-language" are such that they mislead even experts like you: HL7, in many cases, lets you build representations which are such that removing irrelevant detail leads to distortion of what is relevant.

DR: Certainly, I have learned more about how your mind works in this exchange.

WC: Sure ?

DR: That is important because as I communicate back to you what I heard from you, I make misinterpretations unless your language is VERY clear.

WC: Wasn't it ? I think the problem is that from the very beginning you lumped different things together. As you know, getting things-lumped-together apart is harder than lumping things together. Third time: propose me a more formal language then.

DR: Of course, our communication patterns don't change what happens in the delivery room, and we have to decide why we are bothering to communicate if it doesn't change what happens in the delivery room!

WC: I don't get your point.

DR: Perhaps you can try again to make clearer language, and we can see if you succeed in getting a more accurate return communication. I've given you some hints of how my mind works such that you can craft a more understandable version for my brain.

WC: It seems, as I indicated above, that you cannot make, or do not wish to make, links from what is IN language, to what it is in reality which the language is ABOUT. I guess that is the brain-washing effect of HL7.