ISO/IEC JTC 1/SC 34N0527



Information Technology --
Document Description and Processing Languages

TITLE: Narrative About the Topic Maps -- Reference Model
SOURCE: Mr. Patrick Durusau; Dr. Steven R. Newcomb
PROJECT: WD 13250-5: Information Technology - Topic Maps - Reference Model
PROJECT EDITOR: Mr. Patrick Durusau; Dr. Steven R. Newcomb
STATUS: Working Draft
ACTION: For review and comment
DATE: 2004-06-30
DISTRIBUTION: SC34 and Liaisons

Dr. James David Mason
(ISO/IEC JTC 1/SC 34 Chairman)
Y-12 National Security Complex
Bldg. 9113, M.S. 8208
Oak Ridge, TN 37831-8208 U.S.A.
Telephone: +1 865 574-6973
Facsimile: +1 865 574-1896
Network: [email protected]

Mr. G. Ken Holman
(ISO/IEC JTC 1/SC 34 Secretariat - Standards Council of Canada)
Crane Softwrights Ltd.
Box 266,
Telephone: +1 613 489-0999
Facsimile: +1 613 489-0995
Network: [email protected]

Narrative About the Topic Maps -- Reference Model

Version 1.4, 2004-06-29

Table of Contents

1  Introduction
2  Subjects, Topics and Merging! Oh My!
2.1 Subjects
2.2 Topics
2.3 Merging
3  But What of the Assertion Model?
4  Conclusion

1   Introduction

The following narrative is informal. It makes no attempt to adhere to ISO styles or guidelines in its presentation of a reference model for topic maps. While recognizing that the "devil is in the details," we present this high-level view of the Topic Maps -- Reference Model (TMRM) in the hope that, having outlined the major concepts and clarified some misunderstandings, we can then proceed to realize those concepts in more formal, ISO-conformant language.

2   Subjects, Topics and Merging! Oh My!

2.1   Subjects

Topic maps are a means of talking about subjects. But what is a subject? ISO/IEC 13250 defines subject as: "...any thing whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever." This definition often appears in discussions of topic maps, but such discussions rarely consider all of its implications.

In the context of discussions about Topic Maps, the term subject clearly includes all the things we normally associate with our examples of the use of topic maps, such as people, places, works of art, etc. But subject also includes all the components of a topic map, such as associations, occurrences, roles, names, URIs, etc. Under ISO 13250's definition of subject, there are no restrictions on what's allowed to be a subject. A subject is whatever anybody chooses to talk about. Moreover, ISO 13250 places no restrictions on which XML constructs may be regarded as representing subjects.

So, if one accepts the full implications of ISO 13250's definition of subject, then it becomes clear that, inside any topic map instance, lots of subjects are being talked about, and only some of them are the subjects of <topic> elements. True, a single <topic> element has only one subject as its "invisible heart" (as 13250 poetically puts it), but the <occurrence>s inside it (for example) represent subjects, too. This raises a question: how does a 'Desperate Topic Map Processor' (DTMP) know which of these subjects the author expected to undergo which kinds of processing, and on what basis?

2.2   Topics

Topic is defined in ISO/IEC 13250 as: "An aggregate of topic characteristics, including one or more names, occurrences, and roles played in associations with other topics, whose organizing principle is a single subject." Unfortunately, that definition has been read as privileging <topic> elements as especially favored surrogates for 'subjects', and as justifying the exclusion of other syntactic constructs from being considered equally valid surrogates for subjects. For example, would anyone seriously deny that an <association> is at least potentially interpretable as representing a subject, namely a relationship between other subjects?

Of course, one can create a 13250-conforming topic map under the assumption that merging will apply only to surrogates that identify themselves as <topic>s. But design choices, good or bad, should not be allowed to obscure the underlying general principle that syntactic constructs can be surrogates for subjects. Topic map designers should be free to specify, as part of their design, which of their syntactic constructs represent information that is to participate in subject-based merging, and which do not.

2.2.1   Aside on Surrogate Terminology

There is nothing particularly sacred about the syntactic surrogates that are used to compose a topic map. What has not been made sufficiently clear is that the TMRM distinguishes two types of surrogates when talking about topic maps.

Note 1: 

As Lars Marius Garshol notes in the Topic Maps -- Data Model (TMDM) proposal, "Topic maps may be represented in many ways: using topic map syntaxes in files, inside databases, as internal data structures in running programs, and even mentally in the minds of humans. All these forms are different ways of representing the same abstract structure."

The two kinds of subject surrogates distinguished by the TMRM are:

  1. Subject proxy demanders. These are syntactic constructs that are interpreted as one or more "subject proxies" (defined immediately below).

  2. Subject proxies. Each subject proxy is a surrogate for a single subject. When two subject proxies are regarded as being surrogates for the same subject, they can be merged, becoming a single subject proxy. A subject proxy is a collection of property/value pairs, including a subject identity property (SIP); SIPs are used to determine whether merging should occur.

There has been discussion in WG3 that indicates that the TMRM's use of the term "property/value pairs" has been misunderstood as implying certain limitations on the kinds of values that the properties of subject proxies can have. Perhaps the misunderstanding emanates from an assumption that "property/value pair" implies some sort of parsable syntax. However, this term was (and is) intended to mean "named value," and it is not intended to imply any constraints on the value whatsoever.
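To make the "named value" reading concrete, here is a minimal Python sketch of a subject proxy as a mapping from property names to values of any type. The class `SubjectProxy`, its `sip_name` field, the property names and the example URI are hypothetical illustrations, not TMRM terminology; the only point being made is that nothing constrains what a value may be.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SubjectProxy:
    """A subject proxy as a collection of named values.

    `sip_name` (a hypothetical field) says which named value is the
    subject identity property (SIP). Values are typed `Any` because
    the TMRM places no constraints on what a property value may be.
    """
    sip_name: str
    properties: dict[str, Any] = field(default_factory=dict)

    @property
    def sip_value(self) -> Any:
        return self.properties[self.sip_name]


mary = SubjectProxy(
    sip_name="subject-identifiers",
    properties={
        "subject-identifiers": frozenset({"http://example.org/people/mary"}),
        "name": "Mary",                     # a string
        "birth-year": 1970,                 # an integer
        "portrait": b"\x89PNG\r\n\x1a\n",   # raw bytes are just as legitimate
    },
)
```

Nothing here is parsed: each value is stored and retrieved exactly as given.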

The distinction between subject proxy demanders and subject proxies is an important one, particularly if, in accordance with Garshol's observation, one is likely to encounter topic maps with divergent representations. In order to merge subject proxies that were derived from sources that are represented using diverse notations, it is necessary to distinguish each source's way of indirectly representing a given subject from the way in which the subject proxies that result from interpreting those sources indirectly represent that same subject.

2.3   Merging

In order to be able to interpret a topic map instance, a "Desperate Topic Map Processor (DTMP)" needs guidance. Without such guidance, many interpretations are possible, including interpretations that will result in a veritable blizzard of subject proxies. For example, correctly or incorrectly, the DTMP may conclude that the URIs that appear as attributes of <resourceRef> (and/or other) elements need their own subject proxies, so that it will be possible, after merging, to have a vantage point from which all of the purposes served by that URI will be conveniently knowable, just like other subjects. The DTMP shouldn't have to guess about which subjects are supposed to have proxies, and which aren't. Moreover, we can't expect people to invest in the creation of topic maps if they cannot be confident that their customers will interpret them correctly. One answer to the problem would be to establish a single standard syntax for interchanging topic maps, and a single standard way of interpreting instances of that syntax. But that answer would reduce the number of adopters of the Topic Maps paradigm to only those whose needs can be met by that particular interpretation of that particular syntax. (In any case, it's too late to choose this alternative. There are already multiple syntaxes, including two standard ones, and there are already multiple ways of interpreting even those two.) So what's the answer?

The TMRM offers an answer. That answer, as Daniel Rivers-Moore once famously remarked, has two parts:

Note 2: 

Daniel Rivers-Moore, for those new to the Topic Maps community, has been a significant contributor to XTM and NewsML, and he has been a friend to all in the community. Other duties have called him away but we labor in the expectation of his eventual return.

  1. The first part of the TMRM's answer is that both the topic map author and the recipients of that topic map should possess a disclosure of a so-called Topic Map Application (TMA) -- a set of constraints to which the interpretation of the topic map is supposed to conform. A TMA is not a piece of software; it's a set of conventions, used in some number of topic maps, regarding the properties of subject proxies; it is an "Application" of the Topic Maps paradigm.

    The property definitions disclosed in a TMA necessarily include definitions of subject identity properties (SIPs). By disclosing the details of SIPs, TMAs reveal all of the bases on which merging is supposed to occur.

  2. With a TMA in hand, we know how to express the result of interpreting any kind of information as subject proxies governed by that TMA. The second part of the TMRM's answer is to disclose the subject proxy demanders that must be recognized in the particular kind of information, and to specify the subject proxies that are demanded by those demanders.

    The number of kinds of information that can be viewed in topic map terms is boundless, and so are the kinds of syntactic constructs within them that can be regarded as subject proxy demanders. Therefore, the TMRM doesn't have much to say about disclosures of rules for interpreting information resources as topic maps, other than that the subject proxy demanders must be identified, and the proxies that they demand must be spelled out in terms of some TMA.

So, the primary focus of the TMRM is on TMAs, and specifically on what constitutes a disclosure of a TMA. Any disclosure of rules for interpreting instances of a given syntax as a topic map consists of a TMA disclosure (or a reference to one), and a set of subject proxy demander recognition (and proxy construction) rules.
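The division of labor described above can be sketched as follows, assuming a hypothetical TMA whose single SIP, `subject-identifiers`, holds a set of URIs. Two hypothetical demander recognition rules, one for a simplified XTM-like XML syntax (not actual XTM, which uses XLink attributes) and one for a two-column CSV notation, each construct proxies in terms of that same TMA, so that proxies derived from either notation can later be compared.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical TMA: the single SIP is "subject-identifiers" (a frozenset of
# URIs); "names" is an ordinary, non-identifying property.
SIP = "subject-identifiers"


def proxies_from_xml(text: str) -> list[dict]:
    """Demander rule for a simplified, XTM-like syntax (not real XTM):
    each <topic> element demands exactly one subject proxy."""
    proxies = []
    for topic in ET.fromstring(text).iter("topic"):
        uris = frozenset(ref.get("href") for ref in topic.iter("subjectIndicatorRef"))
        names = frozenset(name.text for name in topic.iter("baseNameString"))
        proxies.append({SIP: uris, "names": names})
    return proxies


def proxies_from_csv(text: str) -> list[dict]:
    """Demander rule for a hypothetical CSV notation:
    each (uri, name) row demands exactly one subject proxy."""
    return [
        {SIP: frozenset([uri]), "names": frozenset([name])}
        for uri, name in csv.reader(io.StringIO(text))
    ]


xml_source = """<topicMap>
  <topic id="t1">
    <subjectIdentity>
      <subjectIndicatorRef href="http://example.org/people/mary"/>
    </subjectIdentity>
    <baseName><baseNameString>Mary</baseNameString></baseName>
  </topic>
</topicMap>"""
csv_source = "http://example.org/people/mary,Mary Smith\n"

from_xml = proxies_from_xml(xml_source)
from_csv = proxies_from_csv(csv_source)
```

The two sources represent Mary in different ways, but the proxies they demand are expressed in the same terms, so their SIP values can be compared directly.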

2.3.1   Subject proxies and their properties

As already noted, every subject proxy is a set of named values (aka "property/value pairs"), including a subject identity property (SIP).

Note 3: 

SIPs used to be called "SIDPs" -- Subject Identity Discrimination Properties.

When a TMA discloses the definition of an SIP, it must say how the values of any two such SIPs should be compared to determine whether the proxies in which they appear should be regarded as having the same subject -- whether they should be merged. According to the TMRM, all subject proxies are subject to merger with other subject proxies whose SIPs are regarded as equivalent. According to the TMRM, nothing else -- nothing other than subject proxies -- is subject to merging.
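A minimal sketch of merging under a TMA follows. It assumes, hypothetically, that every property value is a frozenset, that the SIP is a set of identifiers, and that the TMA's comparison rule regards two SIP values as equivalent when they share at least one identifier. Only SIP values are ever compared; all other properties are simply unioned when their proxies merge. A `False` result from the comparison means only that no merge was determined, not that the subjects differ.

```python
from typing import Callable

# Hypothetical convention for this sketch: every property value is a frozenset.
Proxy = dict[str, frozenset]


def shares_an_identifier(x: frozenset, y: frozenset) -> bool:
    """Hypothetical SIP comparison rule: equivalent if the sets overlap.
    False means only "no determination", never "different subjects"."""
    return not x.isdisjoint(y)


def merge_all(proxies: list[Proxy], sip: str,
              equivalent: Callable[[frozenset, frozenset], bool]) -> list[Proxy]:
    """Merge any two proxies whose SIP values are regarded as equivalent,
    repeating until no further merges are possible (so chains of
    equivalences collapse). Only the SIP is compared; every property of a
    merged proxy is the union of the corresponding values."""
    result = [dict(p) for p in proxies]
    changed = True
    while changed:
        changed = False
        for i in range(len(result)):
            for j in range(i + 1, len(result)):
                if equivalent(result[i][sip], result[j][sip]):
                    a, b = result[i], result.pop(j)
                    for key in a.keys() | b.keys():
                        a[key] = a.get(key, frozenset()) | b.get(key, frozenset())
                    changed = True
                    break
            if changed:
                break
    return result


proxies = [
    {"sids": frozenset({"urn:x:1"}), "names": frozenset({"Mary"})},
    {"sids": frozenset({"urn:x:1", "urn:x:2"}), "names": frozenset({"Mary Smith"})},
    {"sids": frozenset({"urn:x:2"}), "names": frozenset({"M. Smith"})},
    {"sids": frozenset({"urn:x:3"}), "names": frozenset({"Mary"})},
]
merged = merge_all(proxies, "sids", shares_an_identifier)
```

Here the first three proxies merge (the second bridges the first and the third), while the fourth stays separate even though it also has the name "Mary", because names are never compared.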

Note 4: 

Please note the words "regarded as" in the preceding paragraph. These words are intended to constitute a kind of disclaimer. Subjects are beyond the ken of topic maps. Indeed, subjects can only be grasped by conscious minds. Any comparisons of SIP values, and any merging of subject proxies that may be done as a result of such comparisons, are merely the operations of rules defined by the governing TMA. It's useful and accurate to think of a TMA as a disclosure of tricks that let computers give the illusion that they can recognize when two subject proxies are proxies for the same subject. Such tricks can be arbitrarily complex, so it's possible to make the illusion pretty convincing, robust, and useful, but, still, it's only an illusion.

Note 5: 

The terminological change from Subject Identity Discrimination Property to Subject Identity Property represents a recognition by the editors of the TMRM that the word discrimination was subtly misleading. The result of comparing two SIPs can be either a determination that their respective subject proxies should be merged, or no such determination. The result of such a comparison cannot be, "These proxies cannot be merged", so it's inaccurate to say that SIPs discriminate between subjects. To convey, in formal terms, the idea that SIPs do not have the power of subject discrimination, we might say: "The significance of SIPs being non-equivalent, beyond non-merging under the rules of a particular TMA, is undefined."

TMAs define classes of properties (including classes of SIPs). Subject proxies consist of instances of those defined property classes, including instances of SIPs. By analogy, consider the merging of topic items defined by the TMDM (TMDM, 6.2 Merging topic items). The TMDM does not define any particular property value; instead, it designates a class of property values (and not some other class) as the basis for determining whether two topic items merge. In other words, when a TMA discloses a comparison algorithm for an SIP class, the algorithm applies only to the values of two instances of that class.
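The point that a comparison algorithm belongs to an SIP class, and applies only to two instances of that class, can be sketched as follows. `SIPClass`, `SIPInstance`, `regarded_as_same`, and the two example classes (a trailing-slash-insensitive home page address and a hyphen-insensitive ISBN) are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class SIPClass:
    """An SIP class as a TMA might disclose it: a name plus the
    algorithm for comparing the values of two instances of the class."""
    name: str
    compare: Callable[[Any, Any], bool]


@dataclass(frozen=True)
class SIPInstance:
    sip_class: SIPClass
    value: Any


def regarded_as_same(x: SIPInstance, y: SIPInstance) -> bool:
    """True if the governing TMA's rule says the proxies should merge.
    False means only "no determination", never "different subjects"."""
    if x.sip_class is not y.sip_class:
        # A class's algorithm applies only to two instances of that class.
        return False
    return x.sip_class.compare(x.value, y.value)


# Two hypothetical SIP classes.
homepage = SIPClass("homepage", lambda a, b: a.rstrip("/") == b.rstrip("/"))
isbn = SIPClass("isbn", lambda a, b: a.replace("-", "") == b.replace("-", ""))
```

For example, `regarded_as_same(SIPInstance(isbn, "0-13-110362-8"), SIPInstance(isbn, "0131103628"))` is true, whereas an ISBN instance and a home page instance with identical strings are never compared at all.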

Note 6: 

If the meaning of the foregoing paragraph seems too obvious to you to be worth mentioning, that's good. It means that you probably understand what it's saying.

In addition to SIPs, subject proxies may also have other properties (OPs). The difference between SIPs and OPs is that, unlike SIPs, OPs are not used to detect subject sameness. To put it more accurately: instances of OPs are not compared in order to determine whether subject proxies should merge under a given set of rules. Like SIPs, classes of OPs are defined by TMAs.

Aside on 'Built-in' versus 'Conferred'

Another source of confusion about the TMRM appears to have been the distinction between built-in and conferred property values. To author a topic map is to create subject proxies whose properties have built-in values, so at least some of the subject proxies in all topic maps will necessarily have built-in properties. There is no compulsion to use the "conferred property" concept.

The difference between "built-in" and "conferred" is in how a property instance got its value. If the value was conferred, then the value emanated from the operation of some rule -- perhaps an inference rule -- defined by the TMA. If the value was "built-in", then it was authored; it is a "given".

Some of the confusion may have arisen because TMAs can require that the topic maps that they govern will always include certain predefined subject proxies, and the properties of such predefined subject proxies are also "built-in". Such pre-authored subject proxies are needed by certain kinds of topic maps in order to 'bootstrap' their value-conferring machinery and/or other logic. (In the foregoing sentence, "certain kinds of topic maps" means "topic maps that are governed by certain TMAs".)

The notion of "conferred property" is only marginally more complicated than the notion of "built-in"; the only added complexity is the rule that does the conferring. Such rules can be very simple, very complex, or anywhere in between.

Values can be built into both SIPs and OPs, and they can also be conferred on both SIPs and OPs. If a value is conferred on an SIP, an interesting thing happens: a subject proxy springs into existence, and it has the conferred-upon SIP as its SIP.

Walking through an example of the conferring of a property value will help illustrate both the concept and its utility:

  1. Subject proxy "A" is for a subject that is a particular human being (let's call her "Mary").

  2. Subject proxy "A" has an "other property" (an OP). The name of this OP is "husband", and its built-in value is subject proxy "B". The subject of subject proxy "B" is a particular human being (let's call him "George"). The semantics of the "husband" property, as defined by the governing TMA, are such that the meaning of all this is that Mary is married to George.

  3. The governing TMA has the following rule: "If a subject proxy has a 'husband' property that has a value, then the proxy that is its value shall have conferred upon it a 'wife' property whose value is the subject proxy that has the 'husband' property." The effect of this rule is to cause wife proxies to be directly accessible from the contexts of corresponding husband proxies, whenever husband proxies are directly accessible from the contexts of their corresponding wife proxies. In this example, using a conferring rule is a way of gaining the convenience of making information redundantly and directly available in the context of multiple proxies, but without having to author or maintain the information redundantly.
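The conferring rule in step 3 might be sketched in Python as follows. The `Proxy` class, the `confer_wife` function, and the choice to record each value component's provenance as the string "built-in" or "conferred" are all hypothetical; a TMA could express the same rule in any notation. Provenance is recorded per value component rather than per property class, because the built-in/conferred distinction applies to property instances.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity semantics: proxies refer to each other cyclically
class Proxy:
    label: str
    # property name -> list of (value, provenance) components, where provenance
    # is "built-in" (authored, a given) or "conferred" (produced by a rule)
    props: dict[str, list[tuple["Proxy", str]]] = field(default_factory=dict)

    def values(self, name: str) -> list["Proxy"]:
        return [value for value, _ in self.props.get(name, [])]


def confer_wife(proxies: list[Proxy]) -> None:
    """Hypothetical rendering of the TMA rule in step 3: the proxy that is the
    value of a 'husband' property has conferred upon it a 'wife' property whose
    value is the proxy holding that 'husband' property."""
    for proxy in proxies:
        for husband in proxy.values("husband"):
            if proxy not in husband.values("wife"):  # confer only once
                husband.props.setdefault("wife", []).append((proxy, "conferred"))


mary = Proxy("A")    # subject: Mary
george = Proxy("B")  # subject: George
mary.props["husband"] = [(george, "built-in")]  # authored: Mary is married to George

confer_wife([mary, george])
confer_wife([mary, george])  # re-running the rule changes nothing
```

Only the "husband" value was authored; the "wife" value on proxy "B" exists solely because the rule conferred it, so the information is available from both proxies without being maintained twice.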

Perhaps the single biggest source of confusion about "built-in properties" vs. "conferred properties" is that, for some reason, people think this distinction is inherent in the property class, whereas in fact it is only applicable to property instances. It is always meaningful to ask whether a particular value component of a property instance of a subject proxy is built-in or conferred; it is always one or the other. By contrast, it may not be meaningful to ask whether a TMA has defined a certain property as being built-in or conferred; TMAs can be defined in such a way that instances of both SIPs and OPs can have both built-in and conferred values. Indeed, it is even possible, within a single property value, for some of its components to be built-in, while others are conferred.

3   But What of the Assertion Model?

Yes, discussion of the TMRM would be incomplete without charts, arrows and lengthy tables on the assertion model.

Actually, upon reflection, the assertion model is part of a TMA. Indeed, since TMA modules are themselves TMAs, the assertion model can itself be defined as a TMA, and incorporated in other TMAs, as a module, by reference. Note that prior presentations of the assertion model have emphasized how following the assertion model gives rise to c-topics, t-topics, etc., and definitions of the SIPs as well as OPs for each such topic. (Topic is the term that presentations of the assertion model have used for subject proxy.)

The editors still consider the assertion model, as already described in drafts of the TMRM, the best available approach to the problem of reifying (proxifying?) relationships, in the spirit of 13250's <association> elements and other "topic characteristics". We remain convinced that benefits will accrue to adopters of the paradigm who also choose to adopt the assertion model. However, decisions as to whether to reify relationships, and, if reification is desired, the decisions as to the exact way in which relationships will be reified, can only be made by topic map designers. Removal of the assertion model from the TMRM greatly lessens the complexity of the TMRM and allows it to focus more clearly on the need to disclose the SIPs/OPs of subject proxies and the merging rules that govern them.

4   Conclusion

The efforts within WG3 to restate ISO 13250 have proceeded from two very different perspectives. One, that of the TMRM, sees the restatement as providing a way to describe the subject proxies found in the many forms of topic maps alluded to by Lars Marius Garshol. The other, that of the TMDM, sees the restatement as a means of putting "boots on the ground" and providing much needed guidance towards making topic maps in a particular syntax available. Both are needed and each has much to offer the other.

Differences in perspective and terminology have made what are deeply intertwined enterprises appear to be separate and possibly even freestanding efforts. In fact, nothing could be further from the truth. The TMRM, without more, is an interesting intellectual exercise, and the TMDM, without the TMRM (or its equivalent), would limit the contexts in which adoption of the standard is an attractive option.

The restatement of ISO 13250 should provide both the means for early adoption of topic maps as a technology and the basis on which topic maps can continue to expand as particular approaches and technologies for implementation evolve in the future. The goal of the TMRM is to support both of those objectives.