ISO/IEC JTC 1/SC 34N0529

ISO/IEC JTC 1/SC 34

Information Technology --
Document Description and Processing Languages

TITLE:	A Proposed Foundational Model for Topic Maps
SOURCE:	Mr. Lars Marius Garshol
STATUS:	Proposal
ACTION:	For review and comment
DATE:	2004-07-22
DISTRIBUTION:	SC34 and Liaisons
REPLY TO:	Dr. James David Mason (ISO/IEC JTC 1/SC 34 Chairman) Y-12 National Security Complex Bldg. 9113, M.S. 8208 Oak Ridge, TN 37831-8208 U.S.A. Telephone: +1 865 574-6973 Facsimile: +1 865 574-1896 Network: [email protected] http://www.y12.doe.gov/sgml/sc34/ ftp://ftp.y12.doe.gov/pub/sgml/sc34/ Mr. G. Ken Holman (ISO/IEC JTC 1/SC 34 Secretariat - Standards Council of Canada) Crane Softwrights Ltd. Box 266, Kars, ON K0A-2E0 CANADA Telephone: +1 613 489-0999 Facsimile: +1 613 489-0995 Network: [email protected] http://www.jtc1sc34.org

A Proposed Foundational Model for Topic Maps

Abstract

This document contains a proposal for a Foundational Model for Topic Maps, intended to meet a specific set of goals, together with some thoughts on how the model may fit into the current family of standards. Due to time constraints, the proposal is unfortunately incomplete.

1 Introduction
2 Background
3 Representing TMDM
4 The role of the foundational model
5 Objections

1 Introduction

This is a proposal for a Foundational Model for topic maps that is intended to meet the following goals:

It should be simpler than TMDM.
It should be able to fully represent TMDM without loss of information.
It should be suitable as a common foundation for TMCL and TMQL.
It should be sufficiently formal to appeal to an academic audience.

The first three goals have been met to the author's satisfaction; whether the model could be made to meet the final goal the author is unable to judge. Also, whether the proposal meets the goals of the ISO committee is not clear to the author, who is uncertain what these goals might be.

The label "Foundational Model" has been chosen for two reasons. First, it emphasizes that this is not the Reference Model. Second, it emphasizes that what the author is looking for is "a second model for topic maps," which may or may not be one that we already have.

2 Background

This proposal originally came out of the author's search for an efficient representation of topic maps, as well as attempts to bridge the RDF/TM gap. The easiest way to understand it may be to start by contemplating RDF triples, such as the one shown below.

(subject, property, object)

This model can easily represent the assignment of topic names to topics using (topic, topic-name-property, string-value), but fails to account for reification, scope, and variants. Similarly, it can easily represent occurrences using (topic, occurrence-type, uri/string-value), but again fails to account for scope and reification. Finally, the only way to represent associations in this model is to turn them into full nodes, with each role player connected to the node by a triple of its own.
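To make these limitations concrete, here is a minimal Python sketch with triples as plain tuples. The property and node names (`topic-name-property`, `assoc1`, `employer`, and so on) are hypothetical labels chosen for illustration, not an RDF or TMDM vocabulary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    prop: str
    obj: str


# A topic name fits naturally into a single triple...
name = Triple("lmg", "topic-name-property", "Lars Marius Garshol")

# ...but the triple has no identity of its own: an equal triple is
# indistinguishable from it, so there is nothing to attach a scope or a
# reifier to.
same_name = Triple("lmg", "topic-name-property", "Lars Marius Garshol")

# An association must become a node of its own ("assoc1"), with one
# triple for its type and one triple per role player.
association = [
    Triple("assoc1", "type", "employed-by"),
    Triple("assoc1", "employer", "ontopia"),
    Triple("assoc1", "employee", "lmg"),
]


def role_players(triples, node):
    """Return {role type: player} for every non-type triple about `node`."""
    return {t.prop: t.obj for t in triples
            if t.subject == node and t.prop != "type"}


print(name == same_name)                    # True
print(role_players(association, "assoc1"))  # {'employer': 'ontopia', 'employee': 'lmg'}
```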

The solution adopted by this proposal is to give each statement an explicit identity, carried as an additional, fourth element of the tuple (placed before the object). Thus, statements in this model look as follows:

(subject, property, statement-id, object)

The model is further extended to treat statement identities in the same way as other identities, which means that statements can be about other statements. This provides direct support for reification, and also allows scope and variants to be expressed as shown below.

(topic, topic-name-property, X, "name string")
(X, scope-property, Y, theme)
(X, variant-property, Z, variant)
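A minimal Python sketch of the same three statements, assuming string identities for illustration. Because the statement id `X` is an ordinary identity, finding everything said about the name is the same lookup as finding everything said about a topic. The class and function names are hypothetical.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str       # any identity, including a statement id
    prop: str
    statement_id: str  # the identity of this statement
    obj: str


quads = [
    Quad("topic", "topic-name-property", "X", "name string"),
    Quad("X", "scope-property", "Y", "theme"),
    Quad("X", "variant-property", "Z", "variant"),
]

# Each statement can be looked up by its identity.
by_id = {q.statement_id: q for q in quads}


def statements_about(quads, identity):
    """All quads whose subject is `identity`, whether a topic or a statement."""
    return [q for q in quads if q.subject == identity]


print(by_id["X"].obj)                                  # name string
print([q.prop for q in statements_about(quads, "X")])  # ['scope-property', 'variant-property']
```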

Occurrences can be expressed in the same manner, with scope and reification handled exactly as for names. Associations must still receive an identity of their own, but the problems of scoping and reifying them are now solved.

3 Representing TMDM

Using this model, plus a fixed TMDM vocabulary, the XTM file shown below would produce the following model.

<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="lmg">
    <baseName>
      <baseNameString>Lars Marius Garshol</baseNameString>
      <variant>
	<parameters>
	  <subjectIndicatorRef xlink:href="http://www.topicmaps.org/xtm/1.0/core.xtm#sort"/>
	</parameters>
	<variantName>
	  <resourceData>garshol, lars marius</resourceData>
	</variantName>
      </variant>
    </baseName>

    <occurrence>
      <instanceOf>
	<subjectIndicatorRef xlink:href="http://psi.ontopia.net/xtm/occurrence-type/description"/>
      </instanceOf>
      <resourceData>The person who wrote this document.</resourceData>
    </occurrence>
  </topic>

  <association>
    <instanceOf>
      <subjectIndicatorRef xlink:href="http://psi.example.com/employed-by"/>
    </instanceOf>

    <member>
      <roleSpec>
	<subjectIndicatorRef xlink:href="http://psi.example.com/employer"/>
      </roleSpec>
      <topicRef xlink:href="#ontopia"/>
    </member>

    <member>
      <roleSpec>
	<subjectIndicatorRef xlink:href="http://psi.example.com/employee"/>
      </roleSpec>
      <topicRef xlink:href="#lmg"/>
    </member>
  </association>
</topicMap>

From this XTM file we would get the model shown below. Numbers are model-internal identities; their only property is that two identities can be compared to determine whether or not they are equal. Upper-case literals are also identities, but they refer to identities that form part of the TMDM vocabulary; human-readable names are used for these instead of numbers.

(1, SOURCE_LOCATOR, 2, "file://example.xtm#lmg")
(1, TOPIC_NAME, 3, "Lars Marius Garshol")
(3, VARIANT, 4, "garshol, lars marius")
(4, SCOPE, 5, 6)
(6, SUBJECT_IDENTIFIER, 7, "http://www.topicmaps.org/xtm/1.0/core.xtm#sort")
(1, 8, 9, "The person who wrote this document.")
(8, SUBJECT_IDENTIFIER, 10, "http://psi.ontopia.net/xtm/occurrence-type/description")
(11, TYPE, 12, 13) /* 11 is an association */
(13, SUBJECT_IDENTIFIER, 14, "http://psi.example.com/employed-by")
(11, 15, 16, 1)
(15, SUBJECT_IDENTIFIER, 17, "http://psi.example.com/employee")
(11, 18, 19, 20)
(18, SUBJECT_IDENTIFIER, 21, "http://psi.example.com/employer")
(20, SOURCE_LOCATOR, 22, "file://example.xtm#ontopia")

The principal thing to notice here is the compactness and uniformity of the resulting model: fourteen quadruples are enough to represent the entire topic map.
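The transformation can be sketched in Python as a builder that hands out a fresh identity for every topic, association, and statement from a single counter, so no identity can be used twice. This is a minimal sketch under strong assumptions: the input is hand-coded rather than parsed from XTM, topics are merged only on identical locators, vocabulary terms are plain strings, and the class and method names are hypothetical. It yields one quadruple for each line of the listing above, although the identities are numbered in a different order.

```python
from itertools import count

CORE = "http://www.topicmaps.org/xtm/1.0/core.xtm"
SI, SL = "SUBJECT_IDENTIFIER", "SOURCE_LOCATOR"


class QuadBuilder:
    """Hypothetical helper that builds (subject, property, id, object) tuples."""

    def __init__(self):
        self._ids = count(1)  # source of fresh model-internal identities
        self._topics = {}     # (locator kind, locator) -> topic identity
        self.quads = []

    def new_id(self):
        return next(self._ids)

    def add(self, subject, prop, obj):
        """Record a statement under a fresh statement id and return that id."""
        sid = self.new_id()
        self.quads.append((subject, prop, sid, obj))
        return sid

    def topic(self, kind, locator):
        """Return the topic with this locator, creating it on first use."""
        key = (kind, locator)
        if key not in self._topics:
            self._topics[key] = self.new_id()
            self.add(self._topics[key], kind, locator)
        return self._topics[key]


b = QuadBuilder()

lmg = b.topic(SL, "file://example.xtm#lmg")
name = b.add(lmg, "TOPIC_NAME", "Lars Marius Garshol")
variant = b.add(name, "VARIANT", "garshol, lars marius")
b.add(variant, "SCOPE", b.topic(SI, CORE + "#sort"))

description = b.topic(SI, "http://psi.ontopia.net/xtm/occurrence-type/description")
b.add(lmg, description, "The person who wrote this document.")

assoc = b.new_id()  # the association itself is an identity, not a statement
b.add(assoc, "TYPE", b.topic(SI, "http://psi.example.com/employed-by"))
b.add(assoc, b.topic(SI, "http://psi.example.com/employee"), lmg)
b.add(assoc, b.topic(SI, "http://psi.example.com/employer"),
      b.topic(SL, "file://example.xtm#ontopia"))

for quad in b.quads:
    print(quad)
```

Drawing every identity from one counter is the simplest way to guarantee that a statement id can never coincide with a topic or association identity.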

4 The role of the foundational model

If we do create a second model for topic maps, we need to work out how it will fit in with the existing pieces of the puzzle, such as TMDM, XTM, CXTM, TMQL, and TMCL.

The thinking behind this model is that it will be defined as a model in its own right, with a TMDM vocabulary and a transformation from TMDM to the foundational model. The definition will probably live in a corner of the TMQL specification, as part of the machinery used to define TMQL, without the model itself being normative in any real sense. TMCL can then reuse parts of the TMQL specification (the model, the vocabulary, the semantics) for its own purposes. XTM and CXTM would continue to be based on TMDM, as they are today.

The diagram below shows this graphically:

The use of this model from within TMQL could be made very simple. Essentially, the core data model of TMQL could be sets of tuples (of varying arity). If we take tolog as the model, then a magic predicate called '_quadruple' (not visible in the language) can be defined, and all other built-in predicates (except /=) can be defined in terms of it. For example:

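/* Each built-in predicate is defined in terms of the hidden _quadruple predicate. */
type($X, $TOPIC) :- _quadruple($X, TYPE, $Q, $TOPIC).
/* The topic name item is the statement id of the TOPIC_NAME quadruple. */
topic-name($TOPIC, $NAME) :- _quadruple($TOPIC, TOPIC_NAME, $NAME, $V).
/* Likewise, the variant item is the statement id of the VARIANT quadruple. */
variant($TNAME, $VARIANT) :- _quadruple($TNAME, VARIANT, $VARIANT, $V).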
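The same rules can be mimicked in Python to show how little machinery is involved. In this sketch `quadruple` stands in for the magic `_quadruple` predicate, `None` plays the role of an unbound variable, and the sample data is three quadruples from the example listing; the function names are hypothetical.

```python
QUADS = [
    (1, "TOPIC_NAME", 3, "Lars Marius Garshol"),
    (3, "VARIANT", 4, "garshol, lars marius"),
    (11, "TYPE", 12, 13),
]


def quadruple(quads, s=None, p=None, i=None, o=None):
    """Yield every quad matching the pattern; None matches anything."""
    pattern = (s, p, i, o)
    for q in quads:
        if all(want is None or want == got for want, got in zip(pattern, q)):
            yield q


def type_of(quads, x=None, topic=None):
    # type($X, $TOPIC) :- _quadruple($X, TYPE, $Q, $TOPIC).
    return {(q[0], q[3]) for q in quadruple(quads, s=x, p="TYPE", o=topic)}


def topic_name(quads, topic=None, name=None):
    # topic-name($TOPIC, $NAME) :- _quadruple($TOPIC, TOPIC_NAME, $NAME, $V).
    return {(q[0], q[2]) for q in quadruple(quads, s=topic, p="TOPIC_NAME", i=name)}


def variant(quads, tname=None, var=None):
    # variant($TNAME, $VARIANT) :- _quadruple($TNAME, VARIANT, $VARIANT, $V).
    return {(q[0], q[2]) for q in quadruple(quads, s=tname, p="VARIANT", i=var)}


print(topic_name(QUADS, topic=1))  # {(1, 3)}
print(variant(QUADS, tname=3))     # {(3, 4)}
print(type_of(QUADS))              # {(11, 13)}
```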

The 'direct-instance-of' and 'instance-of' predicates become more complex, but only marginally so. The path language component of TMQL could be compiled down to the base predicates, and additional predicates and functions that are not specific to topic maps could easily be defined.

The TMQL core itself needs only a few operations:

Apply a predicate to an input tuple set and produce an output tuple set that is the result of applying the predicate.
Chaining two predicate applications. (AND. Trivial.)
Union of two tuple sets. (OR. Trivial.)
Set difference of two tuple sets. (NOT. Trivial.)
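These four operations can be sketched in Python under some simplifying assumptions: a tuple set is a set of variable bindings (frozensets of variable/value pairs), a predicate is given extensionally as a set of value tuples, and no variable is repeated within a single predicate call. The names `apply`, `person`, `employee`, `works_for`, and so on are hypothetical.

```python
def apply(pred_rows, variables, inputs):
    """Extend each input binding with every predicate row compatible with it."""
    out = set()
    for binding in inputs:
        env = dict(binding)
        for row in pred_rows:
            if all(env.get(v, val) == val for v, val in zip(variables, row)):
                out.add(frozenset({**env, **dict(zip(variables, row))}.items()))
    return out


def rows(tuple_set, *variables):
    """Sorted value tuples, for display."""
    return sorted(tuple(dict(b)[v] for v in variables) for b in tuple_set)


START = {frozenset()}  # a tuple set holding a single empty binding

person = {("lmg",), ("bob",), ("alice",)}
employee = {("lmg",), ("bob",)}
works_for = {("lmg", "ontopia"), ("bob", "acme")}

persons = apply(person, ["P"], START)
# AND: chaining two predicate applications.
employed = apply(works_for, ["P", "C"], persons)
# NOT: set difference of two tuple sets over the same variables.
unemployed = persons - apply(employee, ["P"], persons)
# OR: union of two tuple sets.
either = apply(employee, ["P"], START) | apply({("alice",)}, ["P"], START)

print(rows(employed, "P", "C"))  # [('bob', 'acme'), ('lmg', 'ontopia')]
print(rows(unemployed, "P"))     # [('alice',)]
print(rows(either, "P"))         # [('alice',), ('bob',), ('lmg',)]
```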

We may also need operations for counting, projection, and SATISFIES/EVERY, but perhaps not.

The idea is that TMCL will make use of TMQL, so it is not clear whether TMCL needs to interact with the TMQL core directly.

5 Objections

Robert Barta has already raised the objection that while this model can represent everything that TMDM can represent, it does not preserve the constraints of the TMDM model. This is of course perfectly true, but the question is whether it is a relevant objection.

Since the thinking behind this proposal is that TMDM should remain as it is and where it is, it is not necessarily a problem that the foundational model lacks these constraints. The constraints will be provided by TMDM, and the foundational model will be specified as a transformation from TMDM to the set of tuples. The query language and the constraint language can then operate on the set of tuples, safe in the knowledge that it must already conform to TMDM. (Obviously, we must ensure that the topic map construction part of TMQL cannot produce instances of the foundational model that are not valid TMDM instances. It would be nice to avoid having this part at all.)
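The objection can be made concrete with a small Python sketch. The quadruple model will happily hold a variant attached directly to a topic, which TMDM does not allow, so a check like the hypothetical one below would have to live outside the model, for instance guarding any construction part of TMQL. Only this one constraint is shown, and the function name is an assumption.

```python
def variant_violations(quads):
    """Return VARIANT quads whose subject is not the id of a TOPIC_NAME statement.

    In TMDM a variant always belongs to a topic name, but nothing in the
    quadruple model itself enforces this.
    """
    name_ids = {sid for (_, prop, sid, _) in quads if prop == "TOPIC_NAME"}
    return [q for q in quads if q[1] == "VARIANT" and q[0] not in name_ids]


valid = [
    (1, "TOPIC_NAME", 3, "Lars Marius Garshol"),
    (3, "VARIANT", 4, "garshol, lars marius"),
]
# Structurally fine as quadruples, but not a valid TMDM instance:
# the variant hangs directly off topic 1 instead of off a name.
invalid = valid + [(1, "VARIANT", 5, "lmg")]

print(variant_violations(valid))    # []
print(variant_violations(invalid))  # [(1, 'VARIANT', 5, 'lmg')]
```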

Another potential objection is that the TMRM does (or has the potential to do) a number of things that this model does not do at present. This is again true, but again it is difficult to judge whether this is a problem. Unfortunately, it remains an open question what it is we want "the second model" to do.