ISO/IEC JTC 1/SC34 N0229

ISO/IEC JTC 1/SC34

Information Technology --

Document Description and Processing Languages

Title: A Topic Map Data Model, An infoset-based proposal
Source: Lars Marius Garshol
Project: TMQL
Project editor: Lars Marius Garshol, Hans Holger Rath
Status: Personal contribution
Action:
Date: 18 June 2001
Summary:
Distribution: SC34 and Liaisons
Refer to:
Supersedes:
Reply to: Dr. James David Mason
(ISO/IEC JTC1/SC34 Chairman)
Y-12 National Security Complex
Information Technology Services
Bldg. 9113 M.S. 8208
Oak Ridge, TN 37831-8208 U.S.A.
Telephone: +1 865 574-6973
Facsimile: +1 865 574-1896
E-mail: mailto:[email protected]
http://www.y12.doe.gov/sgml/sc34/sc34oldhome.htm

Ms. Sara Hafele, ISO/IEC JTC 1/SC 34 Secretariat
American National Standards Institute
11 West 42nd Street
New York, NY 10036
Tel: +1 212 642 4976
Fax: +1 212 840 2298
E-mail: [email protected]

A Topic Map Data Model

An infoset-based proposal

By: Lars Marius Garshol
Affiliation: Ontopia A/S
Date: $Date: 2001/06/18 09:55:41 $
Version: $Revision: 1.6 $

1. Purpose and scope

This document defines an abstract model for topic maps which makes explicit the implicit data models of ISO 13250 and XTM 1.0. It also defines a processing model for XTM 1.0 based on the data model.

The model is intended to present one possible approach to specifying a data and processing model for topic maps, believed by the author to be preferable to other proposed approaches. It is hoped that this model may represent a first step on the way to a complete model for topic maps. Such a model would serve many purposes:

This document is not complete; it is an early draft intended to show a possible approach to defining the topic map model. In particular, this document has no official standing whatsoever. It is, as stated above, just a draft proposal.

In particular, more work is required on the following:

Steve Pepper provided many useful comments on this document. It is also informed by previous work done by Geir Ove Grønmo and Kal Ahmed.

The copyright in this document is owned by Ontopia AS. It may be redistributed and copied without restriction provided that it remain unchanged. Derivative works may be created freely.

2. A data model for topic maps

The abstract model for topic maps here presented is inspired by the XML Infoset, and uses a similar system of information items with named and typed properties. This approach has been chosen in order to avoid the problem inherent in any formalism: that the formalism must first be learned.

Some notes on the data types used in this model:

Resources

Resources, being external to the topic map itself, are not described in this model. Instead, they are represented by their locators, which are the only aspects of them visible in this model.

Resources are considered equal if their locators are equal.

Strings

In this data model strings are sequences of abstract Unicode characters. Processors may, at user option, normalize the strings, but this is not required by this specification.

Strings are considered equal if they consist of the exact same sequence of abstract Unicode characters. This implies that all comparisons are case-sensitive.

Sets

Sets are unordered collections of values that contain no two values considered equal to each other. This implies that attempts to add a value already in a set will be ignored.

Two sets are considered equal unless there exists a value in one set for which no equal value can be found in the other.

Null

The null value is used to indicate that properties have no value. It is distinct from the empty string, the empty resource, and the empty set.

Null is only considered equal to itself.
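
The equality rules above map fairly directly onto built-in types in many programming languages. The following is a minimal sketch in Python, not part of the model itself. It assumes that Python's `str`, `set`, and `None` stand in for strings, sets, and the null value; all variable names are invented for the illustration.

```python
# Strings: equality is exact, code point by code point, hence case-sensitive.
case_sensitive = "Topic" != "topic"

# Normalization is optional in the model. Without it, a precomposed character
# and its decomposed form are different strings.
precomposed = "\u00e9"        # e with acute accent, one code point
decomposed = "e\u0301"        # e followed by a combining acute accent
normalized_forms_equal = precomposed == decomposed   # False without normalization

# Sets: unordered and free of duplicates; adding an existing value is ignored.
scope = {"norwegian", "english"}
scope.add("english")
set_size = len(scope)
sets_equal = scope == {"english", "norwegian"}

# Null: distinct from the empty string and the empty set, equal only to itself.
null_equals_empty_string = None == ""
null_equals_empty_set = None == set()
null_equals_null = None == None
```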

2.1. The topic map information item

A topic map information set has exactly one topic map information item, which is the hub item of the information set, from which all other parts can be reached. This item represents the entire topic map.

The topic map information item has the following properties:

[base locator]
This is the base address of the topic map itself, representing the location where the topic map is stored. This might be its address on the web, on disk, or in a database. The value may be null. A base locator may be assigned by implementation-specific means.
[source locators]
This is a set of locator information items representing the topic map information item and used to address it. This set of locator information items may be empty; it may also contain locator information items assigned to the topic map information item by means not described by this specification.
[topics]
This is the set of topic information items contained in this topic map.
[associations]
This is the set of association information items contained in this topic map.

2.2. Topic information items

A topic information item represents a single topic in a topic map. The uniqueness constraints in this model are intended to ensure that topic information items whose subjects can automatically be shown to be the same are merged into a single topic information item. The rules will not catch all cases of subject duplication, however, and so human intervention is expected to be necessary to ensure consistency in most cases.

Topic information items have the following properties:

[source locators]
This is a set of locator information items representing the topic information item and used to address it. This set of locator information items may be empty; it may also contain locator information items assigned to the topic information item by means not described by this specification.
[classes]
This is the set of topic information items representing the classes of which this topic information item is an instance. The set may be empty.
[base names]
This is the set of base name information items belonging to this topic information item.
[occurrences]
This is the set of occurrence information items belonging to this topic information item.
[roles]

This is the set of association role information items representing this topic information item's involvement in associations.

The topic information item must be set as the value of the [role player] property on each association role information item in this set.

[subject indicators]

This is a set of locator information items referring to the subject indicating resources of the topic information item. The set may be empty.

If any of the locator information items equal one of the locator information items found in the [source locators] property of another information item, the topic information item is considered to reify that information item.

[subject address]
This is a locator information item referring to the resource that is the subject of this topic. The property may be null.

Topic information items are not considered equal unless their property values would have violated the uniqueness constraints had they been part of the same topic map information set.
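
The reification rule given for the [subject indicators] property can be checked mechanically. The sketch below is one possible reading of it, under the assumption that locators are represented as plain address strings and that every information item exposes its [source locators] as a Python set. The `Item` and `Topic` classes and the `reified_items` function are hypothetical names introduced for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Any information item that carries [source locators]."""
    label: str
    source_locators: set = field(default_factory=set)


@dataclass
class Topic(Item):
    """A topic information item, reduced to the properties needed here."""
    subject_indicators: set = field(default_factory=set)


def reified_items(topic, items):
    """Return the items whose [source locators] share at least one locator
    with the [subject indicators] of the given topic."""
    return [item for item in items
            if item is not topic
            and topic.subject_indicators & item.source_locators]


locator = "http://example.org/map.xtm#assoc1"
association = Item("an association", {locator})
topic = Topic("topic about that association", subject_indicators={locator})
reified = reified_items(topic, [association, topic])
```

Here `reified` contains only `association`, since one of the topic's subject indicators equals one of the association's source locators.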

2.3. Base name information items

A base name information item represents a topic name.

Base name information items have the following properties:

[source locators]
This is a set of locator information items representing the base name information item and used to address it. This set of locator information items may be empty; it may also contain locator information items assigned to the base name information item by means not described by this specification.
[value]
This is the string that is the base name. It may be empty, but it may not be null.
[scope]
This is the set of topic information items that represent the scope of this base name. The interpretation of this value is dependent on the application. The set may be empty.
[variants]
This is the set of variant information items that represent the variant names of this base name.

Base name information items are considered equal if the values of their [value] and [scope] properties are equal.
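
As an illustration of this equality rule, the sketch below models a base name by its [value] and [scope] only, since [source locators] and [variants] play no part in the comparison. It assumes scope topics can be represented by simple strings, which is a simplification made for this example; `BaseName` is a hypothetical class name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseName:
    """A base name reduced to the two properties that decide equality."""
    value: str
    scope: frozenset = frozenset()


names = set()
names.add(BaseName("Oslo", frozenset({"norwegian"})))
names.add(BaseName("Oslo", frozenset({"norwegian"})))  # equal: ignored by the set
names.add(BaseName("Oslo"))                            # same value, different scope
names.add(BaseName("oslo"))                            # differs by case
name_count = len(names)
```

Because sets ignore values that are already present, adding an equal base name a second time leaves the [base names] set unchanged.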

2.4. Variant information items

A variant information item represents a variant of a base name as it appears in the set of variant names of that base name.

The properties of variant information items are:

[source locators]
This is a set of locator information items representing the variant information item and used to address it. This set of locator information items may be empty; it may also contain locator information items assigned to the variant information item by means not described by this specification.
[value]
This is the string that is the variant name. It can be empty. The value can also be null, provided that the [resource] property is not null. If the value is not null, the [resource] property must be null.
[resource]
This is a locator information item referring to the resource which is this variant name. It may be null only if the [value] property is not null, and it must be null if the [value] property is not null.
[scope]

This is the set of topic information items that represent the scope of this variant name. The interpretation of this value is dependent on the application.

The [scope] property of a variant name must at all times be a superset of the [scope] property of its parent base name.

Variant information items are considered equal if the values of their [value], [resource], and [scope] properties are equal.
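
The constraints on [value], [resource], and [scope] stated above can be summarized as: exactly one of [value] and [resource] is null, and the variant's scope contains every topic in the scope of its base name. A minimal sketch of such a check follows. The function name `check_variant` and the use of `None` for null are assumptions of this example.

```python
def check_variant(value, resource, variant_scope, base_name_scope):
    """Return a list of constraint violations for one variant information item.

    value and resource use None for the null value; the scopes are any
    iterables of hashable topic stand-ins.
    """
    problems = []
    # [value] and [resource] are mutually exclusive, and one must be present.
    if (value is None) == (resource is None):
        problems.append("exactly one of [value] and [resource] must be null")
    # The variant's [scope] must be a superset of the base name's [scope].
    if not set(variant_scope) >= set(base_name_scope):
        problems.append("[scope] must be a superset of the base name's [scope]")
    return problems
```

Note that an empty string is a legal [value], so the check compares against `None` rather than testing truthiness.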

2.5. Occurrence information items

An occurrence information item represents the occurrence relationship between a topic and a resource that is an occurrence of information about the subject of that topic.

The properties of occurrence information items are:

[source locators]
This is a set of locator information items representing the occurrence information item and used to address it. This set of locator information items may be empty; it may also contain locator information items assigned to the occurrence information item by means not described by this specification.
[value]
This is the string that is the occurrence resource. It can be empty. The value can also be null, provided that the [resource] property is not null. If the value is not null, the [resource] property must be null.
[resource]
This is a locator information item referring to the resource which is the occurrence of information about the topic's subject. It may be null only if the [value] property is not null, and it must be null if the [value] property is not null.
[scope]
This is the set of topic information items that represent the scope of this occurrence. The interpretation of this value is dependent on the application. The set may be empty.
[class]
This is the topic information item that defines the class of occurrences of which this occurrence information item is an instance.

Occurrence information items are considered equal if the values of their [value], [resource], [scope], and [class] properties are equal.

2.6. Association information items

Association information items represent associations between topic information items. They have the following properties:

[source locators]
This is a set of locator information items representing the association information item and used to address it. This set of locator information items may be empty; it may also contain locator information items assigned to the association information item by means not described by this specification.
[scope]
This is the set of topic information items that represent the scope of this association. The interpretation of this value is dependent on the application. The set may be empty.
[class]
The topic that defines the class of associations, of which this association is an instance. The property may be null.
[roles]
The set of association role information items that represent the roles played by the topics that participate in the association. The set may not be empty.

Association information items are considered equal if the values of their [scope], [class], and [roles] properties are equal.

2.7. Association role information items

Association role information items represent the involvement of a participating topic in the association it participates in. The association role information item is the connection point between the topic and the association.

[source locators]
This is a set of locator information items representing the association role information item and used to address it. This set of locator information items may be empty; it may also contain locator information items assigned to the association role information item by means not described by this specification.
[class]
The topic information item that represents the nature of the role playing topic's involvement in the association. The property may be null.
[role player]

The topic information item that participates in the association; the topic that plays the role defined by the [class] property. The property may be null.

The topic information item referenced by this property must have the association role information item in its [roles] property.

Association role information items are considered equal if the values of their [class] and [role player] properties are equal.

2.8. Locator information items

Locator information items represent locators referring to resources, and are the binding point between the topic map model and the world of resources. Locators can use any syntax as long as they abide by the rules of this section.

Locator information items have the following properties:

[notation]

This value is a string that identifies the notation used in the [address] property. If the value is "URI" the [address] property must hold a valid URI, as defined by RFC 2396. If the value is not "URI" the first two characters of the [notation] property must be "X-".

The property may not be null, and it may not be the empty string.

[address]

This value is the address of the resource referred to by this locator information item, represented as a string. The value must be absolute, that is, its meaning must not depend on the context in which it is resolved.

The property may not be null, and it may not be the empty string.

Locator information items are equal if they have the same values in their [notation] and [address] properties.
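
A locator information item could be sketched as an immutable value type, as below. This is an illustration under stated assumptions rather than a normative definition: the class name `Locator` is invented, null is represented by `None`, and the check that a "URI" address is a valid RFC 2396 URI or is absolute is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locator:
    """A locator information item; equality compares both properties."""
    notation: str
    address: str

    def __post_init__(self):
        # Neither property may be null or the empty string.
        if not self.notation:
            raise ValueError("[notation] may not be null or empty")
        if not self.address:
            raise ValueError("[address] may not be null or empty")
        # Any notation other than "URI" must begin with "X-".
        if self.notation != "URI" and not self.notation.startswith("X-"):
            raise ValueError('[notation] must be "URI" or begin with "X-"')
```

Making the type immutable and hashable lets locators be stored directly in sets such as [source locators] and [subject indicators].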

2.9. Unique value constraints

The information items in a topic map information set are subject to the uniqueness constraints listed below. How violations of these constraints are handled is left to the implementation. Possible solutions are to merge the offending items, in some cases, or to signal an error and reject the operation that would have caused the violation.

3. XTM processing model

This section describes how to build a topic map information set from an XTM 1.0 document. The procedure here described always produces a topic map information set that is fully conforming to the constraints in the data model. The procedure is described in terms of what impact each XML element in the XTM document has on a processing context.

The initial processing context is empty.

XTM processors operate on representations of the XML Information Set. How processors create such representations from XML source documents is beyond the scope of this specification. The application of transformations such as XSLT stylesheets, architectural forms, and SAX parser filters is expected, but support for this is not required.

Since XTM processors are based on the XML Information Set, they operate on a namespace view of XML documents. This means that the element names given in this specification will only match those in the XML Information Set provided that the value of the [namespace name] property of the element information item matches 'http://www.topicmaps.org/xtm/1.0/'.

The attribute name 'xlink:href' used in this specification is a shorthand for the attribute information items having 'http://www.w3.org/1999/xlink' as the value of their [namespace name] property, and 'href' as the value of their [local name] property.
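
To make the namespace view concrete, the sketch below uses Python's standard `xml.etree.ElementTree` module, which exposes qualified names in the form `{namespace name}local name`. The sample document and the prefix `xl` are invented for the illustration; the point is that matching depends on the namespace name, not on the prefix used in the source.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"

doc = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xl="http://www.w3.org/1999/xlink">
  <topic id="t1">
    <instanceOf><topicRef xl:href="#person"/></instanceOf>
  </topic>
  <topic xmlns="http://example.org/other" id="ignored"/>
</topicMap>"""

root = ET.fromstring(doc)

# Only <topic> elements in the XTM namespace match; the second one does not.
topic_ids = [t.get("id") for t in root.findall(f"{{{XTM_NS}}}topic")]

# 'xlink:href' means namespace name XLINK_NS plus local name 'href',
# whatever prefix the document happens to use.
hrefs = [e.get(f"{{{XLINK_NS}}}href") for e in root.iter(f"{{{XTM_NS}}}topicRef")]
```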

The locator information item of an XML element that has an id attribute value is created by setting the [notation] property to "URI" and the [address] property to the [base uri] of the XML document information item in which the element appears, followed by "#" and the id attribute value.

URIs contained in attribute values are turned into locator information items by resolving them relative to the base URI of the XML document they are contained in. A new locator information item is then created with "URI" in its [notation] property, and the absolute URI in its [address] property.
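
The two ways of creating locator information items described above might be sketched as follows. The function names are hypothetical and locators are shown as simple (notation, address) tuples. The sketch assumes that `urllib.parse.urljoin` from the Python standard library is an acceptable stand-in for relative URI resolution; it follows the later RFC 3986 rather than RFC 2396, which is an assumption of this example.

```python
from urllib.parse import urljoin


def locator_for_element(base_uri, element_id):
    """Locator for an XML element that has an id attribute value."""
    return ("URI", base_uri + "#" + element_id)


def locator_for_reference(base_uri, href):
    """Locator for a URI found in an attribute value, made absolute
    against the base URI of the containing document."""
    return ("URI", urljoin(base_uri, href))


base = "http://example.org/maps/opera.xtm"
element_locator = locator_for_element(base, "puccini")
local_reference = locator_for_reference(base, "#puccini")
external_reference = locator_for_reference(base, "composers.xtm#verdi")
```

With these definitions a `<topic id="puccini">` element and a reference to `#puccini` from the same document yield equal locators, which is what allows references to be matched against [source locators].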

Elements of types not mentioned in this specification are ignored during processing, though processors are allowed to warn about them.

3.1. The <topicMap> element

For each <topicMap> element in the XTM document a new topic map information item is created. This information item is added to the context and remains there as the current topic map information item as long as the children of the <topicMap> element are processed.

A locator information item is created and set as the value of the [base locator] property of the created topic map information item. Its [notation] property is set to "URI", and its [address] property to the value found in the [base uri] property of the element information item representing the <topicMap> element.

If the <topicMap> element has an id attribute, a locator information item referring to the element is added to the [source locators] property of the topic map information item.

The [topics] and [associations] properties are initially empty.

3.2. The <topic> element

For each <topic> element in the XTM document a new topic information item is created and added to the context as the current topic information item.

A locator information item referring to the <topic> element is added to the set of [source locators] on the topic information item.

The children of the <topic> element are then processed, with the topic information item as the current topic information item in the context. Once all the children of the element have been processed the topic information item is merged into the topic map information set currently in the context according to the rules of section 5.1.

3.3. The <subjectIdentity> element

The <subjectIdentity> element itself has no effect, except to influence the interpretation of its child elements, as explained below.

If the <subjectIdentity> element has a <resourceRef> child, a locator information item is created for the URI in its 'xlink:href' attribute and set as the value of the current topic information item's [subject address] property.

For each <topicRef> child of the <subjectIdentity> element a locator information item is created for the URI in the child's 'xlink:href' attribute and added to the [source locators] property of the current topic information item.

For each <subjectIndicatorRef> child, a locator information item is created for the URI in its 'xlink:href' attribute and added to the [subject indicators] property of the current topic information item.

3.4. The <baseName> element

For each <baseName> element in the XTM document a new base name information item is created and added to the context as the current base name information item.

If the <baseName> element has an 'id' attribute, a locator information item referring to the element is created and added to the set of [source locators] on the base name information item.

If the <baseName> element has a <scope> child element, that element is processed according to the rules of section 3.14, with the new base name information item as the current information item.

The children of the <baseName> element are then processed, with the base name information item as the current base name information item in the context. Once all the children have been processed, the base name information item is added to the [base names] property of the current topic.

3.5. The <baseNameString> element

The content of the <baseNameString> element is set as the value of the [value] property of the current base name information item.

3.6. The <variant> element

Each <variant> element causes a variant information item to be created. If the stack of variant information items is empty, the [scope] property of the variant information item is set to the same value as that of the [scope] property of the current base name information item. If the stack is not empty, the [scope] property is set to the same value as that of the variant information item currently on the top of the stack. When the [scope] property has been initialized, the variant information item is added to the context at the top of the stack of variant information items.

If the <variant> element has an 'id' attribute, a locator information item referring to the <variant> element is created and added to the [source locators] property of the variant information item.

The children of the <variant> element are then processed. Once all the children have been processed the variant information item is added to the [variants] property of the current base name information item. The variant information item is then removed from the stack of variant information items.
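
The stack-based scope inheritance can be sketched as a small recursive function. The input format is an assumption of this example: each variant is a dictionary with an optional `"parameters"` list (the scope topics contributed by its <parameters> child, already resolved) and an optional `"variants"` list of nested variants. All names are hypothetical.

```python
def process_variant(element, base_name_scope, stack, variants):
    """Create a variant item for `element`, inheriting scope from the top of
    the stack, or from the base name if the stack is empty."""
    inherited = stack[-1]["scope"] if stack else base_name_scope
    item = {"scope": set(inherited)}       # copy, so the parent is not modified
    stack.append(item)                     # item is now the current variant

    # <parameters> adds topics to the scope of the current variant.
    item["scope"].update(element.get("parameters", []))
    # Nested <variant> children see this item on top of the stack.
    for child in element.get("variants", []):
        process_variant(child, base_name_scope, stack, variants)

    variants.append(item)                  # added to [variants] of the base name
    stack.pop()


stack, variants = [], []
outer = {"parameters": ["sort"], "variants": [{"parameters": ["display"]}]}
process_variant(outer, {"english"}, stack, variants)
scopes = [item["scope"] for item in variants]
```

The nested variant ends up with the scope of the base name, plus the parameters of its parent, plus its own, so the superset constraint on variant scopes holds by construction.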

3.7. The <parameters> element

The <parameters> element itself has no effect, except to influence the interpretation of its child elements, as explained below.

For each <topicRef> child of the <parameters> element the topic information item referred to by the element (see section 3.15) is added to the [scope] property of the current variant information item.

For each <subjectIndicatorRef> child of the <parameters> element the topic information item referred to by the element (see section 3.16) is added to the [scope] property of the current variant information item.

3.8. The <variantName> element

If the <variantName> element has a <resourceRef> child element, a locator information item is created from the URI in its 'xlink:href' attribute and set as the value of the [resource] property of the current variant information item.

If the <variantName> element has a <resourceData> child element, the content of that element is set as the value of the [value] property on the current variant information item.

3.9. The <occurrence> element

Each <occurrence> element causes an occurrence information item to be created.

If the <occurrence> element has an 'id' attribute, a locator information item referring to the element is created and added to the [source locators] property of the occurrence information item.

If the <occurrence> element has a <scope> child element, that element is processed according to the rules of section 3.14, with the new occurrence information item as the current information item.

If the <occurrence> element has an <instanceOf> child, a topic information item is added to the [class] property of the occurrence information item. The topic information item to be added is evaluated according to the rules of section 3.13.

If the <occurrence> element has a <resourceRef> child, a locator information item for the URI in that child's 'xlink:href' attribute is created and set as the value of the [resource] property of the occurrence information item.

If the <occurrence> element has a <resourceData> child, the content of that child is set as the value of the [value] property of the occurrence information item.

3.10. The <association> element

Each <association> element causes an association information item to be created.

If the <association> element has an 'id' attribute, a locator information item referring to the element is added to the [source locators] property of the association information item.

If the <association> element has a <scope> child element, that element is processed according to the rules of section 3.14, with the new association information item as the current information item.

If the <association> element has an <instanceOf> child, a topic information item is added to the [class] property of the association information item. The topic information item to be added is evaluated according to the rules of section 3.13.

The association information item is then set as the current association information item while its <member> children are processed. Once the children have been processed the association information item is added to the [associations] property of the topic map information item.

3.11. The <member> element

The effect of the <member> element is to set the current class of the context to null.

If the <member> element has a <roleSpec> child with a <topicRef> child element, the topic information item referred to by that element (see section 3.15) is set as the current class of the context.

If the <member> element has a <roleSpec> child with a <subjectIndicatorRef> child, the topic information item referred to by that element (see section 3.16) is set as the current class of the context.

For each <topicRef> child of the <member> element a new association role information item is created, with the topic information item referred to by that element (see section 3.15) as the value of its [role player] property. If there is a current class in the context, that topic information item is set as the value of the item's [class] property. The association role information item is added to the [roles] property of the topic information item referred to. The item is also added to the [roles] property of the current association information item.

For each <subjectIndicatorRef> child of the <member> element a new association role information item is created, with the topic information item referred to by that element (see section 3.16) as the value of its [role player] property. If there is a current class in the context, that topic information item is set as the value of the item's [class] property. The association role information item is added to the [roles] property of the topic information item referred to. The item is also added to the [roles] property of the current association information item.

For each <resourceRef> child of the <member> element a new association role information item is created, with the topic information item referred to by that element (see section 3.17) as the value of its [role player] property. If there is a current class in the context, that topic information item is set as the value of the item's [class] property. The association role information item is added to the [roles] property of the topic information item referred to. The item is also added to the [roles] property of the current association information item.
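
The handling of <member> can be sketched as below, assuming that the role-defining topic and the role-playing topics have already been resolved according to sections 3.15 to 3.17. The class and function names are hypothetical, and `None` represents the null value.

```python
class Topic:
    def __init__(self, name):
        self.name = name
        self.roles = []                    # [roles] of the topic


class Association:
    def __init__(self):
        self.roles = []                    # [roles] of the association


class AssociationRole:
    def __init__(self, role_class, role_player):
        self.role_class = role_class       # [class], may be None
        self.role_player = role_player     # [role player]


def process_member(association, role_spec_topic, player_topics):
    """Create one association role per role-playing topic of a <member>.

    role_spec_topic is the topic resolved from <roleSpec>, or None if the
    member has no <roleSpec> child.
    """
    current_class = None                   # reset for every <member>
    if role_spec_topic is not None:
        current_class = role_spec_topic
    created = []
    for player in player_topics:
        role = AssociationRole(current_class, player)
        player.roles.append(role)          # keeps the topic side consistent
        association.roles.append(role)
        created.append(role)
    return created
```

Registering each role on both the topic and the association is what satisfies the requirement in section 2.7 that a role player has the role in its own [roles] property.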

3.12. The <mergeMap> element

For each <mergeMap> element the URI in its 'xlink:href' attribute is resolved to an absolute URI relative to the document it appears in. The resource it refers to is processed into a set of topic map information items according to the rules of section 3. [Remark: What if errors in proc?]

For each <topicRef> child of the <mergeMap> element the topic information item it refers to (see section 3.15) is added to the [scope] property of every information item that has such a property in the created topic map information set.

If the URI in the 'xlink:href' of the <mergeMap> element has no fragment identifier all the topic map information items created by processing the resource are merged with the current topic map information item. If the URI has a fragment identifier the topic map information item with a locator information item equivalent to this URI in its [source locators] property is merged with the current topic map information item. [Remark: Error handling!]

3.13. The <instanceOf> element

The <instanceOf> element is used to add a topic information item to the [class] property of several information item types. The definitions of the processing of the parent items of <instanceOf> reference this section to make it clear how the <instanceOf> element is to be interpreted when it appears in each particular context.

If the <instanceOf> element has a <topicRef> child, the topic information item referred to (see section 3.15) is added to the [class] or [classes] property.

If the <instanceOf> element has a <subjectIndicatorRef> child, the topic information item referred to (see section 3.16) is added to the [class] or [classes] property.

3.14. The <scope> element

The <scope> element itself has no effect, except to influence the interpretation of its child elements, as explained below.

For each <topicRef> child of the <scope> element the topic information item referred to by it (see section 3.15) is added to the [scope] property of the current information item.

For each <subjectIndicatorRef> child of the <scope> element the topic information item referred to by it (see section 3.16) is added to the [scope] property of the current information item.

For each <resourceRef> child of the <scope> element the topic information item referred to by it (see section 3.17) is added to the [scope] property of the current information item.

3.15. The <topicRef> element

The <topicRef> element is used in many different contexts throughout the XTM DTD. This section is referred to from the definitions of how to process some of these contexts, in order to keep the size of this specification down. In some cases other rules apply, and in those cases this section is not referenced.

This section only describes how to find the topic information item referred to by the <topicRef> element. How that item is used is defined in each context this section is referred to from.

To find the topic information item, a locator information item representing the URI in the 'xlink:href' attribute must first be created. If the URI has no fragment identifier an error is signaled, and no topic information item is referred to. Once the URI has been evaluated and verified as correct, the following procedure is followed:

  1. If there already exists a topic information item with this locator information item in its [source locators] property, that topic information item is the one referred to.

  2. If the URI (minus its fragment part) is equal to the locator information item in the [base locator] property of the topic map information item a new topic information item is created, with the locator information item in its [source locators] property. This topic information item is the one referred to.

  3. If the URI (minus its fragment part) is not equal to the locator information item in the [base locator] property of the topic map information item, the resource it refers to is loaded and processed into a set of topic map information items according to the rules of section 3. [Remark: What if errors in proc? Also avoid loading TMs more than once.]

    If there is a topic information item in one of the created topic map information sets with the locator information item in its [source locators] property, the topic map information set it belongs to is merged with the current one according to the rules of section 5.3. This will then be the topic information item referred to.

    If no such topic information item exists one is created, and the locator information item is added to its [source locators] property. This will then be the topic information item referred to.
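
The procedure above can be sketched as follows. This is a simplified illustration under several assumptions: locators are plain absolute URI strings, topics are dictionaries with a `"source_locators"` set, the hypothetical callback `load_external` returns a single `TopicMap` (or `None`) rather than a set of topic map information items, and merging is reduced to copying the external topics, since the merging rules themselves are not shown here.

```python
from urllib.parse import urldefrag


class TopicMap:
    def __init__(self, base_locator):
        self.base_locator = base_locator   # absolute URI string
        self.topics = []


def find_by_source_locator(topic_map, locator):
    for topic in topic_map.topics:
        if locator in topic["source_locators"]:
            return topic
    return None


def resolve_topic_ref(topic_map, absolute_uri, load_external):
    document_uri, fragment = urldefrag(absolute_uri)
    if not fragment:
        raise ValueError("a <topicRef> URI must have a fragment identifier")

    # Step 1: an existing topic with this source locator.
    topic = find_by_source_locator(topic_map, absolute_uri)
    if topic is not None:
        return topic

    # Step 3: the reference points into another document.
    if document_uri != topic_map.base_locator:
        external = load_external(document_uri)
        if external is not None:
            topic = find_by_source_locator(external, absolute_uri)
            if topic is not None:
                topic_map.topics.extend(external.topics)  # stand-in for merging
                return topic

    # Step 2, and the fallback of step 3: create a new topic.
    topic = {"source_locators": {absolute_uri}}
    topic_map.topics.append(topic)
    return topic
```

Creating a topic for a reference that cannot yet be resolved allows forward references within a document: a later <topic> element with the same source locator can be merged with the placeholder.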

3.16. The <subjectIndicatorRef> element

The <subjectIndicatorRef> element is used in many different contexts throughout the XTM DTD. This section is referred to from the definitions of how to process some of these contexts, in order to keep the size of this specification down. In some cases other rules apply, and in those cases this section is not referenced.

This section only describes how to find the topic information item referred to by the <subjectIndicatorRef> element. How that item is used is defined in each context this section is referred to from.

To find the topic information item, a locator information item representing the URI in the 'xlink:href' attribute must first be created. Once the locator information item has been created the following procedure is followed:

  1. If there already exists a topic information item with this locator information item in its [subject indicators] property, that topic information item is the one referred to.

  2. If there already exists a topic information item with this locator information item in its [source locators] property, that topic information item is the one referred to.

  3. If none of the above conditions are true, a new topic information item is created, and this locator information item is added to its [subject indicators] property.
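
The lookup procedure above can be sketched as follows. This is only an illustration under stated assumptions, not part of the model: the Topic class and the function name are hypothetical, locators are represented as plain strings, and the sketch assumes that a newly created topic is also added to the collection of known topics, which the procedure itself does not state.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Topic:
    """Hypothetical stand-in for a topic information item."""
    subject_indicators: set = field(default_factory=set)
    source_locators: set = field(default_factory=set)


def resolve_subject_indicator_ref(topics, locator):
    """Return the topic referred to by a <subjectIndicatorRef> locator."""
    # Step 1: a topic that already has the locator as a subject indicator.
    for topic in topics:
        if locator in topic.subject_indicators:
            return topic
    # Step 2: a topic that already has the locator as a source locator.
    for topic in topics:
        if locator in topic.source_locators:
            return topic
    # Step 3: otherwise create a new topic carrying the subject indicator.
    # ASSUMPTION: the new topic is registered in `topics` right away.
    topic = Topic(subject_indicators={locator})
    topics.append(topic)
    return topic
```

Checking all topics for step 1 before trying step 2 preserves the order of the rules: a match on [subject indicators] takes precedence over a match on [source locators].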

3.17. The <resourceRef> element

The <resourceRef> element is used in many different contexts throughout the XTM DTD. To keep this specification short, the processing rules for several of those contexts refer to this section instead of repeating it. Where other rules apply, this section is not referenced.

This section only describes how to find the topic information item referred to by the <resourceRef> element. How that item is used is defined by each context that refers to this section.

To find the topic information item, a locator information item representing the URI in the 'xlink:href' attribute must first be created. Once the locator information item has been created the following procedure is followed:

  1. If there already exists a topic information item with this locator information item in its [subject address] property, that topic information item is the one referred to.

  2. If there already exists a topic information item with this locator information item in its [source locators] property, that topic information item is the one referred to.

  3. If none of the above conditions are true, a new topic information item is created, and this locator information item is added to its [subject address] property.

4. ISO 13250 processing model

[Remark: This should be done the same way as the XTM processing model, but much shorter, since ISO 13250 is much simpler. HOWEVER, supporting HyTime correctly may make this difficult.]

5. Merging topic maps

This section describes how information items are merged. Merging is a basic operation of the topic map processing model. It is applied during processing of serialization syntaxes and during modification of topic map structures, and it may also be a basic end-user operation in an interactive environment.

5.1. Merging a topic into a topic map

A new topic information item is merged into a topic map information item by following these steps:

5.2. Merging a topic with another topic

A topic information item (A) is merged with another topic information item (B), by following the steps given below. If both A and B have non-null and different values for their [subject address] properties the merge operation fails with an error.

  1. Every occurrence of A in the [scope], [classes], [class], or [role player] property of an information item is replaced by B.

  2. The value of B's [source locators] property is set to the union of A's and B's values.

  3. The value of B's [classes] property is set to the union of A's and B's values.

  4. The value of B's [base names] property is set to the union of A's and B's values. Note that duplicate base names are merged according to the rules of section 5.4.

  5. The value of B's [occurrences] property is set to the union of A's and B's values.

  6. The value of B's [roles] property is set to the union of A's and B's values.

  7. The value of B's [subject indicators] property is set to the union of A's and B's values.

  8. If B's [subject address] property is null, while A's is not, B's [subject address] property is set to the value of A's.

  9. A is now removed from the [topics] property of the topic map information item.

  10. If there now exist any topics in the topic map that are in violation of the uniqueness constraints, those topics are merged according to the rules of this section.
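
The steps above can be sketched as follows. This is a simplified illustration, not a complete implementation: the Topic class, MergeError, and merge_topics are hypothetical names, property values are modelled as plain Python sets, and step 1 is only carried out for the [classes] property of topics (replacement in [scope], [class], and [role player] is omitted). Merging of duplicate base names (step 4) and the re-check of the uniqueness constraints (step 10) are also left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Topic:
    """Hypothetical stand-in for a topic information item."""
    source_locators: set = field(default_factory=set)
    classes: set = field(default_factory=set)
    base_names: set = field(default_factory=set)
    occurrences: set = field(default_factory=set)
    roles: set = field(default_factory=set)
    subject_indicators: set = field(default_factory=set)
    subject_address: Optional[str] = None


class MergeError(Exception):
    """Raised when two topics have conflicting subject addresses."""


def merge_topics(topics, a, b):
    """Merge topic a into topic b; `topics` models the [topics] property."""
    # Precondition: non-null, different subject addresses cannot be merged.
    if (a.subject_address is not None and b.subject_address is not None
            and a.subject_address != b.subject_address):
        raise MergeError("different [subject address] values")

    # Step 1 (partial): replace a by b in the [classes] of every topic.
    for topic in topics:
        if a in topic.classes:
            topic.classes.discard(a)
            topic.classes.add(b)

    # Steps 2-7: each set-valued property of b becomes the union.
    for prop in ("source_locators", "classes", "base_names",
                 "occurrences", "roles", "subject_indicators"):
        getattr(b, prop).update(getattr(a, prop))

    # Step 8: b takes over a's subject address if it has none of its own.
    if b.subject_address is None:
        b.subject_address = a.subject_address

    # Step 9: a is removed from the topic map.
    topics.discard(a)
    return b
```

The error check is performed before any property is changed, so a failed merge leaves both topics untouched.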

5.3. Merging a topic map with another topic map

A topic map information item (A) is merged into another (B), by following these steps:

  1. Copy all the topic information items in A's [topics] property into B's [topics] property.

  2. Copy all the association information items in A's [associations] property into B's [associations] property.

  3. If any topic information items violate the uniqueness constraints, they must be merged according to the rules of section 5.2. Merging is applied repeatedly until no topic information items violate the uniqueness constraints.
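
The repeated merging in step 3 amounts to a loop that runs until no violation remains. The sketch below illustrates this under explicit assumptions: all names are hypothetical, the uniqueness constraints are approximated by "no two topics share a locator in [subject indicators] or [source locators]", and the topic merge is reduced to taking the union of those two properties rather than the full procedure of section 5.2.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Topic:
    """Hypothetical stand-in for a topic information item."""
    subject_indicators: set = field(default_factory=set)
    source_locators: set = field(default_factory=set)


@dataclass(eq=False)
class TopicMap:
    """Hypothetical stand-in for a topic map information item."""
    topics: set = field(default_factory=set)
    associations: set = field(default_factory=set)


def find_violation(topics):
    """Return a pair of distinct topics sharing a locator, or None.

    ASSUMPTION: a shared locator stands in for the uniqueness
    constraints, which are defined elsewhere in the model.
    """
    seen = {}
    for topic in topics:
        for locator in topic.subject_indicators | topic.source_locators:
            other = seen.setdefault(locator, topic)
            if other is not topic:
                return topic, other
    return None


def merge_topic_maps(a, b):
    """Merge topic map a into topic map b and return b."""
    # Steps 1 and 2: copy topics and associations from a into b.
    b.topics |= a.topics
    b.associations |= a.associations
    # Step 3: merge until no pair of topics violates the constraints.
    pair = find_violation(b.topics)
    while pair is not None:
        duplicate, keeper = pair
        keeper.subject_indicators |= duplicate.subject_indicators
        keeper.source_locators |= duplicate.source_locators
        b.topics.discard(duplicate)
        pair = find_violation(b.topics)
    return b
```

Each pass through the loop removes one topic, so the loop always terminates.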

5.4. Merging base names

A base name information item (A) is merged with another base name information item (B) by following these steps:

  1. B's [source locators] property is set to the union of the values of A and B.

  2. B's [variants] property is set to the union of the values of A and B.
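
As a minimal sketch, the two steps above reduce to two set unions. The BaseName class and its field names are hypothetical, and locators and variants are represented as plain strings. The steps do not say what happens to A after the merge, so the sketch leaves it unchanged.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class BaseName:
    """Hypothetical stand-in for a base name information item."""
    value: str
    source_locators: set = field(default_factory=set)
    variants: set = field(default_factory=set)


def merge_base_names(a, b):
    """Merge base name a into base name b and return b."""
    # Step 1: union of the [source locators] values.
    b.source_locators = a.source_locators | b.source_locators
    # Step 2: union of the [variants] values.
    b.variants = a.variants | b.variants
    return b
```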

6. Conformance

An XTM processor is conformant with this specification provided that it:

  1. Makes available to applications all the information described in section 2. More information may be provided, but nothing may be left out. The information may also be provided in different forms, as long as they are equivalent to those described here.

  2. Can build structures representing topic map information sets from XTM documents in a way that produces results identical to those described in section 3. Note that if the XTM document being processed does not conform to the XTM DTD, processors are not required to produce the same result as the algorithm described here; in fact, this specification does not define the result at all in this case.

  3. Builds structures from ISO 13250 topic map documents, if it supports doing so, according to the rules in section 4.

  4. Merges topic map structures, if it supports doing so, in a way that produces results identical to those obtained by following the rules of section 5.