Title: | A Topic Map Data Model, An infoset-based proposal |
Source: | Lars Marius Garshol |
Project: | TMQL |
Project editor: | Lars Marius Garshol, Hans Holger Rath |
Status: | Personal contribution |
Action: | |
Date: | 18 June 2001 |
Summary: | |
Distribution: | SC34 and Liaisons |
Refer to: | |
Supersedes: | |
Reply to: | Dr. James David Mason (ISO/IEC JTC1/SC34 Chairman) Y-12 National Security Complex Information Technology Services Bldg. 9113 M.S. 8208 Oak Ridge, TN 37831-8208 U.S.A. Telephone: +1 865 574-6973 Facsimile: +1 865 574-1896 E-mail: mailto:[email protected] http://www.y12.doe.gov/sgml/sc34/sc34oldhome.htm Ms. Sara Hafele, ISO/IEC JTC 1/SC 34 Secretariat American National Standards Institute 11 West 42nd Street New York, NY 10036 Tel: +1 212 642 4976 Fax: +1 212 840 2298 E-mail: [email protected] |
This document defines an abstract model for topic maps which makes explicit the implicit data models of ISO 13250 and XTM 1.0. It also defines a processing model for XTM 1.0 based on the data model.
The model is intended to present one possible approach to specifying a data and processing model for topic maps, believed by the author to be preferable to other proposed approaches. It is hoped that this model may represent a first step on the way to a complete model for topic maps. Such a model would serve many purposes:
Enable interoperability between topic map processors by defining precisely what topic map processors are required to do.
Enable ancillary standards to be built on the topic map standard in a precise and controlled manner.
Make it easier for newcomers to topic maps to understand what their abstract structure is and how they work.
This document is not complete; it is an early draft intended to show a possible approach to defining the topic map model. It has no official standing whatsoever. It is, as stated above, just a draft proposal.
In particular, more work is required on the following:
The requirements for error handling.
The description of URI operations and equality.
How the model describes the semantics of the topic map information must be carefully considered.
The interaction of reification with the model. To what extent reification is to be described also needs to be considered.
Whether the sorting of strings should be defined in any way.
Alignment with the XTM 1.0 specification and topicmaps.net's processing model.
How locators are handled, in the light of ISO 13250's support for and basis in HyTime.
The processing of ISO 13250 topic map documents.
Steve Pepper provided many useful comments on this document. It is also informed by previous work done by Geir Ove Grønmo and Kal Ahmed.
The copyright in this document is owned by Ontopia AS. It may be redistributed and copied without restriction provided that it remain unchanged. Derivative works may be created freely.
The abstract model for topic maps here presented is inspired by the XML Infoset, and uses a similar system of information items with named and typed properties. This approach has been chosen in order to avoid the problem inherent in any formalism: that the formalism must first be learned.
Some notes on the data types used in this model:
Resources, being external to the topic map itself, are not described in this model. Instead, they are represented by their locators, which are the only aspects of them visible in this model.
Resources are considered equal if their locators are equal.
In this data model strings are sequences of abstract Unicode characters. Processors may, at user option, normalize the strings, but this is not required by this specification.
Strings are considered equal if they consist of the exact same sequence of abstract Unicode characters. This implies that all comparisons are case-sensitive.
Sets are collections of unordered values that contain no values considered to be equal to each other. This implies that attempts to add values already in a set will be ignored.
Two sets are considered equal unless there exists a value in one set for which no equal value can be found in the other.
The null value is used to indicate that properties have no value. It is distinct from the empty string, the empty resource, and the empty set.
Null is only considered equal to itself.
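The equality rules for these data types can be illustrated with a short Python sketch. Python's built-in `str`, `set`/`frozenset`, and `None` are used as stand-ins for the model's strings, sets, and null value; that mapping is an assumption made for illustration and is not part of the model.

```python
import unicodedata

# Strings: equal only if they are the exact same character sequence, so
# comparison is case-sensitive and no normalization is implied.
assert "Puccini" != "puccini"
precomposed = "\u00e9"   # e-acute as a single character
decomposed = "e\u0301"   # "e" followed by a combining acute accent
assert precomposed != decomposed
# A processor may normalize at user option; NFC is one possible choice.
assert unicodedata.normalize("NFC", decomposed) == precomposed

# Sets: unordered and free of duplicates; adding an existing value is ignored.
scope = {"norwegian"}
scope.add("norwegian")
assert len(scope) == 1
assert frozenset(["a", "b"]) == frozenset(["b", "a"])

# Null: distinct from the empty string and the empty set.
assert None != ""
assert None != frozenset()
```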
A topic map information set has exactly one topic map information item, which is the hub item of the information set, from which all other parts can be reached. This item represents the entire topic map.
The topic map information item has the following properties:
A topic information item represents a single topic in a topic map. The uniqueness constraint rules in this model are intended to ensure that topic information items whose subjects can automatically be shown to be the same are merged into a single topic information item automatically. The rules will not catch all cases of subject duplication, however, and so human intervention is expected to be necessary to ensure consistency in most cases.
Topic information items have the following properties:
This is the set of association role information items representing this topic information item's involvement in associations.
The topic information item must be set as the value of the [role player] property on each association role information item in this set.
This is a set of locator information items referring to the subject indicating resources of the topic information item. The set may be empty.
If any of the locator information items equal one of the locator information items found in the [source locators] property of another information item, the topic information item is considered to reify that information item.
Topic information items are not considered equal unless their property values would have violated the uniqueness constraints had they been part of the same topic map information set.
A base name information item represents a topic name.
Base name information items have the following properties:
Base name information items are considered equal if the values of their [value] and [scope] properties are equal.
A variant information item represents a variant of a base name as it appears in the set of variant names of that base name.
The properties of variant information items are:
This is the set of topic information items that represent the scope of this variant name. The interpretation of this value is dependent on the application.
The value of the [scope] property of a variant name must at all times be a superset of the value of the [scope] property of its parent base name.
Variant information items are considered equal if the values of their [value], [resource], and [scope] properties are equal.
An occurrence information item represents the occurrence relationship between a topic and a resource that is an occurrence of information about the subject of that topic.
The properties of occurrence information items are:
Occurrence information items are considered equal if the values of their [value], [resource], [scope], and [class] properties are equal.
Association information items represent associations between topic information items. They have the following properties:
Association information items are considered equal if the values of their [scope], [class], and [roles] properties are equal.
Association role information items represent the involvement of a participating topic in the association it participates in. The association role information item is the connection point between the topic and the association.
The topic information item that participates in the association; the topic that plays the role defined by the [class] property. The property may be null.
The topic information item referenced by this property must have the association role information item in its [roles] property.
Association role information items are considered equal if the values of their [class] and [role player] properties are equal.
Locator information items represent locators referring to resources, and are the binding point between the topic map model and the world of resources. Locators can use any syntax as long as they abide by the rules of this section.
Locator information items have the following properties:
This value is a string that identifies the notation used in the [address] property. If the value is "URI" the [address] property must hold a valid URI, as defined by RFC 2396. If the value is not "URI" the first two characters of the [notation] property must be "X-".
The property may not be null, and it may not be the empty string.
This value is the address of the resource referred to by this locator information item, represented as a string. The value must be absolute, that is, its meaning must not depend on the context in which it is resolved.
The property may not be null, and it may not be the empty string.
Locator information items are equal if they have the same values in their [notation] and [address] properties.
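As a minimal sketch of these rules, a locator information item could be modelled as an immutable Python value. The class name `Locator` and the scheme-based absoluteness check are assumptions made for illustration; in particular, the check is far weaker than full RFC 2396 validation.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Locator:
    """Hypothetical representation of a locator information item."""
    notation: str
    address: str

    def __post_init__(self):
        # Neither property may be null or the empty string.
        if not self.notation or not self.address:
            raise ValueError("[notation] and [address] must be non-empty")
        # Any notation other than "URI" must begin with "X-".
        if self.notation != "URI" and not self.notation.startswith("X-"):
            raise ValueError("non-URI notations must begin with 'X-'")
        # Rough absoluteness check (a simplification): a URI scheme is present.
        if self.notation == "URI" and not urlsplit(self.address).scheme:
            raise ValueError("URI address must be absolute")
```

Because the dataclass is frozen, equality and hashing are derived from the [notation] and [address] values, so two locators with the same values compare equal and collapse to one entry in a set.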
The information items in a topic map information set are subject to the uniqueness constraints listed below. How violations of these constraints are handled is left to the implementation. Possible solutions are to merge the offending items, in some cases, or to signal an error and reject the operation that would have caused the violation.
No two information items within the same topic map information set may contain the same locator information item in their [source locators] property.
No two topic information items may have the same locator information item in their [subject indicators] property.
No two topic information items may have the same locator information item as the value of their [subject address] property.
A topic information item may not contain in its [source locators] property the same locator information item that can be found in the [subject indicators] property of another.
A topic information item may not contain in its [source locators] property the same locator information item that can be found in the [subject address] property of another.
Two topic information items may not contain equal base name information items in their [base names] property. (This constraint is also known as the topic naming constraint.)
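The constraints that apply to pairs of topic information items can be sketched as a single predicate. The `Topic` class below is a hypothetical, simplified stand-in: locators are plain strings, and base names are `(value, scope)` pairs, following the rule that base names are equal when their [value] and [scope] are equal.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Topic:
    """Hypothetical, simplified topic information item."""
    source_locators: set = field(default_factory=set)
    subject_indicators: set = field(default_factory=set)
    subject_address: Optional[str] = None
    base_names: set = field(default_factory=set)  # (value, frozenset(scope))


def violates_constraints(a: Topic, b: Topic) -> bool:
    """Return True if a and b cannot coexist as separate topics."""
    return bool(
        a.source_locators & b.source_locators
        or a.subject_indicators & b.subject_indicators
        or (a.subject_address is not None
            and a.subject_address == b.subject_address)
        # Source locators may not coincide with another topic's
        # subject indicators or subject address (checked both ways).
        or a.source_locators & b.subject_indicators
        or b.source_locators & a.subject_indicators
        or a.subject_address in b.source_locators
        or b.subject_address in a.source_locators
        # Topic naming constraint.
        or a.base_names & b.base_names
    )
```

An implementation that resolves violations by merging would call such a predicate whenever a topic is added or modified; one that rejects the operation would raise an error instead.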
This section describes how to build a topic map information set from an XTM 1.0 document. The procedure here described always produces a topic map information set that is fully conforming to the constraints in the data model. The procedure is described in terms of what impact each XML element in the XTM document has on a processing context.
The initial processing context is empty.
XTM processors operate on representations of the XML Information Set. How processors create such representations from XML source documents is beyond the scope of this specification. The application of transformations such as XSLT stylesheets, architectural forms, and SAX parser filters is expected, but support for this is not required.
Since XTM processors are based on the XML Information Set, they operate on a namespace view of XML documents. This means that the element names given in this specification will only match those in the XML Information Set provided that the value of the [namespace name] property of the element information item is 'http://www.topicmaps.org/xtm/1.0/'.
The attribute name 'xlink:href' used in this specification is a shorthand for the attribute information items having 'http://www.w3.org/1999/xlink' as the value of their [namespace name] property, and 'href' as the value of their [local name] property.
The locator information item of an XML element that has an id attribute value is created by setting the [notation] property to "URI" and the [address] property to the [base uri] of the XML document information item in which the element appears, followed by "#" and the id attribute value.
URIs contained in attribute values are turned into locator information items by resolving them relative to the base URI of the XML document they are contained in. A new locator information item is then created with "URI" in its [notation] property, and the absolute URI in its [address] property.
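The two ways of creating locator information items described above can be sketched with the Python standard library. The function names are hypothetical, locators are modelled as `(notation, address)` tuples, and `urljoin` is used as an approximation of relative URI resolution.

```python
from urllib.parse import urljoin


def locator_for_element(base_uri: str, element_id: str) -> tuple:
    """Locator for an element with an id: base URI, then '#', then the id."""
    return ("URI", base_uri + "#" + element_id)


def locator_for_reference(base_uri: str, href: str) -> tuple:
    """Locator for a URI in an attribute value, resolved against the base URI."""
    return ("URI", urljoin(base_uri, href))
```

With a document at the assumed address `http://example.org/maps/opera.xtm`, a `<topic id="puccini">` element and a reference `xlink:href="#puccini"` in the same document yield equal locators, which is what allows references to be matched against [source locators].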
Elements of types not mentioned in this specification are ignored during processing, though processors are allowed to warn about them.
For each <topicMap> element in the XTM document a new topic map information item is created. This information item is added to the context and remains there as the current topic map information item as long as the children of the <topicMap> element are processed.
A locator information item is created and set as the [base uri] value of the created topic map information item. Its [notation] property is set to "URI", and its [address] property to the value found in the [base uri] property of the element information item representing the <topicMap> element.
If the <topicMap> element has an id attribute, a locator information item referring to the element is added to the [source locators] property of the topic map information item.
The [topics] and [associations] properties are initially empty.
For each <topic> element in the XTM document a new topic information item is created and added to the context as the current topic information item.
A locator information item referring to the <topic> element is added to the set of [source locators] on the topic information item.
The children of the <topic> element are then processed, with the topic information item as the current topic information item in the context. Once all the children of the element have been processed the topic information item is merged into the topic map information set currently in the context according to the rules of section 5.1.
The <subjectIdentity> element itself has no effect, except to influence the interpretation of its child elements, as explained below.
If the <subjectIdentity> element has a <resourceRef> child, a locator information item is created for the URI in its 'xlink:href' attribute and set as the value of the current topic information item's [subject address] property.
For each <topicRef> child of the <subjectIdentity> element, a locator information item is created for the URI in the child's 'xlink:href' attribute and added to the [source locators] property of the current topic information item.
For each <subjectIndicatorRef> child, a locator information item is created for the URI in its 'xlink:href' attribute and added to the [subject indicators] property of the current topic information item.
For each <baseName> element in the XTM document a new base name information item is created and added to the context as the current base name information item.
If the <baseName> element has an 'id' attribute, a locator information item referring to the element is created and added to the set of [source locators] on the base name information item.
If the <baseName> element has a <scope> child element, that element is processed according to the rules of section 3.14, with the new base name information item as the current information item.
The children of the <baseName> element are then processed, with the base name information item as the current base name information item in the context. Once all the children have been processed, the base name information item is added to the [base names] property of the current topic.
The content of the <baseNameString> element is set as the value of the [value] property of the current base name information item.
Each <variant> element causes a variant information item to be created. If the stack of variant information items is empty, the [scope] property of the variant information item is set to the same value as that of the [scope] property of the current base name information item. If the stack is not empty, the [scope] property is set to the same value as that of the variant information item currently on the top of the stack. When the [scope] property has been initialized, the variant information item is added to the context at the top of the stack of variant information items.
If the <variant> element has an 'id' attribute, a locator information item referring to the <variant> element is created and added to the [source locators] property of the variant information item.
The children of the <variant> element are then processed. Once all the children have been processed the variant information item is added to the [variants] property of the current base name information item. The variant information item is then removed from the stack of variant information items.
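The stack-based scope inheritance for nested <variant> elements can be sketched as follows. This is a minimal illustration under stated assumptions: each <variant> element is modelled as a plain dict with optional `"parameters"` (topics contributed by its <parameters> child) and `"variants"` (nested <variant> children), and topics are represented by strings. None of these names come from XTM itself.

```python
def process_variant(element: dict, base_name_scope: frozenset,
                    stack: list, variants: list) -> None:
    """Process one <variant> element, modelled here as a plain dict."""
    # Initial scope: copied from the variant on top of the stack if there
    # is one, otherwise from the current base name.
    inherited = stack[-1]["scope"] if stack else base_name_scope
    item = {"scope": set(inherited)}
    stack.append(item)

    # Children: <parameters> adds topics to the scope; nested <variant>
    # elements are processed with this item on top of the stack.
    item["scope"].update(element.get("parameters", ()))
    for child in element.get("variants", ()):
        process_variant(child, base_name_scope, stack, variants)

    # Once the children are done: add to [variants] and pop the stack.
    variants.append(item)
    stack.pop()
```

Because each variant starts from a copy of its parent's scope and only ever adds to it, every resulting scope is a superset of the base name's scope, as the data model requires.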
The <parameters> element itself has no effect, except to influence the interpretation of its child elements, as explained below.
For each <topicRef> child of the <parameters> element the topic information item referred to by the element (see section 3.15) is added to the [scope] property of the current variant information item.
For each <subjectIndicatorRef> child of the <parameters> element the topic information item referred to by the element (see section 3.16) is added to the [scope] property of the current variant information item.
If the <variantName> element has a <resourceRef> child element, a locator information item is created from the URI in its 'xlink:href' attribute and set as the value of the [resource] property of the current variant information item.
If the <variantName> element has a <resourceData> child element, the content of that element is set as the value of the [value] property on the current variant information item.
Each <occurrence> element causes an occurrence information item to be created.
If the <occurrence> element has an 'id' attribute, a locator information item referring to the element is created and added to the [source locators] property of the occurrence information item.
If the <occurrence> element has a <scope> child element, that element is processed according to the rules of section 3.14, with the new occurrence information item as the current information item.
If the <occurrence> element has an <instanceOf> child, a topic information item is added to the [class] property of the occurrence information item. The topic information item to be added is evaluated according to the rules of section 3.13.
If the <occurrence> element has a <resourceRef> child, a locator information item for the URI in that child's 'xlink:href' attribute is created and set as the value of the [resource] property of the occurrence information item.
If the <occurrence> element has a <resourceData> child, the content of that child is set as the value of the [value] property of the occurrence information item.
Each <association> element causes an association information item to be created.
If the <association> element has an 'id' attribute, a locator information item referring to the element is added to the [source locators] property of the association information item.
If the <association> element has a <scope> child element, that element is processed according to the rules of section 3.14, with the new association information item as the current information item.
If the <association> element has an <instanceOf> child, a topic information item is added to the [class] property of the association information item. The topic information item to be added is evaluated according to the rules of section 3.13.
The association information item is then set as the current association information item while its <member> children are processed. Once the children have been processed the association information item is added to the [associations] property of the topic map information item.
The effect of the <member> element is to set the current class of the context to null.
If the <member> element has a <roleSpec> child with a <topicRef> child element, the topic information item referred to by that element (see section 3.15) is set as the current class of the context.
If the <member> element has a <roleSpec> child with a <subjectIndicatorRef> child, the topic information item referred to by that element (see section 3.16) is set as the current class of the context.
For each <topicRef> child of the <member> element a new association role information item is created, with the topic information item referred to by that element (see section 3.15) as the value of its [role player] property. If there is a current class in the context, that topic information item is set as the value of the item's [class] property. The association role information item is added to the [roles] property of the topic information item referred to. The item is also added to the [roles] property of the current association information item.
For each <subjectIndicatorRef> child of the <member> element a new association role information item is created, with the topic information item referred to by that element (see section 3.16) as the value of its [role player] property. If there is a current class in the context, that topic information item is set as the value of the item's [class] property. The association role information item is added to the [roles] property of the topic information item referred to. The item is also added to the [roles] property of the current association information item.
For each <resourceRef> child of the <member> element a new association role information item is created, with the topic information item referred to by that element (see section 3.17) as the value of its [role player] property. If there is a current class in the context, that topic information item is set as the value of the item's [class] property. The association role information item is added to the [roles] property of the topic information item referred to. The item is also added to the [roles] property of the current association information item.
For each <mergeMap> element the URI in its 'xlink:href' attribute is resolved to an absolute URI relative to the document it appears in. The resource it refers to is processed into a set of topic map information items according to the rules of section 3. [Remark: What if errors in proc?]
For each <topicRef> child of the <mergeMap> element the topic information item it refers to (see section 3.15) is added to the [scope] property of every information item that has such a property in the created topic map information set.
If the URI in the 'xlink:href' of the <mergeMap> element has no fragment identifier all the topic map information items created by processing the resource are merged with the current topic map information item. If the URI has a fragment identifier the topic map information item with a locator information item equivalent to this URI in its [source locators] property is merged with the current topic map information item. [Remark: Error handling!]
The <instanceOf> element is used to add a topic information item to the [class] property of several information item types. The definitions of the processing of the parent items of <instanceOf> reference this section to make it clear how the <instanceOf> element is to be interpreted when it appears in each particular context.
If the <instanceOf> element has a <topicRef> child, the topic information item referred to (see section 3.15) is added to the [class] or [classes] property.
If the <instanceOf> element has a <subjectIndicatorRef> child, the topic information item referred to (see section 3.16) is added to the [class] or [classes] property.
The <scope> element itself has no effect, except to influence the interpretation of its child elements, as explained below.
For each <topicRef> child of the <scope> element the topic information item referred to by it (see section 3.15) is added to the [scope] property of the current information item.
For each <subjectIndicatorRef> child of the <scope> element the topic information item referred to by it (see section 3.16) is added to the [scope] property of the current information item.
For each <resourceRef> child of the <scope> element the topic information item referred to by it (see section 3.17) is added to the [scope] property of the current information item.
The <topicRef> element is used in many different contexts throughout the XTM DTD. This section is referred to from the definitions of how to process some of these contexts, in order to keep the size of this specification down. In some cases other rules apply, and in those cases this section is not referenced.
This section only describes how to find the topic information item referred to by the <topicRef> element. How that item is used is defined in each context this section is referred to from.
To find the topic information item, a locator information item representing the URI in the 'xlink:href' attribute must first be created. If the URI has no fragment identifier an error is signaled, and no topic information item is referred to. Once the URI has been evaluated and verified as correct, the following procedure is followed:
If there already exists a topic information item with this locator information item in its [source locators] property, that topic information item is the one referred to.
If the URI (minus its fragment part) is equal to the locator information item in the [base locator] property of the topic map information item a new topic information item is created, with the locator information item in its [source locators] property. This topic information item is the one referred to.
If the URI (minus its fragment part) is not equal to the locator information item in the [base locator] property of the topic map information item, the resource it refers to is loaded and processed into a set of topic map information items according to the rules of section 3. [Remark: What if errors in proc? Also avoid loading TMs more than once.].
If there is a topic information item in one of the created topic map information sets with the locator information item in its [source locators] property, the topic map information set it belongs to is merged with the current one according to the rules of section 5.3. This will then be the topic information item referred to.
If no such topic information item exists one is created, and the locator information item is added to its [source locators] property. This will then be the topic information item referred to.
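The <topicRef> lookup procedure can be sketched as below. This is a simplified illustration, not a conforming implementation: the topic map is a dict with assumed keys `"base_locator"` and `"topics"`, locators are absolute URI strings, `load_external` is a hypothetical callback that returns the topics of another document, and merging is approximated by simply copying those topics.

```python
from urllib.parse import urldefrag


def resolve_topic_ref(topic_map: dict, href: str, load_external) -> dict:
    """Find (or create) the topic referred to by an absolute topicRef URI."""
    document_uri, fragment = urldefrag(href)
    if not fragment:
        raise ValueError("topicRef URI must have a fragment identifier")

    def find(topics):
        return next((t for t in topics if href in t["source_locators"]), None)

    # 1. A topic already carries this locator as a source locator.
    found = find(topic_map["topics"])
    if found is not None:
        return found

    # 2. The reference points into another document: load it and, if the
    #    topic is found there, merge (approximated here by copying topics).
    if document_uri != topic_map["base_locator"]:
        external_topics = load_external(document_uri)
        found = find(external_topics)
        if found is not None:
            topic_map["topics"].extend(external_topics)
            return found

    # 3. Otherwise create a new topic with this source locator.
    topic = {"source_locators": {href}}
    topic_map["topics"].append(topic)
    return topic
```

A fuller implementation would replace the copy in step 2 with the topic map merge procedure and would cache loaded documents so that no topic map is loaded more than once.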
The <subjectIndicatorRef> element is used in many different contexts throughout the XTM DTD. This section is referred to from the definitions of how to process some of these contexts, in order to keep the size of this specification down. In some cases other rules apply, and in those cases this section is not referenced.
This section only describes how to find the topic information item referred to by the <subjectIndicatorRef> element. How that item is used is defined in each context this section is referred to from.
To find the topic information item, a locator information item representing the URI in the 'xlink:href' attribute must first be created. Once the locator information item has been created the following procedure is followed:
If there already exists a topic information item with this locator information item in its [subject indicators] property, that topic information item is the one referred to.
If there already exists a topic information item with this locator information item in its [source locators] property, that topic information item is the one referred to.
If none of the above conditions are true, a new topic information item is created, and this locator information item is added to its [subject indicators] property.
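The three-step lookup for <subjectIndicatorRef> can be sketched as a short function. Topics are modelled as dicts of locator sets and locators as strings; these representations, and the choice to append a newly created topic to the list that was searched, are assumptions made for illustration.

```python
def resolve_subject_indicator_ref(topics: list, locator: str) -> dict:
    """Find (or create) the topic referred to by a subjectIndicatorRef."""
    # 1. Prefer a topic that has the locator as a subject indicator.
    for topic in topics:
        if locator in topic["subject_indicators"]:
            return topic
    # 2. Otherwise accept a topic that has it as a source locator.
    for topic in topics:
        if locator in topic["source_locators"]:
            return topic
    # 3. Otherwise create a new topic with this subject indicator.
    new_topic = {"source_locators": set(), "subject_indicators": {locator}}
    topics.append(new_topic)
    return new_topic
```

The order of the first two steps matters only when one topic carries the locator as a subject indicator and another as a source locator, a situation the uniqueness constraints forbid.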
The <resourceRef> element is used in many different contexts throughout the XTM DTD. This section is referred to from the definitions of how to process some of these contexts, in order to keep the size of this specification down. In some cases other rules apply, and in those cases this section is not referenced.
This section only describes how to find the topic information item referred to by the <resourceRef> element. How that item is used is defined in each context this section is referred to from.
To find the topic information item, a locator information item representing the URI in the 'xlink:href' attribute must first be created. Once the locator information item has been created the following procedure is followed:
If there already exists a topic information item with this locator information item in its [subject address] property, that topic information item is the one referred to.
If there already exists a topic information item with this locator information item in its [source locators] property, that topic information item is the one referred to.
If none of the above conditions are true, a new topic information item is created, and this locator information item is added to its [subject address] property.
[Remark: This should be done the same way as the XTM processing model, but much shorter, since ISO 13250 is much simpler. HOWEVER, supporting HyTime correctly may make this difficult.]
This section describes how information items are merged. Merging is a basic operation of the topic map processing model, and is applied during processing of serialization syntaxes and during modification of topic map structures; it may also be a basic end-user operation in an interactive environment.
A new topic information item is merged into a topic map information item by following these steps:
If there exist other topic information items in the target topic map information item which, together with the new topic information item, violate the uniqueness constraints, these topic information items are merged together according to the rules of section 5.2. The order in which they are merged is immaterial.
If no such topic information items exist, the new topic information item is added to the [topics] property of the target topic map information item.
A topic information item (A) is merged with another topic information item (B), by following the steps given below. If both A and B have non-null and different values for their [subject address] properties the merge operation fails with an error.
Every occurrence of A in the [scope], [classes], [class], or [role player] property of an information item is replaced by B.
The value of B's [source locators] property is set to the union of A's and B's values.
The value of B's [classes] property is set to the union of A's and B's values.
The value of B's [base names] property is set to the union of A's and B's values. Note that duplicate base names are merged according to the rules of section 5.4.
The value of B's [occurrences] property is set to the union of A's and B's values.
The value of B's [roles] property is set to the union of A's and B's values.
The value of B's [subject indicators] property is set to the union of A's and B's values.
If B's [subject address] property is null, while A's is not, B's [subject address] property is set to the value of A's.
A is now removed from the [topics] property of the topic map information item.
If there now exist any topics in the topic map that are in violation of the uniqueness constraints, those topics are merged according to the rules of this section.
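The topic merge steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the `Topic` class is hypothetical, locators are strings, association roles are dicts with a `"player"` key, references to A are rewritten only in [classes] and [role player] (not in [scope] or [class]), duplicate base names are not merged, and the final re-check of the uniqueness constraints is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based equality, so topics can live in sets
class Topic:
    source_locators: set = field(default_factory=set)
    subject_indicators: set = field(default_factory=set)
    subject_address: Optional[str] = None
    classes: set = field(default_factory=set)
    base_names: set = field(default_factory=set)
    occurrences: set = field(default_factory=set)
    roles: list = field(default_factory=list)


def merge_topics(a: Topic, b: Topic, topics: list) -> Topic:
    """Merge topic a into topic b and remove a from the topic list."""
    if (a.subject_address is not None and b.subject_address is not None
            and a.subject_address != b.subject_address):
        raise ValueError("conflicting [subject address] values")

    # Replace occurrences of a by b in other items' properties.
    for topic in topics:
        if a in topic.classes:
            topic.classes.discard(a)
            topic.classes.add(b)
        for role in topic.roles:
            if role.get("player") is a:
                role["player"] = b

    # Union the set-valued properties into b.
    b.source_locators |= a.source_locators
    b.classes |= a.classes
    b.base_names |= a.base_names
    b.occurrences |= a.occurrences
    b.subject_indicators |= a.subject_indicators
    b.roles.extend(r for r in a.roles if r not in b.roles)

    # Take over a's subject address if b has none.
    if b.subject_address is None:
        b.subject_address = a.subject_address

    topics.remove(a)
    return b
```

Since the merged topic may now share an identifying locator or base name with a third topic, a complete implementation would repeat the constraint check and merge again until no violations remain.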
A topic map information item (A) is merged into another (B), by following these steps:
Copy all the topic information items in A's [topics] property into B's [topics] property.
Copy all the association information items in A's [associations] property into B's [associations] property.
If any topic information items violate the uniqueness constraints, they must be merged according to the rules of section 5.2. Merging is applied repeatedly until no topic information items violate the uniqueness constraints.
A base name information item (A) is merged with another base name information item (B) by following these steps:
B's [source locators] property is set to the union of the values of A and B.
B's [variants] property is set to the union of the values of A and B.
An XTM processor is conformant with this specification provided that it:
Makes available to applications all the information described in section 2. More information may be provided, but nothing may be left out. The information may also be provided in different forms as long as they are equivalent to those here described.
Can build structures representing topic map information sets from XTM documents in a way that produces identical results to those described in section 3. Note that if the XTM document being processed does not conform to the XTM DTD, there is no requirement that processors produce the same result as the algorithm here described; in fact, this specification does not describe what the result is to be at all in this case.
If it can build structures from ISO 13250 topic map documents, it must do so according to the rules in section 4.
If it can merge topic map structures it must do so in a way that produces results identical to those obtained by following the rules of section 5.