ISO/IEC JTC 1/SC34 N0356

ISO/IEC

ISO/IEC JTC 1/SC34

Information Technology —

Document Description and Processing Languages

Title: The Standard Application Model for Topic Maps
Source: Lars Marius Garshol, Graham Moore, JTC1/SC34
Project: ISO 13250
Project editor: Steven R. Newcomb, Michel Biezunski, Martin Bryan
Status: Editor's draft
Action: For review and comment
Date: 2002-12-04
Summary:
Distribution: SC34 and Liaisons
Refer to: ISO/IEC JTC 1/SC34 N0329, 2002-07-26
Supersedes: ISO/IEC JTC 1/SC34 N0329, 2002-07-26
Reply to: Dr. James David Mason
(ISO/IEC JTC1/SC34 Chairman)
Y-12 National Security Complex
Information Technology Services
Bldg. 9113 M.S. 8208
Oak Ridge, TN 37831-8208 U.S.A.
Telephone: +1 865 574-6973
Facsimile: +1 865 574-1896
E-mail: mailto:[email protected]
http://www.y12.doe.gov/sgml/sc34/sc34oldhome.htm

Ms. Sara Hafele, ISO/IEC JTC 1/SC 34 Secretariat
American National Standards Institute
25 West 43rd Street
New York, NY 10036
Tel: +1 212 642-4937
Fax: +1 212 840-2298
E-mail: [email protected]

The Standard Application Model for Topic Maps

Editor's draft 04 12 2002

This version:
ISO/IEC JTC 1/SC34 N0356
Previous versions:
ISO/IEC JTC 1/SC34 N0329, 2002-07-26
ISO/IEC JTC 1/SC34 N0299, 2002-05-02
ISO/IEC JTC 1/SC34 N0229, 2001-06-18
Editors:
Lars Marius Garshol, Ontopia <[email protected]>
Graham Moore, Empolis <[email protected]>

NOTICE:

Editor's working copy! This document has no official status.

Abstract

This document defines the structure and interpretation of topic maps by defining the semantics of topic map constructs using prose, and the structure of such constructs using a semi-formal data model.

Together with the Reference Model specification and the HyTM syntax specification this document will supersede [ISO13250]. Together with the XTM syntax specification this document will supersede [XTM]. It is intended to become part of the new ISO 13250 standard. For more information on this process, see [tm-guide].

This is $Revision: 1.34 $.

Table of Contents

1 Introduction
2 The metamodel
    2.1 The basic types
    2.2 Constraints
3 Information item types
    3.1 Locator items
    3.2 Source locators
    3.3 The topic map item
    3.4 Topic items
        3.4.1 Identifying subjects
        3.4.2 Topic characteristics
        3.4.3 Scope
        3.4.4 Reification
        3.4.5 Properties
    3.5 Topic name items
    3.6 Variant items
    3.7 Occurrence items
    3.8 Association items
    3.9 Association role items
4 Merging
    4.1 Merging topics
    4.2 Merging base names
    4.3 Merging topic maps
    4.4 Merging variant items
    4.5 Merging occurrence items
    4.6 Merging association items
    4.7 Merging association role items
    4.8 Merging locator items
5 Published subjects
    5.1 The type-instance relationship
    5.2 The supertype-subtype relationship
    5.3 Variant name scopes
6 Conformance

Appendices

A References
B Resolved issues (Non-Normative)


1 Introduction

Topic maps are abstract structures that can encode knowledge about a domain and connect this encoded knowledge to information resources that are considered relevant to the domain. Topic maps are organized around topics, which represent subjects of discourse, associations representing relationships between the subjects, and occurrences, which connect the subjects to pertinent information resources.

Topic maps may be represented in many ways: using topic map syntaxes in files, stored in databases, as internal data structures in running programs, and even mentally in the minds of humans. All these forms are different ways of representing the same abstract structure, and it is this structure that is defined in this document, in the form of a data model.

This international standard requires topic map implementations to have internal representations of topic maps with a clearly documented correspondence to the model defined in this document. It also defines a number of structural constraints on, and operations over, instances of the model, to which its implementations must conform.

The process of exporting topic maps from an implementation's internal representation to an instance of a topic map syntax is known as serialization. The opposite process, building such a representation from information encoded using a topic map syntax, is known as deserialization. Conforming specifications of topic map syntaxes must define these processes in terms of the model defined in this document, although they are not required to define both operations. A syntax may be conforming even if it does not represent all the information in this model.

A topic map processor is any module or system that can process topic maps in conformance with this standard. It is assumed that the topic map processor does its work on behalf of another module known as the topic map application. This international standard assumes that a topic map processor will do deserialization on behalf of the application, and that the processor will manage the topic maps on behalf of the application.

Issue (container-props):

Distinguish between properties which have containment semantics, and those which are references?

Issue (scope-extension):

The scope of ISO 13250 is currently restricted to only defining the issues related to the interchange of topic maps. Should that scope be extended so that the standard can also cover application-internal issues?

Issue (infinite-subject-spaces):

How should values from infinite subject spaces be represented in topic maps?

Issue (term-application):

Do we need to define the term 'application' formally?

Ed. Note:(larsga)
Go through XTM 1.0: ensure consistency, and make notes about divergences. Ditto for ISO 13250:2000.

Ed. Note:(larsga)
Should we add more explanatory text with examples etc? Something along the lines of the "Gentle intro" of XTM 1.0?

Issue (def-terms):

There should be a clear mapping between all ISO 13250 and XTM 1.0 defined terms with terms in SAM.

Ed. Note:
Must define the term 'statement', or ditch it altogether. Also have a look at 'assignment'.

Ed. Note:
Rewrite this document in the correct style for an ISO standard.

2 The metamodel

The metamodel used in this document is the same as that used by the XML Information Set [infoset]. A topic map's information set consists of a number of information items, which are abstract representations of some part of the topic map. Every information item is an instance of some information item type, which specifies a number of named properties which the information item must have. Throughout this international standard the term "information item" refers to information items in general, while information items of particular types are referred to as "topic items", "base name items", and so on.

The names of these properties are written in square brackets: [property name], following the convention used in [infoset]. Every property has an associated type that constrains what values it may have. The values of the information item's properties constitute the information recorded about that part of the topic map.

Certain properties in the model are specified as computed properties, which means that they are defined in terms of how their values may be produced from other properties in the model. This reflects the fact that, while these properties are conceptually present, they are strictly redundant.

All types defined in this international standard, whether basic types or information item types, have a well-defined test of equality. This equality test is used to avoid duplicate values in properties of type set throughout the model. Information items have identity, independent of their values, so items can be compared both by identity and by value.

UML diagrams [UML] are used in addition to the infoset formalism for purposes of illustration. These diagrams are purely informative, and in cases of discrepancy between the diagrams and normative prose, the prose is definitive.

Ed. Note:
Make UML contain computed properties as well.

2.1 The basic types

The properties of information items may contain not only other information items, but also values of the following three types, which are known as the basic types.

String

Strings are sequences of abstract Unicode characters [unicode].

Strings are equal if they consist of the exact same sequence of abstract Unicode characters. This implies that all comparisons are case-sensitive, and that Unicode normalization is used.
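A minimal sketch of this equality test in Python follows. It assumes Normalization Form C (NFC); the text above does not mandate any particular normalization form (see the issue below), so NFC is an assumption borrowed from the Web Character Model, and the function name is hypothetical. No case folding is applied, so the comparison stays case-sensitive.

```python
import unicodedata


def sam_strings_equal(a: str, b: str) -> bool:
    """Compare two strings as sequences of abstract Unicode characters.

    Assumption: both strings are normalized to NFC before comparison.
    No case folding is done, so the test is case-sensitive.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Precomposed "é" vs. "e" followed by a combining acute accent
print(sam_strings_equal("caf\u00e9", "cafe\u0301"))  # True
print(sam_strings_equal("Suomi", "suomi"))           # False
```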

Issue (string-normalization):

The text as written implies that processors must use a Unicode normalization form, without requiring any particular one. The Web Character Model requires Normalization Form C, as does the current XML 1.1 Working Draft. Requiring normalization improves string comparison, but imposes a possibly unwelcome burden on implementors.

Issue (string-bidi):

For full internationalization it is necessary to support bidirectional text in names and occurrences. This requires that certain kinds of information be provided about the text.

Set

A set is an unordered collection of zero or more elements, no two of which are equal. Adding an element that is equal to one already in the set does not change the set's membership; instead, the new element must be merged with the equal element already in the set, following the rules for merging information items of their particular type (see section 4 Merging). In topic map information sets, the elements of a set are always information items.

Two sets are equal unless there exists an element in one set for which no equal element can be found in the other.
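The set semantics above can be sketched as follows. `MergingSet` and its `merge` callback are hypothetical names; the callback stands in for the type-specific merging rules of section 4, which are not reproduced here. Elements are compared with `==`, and set equality follows the "no unmatched element on either side" rule.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class MergingSet(Generic[T]):
    """Hypothetical sketch of the SAM 'set' basic type.

    Adding an element equal to one already present leaves the membership
    unchanged and calls `merge(existing, new)` instead.
    """

    def __init__(self, merge: Callable[[T, T], None]) -> None:
        self._items: List[T] = []
        self._merge = merge

    def add(self, item: T) -> None:
        for existing in self._items:
            if existing == item:
                self._merge(existing, item)  # fold the newcomer into the existing element
                return
        self._items.append(item)

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __eq__(self, other: object) -> bool:
        # Equal unless some element of one set has no equal element in the other.
        if not isinstance(other, MergingSet):
            return NotImplemented
        return all(any(a == b for b in other) for a in self) and all(
            any(b == a for a in self) for b in other
        )


merged = []
s = MergingSet(lambda kept, new: merged.append((kept, new)))
s.add("Suomi")
s.add("Finland")
s.add("Suomi")
print(len(s))   # 2
print(merged)   # [('Suomi', 'Suomi')]
```

Plain strings are used as elements only for brevity; in a topic map information set the elements would be information items.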

Null

Null is used to indicate that properties have no value; it does not necessarily indicate that the value of the property is unknown. In this model null can never be contained in a set.

Null is distinct from all other values (including the empty set and the empty string); it is only equal to itself.

2.2 Constraints

The model defined in this international standard contains not only basic types and information item types with named properties, but also constraints on the allowed instances of the model. The purpose of these constraints is to disallow inconsistent uses of the model. The constraints are limited to those that can be applied independently of any knowledge of the topic map's domain, so an instance of this model may conform to all of them and still be inconsistent.

This international standard requires topic map processors to be able to detect violations of the constraints marked as 'SAM constraints' on behalf of the topic map application. It does not concern itself with when or how this detection is done, nor with how violations are reported.

3 Information item types

The Standard Application Model for topic maps is defined as the set of information item types and properties that is presented in this section.

3.1 Locator items

An information resource is a resource that can be represented as a sequence of bytes, and thus could potentially be retrieved over a network. Topic maps can refer to information resources external to themselves in order to make statements about them. These information resources are not part of the topic map; they are only referenced from it.

A locator is a string that references one or more information resources. Locators are always expressed in some notation, which defines their formal syntax and interpretation. The definition of locator notations is outside the scope of this specification.

Issue (locator-reference):

Must locators really refer to information resources? Some URN schemes allow resources that are not information resources to be addressed. This affects the definitions of "information resource", "locator", as well as the [subject identifiers] and [subject address] properties.

Issue (locator-normalization):

If a locator syntax allows equivalent locators to be given different syntactical expressions, normalization must be applied to take this into account. Where should the text that sets out this requirement go? Does it belong in this document, or in the syntax specifications?

In instances of this model locator items represent locators. Locator items have the following properties:

  1. [notation]: A non-empty string. The string is the name of the notation used by this locator. If the string is "URI", the notation is that described in [RFC2396]. Otherwise, the first two characters of the string must be "X-"; all values that do not begin with "X-" are reserved.

    Issue (prop-notation-interp):

    Is it likely that the term "IRI" will replace "URI" in the foreseeable future? Does there need to be a well-defined mechanism for adding new values for the [notation] property?

    Ed. Note:
    HyTime locators must also have a defined notation.

  2. [reference]: A non-empty string. The string is the locator, whose interpretation and syntax is governed by the value of the [notation] property.

Locator information items are equal if they have the same values in their [notation] and [reference] properties.
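A locator item and its equality rule can be sketched as a frozen Python dataclass, whose generated `__eq__` compares exactly the two fields. The class name and the example URL are hypothetical. The comparison is purely string-based: no URI normalization is attempted, which the locator-normalization issue above leaves open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locator:
    """Sketch of a locator item: equal iff [notation] and [reference] are equal."""

    notation: str
    reference: str

    def __post_init__(self) -> None:
        if not self.notation or not self.reference:
            raise ValueError("[notation] and [reference] must be non-empty strings")
        # "URI" is defined by the model; any other value must start with "X-".
        if self.notation != "URI" and not self.notation.startswith("X-"):
            raise ValueError(f"reserved notation name: {self.notation!r}")


a = Locator("URI", "http://example.org/finland")
b = Locator("URI", "http://example.org/finland")
print(a == b)  # True: same notation and same reference
```

Making the class frozen also makes it hashable, so locator items can be held in ordinary Python sets.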

3.2 Source locators

The source locators of an information item are a set of locators that may be used to refer to the item. When a topic map information set is created through deserialization from some topic map syntax, source locators are created that point back to the syntactical constructs that gave rise to the information items in the information set. In these cases the source locators point to the minimal syntactical construct of origin; for example, for topic items created from the XTM syntax the source locator points to the originating topic element, rather than to the containing topicMap element.

This international standard does not specify how and when source locators are assigned to information items, but leaves this to the deserialization specifications for each syntax. For topic maps not created by deserialization from a syntax it is not required that any source locators be assigned. Information items may also have source locators assigned to them by means not constrained by this specification, in order that they may be referred to.

Source locators are used in this specification to define reification, and in the syntax specifications to ensure that cross-references to topic map constructs are correctly interpreted when information is deserialized from different information resources. Other parts of this international standard will use source locators to define mechanisms for referencing topic map constructs.

SAM Constraint: Duplicate source locators

It is an error for two different information items to have locator items that are equal in their [source locators] properties, unless they are topic items. If they are topic items they must be merged according to the procedure in 4.1 Merging topics.
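One way a processor might detect this constraint is sketched below. The `Item` class, its `kind` tag, and the `file:///map.xtm#…` locators are hypothetical. Items compare by identity (`eq=False`), so two distinct items never collapse into one. The function only reports topic pairs that must be merged and offending locators; it does not perform the merge.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Locator:
    notation: str
    reference: str


@dataclass(eq=False)  # identity-based equality: distinct items stay distinct
class Item:
    kind: str  # hypothetical tag, e.g. "topic", "topic name", "association"
    source_locators: Set[Locator] = field(default_factory=set)


def check_source_locators(
    items: List[Item],
) -> Tuple[List[Tuple[Item, Item]], List[Locator]]:
    """Return (topic pairs that must be merged, locators violating the constraint)."""
    owner: Dict[Locator, Item] = {}
    to_merge: List[Tuple[Item, Item]] = []
    errors: List[Locator] = []
    for item in items:
        for loc in item.source_locators:
            other = owner.get(loc)
            if other is None:
                owner[loc] = item  # first item seen with this locator
            elif other.kind == "topic" and item.kind == "topic":
                to_merge.append((other, item))  # allowed: topics get merged
            else:
                errors.append(loc)  # shared by items that are not both topics
    return to_merge, errors


shared = Locator("URI", "file:///map.xtm#t1")
t1 = Item("topic", {shared})
t2 = Item("topic", {shared})
n1 = Item("topic name", {Locator("URI", "file:///map.xtm#n1")})
a1 = Item("association", {Locator("URI", "file:///map.xtm#n1")})
merge, errs = check_source_locators([t1, t2, n1, a1])
print(len(merge), len(errs))  # 1 1
```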

Issue (prop-srcloc-interchange):

None of the interchange syntaxes allow source locators to be interchanged. Is interchange of source locators a desired feature? Do the syntaxes need to be extended to accommodate source locators?

3.3 The topic map item

A topic map is a set of topics and associations, which may be represented in many forms. Its purpose is to convey information about subjects through the assignment of characteristics to topics representing those subjects. The topic map itself has no meaning or significance beyond its use as a container for the information about those subjects; in particular, the topic map does not represent anything but itself.

The topic map item may be reified, however, in order to make statements about the topic map, that is, the collection of topics and associations, as a whole. These statements may for example provide traditional metadata such as author, version, copyright, or they may reference system metadata such as a schema for the topic map, external documentation of it, and so on.

There is exactly one topic map item in each information set, and all information in the set is available from the properties of that item. Every topic map item represents a single topic map.

Topic map items have the following properties:

  1. [source locators]: A set of locator items. This is the set containing the source locators of the topic map.

  2. [topics]: A set of topic items. This is the set of all the topics in the topic map.

  3. [associations]: A set of association items. This is the set of all the associations in the topic map.

  4. [reifier]: A topic item, or null. The topic item is the topic that reifies this information item.

    The value of the property can be computed as follows: if there exists a topic item whose [subject identifiers] property contains a locator item equal to one in the [source locators] property of this information item, that topic item is the value of the [reifier] property. If not, its value is null.

Issue (prop-reifier-computed):

Is it acceptable for the [reifier] property to be computed, or must it be a fundamental property?

Issue (reifier-merging):

What happens if two different topic items reify the same item? Should they be merged?

Two topic map items are equal if the values of their [topics] and [associations] properties are equal. Note that since two topic map items cannot appear in the same topic map information set this rule is only used to verify conformance to a syntax specification.

SAM Constraint: Single topic map

Every instance of the Standard Application Model must contain exactly one topic map item.

Issue (prop-base-locator):

Is a base locator property on the topic map item needed by other specifications?

3.4 Topic items

A subject is anything that has identity. In the most generic sense, a subject is anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever. In particular, it is anything on which the creator of a topic map chooses to discourse.

Examples of subjects for which topics may be created are:

  • The moon.

  • The Soviet Union. This subject no longer exists as an organizational unit, but the idea still exists, and so is still a subject.

  • The letters 'A', 'B', 'C', and 'D'. This is a single subject, a set with four elements.

Issue (subject-vs-resource):

Should the standard state outright that "subject" and "resource" (as per RFC 2396) are the same thing? (Quote: "A resource can be anything that has identity. Familiar examples include an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), and a collection of other resources. Not all resources are network "retrievable"; e.g., human beings, corporations, and bound books in a library can also be considered resources.")

A topic is a symbol used within a topic map to represent some subject, about which the creator of the topic map wishes to make statements. Topics are proxies for the subjects they represent in order to allow statements to be made about the subjects through the assignment of characteristics to the topics that represent them.

Every topic represents one, and only one, subject. The process of merging ensures that whenever two topics are known to represent the same subject they are merged. It may well be, however, that two topics represent the same subject without this being detectable by the rules of this standard. Applications and users are therefore free to merge topics so long as it is not clear that they represent different subjects.

Issue (subject-identity-establish):

ISO 13250 states that subject identity may be "inferred from the topic's characteristics." Does SAM need words to the same effect?

3.4.1 Identifying subjects

A subject indicator is an information resource that is referred to from a topic map in an attempt to unambiguously identify the subject of a topic to a human being. Any information resource can become a subject indicator by being referred to as such from within some topic map, whether or not it was intended by its publisher to be a subject indicator.

A subject identifier is a locator that refers to a subject indicator. Topic maps contain only subject identifiers, and consequently it is the subject identifier that is the basis for merging.

A subject address is a locator that refers to the information resource that is the subject of a topic. The topic thus represents that particular information resource. Different locators are considered to address different information resources. If a topic item has a subject address it is assumed that the topic represents the information resource the subject address refers to.

Ed. Note:(larsga)
Add examples to make this more understandable.

Issue (term-subject-indicator-def):

If the subject identifier is a locator that does not refer to an information resource, what is the subject indicator then? This also applies to the subject address.

Issue (term-subject-address-def):

At what level of interpretation does the topic represent the resource? Does it represent the storage location? The stream of bytes? The stream of bytes interpreted in some particular way? The standard must either leave the details open or clarify this. Note that it may be impossible to clarify when the interpretation of locators is left undefined.

Merging in topic maps is defined in terms of subject identifiers and subject addresses.
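As a partial sketch, the two criteria named in this section can be tested as shown below: a shared subject identifier, or equal non-null subject addresses. The full rules of 4.1 Merging topics are not reproduced here and may impose further conditions. `must_merge` and the `example.org` locators are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass(frozen=True)
class Locator:
    notation: str
    reference: str


@dataclass(eq=False)
class Topic:
    subject_identifiers: Set[Locator] = field(default_factory=set)
    subject_address: Optional[Locator] = None


def must_merge(a: Topic, b: Topic) -> bool:
    """Partial test covering only the identifier- and address-based criteria."""
    if a.subject_identifiers & b.subject_identifiers:
        return True  # both topics refer to the same subject indicator
    # Both topics represent the same information resource.
    return a.subject_address is not None and a.subject_address == b.subject_address


psi = Locator("URI", "http://psi.example.org/finland")  # hypothetical subject identifier
x = Topic({psi})
y = Topic({psi, Locator("URI", "http://example.org/suomi")})
z = Topic(subject_address=Locator("URI", "http://example.org/report.html"))
print(must_merge(x, y), must_merge(x, z))  # True False
```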

Issue (topic-naming-constraint):

Should the standard retain the topic naming constraint?

3.4.2 Topic characteristics

Topic names, occurrences, and association roles are collectively known as topic characteristics, as they are the only characteristics topics may have in a topic map. A topic characteristic assignment is the statement that a certain topic characteristic belongs to a certain topic. In the information set this is represented by the inclusion of an information item representing a topic characteristic in the value of a property of a topic item. Any topic characteristic assignment constitutes a statement about the subject represented by the topic.

The properties of topic items that do not represent topic characteristics are not statements about the subject; they are statements about the topic. As such they are part of the topic map machinery, rather than statements about the subject represented in the topic map.

Issue (term-topic-characteristic-reified):

Does the thing reified by a topic count as a characteristic of the topic? It is the subject of the topic, so the question is perhaps whether we are interested in the characteristics of topics or subjects.

3.4.3 Scope

All topic characteristic assignments have a scope, which defines the context within which the assignment is valid. Outside the context represented by the scope the assignment is not known to be valid. Formally, a scope is composed of a set of subjects that together define the context. That is, the topic characteristic is known to be valid only in contexts where all the subjects in the scope apply.

Issue (term-scope-def):

This definition of scope is different from that of XTM 1.0 and ISO 13250:2000, in that it explicitly says topic characteristic assignments are valid for each of the subjects in its scope individually. Is that acceptable?

If the scope of a topic characteristic assignment is the empty set the statement is considered to have unlimited validity, and the assignment is said to be in the unconstrained scope.

Issue (scope-unconstrained-rep):

How should the unconstrained scope be represented?

Precisely how a subject defines a context is not defined by this standard, but left for those creating topic maps to define as part of the definition of their subjects.

Examples of the use of scope are given below:

  • The term "Suomi" is the name of Finland in the context of Finnish. This corresponds to assigning the base name "Suomi" to a topic representing Finland, and giving it as scope a topic representing Finnish.

  • According to Norman Davies, World War II started on June 6, 1937 [Davies]. This corresponds to creating a topic representing WWII, and assigning to it the string "June 6, 1937" as an occurrence of type "start date", and giving this occurrence as scope a topic representing the person Norman Davies.

  • According to Peter T. Daniels, the Devanagari script is an instance of the script type "abugida," whereas according to William Bright it is an "alphasyllabary". This corresponds to having two "class-instance" associations, each scoped with a topic representing the relevant authority.
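The validity rule can be sketched as a subset test, following the reading that an assignment is known to be valid only where all subjects in its scope apply. Topics are stood in for by plain strings, the function name is hypothetical, and the unconstrained "Finland" name is an invented companion to the "Suomi" example. The empty scope is a subset of every context, which is exactly the unconstrained scope.

```python
from typing import FrozenSet


def known_valid(scope: FrozenSet[str], context: FrozenSet[str]) -> bool:
    """True if every subject in the scope applies in the given context."""
    return scope <= context  # the empty scope is a subset of any context


names = [
    ("Finland", frozenset()),           # unconstrained scope
    ("Suomi", frozenset({"finnish"})),  # valid where Finnish applies
]
print([n for n, s in names if known_valid(s, frozenset({"finnish"}))])  # ['Finland', 'Suomi']
print([n for n, s in names if known_valid(s, frozenset())])             # ['Finland']
```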

Issue (prop-scope-structure):

Should it become possible for the scopes of topic characteristic assignments to have internal structure?

Ed. Note:
This section needs to be reconsidered, then rewritten.

3.4.4 Reification

Every topic represents one subject, and the relationship between the two is always one of representation. However, the term reification is used for situations where the subject represented by the topic is part of the topic map.

In many cases it is desirable to be able to attach additional information to topic map constructs such as topic names or associations. One may want to give an association occurrences, or to associate a topic name with certain topics. The basic topic map model does not allow this, but through reification this can be done by creating a topic that reifies the topic map construct. The necessary information can then be attached to the reifying topic, since the fact that the topic represents the topic map construct is detectable by software.

Reification is done by giving the reifying topic a subject identifier that refers to the topic map construct that is being reified. In model terms, this means that if an information item has a source locator item that is equal to one of the items in the [subject identifiers] property of a topic, that topic item reifies the information item.
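The computed [reifier] property can be sketched as a search for a topic whose subject identifiers overlap the item's source locators. The classes and the `example.org` locator are hypothetical. The sketch returns the first match, because what should happen when several topics reify the same item is still open (see the reifier-merging issue).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class Locator:
    notation: str
    reference: str


@dataclass(eq=False)
class Item:
    """Any information item; only [source locators] matters here."""

    source_locators: Set[Locator] = field(default_factory=set)


@dataclass(eq=False)
class Topic(Item):
    subject_identifiers: Set[Locator] = field(default_factory=set)


def reifier(item: Item, topics: List[Topic]) -> Optional[Topic]:
    """Computed [reifier]: a topic whose subject identifiers share a locator
    with the item's source locators, or None if there is no such topic."""
    for topic in topics:
        if topic.subject_identifiers & item.source_locators:
            return topic
    return None


loc = Locator("URI", "http://example.org/map.xtm#assoc-1")  # hypothetical
assoc = Item({loc})
about_assoc = Topic(subject_identifiers={loc})
print(reifier(assoc, [about_assoc]) is about_assoc)  # True
```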

Note that one topic cannot reify another. Making one topic the subject indicator of another implies that the two topics represent the same subject; they will therefore be merged, and thus become a single topic.

Issue (reification-effects):

If you reify a topic name, does that affect your allowed type?

3.4.5 Properties

Topic items represent topics, and have the following properties:

  1. [source locators]: A set of locator items. This is the set containing the source locators of the topic.

  2. [topic names]: A set of topic name items. This is the set of topic names assigned to this topic.

  3. [occurrences]: A set of occurrence items. This is the set of occurrences assigned to this topic.

  4. [roles played]: A set of association role items. This is the set of association roles played by this topic.

    The value of this property can be computed as follows: it is the set of all association role items whose [role playing topic] property contains this topic item.

  5. [subject identifiers]: A set of locator items. The locator items refer to the subject indicators of this topic.

  6. [subject address]: A locator information item, or null. The locator, if present, refers to the information resource that is the subject of this topic.

    Issue (prop-subj-address-values):

    Should this property only accept a single value?

    Issue (prop-subj-address-class):

    Are topics representing information resources allowed as types?

    Issue (prop-subj-address-scope):

    Are topics representing resources allowed as themes in scopes?

    Issue (strings-as-subjects):

    Should it be possible to create topics that represent strings, and for it to be formally clear that these topics do represent particular strings? If so, how?

  7. [reified]: An information item, or null. This is the information item reified by this topic item; that is, the topic map construct that is the subject of this topic.

    The value of this property can be computed as follows: if any information item has in its [source locators] property a locator item equal to one in the [subject identifiers] property of this topic item, that information item is the value of the [reified] property. If no such information item is found the value is null.

SAM Constraint: Single reified

The computation that produces the value of the [reified] property must yield a single information item, as topics are required to have only one subject.

SAM Constraint: Source locator and subject identifier namespace

A topic item may not have the same locator item in both its [source locators] and its [subject identifiers] property.
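The two topic constraints above can be checked as sketched below. The classes, the function name, the message strings, and the `example.org` locators are hypothetical. "Single reified" is treated as violated when more than one item's source locators overlap the topic's subject identifiers.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Locator:
    notation: str
    reference: str


@dataclass(eq=False)
class Item:
    source_locators: Set[Locator] = field(default_factory=set)


@dataclass(eq=False)
class Topic(Item):
    subject_identifiers: Set[Locator] = field(default_factory=set)


def constraint_violations(topic: Topic, items: List[Item]) -> List[str]:
    """Check 'Single reified' and 'Source locator and subject identifier namespace'."""
    problems = []
    reified = [i for i in items if i.source_locators & topic.subject_identifiers]
    if len(reified) > 1:
        problems.append(f"single reified: topic reifies {len(reified)} items")
    if topic.source_locators & topic.subject_identifiers:
        problems.append("namespace: locator is both source locator and subject identifier")
    return problems


l1 = Locator("URI", "http://example.org/map.xtm#a")  # hypothetical locators
l2 = Locator("URI", "http://example.org/map.xtm#b")
t = Topic(subject_identifiers={l1, l2})
print(constraint_violations(t, [Item({l1}), Item({l2})]))
# ['single reified: topic reifies 2 items']
```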

Two topic items are equal if, when present in the same information set, they would be required to be merged according to the rules of 4.1 Merging topics. Otherwise they are not equal.

Issue (prop-reifier-topic):

Should topic items have a [reifier] property? Should it be possible to reify topics? If so, how?

Ed. Note:
How do we tell information sets apart?

3.5 Topic name items

A base name is a name or label for a subject, expressed as a string. That is, it is something that identifies the subject (though not necessarily uniquely) and can be used as a label for the subject in user interfaces. The notion of a base name corresponds closely to the common sense notion of a name. Suitable base names for people, countries, and organizations are their names, while base names for documents, musical works, and movies might be their titles. Base names may have variant names, which are alternative forms of the base name that may be more appropriate in specific contexts.

Issue (term-base-name-def):

The definitions of 'base name' and basename.[value] are too naïve in the presence of the TNC. If we are to have the TNC they must change.

Base names always have a scope, which defines in what contexts the topic name is an appropriate label for the subject. A subject may have any number of base names, and the only basis for choosing which base name(s) to use in any given situation is their scope.

A topic name is a base name together with its associated variant names. It is the topic name which is a topic characteristic; the base name and variant names are only parts of the topic name characteristic.

Topic name items represent topic names, and have the following properties:

  1. [source locators]: A set of locator items. This is the set containing the source locators of the topic name.

  2. [value]: A string. This string is the base name.

    Issue (prop-value):

    Is [label] a better name?

  3. [scope]: A set of topic items. This set is the scope that represents the validity of this base name as a label for the subject.

  4. [variants]: A set of variant items. This set contains the variant names that are alternative forms of the base name.

  5. [reifier]: A topic item, or null. The topic item is the topic that reifies this information item.

    The value of the property can be computed as follows: if there exists a topic item whose [subject identifiers] property contains a locator item equal to one in the [source locators] property of this information item, that topic item is the value of the [reifier] property. If not, its value is null.

Base name information items are equal if the values of their [value] and [scope] properties are equal.
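This equality rule can be sketched with a frozen dataclass in which only [value] and [scope] take part in comparison and hashing. The class name is hypothetical, scope topics are stood in for by strings, and variants are reduced to a tuple of strings for brevity. [source locators] and [reifier] are omitted because they do not affect equality.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class TopicName:
    """Sketch: equality uses only [value] and [scope]."""

    value: str
    scope: FrozenSet[str] = frozenset()
    variants: tuple = field(default=(), compare=False)  # ignored by == and hash


a = TopicName("Suomi", frozenset({"finnish"}))
b = TopicName("Suomi", frozenset({"finnish"}), variants=("SUOMI",))
c = TopicName("Suomi")
print(a == b, a == c)  # True False
```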

Issue (names-as-subjects):

Should base name items be merged, so that assertions made about one base name will also apply to all other base names that have the same identity? (This also applies to occurrences.)

Issue (names-with-types):

Should base names be allowed to have both types and scopes in the same way that occurrences do?

3.6 Variant items

A variant name is an alternative form of a base name that may be more suitable in certain contexts than the base name itself. The scope of the variant name is the only basis for establishing what variant name is most suitable in any given situation. A variant name may be a string, but it may also be any other kind of information resource.

When choosing a label for a topic, applications are expected to select the base name they consider most appropriate, and then evaluate which of the forms of that base name is best suited for display in that particular context, which may be the base name or one of its variants. This standard does not constrain the process by which this is done.

Section 5.3 Variant name scopes defines some published subjects that may be useful for scoping variant names.

Variant items represent variant names, and have the following properties:

  1. [source locators]: A set of locator items. This is the set containing the source locators of the variant name.

  2. [value]: A string, which may be empty, or it may be null. The string, if set, is the variant name.

  3. [resource]: A locator item, or null. The locator, if set, refers to the information resource that is the variant name.

  4. [scope]: A non-empty set of topic items. This set is the scope that describes in what context(s) the variant name may be preferred as a label for the topic.

    Issue (prop-variant-scope-superset):

    ISO 13250:2000 allows a display name to have a scope that is a subset of that of the corresponding base name. This apparent contradiction needs to be resolved.

  5. [reifier]: A topic item, or null. The topic item is the topic that reifies this information item.

    The value of this property can be computed as follows: if there exists a topic item whose [subject identifiers] property contains a locator item equal to one in the [source locators] property of this information item, that topic item is the value of the [reifier] property. Otherwise, its value is null.

SAM Constraint: Variant scope

The value of the [scope] property of each variant item must be a proper superset of the value of the [scope] property of the base name item that is its parent.

SAM Constraint: Value/resource exclusion

Exactly one of the [value] and [resource] properties must contain null.
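The two SAM constraints above can be checked mechanically. The following Python sketch is hypothetical: scopes are modelled as frozensets of topic objects, so that the `>` operator tests for a proper superset, and the [resource] locator item is represented as a plain URI string. The names `Variant` and `check_variant` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # topic items are compared by identity
class Topic:
    name: str


@dataclass
class Variant:
    scope: frozenset
    value: Optional[str] = None
    resource: Optional[str] = None  # a locator item, here a URI string


def check_variant(variant: Variant, base_name_scope: frozenset) -> List[str]:
    """Return the SAM constraint violations found in one variant item."""
    errors = []
    # Variant scope: a proper superset of the parent base name's scope.
    if not variant.scope > base_name_scope:
        errors.append("variant scope is not a proper superset of the base name scope")
    # Value/resource exclusion: exactly one of the two is null.
    if (variant.value is None) == (variant.resource is None):
        errors.append("exactly one of [value] and [resource] must be null")
    return errors


norwegian, sort = Topic("norwegian"), Topic("sort")
base_scope = frozenset({norwegian})
ok = Variant(frozenset({norwegian, sort}), value="norge")
bad = Variant(base_scope, value="norge", resource="http://example.org/norge.png")
print(check_variant(ok, base_scope))        # []
print(len(check_variant(bad, base_scope)))  # 2
```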

Variant information items are equal if the values of their [value], [resource], and [scope] properties are equal.

3.7 Occurrence items

An occurrence is a relationship between a subject and an information resource. The precise nature of this relationship is described by the occurrence type, a subject which is attached to the occurrence. Occurrences are generally used to attach information resources to the subjects they are relevant to. Note that, strictly speaking, the occurrence is not the resource itself, but the relationship between the resource and the subject. The information resource may be either a string inside the topic map or an external information resource.

All occurrences have a scope, which defines the contexts in which the occurrence relationship between the information resource and the subject is valid.

Occurrence items represent occurrences, and have the following properties:

  1. [source locators]: A set of locator items. This is the set containing the source locators of the occurrence.

  2. [value]: A string, which may be empty, or it may be null. The string, if set, is the information resource the occurrence connects with the subject.

  3. [resource]: A locator item, or null. The locator, if set, is the information resource the occurrence connects with the subject.

  4. [scope]: A set of topic items. This set is the scope that describes in what context the occurrence relationship may be considered valid.

  5. [occurrence type]: A topic item, or null. The topic item represents the subject that defines the nature of the occurrence relationship.

  6. [reifier]: A topic item, or null. The topic item is the topic that reifies this information item.

    The value of this property can be computed as follows: if there exists a topic item whose [subject identifiers] property contains a locator item equal to one in the [source locators] property of this information item, that topic item is the value of the [reifier] property. Otherwise, its value is null.

SAM Constraint: Value/resource exclusion

Exactly one of the [value] and [resource] properties must contain null.

Occurrence information items are equal if the values of their [value], [resource], [scope], and [occurrence type] properties are equal.

3.8 Association items

An association is a relationship between one or more subjects. Associations have an association type, a subject which describes the nature of the relationship. The involvement of each subject in the relationship is called its association role.

All associations have a scope, which defines the context in which the statement represented by the association can be considered valid. Strictly, the scope applies to the association roles, as characteristics of the topics that play these roles; but since all association roles in an association have the same scope, the scope is also considered to apply to the association as a whole.

Association items represent associations, and have the following properties:

  1. [source locators]: A set of locator items. This is the set containing the source locators of the association.

  2. [scope]: A set of topic items. This set is the scope that describes in what context the association may be considered valid.

  3. [association type]: A topic item, or null. The topic item represents the association type of the association.

  4. [roles]: A non-empty set of association role items. The association role items represent the association roles that make up this association.

  5. [reifier]: A topic item, or null. The topic item is the topic that reifies this information item.

    The value of this property can be computed as follows: if there exists a topic item whose [subject identifiers] property contains a locator item equal to one in the [source locators] property of this information item, that topic item is the value of the [reifier] property. Otherwise, its value is null.

Association items are equal if the values of their [scope], [association type], and [roles] properties are equal.
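Because [roles] is a set, association equality does not depend on the order in which the roles happen to be listed. A minimal Python sketch, assuming hypothetical frozen dataclasses that model only the properties relevant to equality (source locators and reifier are deliberately omitted, since they play no part in it):

```python
from dataclasses import dataclass


@dataclass(eq=False)  # topic items are compared by identity
class Topic:
    name: str


@dataclass(frozen=True)
class Role:
    role_type: Topic
    role_playing_topic: Topic


@dataclass(frozen=True)
class Association:
    association_type: Topic
    scope: frozenset
    roles: frozenset


employment = Topic("employment")
employer, employee = Topic("employer"), Topic("employee")
acme, alice = Topic("acme"), Topic("alice")

a = Association(employment, frozenset(),
                frozenset({Role(employer, acme), Role(employee, alice)}))
b = Association(employment, frozenset(),
                frozenset({Role(employee, alice), Role(employer, acme)}))
print(a == b)       # True: role order is irrelevant
print(len({a, b}))  # 1: b duplicates a, which in SAM triggers merging
```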

3.9 Association role items

An association role connects two pieces of information within an association: the subject participating in the association, known as the association role player, and the subject defining the nature of its participation, known as the association role type.

Ed. Note:(larsga)
Should we include examples to make this clear? Here, or under association?

Ed. Note:(larsga)
The UML should declare that there is an inverse of rolePlayingTopic, which is roles.

Association role items represent association roles, and have the following properties:

  1. [source locators]: A set of locator items. This is the set containing the source locators of the association role.

  2. [role playing topic]: A topic item. This is the topic item that represents the association role player.

  3. [role type]: A topic item. This is the topic item that represents the association role type.

  4. [reifier]: A topic item, or null. The topic item is the topic that reifies this information item.

    The value of this property can be computed as follows: if there exists a topic item whose [subject identifiers] property contains a locator item equal to one in the [source locators] property of this information item, that topic item is the value of the [reifier] property. Otherwise, its value is null.

Association role information items are equal if the values of their [role type] and [role playing topic] properties are equal.

4 Merging

Merging is a process applied to topic maps in order to reduce the number of redundant topics representing the same subject. This specification requires merging to be performed in certain cases, but these cases are not sufficient to guarantee that there will always be exactly one topic per subject. Applications are therefore allowed to merge topics in additional cases where it is not clear that the topics represent different subjects.

Merging is triggered for information items of all types whenever an attempt is made to add an information item to a set that already contains an equal information item.
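The general pattern, in which inserting an item into a set property either adds it or merges it with an equal member, can be sketched as follows. This is a hypothetical Python illustration, not a prescribed implementation: the function `add_item` takes the equality rule and merge procedure of the item type as parameters, and the usage example applies it to simplified occurrence items represented as dictionaries.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def add_item(items: List[T], new: T, equal: Callable[[T, T], bool],
             merge: Callable[[T, T], T]) -> None:
    """Add `new` to the set property `items`, merging it with an equal member.

    A list stands in for the set so that members may be mutable objects;
    `equal` and `merge` are the equality rule and merge procedure of the
    item type concerned.
    """
    for i, existing in enumerate(items):
        if equal(existing, new):
            items[i] = merge(existing, new)  # C takes the place of A (and B)
            return
    items.append(new)


# Hypothetical occurrence items as dicts; [resource] is null throughout.
def occ_equal(a, b):
    return (a["value"], a["scope"], a["type"]) == (b["value"], b["scope"], b["type"])


def occ_merge(a, b):
    # Union the source locators; the other properties are equal, so take A's.
    return {**a, "source_locators": a["source_locators"] | b["source_locators"]}


occurrences = []
for loc in ("#o1", "#o2"):
    add_item(occurrences, {"value": "1814", "scope": frozenset(), "type": "founded",
                           "source_locators": {loc}}, occ_equal, occ_merge)
print(len(occurrences), sorted(occurrences[0]["source_locators"]))  # 1 ['#o1', '#o2']
```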

Issue (merge-use-of-schemas):

The presence of a TMCL schema may allow applications to improve the result of merging topics/topic maps by providing enough information to allow implementations to do additional transformations and redundancy removal. How should the SAM specification deal with this possibility?

4.1 Merging topics

Whenever two different topic items have

  • at least one equal locator item in their [subject identifiers] properties,

  • at least one equal locator item in their [source locators] properties,

  • an equal locator item in their [subject address] property, or

  • an equal locator item in the [subject identifiers] property of one topic item and the [source locators] property of the other,

they must be merged.
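The four conditions above can be expressed as a single predicate. In this hypothetical Python sketch, locator items are plain URI strings and topic items are compared by identity; `must_merge` is an illustrative name only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # topic items are compared by identity
class Topic:
    source_locators: set = field(default_factory=set)      # URI strings
    subject_identifiers: set = field(default_factory=set)  # URI strings
    subject_address: Optional[str] = None


def must_merge(a: Topic, b: Topic) -> bool:
    """True if the rules of section 4.1 require topic items a and b to be merged."""
    if a is b:
        return False  # the rules only apply to two different topic items
    if a.subject_identifiers & b.subject_identifiers:
        return True
    if a.source_locators & b.source_locators:
        return True
    if a.subject_address is not None and a.subject_address == b.subject_address:
        return True
    # A subject identifier of one topic is a source locator of the other.
    return bool(a.subject_identifiers & b.source_locators
                or a.source_locators & b.subject_identifiers)


psi = "http://example.org/psi/norway"  # hypothetical subject identifier
t1 = Topic(subject_identifiers={psi})
t2 = Topic(source_locators={psi})
t3 = Topic(subject_identifiers={"http://example.org/psi/sweden"})
print(must_merge(t1, t2), must_merge(t1, t3))  # True False
```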

Merging of topic items is done by the following procedure. The two topic items to be merged are known as A and B.

  1. Create a new topic item C.

  2. Replace A by C wherever it appears in one of the following properties of any information item: [topics], [scope], [occurrence type], [association type], [role playing topic], and [role type].

  3. Repeat for B.

  4. Set C's [source locators] property to the union of the values of A and B's [source locators] properties.

  5. Set C's [subject identifiers] property to the union of the values of A and B's [subject identifiers] properties.

  6. If A or B has a value for the [subject address] property, set this as the value of C's [subject address] property.

  7. Set C's [base names] property to the union of the values of A and B's [base names] properties.

  8. Set C's [occurrences] property to the union of the values of A and B's [occurrences] properties.

SAM Constraint: Single subject address

A and B cannot both have a locator item in their [subject address] property if those locator items are different.
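A simplified Python sketch of the merge procedure and the single subject address constraint follows. It is an illustration under stated assumptions, not a complete implementation: locators are plain URI strings, only the [topics], [scope] and [occurrence type] properties are rewritten in steps 2 and 3, and the duplicate base names and occurrences that steps 7 and 8 can produce are not merged here. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # topic items are compared by identity
class Topic:
    source_locators: set = field(default_factory=set)      # URI strings
    subject_identifiers: set = field(default_factory=set)  # URI strings
    subject_address: Optional[str] = None
    base_names: list = field(default_factory=list)
    occurrences: list = field(default_factory=list)


@dataclass(eq=False)
class Occurrence:
    value: str
    occurrence_type: Optional[Topic] = None
    scope: set = field(default_factory=set)


def merge_topics(a: Topic, b: Topic, topics: list, items: list) -> Topic:
    """Merge topic items a and b along the lines of section 4.1.

    `topics` stands for the topic map's [topics] property, and `items` for the
    information items that may refer to a or b (only occurrences here).
    """
    # SAM constraint: single subject address.
    if (a.subject_address is not None and b.subject_address is not None
            and a.subject_address != b.subject_address):
        raise ValueError("cannot merge topics with different subject addresses")
    c = Topic()
    # Replace A and B by C ([topics], [scope] and [occurrence type] only).
    topics[:] = [t for t in topics if t is not a and t is not b] + [c]
    for item in items:
        item.scope = {c if t is a or t is b else t for t in item.scope}
        if item.occurrence_type is a or item.occurrence_type is b:
            item.occurrence_type = c
    c.source_locators = a.source_locators | b.source_locators
    c.subject_identifiers = a.subject_identifiers | b.subject_identifiers
    c.subject_address = a.subject_address or b.subject_address
    # A full implementation would now merge any duplicates (sections 4.2
    # and 4.5); this sketch only concatenates the lists.
    c.base_names = a.base_names + b.base_names
    c.occurrences = a.occurrences + b.occurrences
    return c


psi = "http://example.org/psi/norwegian"  # hypothetical subject identifier
a = Topic(subject_identifiers={psi})
b = Topic(source_locators={psi})
occ = Occurrence("Norsk tekst", scope={a})
topics = [a, b]
c = merge_topics(a, b, topics, [occ])
print(len(topics), occ.scope == {c})  # 1 True
```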

Issue (merge-srcloc-vs-subjid):

What happens when the same locator appears as a source locator for one topic and as a subject identifier for another? Does that locator then become a source locator or a subject identifier of the merged topic? This question arises both under deserialization and under merging.

4.2 Merging base names

Base names are merged when the [base names] property of a topic item contains two equal base name items. The procedure for merging two base name items A and B is given below.

  1. Create a new base name item C.

  2. Set C's [source locators] property to the union of the values of the [source locators] properties of A and B.

  3. Set C's [value] property to the value of the [value] property of A or B. If merging was triggered by the rules of this specification, A and B will have the same value; if not, applications may choose either value.

  4. Set C's [scope] property to the value of the [scope] property of A or B. If merging was triggered by the rules of this specification, A and B will have the same value; if not, applications may choose either value.

  5. Set C's [variants] property to the union of the values of the [variants] properties of A and B.

Once merging is complete, A and B must be removed from the set, and C added in their place.
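The procedure can be sketched in Python as follows. The sketch is hypothetical: variant items are reduced to plain strings, and a list stands in for the [base names] set so that its members can be replaced in place.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # items are removed from the list by identity
class BaseName:
    value: str
    scope: frozenset = frozenset()
    source_locators: set = field(default_factory=set)
    variants: set = field(default_factory=set)  # variant values, simplified


def merge_base_names(a: BaseName, b: BaseName, names: list) -> BaseName:
    """Merge the equal base name items a and b held in `names`."""
    # Value and scope are equal for A and B when equality triggered the merge.
    c = BaseName(a.value, a.scope)
    c.source_locators = a.source_locators | b.source_locators
    c.variants = a.variants | b.variants
    # Remove A and B from the set and add C in their place.
    names[:] = [n for n in names if n is not a and n is not b] + [c]
    return c


a = BaseName("Norge", source_locators={"#n1"}, variants={"norge"})
b = BaseName("Norge", source_locators={"#n2"}, variants={"NORGE"})
names = [a, b]
c = merge_base_names(a, b, names)
print(names == [c], sorted(c.source_locators), sorted(c.variants))
# True ['#n1', '#n2'] ['NORGE', 'norge']
```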

4.3 Merging topic maps

Topic map merging is an operation that, given two separate topic maps, produces a single topic map containing all the statements made by the two source topic maps. This specification is not concerned with when topic map merging occurs, but specifies a minimal procedure for merging topic maps. Topic map processors are free to make additional merges based on extra information available to them.

The procedure of merging two topic maps uses two topic maps, a master topic map A and a subordinate topic map B. All information items in B are merged into A, and A contains the information from both topic maps after the merge is complete.

  1. Set the [topics] property of A to the union of the [topics] properties of A and B.

  2. Set the [associations] property of A to the union of the [associations] properties of A and B.
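The procedure can be sketched in Python, with the items of B inserted one by one into A, the topic map that holds the result. This is an illustration only: `add_topic` and `add_association` are hypothetical hooks that insert an item into a set property and perform whatever merging the insertion triggers, and the usage example passes a simplistic stand-in that only skips exact duplicates.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class TopicMap:
    topics: list = field(default_factory=list)        # stands in for a set
    associations: list = field(default_factory=list)  # stands in for a set


def merge_topic_maps(a: TopicMap, b: TopicMap, add_topic, add_association) -> None:
    """Merge subordinate topic map b into master topic map a, in place.

    add_topic and add_association are hypothetical hooks: each adds one item
    to a set property and performs any merging the insertion triggers.
    """
    for topic in b.topics:
        add_topic(a.topics, topic)
    for association in b.associations:
        add_association(a.associations, association)


def add_unique(items, item):
    # Simplistic stand-in that only skips exact duplicates; no real merging.
    if item not in items:
        items.append(item)


master = TopicMap(topics=["oslo", "bergen"])
sub = TopicMap(topics=["bergen", "tromso"], associations=[("capital-of", "oslo", "norway")])
merge_topic_maps(master, sub, add_unique, add_unique)
print(master.topics, len(master.associations))
# ['oslo', 'bergen', 'tromso'] 1
```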

4.4 Merging variant items

Two variant items, A and B, are merged by following the procedure below.

  1. Create a new variant item, C.

  2. Set C's [source locators] property to the union of the values of A's and B's [source locators] properties.

  3. Set C's [value], [resource], and [scope] properties to the values of A's [value], [resource], and [scope] properties, respectively. B's values are equal to those of A, and therefore need not be taken into account.

Once merging is complete, A and B must be removed from the set, and C added in their place.

4.5 Merging occurrence items

Two occurrence items, A and B, are merged by following the procedure below.

  1. Create a new occurrence item, C.

  2. Set C's [source locators] property to the union of the values of A's and B's [source locators] properties.

  3. Set C's [value], [resource], [scope], and [occurrence type] properties to the values of A's [value], [resource], [scope], and [occurrence type] properties, respectively. B's values are equal to those of A, and therefore need not be taken into account.

Once merging is complete, A and B must be removed from the set, and C added in their place.

4.6 Merging association items

To merge two association items, A and B, follow the procedure below.

  1. Create a new association item, C.

  2. Set C's [source locators] property to the union of the values of A's and B's [source locators] properties.

  3. Set C's [scope], [roles], and [association type] properties to the values of A's [scope], [roles], and [association type] properties, respectively. B's values are equal to those of A, and therefore need not be taken into account.

Once merging is complete, A and B must be removed from the set, and C added in their place.

4.7 Merging association role items

Two association role items, A and B, must be merged by following the procedure below.

  1. Create a new association role item, C.

  2. Set C's [source locators] property to the union of the values of A's and B's [source locators] properties.

  3. Set C's [role type] and [role playing topic] properties to the values of A's [role type] and [role playing topic] properties, respectively. B's values are equal to those of A, and therefore need not be taken into account.

Once merging is complete, A and B must be removed from the set, and C added in their place.

4.8 Merging locator items

Locator items are not merged. If a locator item being added to a set is equal to one already in it, the new locator item can be discarded as redundant.

5 Published subjects

A published subject indicator is a subject indicator that is published and maintained at an advertised location for the purposes of supporting topic map interchange and mergeability. A published subject is any subject for which there exists at least one published subject indicator. A published subject identifier is the subject identifier of a published subject indicator.

This section defines a number of published subjects in the expectation that many topic map applications will need them. These subjects form a central part of the topic map standard, yet there is no requirement that applications use them. Applications are free to define their own alternatives.

Ed. Note:(larsga)
Make sure we conform to the PubSubj TC guidelines (once they are in place).

Issue (psi-set-psi):

How does one uniquely identify the set of published subjects defined in SAM? Is there a need to do so? Is a published subject for these published subjects needed? (Does it include itself?)

Issue (psi-identification):

How does one determine which subjects are published subjects and which are not? Is it necessary for the SAM model to provide a mechanism for this at all?

Issue (psi-topicmaps.org):

Should the subject identifiers defined by XTM 1.0 be retained as they are, or should new equivalent ones be defined to replace the originals?

Issue (psi-publishing):

Where should the indicators for the subjects published in the new ISO 13250 be published?

5.1 The type-instance relationship

A type is a set of individual subjects, each of which is an instance of the type. Types are generally used to represent sets of subjects which share some commonality, but this specification does not restrict the possible uses of types, or their meanings. A type may itself be an instance of another type, and there is no limit to the number of types a subject may be an instance of. Scope applies to this association type in just the same way as it does to any other.

The type-instance relationship is not transitive. That is, if A is a type of which B is an instance, and B is a type of which C is an instance, it does not follow that C is an instance of A.

The type-instance relationship between two topic items can be asserted in topic maps using an association item that conforms to the following rules:

  • The [association type] property must be set to a topic item that has in its [subject identifiers] property a locator item with [notation] set to "URI" and [reference] set to "http://www.topicmaps.org/xtm/1.0/core.xtm#class-instance".

  • The [roles] property must contain exactly two association role items.

  • One of the association role items in the [roles] property must have its [role type] property set to a topic item whose [subject identifiers] property contains a locator item with [notation] set to "URI" and [reference] set to "http://www.topicmaps.org/xtm/1.0/core.xtm#class". The [role playing topic] property of that association role item then contains the topic item representing the type.

  • One of the association role items in the [roles] property must have its [role type] property set to a topic item whose [subject identifiers] property contains a locator item with [notation] set to "URI" and [reference] set to "http://www.topicmaps.org/xtm/1.0/core.xtm#instance". The [role playing topic] property of that association role item then contains the topic item representing the instance.
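The structural rules above can be checked by a function that recognises a type-instance association and returns its type and instance. In this hypothetical Python sketch, subject identifiers are plain URI strings, only the properties the rules mention are modelled, and scope is ignored; `type_instance_pair` is an illustrative name only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

XTM = "http://www.topicmaps.org/xtm/1.0/core.xtm#"
CLASS_INSTANCE, CLASS, INSTANCE = XTM + "class-instance", XTM + "class", XTM + "instance"


@dataclass(eq=False)  # topic items are compared by identity
class Topic:
    name: str
    subject_identifiers: set = field(default_factory=set)  # URI strings


@dataclass(eq=False)
class Role:
    role_type: Topic
    role_playing_topic: Topic


@dataclass(eq=False)
class Association:
    association_type: Optional[Topic]
    roles: List[Role]


def type_instance_pair(assoc: Association) -> Optional[Tuple[Topic, Topic]]:
    """Return (type, instance) if assoc follows the rules of section 5.1, else None."""
    at = assoc.association_type
    if at is None or CLASS_INSTANCE not in at.subject_identifiers:
        return None
    if len(assoc.roles) != 2:
        return None
    players = {}
    for role in assoc.roles:
        for psi in (CLASS, INSTANCE):
            if psi in role.role_type.subject_identifiers:
                players[psi] = role.role_playing_topic
    if CLASS not in players or INSTANCE not in players:
        return None  # structurally non-conforming: interpretation undefined
    return players[CLASS], players[INSTANCE]


ci = Topic("class-instance", {CLASS_INSTANCE})
cls, inst = Topic("class", {CLASS}), Topic("instance", {INSTANCE})
city, oslo = Topic("city"), Topic("oslo")
pair = type_instance_pair(Association(ci, [Role(cls, city), Role(inst, oslo)]))
print(pair[0].name, pair[1].name)  # city oslo
print(type_instance_pair(Association(ci, [Role(cls, city)])))  # None
```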

Associations that use one or more of the published subjects defined in this section, but which do not conform to these structural rules, are not considered to represent type-instance relationships. Their interpretation is not defined by this international standard.

This specification does not require implementations to actually represent the type-instance relationship using associations, as long as they conform to the rules of this specification.

5.2 The supertype-subtype relationship

A type may be a subtype of another, which is then considered the supertype of the first. If B is a subtype of A, it follows that every instance of B is also an instance of A. The converse is not necessarily true. The relationship between a supertype and its subtypes is known as the supertype-subtype relationship. A type may have any number of subtypes and supertypes. Scope applies to this association type in just the same way as it does to any other.

The supertype-subtype relationship is transitive, which means that if B is a subtype of A, and C a subtype of B, C is also a subtype of A.
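The difference between this relationship and the non-transitive type-instance relationship of section 5.1 can be illustrated in Python. In this hypothetical sketch, `direct` maps each type to its immediate supertypes and `instance_types` maps each subject to the types it is directly an instance of; subjects are plain strings. Supertypes are found by a transitive closure, which uses a visited set so that it terminates even if subtype loops turn out to be allowed, while type-instance links are followed only one step.

```python
def supertypes(direct, t):
    """Return every supertype of t; direct[t] holds t's immediate supertypes.

    The relationship is transitive, so this is a transitive closure. The
    `seen` set makes the walk terminate even if the graph contains loops.
    """
    seen, stack = set(), list(direct.get(t, ()))
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(direct.get(s, ()))
    return seen


def is_instance_of(instance_types, direct, instance, t):
    """An instance of a subtype is an instance of all its supertypes, but
    type-instance links are not followed transitively."""
    own = instance_types.get(instance, set())
    return t in own or any(t in supertypes(direct, o) for o in own)


direct = {"capital": {"city"}, "city": {"place"}}
instance_types = {"oslo": {"capital"}, "capital": {"concept"}}
print(sorted(supertypes(direct, "capital")))                     # ['city', 'place']
print(is_instance_of(instance_types, direct, "oslo", "place"))    # True
print(is_instance_of(instance_types, direct, "oslo", "concept"))  # False
```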

Issue (psi-subclassing-loops):

Are subtype loops allowed?

The supertype-subtype relationship between two types can be asserted in topic maps using an association item that conforms to the following rules:

  • The [association type] property must be set to a topic item that has in its [subject identifiers] property a locator item with [notation] set to "URI" and [reference] set to "http://www.topicmaps.org/xtm/1.0/core.xtm#superclass-subclass".

  • The [roles] property must contain exactly two association role items.

  • One of the association role items in the [roles] property must have its [role type] property set to a topic item whose [subject identifiers] property contains a locator item with [notation] set to "URI" and [reference] set to "http://www.topicmaps.org/xtm/1.0/core.xtm#superclass". The [role playing topic] property of that association role item then contains the topic item representing the supertype.

  • One of the association role items in the [roles] property must have its [role type] property set to a topic item whose [subject identifiers] property contains a locator item with [notation] set to "URI" and [reference] set to "http://www.topicmaps.org/xtm/1.0/core.xtm#subclass". The [role playing topic] property of that association role item then contains the topic item representing the subtype.

Associations that use one or more of the published subjects defined in this section, but which do not conform to these structural rules, are not considered to represent supertype-subtype relationships. Their interpretation is not defined by this international standard.

5.3 Variant name scopes

The subject identifier http://www.topicmaps.org/xtm/1.0/core.xtm#sort (notation "URI") identifies the notion of suitability of a variant name for use as a sort key for a subject. A variant item that has a topic with this subject identifier in its [scope] property represents a variant name intended to be used as a sort key for the topic item it belongs to.

Issue (op-sorting):

Does this standard need to define how sorting of topics is done? It is a highly fundamental operation. On the other hand, users may want flexibility in this regard.

The subject identifier http://www.topicmaps.org/xtm/1.0/core.xtm#display (notation "URI") identifies the notion of suitability of a variant name for use as a display name for a subject. A variant item that has a topic with this subject identifier in its [scope] property represents a variant name intended to be used as a label for the topic item it belongs to. This published subject is included only for backwards compatibility with ISO 13250:2000.
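Selecting a variant by scope can be sketched as below. The function `pick_variant` is hypothetical and embodies just one possible policy (first match, with the base name itself as fallback), since this standard does not constrain how the choice is made; scopes are modelled as sets of subject identifiers standing in for the scoping topics.

```python
XTM = "http://www.topicmaps.org/xtm/1.0/core.xtm#"
SORT, DISPLAY = XTM + "sort", XTM + "display"


def pick_variant(base_name, variants, wanted):
    """Return the first variant value whose scope contains `wanted`, else the
    base name itself. Each variant is a (value, scope) pair, where scope is a
    set of subject identifiers standing in for the scoping topics."""
    for value, scope in variants:
        if wanted in scope:
            return value
    return base_name


variants = [("norway", {SORT}), ("NORWAY", {DISPLAY})]
print(pick_variant("Norway", variants, SORT))     # norway
print(pick_variant("Norway", variants, DISPLAY))  # NORWAY
print(pick_variant("Norway", [], SORT))           # Norway
```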

Issue (psi-display-name):

If the topic naming constraint is not retained by the standard, is there then any need for this published subject? Or will the base name then take over the function previously fulfilled by this published subject?

6 Conformance

A topic map processor conforms to this standard provided it meets the requirements listed below.

  • The topic map processor must make all the information described in 3 Information item types available to applications, and document how its representation of topic maps corresponds to the model defined in that section.

    Information beyond that described by the model defined in this international standard may be provided, but none may be left out. Processors may support only locators that use the "URI" notation, and may represent such locators as strings.

  • The topic map processor must be able to detect and report all violations of the SAM constraints.

  • The topic map processor must detect all attempts to add duplicate values to set properties, and also perform all merges according to the rules of section 4 Merging.

  • It is not required that topic map processors treat computed properties differently from the other properties in any way. Their values must be the same as if they had been computed using the procedure defined in this specification.

A References

Davies
Europe: A History, by Norman Davies, Oxford University Press, 1996, ISBN 0-19-820171-0.
infoset
XML Information Set, J. Cowan and R. Tobin, Editors. World Wide Web Consortium. 24 October 2001. The latest version of XML Information Set is available at http://www.w3.org/TR/xml-infoset.
ISO13250
ISO/IEC 13250:2002 Topic Maps, ISO, Geneva, 2002.
RFC2396
IETF RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax, T. Berners-Lee, R. Fielding, L. Masinter, 1998.
tm-guide
Guide to the topic map standardization process, Lars Marius Garshol, 2002-06-23, ISO/IEC JTC1 SC34/N0323.
UML
...
unicode
The Unicode Consortium. The Unicode Standard, Version 3.0. Reading, Mass.: Addison-Wesley Developers Press, 2000. ISBN 0-201-61633-5.
XTM
XML Topic Maps (XTM) 1.0 Specification, TopicMaps.Org, 2001.

B Resolved issues (Non-Normative)

This section contains all the issues that have been present in earlier versions of this document, but have since been resolved. They are included here for reference. Follow the links in order to find the resolutions, as well as background material on each issue.

Issue (type-vs-class):

Should all types be called classes, all classes types, or should both terms be used? Which terms should be used where?

Issue (prop-classes):

Should we have the topic.[classes] property? It means either that classness cannot be scoped, or that classness has a double representation. The question is: is scoping of classness important? Does it cause problems for implementations and applications?

Issue (meta-uml-normative):

Should the UML diagrams be made normative?

Issue (occurrence-traversal):

Do the 'linktrav' and 'listtrav' attributes of the HyTM syntax have model significance?

Issue (prop-subj-address-name):

Is topic.[subject address] the right name for the property? We have [subject indicator], not [subject identifier], so why [subject address] rather than [subject resource]?

Issue (xtm-def-occurrence-type):

According to XTM 1.0 the default occurrence type is the "occurrence" published subject. Should this standard follow its lead? If so, what does it mean?

Issue (psi-generics):

Should the "topic", "association", and "occurrence" PSIs be specified in the SAM? If so, what do they mean, and what is their function?

Issue (term-subject-address):

The term "subject address" does not correlate with the term "subject indicator", since "subject address" corresponds more clearly to "subject identifier". A better name should be found.

Issue (term-tm-processor):

Do the definitions of the terms 'topic map processor' and 'topic map applications' have unwanted consequences for what software architectures can be conforming implementations of this standard?

Issue (term-theme):

Is the term 'theme' useful, or best forgotten?

Issue (xtm-implicit-constraints):

The XTM DTD contains a number of implicit constraints, such as that an addressable subject may not be used as a theme or a type. Should these constraints be mirrored in the SAM?

Issue (term-empty-string):

Do we need to define what the empty string is? What about non-empty string (used liberally throughout)?

Issue (term-empty-set):

Do we need to define what the empty set is? And the empty string?

Issue (op-topic-map-compare):

Should we define an equality criterion for topic map items? There is no need for duplicate removal for topic maps, but on the other hand that would be what is needed to define the conformance requirements on serialization implementations.

Issue (prop-schema):

Should topic map items have a [schema] property that may contain their schema definition(s)? This would make it clear where to find the topic map schema. On the other hand, the TMCL specification should perhaps have its own rules for specifying how to find the schema of a topic map. It may be better to keep the levels strictly apart.

Issue (term-subject-def):

Should the standard say as little as possible about the nature of subjects, or should it be more detailed in order to provide guidance to readers? The current text is detailed, but may be too much so.

Issue (term-subject-identity):

Is the term "subject identity" needed? It is defined in XTM 1.0, but it is not clear that there is any use for it. The XTM 1.0 definition is: "That which makes two subjects identical, or distinguishes one subject from another."

Issue (term-topic-characteristic-assignment):

Are topic characteristic assignments statements about the topic's subject, or about the topic?

Issue (merge-same-subj-ind-addr):

Is it allowed for a topic to have the same locator item both as subject identifier and as subject address? If it does, must not this mean that the topic has two subjects?

Issue (term-topic-name):

XTM 1.0 has a term "topic name", but it is not clear how it relates to the term "base name". Its use in XTM 1.0 seems to be inconsistent. Is the term useful, or should it be abandoned?

Issue (prop-value-null):

If it may not be null, why may it be empty?

Issue (variant-in-basename):

ISO 13250:2000 does not restrict display/sort names to a single base name, by design. Is it acceptable for SAM to do so?

Issue (assoc-role-player-type):

Must both [role playing topic] and [role type] have values?

Issue (merge-prop-srclocs):

Should source locators of B be copied into A? If they are, it is implied that A is the same topic map as B, which is not true. Also, topics reifying B will then also reify A, which means that any statements made about B will also apply to A.