JTC 1 / SC 34 N 431


Information Technology --

Document Description and Processing Languages

Title: Requirements for the Canonical XML Topic Maps Specification
Source: Khalil Ahmed, JTC1/SC34
Project: JTC 1 NP Number – ISO/IEC 13250-4
Project editor: Steve Pepper
Status: Author's draft
Action: For Review and Comment
Date: 2003/07/28
Distribution: National Bodies and Liaisons of SC34
Refer to:

Requirements for the Canonical XML Topic Maps Specification

1. Background

The development of a canonicalisation of the topic map data model was first proposed in the roadmap for the ISO 13250 family of standards [N278]. The first draft proposal for Canonical XML Topic Maps [N0395] was developed by Steve Pepper.

2. Introduction

2.1 Purpose and Scope

Canonicalisation is the process by which an instance of a data model is reduced to a serialised form such that any two logically equivalent instances produce byte-for-byte identical serialisations.
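As a minimal illustration of this definition, the following Python sketch canonicalises a deliberately tiny, hypothetical model: an unordered set of (subject identifier, name) pairs, standing in for the far richer Topic Maps Data Model. Two instances that differ only in ordering and duplication produce the same bytes because the serialiser fixes the order, the separators and the encoding. It assumes names contain no tab or newline characters.

```python
def canonical_bytes(pairs):
    """Serialise an unordered collection of (identifier, name) pairs so that
    any two logically equivalent collections yield identical bytes.
    Hypothetical toy model; assumes no tabs or newlines inside values."""
    lines = []
    for identifier, name in sorted(set(pairs)):  # fixed order, duplicates dropped
        lines.append(f"{identifier}\t{name}\n")  # fixed separators
    return "".join(lines).encode("utf-8")  # fixed encoding


a = [("http://example.org/puccini", "Puccini"),
     ("http://example.org/tosca", "Tosca")]
b = [("http://example.org/tosca", "Tosca"),
     ("http://example.org/puccini", "Puccini"),
     ("http://example.org/tosca", "Tosca")]

assert canonical_bytes(a) == canonical_bytes(b)
```

The hard part that CXTM must address, and that this toy sidesteps, is that Topic Maps Data Model instances are graphs whose items need not carry any sortable identifier.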

Canonical XML Topic Maps (CXTM) will define a means of expressing, in a canonical form, a topic map that has been processed according to the processing rules defined in ISO 13250-2: Topic Maps -- Data Model [DM]. The canonicalisation will be based on the model defined by ISO 13250-2, hereafter referred to as the Topic Maps Data Model. Such a canonical form will allow the instance of the Topic Maps Data Model constructed by one topic map processor to be compared directly with the instance constructed by another topic map processor.

This document defines the requirements for the CXTM specification and is intended as a guide to the principles that should be applied in developing the specification.

CXTM is to be developed by ISO/IEC JTC1 SC34 WG3. This requirements document is intended for members of the committee and for implementors who have expressed an interest in the development of a canonical expression of the Topic Maps Data Model.

2.2 Definitions, Acronyms and Abbreviations

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are used in this document as defined in [RFC 2119].

2.3 References

  1. [13250] ISO/IEC 13250:2003 Topic Maps, ISO, Geneva, 2002.
  2. [CXML] Canonical XML Version 1.0, W3C, 2001-03-15.
  3. [DM] ISO 13250-2: Topic Maps -- Data Model, Lars Marius Garshol, Graham Moore, 2002-12-04, ISO/IEC JTC1 SC34 N0356.
  4. [N278] Roadmap of the Family of Topic Map Standards, Lars Marius Garshol, 2002-12-17, ISO/IEC JTC1 SC34 N0278.
  5. [N323] Guide to the topic map standardization process, Lars Marius Garshol, 2002-06-23, ISO/IEC JTC1 SC34 N0323.
  6. [N388] Summary of Voting on SC 34 N 358 – Restatement of Topic Maps, ISO/IEC 13250:2002, 2003-03-26, ISO/IEC JTC1 SC34 N0388.
  7. [RFC 2119] Key words for use in RFCs to Indicate Requirement Levels, S. Bradner, March 1997.

3. Overview of CXTM Requirements

3.1 Dependencies

Existing algorithms for the canonicalisation of data structures, and of graph structures in particular, may contribute to the definition of CXTM. CXTM will be based on the next version of ISO 13250, as defined by [N323], and will be defined in terms of a canonical representation of the Topic Maps Data Model as defined by [DM].

The W3C recommendation on Canonical XML [CXML] may contribute to the definition of CXTM.

3.2 CXTM MUST specify

3.2.1 Canonicalisation of the data model defined by ISO 13250-2: Topic Maps -- Data Model

CXTM shall be specified in terms of the data model defined in ISO 13250-2 and will not be specified in terms of any serialization format for topic maps. This allows CXTM to support any syntax from which valid instances of the Topic Maps Data Model can be constructed, including, but not limited to, the XML syntax defined by ISO 13250-3: Topic Maps -- XML Syntax and the HyTime-based syntax defined by ISO 13250-4: Topic Maps -- HyTime Syntax.

3.2.2 Canonical Object Identity

For each type of information item defined by the Topic Maps Data Model, CXTM must define an algorithm for expressing the identity of an information item of that type. The information item identity must be unique to that information item in a given instance of the Topic Maps Data Model.
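One possible way to meet this requirement, sketched here in Python purely as an illustration, is to derive each item's identity from its type name and its position in the canonical sort order of that type (see 3.2.3). The function name, the `type.position` identity format and the `sort_key` parameter are all hypothetical; the sketch assumes `sort_key` totally orders the items of each type and that items are hashable.

```python
def assign_identities(items_by_type, sort_key):
    """items_by_type maps a type name (e.g. "topic") to a list of items.
    sort_key is assumed to totally order the items of a single type.
    Returns a dict mapping (type name, item) to an identity string that is
    unique within the instance: the type prefix separates types, and the
    1-based position separates items of the same type."""
    identities = {}
    for type_name in sorted(items_by_type):
        ordered = sorted(items_by_type[type_name], key=sort_key)
        for position, item in enumerate(ordered, start=1):
            identities[(type_name, item)] = f"{type_name}.{position}"
    return identities
```

Because the identity depends only on the canonical order, two processors that agree on the sort order will also agree on every item's identity.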

3.2.3 Canonical Sort Order

Information Item Type Sort Order

CXTM must define a canonical ordering of the types of information items defined by the Topic Maps Data Model. If information item type A is sorted higher than information item type B, then all information items of type A will be treated as sorting higher than any information item of type B.

Instance Sort Order

For each information item type defined by the Topic Maps Data Model, CXTM must define a comparison algorithm that allows two information items of that type from the same instance of the Topic Maps Data Model to be compared and their relative positions in a canonical sort order established. The canonical sort order must be consistent: it must be transitive, so that if A > B and B > C then A > C, and it must be deterministic, so that comparing A1 with B1 gives the same result as comparing A2 with B2 whenever A1 == A2 and B1 == B2. Additionally, no two different information items in a given instance of the Topic Maps Data Model may compare as equal under the canonical sort order.
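The combination of a type sort order with per-type instance sort orders can be sketched in Python as a composite sort key. Everything here is a hypothetical placeholder: the type ranking, the `Item` class, and the assumption that each item already carries a string `instance_key` derived from its properties. A tuple key compared lexicographically is transitive and deterministic by construction; the sketch also enforces the "no two different items compare as equal" rule by rejecting ties.

```python
from dataclasses import dataclass

# Hypothetical ranking of information item types (lower rank sorts first).
# The actual ranking is for CXTM to define.
TYPE_RANK = {"topic_map": 0, "topic": 1, "association": 2, "occurrence": 3}


@dataclass(eq=False)  # eq=False: distinct Item objects are distinct items
class Item:
    type_name: str
    instance_key: str  # assumed: a string derived from the item's properties


def canonical_key(item):
    """Order first by type rank, then by the per-type instance key."""
    return (TYPE_RANK[item.type_name], item.instance_key)


def canonical_sort(items):
    """Sort items canonically, rejecting any tie between different items."""
    ordered = sorted(items, key=canonical_key)
    for left, right in zip(ordered, ordered[1:]):
        if canonical_key(left) == canonical_key(right):
            raise ValueError(f"{left!r} and {right!r} compare as equal")
    return ordered
```

The difficult part, which this sketch assumes away, is defining an `instance_key` for each item type that actually separates all distinct items.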

The instance sort order algorithms must produce the same results on all machines, regardless of locale, operating system, programming language and input format.
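Locale-sensitive collation is a common source of platform-dependent ordering. A minimal Python sketch of a locale-independent string key compares strings by their UTF-8 encoding, which orders strings exactly as Unicode code point order does. Whether CXTM normalises strings first is an open choice; the sketch assumes NFC normalisation so that precomposed and decomposed forms of the same character sort identically.

```python
import unicodedata


def locale_independent_key(text):
    """Sort key based only on code points, never on the current locale.
    UTF-8 byte order matches Unicode code point order.
    Assumption: strings are normalised to NFC before comparison."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


names = ["zebra", "Zebra", "Äpfel", "apple"]
print(sorted(names, key=locale_independent_key))
# ['Zebra', 'apple', 'zebra', 'Äpfel']
```

The result is not what a human reader would expect from a dictionary, but it is the same on every machine, which is the property required here.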

The instance sort order must not be sensitive to changes in the URI from which the input files are loaded.
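One hypothetical way to satisfy this requirement is to express locators that point into the loaded document's own location relative to the directory of its base URI before they are used in a sort key. The Python sketch below shows that rule with simple string handling; it ignores query strings, `..` segments and other URI subtleties, and a real specification would need a precise rule.

```python
def relative_locator(locator, base_uri):
    """Hypothetical rule: express a locator relative to the directory of the
    document it was loaded from, so that moving the input files does not
    change the sort key. Locators outside that directory are unchanged."""
    base_dir = base_uri.rsplit("/", 1)[0] + "/"
    if locator.startswith(base_dir):
        return locator[len(base_dir):]
    return locator
```

With this rule, a topic addressed as `opera.xtm#tosca` receives the same key whether the file was loaded from a local path or from a web server.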

3.2.4 Topic Maps Data Model Instance Comparison Algorithm

The Topic Maps Data Model instance comparison algorithm will enable two model instances to be compared and be declared to be either canonically equivalent or not.

Editor's Note: This section will be removed. It remains at the moment to preserve section numbering for comments.

3.2.5 An XML representation of the canonicalised model

An XML representation of the canonicalised model will allow the canonicalised output of a topic map processor to be written to a file, supporting the development of test suites. Further, it may be possible to use XML canonicalisation algorithms to compare XML representations of canonicalised instances of the Topic Maps Data Model directly.
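To illustrate the second point, Python's standard library provides `xml.etree.ElementTree.canonicalize` (Python 3.8 and later), which implements Canonical XML 2.0 rather than the Canonical XML 1.0 recommendation cited as [CXML]. The `topic` element and its attributes below are hypothetical stand-ins for whatever XML vocabulary CXTM eventually defines. Two serialisations that differ only in attribute order, quoting and empty-element syntax canonicalise to the same string.

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+, C14N 2.0

# Two serialisations of the same hypothetical element that differ only in
# attribute order, quote style, whitespace and empty-element syntax.
first = '<topic id="topic.1" type="person"/>'
second = "<topic type='person'   id='topic.1'></topic>"

assert canonicalize(first) == canonicalize(second)
print(canonicalize(first))
# <topic id="topic.1" type="person"></topic>
```

XML canonicalisation only removes differences in XML syntax; it cannot reconcile two files whose elements appear in different orders, which is why the model-level sort order of 3.2.3 is still needed.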

3.2.6 Rules For Determining Non-Canonicalisable Topic Maps

It has not yet been proven that a deterministic canonicalisation algorithm can be found for all topic maps. The committee will strive to develop a completely deterministic algorithm; however, if the final algorithm cannot handle certain topic map model instances, the specification will define rules for identifying such instances so that users of this standard can avoid them.
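A typical source of non-determinism is a set of distinct items that no sort key can tell apart, for example two topics with no subject identifiers and no names. The Python sketch below shows one way such a rule might be checked: group items by their sort key and report any group with more than one member. The `Topic` class and `topic_key` function are hypothetical; a richer key (for example one that also considers association roles) could separate some of these items, but fully symmetric structures would remain indistinguishable.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: distinct Topic objects are distinct topics
class Topic:
    identifiers: list = field(default_factory=list)
    names: list = field(default_factory=list)


def topic_key(topic):
    """Hypothetical sort key built only from identifiers and names."""
    return (tuple(sorted(topic.identifiers)), tuple(sorted(topic.names)))


def find_indistinguishable(items, sort_key):
    """Return groups of distinct items that share a sort key. A non-empty
    result means the key cannot order this instance deterministically."""
    groups = defaultdict(list)
    for item in items:
        groups[sort_key(item)].append(item)
    return [group for group in groups.values() if len(group) > 1]
```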

3.3 CXTM SHOULD specify

Anything to add here?

3.4 CXTM MAY specify

Anything to add here?

4. Non-Requirements

4.1 CXTM WILL NOT specify

4.1.1 Expression of Error Conditions

When processing a topic map, a processor may be required to report one or more error conditions. In such cases it may not be possible for the processor to construct an instance of the Topic Maps Data Model, and no canonicalisation will be defined.

4.1.2 Test Suite

It is recommended that a test suite be developed, but its development is outside the scope of this ISO project. We encourage other standards bodies, such as OASIS, to consider undertaking it.

Such a test suite could consist of a set of topic maps, in any of the syntaxes that map to the Topic Maps Data Model, together with XML representations of the CXTM instance expected after processing each topic map. The test suite could also define combinations of topic maps to be merged, together with the CXTM instance expected after merge processing. Finally, it could define topic maps that are either invalid against their syntax schema or expected to cause one or more errors to be raised during processing.
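A harness for such a suite could be very simple, because canonical output can be compared byte for byte. The Python sketch below assumes a hypothetical `canonicalise` function supplied by the processor under test, and a hypothetical convention that an expected value of `None` marks a case that is expected to raise an error.

```python
def run_suite(cases, canonicalise):
    """cases maps a test name to (input, expected), where expected is the
    canonical serialisation as bytes, or None if processing should fail.
    canonicalise is the (hypothetical) processor under test.
    Returns the names of failing cases in sorted order."""
    failures = []
    for name in sorted(cases):
        source, expected = cases[name]
        try:
            actual = canonicalise(source)
        except Exception:  # any processing error counts as "no output"
            actual = None
        if actual != expected:  # byte-for-byte comparison
            failures.append(name)
    return failures
```

Merge tests fit the same shape if the input is a collection of topic maps rather than a single one.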

5. Contributors

The editor would like to thank the following individuals for their contributions of review comments and additional requirements to this document.

Lars Marius Garshol, Steve Pepper