ISO/IEC JTC 1/SC 34 N0905
ISO/IEC JTC 1/SC 34
Information Technology --
Document Description and Processing Languages
|TITLE:||Text for CD Ballot - ISO/IEC CD 13250-6 Information Technology - Topic Maps - Compact Syntax (CTM)|
|SOURCE:||Mr. Steve Pepper|
|PROJECT:||CD 13250-6: Information technology - Topic Maps - Compact syntax|
|PROJECT EDITOR:||Mr. Lars Heuer; Mr. Gabriel Hopmans; Dr. Sam Gyun Oh; Mr. Steve Pepper|
|STATUS:||Committee Draft (CD)|
|ACTION:||For National Body ballot|
|DISTRIBUTION:||SC34 and Liaisons|
|REFER TO:||N0905b - 2007-09-09 - Ballot due 2007-12-09 - ISO/IEC CD 13250-6 Information Technology - Topic Maps - Compact Syntax (CTM)|
Dr. James David Mason
(ISO/IEC JTC 1/SC 34 Secretariat - Standards Council of Canada)
Crane Softwrights Ltd.
Kars, ON K0A-2E0 CANADA
Telephone: +1 613 489-0999
Facsimile: +1 613 489-0995
Network: [email protected]
Topic Maps — Compact Syntax
|3.1||About the syntax|
|3.3||Common syntactical constructs|
|3.3.2||Creating IRIs from strings|
|3.3.3||Creating IRIs from QNames|
|3.5.4||Topic Map Reifier|
|3.13.4||Template Import Directive|
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
ISO/IEC 13250-6 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information Technology, Subcommittee SC 34, Document Description and Processing Languages.
ISO/IEC 13250 consists of the following parts, under the general title Topic Maps:
- Part 1: Overview and Basic Concepts
- Part 2: Data Model
- Part 3: XML Syntax
- Part 4: Canonicalization
- Part 5: Reference Model
- Part 6: Compact Syntax
- Part 7: Graphical Notation
CTM (Compact Topic Maps) is a text-based notation for representing topic maps. It provides a simple, lightweight notation that complements the existing XML-based interchange syntax defined in [XTM] and can be used for
- manually authoring topic maps;
- providing human-readable examples in documents;
- serving as a common syntactic basis for TMCL and TMQL.
The principal design criteria of CTM are compactness, ease of human authoring, maximum readability, and comprehensiveness rather than completeness. CTM supports all constructs of the [TMDM], except item identifiers on constructs that are not topics.
Since CTM is not designed primarily as an interchange syntax, care should be taken when using CTM as a basis for interchanging topic maps.
This part of ISO/IEC 13250 should be read in conjunction with [TMDM], since the interpretation of the CTM syntax is defined through a mapping from the syntax to the data model defined there.
Topic Maps — Compact Syntax
This part of ISO/IEC 13250 defines a text-based notation for representing instances of the data model defined in [TMDM]. It also defines a mapping from this notation to the data model. The syntax is defined through an EBNF grammar.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
Each of the following documents has a unique identifier that is used to cite the document in the text. The unique identifier consists of the part of the reference up to the first comma.
Unicode, The Unicode Standard, Version 5.0.0, The Unicode Consortium, Reading, Massachusetts, USA, Addison-Wesley Developer's Press, 2007, ISBN 0-321-48091-0, http://www.unicode.org/versions/Unicode5.0.0/
XML 1.0, Extensible Markup Language (XML) 1.0, W3C, Third Edition, W3C Recommendation, 04 February 2004, http://www.w3.org/TR/REC-xml/
TMDM, ISO 13250-2 Topic Maps — Data Model, ISO, 2006, Lars Marius Garshol, Graham Moore, http://www.isotopicmaps.org/sam/sam-model/
XTM, ISO 13250-3 Topic Maps — XML Syntax, ISO, 2006, http://www.isotopicmaps.org/sam/sam-xtm/
XSDT, XML Schema Part 2: Datatypes Second Edition, W3C, W3C Recommendation, 28 October 2004, http://www.w3.org/TR/xmlschema-2/
RFC3986, RFC 3986 - Uniform Resource Identifiers (URI): Generic Syntax, T. Berners-Lee, R. Fielding, L. Masinter, 2005, http://www.ietf.org/rfc/rfc3986
RFC3987, RFC3987 - Internationalized Resource Identifiers (IRIs), M. Duerst, M. Suignard, 2005, http://www.ietf.org/rfc/rfc3987.txt
3 Syntax description
3.1 About the syntax
The acronym CTM is often used to refer to the syntax defined in this part of ISO/IEC 13250. Its full name is Compact Topic Maps Syntax.
This clause defines the syntax of CTM documents using an EBNF grammar based on the notation described in [XML 1.0]. It also defines the semantics of such documents using prose that specifies the mapping from CTM documents to [TMDM]. The full EBNF can be found in Annex A.
The process of exporting a topic map from an implementation's internal representation of the data model to an instance of a Topic Maps syntax is known as serialization. The opposite process, deserialization, is the process of building an instance of an implementation's internal representation of the data model from an instance of a Topic Maps syntax.
This clause defines how instances of the CTM syntax are deserialized into instances of the data model defined in [TMDM]. Serialization is only implicitly defined, but for any data model instance the CTM serialization produced by an implementation should, when deserialized to a new data model instance, produce one that has the same canonicalization as the original data model instance, according to [ISO 13250-4].
The input to the deserialization process is:
A CTM document.
An absolute IRI. This is the IRI from which the CTM document was retrieved, known as the document IRI. This IRI shall always be provided, as it is necessary in order to assign the item identifiers of the topic items created during deserialization. If the CTM document was not read from any particular IRI the application is responsible for providing an IRI considered suitable.
Deserialization is performed by processing each component of the document in document order. Components are defined in terms of text that matches a syntactic variable of the EBNF. For each component encountered the operations specified in the clause for the corresponding syntactic variable are performed.
Whenever a new information item is created, those of its properties which have set values are initialized to the empty set; all other properties are initialized to null.
Each CTM processor shall be aware of the following prefixes:
This is the namespace for the XML Schema Datatypes.
This is the namespace for this part of ISO/IEC 13250.
3.3 Common syntactical constructs
Comments are fragments of the character stream which are ignored by a CTM processor. Comments are allowed where whitespace characters are allowed. Comments are introduced by a hash (#) and continue until the end of the current line, or until the end of the text stream, whichever comes first.
3.3.2 Creating IRIs from strings
|||iri||→||'An absolute IRI according to RFC 3987'|
|||relative-iri||→||'A relative IRI according to RFC 3987'|
|||doc-reference||→||iri-ref | relative-iri|
To create an IRI from a string, unescape the string by replacing %HH escape sequences with the characters they represent, and decode the resulting character sequence from UTF-8 to a sequence of abstract Unicode characters. The resulting string is turned into an absolute IRI by resolving it against the document IRI.
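The procedure above can be sketched in Python as follows. This is a minimal illustration and not part of the normative text: the function name `iri_from_string` and the example IRIs are assumptions, and `urllib.parse.urljoin` implements RFC 3986 reference resolution, which is used here as an approximation of IRI resolution. Note that blindly unescaping reserved characters such as %23 would change the structure of the reference; the sketch does not guard against that.

```python
from urllib.parse import unquote, urljoin


def iri_from_string(raw, document_iri):
    """Unescape %HH sequences as UTF-8, then resolve against the document IRI."""
    # errors="strict" makes malformed UTF-8 raise instead of being replaced
    unescaped = unquote(raw, encoding="utf-8", errors="strict")
    # Absolute inputs are returned unchanged; relative ones are resolved
    return urljoin(document_iri, unescaped)


# Hypothetical document IRI, for illustration only
doc = "http://example.org/maps/beatles.ctm"
relative = iri_from_string("topics/caf%C3%A9", doc)
absolute = iri_from_string("http://psi.example.org/John_Lennon", doc)
```

Resolving against the document IRI is what allows a CTM document to use relative references while still producing absolute locators in the data model.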
3.3.3 Creating IRIs from QNames
QNames are used to abbreviate IRIs. They are declared as follows:
A QName causes a locator to be created. During deserialization, the IRI to which the prefix is bound is concatenated with the local part. The result of such a process is always an absolute IRI.
If the prefix has not been bound to an IRI as specified in 3.13.1, an error shall be flagged.
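As an informal illustration of QName expansion, the following Python sketch concatenates the IRI bound to the prefix with the local part and flags an error for an unbound prefix. The names `expand_qname` and `UnboundPrefixError`, and the representation of the prefix bindings as a dictionary, are assumptions made for this example.

```python
class UnboundPrefixError(Exception):
    """Raised when a QName uses a prefix that has not been bound."""


def expand_qname(qname, prefixes):
    """Expand 'prefix:local' into an IRI using a dict of prefix bindings."""
    prefix, separator, local = qname.partition(":")
    if not separator:
        raise ValueError("not a QName: %r" % qname)
    if prefix not in prefixes:
        raise UnboundPrefixError(prefix)
    # The IRI bound to the prefix is concatenated with the local part
    return prefixes[prefix] + local


bindings = {"wiki": "http://en.wikipedia.org/wiki/"}
locator = expand_qname("wiki:John_Lennon", bindings)
```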
3.3.4 IRI References
IRI references are either QNames or absolute IRIs. They are interchangeable: Everywhere an IRI can be used a QName can also be used (provided the prefix is defined at that point).
3.3.5 Topic References
Topics are referenced by an item identifier, a subject identifier, or a subject locator.
|||topic-ref||→||identifier | subject-identifier | subject-locator | wildcard | named-wildcard|
|||identifier||→||'[_a-zA-Z][a-zA-Z0-9_-.]* @@@TODO: NCName (c.f. XML)'|
During deserialization, one topic item is created for each topic-ref.
If the topic-ref is an identifier and a default prefix has been declared, a locator is created by concatenating the default prefix and the value of the identifier. The locator is added to the [subject identifiers] property of the topic.
If the topic-ref is an identifier and no default prefix has been declared, a locator is created by concatenating the document IRI, a # character, and the value of the identifier. The locator is added to the [item identifiers] property of the topic.
If the topic-ref is specified by a subject identifier, a locator is created and added to the [subject identifiers] property of the topic.
If the topic-ref is specified by a subject locator, a locator is created (the leading = is not part of the locator) and added to the [subject locators] property of the topic.
If the topic-ref is specified by a wildcard, a locator is created. The locator shall use the document locator as base with a concatenated # and a random identifier. That locator shall not be equal to any other subject identifier, subject locator or item identifier contained in the topic map. The unique, newly created locator is added to the [item identifiers] property of the topic. This operation shall not cause any merging operation.
If the topic-ref is specified by a named-wildcard, at the first occurrence of such a construct a unique locator is created which shall use the document locator concatenated with # and a random identifier such that the result is unique in the topic map. This locator is added to the [item identifiers] property of the newly created topic. This operation shall not cause any merging operation. If the named-wildcard is used again, the same topic is reused. No whitespace is allowed between the * and the identifier.
The difference between an ordinary identifier and a named-wildcard is that an identifier causes the creation of the same item identifier every time the topic map is deserialized. A named-wildcard (and an ordinary wildcard) causes the generation of a new, random item identifier every time the topic map is deserialized.
If the topic item created through deserialization of a topic-ref is equal to another topic item (cf. [TMDM] 5.3), the two topic items are merged according to the procedure given in [TMDM].
# A topic referenced by the subject locator "http://www.isotopicmaps.org/"
= http://www.isotopicmaps.org/

# A topic referenced by a subject identifier
http://psi.example.org/John_Lennon

# A topic with a unique, randomly created item identifier. Within the CTM
# document it is not possible to reference this topic by item identifier.
* - "A topic"

# A topic with a unique, randomly created item identifier which may be
# referenced again within the CTM document.
*foo - "A new, unique topic"

# Another reference to the same topic later in the document
*foo isa: subject
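The locator rules for identifiers and wildcards described in this clause can be sketched as follows. This is an informal Python illustration under stated assumptions: the names `locator_for_identifier` and `WildcardLocators` are hypothetical, random identifiers are drawn with `uuid.uuid4`, and the set of locators already present in the topic map is passed in by the caller.

```python
import uuid


def locator_for_identifier(identifier, document_iri, default_prefix=None):
    """Return (property name, locator) for a topic-ref that is an identifier."""
    if default_prefix is not None:
        # A declared default prefix turns the identifier into a subject identifier
        return ("subject identifiers", default_prefix + identifier)
    # Otherwise the identifier becomes an item identifier on the document IRI
    return ("item identifiers", document_iri + "#" + identifier)


class WildcardLocators:
    """Creates item identifiers for wildcards and named wildcards."""

    def __init__(self, document_iri, used_locators):
        self.document_iri = document_iri
        self.used = set(used_locators)  # locators already in the topic map
        self.named_locators = {}

    def anonymous(self):
        # Retry until the locator differs from every known locator
        while True:
            candidate = self.document_iri + "#" + uuid.uuid4().hex
            if candidate not in self.used:
                self.used.add(candidate)
                return candidate

    def named(self, name):
        # The first use creates the locator; later uses return the same one
        if name not in self.named_locators:
            self.named_locators[name] = self.anonymous()
        return self.named_locators[name]
```

Because the wildcard locators are random, a second deserialization of the same document would produce different item identifiers, which is the difference from ordinary identifiers noted above.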
The scope construct is used to assign a scope to an information item.
During deserialization, each topic-ref element is processed according to the procedure described in 3.3.5. These topic items are gathered into a set that is assigned as the value of the [scope] property of the Topic Maps construct being processed.
The reifier construct is used to refer from the Topic Maps construct on which it appears to the topic reifying that construct. The reference is a topic-ref as described in 3.3.5.
During deserialization the topic-ref is resolved into a topic item following the procedure in 3.3.5. The topic item is set as the value of the [reifier] property of the Topic Maps construct being processed.
The type construct is used to assign a type to the Topic Maps construct in which it occurs. The type is always a topic, indicated by the topic-ref.
During deserialization the topic-ref produces a topic item following the procedure in 3.3.5, which is set as the value of the [type] property of the information item being processed.
A literal is a string value with an optional datatype.
For convenience, this part of ISO/IEC 13250 supports native identification of integers, decimals, IRIs, dates and dateTime values.
|||number||→||decimal | integer|
|||integer||→||sign ? '[0-9]+'|
|||decimal||→||sign ? ('[0-9]+' '.' '[0-9]*' | '.' '[0-9]+')|
|||sign||→||'+' | '-'|
|||date||→||'A value that matches http://www.w3.org/TR/xmlschema-2/#date'|
|||date-time||→||'A value that matches http://www.w3.org/TR/xmlschema-2/#dateTime'|
|||string||→||quoted-string | triple-quoted-string|
|||quoted-string||→||'"' '('\"' | '\\' | [^"\])*' '"' '@@@TODO: Correct RegEx'|
|||triple-quoted-string||→||'"""' '[^\"\"\"]' '"""' '@@@TODO: Correct RegEx'|
The following implicit datatypes are associated with the above mentioned literals:
If the iri-ref is a QName, the QName is converted into an IRI by the procedure described in 3.3.3.
Any literal can be expressed by representing the value as a string and appending the datatype qualifier (^^) and an iri-ref which indicates the datatype.
If the literal occurs inside an occurrence (3.8), the occurrence [value] property is set to the value of the literal and the occurrence [datatype] property is set to the datatype of the literal.
If the literal occurs inside a variant (3.10), the variant [value] property is set to the value of the literal and the variant [datatype] property is set to the datatype of the literal.
42 # equivalent to "42"^^xs:integer
"12-22"^^xs:gMonthDay
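The integer and decimal productions can be turned into regular expressions directly. The following Python sketch is illustrative only: the function name `implicit_number_datatype` is hypothetical, the mapping of integers to xs:integer follows the example above, and the mapping of decimals to xs:decimal is an assumption made for this example.

```python
import re

# sign ? '[0-9]+'
INTEGER = re.compile(r"[+-]?[0-9]+")
# sign ? ('[0-9]+' '.' '[0-9]*' | '.' '[0-9]+')
DECIMAL = re.compile(r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+)")


def implicit_number_datatype(token):
    """Return the datatype QName for a bare numeric token, or None."""
    if INTEGER.fullmatch(token):
        return "xs:integer"
    if DECIMAL.fullmatch(token):
        return "xs:decimal"  # assumed mapping, see lead-in
    return None
```

Using `fullmatch` matters here: a date such as 1942-02-18 begins with digits but must not be classified as an integer.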
3.4.1 Escape Syntax
The following escape sequences can be used inside strings:
- Any unicode character
- \u HEX HEX HEX HEX
- double quotes
3.5 Topic Map
The topicmap component acts as a container for the topic map but has no further significance. It is declared as follows:
The prolog occurs at the start of the CTM document and consists of an (optional) encoding directive and an (optional) version directive.
3.5.2 Encoding Directive
The encoding directive specifies the character encoding used by the document.
If the encoding declaration is omitted, UTF-8 encoding is assumed. The name of the encoding shall be given as a string in the form recommended by [XML 1.0].
If the encoding is provided, it shall occur in the first line of the document.
3.5.3 Version Directive
The version directive states the version number of the CTM syntax, which is currently "1.0". It is declared as follows:
The version directive tells the parser which version of the CTM syntax to use during deserialization. Currently the only legal version is 1.0, as defined by this part of ISO/IEC 13250.
While the version directive may be omitted, this part of ISO/IEC 13250 recommends its usage for future compatibility.
If the version directive is provided, it shall occur after the encoding directive on the second line of the document. If the encoding directive is omitted, the version directive shall occur on the first line of the document.
3.5.4 Topic Map Reifier
During deserialization the topic is resolved following the procedure in 3.6 and the topic item is set as the value of the [reifier] property of the topic map.
If the topicmap-reifier production occurs more than once in a CTM document, the produced topic item gets merged with the existing topic map reifier according to the procedure given in [TMDM].
The following CTM fragments produce exactly the same topic map.
# Fragment A
~ tm-reifier - "My topic map"

# Fragment B
~ tm-reifier
tm-reifier - "My topic map"
The topic construct is used to declare a topic, assign identifiers to it, and make statements about it, through the assignment of names and occurrences, or the invocation of templates which generate associations (and/or additional names and occurrences). It starts with a topic reference and ends with either a blank line or a period.
|||topic||→||topic-ref identity * ( assignment | topic-dependent-invocation )* topic-end|
|||identity||→||subject-identifier | subject-locator|
|||assignment||→||name | occurrence|
|||topic-end||→||'\s+' '.' | '^\s*$'|
During deserialization a topic item is produced and given an identifier by processing the topic-ref in accordance with the procedure described in 3.3.5.
If additional subject identifiers or subject locators are specified, a locator is created for each of them according to the procedure described in 3.3.5 and added to the [subject identifiers] or [subject locators] property of the topic respectively.
If either of these procedures makes the topic equal to another topic item (cf. [TMDM] 5.3), the two topic items are merged according to the procedure given in [TMDM].
If the topic construct contains template invocations (3.12), the first argument is automatically bound to the topic item produced by the procedures above.
john-lennon # a topic with an item identifier and nothing else

lennon . mccartney . harrison . starr . # four topics on one line

# A topic defined by its subject locator
= http://www.isotopicmaps.org/ - "The ISO Topic Maps Web Site"

# A topic defined by a subject identifier
http://psi.example.org/John_Lennon - "John Lennon"

# A topic with a local identifier and a subject identifier
john http://psi.example.org/John_Lennon - "John Lennon"
The association construct is used to add associations to the topic map. It is declared as follows:
|||association||→||type '(' roles ')' scope? reifier?|
|||roles||→||role (',' role)*|
|||role||→||role-type ':' player reifier?|
During deserialization an association item is created for each association and added to the [associations] property of the topic map item. The [type] property of the association is set to the topic produced by the type as described in 3.3.8.
During deserialization an association role item is created for each role-type / player pair. The role item is added to the [roles] property of the association item.
If the type is an IRI, there shall be at least one whitespace between the IRI and the (, otherwise the parenthesis will be part of the IRI.
member_of(group: The_Beatles, member: John_Lennon)
The occurrence construct is used to add occurrences to a topic. It is declared as follows:
|||occurrence||→||type ':' literal scope? reifier?|
During deserialization the occurrence construct causes an occurrence item to be created and added to the [occurrences] property of the topic item created by the procedure described in 3.6.
Paul_McCartney
  birthday: 1942-02-18
  webpage: http://en.wikipedia.org/wiki/Paul_McCartney
The name construct is used to add a topic name to a topic. It is declared as follows:
During deserialization the name construct causes a topic name item to be created and added to the [topic names] property of the topic item created by the procedure described in 3.6.
If the type is not specified, the [type] property of the topic name item is set to the topic item whose [subject identifiers] property contains http://psi.topicmaps.org/iso13250/model/topic-name; if no such topic item exists, one is created.
john-lennon
  - "John Lennon" # Name with the default name type
  - fullname: "John Winston Lennon" # Name of type 'fullname'
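The default name type rule amounts to a look-up-or-create operation on the topic map. The following Python sketch illustrates it under simplifying assumptions: topic items are modelled as plain dictionaries, and the helper names `topic_by_subject_identifier` and `name_type` are hypothetical.

```python
DEFAULT_NAME_TYPE = "http://psi.topicmaps.org/iso13250/model/topic-name"


def topic_by_subject_identifier(topics, iri):
    """Return the topic carrying the subject identifier, creating it if absent."""
    for topic in topics:
        if iri in topic["subject identifiers"]:
            return topic
    # Set-valued properties of a new item start out as empty sets
    topic = {
        "subject identifiers": {iri},
        "subject locators": set(),
        "item identifiers": set(),
    }
    topics.append(topic)
    return topic


def name_type(topics, explicit_type=None):
    """Use the explicit type if given, otherwise the default name type topic."""
    if explicit_type is not None:
        return explicit_type
    return topic_by_subject_identifier(topics, DEFAULT_NAME_TYPE)
```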
The variant construct is used to add a variant name to a topic name. It is declared as follows:
During deserialization the variant construct causes a variant item to be created and added to the [variants] property of the topic name item created by the procedure described in 3.9. After the scope has been processed, the topics in the [scope] property of the topic name item created by the parent name construct are added to the [scope] property of the variant name item.
%prefix tm http://psi.topicmaps.org/iso13250/model/

# a topic with a sort name variant
john - "John Lennon" ("lennon, john" @ tm:sort)
Templates are containers for arbitrary Topic Maps constructs and certain CTM directives. The template body consists of ordinary topics and associations and allows topic references (3.3.5) and literals (3.4) to be replaced by variables. They are defined as follows:
|||template||→||'def' template-name '(' parameters? ')' template-body 'end'|
|||parameters||→||variable (',' variable)*|
In the template-body variables are allowed wherever topic-refs or literals are allowed.
The declaration of a template does not change the topic map content until the template has been invoked by a template-invocation (3.12).
If a template with the same identifier has already been defined, an error is flagged.
# Declaration of a template that sets the topic type to 'person', creates an
# occurrence of type 'birthday', and creates an association of type 'born-in'
def born($person, $date, $place)
  $person
    isa: person
    birthday: $date .
  born-in(person : $person, birthplace : $place)
end

# Invocation of the template inside a topic block
mccartney born(1942-06-12, Liverpool)

# Invocation of the template outside topic block
born(mccartney, 1942-06-12, Liverpool)

# Both of the above have the same effect as the following
mccartney
  isa: person
  birthday: 1942-06-12 .
born-in(person: mccartney, birthplace: Liverpool)
If a template contains a prefix directive (3.13.1), the prefix is only valid within the template body. Prefixes previously declared outside the template are accessible within the template body.
If a named-wildcard is used inside the template body, it is accessible only within the template. A named-wildcard causes the creation of a topic each time the template is invoked.
Template names and topic identifiers do not share the same namespace. If a template with name A exists, it is still possible to declare a topic with the identifier A.
The following templates are predefined:
The isa ("is a") template creates a type-instance relationship between two topics (cf. [TMDM] 7.2).
def isa($instance, $type)
  %prefix tm http://psi.topicmaps.org/iso13250/model/
  tm:type-instance(tm:instance: $instance, tm:type: $type)
end
The ako ("a kind of") template creates a supertype-subtype relationship between two topics (cf. [TMDM] 7.3).
def ako($sub, $super)
  %prefix tm http://psi.topicmaps.org/iso13250/model/
  tm:supertype-subtype(tm:subtype: $sub, tm:supertype: $super)
end
3.12 Template Invocation
|||template-invocation||→||template-reference '(' arguments? ')'|
|||topic-dependent-invocation||→||template-reference (':' | argument | '(' argument ',' arguments ')')|
|||template-reference||→||template-name | qname|
|||arguments||→||argument (', ' arguments)?|
|||argument||→||topic-ref | literal|
A template invocation causes the statements of the template to be added to the topic map with the variables in the statements replaced by the specified arguments.
If the template was not previously defined, an error is flagged.
If the template-reference is a qname and the prefix was not created by the procedure described in 3.13.4, an error is flagged.
If any variables are not bound, an error is flagged.
# Template invocation within a topic declaration
mccartney
  isa: person
  plays-for(The_Beatles, piano)

mccartney has-shoesize: 42

# Template invocation outside of topic declarations
plays-for(john, The_Beatles, guitar)
has-shoesize(john, 45)
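Template invocation can be pictured as variable substitution over the statements of the template body. The Python sketch below is a deliberate simplification and not the normative procedure: it substitutes textually on statement strings rather than on parsed constructs, and the names `invoke`, `TemplateError` and the dictionary layout of `templates` are assumptions made for this example.

```python
import re


class TemplateError(Exception):
    """Raised for an undefined template or an unbound variable."""


VARIABLE = re.compile(r"\$[A-Za-z_][A-Za-z0-9_-]*")


def invoke(templates, name, arguments):
    """Return the template's statements with variables replaced by arguments."""
    if name not in templates:
        raise TemplateError("template %r is not defined" % name)
    parameters, body = templates[name]
    # Pair parameters with arguments in order; missing ones stay unbound
    bindings = dict(zip(parameters, arguments))

    def substitute(match):
        variable = match.group(0)
        if variable not in bindings:
            raise TemplateError("variable %s is not bound" % variable)
        return bindings[variable]

    return [VARIABLE.sub(substitute, statement) for statement in body]


templates = {
    "born": (
        ["$person", "$date", "$place"],
        [
            "$person isa: person",
            "$person birthday: $date",
            "born-in(person: $person, birthplace: $place)",
        ],
    )
}
statements = invoke(templates, "born", ["mccartney", "1942-06-12", "Liverpool"])
```

For an invocation inside a topic block, a processor would prepend the enclosing topic to the argument list before calling such a routine, since the first argument is bound to that topic.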
Directives are used to define the environment for a CTM processor. Each directive shall occur on a line of its own. Only a comment is allowed on the same line.
3.13.1 Prefix Directive
The prefix directive is used to associate an IRI with an identifier. It is declared as follows:
|||prefix||→||'%prefix' identifier? iri|
During deserialization the prefix component binds the identifier to the IRI.
If the identifier is already bound, an error is flagged unless the identifier is bound to one and the same IRI.
An error shall be flagged if the concatenation of the iri and the local part of a QName does not produce a valid IRI.
%prefix wiki http://en.wikipedia.org/wiki/

wiki:John_Lennon # QName used as a subject identifier
  description: wiki:John_Lennon # QName used as the value of an occurrence
If no identifier is specified, the IRI becomes the default prefix.
The presence of a default prefix affects the interpretation of identifiers as described in 3.3.5.
# prefix directive with no identifier
%prefix http://psi.example.com/

# Topic with the subject identifier http://psi.example.com/John_Lennon
John_Lennon - "John Lennon"
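The binding rules for the prefix directive can be summarised in a small table structure. The Python sketch below is illustrative: the class name `PrefixTable` and the exception `PrefixError` are hypothetical, and the check that a QName expansion yields a valid IRI is left out.

```python
class PrefixError(Exception):
    """Raised when an identifier is rebound to a different IRI."""


class PrefixTable:
    def __init__(self):
        self.bindings = {}
        self.default = None  # IRI of the default prefix, if declared

    def declare(self, iri, identifier=None):
        """Process '%prefix identifier? iri'."""
        if identifier is None:
            # No identifier: the IRI becomes the default prefix
            self.default = iri
            return
        existing = self.bindings.get(identifier)
        if existing is not None and existing != iri:
            raise PrefixError("%s is already bound to %s" % (identifier, existing))
        # Rebinding to one and the same IRI is not an error
        self.bindings[identifier] = iri


table = PrefixTable()
table.declare("http://en.wikipedia.org/wiki/", "wiki")
table.declare("http://psi.example.com/")
```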
3.13.2 Include Directive
The include directive is used to include another CTM document into the CTM file. The other document is referenced by an IRI.
The referenced document should use CTM syntax, otherwise an error is flagged.
During deserialization the include component causes the referenced topic map to be immediately deserialized into a data model instance. The new data model instance (B) is then merged into the current one (A) by
Iterating through all topic items in B's [topics] property and identifying those topics which contain a locator in their [item identifiers] property which starts with the referenced document locator.
For each of the identified topics a locator is created by subtracting the referenced document locator from the item identifier which starts with the referenced document locator.
The result of that subtraction is concatenated with the document locator of A. The result is an absolute locator which is added to the [item identifiers] property of the topic.
Adding all topic items in B's [topics] property to A's [topics] property.
Adding all association items in B's [associations] property to A's [associations] property.
Adding topics and associations to A may trigger further merges, as described in [TMDM].
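The item identifier rewriting performed by the include directive can be sketched as follows. This Python fragment is an informal illustration: topic items are modelled as dictionaries, the function name `rebase_item_identifiers` and the example IRIs are assumptions, and the subsequent merging of topics and associations into A is not shown.

```python
def rebase_item_identifiers(topics_b, included_iri, including_iri):
    """Add, for each item identifier rooted in the included document,
    an equivalent item identifier rooted in the including document."""
    for topic in topics_b:
        rebased = {
            # "Subtract" the included document locator, then prepend A's locator
            including_iri + locator[len(included_iri):]
            for locator in topic["item identifiers"]
            if locator.startswith(included_iri)
        }
        # The new locators are added; the original ones are kept
        topic["item identifiers"] |= rebased
    return topics_b


# Hypothetical documents, for illustration only
topics_b = [{"item identifiers": {"http://example.org/lib.ctm#john"}}]
rebase_item_identifiers(
    topics_b, "http://example.org/lib.ctm", "http://example.org/main.ctm"
)
```

The effect is that a topic declared as john in the included document and a topic declared as john in the including document end up sharing an item identifier, and so are merged when B's topics are added to A.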
3.13.3 Mergemap Directive
The mergemap directive is used to merge an external topic map into the topic map produced by deserializing the CTM topic map.
This directive is declared as follows:
|||mergemap||→||'%mergemap' doc-reference notation?|
The topic map to be merged can be in any syntax, which shall be declared (using the notation component) if it is not CTM.
This part of ISO/IEC 13250 uses the following identifiers for Topic Maps syntaxes:
A conforming CTM processor shall support the above mentioned syntaxes. For any other syntax a CTM processor may flag an error.
During deserialization the mergemap component causes the referenced topic map to be immediately deserialized into a data model instance. The new data model instance (B) is then merged into the current one (A) by
Adding all topic items in B's [topics] property to A's [topics] property.
Adding all association items in B's [associations] property to A's [associations] property.
Adding topics and associations to A may trigger further merges, as described in [TMDM].
3.13.4 Template Import Directive
The template-import directive is used to make templates from another CTM document usable in the current document.
|||template-import||→||'%from' doc-reference 'import' ('*' | template-names) ( 'as' identifier )?|
|||template-names||→||template-name (',' template-name)*|
A template-import causes the import of the specified templates. (If the wildcard (*) is used, all templates are imported).
If identifier is not specified, the templates are imported into the local namespace and can be invoked as if they had been specified in the current CTM document.
If identifier is specified, a prefix is created, after which the templates can be invoked with the QName notation (identifier:template-name). If the prefix is already in use, an error is flagged.
This directive imports the template definitions only. Topics and associations are not imported.
%from http://example.org/my-template-lib.ctm import member_of, plays

mccartney
  member_of The_Beatles
  plays piano

# The same as above, using the prefix
%from http://example.org/my-template-lib.ctm import * as my-lib

mccartney
  my-lib:member_of The_Beatles
  my-lib:plays piano
A Syntax (informative)
ISO/IEC 13250:2003, Topic Maps, 2003, http://www.y12.doe.gov/sgml/sc34/document/0322_files/iso13250-2nd-ed-v2.pdf
ISO 13250-4, Topic Maps — Canonicalization, http://www.isotopicmaps.org/sam/cxtm/