ISO/IEC JTC 1/SC 34N0934
ISO/IEC JTC 1/SC 34
Information Technology --
Document Description and Processing Languages
TITLE: Revised 13250-6 Topic Maps -- Compact Syntax -- Issues
SOURCE: Mr. Lars Heuer; Mr. Gabriel Hopmans; Dr. Sam Gyun Oh
PROJECT: WD 13250-6: Information technology - Topic Maps - Compact syntax
PROJECT EDITOR: Mr. Lars Heuer; Mr. Gabriel Hopmans; Dr. Sam Gyun Oh; Mr. Steve Pepper
DISTRIBUTION: SC34 and Liaisons
Dr. James David Mason
(ISO/IEC JTC 1/SC 34 Secretariat - Standards Council of Canada)
Crane Softwrights Ltd.
Kars, ON K0A-2E0 CANADA
Telephone: +1 613 489-0999
Facsimile: +1 613 489-0995
Network: [email protected]
Revised CTM - Issues for the ISO meeting Kyoto 2007
Item identifier syntax
For item identifiers, a new marker (') was introduced.
Action: Needs approval from the WG and the TMQL editors. Related TMQL issue: tmql-item-references
Multiline comments, written (: comment :), were introduced.
Action: Needs approval from the WG and the TMQL editors.
CTM uses a tilde (~) to introduce the reifying topic. TMQL uses a slightly different marker (<~) and uses the tilde as an axis shortcut.
Action: Is the difference acceptable?
CTM provides a mechanism to import templates. In Montreal, there was a request to remove this feature and to unify its functionality with the "include" directive. The editors were unable to follow that request because the unification led to problems with "prefix" declarations.
Action: Is unification required, or is the editors' determination acceptable?
CTM introduced undef.
Action: Decide whether undef should be part of CTM. If it belongs to CTM, the definition and semantics of undef must be provided by TMQL.
Related TMQL issue: tmql-undef-vs-null
Angle brackets around IRIs
We introduced an "angle brackets" syntax for IRIs.
Action: TMQL should reintroduce IRIs that can be enclosed in angle brackets. Related TMQL issue: tmql-iri-ambiguity
CTM provides two ways to delimit topics: the '.' and an empty line. The latter makes the following topic declarations impossible:
neil-young

- "Neil Young"  # The name is detached from the topic "neil-young" created
# A CTM processor would assume a topic here, because of the following empty line

(album: harvest-moon, creator: neil-young)  # Oh oh, the user wanted an association
Action: Do we want to keep the significant whitespace, or do we want to keep only the '.' as the topic end-delimiter? Is the empty line a theoretical problem, or does it occur in real life? (Significant whitespace forces the user to keep the parts of a topic declaration (names, occurrences) close together.)
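The effect of the empty line can be sketched in a few lines of Python. This is an illustration only, not the CTM grammar: the function name split_on_blank_lines and the sample text are hypothetical, and the placement of the empty lines in the sample is an assumption.

```python
import re

# Hypothetical sample: the user separated the parts with empty lines.
SOURCE = '''neil-young

- "Neil Young"

(album: harvest-moon, creator: neil-young)
'''


def split_on_blank_lines(source: str) -> list:
    """Treat every run of empty lines as a delimiter, as a
    whitespace-sensitive processor would (simplified assumption)."""
    blocks = re.split(r"\n\s*\n", source.strip())
    return [block.strip() for block in blocks if block.strip()]


for block in split_on_blank_lines(SOURCE):
    print(repr(block))
```

Under these assumptions the sample splits into three independent blocks, so the name is no longer attached to neil-young. With the '.' as the only end-delimiter, the empty lines would carry no meaning.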
Unicode escape sequences
The Unicode escape sequences are limited to string literals.
Action: Would Unicode escape sequences make sense for topic identifiers, too? This would require some kind of preprocessing: first, every Unicode escape sequence is replaced by the character it denotes. If the resulting character conflicts with the grammar, an error is issued; only characters that are permitted by the grammar rule at that position are allowed.
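The preprocessing described above could look roughly like the following sketch. The escape forms (\uXXXX and \UXXXXXX) and the identifier rule in is_identifier are assumptions made for illustration; the actual grammar rule would take their place.

```python
import re

# Assumed escape forms: \uXXXX (4 hex digits) and \UXXXXXX (6 hex digits).
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{6})")


def decode_escapes(text: str) -> str:
    """Step 1: replace each escape sequence by the character it denotes."""
    def repl(match):
        return chr(int(match.group(1) or match.group(2), 16))
    return _ESCAPE.sub(repl, text)


def is_identifier(text: str) -> bool:
    """Hypothetical identifier rule: a letter or '_' first,
    then letters, digits, '_', '-' or '.'."""
    if not text or not (text[0].isalpha() or text[0] == "_"):
        return False
    return all(ch.isalnum() or ch in "_-." for ch in text[1:])


def read_identifier(raw: str) -> str:
    """Step 2: check the decoded text against the rule; error on conflict."""
    decoded = decode_escapes(raw)
    if not is_identifier(decoded):
        raise ValueError(f"character not allowed in an identifier: {raw!r}")
    return decoded
```

With this sketch, read_identifier(r"caf\u00e9") yields the identifier café, while read_identifier(r"a\u0020b") is rejected because the escape decodes to a space.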
QNames vs. IRIs
CTM and TMQL may have difficulty deciding whether something is meant as an IRI or as a QName.
Suppose the parser detects foo:bar. Is that a QName or an IRI? According to RFC 3987, the parser may interpret it as an IRI and not as a QName.
We enforce that foo:bar is interpreted as an IRI unless foo was previously defined as a prefix.
Problem: The parser would never detect undeclared prefixes, since it assumes an IRI.
We enforce that either the IRI notation or the QName notation requires delimiters.
We enforce that CTM (and TMQL) parsers are aware of official IRI schemes. If a parser detects that foo is not an official IRI scheme, it assumes a QName. If the prefix foo was not declared, an error is reported.
Problem: The parser must be aware of schemes. At the very least, it must be aware of the "rootless" schemes, which do not have a slash after the colon (currently 15 schemes).
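The prefix-based and the scheme-aware alternatives can be compared in a short Python sketch. The function names and the scheme set are hypothetical; ROOTLESS_SCHEMES lists only a few well-known rootless schemes, not the full set. Treating anything with a slash after the colon as an IRI is an assumption of this sketch.

```python
# Illustrative subset only; a real parser would need the complete list.
ROOTLESS_SCHEMES = {"mailto", "urn", "tel", "news"}


def classify_by_prefix(token: str, prefixes: dict) -> str:
    """Prefix-based rule: a QName only if the part before the colon
    is a declared prefix; everything else is taken as an IRI."""
    head, _, _ = token.partition(":")
    return "qname" if head in prefixes else "iri"


def classify_by_scheme(token: str, prefixes: dict,
                       schemes=ROOTLESS_SCHEMES) -> str:
    """Scheme-aware rule: a known scheme means IRI; otherwise a QName
    whose prefix must have been declared."""
    head, _, rest = token.partition(":")
    # Assumption: a slash directly after the colon marks an IRI.
    if head.lower() in schemes or rest.startswith("/"):
        return "iri"
    if head not in prefixes:
        raise ValueError(f"undeclared prefix: {head!r}")
    return "qname"
```

With no prefixes declared, classify_by_prefix("foo:bar", {}) silently returns "iri", which is the undetected-prefix problem noted above, whereas classify_by_scheme("foo:bar", {}) raises an error at the cost of maintaining the scheme list.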