ISO/IEC JTC 1/SC 34 N0934

ISO/IEC JTC 1/SC 34

Information Technology --
Document Description and Processing Languages

TITLE: Revised 13250-6 Topic Maps -- Compact Syntax -- Issues
SOURCE: Mr. Lars Heuer; Mr. Gabriel Hopmans; Dr. Sam Gyun Oh
PROJECT: WD 13250-6: Information technology - Topic Maps - Compact syntax
PROJECT EDITOR: Mr. Lars Heuer; Mr. Gabriel Hopmans; Dr. Sam Gyun Oh; Mr. Steve Pepper
STATUS: Draft
ACTION: Review
DATE: 2007-11-16
DISTRIBUTION: SC34 and Liaisons
REPLY TO:

Dr. James David Mason
(ISO/IEC JTC 1/SC 34 Chairman)
Y-12 National Security Complex
Bldg. 9113, M.S. 8208
Oak Ridge, TN 37831-8208 U.S.A.
Telephone: +1 865 574-6973
Facsimile: +1 865 574-1896
Network: [email protected]
http://www.y12.doe.gov/sgml/sc34/
ftp://ftp.y12.doe.gov/pub/sgml/sc34/

Mr. G. Ken Holman
(ISO/IEC JTC 1/SC 34 Secretariat - Standards Council of Canada)
Crane Softwrights Ltd.
Box 266,
Kars, ON K0A-2E0 CANADA
Telephone: +1 613 489-0999
Facsimile: +1 613 489-0995
Network: [email protected]
http://www.jtc1sc34.org



Revised CTM - Issues for the ISO meeting Kyoto 2007

Item identifier syntax

A new marker (') was introduced for item identifiers.

Action: Needs approval from WG and TMQL-editors. Related TMQL issue: tmql-item-references

Multiline comments

Multiline comments were introduced (: comment :).

Action: Needs approval from WG and TMQL-editors.

Reifier syntax

CTM uses a tilde (~) to introduce the reifying topic. TMQL uses a slightly different syntax (<~) and uses the tilde as an axis shortcut.

Action: Is the difference acceptable?

Template imports

CTM provides a mechanism to import templates. At the Montreal meeting, a request was made to remove this feature and to unify its functionality with the "include" directive. The editors were unable to follow that request because the unification led to problems with "prefix" declarations.

Action: Is unification required, or is it acceptable to keep the two mechanisms separate?

undef literal

CTM introduced the undef literal.

Action: Decide whether undef should be part of CTM. If it is, the definition and semantics of undef must be provided by TMQL. Related TMQL issue: tmql-undef-vs-null

Angle brackets around IRIs

We introduced an "angle brackets" syntax for IRIs.

Action: TMQL should reintroduce IRIs that can be enclosed in angle brackets. Related TMQL issue: tmql-iri-ambiguity

Meaningful whitespace

CTM provides two ways to delimit topics: the '.' and an empty line. The latter makes the following topic declarations impossible:

    neil-young
    
    - "Neil Young"  # The name is detached from the topic "neil-young"


    created         # A CTM processor would assume a topic here, because of the following empty line
    
    (album: harvest-moon, creator: neil-young)   # Oh oh, the user wanted an association

Action: Do we want to keep the meaningful whitespace, or do we want the '.' to be the only topic end-delimiter? Is the empty line a theoretical problem, or does it occur in real-life usage? (Meaningful whitespace forces the user to keep the parts of a topic declaration (names, occurrences) close together.)
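
The effect of the empty-line delimiter can be illustrated with a minimal Python sketch. The function split_on_blank_lines is hypothetical and is not part of CTM or of any existing processor; it only groups lines the way a processor that treats an empty line as a topic terminator might, and it ignores the '.' delimiter and comments.

```python
def split_on_blank_lines(source):
    """Group non-empty lines into blocks; an empty line ends the current block."""
    blocks, current = [], []
    for line in source.splitlines():
        if line.strip():
            # Non-empty line: it belongs to the block being collected.
            current.append(line.strip())
        elif current:
            # Empty (or whitespace-only) line: close the current block.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# The first example above: the empty line separates the name from its topic.
detached = split_on_blank_lines('neil-young\n\n- "Neil Young"\n')

# Without the empty line, the name stays in the same block as the topic.
attached = split_on_blank_lines('neil-young\n- "Neil Young"\n')
```

Under these assumptions, detached contains two blocks and attached contains one, which is the situation the first example describes.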

Unicode escape sequences

Unicode escape sequences are currently limited to string literals.

Action: Would Unicode escape sequences make sense for topic identifiers, too? This would require some kind of preprocessing: first, all Unicode escape sequences are replaced by the characters they denote. If a resulting character conflicts with the grammar, an error is issued. Only characters permitted by the grammar rule at that position are accepted.
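
The preprocessing described above could look roughly like the following Python sketch. It rests on two assumptions that are not taken from the draft: escapes are written as \uXXXX with exactly four hexadecimal digits, and identifiers follow a deliberately simplified rule (IDENTIFIER) that stands in for the real CTM grammar rule. The names decode_escapes and parse_identifier are hypothetical.

```python
import re

# Assumption: escapes have the form \uXXXX (four hex digits); the draft may differ.
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")

# Assumption: a simplified identifier rule, standing in for the CTM grammar rule.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_\-]*\Z")


def decode_escapes(text):
    """Step 1: replace every escape sequence with the character it denotes."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def parse_identifier(raw):
    """Step 2: validate the decoded text against the identifier rule."""
    decoded = decode_escapes(raw)
    if not IDENTIFIER.match(decoded):
        # A decoded character conflicts with the grammar: report an error.
        raise ValueError(f"character not allowed in identifier: {decoded!r}")
    return decoded
```

With these assumptions, an escaped hyphen (U+002D) inside an identifier is accepted because the hyphen is allowed by the simplified rule, whereas an escaped space (U+0020) raises an error.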

QNames vs. IRIs

CTM and TMQL may have difficulty deciding whether a token is meant as an IRI or as a QName.

Suppose the parser encounters foo:bar. Is that a QName or an IRI? According to RFC 3987, the parser may interpret it as an IRI rather than as a QName.

Action:

  1. We enforce that foo:bar is interpreted as an IRI unless foo was previously declared as a prefix.

    Problem: The parser would never detect undeclared prefixes, since it assumes an IRI.

  2. We enforce that either the IRI notation or the QName notation requires delimiters.

  3. We enforce that CTM (and TMQL) parsers are aware of official IRI schemes. If a parser detects foo:bar and foo is not an official IRI scheme, it assumes a QName. If the prefix foo was not declared, an error is reported.

    Problem: The parser must be aware of schemes. At a minimum, it must be aware of the "rootless" schemes, i.e. those that do not have a slash after the colon (currently 15 schemes).
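
Option 3 can be sketched in Python as follows. The function classify and the set KNOWN_SCHEMES are hypothetical; the set is a small illustrative subset chosen for this example, not the official scheme registry and not the list of 15 rootless schemes mentioned above.

```python
# Assumption: a small illustrative subset of registered schemes, not the full registry.
KNOWN_SCHEMES = frozenset({"http", "https", "ftp", "mailto", "urn", "tel", "news"})


def classify(token, declared_prefixes, known_schemes=KNOWN_SCHEMES):
    """Classify a token of the form foo:bar as 'iri' or 'qname' (option 3)."""
    head, sep, _ = token.partition(":")
    if not sep:
        raise ValueError(f"no colon in token: {token!r}")
    # Scheme names are case-insensitive, so compare in lower case.
    if head.lower() in known_schemes:
        return "iri"
    # Not a known scheme: assume a QName, which requires a declared prefix.
    if head in declared_prefixes:
        return "qname"
    raise ValueError(f"undeclared prefix: {head!r}")
```

In this sketch a known scheme takes precedence over a prefix of the same name, and an undeclared prefix is reported as an error. Under option 1, by contrast, the last case would silently be treated as an IRI, which is the problem noted for that option.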