ISO/IEC JTC 1/SC34 N0347

ISO/IEC JTC 1/SC34

Information Technology --

Document Description and Processing Languages

TITLE: Structuring Scope Document
SOURCE: M. de Graauw
PROJECT:
PROJECT EDITORS:
STATUS:
ACTION: For information and discussion in Baltimore, MD.
DATE: 22 November 2002
SUMMARY:
DISTRIBUTION: SC34 and Liaisons
REFER TO:
SUPERSEDES:
REPLY TO: Dr. James David Mason
(ISO/IEC JTC1/SC34 Chairman)
Y-12 National Security Complex
Information Technology Services
Bldg. 9113 M.S. 8208
Oak Ridge, TN 37831-8208 U.S.A.
Telephone: +1 865 574-6973
Facsimile: +1 865 574-1896
E-mail: mailto:[email protected]
http://www.y12.doe.gov/sgml/sc34/sc34oldhome.htm

Ms. Sara Hafele Desautels, ISO/IEC JTC 1/SC 34 Secretariat
American National Standards Institute
25 West 43rd Street
New York, NY 10036
Tel: +1 212 642-4937
Fax: +1 212 840-2298
E-mail: [email protected]

 

Structuring Scope

Author Marc de Graauw
Version 2
Date November 20, 2002

Copyright Marc de Graauw 2002. The right is hereby given to all to reproduce and distribute this work in its entirety as long as the authorship of Marc de Graauw is recognized and this copyright notice is included.

Introduction

There are three important aspects to scope:

  1. The interpretation of scope (or the semantics of scope).
  2. The triggering of merging based on scope in the Topic Naming Constraint.
  3. The effect of scope on the merging of topic characteristics when two topics are to be merged.

In this piece I will study aspects 1 and 3, and only fleetingly pay attention to 2. It is not important for the examples what causes the merge to occur. Just assume there is a good reason for the merge to occur (like: you telling the TM engine to merge them).

First I will delve into scope and merging behaviour. This is the actual use case: what behaviour I would want to have. Next I will survey some possible ways of structuring scope and look at the consequences for the use case. The first attempt seems to fail; I have included it since the reasons why it fails are noteworthy. I do not see this as a concrete proposal for some structured scope scenario which I support, but rather as an exploration which should be viewed in conjunction with other proposals for structuring scope.

This paper was not conceived out of the blue. Steve Pepper and Geir Ove Grønmo, Bernard Vatant and Kal Ahmed all have explored structured scope. If one notes any similarities between my proposal(s) and theirs, one can be pretty sure I've stolen them. See below for a very short discussion of their proposals.

Thanks to Lars Marius Garshol for commenting on an earlier draft.

Notation

I will use an ultra-shorthand to denote topics, characteristics, scope and merging.

  • Uppercase letters denote topics: T, U, VERYNICETOPIC
  • Lowercase letters denote topic characteristics: Ta, Tb, Tsomename
  • Braces denote scope: Ta{X, Y}
  • Commas separate the topics in a list
  • Merging is indicated with => : Ta{X}b{Y}, Ta{X}c{Z} => Ta{X}b{Y}c{Z}

Merging and the elimination of redundant characteristics

The reason why scope and merging are important is the elimination of redundant characteristics when merging. I want to be able to have this merge occur:

Example 1: Merging should remove unnecessary scoped topic characteristics
Tfrance{EN}frankrijk{NL}, Ufrance{EN,FR}frankreich{DE} => Vfrance{EN,FR}frankrijk{NL}frankreich{DE}

Topic T says 'france' is the English name for the topic and 'frankrijk' the Dutch name; topic U says 'france' is the English and French name and 'frankreich' the German one; the merged topic V says it all. (Of course for country names one would prefer to use the country PSIs, but as an example this will do fine.) Note that this merge does not occur based on the TNC since the topics do not have identical scope sets on 'france'. I do not want this merge:

Example 2: Current merging leaves redundant topic characteristics
Tfrance{EN}frankrijk{NL}, Ufrance{EN,FR}frankreich{DE} => Vfrance{EN}france{EN,FR}frankrijk{NL}frankreich{DE}

This says 'france' is the English name for this topic twice. Merging lots of topic maps with lots of shared characteristics will create a mess that is impossible to untangle. The merging behaviour of Example 1 is wanted when the scope is to be interpreted as: a topic characteristic is valid when ANY of the topics in the scope apply. In general:

Merging Rule 1: ANY-scopes on the same characteristic merge to the union of the originating scopes
Ta{X, Y}, Ua{X} => Va{X, Y}

Since a is valid when ANY of {X, Y} apply, Ta{X, Y} already expresses Ta{X}.
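
Merging Rule 1 can be sketched in a few lines of Python. This is only an illustration: the function name merge_any and the data model (a topic as a dictionary from characteristic name to one non-empty scope set) are assumptions made for this sketch and are not taken from ISO 13250, XTM or the SAM.

```python
def merge_any(t, u):
    """Merge two topics under the ANY interpretation (Merging Rule 1).

    Each topic maps a characteristic name to a frozenset of scoping topics.
    Scopes on the same characteristic are merged to their union.
    """
    merged = dict(t)
    for name, scope in u.items():
        merged[name] = merged.get(name, frozenset()) | scope
    return merged


# Example 1 from the text
T = {"france": frozenset({"EN"}), "frankrijk": frozenset({"NL"})}
U = {"france": frozenset({"EN", "FR"}), "frankreich": frozenset({"DE"})}
V = merge_any(T, U)
```

In this sketch V carries 'france' once, scoped by EN and FR, next to 'frankrijk' and 'frankreich' with their original scopes.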

When scope is interpreted as: a topic characteristic is valid when ALL of the topics in the scope apply, this merge is not wanted. Instead we would want:

Merging Rule 2: ALL-scopes on the same characteristic merge to the smaller scope when one is a subset of the other
Ta{X, Y}, Ua{X} => Va{X}

Since a is valid when ALL of {X} apply, it is already implied that a is valid when ALL of {X, Y} apply. An example:

Example 3: Merging should remove unnecessary scoping topics
Tmass{EN}massa{NL,PHYSICS}, Umass{EN}massa{NL} => Vmass{EN}massa{NL}

T says 'massa' is the Dutch translation of 'mass' in the realm of physics. U says 'massa' is always the Dutch equivalent of 'mass'. Assuming this is true, the extra condition that the realm is physics is no longer necessary and is removed after the merge. It might be tempting to mirror the behaviour of ANY-scopes and assume ALL-scopes merge to the intersection (I did so when first drafting this paper).

Example 4: Merging ALL-scopes to the intersection of both is wrong
Tmass{EN}massa{NL,PHYSICS}, Umass{EN}massa{NL,DIETING} => Vmass{EN}massa{NL}

If massa is a name in Dutch of a subject in physics and of a subject in dieting, it is wrong to assume it is the name in Dutch of this subject in general. The correct merging would be:

Tmass{EN}massa{NL,PHYSICS}, Umass{EN}massa{NL,DIETING} =>
Vmass{EN}massa{NL,PHYSICS}massa{NL,DIETING}
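
A minimal Python sketch of Merging Rule 2, covering both Example 3 and Example 4. The name merge_all and the data model (a topic as a dictionary from characteristic name to a set of scope sets) are assumptions of this sketch, not something defined by the specifications.

```python
def merge_all(t, u):
    """Merge two topics under the ALL interpretation (Merging Rule 2).

    Each topic maps a characteristic name to a set of scopes (frozensets).
    A scope is dropped when a strictly smaller scope exists on the same
    characteristic, because the smaller scope already implies it.
    """
    merged = {}
    for topic in (t, u):
        for name, scopes in topic.items():
            merged.setdefault(name, set()).update(scopes)
    return {
        name: {s for s in scopes if not any(other < s for other in scopes)}
        for name, scopes in merged.items()
    }


EN = frozenset({"EN"})
NL = frozenset({"NL"})
NL_PHYSICS = frozenset({"NL", "PHYSICS"})
NL_DIETING = frozenset({"NL", "DIETING"})

# Example 3: {NL} is a subset of {NL, PHYSICS}, so only {NL} survives
V3 = merge_all({"mass": {EN}, "massa": {NL_PHYSICS}},
               {"mass": {EN}, "massa": {NL}})

# Example 4: neither scope is a subset of the other, so both are kept
V4 = merge_all({"mass": {EN}, "massa": {NL_PHYSICS}},
               {"mass": {EN}, "massa": {NL_DIETING}})
```

The subset test, rather than an intersection, is what keeps the sketch from collapsing the two scopes of Example 4 into {NL}.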

There is another relation between scope and merging which I shall call EXACT-scope. See Example 2 above. This is the kind of merging ISO 13250:2000, XTM 1.0 and SAM support. Merging of topic characteristics only takes place when two scope sets match exactly:

Merging Rule 3: EXACT-scopes only merge characteristics with equivalent scope sets
Ta{X, Y}, Ua{X} => Va{X, Y}a{X}

And since scopes are sets:

Ta{X, Y}, Ua{Y, X} => Va{X, Y}
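
For comparison, EXACT-scope merging can be sketched as a plain set union of scope sets. As before, the name merge_exact and the dictionary-of-sets data model are assumptions made for illustration only.

```python
def merge_exact(t, u):
    """Merge two topics under EXACT-scope semantics (Merging Rule 3).

    Each topic maps a characteristic name to a set of scopes (frozensets).
    Two scoped characteristics collapse only when their scope sets are equal.
    """
    merged = {}
    for topic in (t, u):
        for name, scopes in topic.items():
            merged.setdefault(name, set()).update(scopes)
    return merged


XY = frozenset({"X", "Y"})
X = frozenset({"X"})

# Different scope sets: both scoped characteristics are kept
V_different = merge_exact({"a": {XY}}, {"a": {X}})

# Scopes are sets, so {X, Y} and {Y, X} are equal and collapse to one
V_same = merge_exact({"a": {frozenset(["X", "Y"])}},
                     {"a": {frozenset(["Y", "X"])}})
```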

Frankly, I find it hard to find a good use case for this merging behaviour. It seems to me that merging as shown for ALL-scopes or merging as shown for ANY-scopes would be preferable in most circumstances. EXACT-scope merging also seems to me at odds with the interpretation of scope. ISO 13250:2000 views scopes as ANY-scopes ("a given scope is the union of the subjects of the set of themes used to specify that scope", "If it is desired to specify a scope which is the intersection (rather than the union) of two topics, this can be accomplished by creating a topic whose subject is that intersection, and then by using that topic as a theme.") and SAM interprets scopes as ALL-scopes ("Formally, a scope is composed of a set of subjects that together define the context. That is, the topic characteristic is known to be valid only in contexts where all the subjects in the scope apply."). XTM does not provide an interpretation for scope. Yet in all three the mechanisms for merging topic characteristics behave as would be expected for EXACT-scopes.

So both ANY-scopes and ALL-scopes have their own proper uses. Now how could ANY-scopes and ALL-scopes be expressed?

The Simple Theory of Types

The basic idea here is to retain the scope element in XTM, and to add types to the scoping topics to add extra information about the kind of scope and the kind of merging which is wanted. Ditto for HyTM and SAM, but I will use XTM in this writing.

Interpretation Rules:

  1. If scoping topics in a scope have no type, interpret as "ANY"
  2. If scoping topics in a scope have one shared type, interpret as "ANY"
  3. If scoping topics in a scope have only distinct types, interpret as "ALL"
  4. A scope set consisting of a single scoping topic can be interpreted as "ANY" or "ALL"

Rule 1 ensures maximum backward compatibility with the ISO interpretation of scope. However, Rule 1 could instead read "interpret as ALL" and be in line with the current SAM proposal. At first sight this neatly yields the desired merging behaviours.

Example 5: Scoping topics with one shared type are ANY-scopes
EN, NL, FR and DE are topics of type LANGUAGE.

Tfrance{EN}frankrijk{NL}, Ufrance{EN,FR}frankreich{DE} => Vfrance{EN,FR}frankrijk{NL}frankreich{DE}

Since the topics {EN,FR} have one shared type they should be interpreted as "ANY" under Interpretation Rule 2, {EN} may be interpreted as "ANY" under Interpretation Rule 4 and merging occurs because of Merging Rule 1 as desired.


Example 6: Scoping topics with only distinct types are ALL-scopes
EN and NL are topics of type LANGUAGE. PHYSICS is a topic of type DISCIPLINE.

Tmass{EN}massa{NL,PHYSICS}, Umass{EN}massa{NL} => Vmass{EN}massa{NL}

Since the topics {NL,PHYSICS} have only distinct types they should be interpreted as "ALL" under Interpretation Rule 3, {EN} may be interpreted as "ALL" under Interpretation Rule 4 and merging occurs because of Merging Rule 2 as desired.

Unfortunately things get messier when we add more types.

Example 7: Simple Theory of Types with multiple types: a mess?
P has types A and B
Q has types B and C
R has types C and D
S has types D and A

The following scopes are ANY-scopes since there is a shared type:

Ta{P, Q}
Ta{Q, R}
Ta{R, S}
Ta{S, P}

The following scopes are ALL-scopes since there is no shared type:

Ta{P, R}
Ta{Q, S}

Now how could this be interpreted?

Ta{P, Q, R, S}

Since there is no shared type, this should be "ALL", but is this still intuitive?
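
The Interpretation Rules can be run mechanically over Example 7. The Python sketch below is one possible reading, not a definitive one: the function name interpret is hypothetical, "shared type" is taken to mean a type common to all scoping topics in the scope, and any typed scope without such a common type is classed as "ALL", which is how Example 7 applies Rule 3.

```python
def interpret(scope, types):
    """Classify a scope as 'ANY', 'ALL' or 'ANY or ALL'.

    scope: a set of scoping topic names.
    types: a dict mapping a topic name to the set of its types.
    """
    if len(scope) == 1:
        return "ANY or ALL"  # Interpretation Rule 4
    type_sets = [set(types.get(topic, ())) for topic in scope]
    if not any(type_sets):
        return "ANY"  # Interpretation Rule 1: no scoping topic has a type
    if set.intersection(*type_sets):
        return "ANY"  # Interpretation Rule 2: a type shared by all topics
    return "ALL"  # no type shared by all topics (Rule 3 as read in Example 7)


# Types from Example 7
TYPES = {
    "P": {"A", "B"},
    "Q": {"B", "C"},
    "R": {"C", "D"},
    "S": {"D", "A"},
}

pairs = {
    name: interpret(set(name), TYPES)
    for name in ("PQ", "QR", "RS", "SP", "PR", "QS")
}
whole = interpret({"P", "Q", "R", "S"}, TYPES)
```

Under these assumptions the four adjacent pairs come out as "ANY", the pairs {P, R} and {Q, S} as "ALL", and the full scope {P, Q, R, S} as "ALL" as well.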

So it seems the Simple Theory of Types holds well when scoping topics only have a single type, but breaks down in multiple inheritance contexts. Now we could limit this problem if we only looked at types which contain some PSI. Then of course there could be a processing requirement which states scoping topics may only have a single type with this PSI. But still, it does not look like the most elegant of solutions.

The Good Ol' ISO Way

ISO 13250:2000 says in Note 5: "If it is desired to specify a scope which is the intersection (rather than the union) of two topics, this can be accomplished by creating a topic whose subject is that intersection, and then by using that topic as a theme." Intersection and union are misnomers, but we can read this as: "If it is desired to specify a scope which is an ALL-scope (rather than an ANY-scope), this can be accomplished by creating a topic whose subject is that ALL-scope, and then by using that topic as a theme." If we follow this approach, the question boils down to how a processor could recognize that a scoping topic represents an ALL-scope and how it could recognize the topics in that ALL-scope. This could be done using an association whose type is "ALL-scope" and whose members are the scoping topics comprising the ALL-scope. The association could be reified and then used as a scoping topic in another scope (which would be an ANY-scope).

Interpretation rules:

  1. Interpret a scope consisting of multiple scoping topics as "ANY"
  2. If a scoping topic reifies an association of type "ALL-scope", interpret the members of that association as "ALL"
  3. A single scoping topic may be interpreted as "ANY" or "ALL"

Each individual scoping topic constitutes a complete context of validity in itself.

Example 8: Scoping topics are ANY-scopes
Tfrance{EN}frankrijk{NL}, Ufrance{EN,FR}frankreich{DE} => Vfrance{EN,FR}frankrijk{NL}frankreich{DE}

Since the topics {EN,FR} should be interpreted as "ANY" under Interpretation Rule 1, and {EN} should be interpreted as "ANY" too, merging occurs because of Merging Rule 1 as desired.

Syntactically ANY-scopes would look just like scope looks now:

Syntax: ANY-scopes
<!-- Scoping topics EN and FR omitted -->
<topic id="france">
  <baseName>
    <scope>
      <topicRef xlink:href="#EN"/>
      <topicRef xlink:href="#FR"/>
    </scope>
    <baseNameString>France</baseNameString>
  </baseName>
</topic>

This means merging of topic characteristics would have to be different from what is currently specified in XTM 1.0 or the SAM. Since this is not desirable in general, it could be achieved by indicating on the Topic Map that it wants its topics to be subjected to Merging Rule 1 (though I am of the opinion that current merging, as described in Merging Rule 3, is counterintuitive in most (or all) contexts).

Example 9: ALL-scopes as associations
The reified association of type ALL-scope is indicated as ALL{NL,PHYSICS}

Tmass{EN}massa{ALL{NL,PHYSICS}}, Umass{EN}massa{NL} => Vmass{EN}massa{NL}

Since the topics {NL,PHYSICS} should be interpreted as "ALL" under Interpretation Rule 2, and {NL} may be interpreted as "ALL" too under Interpretation Rule 3, merging occurs because of Merging Rule 2 as desired.


Syntactically ALL-scopes could look like this:

Syntax: ALL-scopes
<topic id="mass">
  <baseName>
    <scope>
      <!-- Use a (reified) association as scoping topic -->
      <subjectIndicatorRef xlink:href="#assoc"/>
    </scope>
    <baseNameString>Massa</baseNameString>
  </baseName>
</topic>

<!-- Scoping topics NL and PHYSICS omitted -->
<!-- The association used in scope above -->
<association id="assoc">
  <instanceOf>
    <!-- Association is of type ALL-scope -->
    <subjectIndicatorRef xlink:href="http://www.example.org/xtm/1.0/scope.xtm#all-scope"/>
  </instanceOf>
  <member>
    <topicRef xlink:href="#NL"/>
  </member>
  <member>
    <topicRef xlink:href="#PHYSICS"/>
  </member>
</association>

More examples:

Example 10: Interpretation of more complex uses of ALL-scopes
Ta{ALL{P, Q}, ALL{R, S}}

Ta is valid when both P and Q apply, or when both R and S apply ( (P & Q) V (R & S) in propositional logic).

The reverse can be expressed too:
Ta is valid only when P or Q applies and R or S applies ( (P V Q) & (R V S) ):

Ta{ALL{P, R}, ALL{P, S}, ALL{Q, R}, ALL{Q, S}}
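
Both readings in Example 10 can be checked against a truth table. The Python sketch below is illustrative only: the helper names applies and ALL, and the idea of a context as the set of topics that currently apply, are assumptions made for this check.

```python
from itertools import product


def applies(scope, context):
    """Return True if a characteristic with this scope is valid in context.

    scope: a set whose members are plain topic names (str) or frozensets
    standing for ALL{...}; the outer set is read as ANY.
    context: the set of topics that currently apply.
    """
    return any(
        item <= context if isinstance(item, frozenset) else item in context
        for item in scope
    )


def ALL(*topics):
    """Shorthand for a reified ALL-scope association."""
    return frozenset(topics)


first = {ALL("P", "Q"), ALL("R", "S")}
second = {ALL("P", "R"), ALL("P", "S"), ALL("Q", "R"), ALL("Q", "S")}

# Every combination of P, Q, R and S applying or not applying
topics = ["P", "Q", "R", "S"]
contexts = [
    {t for t, on in zip(topics, bits) if on}
    for bits in product([False, True], repeat=len(topics))
]

# (P & Q) V (R & S)
first_ok = all(
    applies(first, c) == (("P" in c and "Q" in c) or ("R" in c and "S" in c))
    for c in contexts
)
# (P V Q) & (R V S)
second_ok = all(
    applies(second, c) == (("P" in c or "Q" in c) and ("R" in c or "S" in c))
    for c in contexts
)
```

Both flags evaluate to True over all 16 contexts, so the four ALL-scopes in the second scope do express (P V Q) & (R V S).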

Of course as the complexity of the examples increases, the ease of finding credible use cases diminishes.

Apparently the Good Ol' ISO Way is quite expressive. Still, one cannot express such a thing as: this characteristic is only valid when X does not apply. Possibly a NOT-scope could be added. A possible use case is: do not show this when the level of the user is 'beginner'. A set of processing rules for ANY-scopes and ALL-scopes could be:

Rule                                           Example
Merge ANY-scopes to their union                Ta{X,Y}, Ua{X,Z} => Va{X,Y,Z}
                                               Ta{ALL{X,Y}}, Ua{Z} => Va{ALL{X,Y},Z}
Eliminate equal ALL-scopes                     Ta{ALL{X,Y}, ALL{Y,X}} => Ta{ALL{X,Y}}
Eliminate redundant ALL-scopes                 Ta{ALL{X,Y}, ALL{X}} => Ta{ALL{X}}
Reduce single-topic ALL-scopes to ANY-scopes   Ta{ALL{X},Y} => Ta{X,Y}
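
These processing rules can be combined into one normalisation step. The following Python sketch is an illustration under stated assumptions: the names normalize and merge_scopes are hypothetical, a plain scoping topic is modelled as a string and an ALL-scope as a frozenset, and a plain topic X is treated as equivalent to ALL{X} (the last rule read in both directions), so that X also absorbs ALL{X,Y} as in Example 9.

```python
def normalize(scope):
    """Normalise a scope whose members are plain topics (str, read as ANY)
    or frozensets (read as ALL-scopes)."""
    # Treat a plain topic X as the single-topic ALL-scope ALL{X}.
    # Equal ALL-scopes collapse automatically because the result is a set.
    groups = {
        item if isinstance(item, frozenset) else frozenset({item})
        for item in scope
    }
    # Eliminate redundant ALL-scopes: drop any that strictly contains another
    minimal = {g for g in groups if not any(other < g for other in groups)}
    # Reduce single-topic ALL-scopes back to plain topics
    return {next(iter(g)) if len(g) == 1 else g for g in minimal}


def merge_scopes(a, b):
    """Merge two ANY-scopes to their union, then normalise the result."""
    return normalize(a | b)


# Example 9: massa{ALL{NL,PHYSICS}} merged with massa{NL}
example_9 = merge_scopes({frozenset({"NL", "PHYSICS"})}, {"NL"})
```

In this sketch example_9 reduces to the single scoping topic NL, matching the merge shown in Example 9.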

Other Structured Scope Approaches

Steve Pepper & Geir Ove Grønmo

In "Towards a general theory of scope" Steve and Geir discuss scope and conclude a more structured scope might be needed. They briefly explore some variants:

  1. "Principal" and "incidental" themes: distinguish between scope for disambiguation "Hamlet the character" versus "Hamlet the play". This is a good idea, but not relevant to this paper.
  2. Use "axes of scope": distinguish between different categories of scoping topics, such as natural language, location, publisher, and use that as a starting point for making more intuitive user interfaces. This approach superficially bears some resemblance to the Simple Theory of Types sketched above. However, Steve and Geir merely signal the need to study structured scope and do not go very far into making an actual proposal (having covered a lot of ground in their paper before they reach axes of scope).

Steve and Geir also explore 'context', which is a set of topics a user can use to filter or rank a Topic Map. Context is the counterpart of scope and rules could be established how certain kinds of contexts operate on certain kinds of scope.

Bernard Vatant

Bernard Vatant submitted some proposals to the SC34WG3 mailing list in the summer of 2002: 1, 2, and 3. His approach starts from distinguishing the roles scoping topics play within a scope, e.g. language, region, time. Scoping topics with the same role will be ANY-scopes; the sets of scoping topics with the same role will together be ALL-scopes. Bernard's example:

Example 11: Bernard Vatant: Using roles on scopes
Ta{france, navarre, 1589-1610}

Here france and navarre both play a 'region' role, 1589-1610 plays a 'time' role. The interpretation is: Ta is valid when (ANY of 'france', 'navarre' applies) AND (ANY of '1589-1610' applies).
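
The interpretation of Example 11 can be sketched in Python as follows. The function name valid_by_roles, the role labels and the notion of a context as the set of topics that apply are assumptions of this sketch and are not taken from Bernard's proposals.

```python
def valid_by_roles(scope_roles, context):
    """Return True if the scoped characteristic is valid in context.

    scope_roles: a dict mapping each scoping topic to the role it plays.
    context: the set of topics that currently apply.
    Topics with the same role are read as ANY; the role groups as ALL.
    """
    by_role = {}
    for topic, role in scope_roles.items():
        by_role.setdefault(role, set()).add(topic)
    # Every role group needs at least one of its topics in the context
    return all(topics & context for topics in by_role.values())


# Example 11
scope = {"france": "region", "navarre": "region", "1589-1610": "time"}

in_navarre_then = valid_by_roles(scope, {"navarre", "1589-1610"})
region_only = valid_by_roles(scope, {"france", "navarre"})
```

Here in_navarre_then is True, while region_only is False because no topic with the 'time' role applies.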

Bernard's approach has similarities to the Simple Theory of Types, but will not suffer from the same problems since it uses roles instead of types (the Simple Theory of Types actually was an investigation to see whether Bernard's approach would work with types instead of roles). Nevertheless, this approach is less expressive than the Good Ol' ISO Way sketched above since it wouldn't allow ANY-scopes on scoping topics with different roles and wouldn't allow ALL-scopes on scoping topics with the same role. (The latter does not make sense if the scoping topics having the same role are mutually exclusive.) The use of roles for scoping topics is quite natural.

Kal Ahmed

Kal has written a fairly complete proposal. Kal also distinguishes user context from scope and explores how to process a Topic Map with scoped characteristics in a certain user context. Kal also uses roles to further qualify scoping topics. The Good Ol' ISO Way resembles Kal's approach a lot, which is no coincidence, since I took a good look at Kal's work before I sat down. They resemble each other so much in fact that when I commented on some aspects of his syntax, he scribbled a variant of his syntax proposal which basically is the Good Ol' ISO Way with roles added.

Conclusion

Both the approaches I have sketched show a way to pursue a more structured scope. Both do so while maintaining near 100% backward compatibility: the way scope is now used is retained, and only an interpretation is provided for special cases: when scoping topics have a type of a certain kind, and when scoping topics are the reification of some kind of association. The Simple Theory of Types seems to fail. In more complex contexts it is not intuitive, and fixing it is not elegant. The Good Ol' ISO Way is promising. Then there are Bernard's and Kal's approaches, which to me seem at least as promising. The introduction of 'user context' I believe belongs firmly in the application domain, not in the standards (which doesn't mean exploring this notion is not useful). Some believe structured scope and the interpretation of scope belong entirely in the application domain. I would prefer the meaning of scope to be something which the standards describe, and I would draw the line standard/application between interpretation of scope/user context.

What next? It seems to me this all should not go into the SAM. It could, however, be used as input for the TMCL effort. It would be nice if TMCL supported implementing behaviour as described above. I have already stated reasons why the EXACT-scope behaviour of the current specs is unsatisfying. More specifically, it seems at odds with an interpretation of 'normal' scope as "ANY" or "ALL". Most importantly, this paper hopefully drafts a use case for further pursuing structured scope and may be a building block in such an effort.