ISO/IEC JTC 1/SC34 N0403, 2003-04-03

ISO/IEC JTC 1/SC34

Information Technology —

Document Description and Processing Languages

Title:	AsTMa! Language Definition
Source:	Robert Barta, Bond University
Project:	ISO 18048
Project editor:	Lars Marius Garshol, H. Holger Rath
Status:	Individual contribution
Action:	For review and comment
Date:	2003-04-11
Summary:	Specification of the AsTMa! topic map constraint language
Distribution:	SC34 and Liaisons
Refer to:
Supercedes:
Reply to:	Dr. James David Mason (ISO/IEC JTC1/SC34 Chairman) Y-12 National Security Complex Information Technology Services Bldg. 9113 M.S. 8208 Oak Ridge, TN 37831-8208 U.S.A. Telephone: +1 865 574-6973 Facsimile: +1 865 574-1896 E-mail: mailto:[email protected] http://www.y12.doe.gov/sgml/sc34/sc34oldhome.htm Mrs. Sara Desautels, ISO/IEC JTC 1/SC 34 Secretariat American National Standards Institute 25 West 43rd Street New York, NY 10036 Tel: +1 212 642-4937 Fax: +1 212 840-2298 E-mail: [email protected]

AsTMa! Language Definition

Robert Barta Bond University

[email protected]

Constraining Topic Map (TM) instances is a necessary component of TM engineering. To make these constraints explicit we propose a Topic Map Constraint Language, AsTMa!. As such, AsTMa! is a part of the AsTMa language family which was designed to facilitate authoring, but also constraining and querying topic maps. This document provides a semi-formal definition of the language. Note that this is still an evolving project and that this document is a current snapshot only.

This document has no formal status. It is a technical report of Bond University.

Draft v0.6, 2003-04-07

1 Introduction
2 Conventions
3 Validation Relation
4 Maplets and Maplet Denotations
5 Unconditional Constraints
6 Maplet Patterns
7 Quantified Constraints
8 Constraints
9 Functions
10 Syntax Extensions
11 Moderators
12 Summary
13 Acknowledgements
A Appendix: Language Skins (Non-normative)
B Appendix: Predefined Functions (Normative)
C Appendix: Syntax (Normative)
D Appendix: External Functions (Non-normative)
E Appendix: Directives (Normative)
F Appendix: References (Non-normative)

1 Introduction

The objective of any TMCL (Topic Map Constraint Language) is to allow to formally specify rules for Topic Map documents.

In some respect a TMCL has a similar purpose as schema languages for relational databases and for XML applications. In all cases the constraint is supposed to characterize the possible values of the data (or the document). As Topic Map data may not only reside in documents containing Topic Map notated text but may also be generated from other data sources, we abstract away from documents and only regard Topic Map instances.

Constraints behave like type definitions: Given a constraint and a particular instance of a topic map it must be well-defined and computable in a reasonable time whether the instance validates against the constraint, or not. In the case of Topic Maps a constraint language must allow to specify

· a vocabulary, i.e. a set of topics,

· their respective types and relationships between them, and

· patterns which must, may or must not occur within a given topic map.

For our purposes we collectively use the term ontology to encompass all the above aspects of a particular knowledge domain. Accordingly, AsTMa! can be used as an ontology specification language.

As AsTMa! is making use of the authoring language AsTMa= it is mostly line oriented. It extends the authoring language by introducing variables which can be bound to text patterns similar to regular expressions and also variables which bind whole structures like in Prolog. We also peruse concepts of other languages e.g. the quantifiers 'forall' and 'exists' from logic-oriented languages and path expressions from XPath to add expressitivity or simplify the notation.

Following the general theme of the AsTMa* language family, AsTMa! was designed not so much as an orthogonal language, but more bias toward human authoring.

FIXME: Maybe make design goals more explicit?

2 Conventions

FIXME: use of regexp

FIXME: use of grammars (EBNF)

FIXME: use of relational algebra

3 Validation Relation

Given a particular map M, one may now ask whether M conforms to an ontology specified by a constraint O. Formally, we write O ||- M if M validates against the ontology O.

In practical settings, though, we will have to validate a map against several ontologies. Examples are situations where a customer has an ontology different from that of a vendor, or, when a company-wide ontology is refined and extended by a branch ontology. In these cases, topic map instances will have to comply with several such constraints. Technically this means that we will have to find ways to merge, i.e. combine constraints by logical operators having the following properties:

O₁ ^ O₂ ||- M <=> O₁ ||- M ^ O₂ ||- M

and, because of symmetry

O₁ v O₂ ||- M <=> O₁ ||- M v O₂ ||- M

With these requirements set, in the following we will specify the semantics of AsTMa!. For every map and for every AsTMa! expression this validation relation will be defined and can so be checked automatically.

We semi-formally define the relation ||- bottom-up. We start formalizing atomic parts of a map, maplets, and extend these to patterns which can be matched against topic map instances. Finally, we add quantifiers and boolean operators to allow the construction of rules.

4 Maplets and Maplet Denotations

To formally capture when exactly a topic map instance conforms to a constraint, we introduce maplets (Fig.1). A maplet is a part of a map, consisting of a single association and all topic characteristics of the topics mentioned in that association. This includes the association type, the association scope and all players and roles in the members.

Fig.1 Maplet connecting Topics

To also allow a single topic to be a maplet, we introduce a trivial association of type sum-ergo-sum and assume that any topic is a player in it:

(sum-ergo-sum)

topic: any-topic-here

4.1 Topic Characteristics

The topic characteristics are represented by a tuple (type, scope, value).

The type is indicated by a URI and specifies whether the characteristics is a baseName, an occurrence with a topic reference or inline data, etc. This is done using the predefined PSIs defined in [XTM].

The scope is another URI indicating the one topic which scopes the characteristic (We only use one scoping topic) in AsTMa*. For the unconstrained scope we will use the PSI defined in [XTM].

The value is a string comprising the value of the characteristics.

4.2 Maplet Merging

We will use maplets as atomic constructs. Obviously, any map can be constructed from these maplets or, conversely, can be fragmented into maplets.

Merging Maplet to build a Map

As operator + to combine maplets we simply use the merge operator as defined in the XTM standard[XTM00]. Consequently, any map M can be broken down into maplets m₁, ...,m_N:

M = + ^N_n=1 m_n

We call the set of maplets G_M = {m₁, ..., m_n} for a particular map M the generator of M. M is a submap of M', M << M', if G_M << G_M'. Obviously, the set of all submaps of M, 2^M, is a lattice.

5 Unconditional Constraints

We distinguish between maplets and maplet denotations which are text fragments in AsTMa= whereas maplets themselves are an abstraction of a representation an TM processor would use after having processed the maplet denotation (this is comparable to SAM[SAM03]). Thus, a maplet may be represented by a number of different maplet denotations.

We formalize the set of all maplet denotations as M₌.

Every maplet denotation is an unconditional constraint. Unconditional constraints only help to define the vocabulary aspect of an ontology. Per-se they do not impose any restrictions on a topic map structure and they only set the stage for more complex constraints. Nonwithstanding, an unconditional constraint is a constraint.

Any given map M satisfies any given unconditional constraint c₌ is element of M₌:

c₌ ||- M

It is straightforward to define a merge operator between unconditional constraints (we will later extend this to general constraints). If c₁ and c₂ are unconditional constraints, then c₁ + c₂ is also an unconditional constraint defined via

c₁ + c₂ ||- M <=> c₁ ||- M ^ c₂ ||- M

Again, as any map satisfies any unconditional constraint, that map also satisfies any merge of unconditional constraints. This also implies that any AsTMa= denotation A is an AsTMa! constraint, although all maps M will trivially conform to it:

A ||- M

6 Maplet Patterns

Maplet patterns themselves are no constraints but we will need them to build compound constraints later. Basically, maplet patterns are maplet denotations in AsTMa= with some of the text being replaced by variables and regular expressions.

Example:

$t (tutorial)

bn: /tutorial/i

oc: m|http://|

Here we simply reuse the matching semantics of regular expressions to define whether a particular maplet matches a maplet pattern. The maplet denotated by

astma (tutorial)

bn: AsTMa Tutorial

oc: http://astma.it.bond.edu.au/

matches the maplet pattern from above, whereas the maplet denotated by

astma (tutorial)

bn: AsTMa Tutorial

oc: no URL here

does not.

Before we define more expressive pattern matching we need variables which can be bound to values during the pattern matching process described below. After that we can define patterns for an open and closed expectation we could have on matching maplets.

FIXME unification-not: Maybe make a note that his is only about matching, not unification or reasoning. Unification would be necessary whenever two ontologies are going to be compared.

6.1 Variables

AsTMa! distinguishes between three different kinds of variables:

Reference Variables

Reference variables can only have topic identifiers as values and can match against topic identifiers.

Regular Expression Variables

These variables always appear as part of an Extended Regular Expression. They can only contain strings and can be matched against strings

Structural Variables

These variables are bound to whole submaps.

Since it is clear from the context what kind of variable is meant, there is no need to distinguish between them syntactically.

All variables are prefixed with the character $ or @. The name itself must follow the pattern \w[[:alnum:]-_]+. While variables of a different kind inhabit different namespaces, it is erroneous, though, to have within one scope two or more variables of different kinds but with the same name. This is mainly thought as a safety guard against casual errors.

6.1.1 Variable Scope

The scope of a variable is determined by the syntactic form of a constraint and will be defined later. Variables are implicitely declare by simply using them. In it's scope any further occurrence of a variable implies that all these occurrences must be bound to the same value during one match.

A notable deviation from the usual rules is that variables with different names must be bound to different values. This somewhat unorthodox interpretation helps to keep constraints concise and can be overridden if necessary by using the =~ operator, as in $a =~ $d.

6.1.2 Variable binding during Matching

During the matching process patterns will be matched against maplets. In this process variables within the patterns will be assigned values from the maplets (variable binding).

If - during such a match - an unbound variable is matched against a maplet value, that variable will be bound to that value for this match. This binding will remain the same for a single match; for such a match the variable will not change its value (immutable). Thus the set of variable bindings is characterizing a match.)

If - during a match - an already bound variable will be matched against a maplet value, the value of the variable and that in the maplet will have to be identical. Otherwise the match would not succeed.

6.1.3 Singular and Plural Variables

During the matching process variables are assigned values. In some cases we expect the variable to hold exactly one value. These variables are those which are prefixed with $. In other cases, though, we allow one variable to hold several values which are acquired during the matching process. This is indicated by the prefix @.

Example:

bn: $bn  # $bn will get one value during one match

some-role : @roles  # @roles will get several values during one match

Note that these variables are technically bags and not sets as they may contain a certain value more than once. Thus there is no inherent sequence. It also implies that two or more occurrences of a particular plural variable are compared as bags.

If a singular variable $v is to be matched against a particular constant, then the notation $v =~ value can be used. If $v should be matched against some other $w, then $v =~ $w can be used.

For a plural variable @v the value must be a bag for which we will use the notation @v =~ (value1, value2, ...). To make two bags be matched to the same values, we simply use @v =~ @w.

6.1.4 Reference Variables

Reference variables can appear anywhere in a maplet pattern where a topic reference can appear. During the matching process the variable will be bound to a topic reference (or to several topic references for plural variables).

Example:

$car (ferrari)

In cases where we do not care about the value to be matched we can also use the wildcard *. (Note, though, that this is no wildcard in the sense of regular expressions.)

6.1.5 Regular Expressions Variables

A regular expression (RE) can appear anywhere in a maplet pattern where arbitrary text is allowed, that is in occurrences, basenames and subject indicators. In the pattern matching process a corresponding maplet part of the map will be matched against regular expressions in a maplet pattern.

The regular expressions themselves follow the Perl syntax and are enclosed by the '/' character as in /.*/. As in Perl any other character can be used if the pattern starts with the character 'm' and the following character is the same as the last non-space character on this line.

Example:

bn: m|regexp pattern here|

In the above pattern we have introduced one regular expression using '|' as pattern terminator. This pattern will be matched against the string of a base name characteristic.

The use of Extended REs also allows to capture matched strings into variables. Here we make use of the facility to add a capture name for a text pattern using the notation (?var-name:pattern).

Example:

bn: /(?:b=.*)/

Here we introduced a variable b which will capture any basename string of a matched maplet.

For notational convenience text variables can also written without a pattern in the $ notation if that is simply /^\S.*\S$/ (any string except with no leading or trailing blanks). So, (?var-name:^\S.*\S$)) is equivalent to $var-name.

FIXME explicit-var-match: Is it necessary to add the syntax $v =~ /P/?

If we are not interested in the value itself and just need a placeholder, we can also use the wildcard character *. As now no name is associated with a matched value, such variables do not take part in the pattern matching process.

Example:

bn: *

6.2 Matching Rules

In the course of matching a maplet against a pattern various components of topics (such as baseNames, occurrences, etc.) and associations will have to be matched. This section contains the rules governing this process.

The individual components of a maplet pattern are:

instance-class patterns

With such a pattern an instance-class relationship is constrained, as, for example in $car (ferrari). This also includes instance-class constraints for associations.

(scoped) topic characteristics

Topic characteristics are constrained by using notations like bn: $basename or oc @ S ($type) : /http:/. Association roles are not regarded as characteristics here.

association role

Components like these constrain players and the roles in associations, eg. as in author: $a

For the following we assume that a given maplet should be matched against a component of a given maplet pattern. When a match is successful and how value bindings will occur is covered by this list.

· (topic instance-of:)

If a maplet pattern component is of the form

topic (type)

then a maplet set matches if there is a maplet of type instance-of where topic and type play the roles instance and class, respectively; or there is a maplet of type instance-of where topic plays the role instance and some other topic T plays the class, whereby T is a sub-type of type.

If topic is a variable and it is already bound to a value, the match will only succeed if that value plays an instance role in an instance-of association and the topic type matches like above. If it is not bound, it will be bound to any topic id in the maplet set.

FIXME: Not perfectly clear. And not overly elegant.

If the type is a variable and it is bound to a value there must be an association like outlined above between the type and the topic to make the match succeed. If the variable is not bound it will be bound to any value where such an association exists.

· (association instance-of:) If a maplet pattern component is of the form

(type)

then a maplet set matches if there is a maplet of type type. If type is a variable $v then it depends whether $v is already bound to a value or not. If it is then there must be a maplet in the set of exactly this type to make the match succeed. Otherwise, $v will be bound to a type of any maplet in the set.

FIXME: scoped association to be added.

· (basename constant:) If a maplet pattern component is of the form

bn: some-text

then a maplet matches if there is a topic characterstics of type 'baseName' which exactly matches the given text. All leading and trailing blanks are ignored, so is the scope.

· (basename pattern:)If a maplet pattern component is of the form

bn-or-oc-or-in: (?v=P)

with a variable v and a pattern P, then a maplet matches if a topic characteristic is of type 'baseName', 'occurrenceRef', or 'occurrenceData', respectively, and the according value matches the P. The scope value is ignored. If v is not yet bound to a value, it will be bound to the complete value of the characteristic (not only the part of the string which was matched). If v is already bound to a value, then that value must be identical to that of the base name string. The pattern P is ignored in this case.

· (characteristic scope:) If a maplet pattern component is of the form

bn-or-oc-or-in @ scope : ...

then additional to the rules for that particular topic characteristic type the scope is matched:

o If the scope is *, it will be ignored.

o If the scope is a constant S, then the topic characteristic in the maplet must have this as scope value.

o If the scope is a reference variable $s, then it depends whether $s is already bound to a value. If it is, then this value must be identical with the scope value in the topic characteristic. If not, then $s is bound to that value.

· (characteristic type:) If a maplet pattern component is of the form

bn-or-oc-or-in (type) : ...

then additional to the rules for that particular topic characteristic the type is matched:

o If the type is *, it will be ignored.

o If the type is a constant T, then the topic characteristic in the maplet must either have this as value or any other type which is a direct or indirect sub-type of T.

FIXME: match-global-view: need global view to other maplets here.

FIXME: Need somewhere definition of sub-type.

o If the type provided in the component is a reference variable $t, then it depends whether $t is already bound to a value. If it is, then this value must be identical with the type in the topic characteristic. If not, then $t is bound to that value.

FIXME: define-uris: need to define URIs for all of these types?

· (association role:) If a maplet pattern component is of the form

role : ...

then a maplet set will match if there is a maplet which contains the role given. If the role is not a constant, but a variable, then it depends whether this variable is already bound. If it is ... Otherwise the rule for matching the association player must be applied.

FIXME: this has to be factored out. Really.

· (association player:) If a maplet pattern component is of the form

... : player : ...

then a maplet set will match if there is a maplet which contains the role given. If the role is not a constant, but a variable, then it depends whether this variable is already bound.....

FIXME: this has to be factored out. Really.

Otherwise the rule for matching the association role must be applied.

· (subject indicator:) If a maplet pattern component is of the form

sin : value

then a set of maplets match if there is a topic characteristics of type 'subjectIndicator' and the value matches.

6.3 Open Maplet Patterns

If p is a maplet pattern, then [ p ] is a open maplet pattern. A maplet set S matches [ p ] if it satifies all of the following:

· all topic characteristic components in p can be matched to one and the same set of topic characteristics in one maplet m is element of S.

· all topic identifiers in an topic instance-of component of p match the player which is characterized by exactly the same topic characteristic set.

· All topic identifier in a association instance-of within p match the topic in that map.

FIXME: Need MUCH MUCH MUCH MORE formalism for this!!!

· All association role components in p match.

· All association player components in p match.

The maplet set can contain additional information which has no counterparts in p.

Example: Consider the open maplet pattern

[ * (tutorial)

  bn: /tutorial/i

  oc: /http:/ ]

The maplet denoted via

astma (tutorial)

bn: AsTMa Tutorial

bn @ de : AsTMa Einfuehrung

oc: http://astma.it.bond.edu.au/tutorial1

oc: http://astma.it.bond.edu.au/tutorial2

would match as it has a matching id, the type (instanceOf) matches and all characteristics (basename and occurrence) in the pattern have a matching counterpart in the maplet. Any additional characteristics (the basename with a different scope and the second additional occurrence) has no influence on the matching outcome.

6.4 Closed Maplet Patterns

If p is a maplet pattern, then ] p [ is a closed maplet pattern. A maplet set S matches ] p [ if it satisfies all of matching rules of a open maplet pattern, but all topics are minimal in the sense that they do not contain additional roles and topic characteristics which are explicitely not matched.

FIXME: well could be clearer.

Example: Flipping the [] brackets in the earlier example gives us such an closed maplet pattern:

] * (tutorial)

  bn: /tutorial/i

  oc: /http:/ [

Now the maplet above will not match anymore as the additional basename and the occurrence violate the maplet pattern which allows only one basename and only one occurrence.

7 Quantified Constraints

Quantified constraints build the core of the AsTMa! language. They finally allow to define rules which a conforming map must, may, or must not follow. To cater for that we have to introduce quantifiers and with them structural variables.

In contrast to regular expression variables and reference variables, structural variables are bound to whole submaps (maplet sets) when these are successfully matched against a maplet pattern. Otherwise the same rules for variables apply for them.

7.1 Syntax

Syntactically, AsTMa! follows the line-oriented paradigm and introduces also indentation to structure expressions syntactically:

· A constraint is not interrupted by an empty line or a line containing a comment.

· Constraints on the same logical level must be indented to the same column. Inconsistent indentation must be flagged by an AsTMa! parser.

Examples of valid constraints are:

# Example 1

exist $t [ * (person) ]

# Example 2, means the same, but with confusing line breaks

exists

    $t [

* (person)

# Example 3

forall [ $t (person) ]

   => exists [ (is-employed-at)

               employee : $t ]   # indentation not enforced

# Example 4

forall [ $t (person) ]

   => exists [ (is-employed-at)

               employee : $t ]   # indentation not enforced

      and                        # indentation important

      exists [ (is-married-with)

               partner : $t ]    # indentation not enforced

Examples of invalid indentations are:

# Example 5

forall [ $t (person) ]

   => exists [ (is-employed-at)

               employee : $t ]

   and                           # indentation invalid

    exists [ (is-married-with)   # indentation invalid

               partner : $t ]

7.2 Existence Quantified Constraints

If p is an open or closed maplet pattern and $v is a structural variable, then exists $v p is a (quantified) constraint.

If a variable is not needed as it is not necessary to refer to the matched submap in other parts of the constraint, then the variable can also be omitted as in exists p. In this case an internally generated variable will be used.

If c is such a constraint, then var(c) is the set of variables mentioned in that constraint. The scope of these variables starts with their first appearance and reaches until the end of the maplet pattern.

Informally, a map M conforms to th constraint exists $v p, if there is at least one submap m << M matching p.

More formally, we define a function b which computes all variable assignments for the successful matches. In this process only the smallest submaps of M are regarded which can satisfy p.

If var(c) is the set of variables involved and we introduce an arbitrary order on these variables (e.g. using alphanumeric sorting), then every single binding is an ordered tuple with one value for each variable. For $v the matched submap will be used. Technically, b (exists $v p, M) returns thus a relation. In the following this allows us to base the semantics on relational algebra.

7.3 Conditional Quantified Constraints

Some constraints only should apply to specific submaps of the map under consideration. To specify such constraints we have to first select these submaps and check then the rest of the constraint with each of them.

For variables this means that all variables mentioned in the condition are visible also in the rest of the constraint. This defines the scope of variables.

Quantified constraints can also be combined using the usual boolean operators AND, OR and NOT although some restrictions apply to reduce the expressiveness of the language.

Example: The following is a valid quantified constraint:

forall $t [ $p (person) ]

   => exists $t [ bn @ unconstrained-scope: * ]

and

      exists [ (is-employed-by)

               employee : $p ]

whereas the following is not:

forall $t [ $p (person) ]

   => not forall $t [ bn @ unconstrained-scope: * ] # invalid

      => exists [ (is-employed-by)

                  employee : $p ]

7.3.1 Positive Quantified Constraints

If c is a quantified constraint, p is an open or closed maplet pattern and $v is a structural variable, then forall $v p => c is a (conditional) quantified constraint. As with existence quantified constraints, the structural variable can be omitted if it is not needed. In that case an internally generated variable will be used.

Example X: Let us consider

forall [ $c (car) ]

   => exists [ $c

               bn: /ferrari/i ]

Such a conditional constraint would enforce that all submaps of the map under consideration which include a (direct or indirect) instance of a car also include the some topic such that it has base name characteristic which's value matches the word ferrari.

More formally, b is a function computing the assignment of values to variables. b (forall $v p, M) is computed as all tuples which have values for their corresponding variables in such a way, that p is successfully matched with a submap of M. In this case only the biggest submaps which match are considered.

For conditional constraints the bindings are computed as

b (f => g, M) = b (f, M) |><|r b (g, M)

whereby |><|r is the natural right join as known from the relational algebra.

Example X: In the following example we first try to match cars. For all of these matched maplets there must exist another maplet in the map matching the is-property-of association.

forall [ $t (car)

         bn: /ferrari/i ]

      => exists [ (is-property-of)

                   owner : millionaire

                   object: $t ]

In these cases we will often have to refer to topics, associations or parts thereof. This can be done with variables as above. For every topic of type car which we find in our map, its topic identifier will be bound to the variable $t. Our expectation is that the same topic plays the role of the object in an is-property-of association.

7.3.2 Negative Quantified Constraints

If c is a quantified constraint, p is an open or closed maplet pattern and $v is a structural variable, then forall $v p => not c is a (conditional) quantified constraint.

Intuitively, we check first all submaps matching the pattern and then have to ensure that none of these submaps match successfully with c. More formally, we define b to be:

b (f => not g, M) = b (f, M) - pi_var(f) b (f => g, M)

by simply deducting those bindings from b (f, M) which follow f => g.

7.3.3 Conjuction Quantified Constraints

If c and d are quantified constraints, p is an open or closed maplet pattern and $v is a structural variable, then forall $v p => c and d is a (conditional) quantified constraint.

The and takes not only care that both constraints c and d have to be matched by every submap matching p. In this case all variables are within one scope coercing appearences in c and d to have the same values. This is formalized by using the natural join on the individual bindings:

b (f => c and d, M) = b (f => c, M) |><| ₌ b (f => d, M)

7.3.4 Disjunction Quantified Constraints

If c and d are quantified constraints, p is an open or closed maplet pattern and $v is a structural variable, then forall $v p => c or d is a (conditional) quantified constraint.

In constrast to the conjunction variables in c and d are completely independent. The individual branches can be evaluated separately:

b (f => c or d, M) = b (f => c, M) union b (f => d, M)

8 Constraints

Finally, we can build constraints. First we define that any quantified constraint is already a constraint. The validity of such constraints is defined as

exists $v p ||- M <=> b (exists $v p, M) is not empty

and if c = forall $v p then

c => d ||- M <=> b (c, M) = pi_var(c) (b (c => d, M))

In the first case a map M is valid if there is at least one binding for the variables in b. In the second case all the bindings which forall $v p has in M must also be bindings for forall $v p => d whereby the projection operator pi only filters out those parts of the tuples which are relevant for c.

To build more complex constraints, we introduce boolean constants and operators to combine constraints.

8.1 True and False

For formal reasons we define true and false as constraints. All maps satisfy true, no map satisfies false:

true ||- M for all maps M

false ||- M for all maps M

8.2 Negation

In a next step we define the not operator. It allows to cover situations in which we are interested not to occur in conforming maps. If c is a constraint, then not c is defined via:

( not c ) ||- M <=> not ( c ||- M )

8.3 Conjunction

If c₁ and c₂ are constraints then c₁ and c₂ is a constraint, whereby we define

c₁ and c₂ ||- M <=> b (c₁, M) |><| ₌ b (c₂, M) is non-empty

Obviously, we use the natural join to define the conjunction of tuple sets. This implies that all variables have to be shared.

8.4 Disjunction

Also c₁ or c₂ is a constraint defined by

c₁ or c₂ ||- M <=> c₁ ||- M v c₂ ||- M

Here all variables are independent from each other; the clauses can be evaluated separately.

9 Functions

AsTMa! also offers language constructs to define new (boolean) functions. Their primary purpose is to keep constraints concise and to reuse particular domain knowledge.

Functions can be declared at any place within an AsTMa! instance and can be referenced via their name. The names follow the pattern /\w[[:alnum:]-_]+/.

function-definition -> 'function' function-name '(' [ formal-param-list ]')' 'returns' constraint

param-list        -> < ',', parameter-name [ ':' value ] >

Functions can have parameters. Formal parameters are listed within a () pair in a comma separated list and their names also follow the pattern /\w[[:alnum:]-_]+/. If for a parameter there exists a default value it is specified after the name following a colon (:). Optionally these values can be enclosed by "" to avoid ambiguity.

A function is referenced via its name followed by a list of actual parameters. The assignment of actual parameters to formal parameters is processed positional. If the actual parameter list is shorter than the formal parameter list, the defaults will be used. If there are no defaults, an error will be flagged.

FIXME: no recursion, this is more macro expansion.

Example: The following defines a function to control whether a particular topic T is used as a topic or association type:

function is-used-as-type (T)

  returns

     exists [ * (T) ]

OR

     exists [ (T) ]

Outside the function it can be referenced like in the following:

is-used-as-type (elephant)

FIXME: Should functions only be for top-level constraints?

An AsTMa! processor must understand the predefined functions listed in Appendix ?.

10 Syntax Extensions

In this section we extend the AsTMa! syntax without adding more expressiveness. The added constructs serve as convenience and help to keep real-world constraints concise.

10.1 Implicit AND

When using the exists operator often individual situations are requested as in

exists [ $c (car)

         bn: /ferrari/i ]

AND

exists [ (is-owned-by)

         owner : rho

         property : $c ]

If the exists keyword is not followed by a structural variable, the individual patterns can be combined into one:

exists [ $c (car)

         bn: /ferrari/i

         (is-owned-by)

         owner : rho

         property : $c ]

FIXME: does this work for forall? What about variables?

10.2 Negating Topic Characteristics

In some case it is necessary to constrain that topics should not have particular characteristic. Following the AsTMa! core language this can be done by not allowing the topic to exhibit that characteristic:

not exists $t [ bn @ de : * ]

Effectively, we disallowed all topics to contain base names scoped to the german language. The same can also be achieved with the following abridged syntax:

forall [ ! bn @ de : * ]

The internal ! operator works for occurrences (oc), inline text occurrences (in) and basenames (bn).

If two or more negations are used then the not conditions are ANDed:

forall [ ! bn : /stopword/

         ! oc : m|http://www.stop.com/| ]

# is equivalent with

NOT exists $t [ bn : /stopword/ ]

AND

NOT exists $t [ oc : m|http://www.stop.com/| ]

Conversely, if we would want to constrain that a particular basename does not occur in some topics we notate:

exists [ ! bn @ de : * ]

which is equivalent with

not forall [ bn @ de : * ]

FIXME: This only can work at the outest level? Allow it?

FIXME: should we allow bn @ ! de : /..../ or oc (!tutorial) : ....?

10.3 Negated Regular expressions

Also regular expressions within maplet patterns can be negated as the following example shows:

forall [ bn : !/ferrari/ ]

Here we specified that all topics must never have a basename matching the brand above. As such, this is equivalent to

not exists [ bn : /ferrari/ ]

# is equivalent with

forall [ !bn: /ferrari/ ]

# is equivalent with

forall [ $bn !~ /ferrari/ ]

10.4 Negating Association Members

The internal ! operator can also be used within maplet patterns constraining associations. The following constraint

forall [ (is-some-assoc)

         ! role : * ]

is equivalent with

forall $a [(is-some-assoc) ]

   => not exists $a [role : * ]

Again, multiple !'s with a pattern are resolved by AND-ing the individual conditions.

10.5 Cardinality Constraints on patterns

In some situations a constraint might include a particular quantity, a lower or an upper bound specifying how often a pattern should occur in a conforming map. For this purpose we extend the exists quantifier and allow a range modifier like below:

exists {4} [ (is-member-of)

             group : string-quartett

             member: $p

             $p (musician) ]

The range modifier {4} sets a lower and the upper limit of the number of matches within a conforming map to 4. If the limits differ, then we can use the Perl-like syntax {lower,upper}:

exists {2,3} [ $c (car)

               bn: /ferrari/i

               (is-owned-by)

               owner : rho

               property : $c ]

In this example we constrain a map to have at least 2, but no more than 3 topics of type car having a base name matching ferrari case-insensitively whereby these cars are owned by rho.

Note, that with this notation variables within a pattern morph from singular to plural.

To translate

exists {m,n} [ # condition ]

into the core AsTMa! syntax, we first build constraints for the lower bound:

exists $v1 [ # condition ]

AND

exists $v2 [ # condition ]

AND

...

AND

exists $vm [ # condition ]

As it is inherent in AsTMa! that different variables MUST match different values, we have prescribed the existence of (at least) m instances matching the condition.

For the upper bound, n we add clauses to prohibit the existence of more instances:

...

AND

not exists $vm_plus_1 [ # condition ] # for m+1

AND

...

not exists $vn [ # condition ]

An AsTMa! parser must flag an error if m is bigger than n.

FIXME: add plural variables here?

10.6 Cardinality Constraints inside patterns

In other situations it might be necessary to restrict the number elements within one maplet pattern. Application thereof are limitations on the number of members in an association or the number of particular characteristics within one topic. The extended syntax allows {n,m} constructs also there.

To constrain, for example, that a child must have exactly two parents we would notate that as

forall $a [ (are-parents-of) ]

    => exists $a ] (are-parents-of)

                   parent{2} : *

                   child     : * [

which is equivalent to

forall $a [ (are-parents-of) ]

    => exists $a ] (are-parents-of)

                   parent : *

                   parent : *

                   child  : * [

If, as another example, we want to enforce that every topic must contain at least two basenames, then

forall $t [ * ] # all topics

   => exists $t [ bn{2,} : * ]

The mapping into the AsTMa! core languages happens like in the previous section.

FIXME: This needs more elaboration.

10.7 Role Alternatives

Maplet patterns which constrain associations can be relaxed by allowing options for the members. These options are separated by the pipe character |:

exists [ (everybody-loves-someone)

         lover: romeo

         loved: shirley | kylie | julia ]

The above constraint is equivalent with

exists [ (everybody-loves-someone)

         lover: romeo

         loved: shirley ]

OR

exists [ (everybody-loves-someone)

         lover: romeo

         loved: kylie ]

OR

exists [ (everybody-loves-someone)

         lover: romeo

         loved: julia ]

If options are used for several members, then all possible combinations are enumerated.

10.8 Constraint Reification

To use constraints as topics in associations, constraints can also be reified in the same way as topics and associations in AsTMa=:

forall [ .... ] is-reified-by constraint-0815

10.9 External Functions

In some cases regular expressions are not expressive enough to constrain string values. Examples are values which have to come from a particular domain such as natural numbers, some finite set of enumeratable values or URLs which represent a downloadable resource at the moment of checking the constraint.

AsTMa! implementations MAY offer particular functions which may be used inside maplet patterns. An example might be

forall $a [ * (online-article) ]

   => suggested exists $a [ oc (download): ?is_live_url ]

After having identified all online articles in a map we check whether for these articles at least one occurrence of type download can be downloaded at validation time.

Another example is to constrain a resourceData occurrence to be a date:

forall $w [ * (world-war) ]

   => exists $w [ in (start-date) : ?is_date ]

Function references can occur anywhere where regular expressions are allowed. When invoked, they get passed the string value matching this particular pattern part (in this sense, the regular expression m|.*| will be used to bind the string, so ignoring leading and trailing blanks).

If a particular function is not implemented by an implementation, it may flag a warning condition, but otherwise behave as if the function returned TRUE.

11 Moderators

An existence quantified constraint can have either of three modifiers: mandated, suggested or derived. The default is mandated.

The moderators do not change anything on the validity relation; they only indicate to the processor information how to regard constraints.

If a constraint contains an existence quantified constraint with the moderator mandated then that constraint MUST be satisfied by a conforming map. A validator will report an error if that is not the case.

Example X:

forall [ $car (ferrari) ]

   => mandated exists [ (is-property-of)

                        property: $car ]

Some ontologies not only constrain future maps in a formal sense but also to express current best practices. Others may be input for editing environments directing them to prompt information from the user which may be optional. As such, a map not conforming to such weaker constraints can still be valid. If a constraint contains an existence quantified constraint with the moderator suggested then a validator will not flag an error but only a warning.

Example X:

forall [ $car (ferrari) ]

   => suggested exists [ has-color

                         object: $car

                         color : red ]

If a constraint contains a derived moderator this constraint is completely ignored by a validator. In this case the constraint poses not really a constraint for a conforming map; instead it is used to express an ontological property which should be silently assumed to hold for any map. With this it is possible to enrich map with ontological knowledge before they are queried by a query processor.

Example X:

forall [ (is-property-of)

         object : $car

         owner  : $owner ]

   => forall [ $car (ferrari) ]

      => derived exists [ $owner (millionair ) ]

12 Summary

In a semi-formal way we have tried to define AsTMa! as logic oriented pattern matching language. First we homogenized topics and associations introducing the concept of a maplet. It forms a uniform element which can be used to build complete maps.

Then we extended the notation of the authoring language AsTMa= by regular expressions and then defined an upper and a lower bound on how a pattern matches a maplet. Using these open and closed maplet patterns we added quantifiers which build constraints on maps on the whole, not just patterns for individual maplets.

Using normal logic operations on quantified constraints allowed us to build fairly complex constraints which provide ways to identify particular submaps first before constraining other parts of the map.

13 Acknowledgements

Much of the initial ideas emerged in the discussions with Jan Gylta and Daniel Eliasson when they worked on using Topic Maps for syndication. Alexander Zangerl contributed significantly to improve the clarity of the text. Erica Santosaputri triggered many changes in the language semantics when she implemented a AsTMa! to Prolog converter in a proof-of-concept study.

A Appendix: Language Skins (Non-normative)

Depending on the provienience different people prefer different syntax for particular language elements. This section hosts these syntactical abberrations from the language definition. An AsTMa! processor might allow free mixing or might prescribe a particular skin.

Skin AsTMa

Uses exists and forall as defined in this document.

Skin XQuery

This skin uses the same syntax like AsTMa! with the following exceptions:

· Instead of exists $t p one can write SOME $t SATISFIES p

· Instead of forall $t p one can write EVERY $t SATISFIES p

Skin XAsTMa

This skin is an XML representation of an AsTMa! expression and parallels the syntax definition defined in Appendix C1 whereby using XML features. The following expression

forall [ $b

         bn : /pattern/ ]

   => exists ] (some-association)

               role1 : player

               role2 : $b [

would be written as

<a:constraint xmlns:a="http://astma.it.bond.edu.au/ns/astma!/1.0/"

              xmlns:t="http://www.topicmaps.org/xtm/1.0/"

              xmlns:xlink="http://www.w3.org/1999/xlink">

<a:forall>

   <a:pattern open="yes">

      <t:topic id="$b">

         <t:baseName>

            <t:baseNameString>/pattern/</t:baseNameString>

         </t:baseName>

      </t:topic>

   </a:pattern>

   <a:exists>

      <a:pattern open="no">

         <t:association>

            <t:instanceOf>

               <t:topicRef xlink:href="#some-association"/>

            </t:instanceOf>

            <t:member>

               <t:roleSpec>

                  <t:topicRef xlink:href="#role1"/>

               </t:roleSpec>

               <t:topicRef xlink:href="#player"/>

            </t:member>

            <t:member>

               <t:roleSpec>

                  <t:topicRef xlink:href="#role2"/>

               </t:roleSpec>

               <t:topicRef xlink:href="#$b"/>

            </t:member>

         </t:association>

      </a:pattern>

   </a:exists>

</a:forall>

</a:constraint>

A more formal definition will follow.

B Appendix: Predefined Functions (Normative)

Following functions must be assumed to be declared by any AsTMa! processor:

# Synopsis:

#    Name:           transitivity

#    Property:       a R b ^ b R c => a R c

#    Invocation:     transitivity (is-located-in, location, region)

function transitivity (relation, from, to)

   returns

      forall [ (relation)

               from : $a

               to   : $b ]

      => forall [ (relation)

                  from : $b

                  to   : $c ]

         => derived exists [ (relation)

                              from : $a

                              to   : $c ] is-reified-by transitivity

# Synopsis

#    Name:           symmetry

#    Property:       a R b => b R a

#    Invocation:     symmetry (is-adjacent,neighbor1,neighbor2)

function symmetry (relation, from, to)

   returns

     forall [ (relation)

              from : $a

              to   : $b ]

     => derived exists [ (relation)

                         from : $b

                         to   : $a ] is-reified-by symmetry

# Synopsis

#    Name:           reflexivity

#    Property:

#    Invocation:     forall [ $x (regions) ]

#                        => reflexivity (contains, container, containee, $x)

function reflexivity (relation, role1, role2, var)

   returns

     exists [ (relation)

              role1 : $x

              role2 : $x ]

# Synopsis

#    Name:           is-used-....

#    Description:    return true or false depending whether the given

#                    topic is used in a particular way or not.

#    Invocation:     is-used-as-type (car)

function is-used (T)

   returns

      is-used-as-type(T)

or

      is-used-as-scope(T)

or

      is-used-as-player(T)

or

      is-used-as-role(T)

function is-used-as-type (T)

   returns

      is-used-as-topic-type (T)

or

      is-used-as-assoc-type (T)

or

      is-used-as-occur-type (T)

function is-used-as-topic-type (T)

   returns

      exists [ * (T) ]

function is-used-as-assoc-type (T)

   returns

      exists [ (T) ]

function is-used-as-occur-type (T)

   returns

      exists [ oc (T) : * ]

function is-used-as-scope (T)

   return

      is-used-as-bn-scope (T)

or

      is-used-as-oc-scope (T)

or

      is-used-as-assoc-scope (T)

function is-used-as-bn-scope (T)

   return

      exists [ bn @ T : * ]

function is-used-as-oc-scope (T)

   return

      exists [ oc @ T : * ]

function is-used-as-assoc-scope (T)

   return

      exists [ T @ (*) ]

function is-used-as-player (T)

   return

      exists [ (*)

               * : T ]

function is-used-as-role (T)

   return

      exists [ (*)

               T : * ]

C Appendix: Syntax (Normative)

C 1 Core Syntax (Normative)

AsTMa! is line oriented and follows the same syntactic rules as AsTMa=. All keywords are case sensitive, so are all quantifiers and boolean operators. Variables have to follow the pattern ($|@)\w[[:alnum:]-_]+ as described above.

Constraints must not separated by empty lines or comments. It is recommended but not necessary to indent nested constraints.

The core language has the following syntax:

constraint        -> bool-constraint | function-definition

bool-constraint   -> '(' bool-constraint ')' |

                     or-constraint

or-constraint     -> and-constraint [ 'OR' bool-constraint ]

and-constraint    -> simple-constraint [ 'AND' bool-constraint ]

simple-constraint -> 'forall' variable-name pattern '=>'

                         bool-constraint |

                         exists-constraint

) |

                     function-invocation

exists-constraint -> ('derived' | 'suggested' | 'mandated') 'exists' variable-name pattern

name              -> /^\$\w[[:alnum]]/

function-definition -> 'function' function-name '(' [ formal-param-list ]')' 'returns' constraint

param-list        -> < ',', parameter-name [ ':' default-value ] >

function-invocation -> function-name '(' [ actual-param-list ] ')'

pattern           -> '[' maplet-pattern ']' |

                     ']' maplet-pattern ']'

maplet-pattern    -> is an AsTMa= expression with regular expressions, wildcards and variables

default-value     -> either a string enclosed with "" or a constant

C 2 Extended Syntax (Non-normative)

The extended syntax incorporates all syntactical extensions of the core language defined so far.

tbd

D Appendix: External Functions (Non-normative)

AsTMa! defines only a minimal set of functions, mainly to support patterns containing regular expressions. Implementations are free to offer (a) more functions and (b) a mechanism to allow applications to register functions.

Below is a non-normative list of suggested functions.

· is_integer (in string): returns non-zero if the only parameter is a string representation of an integer.

· is_float (in string): returns non-zero if the only parameter is a string representation of an float.

· is_uri (in string): returns non-zero if the only parameter is a valid URI.

· is_urn (in string): returns non-zero if the only parameter is a valid URN.

· is_url (in string): returns non-zero if the only parameter is a valid URL.

· is_live_url (in string): returns non-zero if the only parameter is a valid URL and can be referenced at checking time.

E Appendix: Directives (Normative)

AsTMa! allows directives in the same way as AsTMa=. This implies that AsTMa! directives have to be introduced by a '%' character at the beginning of a line.

The precise semantics of a directive is up to the implementation. There are no mandatary directives an implementation has to provide. Typical examples are

% optimize : speed

% optimize : memory

which could influence runtime properties of the validation process to optimize for either processing speed or memory consumption.

F Appendix: References (Non-normative)

A proposal for a TMQL, Based on OQL and an underlying Object Model of XTM

Christoph Schmidt, Feb 2001, posted to the tmql mailing list

TMQL Draft(Topic Map Query Language)

Ann Wrightson, Ontopia, BSI, 7 Nov 2000 (corrected 28 Nov 2000), http://www.y12.doe.gov/sgml/sc34/document/0186.doc

XQuery

XQuery 1.0: An XML Query Language, W3C Working Draft 20 December 2001, http://www.w3.org/TR/xquery/

Answer is just a question [of matching Topic Maps]

Rafal Ksiezyk, XML Europe 2000, proceedings, http://www.gca.org/papers/xmleurope2000/papers/s22-03.html

XML Topic Maps (XTM) 1.0

Specification, Steve Pepper, Graham Moore, Steven R. Newcomb, Michel Biezunski, http://www.topicmaps.org/xtm/

Creating semantically valid topic maps

Geir Ove GrØnmo, Ontopia, http://www.ontopia.net/topicmaps/materials/tm-schemas-paper.pdf

Topic Map Fundamentals for Knowledge Representation

H. H. Rath, in XML Topic Maps, Creating and Using Topic Maps for the Web, Ed: J. Park, 2002, Addison Wesley

SAM, The Standard Application Model for Topic Maps

Ed: L.M. Garshol, http://www.y12.doe.gov/sgml/sc34/document/0396.htm

Mastering Regular Expressions

Jeffrey E. F. Friedl; 2nd Edition July 2002, O'Reilly

Prolog for Programmers

F. Kluźniak, S. Szpakowicz, 1985, Academic Press

XML Path Language (XPath), Version 1.0

Ed: J. Clark, S. DeRose, W3C, http://www.w3.org/TR/xpath

Requirements for a Web Ontology Language

Ed: J. Heflin, R. Volz, J. Dale, W3C, http://www.w3.org/TR/webont-req/