NoSQL and SQL Data Modeling: Bringing Together Data, Semantics, and Software

Chapter 8
Semantic Notations

The field of semantics is vast, and one small chapter in this book cannot adequately survey it. Since this book is about a modeling notation and terminology, the focus in this chapter will be on how leading semantic languages, specifically Resource Description Framework (RDF) and Web Ontology Language (OWL), express concepts about the real world, and how COMN represents these things.

Predicates and RDF Statements

There has been great interest for some time in enabling computers to understand natural language: to interpret the meaning of sentences, and then to respond to the meaning of full-sentence queries and commands. Think of the robots and computers of science fiction, which have artificial brains and can carry on conversations with humans. Some serious efforts proceed on the belief that this can be achieved in the next few decades, but it requires that computers be able to process meaning. The goal of the field of semantics is to reduce meaning to something that a computer can process.

The effort to reduce meaning to something computable has been hampered by the heavy overloading of the term “predicate”. It is used in at least three senses that are relevant to semantics. The three meanings are as follows.

In English grammar, the predicate of a complete sentence is “the part of a sentence or clause that expresses what is said of the subject and that usually consists of a verb with or without objects, complements, or adverbial modifiers” (Merriam-Webster). For example, in the following sentence the predicate is “works in department 4567”.

Employee #952 works in department 4567.

The English grammar meaning of “predicate” is relevant when one is analyzing natural language text.

In the Resource Description Framework (RDF), the “predicate” is the middle part of an RDF statement, which also has a “subject” and an “object”. These terms, borrowed from English grammar, don’t quite line up with their English grammar definitions. We can ignore that fact as long as we can keep English grammar definitions out of our minds.

RDF would break down the above sentence as follows:

the subject: Employee #952
the predicate: works in
the object: department 4567

These three parts taken together are called a “triple”. (A triple is any group of three things.) A triple of this sort forms what is called an RDF statement.

The RDF statement above could be expressed in XML as follows:

<rdf:Description rdf:about="http://www.company.fake/employee#952">
  <ns:worksInDept>4567</ns:worksInDept>
</rdf:Description>

In this XML example, the subject is expressed as a resource referenced by a URL, the predicate is given by an XML element named <ns:worksInDept>, and the object is the value 4567.
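As a rough illustration of how a program might recover the three parts from such XML, here is a minimal Python sketch using only the standard library. It assumes the fragment is wrapped in an rdf:RDF element with namespace declarations, which the snippet above omits. The namespace URI bound to the ns prefix and the function name extract_triples are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: the text does not say which namespace the "ns" prefix stands for.
NS = "http://www.company.fake/ns#"

RDF_XML = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ns="{NS}">
  <rdf:Description rdf:about="http://www.company.fake/employee#952">
    <ns:worksInDept>4567</ns:worksInDept>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(xml_text):
    """Return (subject, predicate, object) tuples from simple RDF/XML.

    Handles only the shape shown in the text: rdf:Description elements
    whose child elements carry literal values.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree spells qualified names as "{namespace}local";
            # dropping the braces yields the full predicate URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            triples.append((subject, predicate, prop.text))
    return triples


for triple in extract_triples(RDF_XML):
    print(triple)
```

A full RDF parser handles many more forms than this; the sketch is only meant to show that the subject, predicate, and object can each be read off the XML structure.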

You will see in chapter 14, Data and Information Defined, how logicians use the term “predicate”, which is entirely different from how the word is used in RDF (and in English). We will preview that chapter here, and map RDF predicates to logical predicates.

In logic, a proposition is “an expression in language or signs of something that can be believed, doubted, or denied or is either true or false” (Merriam-Webster). The statement, Employee #952 works in department 4567, is a proposition.

We would expect the Human Resources department of a corporation to record many similar propositions about many employees; for example:

Employee #952 works in Department 4567.

Employee #956 works in Department 4567.

Employee #891 works in Department 4566.

As shown above, each of these propositions could be expressed as an RDF statement.

These propositions clearly follow the same pattern. We can express that pattern by creating an English sentence where the variable parts of the propositions are represented by symbols called, appropriately enough, variables. Here is such a sentence, in which EmpId and DeptNr are the variables.

Employee #EmpId works in Department DeptNr.

This English sentence is now in the form of a logic predicate. In logic, a predicate is a formula with variables that will yield an answer, true or false, when all of its variables are bound to appropriate values. By “appropriate”, we mean values that are consistent with the type expected by the corresponding variable.

In our example above, we expect that the variable EmpId will be bound only to EmpId values found in the table of employees, and that the variable DeptNr will only be bound to DeptNr values found in the table of departments. Such restrictions on the possible values of variables are expressed by types, which are very similar to what OWL calls classes. We will dig deeply into types in chapter 11.
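The idea of a predicate whose variables accept only values of the expected type can be sketched in Python. This is a minimal sketch under stated assumptions: the sets below are hypothetical stand-ins for the employee and department tables, and the function treats any proposition that has not been recorded as false, which is a simplification the text does not prescribe.

```python
# Hypothetical stand-ins for the tables of employees and departments.
EMPLOYEE_IDS = frozenset({891, 952, 956})
DEPT_NRS = frozenset({4566, 4567})

# The propositions recorded by Human Resources, as (EmpId, DeptNr) pairs.
RECORDED = frozenset({(952, 4567), (956, 4567), (891, 4566)})


def works_in(emp_id: int, dept_nr: int) -> bool:
    """Logical predicate: 'Employee #EmpId works in Department DeptNr.'

    Each variable may be bound only to a value of its type; here a type
    is modeled as membership in the corresponding table.
    """
    if emp_id not in EMPLOYEE_IDS:
        raise ValueError(f"{emp_id} is not a known EmpId")
    if dept_nr not in DEPT_NRS:
        raise ValueError(f"{dept_nr} is not a known DeptNr")
    # Assumption: an unrecorded proposition is treated as false.
    return (emp_id, dept_nr) in RECORDED


print(works_in(952, 4567))  # a recorded proposition
print(works_in(952, 4566))  # well-typed, but not recorded
```

Binding both variables turns the predicate into a proposition that is either true or false, while an ill-typed value is rejected before any truth value is produced.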

To this point, then, we’ve seen that an RDF statement is a proposition, and that the pattern of a proposition can be expressed as a logical predicate with two parameters. The RDF predicate identifies the logical predicate, and the subject and object are the values for the logical predicate’s two variables.
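The mapping just described can be sketched as a lookup: the RDF predicate selects a logical predicate, and the subject and object are passed to it as arguments. The registry name LOGICAL_PREDICATES, the function names, and the use of plain strings for values are all assumptions made for this illustration.

```python
from typing import Callable, Dict, Tuple

# An RDF statement as an ordered list of three values.
Triple = Tuple[str, str, str]

# Hypothetical recorded facts behind the worksInDept predicate.
WORKS_IN_FACTS = {("952", "4567"), ("956", "4567"), ("891", "4566")}


def works_in(emp_id: str, dept_nr: str) -> bool:
    """Logical predicate with two variables."""
    return (emp_id, dept_nr) in WORKS_IN_FACTS


# The RDF predicate is the name that identifies a logical predicate.
LOGICAL_PREDICATES: Dict[str, Callable[[str, str], bool]] = {
    "worksInDept": works_in,
}


def evaluate(statement: Triple) -> bool:
    """Bind the subject and object to the two variables of the named predicate."""
    subject, predicate, obj = statement
    return LOGICAL_PREDICATES[predicate](subject, obj)


print(evaluate(("952", "worksInDept", "4567")))
print(evaluate(("952", "worksInDept", "4566")))
```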

Doubles and Quadruples

Not every statement that we would like to make about things comes in the form of a triple. Sometimes we need to be able to say something that has four parts and can’t be sensibly subdivided.

For instance, consider this statement:

John threw the ball to Mary.

There is no good way to reduce this statement to three parts. There really are four parts, and dropping any part leaves out some important meaning.

Here are two possible approaches to reducing this four-part statement to triples. In the first approach, we place the direct object (the ball) and the indirect object (Mary) in their own triple, and then make that triple the object of another triple that includes the subject and verb (predicate). This can be expressed in pseudo-code using functional syntax as:

// triple #1
thrownToSomeone(ball, Mary)

// triple #2
threw(John, thrownToSomeone(ball, Mary))

You can see that the second triple incorporates the first triple as its third part.

The second approach places the subject, predicate, and direct object in a triple, and then makes that triple the subject of another triple.

// triple #1
someoneThrewSomething(John, ball)

// triple #2
thrownTo(someoneThrewSomething(John, ball), Mary)
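Both factorings can be sketched in Python as triples whose subject or object may itself be a triple. The Triple class and the count_triples helper are hypothetical names introduced for this illustration; the predicate names are those used in the pseudo-code above.

```python
from typing import NamedTuple, Union


class Triple(NamedTuple):
    """A triple whose subject or object may be a nested triple."""
    subject: Union[str, "Triple"]
    predicate: str
    obj: Union[str, "Triple"]


# First approach: the inner triple becomes the object of the outer one.
approach_1 = Triple("John", "threw", Triple("ball", "thrownToSomeone", "Mary"))

# Second approach: the inner triple becomes the subject of the outer one.
approach_2 = Triple(Triple("John", "someoneThrewSomething", "ball"), "thrownTo", "Mary")


def count_triples(triple: Triple) -> int:
    """Count this triple plus any triples nested inside it."""
    return 1 + sum(count_triples(part) for part in triple if isinstance(part, Triple))


print(count_triples(approach_1), count_triples(approach_2))
```

Either way, one four-part statement costs two triples and an extra, somewhat artificial predicate name.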

Logical predicates don’t care how many arguments they take: any number greater than zero will do. A logical predicate corresponding to the above statement in functional notation might look like this:

threw(< Person_t who, Object_t what, Person_t toWhom >)

The above statement would appear in the same functional notation as:

threw(John, ball, Mary)
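For comparison, a three-variable predicate needs no nesting at all. The following Python sketch assumes two hypothetical classes, Person and Thing, standing in for Person_t and Object_t, and a small set of recorded facts invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Thing:
    name: str


# Hypothetical recorded facts: (who, what, to_whom).
THROWS = {(Person("John"), Thing("ball"), Person("Mary"))}


def threw(who: Person, what: Thing, to_whom: Person) -> bool:
    """Logical predicate with three variables, each restricted to a type."""
    for value, expected in ((who, Person), (what, Thing), (to_whom, Person)):
        if not isinstance(value, expected):
            raise TypeError(f"{value!r} is not a {expected.__name__}")
    return (who, what, to_whom) in THROWS


print(threw(Person("John"), Thing("ball"), Person("Mary")))
print(threw(Person("Mary"), Thing("ball"), Person("John")))
```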

Forcing such statements through an extra level of factoring into triples adds overhead that could be prohibitive for some Big Data applications.

There is a lesser problem in the other direction, when we have only a subject and a verb/predicate; for example,

Horses exist.

Unicorns do not exist.

These statements could be represented as triples as long as there is a placeholder for the missing object. Such statements do not occur as frequently as those in the form of triples and quadruples, and the extra overhead of the missing object placeholder is probably not a performance problem.
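One way to picture the placeholder is the following Python sketch, in which None fills the missing object position. The names NO_OBJECT, EXISTING_KINDS, as_triple, and evaluate are hypothetical, and the choice of None as the placeholder is an assumption, not something RDF prescribes.

```python
from typing import Optional, Tuple

Triple = Tuple[str, str, Optional[str]]

# Assumption: None stands in for the object that a one-variable predicate lacks.
NO_OBJECT = None

# Hypothetical knowledge about which kinds of things exist.
EXISTING_KINDS = {"Horse"}


def exists(kind: str) -> bool:
    """Logical predicate with a single variable."""
    return kind in EXISTING_KINDS


def as_triple(predicate_name: str, subject: str) -> Triple:
    """Pad a subject-and-predicate statement out to three parts."""
    return (subject, predicate_name, NO_OBJECT)


def evaluate(statement: Triple) -> bool:
    """Evaluate a padded triple by ignoring the placeholder."""
    subject, predicate, obj = statement
    if predicate != "exists" or obj is not NO_OBJECT:
        raise ValueError(f"unsupported statement: {statement!r}")
    return exists(subject)


print(evaluate(as_triple("exists", "Horse")))
print(evaluate(as_triple("exists", "Unicorn")))
```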

OWL

The Web Ontology Language, or OWL, is a language for expressing ontologies. It has its own implicit ontology, described in the abstract syntax of the language.

COMN can be used to represent ontologies, because its symbology enables the depiction of real-world things, their relationships, and their properties. However, COMN has at its foundation a strong distinction between things that are concepts and things that are material objects. This distinction is present in order to ensure that COMN can represent not only real-world things, but also the real-world material objects of which computers are made, and can show how the meaningless states of those objects can be used to represent meaning.

This strong distinction in COMN leads to very different uses of words like type, class, and object than in OWL. Despite these differences, there is nothing in COMN that is incompatible with the abstract syntax of OWL. Consult the terminology mapping table in the Terminology section below for guidance.

Graphical Notations for Semantics

There is no single graphical notation that is dominant for the expression of semantic information. The semantic community seems to embrace a rich diversity of graphical notations in order to express different aspects of meaning. Notations in use include:

Simple graphs, with nodes represented by circles, ellipses, or rectangles, and edges connecting them represented by lines or arcs. Nodes represent the subjects and objects of triples, and edges represent RDF predicates.
UML-like drawings showing objects/entities with attributes.

Diagrams using these notations are sometimes organized into a particular style, such as state transition diagrams, cluster maps, and trees.

COMN offers the field of semantics a notation that is suitable for many of these purposes. A COMN model can be drawn more like a simple graph or more UML-like. Some of the possibilities will be explored in chapter 17. But the most important aspect of COMN is that the same notation used for ontologies can be used for expressing data and the static structure of software. This enables the modeler to ensure that the translation from a model of reality to a running system is complete and correct, and to express and therefore control the physical realization of the model.

Terminology

RDF Term	COMN Term
statement	an ordered list of three values. The second value (the RDF predicate) identifies a logical predicate with two variables. The first and third values (the RDF subject and RDF object, respectively) supply the values for the predicate’s two variables. The statement forms a logical proposition.
predicate	the name of a logical predicate with two variables
no RDF equivalent	logical predicate: a logical formula having one or more variables which, when the variables are bound, forms a proposition

OWL Term	COMN Term
individual	entity, whether conceptual or objective
class	type
no OWL equivalent	class: a description of the structure and/or behavior of material objects
datatype	type of something lexical
property	attribute
ObjectProperty	attribute whose value is a reference to some entity
DatatypeProperty	attribute whose value is lexical
restriction	restriction (a means of subtyping)

Key Points

The field of semantics today is dominated by the Resource Description Framework (RDF) and the Web Ontology Language (OWL).
RDF statements and triples are inefficient for representing information that is not in the form of a logical predicate with two variables.
COMN uses words like type, class, and object differently than OWL, but their abstract syntaxes are compatible.
COMN offers the field of semantics a single modeling notation that can represent the real world, representations of the real world in data, and the static structure of software. This can help ensure a complete and correct translation of an ontology into a running system.
