The myths of semantic technology

In the previous text, The Myth of the Semantic Web, an attempt was made to substantiate the idea that the semantic web, within the framework of the technological solutions originally proposed by its founders, would remain a dream. Although the W3C semantic tools themselves do not work without friction and need some hand-finishing, they are used quite successfully for modeling ontologies in enterprise projects.

Now let's try to deal with the myths that can be heard when discussing the semantic approach, even from professionals.

So, the main myth of the technology that is called semantic, a technology which, according to its founders, is designed to make the computer understand the content (meaning) of texts or individual symbols, is the very claim that IT semantics has anything to do with meaning. To see this, it is enough to realize that switching to a different format for recording facts, a different data organization scheme, or a different way of generating them cannot fundamentally change the essence of information technology: the computer never understood, and still does not understand, the meaning of the symbols it processes. Writing data from a relational database as a set of triples does not add any meaning. Replacing tables with a graph can be useful for unifying a data model, implementing complex searches, safely modifying business models, and so on, but it will not make the computer understand the meaning of the data.
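To make the point about triples concrete, here is a minimal sketch in plain Python. The table name `person`, its columns, and the helper functions are hypothetical and exist only for this illustration. It rewrites relational rows as (subject, predicate, object) triples and then rebuilds the rows, showing that the two forms carry exactly the same facts.

```python
# Hypothetical relational table "person": one dict per row.
rows = [
    {"id": 1, "name": "Ivan", "city": "Moscow"},
    {"id": 2, "name": "Anna", "city": "Kazan"},
]


def rows_to_triples(table, rows, key="id"):
    """Rewrite every non-key cell as a (subject, predicate, object) triple."""
    triples = []
    for row in rows:
        subject = f"{table}/{row[key]}"
        for column, value in row.items():
            if column != key:
                triples.append((subject, column, value))
    return triples


def triples_to_rows(triples, key="id"):
    """Rebuild the rows from the triples (assumes integer keys in the subject)."""
    rebuilt = {}
    for subject, predicate, obj in triples:
        row_id = int(subject.rsplit("/", 1)[1])
        rebuilt.setdefault(row_id, {key: row_id})[predicate] = obj
    return [rebuilt[row_id] for row_id in sorted(rebuilt)]


triples = rows_to_triples("person", rows)
round_trip = triples_to_rows(triples)
```

The round trip returns the original rows: the conversion changes the notation, not the amount of information, and certainly not what the program "understands".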

The only case where it is permissible, albeit in quotation marks, to use the phrase “understanding of meaning” is the exchange of data between independent applications. We can say that the use of a single record format (RDF) and shared vocabularies gives the computer the ability to “understand the meaning” of data from an unknown provider. Yet obviously there is no real understanding here either: what is being solved is the unquestionably important but essentially banal problem of aligning namespaces, that is, using the same identifiers for data of the same type (we simply agree on the column names).
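The "agreeing on column names" reading can be sketched as follows. The vocabulary URIs under `http://example.org/vocab/`, the provider field names, and the `to_shared` helper are all assumptions made up for this example, not part of any real vocabulary.

```python
# Hypothetical shared identifiers that both providers agree to use.
SHARED_NAME = "http://example.org/vocab/name"
SHARED_EMAIL = "http://example.org/vocab/email"

# Each provider maps its own local field names to the shared identifiers.
PROVIDER_A_MAPPING = {"full_name": SHARED_NAME, "mail": SHARED_EMAIL}
PROVIDER_B_MAPPING = {"fio": SHARED_NAME, "e_mail": SHARED_EMAIL}


def to_shared(record, mapping):
    """Rename the local fields of a record to the agreed identifiers."""
    return {mapping[field]: value for field, value in record.items()}


record_a = to_shared({"full_name": "Anna", "mail": "anna@example.org"}, PROVIDER_A_MAPPING)
record_b = to_shared({"fio": "Anna", "e_mail": "anna@example.org"}, PROVIDER_B_MAPPING)
```

After the renaming, a consumer can merge or compare the two records without knowing either provider. Nothing here involves meaning; it is a dictionary lookup over agreed keys.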

Using URIs as identifiers of an entity or its type does not add meaning either, unless one counts as “adding meaning” the fact that the identifier can double as a link to a description of the entity. But again, this has nothing to do with semantics interpreted as “computer understanding of meanings”; it is only a convenience for presenting data to humans. Besides, in the electronic documentation of any IT system, the descriptions of entities are necessarily tied to their identifiers in the repository.

Another myth is the claim that semantic technologies, unlike traditional ones, work with knowledge. Again, it is obvious that changing the structure in which data is stored and processed does not turn that data into knowledge. Of course, it is perfectly acceptable to call a semantic repository containing the most complete description of a certain subject area a knowledge graph. But we must understand that we are dealing not with knowledge in any meaningful sense, but with a large array of related facts that is convenient to search. It should also be noted that, compared with relational tables, a graph contains no special, additional connectedness of data: transferring data from one scheme to another does not increase the number of links. The semantic format only simplifies the creation of new relationships, that is, it allows you to add new data types to the graph without any changes to the storage structure. But this, again, is a technological convenience, not a reason to talk about some special “linked data” (Linked Data).
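The difference in convenience, and the absence of any difference in content, can be sketched with Python's standard `sqlite3` module and a plain set of triples. The table `person`, the `employer` attribute, and the `objects` helper are hypothetical names chosen for this illustration.

```python
import sqlite3

# Relational side: a new kind of fact requires a schema change first.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO person (id, name) VALUES (1, 'Ivan')")
db.execute("ALTER TABLE person ADD COLUMN employer TEXT")  # storage structure changes
db.execute("UPDATE person SET employer = 'Acme' WHERE id = 1")
relational_facts = db.execute("SELECT id, name, employer FROM person").fetchall()

# Graph side: the same new fact is just one more triple, no schema change.
triples = {("person/1", "name", "Ivan")}
triples.add(("person/1", "employer", "Acme"))


def objects(triples, subject, predicate):
    """Return all objects recorded for a given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


graph_employer = objects(triples, "person/1", "employer")
```

Both stores end up holding the same two facts about the same person. The graph saved one `ALTER TABLE`, which is a real convenience, but it did not create any link that the table lacks.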

And of course, the claim that computer ontologies are capable of generating new knowledge causes nothing but a smile. Yes, new statements can be obtained in an ontology through logical inference performed by special programs called reasoners. But the level of this inference is comparable to the conclusion of a three-year-old child: “grandfather is my dad's dad.” Such inference is certainly necessary for advanced search, but you must admit that it cannot generate any new knowledge; it only lets you avoid storing redundant data (for example, there is no need to record for each father that he also became a grandfather when his child had a child). And here again, note that nothing prevents you from deriving “new knowledge” in the same way in applications with relational databases: add a column for the attribute “grandfather” and programmatically update it whenever the birth of a child is recorded. The ontological approach only unifies logical inference operations, simplifies the addition of new axioms, and makes it possible to store them in the same format as the data, but it does not add any “intelligence” to the system.
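The "grandfather" inference fits in a few lines. The sketch below is not a real reasoner: it hard-codes a single rule over hypothetical `father_of` facts, and then derives the same pairs with a SQL self-join through the standard `sqlite3` module. The self-join is a variant chosen here for brevity; it stands in for the "grandfather column" approach mentioned above rather than reproducing it.

```python
import sqlite3

# Hypothetical stored facts: (father, child).
father_of = [("Pyotr", "Ivan"), ("Ivan", "Misha"), ("Ivan", "Olga")]


def infer_grandfathers(facts):
    """Rule: father_of(x, y) and father_of(y, z) implies grandfather_of(x, z)."""
    return {(x, z) for x, y in facts for y2, z in facts if y == y2}


inferred = infer_grandfathers(father_of)

# The same derivation in a relational database, written as a self-join.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE father_of (father TEXT, child TEXT)")
db.executemany("INSERT INTO father_of VALUES (?, ?)", father_of)
joined = set(db.execute(
    "SELECT a.father, b.child "
    "FROM father_of AS a JOIN father_of AS b ON a.child = b.father"
))
```

Both routes yield the same derived statements, none of which had to be stored. What the ontological approach adds is a uniform place to keep such rules, not a different kind of conclusion.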

So, semantic technologies are not about semantics at all. They are about the unification, standardization, identification, and modification of a large array of heterogeneous data, about the exchange of data between independent applications, about complex search... But not about meaning and not about new knowledge. More precisely, they are no more about meaning and new knowledge than any other technology for storing and processing data. However, these conclusions should not be taken as a call to abandon the established terminology: let the technology remain semantic, the graph a knowledge graph, and the data linked (Linked Data). You just need to understand what you can teach a computer and what you cannot.

(To be continued)
