The semantic web myth

In the field of semantic modeling, a rather strange situation has developed: the set of W3C standards and specifications created for the Semantic Web project (RDF, OWL, SPARQL, etc.) is treated as the foundation of the field, even though the project itself not only remains unimplemented but, apparently, never will be, owing to the doubtfulness of its original hypotheses.

The semantic web was conceived by its author, Tim Berners-Lee, as the next stage in the development of the Internet. The idea was quite rational: network resources should be connected not by meaningless links that merely send the user from one page to another, but by meaningful (semantic) connections. To this end, it was proposed to assign every online and even offline entity (object, property) a unique identifier and to combine these entities into a single graph. Users could then find the information they need quickly and accurately, and, most importantly, computers would gain access to the semantic content of the network. That is, the goal was to create a distributed knowledge graph connecting semantically defined data in a single network space, with the possibility of machine processing and the logical inference of new facts.

The idea of a semantic network described above looks not only relevant but also quite feasible with modern technologies, such as peer-to-peer networks with attack-resistant consensus algorithms, cryptographic user identification, and cryptographic data protection. But the founders of the project made dubious architectural and ideological decisions from the outset that left the semantic web in the status of a beautiful dream.

Since the main goal of the semantic web was the sharing of information on the Internet, the existing Internet was chosen as the project's technological platform, that is, a chaotic dump of sites whose content is controlled not by authors but by domain owners. This orientation toward the existing network inevitably determined the project's basic principles: (1) using Internet addresses as the basis for resource identifiers (URIs), (2) allowing anyone to make an assertion about any resource, and (3) the open-world assumption, that is, the presumed incompleteness of information. These principles turned out to be its main problems.
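To make the first two principles concrete: in the RDF model a resource is just a URI, and any graph published anywhere may add statements about that URI. Below is a minimal sketch using the Python rdflib library; the URIs, names, and the "two sites" scenario are invented for illustration.

```python
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import RDF, FOAF

# (1) The identifier of a resource is simply an HTTP address (a URI).
author = URIRef("http://example.org/people/ivanov")

# A graph published by the author's own site.
site_a = Graph()
site_a.add((author, RDF.type, FOAF.Person))
site_a.add((author, FOAF.name, Literal("Ivan Ivanov")))

# (2) Any other site may assert anything about the same URI.
site_b = Graph()
site_b.add((author, FOAF.name, Literal("I. Ivanov (deceased)")))

# Nothing in the model says which statement is authoritative:
merged = site_a + site_b
for name in merged.objects(author, FOAF.name):
    print(name)  # both names come back on equal footing
```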

First of all, it is obvious that Internet addresses are not suitable as a basis for identifying entities. A domain can change owners, be abandoned, or simply become technically unavailable. The structure of names within a domain can be changed arbitrarily. Not to mention that the many diverse technologies and engines on which sites are built do not adhere to any standards for forming addresses.

But the main formal reason for the failure of the semantic web project should be recognized as the second basic principle, that is, the hope that site owners would build a single network-wide semantic graph. Even at the project's inception it was obvious that website owners would resort to any forgery to deceive search robots (up to writing invisible text on pages and manipulating keywords). And among those who would honestly like to perform semantic markup of their pages, only a few would cope with the task. But even in the ideal case, if a semantic network had been competently laid over all existing sites, the project still would not have worked. The obvious would then have been revealed: we are dealing with hundreds and thousands of duplicates of the same resource (text, image, video) under different identifiers (addresses). Moreover, most instances of one entity would not share the same properties, because "anyone has the right to make a statement about any resource." And, clearly, finding the author's original among these copies would be impossible.
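A hypothetical illustration of the duplicate problem: the same text published on two sites receives two unrelated URIs with conflicting metadata, and nothing in the resulting graph indicates which copy is the author's original. All identifiers and values below are invented.

```python
from rdflib import Graph, URIRef, Literal, Namespace

DC = Namespace("http://purl.org/dc/elements/1.1/")

copy_1 = URIRef("http://blog-one.example/posts/semantic-web")
copy_2 = URIRef("http://mirror.example/articles/42")

g = Graph()
g.add((copy_1, DC.creator, Literal("Original Author")))
g.add((copy_2, DC.creator, Literal("Anonymous")))    # a re-post credits nobody
g.add((copy_1, DC.date, Literal("2019-05-01")))
g.add((copy_2, DC.date, Literal("2020-11-17")))      # the copy looks "newer"

# The graph itself carries no signal that these are the same text,
# nor which of the conflicting values describes the author's original.
```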

And, of course, big problems arise with the third principle, which proclaims the presumption of an open world, that is, implies that facts can be freely added to the common network. Let us dwell on it in more detail.

In fact, the idea of an open world is inherited from the ordinary Internet, where everyone is free to add domains, pages, and entities and to link to any other entities. But a semantic graph differs from a network of links in that it must establish logical, ideally formally verifiable, relationships between statements about entities, and therefore, to be consistent, it must be closed. The compiler of a semantic graph, modeling some fragment of a subject domain, must proceed from a strict conceptual scheme in which ambiguous terminology, non-unique identifiers, and, all the more, the arbitrary addition of statements by arbitrary actors are fundamentally unacceptable. That is, if we speak of the openness of the logical world, this openness should mean the free addition of new closed models to the graph, rather than of arbitrary facts. The network should be composed of independent subject-domain and level ontologies whose interaction is ensured by the use of common vocabularies. Two tasks must be strictly separated: (1) constructing the ontology of a subject domain, and (2) solving the problem of the interaction and correlation of different ontologies, that is, matching entity identifiers, type names, and logical constraints to coordinate data exchange.
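The separation argued for here might look roughly like this: each system keeps its own closed ontology, both use a common vocabulary, and the correlation of identifiers lives in a separate, deliberately maintained mapping ontology rather than being injected into either source. The sketch below uses rdflib and owl:sameAs; all URIs and the HR/CRM scenario are assumptions made for illustration.

```python
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import OWL, RDF, FOAF

HR = Namespace("http://hr.example/ont/")    # ontology of an HR system
CRM = Namespace("http://crm.example/ont/")  # ontology of a CRM system

# Task (1): each subject domain maintains its own closed ontology.
hr = Graph()
hr.add((HR.emp_17, RDF.type, FOAF.Person))
hr.add((HR.emp_17, FOAF.name, Literal("Ivan Ivanov")))

crm = Graph()
crm.add((CRM.client_301, RDF.type, FOAF.Person))
crm.add((CRM.client_301, FOAF.mbox, URIRef("mailto:ivanov@example.org")))

# Task (2): a dedicated mapping ontology correlates the identifiers.
mapping = Graph()
mapping.add((HR.emp_17, OWL.sameAs, CRM.client_301))

# The shared graph is assembled from whole ontologies, not loose statements.
combined = hr + crm + mapping
```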

The orientation of the semantic web project toward creating a single true, consistent graph, built according to the canons of formal (monotonic) logic, should also be recognized as an erroneous decision. One can still accept this approach when building a fixed knowledge base in some practically complete subject domain (geography, engineering standards, etc.). However, an ontology modeling tool is needed not to describe static structures but to support the functioning of real complex systems, in which the monotonicity and consistency of the description are unattainable not only while they are being formed but also in their final state. It is worth recognizing that the occurrence of an error in building a system is a fact that changes its state, and ignoring this fact can lead to disastrous consequences. That is, the logic of a semantic graph should not be monotonic. And here it is worth remembering that the authors of the semantic web idea were not the only ones to step on the rake of a single ontology: after many years of trying to build a single consistent semantic space, the well-known CYC project abandoned the idea and switched to working with microtheories, locally closed ontologies of individual subject domains.

In fact, the mistake in designing the semantic web tools was that the difference between two tasks was not identified and taken into account. The first is the creation of a local ontology of a domain: adding statements validated by local (offline and online) means, and logically deriving new statements according to the rules built into that local ontology. The second is the connection of local ontologies into a single network graph and the attempt to obtain conclusions from a multitude of independent data. Obviously, even if all network data sources use the same vocabularies and each of them is logically flawless in itself, answers to queries against the aggregate graph (where such answers are possible at all) will have a fundamentally different reliability status than results obtained within each local ontology.
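For the first task, a local ontology can derive new statements by rules it controls, and those derivations inherit the reliability of locally validated premises. A toy sketch of such a rule (a hand-rolled rdfs:subClassOf closure, with invented URIs) is shown below; a query over a merged graph of unvetted external sources would give no comparable guarantee.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/schema/")

local = Graph()
local.add((EX.Engineer, RDFS.subClassOf, EX.Employee))
local.add((EX.petrov, RDF.type, EX.Engineer))

# Derive supertype memberships until no new triples appear.
# Every premise has been validated locally, so the derived
# triples carry the same reliability status.
changed = True
while changed:
    changed = False
    derived = []
    for s, _, cls in local.triples((None, RDF.type, None)):
        for _, _, sup in local.triples((cls, RDFS.subClassOf, None)):
            if (s, RDF.type, sup) not in local:
                derived.append((s, RDF.type, sup))
    for triple in derived:
        if triple not in local:
            local.add(triple)
            changed = True

print((EX.petrov, RDF.type, EX.Employee) in local)  # True
```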

The described difference between working with local ontologies and with a common semantic graph can be expressed formally in terms of the openness of the world: a query to the network should rest on the presumption of an open world, while the logic of working with local ontologies will most often rest on the closed-world assumption. We can say that the world should be open, but open to whole ontologies rather than to individual statements.
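The contrast can be shown on a single absent triple: under the closed-world reading of a local ontology, its absence is a negative answer, while under the open-world reading of the network graph it only means "unknown." The sketch below uses invented URIs.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/payroll/")

local = Graph()
local.add((EX.petrov, RDF.type, EX.Employee))
# By construction, the local ontology lists all employees it knows of.

query = (EX.sidorov, RDF.type, EX.Employee)

# Closed world (local ontology): absence of the triple is treated as negation.
print("closed-world answer:", "no" if query not in local else "yes")

# Open world (network graph): absence licenses no conclusion at all,
# since another source may still assert the missing triple.
print("open-world answer:", "yes" if query in local else "unknown")
```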

So it turns out that the W3C standards continue to be developed for a mythical semantic web, and everyone who tries to use them in real projects, that is, to build ontologies of subject domains, is forced to constantly invent crutches to get a working product.

(Continued in: Myths of semantic technology.)
