Home-grown DataGovernance

Hello, Habr!

Data is a company's most valuable asset. Almost every company with digital ambitions says so, and it is hard to argue: hardly a single major IT conference goes by today without a discussion of approaches to managing, storing and processing data.

Data comes to us from the outside and is also generated inside the company, and when we talk about a telecom operator's data, for internal employees it is a treasure trove of information about the customer: their interests, habits and location. With competent profiling and segmentation, advertising offers hit the mark far more often. In practice, however, things are not so rosy. The data a company stores may be hopelessly outdated, redundant or duplicated, or nobody may even know it exists apart from a narrow circle of users. ¯\_(ツ)_/¯


In short, data needs to be managed effectively; only then does it become an asset that brings real benefit and profit to the business. Unfortunately, solving data management issues means overcoming quite a few difficulties. They stem mainly from the historical legacy of a "zoo" of systems and the lack of common processes and approaches for managing them. But what does "managing data" actually mean?

That's what we will talk about under the cut, as well as how the open source stack helped us.

The concept of strategic data management, Data Governance (DG), is already well known on the Russian market, and the goals a business achieves by implementing it are clear and well articulated. Our company was no exception and set itself the task of implementing this data management concept.

So where did we start? First, we formulated our key goals:

  1. Ensure the availability of our data.
  2. Ensure transparency of the data life cycle.
  3. Give company users consistent, non-contradictory data.
  4. Give company users verified data.

Today the software market offers a good dozen tools of the DataGovernance class.



But after a detailed analysis and study of these solutions, we noted a number of critical issues:

  • Most vendors offer a comprehensive suite of solutions, which for us is redundant and duplicates existing functionality; on top of that, integrating it into the current IT landscape would be resource-intensive.
  • The functionality and interfaces are aimed at technical specialists rather than end users.
  • Low survival rate of the products and a lack of successful implementations on the Russian market.
  • High cost of the software and of its further maintenance.

The criteria listed above, together with the recommendations on import substitution of software for Russian companies, convinced us to go down the road of in-house development on an open source stack. Django, a free and open source framework written in Python, was chosen as the platform. We then identified the key modules that would serve the goals stated above (a sketch of how they might map onto Django apps follows the list):

  1. Register of reports.
  2. Business glossary.
  3. Transformation description module and DataLineage.
  4. BI-.
  5. Data quality control module.
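To give an idea of how this might look in code, here is a minimal sketch of the project configuration, with one Django app per module. The app names are our illustration and not the actual project layout.

```python
# settings.py (fragment) -- hypothetical app names, one Django app per module
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    # Data Governance modules
    "report_registry",   # register of reports
    "glossary",          # business glossary
    "lineage",           # transformation descriptions and DataLineage
    "data_quality",      # data quality control
]
```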




Register of Reports

According to internal research at large companies, employees solving data-related problems spend 40-80% of their time simply searching for the data. So we set ourselves the task of opening up information about existing reports, which previously was available only to the customers who had ordered them. This cuts the time needed to build new reporting and supports the democratization of data.



The register of reports has become a single reporting window for internal users from different regions, departments and divisions. It consolidates information on the information services built in the company's corporate data warehouses, and Rostelecom has quite a few of them.

But the registry is not just a dry list of developed reports. For each report we provide the accompanying information a user needs to get acquainted with it on their own.

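As a rough illustration, a registry entry and its attribute-level description could be modeled like this. The field set is our assumption, not the actual composition of the registry card.

```python
# report_registry/models.py -- a minimal sketch; field names are illustrative
from django.conf import settings
from django.db import models


class Report(models.Model):
    """A single entry in the register of reports."""
    name = models.CharField(max_length=255)
    description = models.TextField(blank=True)
    owner = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.PROTECT,
                              related_name="owned_reports")
    source_warehouse = models.CharField(max_length=255, blank=True)
    refresh_schedule = models.CharField(max_length=255, blank=True)
    created_at = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return self.name


class ReportAttribute(models.Model):
    """Attribute-level description of a report: example values and calculation method."""
    report = models.ForeignKey(Report, on_delete=models.CASCADE, related_name="attributes")
    name = models.CharField(max_length=255)
    example_value = models.CharField(max_length=255, blank=True)
    calculation_method = models.TextField(blank=True)
```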

Usage analytics are available for every report, and reports rise to the top of the list based on log analytics of the number of unique users. And that is not all: in addition to the general characteristics, we also provide a detailed description of each report's attribute composition, with example values and calculation methods. This level of detail immediately tells the user whether the report will be useful to them or not.
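Ranking by unique users can be expressed with a single ORM query. This is a sketch only: ReportAccessLog and the "access_log" related name are assumptions, and Report is the model from the registry sketch above.

```python
# report_registry/models.py (continued) -- a sketch of usage-based ranking
from django.conf import settings
from django.db import models
from django.db.models import Count


class ReportAccessLog(models.Model):
    """One row per report view, written by the registry's web layer."""
    report = models.ForeignKey(Report, on_delete=models.CASCADE, related_name="access_log")
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    accessed_at = models.DateTimeField(auto_now_add=True)


def top_reports(limit=10):
    """Rank reports by the number of unique users who have opened them."""
    return (
        Report.objects
        .annotate(unique_users=Count("access_log__user", distinct=True))
        .order_by("-unique_users")[:limit]
    )
```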

The development of this module was an important step towards data democratization and significantly reduced the time needed to find the required information. Along with shorter search times, the number of support requests for advice has also dropped. One more useful result of building a single register of reports is worth noting: it prevents duplicate reports from being developed for different structural units.

Business Glossary


You all know that even within a single company the business speaks different languages: people use the same terms but mean completely different things. The business glossary is designed to solve this problem.

For us, the business glossary is not just a reference book with term definitions and a calculation methodology. It is a full-fledged environment for developing, coordinating and approving terminology, and for building relationships between terms and the company's other information assets. Before a term enters the business glossary, it must pass every stage of coordination with the business customers and the data quality center. Only after that does it become available for use.

As I wrote above, the strength of this tool is that it lets you build links from the level of a business term down to the specific user reports in which it is used, and further down to the level of physical database objects.



This was made possible by using glossary term identifiers both in the detailed descriptions of registry reports and in the descriptions of physical database objects.
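To illustrate how such links can be modeled, here is a hedged sketch of a glossary term with an approval workflow and relations to reports and physical objects. Report is the model from the registry sketch above, PhysicalObject appears in the lineage sketch further down, and the statuses and field names are our assumptions.

```python
# glossary/models.py -- a minimal sketch; statuses and field names are illustrative
from django.db import models


class Term(models.Model):
    """A business term with its definition, calculation methodology and approval state."""
    DRAFT, IN_REVIEW, APPROVED = "draft", "in_review", "approved"
    STATUS_CHOICES = [
        (DRAFT, "Draft"),
        (IN_REVIEW, "Coordination with business customers and the data quality center"),
        (APPROVED, "Approved and available for use"),
    ]

    name = models.CharField(max_length=255, unique=True)
    definition = models.TextField()
    calculation_method = models.TextField(blank=True)
    status = models.CharField(max_length=20, choices=STATUS_CHOICES, default=DRAFT)

    # The links that let us go from a business term down to the reports
    # that use it and to the physical warehouse objects behind them.
    reports = models.ManyToManyField("report_registry.Report", blank=True,
                                     related_name="terms")
    physical_objects = models.ManyToManyField("lineage.PhysicalObject", blank=True,
                                              related_name="terms")

    def __str__(self):
        return self.name
```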

More than 4,000 terms have now been defined and approved in the glossary. Using it simplifies and speeds up the processing of incoming change requests for the company's information systems. If the required indicator has already been implemented in some report, the user immediately sees the set of ready-made reports where that indicator is used and can decide whether to reuse the existing functionality as is or with minimal rework, without initiating a request to develop a new report.
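With the hypothetical Term model above, that reuse check boils down to a short query:

```python
# A sketch of the reuse scenario, assuming the Term model from the glossary sketch
from glossary.models import Term


def reports_using_indicator(term_name):
    """Return the ready-made reports in which an approved indicator is already used."""
    term = Term.objects.get(name=term_name, status=Term.APPROVED)
    return term.reports.all()
```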

Technical Transformation Description Module and DataLineage


You may ask, what are these modules for? Implementing the report registry and the glossary is not enough; you also need to land every business term on the physical database model. Only then can we close the loop on the data life cycle, from the source systems through every layer of the data warehouse to BI visualization. In other words, build DataLineage.

We developed an interface based on the format the company already used to describe data transformation rules and logic. The interface captures the same information as before, but specifying the term identifier from the business glossary has become mandatory. This is how we build the link between the business and physical layers.
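A hedged sketch of how such a description might be stored, with the glossary term made a mandatory field. The model names are ours; the real description format is richer.

```python
# lineage/models.py -- a minimal sketch; the real description format is richer
from django.db import models


class PhysicalObject(models.Model):
    """A physical warehouse object (schema.table or view) in a particular layer."""
    schema_name = models.CharField(max_length=255)
    object_name = models.CharField(max_length=255)
    layer = models.CharField(max_length=50)  # e.g. source / staging / core / data mart

    class Meta:
        unique_together = [("schema_name", "object_name")]


class Transformation(models.Model):
    """One transformation rule: how a target object is derived from its sources."""
    target = models.ForeignKey(PhysicalObject, on_delete=models.CASCADE,
                               related_name="incoming_transformations")
    sources = models.ManyToManyField(PhysicalObject,
                                     related_name="outgoing_transformations")
    logic = models.TextField()  # the rule itself, in the agreed description format
    # Mandatory glossary term: the link between the business and physical layers.
    term = models.ForeignKey("glossary.Term", on_delete=models.PROTECT,
                             related_name="transformations")
```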

Who needs this? What was wrong with the old format that people had worked with for years? How much have the labor costs of preparing requirements grown? We had to deal with questions like these while rolling out the tool. The answers are quite simple: we all need it, the company's data office and our users alike.

Employees did indeed have to adjust, and at first this led to a slight increase in the labor costs of preparing documentation, but we worked through it. Practice, plus identifying and optimizing the problem areas, did their job. Most importantly, we improved the quality of the requirements being developed. Mandatory fields, unified reference lists, input masks and built-in checks significantly raised the quality of transformation descriptions. We moved away from the practice of handing over scripts as development requirements and opened up knowledge that had previously been available only to the development team. The resulting metadata base cuts the time needed for regression analysis several times over and makes it possible to quickly assess the impact of changes on any layer of the IT landscape (reports, data marts, aggregates, sources).

And what about ordinary report users, what do they gain? Thanks to DataLineage, our users, even those far removed from SQL and other programming languages, can quickly find out which sources and objects a particular report is built on.
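With the transformation graph sketched above, building the lineage behind a report is essentially a graph traversal. A sketch, with the function name and structure being ours:

```python
# lineage/services.py -- a sketch of collecting everything upstream of an object
from lineage.models import PhysicalObject


def upstream_objects(obj, seen=None):
    """Collect every physical object that feeds `obj`, across all warehouse layers.

    This is the lineage an end user sees behind a report; running the same
    traversal over `outgoing_transformations` gives the impact analysis.
    """
    if seen is None:
        seen = set()
    for transformation in obj.incoming_transformations.all():
        for source in transformation.sources.all():
            if source.pk not in seen:
                seen.add(source.pk)
                upstream_objects(source, seen)
    return PhysicalObject.objects.filter(pk__in=seen)
```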

Data Quality Control Module


Everything we said above about data transparency matters little unless we know that the data we give users is correct. That is why one of the key modules of our Data Governance concept is the data quality control module.

At the current stage it is a catalog of checks on selected entities. The immediate goal for the product is to expand the list of checks and integrate it with the register of reports.

What will this give, and to whom? The end user of the registry will see the planned and actual dates when a report becomes available, the results of completed checks together with their dynamics, and information about the sources loaded into the report.
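A hedged sketch of what the catalog of checks might look like, assuming the PhysicalObject model from the lineage sketch above; field names are illustrative.

```python
# data_quality/models.py -- a minimal sketch of the catalog of checks
from django.db import models


class QualityCheck(models.Model):
    """A registered data quality check on a selected warehouse entity."""
    name = models.CharField(max_length=255)
    entity = models.ForeignKey("lineage.PhysicalObject", on_delete=models.CASCADE,
                               related_name="quality_checks")
    description = models.TextField(blank=True)
    condition_sql = models.TextField()  # the predicate that must hold for the data


class QualityCheckResult(models.Model):
    """A single run of a check; the history of runs gives the dynamics shown to users."""
    check = models.ForeignKey(QualityCheck, on_delete=models.CASCADE, related_name="results")
    run_at = models.DateTimeField(auto_now_add=True)
    passed = models.BooleanField()
    details = models.TextField(blank=True)
```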

For us, a data quality module embedded in our work processes means:

  • Promptly setting customer expectations.
  • A basis for decisions about the further use of the data.
  • A preliminary set of problem areas identified at the early stages of work, from which regular quality checks are developed.

Of course, these are only the first steps towards a full-fledged data management process. But we are confident that only by doing this work deliberately and actively embedding DataGovernance tools in the workflow will we give our customers information, a high level of trust in the data, transparency in how it is produced, and a faster rollout of new functionality.

DataOffice Team
