Data Vault Basics

Today, anyone working in data analysis and BI is bound to come across the term DATA VAULT. However, in my opinion, there is little information on this topic, especially on the Russian-language Internet.


There are interesting articles on how companies apply DATA VAULT, but the basics and the methodology itself are not covered adequately.


The English-language Internet is much better off. You can buy books by the creators of the DATA VAULT methodology, and there are also freely available articles that focus on the basics.


Inspired by one of these articles, I will try to present the basics of the DATA VAULT methodology in Russian.


DATA VAULT - the origins


The main driver behind DATA VAULT was the growing volatility of the business environment and the need to respond to changes quickly. For example, a new data source may appear whose data has a granularity not previously present in the EDW (Enterprise Data Warehouse). The DATA VAULT methodology is meant to let you add data from such a source quickly. In addition, DATA VAULT makes it easier to build a system that stores historical data.


Anatomy of DATA VAULT


An important difference between DATA VAULT and other approaches to building data warehouses is the requirement to load data exactly as it exists in the source. Moving data from the sources into DATA VAULT involves no transformations or enrichment, so the stored data can always be reconciled with the source. Transformations are applied later, when data marts are built on top of DATA VAULT.


Hubs (HUBS)


HUBs are the core of the DATA VAULT. Properly configured HUBs allow you to combine different data sources in your corporate repository. It is important that the sources are independent. Based on this, each HUB should have its own unique business key (Business Key), not associated with other business objects.


A HUB's business key must be unique and should change as rarely as possible.


For example, a car can be identified by its engine number, its body number, or its VIN.


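When designing a DATA VAULT, pay particular attention to the choice of business keys. The rest of the model is built around them, so a poor choice is difficult to correct later.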


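A HUB table contains the following fields:


  • Surrogate key – a unique identifier of the HUB record, used to reference it from other tables;
  • Business key – the natural key of the business object, as it comes from the source;
  • Load date into the HUB – the date and time the business key first appeared in the DATA VAULT; it does not change afterwards;
  • Record source – the source from which the HUB's business key was loaded.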
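The HUB structure can be illustrated with a minimal Python sketch. It assumes that the surrogate key is an MD5 hash of the normalized business key, which is a common DATA VAULT 2.0 practice. The names `HubRecord`, `hash_key` and `load_hub` are hypothetical, and an in-memory dict stands in for the HUB table. The point it shows is that a HUB load only inserts business keys it has not seen before and never updates existing rows.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable


@dataclass(frozen=True)
class HubRecord:
    hub_key: str         # surrogate key (assumption: MD5 of the normalized business key)
    business_key: str    # natural key, stored trimmed and upper-cased here
    load_date: datetime  # when the business key first reached the DATA VAULT
    record_source: str   # system the business key was loaded from


def hash_key(*business_key_parts: str) -> str:
    """Hypothetical helper: hash the trimmed, upper-cased key parts in order."""
    normalized = "||".join(p.strip().upper() for p in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def load_hub(hub: Dict[str, HubRecord], business_keys: Iterable[str],
             record_source: str, load_date: datetime) -> int:
    """Insert business keys that are not yet in the HUB; never update existing rows."""
    inserted = 0
    for bk in business_keys:
        key = hash_key(bk)
        if key not in hub:
            hub[key] = HubRecord(key, bk.strip().upper(), load_date, record_source)
            inserted += 1
    return inserted


# Usage: two sources deliver the same car with different formatting.
hub_car: Dict[str, HubRecord] = {}
first = load_hub(hub_car, ["1HGCM82633A004352", "WVWZZZ1JZXW000001"],
                 "crm", datetime(2024, 1, 1, tzinfo=timezone.utc))
second = load_hub(hub_car, [" 1hgcm82633a004352"],
                  "dealer_portal", datetime(2024, 2, 1, tzinfo=timezone.utc))
print(first, second, len(hub_car))  # 2 0 2
```

Normalizing before hashing is a design choice. It treats differently formatted copies of one key as the same object, which is only correct if the sources really use the key case-insensitively.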

Links (LINKS)


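LINKs are the next key element of DATA VAULT. They connect business keys and thus represent the relationships, associations, or transactions between business objects.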


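In DATA VAULT, all relationships are stored in LINKs. HUBs are never connected to each other directly, only through LINKs. LINKs themselves hold no descriptive attributes.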


A LINK can connect two or more HUBs.


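A LINK references HUBs only. It always models the relationship as many-to-many, so the LINK does not have to change if the cardinality of the relationship between the HUBs changes in the source.
The structure of a LINK resembles that of a HUB:


  • Surrogate key of the LINK – a unique identifier for each combination of connected business keys;
  • Surrogate keys of the connected HUBs;
  • Load date – the date the relationship first appeared in the DATA VAULT;
  • Record source – the source from which the relationship was loaded.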
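A LINK can be sketched in the same hypothetical style. The LINK's surrogate key is assumed here to be a hash of the connected business keys in a fixed order. The relationship in the example (a customer buying a car) is invented for illustration. Each row is one combination of keys, so the same car can appear with several customers without any change to the structure. This is why a LINK can treat every relationship as many-to-many.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple


def hash_key(*business_key_parts: str) -> str:
    """Hypothetical helper: hash the trimmed, upper-cased key parts in order."""
    normalized = "||".join(p.strip().upper() for p in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class LinkRecord:
    link_key: str               # surrogate key of the relationship itself
    hub_keys: Tuple[str, ...]   # surrogate keys of the connected HUBs
    load_date: datetime         # when the relationship first reached the DATA VAULT
    record_source: str          # system the relationship was loaded from


def load_link(link: Dict[str, LinkRecord],
              relationships: Iterable[Tuple[str, ...]],
              record_source: str, load_date: datetime) -> int:
    """Insert unseen combinations of business keys; existing rows stay untouched."""
    inserted = 0
    for business_keys in relationships:
        link_key = hash_key(*business_keys)  # key order must be fixed per LINK
        if link_key not in link:
            hub_keys = tuple(hash_key(bk) for bk in business_keys)
            link[link_key] = LinkRecord(link_key, hub_keys, load_date, record_source)
            inserted += 1
    return inserted


# Usage: hypothetical (customer, VIN) pairs for a "car sale" LINK.
link_sale: Dict[str, LinkRecord] = {}
first = load_link(link_sale, [("C-001", "VIN-1"), ("C-002", "VIN-2")],
                  "crm", datetime(2024, 1, 1, tzinfo=timezone.utc))
# VIN-1 is resold to C-003; the repeated (C-001, VIN-1) pair is skipped.
second = load_link(link_sale, [("C-003", "VIN-1"), ("C-001", "VIN-1")],
                   "crm", datetime(2024, 6, 1, tzinfo=timezone.utc))
print(first, second, len(link_sale))  # 2 1 3
```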

Satellites (SATELLITES)


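SATELLITES hold the descriptive attributes of business objects and their relationships: everything that describes a HUB or a LINK and may change over time. A SATELLITE keeps the full history of these changes.
A SATELLITE contains:


  • the surrogate key of its parent HUB (or LINK);
  • the load date of the SATELLITE record: the primary key of a SATELLITE is the parent key combined with this timestamp.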

The remaining fields of a SATELLITE are the descriptive attributes themselves, together with the record source.


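For example, if a customer's address changes, the customer HUB is not touched because the business key stays the same. Instead, a new row with a new load date is added to the SATELLITE, and the previous row is kept as history.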
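Change tracking in a SATELLITE can be sketched as follows, using a hypothetical customer whose address changes. The sketch hashes the descriptive attributes (a "hash diff", a common DATA VAULT 2.0 technique) to detect changes. A new row is appended only when that hash differs from the parent's latest row. `SatRecord`, `load_satellite` and the parent key `hk_c001` are hypothetical, and a plain list stands in for the SATELLITE table.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class SatRecord:
    parent_key: str             # surrogate key of the parent HUB (or LINK)
    load_date: datetime         # (parent_key, load_date) identifies the row
    hash_diff: str              # hash of the descriptive attributes
    attributes: Dict[str, str]  # the descriptive attributes themselves
    record_source: str


def attr_hash(attributes: Dict[str, str]) -> str:
    """Hash the attributes in a stable (sorted) order for change detection."""
    payload = "||".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def latest(sat: List[SatRecord], parent_key: str) -> Optional[SatRecord]:
    """Most recently loaded row for this parent key, if any."""
    rows = [r for r in sat if r.parent_key == parent_key]
    return max(rows, key=lambda r: r.load_date, default=None)


def load_satellite(sat: List[SatRecord],
                   rows: Iterable[Tuple[str, Dict[str, str]]],
                   record_source: str, load_date: datetime) -> int:
    """Append a row only when the attributes changed; never update old rows."""
    inserted = 0
    for parent_key, attributes in rows:
        diff = attr_hash(attributes)
        current = latest(sat, parent_key)
        if current is None or current.hash_diff != diff:
            sat.append(SatRecord(parent_key, load_date, diff,
                                 dict(attributes), record_source))
            inserted += 1
    return inserted


def day(d: int) -> datetime:
    """Hypothetical helper: a UTC timestamp in January 2024."""
    return datetime(2024, 1, d, tzinfo=timezone.utc)


# Usage: the address is delivered three times and changes once.
sat_customer: List[SatRecord] = []
load_satellite(sat_customer, [("hk_c001", {"address": "12 Main St"})], "crm", day(1))
load_satellite(sat_customer, [("hk_c001", {"address": "12 Main St"})], "crm", day(2))
load_satellite(sat_customer, [("hk_c001", {"address": "34 Oak Ave"})], "crm", day(3))
for row in sat_customer:
    print(row.load_date.date(), row.attributes["address"])
# 2024-01-01 12 Main St
# 2024-01-03 34 Oak Ave
```

The unchanged delivery on day 2 produces no row. The SATELLITE therefore records changes rather than every load, while the old address remains available as history.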



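To sum up, a DATA VAULT model consists of three types of tables:


  • Hub (HUB) = business objects, identified by their business keys;
  • Link (LINK) = relationships and associations between business keys;
  • Satellite (SATELLITE) = descriptive attributes and their history.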

HUB – a list of unique business keys.


LINK – the relationships between business keys.


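SATELLITE – the descriptive attributes of business keys and relationships, with their history.
Because of this separation, a DATA VAULT can usually be extended by adding new HUBs, LINKs and SATELLITES, without changing the existing tables.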


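Note that the DATA VAULT described above is called the Raw DATA VAULT, because it stores data exactly as it was received from the sources. There is also the Business DATA VAULT, which is built on top of it and contains derived structures, including helper tables: PIT (point-in-time) and BRIDGE tables. The Business DATA VAULT is meant to make building data marts simpler and faster.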
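A PIT (point-in-time) table can be sketched as follows. This sketch assumes the usual definition of a PIT row: for every HUB key and every snapshot date, it stores the load date of the SATELLITE row in effect at that moment, with one column per SATELLITE. A data mart can then join each SATELLITE by equality instead of searching for the right version. The SATELLITE names and dates are hypothetical, and each SATELLITE is reduced to the load dates of its rows.

```python
from datetime import date
from typing import Dict, Iterable, List, Optional

# parent key -> load dates of that key's rows in one SATELLITE
SatIndex = Dict[str, List[date]]


def effective_load_date(load_dates: List[date], snapshot: date) -> Optional[date]:
    """Load date of the row in effect at the snapshot: the latest one not after it."""
    return max((d for d in load_dates if d <= snapshot), default=None)


def build_pit(hub_keys: Iterable[str], satellites: Dict[str, SatIndex],
              snapshots: List[date]) -> List[Dict[str, object]]:
    """One PIT row per (HUB key, snapshot), one column per SATELLITE."""
    pit: List[Dict[str, object]] = []
    for key in hub_keys:
        for snapshot in snapshots:
            row: Dict[str, object] = {"hub_key": key, "snapshot_date": snapshot}
            for sat_name, index in satellites.items():
                row[sat_name] = effective_load_date(index.get(key, []), snapshot)
            pit.append(row)
    return pit


# Usage: two hypothetical SATELLITES of the customer HUB.
satellites = {
    "sat_address": {"hk_c001": [date(2024, 1, 1), date(2024, 3, 1)]},
    "sat_contact": {"hk_c001": [date(2024, 2, 1)]},
}
pit = build_pit(["hk_c001"], satellites, [date(2024, 1, 31), date(2024, 3, 31)])
for row in pit:
    print(row["snapshot_date"], row["sat_address"], row["sat_contact"])
# 2024-01-31 2024-01-01 None
# 2024-03-31 2024-03-01 2024-02-01
```

Storing `None` when a SATELLITE had no row yet keeps every PIT row complete. The data mart can then decide how to treat the missing history.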
Further reading:


  1. A publication by Kent Graziano which, in addition to a detailed description, contains diagrams of the model;
  2. Book: Building a Scalable Data Warehouse with DATA VAULT 2.0.
