As discussed earlier, the key points of the methodology do not rely on technical complexity in the physical modeling or in the data loading processes, but on an adequate representation of business objects and their relationships at the core of the data model.

The key success factor for a Data Vault based project is therefore the ability to reflect the business processes, which means:

    • Involvement of business representatives during the study/analysis phase of the analytical needs, and their availability to explain the operational processes.
    • A true understanding of the analyses to be implemented, and therefore of the related business objects and operational processes.
    • Ability to analyze these operational processes to identify and understand, beyond the business objects involved:
      • The related source systems.
      • The way business objects are implemented in each source system (as detailed definitions may vary from one system to another).
      • Data propagation for each object in the different systems of the Information System (interfaces and gradual enrichment of data).
    • Based on these elements, define all the necessary business objects from a global, organization-wide point of view. This is the crucial point of the study phase, as all the business representatives have to be challenged in order to reach a global vision that goes beyond a particular functional silo, or beyond processes that are merely internal to a particular business area. This global and common definition results in the foundation of the modeling: the definition of the Business Keys (BKs), meaning the attributes that fully and uniquely define and qualify a business object, whatever the business area considered in the whole organization.
    • This study phase also has to distinguish real business objects from what should just be relationships between basic business objects. After the choice of the BKs, this is the second crucial point in designing the core data model, as the flexibility of the Links provides all the power and scalability of this type of modeling. Indeed, Links stand for relationships between objects without constraining their cardinality (a Link is just a potential N-N relationship).
    • Define the simple transformations for global and shared business rules, along with Data Quality processes, to go from the “Raw Data Vault” to the “Business Data Vault” layer(s).
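
To make the role of Business Keys and Links more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `HubRow` and `LinkRow` classes, the `hash_key` helper, the trim/upper-case normalization, the use of an MD5 hash as a surrogate key, and the sample keys (`CUST-001`, `PROD-A`...) are all hypothetical and are not prescribed by the methodology described here. A single `HubRow` class is used for every business object to keep the example short.

```python
import hashlib
from dataclasses import dataclass


def hash_key(*parts: str) -> str:
    """Deterministic key built from normalized parts (trimmed, upper-cased)."""
    normalized = "||".join(p.strip().upper() for p in parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class HubRow:
    hub_key: str        # hash of the business key
    business_key: str   # normalized Business Key (BK)
    record_source: str  # source system that delivered the BK


@dataclass(frozen=True)
class LinkRow:
    link_key: str       # hash of the participating Hub keys
    hub_keys: tuple     # one Hub key per related business object


def make_hub_row(business_key: str, record_source: str) -> HubRow:
    bk = business_key.strip().upper()
    return HubRow(hash_key(bk), bk, record_source)


def make_link_row(*hubs: HubRow) -> LinkRow:
    keys = tuple(h.hub_key for h in hubs)
    return LinkRow(hash_key(*keys), keys)


# The same customer, formatted differently in two source systems,
# resolves to a single Hub key once the BK is defined organization-wide.
crm_customer = make_hub_row("cust-001 ", "CRM")
erp_customer = make_hub_row("CUST-001", "ERP")
same_customer = crm_customer.hub_key == erp_customer.hub_key

# A Link does not constrain cardinality: one customer can relate to
# several products, and one product to several customers.
other_customer = make_hub_row("CUST-002", "CRM")
product_a = make_hub_row("PROD-A", "ERP")
product_b = make_hub_row("PROD-B", "ERP")
links = {
    make_link_row(crm_customer, product_a),
    make_link_row(crm_customer, product_b),
    make_link_row(other_customer, product_a),
    make_link_row(erp_customer, product_a),  # same relationship as the first one
}
print(same_customer, len(links))
```

The last relationship arrives from another source system but carries the same Hub keys as the first one, so the set keeps a single Link row for it. This is the kind of cross-silo consolidation that a shared definition of the BKs is meant to enable.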

The other topics matter less for the durability and flexibility of the data model; they relate to technical optimizations, particularly for very high volumes:

  • Splitting of the Satellites: by source system or by the variability of the attributes, to keep volume growth as low as possible when new versions of historized records are created.
  • Definition of additional physical or logical structures (tables / views / materialized views…) to accelerate the delivery of data on the way out to Datamarts, based on data commonly used together and involving a subset of the model.
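
The effect of splitting Satellites by attribute variability can be illustrated with a small Python sketch. It assumes a simplified historization rule, where a Satellite writes a new full row version whenever any of its attributes changes, and it counts stored attribute values only. The `count_stored_values` function, the attribute names and the sample data are hypothetical.

```python
def count_stored_values(snapshots, attributes):
    """Count the attribute values stored by a Satellite holding `attributes`.

    Assumed rule: a new full row version is written whenever at least
    one of the Satellite's attributes differs from the previous version.
    """
    stored = 0
    previous = None
    for snapshot in snapshots:
        current = tuple(snapshot[a] for a in attributes)
        if current != previous:
            stored += len(attributes)
            previous = current
    return stored


# Successive states of one customer: only the credit score changes.
snapshots = [
    {"name": "Acme", "country": "FR", "segment": "B2B", "credit_score": 700},
    {"name": "Acme", "country": "FR", "segment": "B2B", "credit_score": 710},
    {"name": "Acme", "country": "FR", "segment": "B2B", "credit_score": 695},
    {"name": "Acme", "country": "FR", "segment": "B2B", "credit_score": 720},
]

# One wide Satellite: every score change rewrites all four attributes.
single = count_stored_values(
    snapshots, ["name", "country", "segment", "credit_score"]
)

# Split Satellites: stable attributes are stored once, and only the
# volatile attribute is versioned at each change.
split = (
    count_stored_values(snapshots, ["name", "country", "segment"])
    + count_stored_values(snapshots, ["credit_score"])
)

print(single, split)
```

In a real model each Satellite row also carries technical columns (parent key, load date, record source), so the gain from splitting depends on how wide the stable attribute group is and on how often the volatile attributes change.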

