No, this is not the sequel to Little House on the Prairie, but rather the continuation of our articles on the buzzwords wandering around the world of Data…

So now that the implementation of Datalakes has been completed – or is in progress, or even just starting… – what will succeed this concept and make it evolve? Note that I said evolution, not necessarily improvement: as with any new concept, you need to ask whether you really need it, or whether it is even applicable in your context.

So here comes… no, not the sun, nor the rain again… but the DataLakeHouse (or Data LakeHouse, or Datalakehouse, or Data Lake House… the name isn’t really settled or standardized, as is typical for concepts with little maturity).

So, you will tell me… “What? I’ve barely finished setting up my Datalake, I’m just starting to plug my Datahub into it (see our previous article), and now I have to throw everything away to implement this Datalakehouse, even though the previous initiatives haven’t really delivered value to business people yet?”

Once again, let us try to demystify this concept.

First, let us refocus on what should be expected of a “Datalakehouse” and, from an architectural point of view, list the different features and their related application components.

Across the commonly shared definitions, the following notions come up:

  • data persistence (a truly revolutionary feature, for sure…)
  • data traceability, or even auditability (oh, swearword…)
  • ACID transactions (now we get to something less obvious… see the sketch just after this list)
  • serve every kind of data-related business need: operational reporting, analytical reporting, prototyping, Machine Learning, Data Stewardship and Reference / Master Data systems (well, it has to be magic, doesn’t it?)
  • provide access to any kind of data
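
To make the ACID and traceability points more concrete, here is a minimal sketch (ours, not part of any vendor definition) using Delta Lake on Apache Spark, one of the open table formats typically found behind the “Lakehouse” label (Apache Iceberg and Apache Hudi are alternatives). The table path and data are purely illustrative:

```python
# Requires matching versions of pyspark and delta-spark:
#   pip install pyspark delta-spark
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

# Spark session configured for Delta Lake
spark = (
    SparkSession.builder
    .appName("lakehouse-acid-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

path = "/tmp/lake/customers"  # illustrative location on the "lake"

# Data persistence: write an initial table to the lake as Delta files
spark.createDataFrame(
    [(1, "Alice", "FR"), (2, "Bob", "DE")],
    ["customer_id", "name", "country"],
).write.format("delta").mode("overwrite").save(path)

# ACID transactions: a MERGE (upsert) is applied atomically, so concurrent
# readers never observe a half-applied change
updates = spark.createDataFrame(
    [(2, "Bob", "BE"), (3, "Carol", "IT")],
    ["customer_id", "name", "country"],
)
table = DeltaTable.forPath(spark, path)
(
    table.alias("t")
    .merge(updates.alias("u"), "t.customer_id = u.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# Traceability / auditability: the transaction log records every operation,
# and "time travel" reads the table as of any earlier version
table.history().select("version", "operation", "timestamp").show(truncate=False)
spark.read.format("delta").option("versionAsOf", 0).load(path).show()
```

Nothing here covers the last two bullets, of course: serving every business need from one storage layer is the architectural bet of the Lakehouse, not a feature you switch on.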

Well…

Does that remind you of something? Yes, we end up more or less with the application blocks and functions of a Datawarehouse (ideally an Enterprise one, of course), whether it sits on a Datalake or not…

Once the functionalities have been identified, it is merely a matter of transferring the DWH/analytical stack onto “Big Data” technologies, as far as their progress and maturity allow, depending on the chosen solutions.

It is therefore obvious that our good old familiar DWH will resurface from the DLH (yes, you might as well get used to this new acronym!) like the old chestnut it is…

We end up with the same paradigm already raised for the Datahub / Datalake: a transition of application-level notions, which are clearly not new, onto a Big Data technical stack that keeps maturing and getting closer to common industrial requirements, with a technical architecture that is more open, less proprietary (well, depending on the distribution…) and fully scalable (including in its cost projection).

In conclusion, the necessary skills and the related methodologies shouldn’t undergo any revolution because of a simple change of tools (apart from mastering these new tools, as with any technical solution)…

Yes, it will still be necessary to build a real DWH (whether it is included in the Datalake layer of a hybrid Big Data architecture or not…). And therefore to come back to robust, durable Enterprise modeling (such as Data Vault, which is, by the way, mentioned on https://datalakehouse.org/) for those who would like to invest in this not-so-new type of solution.
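
For readers who have not met Data Vault, a quick hedged illustration (our own naming, not a prescribed standard): the model separates business keys (hubs), relationships between them (links) and descriptive attributes with their full history (satellites), an insert-only style that maps naturally onto lake storage. Reusing the Spark session from the earlier sketch:

```python
# A minimal Data Vault skeleton as Spark SQL DDL over Delta tables.
# Table and column names are illustrative only; "spark" is the session
# created in the previous sketch.
spark.sql("""
    CREATE TABLE IF NOT EXISTS hub_customer (
        customer_hk   STRING,     -- hash of the business key
        customer_bk   STRING,     -- business key as known by the sources
        load_dts      TIMESTAMP,  -- when this key was first seen
        record_source STRING      -- originating system, for auditability
    ) USING DELTA
""")

spark.sql("""
    CREATE TABLE IF NOT EXISTS sat_customer_details (
        customer_hk   STRING,     -- points back to hub_customer
        load_dts      TIMESTAMP,  -- each change is a new row: full history
        name          STRING,
        country       STRING,
        record_source STRING
    ) USING DELTA
""")
```

Insert-only hubs and satellites keyed by load timestamp are exactly what gives the “robust and durable” auditability mentioned above, whatever the underlying platform.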

It is not uncommon nowadays to find that the “news” includes its share of warmed-over material: existing concepts presented as “new”, or assemblies of existing solutions marketed as brand-new architectures. In the end, we come back to fundamental, core concepts like Enterprise Datawarehousing and the best way to implement them (see our articles on this topic).