Shuveb Hussain
Data Warehouses feature prominently in what we today call “the Modern Data Stack”. If you woke a data engineer from their sleep and asked them to deliver data, they would take a very predictable path: from ELT connectors to a Cloud Data Warehouse, throwing in a workflow orchestration system for good measure. Being the end destination for our data gives Data Warehouses a very influential position in the so-called modern data stack. A couple of factors make Data Warehouses a favored approach: having a single source of truth is very alluring, and since horizontal functions like sales, finance, or growth marketing need to see org-wide data, having it all in one place can seem like a good excuse to continue with a centralized approach.
There’s no question that the Data Warehouse area has seen several improvements over the decades it has been in use. I think the most important technical one is what made Snowflake popular: the decoupling of compute and storage. The most important change, though, is not really a feature at all: when you hear people talk about a Data Warehouse today, they’re most probably referring to a Cloud Data Warehouse, one of the triumvirate of Snowflake, BigQuery, or Redshift. Data Warehouses have moved to the cloud and are available on a usage-based pricing model.
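To make the decoupling of compute and storage concrete, here is a minimal sketch of how a usage-based bill might be computed when the two are metered separately. The `UsageRates` class, the `monthly_cost` function, and the rates themselves are hypothetical and chosen only for illustration; they do not reflect any vendor's actual pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRates:
    """Hypothetical rates for illustration only, not real vendor pricing."""
    storage_per_tb_month: float  # currency units per terabyte stored per month
    compute_per_hour: float      # currency units per hour of compute used


def monthly_cost(rates: UsageRates, stored_tb: float, compute_hours: float) -> float:
    """Bill storage and compute independently, then add them up."""
    storage = rates.storage_per_tb_month * stored_tb
    compute = rates.compute_per_hour * compute_hours
    return storage + compute


rates = UsageRates(storage_per_tb_month=20.0, compute_per_hour=2.0)

# Same amount of data stored in both months; only the query workload differs.
quiet_month = monthly_cost(rates, stored_tb=10, compute_hours=50)
busy_month = monthly_cost(rates, stored_tb=10, compute_hours=400)

print(quiet_month, busy_month)
```

Because the two terms are independent in this sketch, a heavier query workload raises only the compute part of the bill, while the storage part stays the same.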
While these are improvements in the way Data Warehouses work and are deployed or commissioned, there has been no change in the way they are used in the workflow of data engineering teams.
The main bottleneck in creating a decentralized data culture is this: Data Warehouses are still centrally managed. There’s typically a team (or a person) whose job it is to “maintain” the Data Warehouse. They’re the stewards of what gets in, who has access to it, how to remove cruft that forms over time, how org-wide models and marts are created, and so on.
Previously, we discussed how, if you’re looking to implement something like a Data Mesh, the cultural changes are arguably more difficult to implement than the technology.
Come to think of it, Data Mesh is a suggested set of new workflows more than it is a set of suggested technologies. You can use a lot of different technologies to achieve a Data Mesh implementation.
This is an important question to answer. A road has been laid tunneling through mountains and bridging wild rivers, but who are we planning to take along to the promised land? Let’s see how we can go about this.
If your data engineering and data analytics workflow has the now-common, steward-secured, centralized Data Warehouse at the heart of it, you are far from realizing your Data Mesh implementation goals. But does that mean the Data Warehouse is incompatible with Data Mesh? The answer is: it depends.
Let’s look at a few different aspects:
When you want to implement a greenfield Data Mesh, it is not just a technical undertaking but, more often, a political one. Moreover, we’ve discussed how making the cultural changes required for the Data Mesh to work is probably more than half the work, outweighing what needs to be achieved from a technical perspective.
It is therefore a good idea to seek out and enable champions within the organization who are willing to create and maintain Data Products, quickly leveraging existing tools to get there. The main idea is to organically convince the whole organization of the power of Data Products and how transformational they can be in setting up a data-driven culture. The same frugal approach that startups adopt applies to creating Data Products: leverage whatever is available without first having to jump through the hoops of budget approvals for something as new as Data Mesh.
Once the various teams in the organization actually use a couple of Data Products and experience their benefits firsthand, it becomes easier not just to get budgets and resources to work on Data Mesh, but also to convince domain owners to take up sponsorship and ownership of Data Products from their domains.