BEAM meets Data Vault and wham bam thank you ma’am
As a BI Practitioner
I want to understand how to scope things quickly and early
So that I can estimate and prioritise without a lot of upfront effort
Whether you are pipelining your Agile delivery or delivering a thin slice every sprint iteration, both approaches should use BEAM to gather your data requirements in an agile and repeatable way, and Data Vault to model and load data iteratively in a form that can be easily refactored.
One of the benefits of these two approaches is that they dovetail almost perfectly. It is as if Lawrence Corr and Hans Hultgren worked together to leverage the strengths of each method.
As an aside, I know for a fact they haven’t, although not for lack of trying on my part to get these two gurus to align their busy schedules.
So a quick recap of the core structure of each to set the scene.
In BEAM we capture the following artifacts:
- Event
Core business processes that are defined by the question of Who does What.
Example: Customer Orders Product.
- Detail
Core things that comprise the event; in BEAM speak these are the 7Ws of Who, What, When, Where, Why, How and How Many.
Example: Customer, Product, Order
- Detail of Detail
Things that help describe or provide context for the core things.
Example: Customer Name, Customer Address, Customer Age, Product Name, Product Type
As an aside, I am hoping Lawrence Corr will one day revise his BEAM method and rename Detail of Detail to something else. Try saying it repeatedly in a four-hour modelstorming workshop to experience why!
In Data Vault we model the following artifacts:
- Hub
Table that only holds the keys for the core entity.
Example: Customer Hub, Product Hub, Order Hub
- Satellite (Sat)
Table that holds all attributes of the core entity.
Example: Customer Sat holds customer name, customer age and, yes, customer address (this is a discussion for a later article)
- Link
Table that holds the keys that represent the relationship between the core entities that comprise a business process.
Example: Link table with the following keys, Customer || Product || Order
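To make the three table types concrete, here is a minimal sketch of the Customer Orders Product example using Python’s built-in sqlite3 module. The table and column names, the load_date column and the choice of primary keys are illustrative assumptions, not something prescribed by this article; real Data Vault implementations usually carry more metadata than this.

```python
import sqlite3

# Illustrative DDL only: names, load_date and key choices are assumptions.
DDL = """
-- Hubs: only the keys for each core entity.
CREATE TABLE customer_hub (customer_key TEXT PRIMARY KEY, load_date TEXT);
CREATE TABLE product_hub  (product_key  TEXT PRIMARY KEY, load_date TEXT);
CREATE TABLE order_hub    (order_key    TEXT PRIMARY KEY, load_date TEXT);

-- Sat: the descriptive attributes of the Customer, keyed back to its Hub.
CREATE TABLE customer_sat (
    customer_key     TEXT REFERENCES customer_hub (customer_key),
    load_date        TEXT,
    customer_name    TEXT,
    customer_age     INTEGER,
    customer_address TEXT,
    PRIMARY KEY (customer_key, load_date)
);

-- Link: only the keys of the Hubs that take part in the business process.
CREATE TABLE customer_product_order_link (
    customer_key TEXT REFERENCES customer_hub (customer_key),
    product_key  TEXT REFERENCES product_hub (product_key),
    order_key    TEXT REFERENCES order_hub (order_key),
    load_date    TEXT,
    PRIMARY KEY (customer_key, product_key, order_key)
);
"""


def build_model() -> sqlite3.Connection:
    """Create the example Hubs, Sat and Link in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def table_names(conn: sqlite3.Connection) -> list:
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


conn = build_model()
tables = table_names(conn)
```

Note that the Hubs and the Link hold nothing but keys (plus load metadata); everything descriptive lives in a Sat.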
So let’s look at examples of these:
BEAM Event and Detail
BEAM Detail of Detail
Data Vault Model
Wow, look at that. BEAM and Data Vault align perfectly:
- BEAM Event = Data Vault Link
- BEAM Detail = Data Vault Hub
- BEAM Detail of Detail = Data Vault Sat
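This mapping is mechanical enough to sketch in a few lines of Python. The BeamEvent class, its field names and the to_data_vault function are hypothetical names invented for this illustration; they are not part of BEAM, Data Vault or any tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BeamEvent:
    """A BEAM event as captured in a modelstorming session (hypothetical shape)."""
    name: str                      # e.g. "Customer Orders Product"
    details: List[str]             # the core things in the event
    details_of_detail: Dict[str, List[str]] = field(default_factory=dict)


def to_data_vault(event: BeamEvent) -> dict:
    """Apply the mapping: Event -> Link, Detail -> Hub, Detail of Detail -> Sat."""
    return {
        "link": {"name": f"{event.name} Link", "keys": list(event.details)},
        "hubs": [f"{detail} Hub" for detail in event.details],
        "sats": {
            f"{detail} Sat": list(attributes)
            for detail, attributes in event.details_of_detail.items()
        },
    }


event = BeamEvent(
    name="Customer Orders Product",
    details=["Customer", "Product", "Order"],
    details_of_detail={
        "Customer": ["Customer Name", "Customer Address", "Customer Age"],
        "Product": ["Product Name", "Product Type"],
    },
)
model = to_data_vault(event)
```

Here model["hubs"] lists one Hub per Detail, model["sats"] holds one Sat per Detail that has a Detail of Detail, and model["link"] carries the Customer, Product and Order keys.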
In fact we can even take the next step and map these easily to Dimensions and Facts in a Star Schema, but I won’t.
Keep in Mind
Most practitioners in the data warehouse and Business Intelligence domain have experience modeling using the Dimensional Star Schema pattern, and so are used to the concept of a fact table. BEAM and Data Vault treat the fact record slightly differently from a Dimensional pattern.
In BEAM the fact is initially captured as part of the Event template as a How Many. In this example for Customer Orders Product, there is a How Many of Order Value.
In Data Vault the fact is a record in the Sat that hangs off the Hub for the thing that drives the relationship for the Link. In this example, that is the Order Hub.
One of the benefits of BEAM is that it also closely aligns with a Dimensional model, and that is still the way Lawrence Corr teaches it in his excellent three-day workshops. So in the BEAM templates there is a sheet for defining the fact details. At the moment I tend not to use this template and just extend the Event table to include the How Many’s. However, if you have a large number of How Many’s (Facts), I would typically use a hybrid template to capture them.
In the AgileBI BEAM to Data Vault approach, the Verb in the BEAM Event (in this example Order) becomes a Hub in Data Vault.
The How in the BEAM event defines the key for the Hub (e.g. Order Number). The How Many’s become entries in the Verb Sat (e.g. Order Value in the Order Sat).
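As a rough sketch of this rule, the function below takes the Verb and the 7W-tagged columns of a BEAM event table and derives the Verb Hub and Verb Sat. The function name verb_ensemble, the (column name, 7W type) tuple shape and the restriction to exactly one How column are all assumptions made for the illustration.

```python
from typing import Dict, List, Tuple


def verb_ensemble(verb: str, columns: List[Tuple[str, str]]) -> Dict[str, dict]:
    """Derive the Verb Hub and Verb Sat from (column name, 7W type) pairs."""
    hows = [name for name, w_type in columns if w_type == "how"]
    how_manys = [name for name, w_type in columns if w_type == "how many"]
    if len(hows) != 1:
        # Simplifying assumption: a single How column acts as the Hub key.
        raise ValueError("this sketch expects exactly one How column")
    return {
        "hub": {"name": f"{verb} Hub", "key": hows[0]},
        "sat": {"name": f"{verb} Sat", "attributes": how_manys},
    }


# The Customer Orders Product event, with each column tagged by its 7W type.
order = verb_ensemble(
    "Order",
    [
        ("Customer", "who"),
        ("Product", "what"),
        ("Order Number", "how"),
        ("Order Value", "how many"),
    ],
)
```

For this event, order["hub"] is the Order Hub keyed on Order Number and order["sat"] is the Order Sat holding Order Value; the Who and What columns are ignored here because they map to their own Hubs.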
A World of Opinions
There are many variations and views on how Data Vaults should be defined; it seems we are at the equivalent of the Betamax vs VHS argument of old. I also know that some of the approaches above, specifically the definition of Facts for BEAM and the use of an Ensemble (Hub and Sat) in Data Vault for the Order key rather than hanging a Sat off the Link table, are slight variations of what is typically taught.
However, in my view they allow closer alignment between BEAM and Data Vault, which in turn reduces the effort and latency of delivery and increases the agility of the delivery process.