Data Warehouse and Data Quality - Two options to implement data cleansing operations
How can data cleansing operations be implemented in the Data Warehouse There are two options
12/14/20231 min read


Data Warehouse and Data Quality - Two options to implement data cleansing operations
Nowadays everyone agrees to solve data quality issues at the data source, in the source systems. There is also a reason for this, because data quality issues residing in data sources prevent the source systems from properly servicing the execution of business processes.
Data Warehouse solutions have always performed data cleansing operations. A justified question is why Data Warehouse solutions (are able to) cleanse data and why the data cleansing operation is not or cannot be performed at the data source.
Let’s assume that there is a valid reason why data cleansing operation is not or cannot be performed at the data source; how can the data cleansing operation be implemented in the Data Warehouse?
There are two options of which the first option is preferred looking at quality criteria such as Controllability and Maintainability:
Create a dataset containing data cleansing rules, which is (left outer) joined with a source dataset containing data to be cleansed. The dataset containing data cleansing rules than preferably is maintained by the Business using a leigh-weight but robust User Interface
Write code to perform the data cleansing rules. The advice is to only choose this option, when the first option is not possible
Thumbs of rule for data cleansing operations performed by Data Warehouse solutions is that both the result of the data cleansing operation and the original source value is stored in the destination tables of the Data Warehouse. The Data Warehouse presents the quality of the source data “as is” together with the cleansed values. In that situation the Business is always able to check the result of the performed data cleansing operations.