Simple yet effective Data Validation method !

Introduction 

In this post we will see how we can do data validation using CASE statement. 

Scenario

We want to validate not only pick the records with data mismatch but also the exact columns where the data mismatch is present. 
We are aware that we can make use of EXCEPT queries to find the records with data mismatch, but in case there are millions of records in a our tables with large number of columns and if our except query fetches even few thousand records, it is simply a nightmare to zero in on the exact values.

We can use JOIN coupled with CASE statement to do above. 

In the above scenario we have a case where we have a PK present, which we will use to JOIN and then do a column by column comparison using CASE statement to pick the data mismatches.

P.S. Joining column must be present to make use of this query. 

USE tempdb
GO

SET NOCOUNT ON
GO

-- This table is one which is having correct data
CREATE TABLE TableWithCorrectData(
id INT PRIMARY KEY
,Name VARCHAR(25)
,Dept VARCHAR(20)
)
GO

INSERT INTO TableWithCorrectData VALUES
(1,'John','Accounts')
,(2,'Mary','Accounts')
,(3,'David','Finance')
,(4,'Michael','Technology')
,(5,'Sam','Finance')
GO

-- This table is to be validated
CREATE TABLE TableToBeValidated(
id INT PRIMARY KEY
,Name VARCHAR(25)
,Dept VARCHAR(20)
)
GO

INSERT INTO TableToBeValidated VALUES
(1,'John','Accounts')
,(2,'Mary','Account')
,(3,'David','Finance')
,(4,'Michael','Technology')
GO

-- Lets get the validation done

SELECT T1.id AS Valid_id
, T2.id AS Test_id
, CASE WHEN T1.id = T2.id THEN 'T' ELSE 'F' END  AS ValidationResult_id
, T1.Name AS Valid_Name
, T2.Name AS Test_Name
, CASE WHEN T1.Name = T2.Name THEN 'T' ELSE 'F' END  AS ValNameationResult_Name
, T1.Dept AS Valid_Dept
, T2.Dept AS Test_Dept
, CASE WHEN T1.Dept = T2.Dept THEN 'T' ELSE 'F' END  AS ValidationResult_Dept
FROM TableWithCorrectData AS T1
LEFT OUTER JOIN TableToBeValidated AS T2
ON T1.id = T2.id;


--Clean up
DROP TABLE TableToBeValidated;
DROP TABLE TableWithCorrectData;

Using above logic as a building block you can validate huge tables with many columns in an effective manner and pick up the exact data points where mismatch is present. 
In case you are wondering if you always have to type in the query with all these columns, then don't be afraid. You can always use the system tables like sys.columns, concatenate the column list and then employ dynamic query to create the above query as you might notice there is a certain structure about the above query. Thus, saving yourself from the trouble of typing in that huge column list. 

There can be a number of ways to do data validation. The above method is just one of them. 

Thank You ! 

Comments

Popular posts from this blog

Simple Program to Check Active Requests in ODSOs / ADSOs

Simplification List for SAP BW/4HANA (SAP Release 18th June’ 2018)

Unforeseen Issue during SAP BW Data Flow Migration