Data Integrity - Unicity Integrity

12/14/2023

Data Integrity falls within the domain and responsibility of IT: IT is responsible for the structure of data (both logical and physical), while the Business is responsible for the content.

An important artifact of the system analysis phase of the system development cycle is the data model. The most common type is the entity-relationship (ER) data model, which consists of entity types (for instance Customer), their properties (translated into attributes), and the relationships between the entity types (One-To-One, One-To-Many or Many-To-Many, each either mandatory or optional).
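
As an illustration, a fragment of such an ER model can be sketched in Python; the Customer and Order entity types, their attributes and the One-To-Many relationship between them are hypothetical examples, not taken from any particular model.

    from dataclasses import dataclass, field

    @dataclass
    class Customer:
        # Entity type 'Customer' with its attributes
        customer_id: int    # identifying attribute
        name: str
        # One-To-Many, optional: a customer has zero or more orders
        orders: list["Order"] = field(default_factory=list)

    @dataclass
    class Order:
        # Entity type 'Order'; the relationship to Customer is mandatory:
        # every order belongs to exactly one customer
        order_id: int
        customer_id: int
        amount: float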

To control the behavior of the system, a number of integrity rules are implemented (among other mechanisms); these rules are enforced by the Relational Database Management System (RDBMS) and can be categorized as follows, with a combined sketch after the list:

  • Unicity integrity, implemented with Primary Key constraints, enforces that every record in a table (the physical representation of an entity type) is unique

  • Referential integrity, implemented with Foreign Key constraints, enforces that the value of an attribute in one table is present as the Primary Key of another table

  • Attribute integrity, implemented with Data Type, Value and Boundary constraints, limits the values that can be stored in an attribute
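
The following sketch shows how an RDBMS enforces all three categories, here using Python's built-in sqlite3 module and the hypothetical Customer/Order tables from above. Note that SQLite in particular only checks Foreign Key constraints when PRAGMA foreign_keys is switched on; most other RDBMSs enforce them by default.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FK constraints only when enabled

    # Unicity integrity: the Primary Key constraint forces every customer record to be unique.
    conn.execute("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        )""")

    # Referential integrity: customer_id must exist as a Primary Key in customer.
    # Attribute integrity: the data type plus the CHECK constraint bound the values of amount.
    conn.execute("""
        CREATE TABLE customer_order (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
            amount      REAL NOT NULL CHECK (amount >= 0)
        )""")

    conn.execute("INSERT INTO customer VALUES (1, 'Acme')")

    for stmt in (
        "INSERT INTO customer VALUES (1, 'Duplicate')",     # violates unicity integrity
        "INSERT INTO customer_order VALUES (10, 99, 5.0)",  # violates referential integrity
        "INSERT INTO customer_order VALUES (11, 1, -5.0)",  # violates attribute integrity
    ):
        try:
            conn.execute(stmt)
        except sqlite3.IntegrityError as exc:
            print("rejected by the RDBMS:", exc)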

Violation of unicity integrity

Although these integrity rules have been known since the 1970s, there are still systems in which they are not implemented, and there are still data collections without unique keys. Seen from a Data Quality Management perspective, the violation of unicity integrity is the most severe of these violations:

  • When this issue occurs in a source system, the source system must be changed with the highest priority, both to proactively prevent this situation and to cleanse the dataset by removing duplicate records (a detection sketch follows this list)

  • When this issue occurs in a provided source dataset, the dataset must be sent back to the data source provider, together with the request to provide a new dataset containing unique records
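
How the duplicates are detected depends on the platform; as a sketch, assuming the dataset has been loaded into a constraint-free staging table named customer_staging with a supposed key customer_id (both names hypothetical), standard SQL via sqlite3 can report every violating key:

    import sqlite3

    conn = sqlite3.connect("staging.db")  # hypothetical staging database

    # Every key value occurring more than once violates unicity integrity.
    duplicates = conn.execute("""
        SELECT customer_id, COUNT(*) AS occurrences
        FROM customer_staging
        GROUP BY customer_id
        HAVING COUNT(*) > 1
    """).fetchall()

    for key, occurrences in duplicates:
        print(f"customer_id {key} occurs {occurrences} times")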

Source dataset is ambiguous
It is possible that a source dataset contains unique identifiers while its content is nevertheless ambiguous: the combination of values in certain attributes is not unique although it is supposed to be. This situation is proactively prevented by creating unique indexes on one or more non-primary-key attributes of the source dataset, as sketched below.
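
As a sketch of such a preventive unique index, again using sqlite3; the business-key combination (country, tax_number) is a hypothetical example:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,  -- technical key, unique by definition
            country     TEXT NOT NULL,
            tax_number  TEXT NOT NULL
        )""")

    # The combination (country, tax_number) is supposed to be unambiguous, so a
    # unique index on these non-primary-key attributes rejects ambiguous content.
    conn.execute("""
        CREATE UNIQUE INDEX ux_customer_business_key
        ON customer (country, tax_number)""")

    conn.execute("INSERT INTO customer VALUES (1, 'NL', 'NL001')")
    try:
        # Different technical key, identical business key: ambiguous content.
        conn.execute("INSERT INTO customer VALUES (2, 'NL', 'NL001')")
    except sqlite3.IntegrityError as exc:
        print("ambiguity rejected:", exc)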