Page under construction

🚧

Data Quality

Data quality refers to how well-suited data is for its intended use, focusing on aspects like accuracy, completeness, and consistency. In data reliability engineering, data quality is crucial because it ensures that the data systems an organization relies on are dependable and can support accurate decision-making and efficient operations.

For those interested in data reliability engineering, understanding data quality is essential. High-quality data leads to reliable systems that businesses can trust for their critical operations and strategic decisions. This chapter will dive into the practical side of maintaining and improving data quality, making it a key skill set for data professionals.

We'll cover important topics like master data management, which helps keep data consistent across the organization, and data governance, ensuring data remains accurate and secure. We'll also look at different data quality models that provide frameworks for assessing and improving data quality. These topics are geared towards giving you actionable insights and tools to enhance the reliability of your data systems.

The goal of this chapter is to bridge the gap between theoretical data quality concepts and their practical application in data reliability engineering. It introduces a variety of data quality models, standards, and best practices, enabling data professionals to assess, monitor, and enhance the quality of data within their organizations, and it offers actionable insights for improving the robustness and dependability of data systems, thus contributing to overall system reliability.

The topics in this chapter on Data Quality are based on ideas from the book "Calidad de Datos" (Data Quality) by Ismael Caballero Muñoz-Reja and others, published by "Ediciones de la U" and "Ra-Ma". We chose to follow this book's approach to make sure we cover data quality thoroughly and in a way that's useful for Data Reliability Engineering, drawing on trusted material from experts to help you understand data quality clearly and systematically.

One important note: this chapter frequently mentions the term Data Reliability, which is not the same as Data Reliability Engineering. Data reliability refers to the trustworthiness and dependability of data, while data reliability engineering is the practice of designing, implementing, and maintaining systems and processes that keep data reliable. Both terms are simplified here, but both will be explored further in the book.

This chapter is divided into five parts:

Foundations of Data Quality

This section explains how governance, data management, and data quality management differ and work together, highlighting their importance in aligning with the ISO/IEC 38500 standard to meet organizational goals and manage data risks efficiently. We'll also explore the concept of the data lifecycle.

Master Data

Master data is the core information an organization uses across its systems, and master data management is the process of organizing, securing, and maintaining this information to ensure it's accurate and consistent. This section explores entity resolution, master data architecture, maturity models, and standards.
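To make entity resolution concrete before the full section, here is a minimal sketch that groups likely-duplicate master records by fuzzy name matching. The sample records, the single `name` field, and the 0.6 similarity threshold are illustrative assumptions; real implementations compare multiple attributes and use blocking strategies to avoid comparing every pair.

```python
# Illustrative sketch of entity resolution: grouping records that likely
# refer to the same real-world entity using fuzzy name similarity.
# Records, fields, and the 0.6 threshold are demonstration assumptions.
from difflib import SequenceMatcher

records = [
    {"id": "A1", "name": "Acme Corporation"},
    {"id": "B2", "name": "ACME Corp."},
    {"id": "C3", "name": "Globex Inc"},
]

def similar(a, b, threshold=0.6):
    """True when two names are close enough after case folding."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def cluster(rows, threshold=0.6):
    """Greedily assign each record to the first cluster it resembles."""
    clusters = []
    for row in rows:
        for group in clusters:
            if similar(row["name"], group[0]["name"], threshold):
                group.append(row)
                break
        else:
            clusters.append([row])
    return clusters

for group in cluster(records):
    print([r["id"] for r in group])  # ['A1', 'B2'] then ['C3']
```

The greedy pass is the simplest possible clustering; it shows why resolving entities matters for master data, where "Acme Corporation" and "ACME Corp." must be recognized as one customer.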

Data Management

Here we'll explore various frameworks and models that guide how organizations can systematically improve the handling and quality of their data, including DAMA DMBOK, Aiken's Model, the Data Management Maturity Model (DMM), Gartner's Model, Total Quality Data Management (TQDM), the Data Management Capability Assessment Model (DCAM), and the Model for Assessing Data Management (MAMD).

Data Quality Models

Data Quality Models are fundamental frameworks that define, measure, and evaluate the quality of data within an organization. Here we'll explore various criteria, known as dimensions, that help evaluate and enhance the quality of organizational data.
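As a preview of how dimensions translate into measurements, here is a minimal sketch that scores a toy dataset on three commonly cited dimensions: completeness, uniqueness, and validity. The field names, sample rows, and email pattern are illustrative assumptions, not taken from any particular quality model.

```python
# Minimal sketch of measuring three common data quality dimensions
# (completeness, uniqueness, validity) on a toy customer dataset.
# Field names, sample data, and the regex are illustrative assumptions.
import re

records = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": ""},                 # missing value
    {"id": 3, "email": "not-an-email"},     # invalid format
    {"id": 3, "email": "ana@example.com"},  # duplicate key
]

def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    return sum(1 for r in rows if r.get(field)) / len(rows)

def uniqueness(rows, field):
    """Share of rows whose key value is not repeated."""
    values = [r[field] for r in rows]
    return sum(1 for v in values if values.count(v) == 1) / len(rows)

def validity(rows, field, pattern):
    """Share of rows whose value matches an expected format."""
    return sum(1 for r in rows
               if re.fullmatch(pattern, r.get(field) or "")) / len(rows)

print(completeness(records, "email"))  # 0.75
print(uniqueness(records, "id"))       # 0.5
print(validity(records, "email", r"[^@\s]+@[^@\s]+\.[^@\s]+"))  # 0.5
```

Each dimension becomes a simple ratio over the dataset, which is the basic move behind most quality models: define a dimension, pick a measurable rule for it, and track the score over time.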

Final Thoughts on Data Quality

This section emphasizes that good data quality, covering aspects like accuracy and completeness, is essential for data reliability and underpins trustworthy business decisions. It focuses on proactive measures to ensure data integrity during integration, shaped by solid data architecture and metadata management.