Fix Data Quality Problems at the Source: How to Realistically and Permanently Pay Down Data Debt
Data debt, meaning poor-quality data, is the bane of most organizations. Without high-quality data your leadership cannot make data-informed decisions, nor can you build effective artificial intelligence (AI) based systems. Traditional data quality (DQ) strategies have proven ineffective in practice, hence your data debt problem, and after-the-fact fixes such as transforming data on its way into your data warehouse (the T in ETL and ELT) or cleansing data at the point of usage are expensive band-aids at best.
The solution is to address your data debt at the source once and for all. This includes fixing any DQ defects in data sources, schema problems within those data sources, and the logic in source systems that is injecting the DQ problems in the first place. This is accomplished via data repair, database refactoring, and classic code refactoring respectively. This presentation overviews the DQ challenges we face, the impact of those challenges, and how each of these techniques works in practice for production data sources. It also discusses strategies for building the business imperative, and the technical culture, for this multi-year continuous improvement strategy to remove data debt.
What You’ll Learn About Addressing Data Debt
- Definitions of technical debt and data debt
- Why data quality is important
- Types of data debt
- Strategies for avoiding data debt
- Strategies for removing data debt
- Strategies for accepting data debt
Audience For This Presentation
- Experienced developers who want to learn about the realities of data debt
- Architects who want to avoid injecting new data debt or reduce existing debt within their organization
- Managers and leaders who want to understand and address data debt
Why You Want to Hear About Addressing Data Debt From Me
I’m the thought leader behind the Agile Data method, a primary focus of which is effective strategies to either avoid or remedy data debt, a term that I coined. More importantly, along with Pramod Sadalage I co-authored Refactoring Databases, the first book to describe how to pragmatically fix data at the source. This presentation shares my experiences, along with both technical and management ways of working (WoW) pertaining to data debt.
Presentation History
I have given versions of this presentation at data and agile conferences, and to user groups, since 2017.