This paper examines the nature, cause and impact of data quality problems in typical IT system implementations. It argues that the extent and impact of poor data quality are largely misunderstood and often ignored, and that poor data quality is primarily a behavioral problem, not a technological one. The paper provides an approach for improving data quality through clear accountability for the prevention, detection and correction of data problems.
The Data Quality Problem
Data quality problems cost organizations millions of dollars, waste vast amounts of time and resources, and deceive management into making very poor decisions. Many executives and managers dramatically underestimate the tremendous damage poor quality data inflicts on their organization each year. This, coupled with a lack of understanding regarding how to solve data quality problems, often leads management to ignore this critical issue.
Data is the lifeblood of technology systems. While hardware and software are the infrastructure, the veins and arteries of a system, it is the data that actually gives the system life. Without data to fuel the systems, the technology is of no value. Yet many organizations focus the majority of their attention on installing servers and developing applications while paying little or no attention to ensuring data is consistently of the highest quality.
Data is ubiquitous in an organization. The same piece of data is used multiple times for multiple purposes. For example, address data is used for deliveries, invoices and marketing. Product data is used for sales forecasting, marketing, financial forecasts and supply chain management. Data is used to process transactions quickly (efficiency) and to fuel analytics packages to enable better decision-making (effectiveness). Given the multiple touch-points and purposes of data, it is absolutely critical that the data is of the highest quality. While data only needs to be entered correctly once to add value, the damage poor data quality causes is felt every time the data is used! Many organizations acknowledge poor data quality is a major problem, yet they accept it as inevitable. Poor data quality is neither acceptable nor inevitable.
Why Is Data Quality Important?
It should be obvious why data quality is important. Data is the driver of business decisions, actions and transactions. Data is used for all aspects of business: sales, marketing, production, support, finance and legal, to name just a few. Historical data is used to make decisions that affect the future success of the organization. Organizations that harness accurate historical data to improve efficiency and effectiveness have a massive advantage over their competition.
While data presents many opportunities, it is also the source of tremendous problems. Poor data costs organizations huge sums each year. For example, poor customer data contributes to millions of wasted dollars in poorly targeted sales and marketing campaigns. Erroneous financial data causes organizations to restate financial earnings and suffer the associated consequences with regulatory agencies and shareholders. Other data quality problems cause organizations negative publicity and legal problems. All of these examples are the result of poor quality data, and each incident causes tremendous problems for the organization in which it occurs. The problems are all avoidable if the organization takes steps to improve the quality and use of its data. Worse yet, the cost of recovering from these data-related problems far exceeds the cost of addressing them proactively and preventing the crisis from ever occurring.
Data Quality Defined
High quality data has six key attributes: accuracy, reliability, credibility, timeliness, completeness and appropriateness. The degree to which data meets each of these criteria determines its quality. Each attribute is defined as:
Accuracy: Data is correct and precisely reflects the object or transaction it describes.
Reliability: Data is consistent across multiple transactions.
Credibility: The degree to which users trust both the accuracy and reliability of data. Data must be credible in order for it to be utilized for analytical or decision-making purposes.
Timeliness: Data is available to the end-user when it is needed. Data that is not available when needed is of no value.