Poor quality is not an inevitable attribute of software. It results from known causes. It can be predicted and controlled, but only if its causes are understood and addressed. With more critical business processes being implemented in software, quality problems are a primary business risk.
In recent years, corporate scandals, regulatory changes, and the collapse of many financial institutions have brought much-warranted attention to the quality of enterprise information. Best practices have been developed and discussed. Data quality is no longer the domain of the data warehouse alone; it is accepted as an enterprise responsibility.
Each instance of a quality issue presents two challenges: identifying where problems exist and quantifying their extent. Quantifying the issues is important for deciding where to focus our efforts first. A large number of missing email addresses may well be alarming but could have little impact if there is no process or plan for communicating by email. It is imperative to understand the business requirements and to match them against the assessment of the problem at hand. Consider the following seven sources of data quality issues:
- Entry quality: Did the information enter the system correctly at the origin? Entry quality is probably the easiest problem to identify but is often the most difficult to correct. Entry issues are usually caused by a person entering data into a system.
- Process quality: Was the integrity of the information maintained during processing through the system? Process quality issues usually occur systematically as data is moved through an organization. They may result from a system crash, lost file, or any other technical occurrence that results from integrated systems.
- Identification quality: Are two similar objects identified correctly to be the same or different? Identification quality problems result from a failure to recognize the relationship between two objects. For example, two similar products with different stock keeping units (SKUs) are incorrectly judged to be the same.
- Integration quality: Is all the known information about an object integrated to the point of providing an accurate representation of the object? Integration quality, or quality of completeness, can present big challenges for large organizations. Integration quality problems occur because information is isolated by system or departmental boundaries.
- Usage quality: Is the information used and interpreted correctly at the point of access? Usage quality problems often arise when data warehouse developers lack access to legacy source documentation or subject matter experts. Without adequate guidance, they are left to guess the meaning and use of certain data elements. Another scenario occurs in organizations where users are given the tools to write their own queries or create their own reports. Incorrect usage may be difficult to detect, and its cost difficult to quantify.
- Aging quality: Has enough time passed that the validity of the information can no longer be trusted? The most challenging aspect of aging quality is determining the point at which the information is no longer valid. Usually, such decisions are somewhat arbitrary and vary by usage. For example, maintaining a former customer’s address for more than five years is probably not useful. If customers haven’t been heard from in several years despite marketing efforts
- Organizational quality: Can the same information be reconciled between two systems based on the way the organization constructs and views the data? Organizational quality, like entry quality, is easy to diagnose and sometimes very difficult to address. It shares much in common with process quality and integration quality but is less a technical problem than a systematic one that occurs in large organizations.
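Two of the ideas above lend themselves to a quick measurement: quantifying how many email addresses are missing, and flagging records that have aged past a cutoff. The following is a minimal Python sketch under stated assumptions, not a prescribed method. The record layout, the field names `email` and `last_contact`, the helper names, and the sample data are all hypothetical; the five-year cutoff simply mirrors the address example in the aging quality item and would vary by usage.

```python
from datetime import date

# Hypothetical customer records; field names and values are illustrative only.
customers = [
    {"id": 1, "email": "a@example.com", "last_contact": date(2024, 3, 1)},
    {"id": 2, "email": "", "last_contact": date(2016, 7, 15)},
    {"id": 3, "email": None, "last_contact": date(2023, 11, 30)},
    {"id": 4, "email": "d@example.com", "last_contact": date(2012, 1, 9)},
]


def missing_rate(records, field):
    """Return the fraction of records whose field is absent, None, or blank."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if not str(r.get(field) or "").strip())
    return missing / len(records)


def stale_ids(records, as_of, max_age_years=5):
    """Return ids of records last contacted more than max_age_years ago.

    A year is approximated as 365 days, which is adequate for a rough cutoff.
    """
    limit_days = max_age_years * 365
    return [r["id"] for r in records if (as_of - r["last_contact"]).days > limit_days]


as_of = date(2025, 1, 1)  # assumed reporting date
print(f"missing email rate: {missing_rate(customers, 'email'):.0%}")
print("stale record ids:", stale_ids(customers, as_of))
```

Numbers like these only become priorities once they are matched against business requirements: a high missing-email rate matters little if no process depends on email.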
https://www.melissadata.com/enews/articles/0611/2.htm Source: Information Management June 2009 (www.information-management.com). William McKnight is partner, Information Management, at Lucidity Consulting Group.