Data quality

Data quality refers to the condition of a set of values of qualitative or quantitative variables. There are many definitions of data quality, but data is generally considered high quality if it is "fit for its intended uses in operations, decision making and planning." [1] Alternatively, data is deemed of high quality if it correctly represents the real-world construct to which it refers. Furthermore, apart from these definitions, as data volume increases, the question of internal data consistency becomes significant, regardless of fitness for use for any particular external purpose. People's views on data quality can often be in disagreement, even when discussing the same set of data used for the same purpose. Data cleansing may be required in order to ensure data quality. [2]


This list is taken from the online book “Data Quality: High-impact Strategies”. [3] See also the glossary of data quality terms. [4]

  • Degree of excellence exhibited by the data in relation to the portrayal of the actual scenario.
  • The state of completeness, validity, consistency, timeliness and accuracy. [5]
  • The totality of features and characteristics of data that are capable of satisfying a given purpose; the sum of the degrees of excellence for factors related to data. [6]
  • The processes and technologies involved in ensuring the conformity of data to business requirements and acceptance criteria. [7]
  • Complete, standards-based, consistent, accurate and time-stamped. [8]

If the ISO 9000:2015 definition of quality is applied, data quality can be defined as the degree to which a set of characteristics of data fulfills requirements. Examples of characteristics are: completeness, validity, accuracy, consistency, availability and timeliness. Requirements are defined as the need or expectation that is stated, generally implied or obligatory.
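Characteristics like these lend themselves to simple measurement. As a minimal sketch (the record layout and the validity rule are hypothetical, not from any particular system), completeness and validity can each be scored as the share of values passing a rule:

```python
# Sketch: scoring two of the characteristics above, completeness and
# validity, over a small hypothetical record set (field names are
# illustrative assumptions).
from datetime import date

records = [
    {"id": 1, "email": "a@example.com", "signup": date(2020, 1, 5)},
    {"id": 2, "email": None,            "signup": date(2021, 3, 9)},
    {"id": 3, "email": "not-an-email",  "signup": None},
]

def completeness(rows, field):
    """Share of rows in which the field is present (non-null)."""
    return sum(r[field] is not None for r in rows) / len(rows)

def validity(rows, field, predicate):
    """Share of the present values that satisfy a validity rule."""
    present = [r[field] for r in rows if r[field] is not None]
    return sum(predicate(v) for v in present) / len(present)

print(completeness(records, "email"))                   # 2 of 3 rows present
print(validity(records, "email", lambda v: "@" in v))   # 1 of 2 values valid
```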


Before the rise of inexpensive computer data storage, massive mainframe computers were used to maintain name and address data for delivery services, so that mail could be routed to its destination. The mainframes were used to correct misspellings and missing information in customer names and addresses, and to track customers who had moved, died, gone to prison, married, divorced, or experienced other life-changing events, cross-referencing records against the National Change of Address registry (NCOA). This technology saved large companies millions of dollars in comparison to manual correction of customer data. Large companies saved on postage, as bills and direct marketing materials made their way to the intended customer more accurately. Initially sold as a service, data quality moved inside the walls of corporations as low-cost and powerful server technology became available.[citation needed]

While companies with an emphasis on marketing often focused their quality efforts on name and address information, data quality is recognized[by whom?] as an important property of all types of data. Principles of data quality can be applied to supply chain data, transactional data, and nearly every other category of data found. For example, making supply chain data conform to a certain standard has value to an organization by: 1) avoiding overstocking of similar but slightly different stock; 2) avoiding false stock-outs; 3) improving the understanding of vendor purchases to negotiate volume discounts; and 4) avoiding logistics costs in stocking and shipping parts across a large organization.[citation needed]

For companies with significant research efforts, data quality can include developing protocols for research methods, reducing measurement error, bounds checking of data, cross-tabulation, modeling and outlier detection, verifying data integrity, etc.[citation needed]


There are a number of theoretical frameworks for understanding data quality. A systems-theoretical approach influenced by American pragmatism expands the definition of data quality to include information quality, building on the fundamental dimensions of the theory of science (Ivanov, 1972). One framework, dubbed "Zero Defect Data" (Hansen, 1991), adapts the principles of statistical process control to data quality. Another framework seeks to integrate the product perspective (conformance to specifications) and the service perspective (meeting consumers' expectations) (Kahn et al. 2002). Another framework is based in semiotics to evaluate the quality of the form, meaning and use of the data (Price and Shanks, 2004).

A number of theoretical frameworks propose categories of desirable attributes (or dimensions) of data. These dimensions commonly include accuracy, correctness, currency, completeness and relevance (Wang et al., 1993). Software engineers may recognize this problem as "ilities".

MIT has a Total Data Quality Management program, led by Professor Richard Wang, which produces a large number of publications and hosts the International Conference on Information Quality (ICIQ).

In practice, data quality is a concern for professionals involved with a wide range of information systems, ranging from data warehousing and business intelligence to customer relationship management and supply chain management. One industry study estimated the total cost of data quality problems to the US economy at over US$600 billion per annum (Eckerson, 2002). Incorrect data – which includes invalid and outdated information – can originate from different data sources – through data entry, or data migration and conversion projects. [9]

In 2002, the USPS and PricewaterhouseCoopers released a report stating that 23.6 percent of all US mail sent is incorrectly addressed. [10]

One reason contact data becomes stale very quickly in the average database is that more than 45 million Americans change their address every year. [11]

In fact, the problem is such a concern that companies are beginning to set up a data governance team whose sole role in the corporation is to be responsible for data quality. In some[who?] organizations, this data governance function has been established as part of a larger regulatory compliance function – a recognition of the importance of data/information quality to organizations.

Problems with data quality do not only arise from incorrect data; inconsistent data is a problem as well. Eliminating data shadow systems and centralizing data in a warehouse is one way to improve data consistency.

Enterprises, scientists, and researchers are starting to take an interest in the quality of their common data. [12]

The market is going some way to providing data quality assurance. A number of vendors make tools for analyzing and repairing poor-quality data in situ, service providers can clean the data on a contract basis, and consultants can advise on fixing processes or systems to avoid data quality problems in the first place. These tools for improving data may include some or all of the following:

  1. Data profiling – initially assessing the data to understand its quality challenges
  2. Data standardization – a business rules policy that ensures that data conforms to quality rules
  3. Geocoding – for name and address data; corrects data to US and worldwide postal standards
  4. Matching or linking – a way to compare data so that similar, but slightly different records can be aligned. Matching may use "fuzzy logic" to find duplicates in the data. It often recognizes that "Bob" and "Robert" may be the same individual. It might be able to manage "householding", or finding links between spouses at the same address, for example. Finally, it is possible to build a "best of breed" record, taking the best components from multiple data sources and building a single super-record.
  5. Monitoring – keeping track of data quality over time and reporting changes in the quality of data. Software can also auto-correct the variations based on pre-defined business rules.
  6. Batch and real time – once the data is initially cleansed (batch), companies often want to build the processes into enterprise applications to keep it clean.
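The matching step above can be sketched with standard-library string similarity; the nickname table and the similarity threshold below are illustrative assumptions, not any vendor's actual algorithm:

```python
# Sketch: "fuzzy" duplicate detection as described in the matching step.
# The nickname table and the 0.85 threshold are illustrative assumptions.
from difflib import SequenceMatcher

NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}

def canonical(name):
    """Lowercase, trim, and expand known nicknames to a canonical form."""
    n = name.strip().lower()
    return NICKNAMES.get(n, n)

def same_person(a, b, threshold=0.85):
    """Treat two names as duplicates if they match after canonicalization
    or are sufficiently similar as strings."""
    a, b = canonical(a), canonical(b)
    return a == b or SequenceMatcher(None, a, b).ratio() >= threshold

print(same_person("Bob", "Robert"))    # True, via the nickname table
print(same_person("Jon", "John"))      # True, via string similarity
print(same_person("Alice", "Brian"))   # False
```

A real matching engine would also normalize addresses and compare multiple fields, but the core idea of aligning similar-but-not-identical records is the same.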

There are several well-known authors and self-styled experts, with Larry English perhaps the most popular guru. In addition, IQ International – the International Association for Information and Data Quality – was established in 2004 to provide a focal point for professionals and researchers in this field.

ISO 8000 is an international standard for data quality. [13]

Data quality assurance

Data quality assurance is the process of data profiling to discover inconsistencies and other anomalies in the data, and of performing data cleansing [14] [15] activities (e.g. removing outliers, missing data interpolation) to improve data quality.
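Two of the cleansing activities named above can be sketched in a few lines; the toy series, the 1.5-standard-deviation cutoff, and linear gap-filling are all illustrative assumptions:

```python
# Sketch: outlier removal and missing-data interpolation on a toy
# numeric series, where None marks a missing value. The cutoff and
# the interpolation rule are illustrative assumptions.
import statistics

def drop_outliers(values, k=1.5):
    """Replace points more than k population std devs from the mean with None."""
    present = [v for v in values if v is not None]
    mu = statistics.mean(present)
    sd = statistics.pstdev(present)
    return [v if v is None or abs(v - mu) <= k * sd else None
            for v in values]

def interpolate(values):
    """Fill interior single gaps with the mean of the two known neighbours."""
    out = list(values)
    for i, v in enumerate(out):
        if v is None and 0 < i < len(out) - 1:
            left, right = out[i - 1], out[i + 1]
            if left is not None and right is not None:
                out[i] = (left + right) / 2
    return out

series = [10.0, 11.0, None, 13.0, 500.0, 12.0]   # 500.0 is a gross outlier
cleaned = interpolate(drop_outliers(series))
print(cleaned)
```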

These activities can be undertaken as part of data warehousing or as part of the database administration of an existing piece of application software. [16]

Data quality control

Data quality control is the process of controlling the use of data for an application or process. It is usually performed after a data quality assurance (QA) process, which consists of the discovery of data inconsistencies and their correction.

A Data Quality Control (QC) process takes into account measures such as:

  • Severity of inconsistency
  • Incompleteness
  • Accuracy
  • Precision
  • Missing / Unknown

The Data QC process uses the information from the QA process to decide whether or not to use the data in an application or business process. For example, if a Data QC process finds that the data contains too many errors or inconsistencies, it can prevent that data from being used and thereby avoid disruption: feeding invalid measurements from several sensors to an aircraft's autopilot could cause it to crash. Thus, establishing a QC process provides protection for the usage of data maintained in a computer system.[citation needed]
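As a toy illustration of such a gate (the valid range, the sentinel value, and the 10% rejection cap are assumptions, not drawn from any real avionics system):

```python
# Sketch: a QC gate that decides whether a batch of sensor readings may
# be used, based on how many fall outside a valid range. The range,
# sentinel value, and 10% cap are illustrative assumptions.

def qc_gate(readings, valid_range=(0.0, 100.0), max_invalid_ratio=0.1):
    """Reject the whole batch if too many readings fall outside the range."""
    lo, hi = valid_range
    invalid = [r for r in readings if not (lo <= r <= hi)]
    usable = len(invalid) / len(readings) <= max_invalid_ratio
    return usable, invalid

ok, bad = qc_gate([20.1, 35.7, -999.0, 41.2, 38.8])   # one sentinel value
print(ok)    # False: 1 of 5 readings (20%) is invalid, above the 10% cap
print(bad)   # the offending reading(s)
```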

Optimum use of data quality

Data quality (DQ) is a niche area required for the integrity of data management, covering gaps left by other data management functions. It is one of the key functions that aid data governance by monitoring data for exceptions. Data quality checks may be defined at the attribute level to give full control over the remediation steps.[citation needed]

DQ checks and business rules may easily overlap if an organization is not attentive to its DQ scope. Business teams should understand the DQ scope thoroughly in order to avoid overlap. Data quality checks are redundant if business logic covers the same functionality and fulfills the same purpose as DQ. The DQ scope of an organization should be defined in its DQ strategy and well implemented. Some data quality checks may be translated into business rules after repeated instances of exceptions in the past.[citation needed]

Below are a few areas of data flows that may need perennial DQ checks:

Completeness and precision DQ checks can be run on all data sources at the point of entry, for each mandatory attribute from each source system. Some attribute values are created well after the initial creation of the transaction; in such cases, administering these checks becomes tricky and should be done immediately after the defined event that produces the attribute.
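A point-of-entry completeness check of this kind can be sketched as follows; the mandatory attribute names are hypothetical:

```python
# Sketch: point-of-entry completeness check on mandatory attributes.
# The attribute names are hypothetical, not from any real schema.
MANDATORY = {"customer_id", "order_date", "amount"}

def completeness_exceptions(record):
    """Return the mandatory attributes that are missing or empty."""
    return sorted(a for a in MANDATORY if record.get(a) in (None, ""))

rec = {"customer_id": "C042", "order_date": "", "amount": 19.99}
print(completeness_exceptions(rec))   # ['order_date']
```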

All data having attributes that refer to reference data in the organization may be validated against the well-defined set of valid values of that reference data, to discover new or discrepant values through a validity DQ check. Results may be used to update the reference data administered under master data management (MDM).
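A validity check against reference data can be sketched as follows; the country-code reference set is an illustrative stand-in for MDM-managed reference data:

```python
# Sketch: validity DQ check of incoming codes against a reference set.
# The country codes are an illustrative stand-in for MDM reference data.
REFERENCE_COUNTRIES = {"US", "DE", "FR", "JP"}

def validity_check(values, reference):
    """Split incoming values into valid ones and new-or-discrepant ones."""
    valid = [v for v in values if v in reference]
    discrepant = sorted(set(values) - reference)
    return valid, discrepant

valid, discrepant = validity_check(["US", "DE", "XX", "JP", "XX"],
                                   REFERENCE_COUNTRIES)
print(discrepant)   # ['XX'] -> a candidate update for the reference data
```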

All data sourced from a third party to the organization's internal teams may undergo an accuracy (DQ) check against the third-party data. These DQ check results are valuable as a basis for deciding which data is made available.

All data columns that refer to master data may be validated for consistency. A consistency DQ check administered at the point of entry discovers new usage of master data in the MDM process, but a DQ check administered after the point of entry discovers failures (not exceptions) of consistency.

As data is transformed, multiple timestamps and their positions are captured, and these may be compared with each other and against their allowed leeway to validate the data's value, decay, and operational significance against a defined SLA (service level agreement). This timeliness DQ check can be used to reduce the data value decay rate and to optimize the policies of the data movement timeline.
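Such a timeliness check might look like the following sketch, with an assumed four-hour SLA and hypothetical feed names:

```python
# Sketch: timeliness DQ check comparing capture and transform timestamps
# against an SLA. The four-hour SLA and feed names are assumptions.
from datetime import datetime, timedelta

SLA = timedelta(hours=4)

def timeliness_breaches(events):
    """Flag feeds whose transform lagged the source capture beyond the SLA."""
    return [name for name, captured, transformed in events
            if transformed - captured > SLA]

events = [
    ("trade_feed", datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 11, 0)),
    ("price_feed", datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 15, 30)),
]
print(timeliness_breaches(events))   # ['price_feed'] (6.5 h > 4 h SLA)
```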

In an organization, complex logic is usually segregated into simpler logic across multiple processes. Reasonableness DQ checks on such complex logic, which yields a logical result within a specific range of values or static interrelationships (aggregated business rules), may be used to discover complicated but crucial business processes, outliers in the data, and drift from BAU (business as usual) expectations. This check may be a simple generic aggregation rule or a complex one; it requires a high degree of business knowledge and acumen.
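A reasonableness check of this kind reduces to validating an aggregate against an expected band; the revenue figures and bounds below are illustrative:

```python
# Sketch: reasonableness DQ check validating an aggregated result
# against a static expected range. The figures and bounds are
# illustrative assumptions, not real business rules.

def reasonableness(line_items, lower=0.0, upper=1_000_000.0):
    """Check that a daily revenue aggregate falls within the expected band."""
    total = sum(line_items)
    return lower <= total <= upper, total

ok, total = reasonableness([120_000.0, 340_500.0, 99_999.0])
print(ok)   # True: the aggregate is within BAU expectations
```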

Conformity checks and integrity checks need not be covered in all business needs; they are strictly under the database architect's discretion.

There are many places in the data movement where DQ checks may not be required. For instance, a DQ check for completeness and precision on not-null columns is redundant for data sourced from a database. Similarly, data should be validated for its accuracy with respect to time when the data is stitched across disparate sources; however, that is a business rule and should not be in the DQ scope.[citation needed]

Regretfully, from a software development perspective, data quality is often seen as a non-functional requirement, and as such, key data quality checks/processes are not factored into the final software solution. Within healthcare, wearable technologies and body area networks generate large volumes of data. [17] The level of detail required to ensure data quality is extremely high and is often underestimated. This is also true for the vast majority of mHealth apps, EHRs and other health-related software solutions. However, some open source tools exist that examine data quality. [18]

Professional associations

IQ International-the International Association for Information and Data Quality [19]
IQ International is a not-for-profit, vendor-neutral professional association formed in 2004, dedicated to building the information and data quality profession.

See also

  • Data validation
  • Record linkage
  • Information quality
  • Master data management


References

  1. ^ Redman, Thomas C. (30 December 2013). Data Driven: Profiting from Your Most Important Business Asset. Harvard Business Press. ISBN 978-1-4221-6364-1.
  2. ^ "What is data scrubbing (data cleansing)? – Definition from".
  3. ^ "Data Quality: High-impact Strategies – What You Need to Know: Definitions, Adoptions, Impact, Benefits, Maturity, Vendors". Retrieved 5 February 2013.
  4. ^ "IAIDQ – glossary".
  5. ^ Government of British Columbia
  7. ^ "ISTA Con – Innovations in Software Technologies and Automation".
  8. ^ Anonymous (23 December 2014). "Data Quality".
  9. ^ "Liability and Leverage – A Case for Data Quality".
  10. ^ "Address Management for Mail-Order and Retail".
  11. ^
  12. ^ E. Curry, A. Freitas, and S. O'Riain, "The Role of Community-Driven Data Curation for Enterprises," in Linking Enterprise Data, D. Wood, Ed. Boston, MA: Springer US, 2010, pp. 25–47.
  13. ^ "ISO/TS 8000-1:2011 Data quality – Part 1: Overview". International Organization for Standardization. Retrieved 8 December 2016.
  14. ^ "Can you trust the quality of your data?".
  15. ^ "What is Data Cleansing? – Experian Data Quality". 13 February 2015.
  16. ^ "Read 23 Data Quality Concepts Tutorial – Data Warehousing". Watch Free Video Training Online. Retrieved 8 December 2016.
  17. ^ O'Donoghue, John, and John Herbert. "Data management within mHealth environments: Patient sensors, mobile devices, and databases." Journal of Data and Information Quality (JDIQ) 4.1 (2012): 5.
  18. ^ Huser, Vojtech; DeFalco, Frank J; Schuemie, Martijn; Ryan, Patrick B; Shang, Ning; Velez, Mark; Park, Rae Woong; Boyce, Richard D; Duke, Jon; Khare, Ritu; Utidjian, Levon; Bailey, Charles (30 November 2016). "Multisite Evaluation of a Data Quality Tool for Patient-Level Clinical Datasets". EGEMs (Generating Evidence & Methods to improve patient outcomes). 4 (1). doi:10.13063/2327-9214.1239.
  19. ^ "IQ International – the International Association for Information and Data Quality". IQ International website. Retrieved 2016-08-05.
