One of the most important rules of using data for business purposes is this: the quality of your decisions is heavily dependent on the quality of your data. However, simply knowing it isn’t very useful. To get measurable results, you should measure the quality of your data and act on the results. We shed some light on complex data quality issues and offer advice on how to excel at resolving them.
Attributes, measures, and metrics for defining data quality
It would be appropriate to begin this section with a universally accepted definition of data quality. But here’s the first problem: there isn’t one. In this regard, we can draw on our 32 years of experience in data analytics and offer our own definition: data quality refers to the state of data and its ability (or inability) to solve business tasks. This state can be either “good” or “bad,” depending on how closely data matches the following attributes:
Why is poor data quality a problem?
Do you believe that the overall problem of poor data quality has been exaggerated, and that the attributes mentioned above are not worthy of the attention they have received? We’ll show you real-world examples of the impact low-quality data can have on business processes.
A manufacturer believes they have pinpointed the exact location of the truck transporting their finished goods from the manufacturing site to the distribution center. They optimize routing, forecast delivery times, and so on. Then it turns out that the location information is incorrect. The truck arrives late, disrupting the distribution center’s normal workflow, and the routing recommendations prove ineffective.
Assume you’re working to improve your supply chain management. You track delivery time to assess suppliers and determine which are disciplined and trustworthy and which are not. However, unlike the scheduled delivery time, the actual delivery time field is optional in your system. Naturally, your warehouse employees frequently fail to enter it. With such incomplete data, you cannot understand how your suppliers perform.
Data interpretation that is ambiguous
A field called “Breakdown reason” in a machinery maintenance system may be used to help identify what caused a failure. It usually takes the form of a drop-down menu that includes the option “Other.” As a result, a weekly report may state that the “Other” reason caused the machinery failure in 80% of cases, and the manufacturer may experience low overall equipment efficiency without knowing how to improve it.
At first glance, duplicated data may not appear to be a problem. However, it can become a serious one. For example, if a customer appears in your CRM more than once, it not only consumes more storage but also results in an incorrect customer count. Furthermore, duplicated data undermines marketing analysis by fragmenting a customer’s purchasing history, leaving the company unable to understand customer needs and segment customers properly.
Consider a customer who filled out a retailer’s questionnaire and stated that they did not have children. However, time passed, and they now have a newborn child. The delighted parents are ready to spend their money on diapers, baby food, and clothing, but is our retailer aware of this? Is this customer part of the “Customers with Babies” group? The answer to both questions is no. This is how outdated data can lead to incorrect customer segmentation, poor market knowledge, and lost profits.
Late data entry/update
Data analysis and reporting, as well as your business processes, may suffer as a result of late data entries and updates. A common example is an invoice sent to the wrong address. Asset tracking offers another: the system may report that a cement mixer is currently unavailable only because the responsible employee is several hours late in updating its status.
Do you want to avoid the consequences of poor data quality?
LivYoung Realtech provides services ranging from consulting to implementation to assist you in fine-tuning your data quality management process and ensuring that your decision-making is not hampered by poor data quality.
Best Practices of Data quality management
Because the consequences of poor data quality can be disruptive, it’s critical to learn about the solutions. Here, we share best practices for improving the quality of your data.
Prioritizing data quality
The first step is to prioritize data quality improvement and ensure that every employee understands the issues that poor data quality causes. Sounds easy enough. However, incorporating data quality management into business processes necessitates a number of significant steps:
- Creating a data strategy for the entire organization.
- Defining user roles, including rights and accountability.
- Creating a data quality management process (detailed explanation later in the article).
- Having a dashboard to monitor the current situation.
Data Entry Automation
Manual data entry, whether by employees, customers, or multiple users, is a common root cause of poor data quality. Thus, businesses should consider how to automate data entry processes in order to reduce human error. Automation is worth implementing wherever the system can capture data on its own (for example, autocompletion or call and e-mail logging).
Preventing duplicates rather than simply curing them
It is a well-known fact that it is easier to prevent a disease than to cure it. Duplicates can be treated similarly! On the one hand, you can simply clean them up on a regular basis. On the other hand, you can create duplicate detection rules. These rules determine whether a similar entry already exists in the database and either forbid creating another one or suggest merging the entries.
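A duplicate detection rule can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the field names `full_name` and `email`, the matching key, and the `find_duplicate` helper are hypothetical and do not come from any particular CRM.

```python
import re
from typing import Optional


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivially different entries compare equal."""
    return re.sub(r"\s+", " ", value).strip().lower()


def find_duplicate(new_record: dict, existing: list) -> Optional[dict]:
    """Return the first existing record with the same normalized name and e-mail, else None."""
    key = (normalize(new_record["full_name"]), normalize(new_record["email"]))
    for record in existing:
        if (normalize(record["full_name"]), normalize(record["email"])) == key:
            return record
    return None


# Hypothetical data: one stored customer and one incoming entry typed carelessly.
crm = [{"full_name": "Jane Doe", "email": "jane.doe@example.com"}]
candidate = {"full_name": "  jane   DOE ", "email": "Jane.Doe@example.com"}

match = find_duplicate(candidate, crm)
if match is not None:
    print("Similar entry already exists; suggest merging instead of creating a new one.")
```

Running the check before a record is saved is what turns it into prevention; the same function run over stored data would only support periodic cleaning.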
Taking care of both the master and the metadata
Maintaining your master data is critical, but don’t forget about your metadata. Companies, for example, will be unable to control data versions unless the metadata includes timestamps. As a result, they may extract obsolete values for their reports rather than updated ones.
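To illustrate why timestamps matter, here is a minimal Python sketch that keeps only the most recent version of each record before reporting. The `id`, `price`, and `updated_at` fields and the `latest_versions` helper are assumptions made for the example.

```python
from datetime import datetime


def latest_versions(rows):
    """Keep, for each record id, only the row with the most recent updated_at timestamp."""
    latest = {}
    for row in rows:
        kept = latest.get(row["id"])
        if kept is None or row["updated_at"] > kept["updated_at"]:
            latest[row["id"]] = row
    return list(latest.values())


# Hypothetical rows: record 1 exists in two versions.
rows = [
    {"id": 1, "price": 100, "updated_at": datetime(2023, 1, 5)},
    {"id": 1, "price": 120, "updated_at": datetime(2023, 3, 1)},
    {"id": 2, "price": 80, "updated_at": datetime(2023, 2, 10)},
]

report_rows = latest_versions(rows)
```

Without the `updated_at` metadata, nothing in the two rows for record 1 would reveal which price is the current one.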
Stages of Data quality management
Data quality management is a process for achieving and maintaining high data quality. Its main stages include defining data quality thresholds and rules, assessing data quality, resolving data quality issues, and monitoring and controlling data.
- Define data quality thresholds and rules
If you believe that there is only one option, namely perfect data that is 100% compliant with all data quality attributes (in other words, 100% consistent, 100% accurate, and so on), you may be surprised to learn that there are more. First, reaching 100% everywhere is extremely expensive and time-consuming, so companies typically decide which data is critical and focus on the few data quality attributes that are most applicable to it. Second, a company does not always require perfect data quality; sometimes ‘good enough’ is sufficient. Third, if you require different levels of quality for different types of data, you can set different thresholds for different fields. You may be wondering how to determine whether the data meets these thresholds. For this, you should establish data quality rules.
With the theory covered, let’s move on to a practical example.
Assume you decide that the customer full name field is critical for you and set a 98% quality threshold for it, while the date of birth field is less important and an 80% threshold will suffice. As a next step, you decide that the customer’s full name must be complete and accurate, and the date of birth must be valid (that is to say, it should comply with the orderliness attribute). Because you chose several data quality attributes for the customer’s full name, each of them should meet the 98% quality threshold.
Now you create data quality rules that you believe will cover all of the data quality attributes you’ve chosen. In our case, they are as follows:
- The customer’s full name cannot be N/A (to check completeness).
- At least one space must be included in the customer’s full name (to check accuracy).
- Customer names must only contain letters; no numbers are permitted (to check accuracy).
- Only the first letters of the customer’s name, middle name (if applicable), and surname must be capitalized (to check accuracy).
- Date of birth must be a valid date between January 1, 1900 and January 1, 2010.
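The rules above can be expressed as small check functions. The following Python sketch is one possible reading of them, not a definitive implementation: the function names are ours, we assume that “N/A” also covers missing and empty values, and the capitalization check will reject legitimate names such as “McDonald” or “O’Brien”.

```python
from datetime import date


def name_is_complete(name):
    """Completeness: the name is present and is not the placeholder N/A."""
    return name is not None and name.strip().upper() not in ("", "N/A")


def name_has_space(name):
    """Accuracy: at least one space separates the parts of the name."""
    return " " in name.strip()


def name_has_only_letters(name):
    """Accuracy: only letters and spaces are allowed, no digits or symbols."""
    stripped = name.strip()
    return bool(stripped) and all(ch.isalpha() or ch == " " for ch in stripped)


def name_is_capitalized(name):
    """Accuracy: every part starts with a capital letter and continues in lowercase."""
    parts = name.split()
    return bool(parts) and all(
        part[0].isupper() and (len(part) == 1 or part[1:].islower()) for part in parts
    )


def dob_is_valid(dob):
    """The date of birth is a real date between January 1, 1900 and January 1, 2010."""
    return isinstance(dob, date) and date(1900, 1, 1) <= dob <= date(2010, 1, 1)


# Apply every name rule to one hypothetical value.
name_checks = [name_is_complete, name_has_space, name_has_only_letters, name_is_capitalized]
results = {check.__name__: check("Jane Doe") for check in name_checks}
```

Keeping each rule as a separate function makes it easy to report which attribute a record fails rather than only that it failed.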
- Evaluate the data quality
Now it’s time to examine our data and see whether it complies with the rules we established. So we begin profiling the data, that is, gathering statistical information about it. Here’s how it works: we have 8 individual records (though your actual data set is undoubtedly much larger) that we check against our first rule: the customer’s full name cannot be N/A. All of the records follow the rule, which means the data is 100% complete.
We have three rules for measuring data accuracy:
- At least one space must be included in the customer’s full name.
- Customer names must only contain letters; no numbers are permitted.
- Only the first letters of the customer’s name, middle name (if applicable), and surname must be capitalized.
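Profiling against these three rules boils down to counting how many values pass each one and comparing the share with the threshold. Below is a minimal Python sketch; the sample names are invented for illustration and are not the 8 records discussed above, and the regular expressions assume plain Latin letters.

```python
import re

# Hypothetical sample values; a real profile would run over the whole table.
names = ["Jane Doe", "john smith", "Mary Ann Lee", "Bob", "Al1ce Brown"]

accuracy_rules = {
    "has_space": lambda n: " " in n.strip(),
    "letters_only": lambda n: re.fullmatch(r"[A-Za-z ]+", n) is not None,
    "capitalized": lambda n: all(re.fullmatch(r"[A-Z][a-z]*", p) for p in n.split()),
}


def pass_rates(values, rules):
    """Return the share of values (0.0 to 1.0) that satisfy each rule."""
    if not values:
        return {}
    return {
        rule_name: sum(1 for v in values if check(v)) / len(values)
        for rule_name, check in rules.items()
    }


THRESHOLD = 0.98  # the 98% threshold chosen for the customer full name field

rates = pass_rates(names, accuracy_rules)
below_threshold = [rule_name for rule_name, rate in rates.items() if rate < THRESHOLD]

for rule_name, rate in rates.items():
    print(f"{rule_name}: {rate:.0%}")
```

Any rule listed in `below_threshold` points to an attribute that needs attention in the next stage.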
- Address data quality issues
At this point, we should consider what caused the problems in order to eliminate their root cause. In our example, we identified several issues with the customer full name field that can be resolved by instituting clear manual data entry standards, as well as data quality-related key performance indicators for employees responsible for entering data into a CRM system.
In the date of birth field example, the data entered was not validated against the date format or range. We clean and standardize the data as a stopgap measure. To avoid such errors in the future, we should implement a system validation rule that will not accept a date unless it conforms to the format and range.
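Such a validation rule might look like the Python sketch below. It assumes that dates are entered as YYYY-MM-DD text, which is our choice for the example; the `parse_date_of_birth` name is likewise hypothetical, while the range is the one from our rule.

```python
from datetime import date, datetime

MIN_DOB = date(1900, 1, 1)
MAX_DOB = date(2010, 1, 1)


def parse_date_of_birth(raw: str) -> date:
    """Parse a YYYY-MM-DD string and reject values outside the allowed range."""
    try:
        dob = datetime.strptime(raw.strip(), "%Y-%m-%d").date()
    except ValueError:
        raise ValueError(f"Date of birth must be a real date in YYYY-MM-DD format: {raw!r}") from None
    if not MIN_DOB <= dob <= MAX_DOB:
        raise ValueError(f"Date of birth must be between {MIN_DOB} and {MAX_DOB}: {raw!r}")
    return dob


# A well-formed value within the range is accepted and returned as a date object.
accepted = parse_date_of_birth("1985-07-04")
```

Raising an error at entry time, rather than storing the value and cleaning it later, is what addresses the root cause instead of the symptom.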
- Data monitoring and control
Data quality management is a continuous process, not a one-time effort. You must review data quality policies and rules on a regular basis in order to continuously improve them. This is a must because the business environment is constantly changing. Assume that one day a company decides to enrich its customer data by purchasing and integrating an external data set containing demographic data. As a result, the company will almost certainly need to develop new data quality rules, as the external data set may contain data that it has not previously dealt with.
Categories of data quality tools
To address various data quality issues, companies should consider not one tool but a combination of them. For example, Gartner names the following categories:
- Parsing and standardization tools break the data into components and bring them to a unified format.
- Cleaning tools remove incorrect or duplicated data entries or modify the values to meet certain rules and standards.
- Matching tools integrate or merge closely related data records.
- Profiling tools gather statistics about data and later use them for data quality assessment.
- Monitoring tools control the status quo of data quality.
- Enrichment tools bring in external data and integrate it into the existing data.
Currently, the market offers a plethora of data quality management tools. The trick is that some of them concentrate on a single type of data quality issue, while others cover several. To choose the right tools, you should either devote significant time to research or hire professional consultants.
Boundless data quality management in a Nutshell!
Data quality management protects you from low-quality data, which can completely undermine your data analytics efforts. However, in order to do data quality management correctly, you must consider a number of factors. Choosing metrics to assess data quality, selecting tools, and describing data quality rules and thresholds are just a few of the critical steps. With professional assistance, this difficult task can be completed. We at Livyoung Realtech are happy to assist you with your data quality management project at any stage. If you want to organize your data management process promptly and correctly, we are ready to share and implement our best practices. For more information, check out our Data Management System.