Quality assurance best practices for AI training data

Posted October 18, 2022

As systems based on artificial intelligence (AI) become more prevalent, the adage “garbage in, garbage out” has never been more applicable.

While the tools and techniques for building AI-based systems have become democratized, the quality of AI predictions remains highly dependent on quality training data. Without data quality management, you will not be able to accelerate your AI development strategy.

Data quality in AI has multiple dimensions. First, there is the quality of the source data. That could come in the form of images and sensor data for autonomous vehicles, or text from support tickets, or material from more complex business correspondence.

Wherever it comes from, unstructured data needs to be annotated for machine learning algorithms to build models that power AI systems. The quality of annotation, therefore, is also critical to the overall success of your AI implementations.

Setting baseline standards for data annotation quality control

An effective annotation process is a critical step to ensure better model output and avoid problems early in the model development pipeline.

Annotation is most effective when clear guidelines are in place. Without those rules of engagement, annotators cannot apply the guidelines consistently.

As well, it’s important to keep in mind that annotation data quality operates at two levels:

  • The instance level: Each example used to train a model is properly annotated. This requires a clear understanding of the annotation criteria and data quality metrics, as well as data quality checks to ensure labels are correct.
  • The dataset level: Here there’s a need to ensure the dataset is not biased. This can happen easily — for example, if most of the images of vehicles and roads in a dataset are taken during the day and only a small number are from the night. In this case, the model will not be able to learn how to correctly identify objects in images taken in the dark.
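To make the dataset-level check concrete, here is a minimal sketch in Python of how such a bias audit might look, assuming each item carries a hypothetical "time_of_day" metadata field; the threshold is illustrative, not a recommendation:

from collections import Counter

def attribute_distribution(items, attribute):
    # Share of the dataset taken up by each value of a metadata attribute.
    counts = Counter(item.get(attribute, "unknown") for item in items)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

# Hypothetical driving-scene metadata: mostly daytime images, few from the night.
dataset = [{"time_of_day": "day"}] * 950 + [{"time_of_day": "night"}] * 50

for value, share in attribute_distribution(dataset, "time_of_day").items():
    flag = "  <- under-represented" if share < 0.2 else ""
    print(f"{value}: {share:.1%}{flag}")

Run on the example above, this would flag the night-time images as under-represented, which is exactly the kind of imbalance the model would otherwise learn.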

Designing an efficient quality assurance process for data annotation

The first step to ensuring data quality in annotation is to define the right quality metrics. This allows you to measure the quality of a dataset in quantifiable terms. For example, when working on a natural language processing (NLP) model for a voice assistant, you will need to identify the correct syntax for utterances in multiple languages.

In addition to defining metrics, it is important to have tests that can be applied for measuring those metrics using a common set of examples. Ideally, the team annotating the dataset would create the test. This will help the team agree on a set of guidelines and provide objective measures to identify how well annotators are performing.
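As a rough sketch of what such a test could look like in practice, assume the team maintains a small shared example set with agreed-upon answers (the names below are hypothetical) and scores each annotator against it:

def score_against_gold(gold_labels, annotator_labels):
    # Fraction of shared test examples labeled the same way as the agreed answer.
    matches = sum(
        1 for example_id, gold in gold_labels.items()
        if annotator_labels.get(example_id) == gold
    )
    return matches / len(gold_labels)

# Hypothetical test set agreed on by the annotation team.
gold = {"img_001": "pedestrian", "img_002": "cyclist", "img_003": "vehicle"}
submission = {"img_001": "pedestrian", "img_002": "pedestrian", "img_003": "vehicle"}

print(f"Agreement with the test set: {score_against_gold(gold, submission):.0%}")

A score like this provides the objective, per-annotator measure described above without anyone having to re-review every example by hand.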

Human annotators may disagree on the proper annotation of an object. For example, if a pedestrian in a crosswalk image is only partially visible, one annotator may not label it as a pedestrian while another annotator might. Use a small calibration set to clarify guidelines and expectations, as well as to describe how to treat edge cases or subjective annotations.

Even with detailed guidelines, there may be cases where annotators disagree. Decide how you will resolve those cases — for example, by consensus or inter-annotator agreement. Discussing data collection protocols, annotation requirements, edge cases and quality metrics beforehand can be the key to ensuring your annotation is streamlined.
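Inter-annotator agreement can be quantified; Cohen's kappa, which corrects raw agreement for the agreement expected by chance, is one widely used measure for two annotators labeling the same items. A minimal sketch, with illustrative labels:

from collections import Counter

def cohens_kappa(labels_a, labels_b):
    # Agreement between two annotators, corrected for chance agreement.
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        (freq_a[c] / n) * (freq_b[c] / n) for c in set(labels_a) | set(labels_b)
    )
    if expected == 1:
        return 1.0  # both annotators always use the same single label
    return (observed - expected) / (1 - expected)

a = ["pedestrian", "pedestrian", "vehicle", "cyclist", "vehicle"]
b = ["pedestrian", "vehicle", "vehicle", "cyclist", "vehicle"]
print(f"Cohen's kappa: {cohens_kappa(a, b):.2f}")

A low kappa on a batch is often a signal that the guidelines, not the annotators, need attention.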

Meanwhile, always remember that humans experience fatigue, and your quality process needs a way to detect it in order to preserve data quality. Consider periodically inserting ground truth data into your dataset to spot common problems associated with fatigue, such as mislabeled objects, unassigned attributes, inaccurate boundaries or color associations and missed annotations.
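A hedged sketch of that ground-truth insertion idea: gold items are interleaved into an annotator's queue, and a drop in accuracy on the most recent gold items can be surfaced as a possible fatigue signal. The window size and threshold below are illustrative only:

def rolling_gold_accuracy(results, window=10):
    # Accuracy on the most recent `window` gold items an annotator has seen.
    # `results` is an ordered list of booleans: True means the annotator's
    # label matched the known ground truth for that inserted gold item.
    recent = results[-window:]
    return sum(recent) / len(recent) if recent else None

# Hypothetical session: strong early in the shift, errors creeping in later.
session = [True] * 30 + [True, False, True, False, False, True, False, False, True, False]

accuracy = rolling_gold_accuracy(session)
if accuracy is not None and accuracy < 0.8:
    print(f"Recent gold accuracy {accuracy:.0%} - consider a break or a review pass")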

Another important consideration is that AI is applied in a wide range of domains. Annotators may need some amount of domain knowledge to properly annotate data from specialized fields such as medicine and finance. You may need to consider designing customized training programs for such projects.

Implementing standardized quality control processes

Data quality processes should be standardized, adaptable and scalable. It is not practical to manually check all the parameters of all the annotations present in a dataset — especially when dealing with millions of annotations. That’s why it’s of value to create a statistically significant random sample that will be a good representation of the dataset.
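As an illustration of how such a review sample might be sized, the standard sample-size formula for estimating a proportion, with a finite-population correction, can be applied; the confidence level and margin of error below are assumptions, not recommendations:

import math

def review_sample_size(population, z=1.96, margin=0.05, p=0.5):
    # Sample size for estimating an error rate within +/- `margin`.
    # z=1.96 corresponds to ~95% confidence; p=0.5 is the most conservative
    # assumption about the underlying error rate.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# A hypothetical dataset of 2 million annotations needs only a few hundred
# reviewed items for a +/-5% estimate at 95% confidence.
print(review_sample_size(2_000_000))  # 385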

Decide what data quality metrics you will use to evaluate quality. Precision, recall and F1-scores (the harmonic mean of precision and recall) are widely used in classification tasks.
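For reference, a minimal sketch of those three metrics, computed from counts of true positives, false positives and false negatives gathered in a review pass:

def precision_recall_f1(tp, fp, fn):
    # Precision, recall and F1 (the harmonic mean of precision and recall).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical review of one label class: 90 correct labels, 10 spurious, 30 missed.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")  # 0.90, 0.75, 0.82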

Another essential element of standardized quality control processes is the feedback mechanism used to help annotators correct their errors. In general, you want to take a programmatic approach to detecting errors and informing annotators. For example, the dimensions of general objects can be capped for a particular dataset. Any annotation that does not adhere to the pre-set limits can automatically be blocked until the error is corrected.
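A sketch of that kind of programmatic check, assuming bounding-box annotations with pixel dimensions and per-class size limits configured for the dataset (the limits and field names are made up for illustration):

# Hypothetical per-class size limits, in pixels, configured for one dataset.
SIZE_LIMITS = {
    "pedestrian": {"max_width": 400, "max_height": 900},
    "vehicle": {"max_width": 1200, "max_height": 800},
}

def validate_box(annotation):
    # Return a list of problems; an empty list means the annotation passes.
    limits = SIZE_LIMITS.get(annotation["label"])
    if limits is None:
        return [f"unknown label: {annotation['label']}"]
    problems = []
    if annotation["width"] > limits["max_width"]:
        problems.append("width exceeds limit")
    if annotation["height"] > limits["max_height"]:
        problems.append("height exceeds limit")
    return problems

box = {"label": "pedestrian", "width": 650, "height": 300}
issues = validate_box(box)
if issues:
    print("Blocked until corrected:", ", ".join(issues))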

Building efficient quality-control tools is a prerequisite to enabling quick checks and corrections. In a computer vision dataset, each annotation made on an image is visually inspected by multiple evaluators with the help of quality control tools like comments, instance-marking tools and doodles. These error identification procedures help evaluators flag incorrect annotations during the review process.

Use an analytics-based approach to measuring annotator performance. Metrics such as average making/editing time, project progress, tasks completed, person-hours spent on different scenarios, the number of labels/day and delivery ETAs are all useful for managing the data quality of annotations.
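One simple sketch of the analytics view this implies: aggregate per-annotator throughput from a task log (the field names are hypothetical) and derive rates such as labels per hour:

from collections import defaultdict

def annotator_summary(task_log):
    # Aggregate labels and working minutes per annotator, then derive a rate.
    totals = defaultdict(lambda: {"labels": 0, "minutes": 0.0})
    for task in task_log:
        totals[task["annotator"]]["labels"] += task["labels"]
        totals[task["annotator"]]["minutes"] += task["minutes"]
    return {
        name: {**row, "labels_per_hour": row["labels"] / (row["minutes"] / 60)}
        for name, row in totals.items()
    }

log = [
    {"annotator": "A01", "labels": 120, "minutes": 55},
    {"annotator": "A01", "labels": 90, "minutes": 40},
    {"annotator": "A02", "labels": 60, "minutes": 50},
]
for name, stats in annotator_summary(log).items():
    print(name, f"{stats['labels_per_hour']:.0f} labels/hour")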

Data quality management in summary

Research from VentureBeat shows that only about 13% of machine learning models make it into production. Quality assurance is an essential part of building AI systems, and poor data quality can undermine, or even derail, what would otherwise have been a successful project.

Be sure to address the need for data quality management early. By designing an efficient quality assurance process and implementing standardized quality controls, you can set your team up for success and build a stronger foundation for continually optimizing, innovating and refining best practices across all the annotation types and use cases you may need in the future. In short, this is an investment that pays off long-term.


Check out our solutions

Test and improve your machine learning models via our global AI Community of 1 million+ annotators and linguists.

Learn more