The human side of AI and data annotation
When most people think of artificial intelligence (AI), computers and datasets are the first things that come to mind. But in reality, the most critical component of any AI project is one that is rarely considered: humans.
AI’s effectiveness depends on humans sourcing huge amounts of data and training algorithms to perform their intended tasks. As more and more businesses turn to AI and machine learning to transform their customer experience and back-office operations, human involvement in capturing data and maintaining systems over time only becomes more important.
Why humans are important to data collection
For most AI projects, humans are the ones behind the scenes spending a significant amount of time and effort selecting and gathering the right kinds of data and labeling it appropriately. In healthcare, that could mean collecting and annotating thousands of pictures of rashes to teach AI to identify skin cancer. In financial services, that could mean teaching systems to detect fraud by sifting through millions of micro-transactions.
Refining systems to the point where they are truly useful takes a lot of data, and much of the data out there is not of the best quality, or comes in non-standard formats that must be cleaned before use. These obstacles remain prevalent. In fact, a report conducted by Forrester entitled Overcoming obstacles to get to AI at scale notes that “90% of firms are having difficulty scaling AI across their enterprises, and data is a significant reason why.” Over 50% of the respondents in the study indicated that they were unsure of their data needs.
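To make that cleanup step concrete, here is a minimal sketch in Python using the pandas library, assuming a hypothetical CSV export with inconsistent column names, mixed date formats and duplicate records (the file and column names are illustrative, not from any real dataset):

    import pandas as pd

    # Hypothetical export: file name, column names and formats are
    # invented for illustration.
    raw = pd.read_csv("transactions_export.csv")

    cleaned = (
        raw.rename(columns=str.lower)    # normalize inconsistent column names
           .drop_duplicates()            # remove repeated records
           .assign(date=lambda df: pd.to_datetime(df["date"], errors="coerce"))
           .dropna(subset=["date", "amount"])  # discard rows that cannot be used
    )
    cleaned.to_csv("transactions_clean.csv", index=False)

Even a simple pipeline like this reflects human judgment: someone has to decide which columns matter, which rows are salvageable and what counts as a duplicate.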
“Furthermore,” the paper continues, “firms struggle with ensuring data quality and data integration issues that leave them unable to connect multiple data sources. Without properly curated data, AI initiatives are destined to fall short — which leads to increased costs, missed deadlines, and regulatory risks.”
The difficulty of obtaining high-quality data is one of the main reasons companies turn to experts like TELUS Digital for help with AI projects. While there are plenty of datasets readily available for common AI models, some projects require bespoke solutions and human ingenuity.
How are humans involved in the data annotation process?
In data annotation, people add metadata tags to mark up elements of text, images, audio and video clips to give computers an idea of what they’re looking at.
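As a simplified illustration, an annotation record for a single image might look like the Python dictionary below. The schema is hypothetical, loosely modeled on common formats such as COCO rather than any specific tool’s output:

    # Illustrative annotation for one image; the field names are
    # hypothetical, loosely inspired by formats such as COCO.
    annotation = {
        "image_id": "rash_00412.jpg",
        "labels": [
            {
                "category": "suspicious_lesion",   # what the annotator identified
                "bbox": [134, 88, 210, 172],       # x, y, width, height in pixels
                "annotator": "reviewer_07",        # who applied the tag
            }
        ],
    }

Every field here encodes a human decision: what the object is, where its boundaries lie and who vouched for the judgment.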
Data collection and data annotation are two different parts of the machine learning product cycle. Where you start your project depends on whether or not you have data. If you don’t, you need to collect it and then annotate it. If you do, you can begin with annotation.
Annotators are hugely important to AI because they manually label media so that machine learning algorithms can make sense of the samples they are fed. That, in turn, helps the algorithms learn patterns, recognize similar patterns in future data to predict results, correct false assumptions and build vocabularies. Without humans, no machine learning algorithm would be able to compute the attributes relevant to its work.
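As a minimal sketch of how those human-applied labels drive learning (the texts and labels below are invented for illustration), a handful of annotated examples can train a simple classifier with scikit-learn:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Invented, human-labeled examples: the labels are the annotators' work.
    texts = ["wire $900 to this account now", "lunch with the team",
             "claim your prize immediately", "monthly rent payment"]
    labels = ["fraud", "legitimate", "fraud", "legitimate"]

    model = make_pipeline(CountVectorizer(), LogisticRegression())
    model.fit(texts, labels)                    # learn patterns from the annotations
    print(model.predict(["claim $900 now"]))    # apply them to unseen text

The model has no notion of “fraud” beyond what the human-written labels tell it, which is why annotation quality directly bounds model quality.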
“Annotation involves soft skills that data scientists and machine learning engineers don’t necessarily focus on,” says Tina Tseng, a legal analyst with Bloomberg Law.
Data collection and annotation are so crucial to AI that, by one estimate, they represent about 80% of the work on any given AI project, and they demand an enormous amount of manual effort. Unfortunately, the consequences of poor data annotation can be severe.
“Not only can your annotations be messy, but the way you’ve defined your problem may be incorrect,” says Amanda Stent, a natural language processing architect at Bloomberg. “You may have biases in the way that your data was sampled, which means that your annotations are incomplete.” Not only can bias show up in terms of whom you sample, but timing matters, too. “You may be only sampling data from the last year, when the important phenomena happened two years ago.”
Having trained annotators matters not only for data labeling, but also for maintenance going forward. AI is far from a “set it and forget it” solution; it requires humans to monitor and actively update systems over time.
How to promote engagement with AI
Data annotation projects do take time, but they don’t have to be a treacherous experience. Whether it’s giving employees flexibility in their schedules and working locations to maintain a healthy work-life balance, ensuring inclusivity and fair pay or proactively providing the latest tools and information, there are things that companies can do to help ensure that their AI projects and the people who power them are successful.
For more information on best practices, download the essential guide to AI training data.