The challenge
Our client, Thorn, was founded in 2012 with the mission to build technology to defend children from sexual abuse. The organization has its own data science and engineering teams focused solely on developing new technologies to meet that objective. Its ML/AI is used across the child safety ecosystem to accelerate victim identification, stop revictimization and proactively prevent abuse. TELUS Digital was tasked with annotating 90,000 images and text strings from Thorn’s confidential database and the dark web over an eight-week period. The labeled data would be used to train one of Thorn’s ML/AI models to identify online sexual harms against children.
The TELUS Digital solution
As this project required the annotation of highly sensitive material, the team of 10 annotators was strategically selected based on a number of key criteria, including seniority, skill profile and emotional maturity. The classification work required the annotators to determine which offensive material category an image or text string belonged to. Due to the sensitive nature of the project, the team worked in secure rooms at a TELUS Digital facility using our in-house annotation platform. For heightened security, an API was set up so that the images were not stored locally. Team members worked on a reduced schedule (at full pay) to mitigate the impact of prolonged exposure to the potentially offensive material. All members had access to a robust wellness program that included custom counseling and regular wellness check-ins.
Key features of the program:
- A secure API and file transfer system with rigorous security checks
- Reduced work hours to manage the impact of the sensitive material
- Personally identifiable information (PII) was stripped from all messages to protect innocent victims
- Multiple, iterative calibrations were completed to ensure labeling consistency and high annotator agreement both within the labeling team and with Thorn’s issue experts
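How annotator agreement was measured on this project is not specified here. One common way to quantify it during a calibration round is Cohen's kappa, which compares the observed agreement between two labelers with the agreement expected by chance. The sketch below is a minimal illustration under that assumption: the function name `cohen_kappa` and the neutral category labels are hypothetical and are not taken from the project.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two labelers who rated the same items.

    Illustrative only: the metric and the label names are assumptions,
    not the project's documented calibration method.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both label lists must be non-empty and equal in length")

    n = len(labels_a)

    # Observed agreement: share of items where both labelers chose the same category.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two labelers'
    # marginal rates, summed over categories.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both labelers used a single identical category for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical calibration batch: one annotator compared against a client expert.
annotator = ["category_a", "category_a", "category_b", "category_b"]
expert = ["category_a", "category_a", "category_b", "category_a"]
kappa = cohen_kappa(annotator, expert)
```

A value near 1 indicates strong agreement beyond chance, while a value near 0 indicates agreement no better than chance. In a calibration loop, a low score on a batch would typically prompt a review of the category guidelines before the next round.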
The results
The team delivered the project on time and with minimal revisions required, achieving a 93% accuracy rate. Efficient communication between the client and our team meant that requested revisions were completed ahead of schedule. The annotated text strings and images from Thorn’s database are being used to help its model learn to identify threats online in order to accelerate victim identification, stop revictimization and prevent abuse. Following the success of this first portion of the project, we continue to deliver accurately annotated data to the client.