Despite the economic impact of the COVID-19 pandemic, the Artificial Intelligence (AI) industry has continued to grow and is forecast to keep growing in the coming years. However, the industry needs high-quality training data to support this growth.
It’s not surprising, then, that as the AI industry has grown, the data labeling industry has grown rapidly in tandem. The data annotation tools market was worth more than $700 million in 2019 and is predicted to grow by more than 30% in the next six years. As the demand for training data has increased, so has the number of offerings from data labeling tools and partners.
Data Labeling and AI
More refined methods for data annotation and sentiment analysis, along with new technologies for more accurate processes, are just some of the industry developments that could lead to greater accuracy, speed, and customization in the labeling process. These are the developments that could advance the AI industry as a whole.
All this is great, but if the AI industry wants to progress beyond emergence, it needs standards. At the same time, standardization should not come at the cost of diversity. In the data labeling world, the ideal scenario would be an easy-to-use, highly accurate tool that data labeling companies could both contribute to and benefit from: a tool that consolidates the data labeling industry yet retains the diversity needed to produce high-quality, ethical AI software.
This was likely something that Intel thought about when it open-sourced its Computer Vision Annotation Tool (CVAT) in 2019. In other words, the code for CVAT is available for anyone to “inspect, modify and enhance.”
Open source can provide a community with a powerful tool that can be customized for specific applications. But that’s not all. Precisely because it’s open source, it’s usually also more stable and secure. More people contribute to the code base, which means that bugs and potential security vulnerabilities are more likely to be caught.
Imagine a world where all the data labeling companies come together to create the ultimate data labeling tool. Imagine being able to learn from each other. Imagine being able to use a tool that’s stable, secure, and customizable.
How big will the AI industry be able to grow then? How fast will this growth be?
There are many benefits to open source software, and in many cases, it has made it possible for an entire field to grow too. Just think about what WordPress did for blogging and personal websites!
It might seem like a crazy idea to give your technology away and possibly cause a disruption to your business model, but it’s possible to strike a balance.
Say a company wanted to add to CVAT. It would not have to put its entire end-to-end solution into the open-source tool. It could simply contribute the applicable parts of its data annotation tool while keeping the rest of its value propositions and tools private. Besides, a business is built on more than just the tool in use. Or at least it should be. Just because a tool is perfect doesn’t mean the other aspects of the business, such as personnel and processes, are no longer necessary.
Building on CVAT can also be seen as an opportunity to set an industry standard for data labeling work that AI-related businesses and the entire AI industry can benefit from. One area where this would be especially beneficial is security. In the training data industry, security is normally one of the top issues. That makes sense, because as an industry we handle copious amounts of data.
Currently, most data labeling companies manage their own data security based on existing guidelines and best practices, which are not specific to security and cover other data-related areas as well. The current practice also sees most companies operating in silos, because everyone is using “unregulated” proprietary technology.
As it is, how would the data labeling industry be able to expand into areas that require a much higher level of security for data handling, such as the banking industry? For industry expansion to take place, the basics need to be taken care of, and security could be a good starting point for putting the right practices in place. It’s less of a grey area than other issues that need to be addressed, such as diversity in training data sets.
While the data labeling industry may still be the “Wild Wild West” for now, it’s reassuring to note that there are already opportunities and avenues for standardization. This standardization could be what’s needed to revolutionize the AI industry.
Working together to create the ultimate tool could lead to creating benchmarks, policies and standards for all training data to come. Perhaps this one step will take us a giant leap closer to an automated future.