The Evolution of Computer Vision
You can easily find computer vision technology in everyday products, from game consoles that can recognize your gestures to cell phone cameras that can automatically set focus on people. It is impacting many areas of our lives.
In fact, computer vision has a long history in commercial and government use. Optical sensors that can sense light waves in various spectrum ranges are deployed in many applications, such as quality assurance in manufacturing, remote sensing for environmental management and high-resolution cameras that collect intelligence over battlefields. Some of these sensors are stationary while others are attached to moving objects, such as satellites, drones and vehicles.
In the past, many of these computer vision applications were confined to closed platforms. But when combined with IP connectivity technologies, they create a new set of applications that were not possible before. Computer vision, IP connectivity, advanced data analytics and artificial intelligence will be catalysts for one another, giving rise to revolutionary leaps in Internet of Things (IoT) innovations and applications.
Advancements in Multiple Fields Driving Computer Vision
Environment Designed for Vision
Sight, or vision, is the most developed of the five human senses. We use it daily to recognize our friends, detect obstacles in our path, complete tasks and learn new things. We design our physical surroundings for our sense of vision. There are street signs and signal lights to help us get from one place to another. Stores have signs to help us locate them. Computer and television screens display information and entertainment that we consume. Given the importance of sight, it’s not a big leap to extend it to computers and automation.
What is Computer Vision?
Computer vision starts with the technology that captures and stores an image, or set of images, and then transforms those images into information that can be acted upon. It comprises several technologies working together (Figure 1). Computer vision engineering is an interdisciplinary field requiring cross-functional and systems expertise in a number of these technologies.
As an example, Microsoft Kinect uses 3D computer graphics algorithms to enable computer vision to analyze and understand three-dimensional scenes. It allows game developers to merge real-time full-body motion capture with artificial 3D environments. Beyond gaming, this opens new possibilities in areas like robotics, virtual reality (VR) and augmented reality (AR).
Sensor technology advancements are also happening rapidly at many levels beyond conventional camera sensors. Some recent examples include:
- Infrared sensors and lasers that combine to sense depth and distance, a critical enabler of self-driving cars and 3D mapping applications
- Nonintrusive sensors that track vital signs of medical patients without physical contact
- High-frequency cameras that can capture subtle movements imperceptible to the human eye, for example to help athletes analyze their gaits
- Ultra-low-power, low-cost vision sensors that can be deployed anywhere for long periods of time
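To make the depth-sensing item in the list above more concrete, here is a minimal sketch of one common ranging principle, time of flight. This is an assumption for illustration: the article does not say which principle any particular sensor uses, and the function name and sample timings below are hypothetical.

```python
# Illustrative sketch only: time-of-flight ranging is assumed here; the
# article does not say which depth-sensing principle a given product uses.

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def distance_from_round_trip(round_trip_seconds: float) -> float:
    """Return the one-way distance in metres for a measured round-trip time."""
    if round_trip_seconds < 0:
        raise ValueError("round-trip time cannot be negative")
    # The pulse travels to the object and back, so halve the total path.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


if __name__ == "__main__":
    # Hypothetical round-trip times, in nanoseconds, for three light pulses.
    for nanoseconds in (10.0, 66.7, 200.0):
        metres = distance_from_round_trip(nanoseconds * 1e-9)
        print(f"{nanoseconds:6.1f} ns -> {metres:6.2f} m")
```

A pulse that returns after about 10 nanoseconds corresponds to roughly 1.5 metres, which hints at the timing precision such sensors need.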
Computer Vision Gets Smart
The surveillance industry was one of the early adopters of image processing techniques and video analytics. Video analytics is a special use case of computer vision that focuses on finding patterns in hours of video footage. The ability to automatically detect and identify predefined patterns in real-world situations represents a huge market opportunity with hundreds of use cases.
The first video analytics tools used handcrafted algorithms that identified specific features in images and videos. They were accurate in laboratory settings and simulation environments. However, performance dropped quickly when input conditions, such as lighting and camera views, deviated from the design assumptions.
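A toy example helps show why hand-tuned rules are brittle. The sketch below is not taken from any real video analytics product: it assumes a hypothetical detector that flags every pixel brighter than a fixed threshold, then simulates a lighting change by scaling the pixel values of a synthetic image.

```python
# Illustrative sketch: a hypothetical hand-tuned brightness threshold,
# not an algorithm taken from any particular video analytics product.

BRIGHTNESS_THRESHOLD = 128  # tuned by hand for a well-lit scene (assumption)


def make_scene(background: int, target: int) -> list[list[int]]:
    """Build a 6x6 grayscale image with a 2x2 bright object in the middle."""
    image = [[background] * 6 for _ in range(6)]
    for row in (2, 3):
        for col in (2, 3):
            image[row][col] = target
    return image


def scale_lighting(image: list[list[int]], factor: float) -> list[list[int]]:
    """Simulate a change in illumination by scaling every pixel value."""
    return [[min(255, int(pixel * factor)) for pixel in row] for row in image]


def count_detected_pixels(image: list[list[int]]) -> int:
    """Count pixels that the fixed threshold classifies as 'object'."""
    return sum(pixel > BRIGHTNESS_THRESHOLD for row in image for pixel in row)


if __name__ == "__main__":
    daylight = make_scene(background=40, target=200)
    dusk = scale_lighting(daylight, 0.5)   # object pixels drop to 100
    glare = scale_lighting(daylight, 3.5)  # background pixels rise to 140

    print("daylight:", count_detected_pixels(daylight))
    print("dusk:    ", count_detected_pixels(dusk))
    print("glare:   ", count_detected_pixels(glare))
```

In the well-lit scene the rule finds exactly the four object pixels. At dusk it finds none, and under glare it flags all 36 pixels. The scene itself never changed, only the lighting did.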
Researchers and engineers spent many years developing and tuning algorithms, or coming up with new ones, to deal with different conditions. However, cameras and video recorders using those algorithms were still not robust enough. Despite incremental progress over the years, poor real-world performance limited the usefulness and adoption of the technology.
Deep Learning Breakthrough
In recent years, the emergence of deep learning algorithms has reinvigorated computer vision. Deep learning uses artificial neural network (ANN) algorithms, which are loosely modeled on the neurons of the human brain.
Starting in the early 2010s, computer performance, accelerated by graphics processing units (GPUs), has grown powerful enough for researchers to realize the capabilities of complex ANNs. In addition, partly driven by video sites and prevalent IoT devices, researchers now have large, diverse libraries of video and image data to train their neural networks.
In 2012, a type of deep neural network (DNN) called the convolutional neural network (CNN) demonstrated a huge leap in accuracy on the ImageNet image classification benchmark. That development drove renewed interest and excitement in the field of computer vision engineering. In some image classification and facial recognition tasks, deep learning algorithms now match or even exceed human accuracy. More importantly, just like humans, these algorithms have the ability to learn and adapt to different conditions.
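The core operation behind a CNN can be sketched in a few lines. The example below is a simplified illustration, not the 2012 network itself: it applies a single 3x3 filter to a tiny synthetic image and passes the result through a ReLU activation. The kernel values are fixed by hand here as an assumption, whereas a trained CNN learns them from data.

```python
# Illustrative sketch: one fixed 3x3 filter applied by hand. In a trained CNN
# the kernel values are learned from data; the values below are assumptions.

Matrix = list[list[float]]


def convolve2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image with no padding and a stride of 1.

    As in most deep learning libraries, the kernel is not flipped, so this is
    technically a cross-correlation.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


def relu(feature_map: Matrix) -> Matrix:
    """Zero out negative responses, keeping positive ones unchanged."""
    return [[max(0.0, value) for value in row] for row in feature_map]


# Synthetic 4x5 image: dark on the left, bright on the right.
IMAGE = [[0, 0, 1, 1, 1] for _ in range(4)]

# Hand-picked filter that responds to dark-to-bright vertical edges.
VERTICAL_EDGE_KERNEL = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

if __name__ == "__main__":
    for row in relu(convolve2d(IMAGE, VERTICAL_EDGE_KERNEL)):
        print(row)
```

The output is strong where the filter window straddles the dark-to-bright boundary and zero where the window covers a uniform region. A CNN stacks many such filters in layers and learns their values during training, rather than relying on an engineer to design each one.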
With deep learning, we are entering an era of cognitive technology where computer vision and deep learning integrate to address high-level, complex problems that were once the domain of the human brain (Figure 2). We are just scratching the surface of what is possible. These systems will continue to improve with faster processors, more advanced machine learning algorithms and deeper integration with edge devices. Computer vision is set to revolutionize IoT.
Increasing Use Cases
Other interesting use cases include:
- Agricultural drones that monitor the health of crops (http://www.slantrange.com/) (Figure 3)
- Transportation infrastructure management (http://www.vivacitylabs.com/)
- UAV (drone) inspections (http://industrialskyworks.com/drone-inspections-services/)
- Next generation home security cameras (https://buddyguard.io/)
These are just a few examples of how computer vision can greatly increase productivity in many fields. We are entering the next phase of IoT evolution. In the first phase, the focus was on connecting devices, aggregating data and building big data platforms. In the second phase, the focus will shift to making “things” more intelligent through technologies like computer vision and deep learning, generating more actionable data.
There are many problems to overcome in making the technology more practical and economical for the masses:
- Embedded platforms need to integrate deep neural network designs. There are difficult design decisions to be made around power consumption, cost, accuracy and flexibility.
- The industry needs standardization to allow smart devices and systems to communicate with each other and share metadata.
- Systems are no longer passive collectors of data. They need to act upon the data with minimal human intervention, and to learn and improvise by themselves. The whole software/firmware update process takes on new meaning in the machine learning era.
- Hackers could exploit new security vulnerabilities in computer vision and AI. Designers need to take that into account.
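To illustrate the standardization point in the list above, here is a minimal sketch of what shared detection metadata could look like when exchanged as JSON. Every field name, the device identifier and the message layout are hypothetical, and no existing industry standard is implied.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Detection:
    """One detected object. All field names here are hypothetical."""
    label: str
    confidence: float
    box: tuple[int, int, int, int]  # x, y, width, height in pixels


def encode_event(device_id: str, timestamp: str,
                 detections: list[Detection]) -> str:
    """Serialize one camera event to a JSON string any device could parse."""
    payload = {
        "device_id": device_id,
        "timestamp": timestamp,
        "detections": [asdict(d) for d in detections],
    }
    return json.dumps(payload, sort_keys=True)


def decode_event(message: str) -> dict:
    """Parse an event produced by encode_event back into plain Python data."""
    return json.loads(message)


if __name__ == "__main__":
    # Hypothetical event from a hypothetical camera.
    message = encode_event(
        device_id="camera-01",
        timestamp="2017-01-01T12:00:00Z",
        detections=[Detection("person", 0.92, (10, 20, 30, 40))],
    )
    print(message)
    print(decode_event(message)["detections"][0]["label"])
```

The value of an agreed format is that the receiving system needs no knowledge of the camera's internal model. It only needs the shared schema.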
In this post, we gave a brief introduction to computer vision and showed how it is becoming a critical component of many connected devices and applications. Most of all, we predicted its imminent explosive growth and listed some of the hurdles to practical application. In the next series of posts, we will explore new frameworks, best practices and design methodologies to overcome some of these challenges.