Face technology has been making its way into the mainstream in recent years. Advances in deep neural networks (DNNs) and face-scanning cameras have greatly increased the accuracy of face detection and recognition. Furthermore, a growing list of companies offer developer-friendly face APIs through their SaaS platforms, making it much easier and cheaper to incorporate state-of-the-art face technology into different types of products.
This is the first part of a series that will give you a good understanding of the key components behind commonly used face technology, as well as their capabilities and constraints. Though we will focus on facial recognition, other related face technology functions will be mentioned as well. We will also talk about some prominent services currently available in the market.
In Part II of this series, we will use a case study to walk through the design and development considerations involved in integrating the technology into a final product.
A typical face analytics pipeline is illustrated in Figure 1.
Usually, the face is captured through a webcam or security camera. The positioning and selection of the camera are important to the performance of the face pipeline in the usage scenario. We will talk about that in a later section.
The captured face data can also be three-dimensional if a depth camera is available, which makes the system even more robust. Nevertheless, the overall procedure remains similar.
The face processing pipeline starts by detecting faces within the camera view, so that only relevant face pixels are passed to subsequent analysis stages. Without going into too much detail, the basic idea is to slide a “face detector” window across the image at multiple scales. The face detector is usually a classifier trained to look for predefined features (e.g. gradients and edges). When a face is detected, the coordinates and size of the face box are returned.
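The sliding-window idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular detector is implemented: the function names, the square window shape, the scale step and the stride are all hypothetical choices, and `is_face` stands in for whatever trained classifier is used.

```python
from typing import Callable, Iterator, List, Tuple

Box = Tuple[int, int, int]  # (x, y, size) of a square candidate window


def sliding_windows(width: int, height: int, min_size: int = 24,
                    scale_step: float = 1.5,
                    stride_ratio: float = 0.5) -> Iterator[Box]:
    """Yield square candidate boxes covering the image at several scales."""
    size = min_size
    while size <= min(width, height):
        # Move the window by a fraction of its own size at each step.
        stride = max(1, int(size * stride_ratio))
        for y in range(0, height - size + 1, stride):
            for x in range(0, width - size + 1, stride):
                yield (x, y, size)
        # Grow the window; the max() guards against a step that does not grow.
        size = max(size + 1, int(size * scale_step))


def detect_faces(width: int, height: int,
                 is_face: Callable[[Box], bool]) -> List[Box]:
    """Return every candidate box accepted by the (assumed) classifier."""
    return [box for box in sliding_windows(width, height) if is_face(box)]
```

In practice overlapping detections of the same face would also need to be merged, which this sketch leaves out.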
Modern face detectors are quite efficient and effective. However, their accuracy drops significantly with small faces, face occlusions, low resolution and poor camera angles. State-of-the-art face detectors use deep learning to overcome those challenges (Figure 2), and could make their way into the market in the next few years.
Facial Landmarks Detection
After the location of a face is determined, facial landmark detection localizes salient regions of a typical face, such as the eyebrows, eyes, nose, mouth and jawline. As an example, a commonly used open-source image processing toolkit called DLIB provides a face shape predictor model that tracks 68 key points on the face and provides face pose (tilt, rotation, etc.) estimation (Figure 3).
The landmarks and pose are used for normalizing the face image before face recognition. Facial landmarks and pose are also useful for fun applications like Snapchat’s Face Swap and Lenses.
Normalization is an important technique that enhances the accuracy of face recognition. Face images come in with different poses, sizes, expressions and lighting. Before they are used for training and identification purposes, the face images are normalized geometrically to a centered, full frontal orientation.
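As a rough illustration of geometric normalization, the sketch below uses two eye landmarks to undo in-plane tilt and to rescale the face so that the eyes end up level and a fixed distance apart. It is a simplified example, not the method of any specific product: the function names and the target eye distance of 60 pixels are assumptions, and out-of-plane rotation (turning or nodding the head) is not handled.

```python
import math


def eye_alignment(left_eye, right_eye, target_eye_distance=60.0):
    """Return (roll_degrees, scale) needed to level the eyes and fix their spacing.

    left_eye and right_eye are (x, y) image coordinates of the eye that
    appears on the left and on the right of the picture, respectively.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    roll = math.degrees(math.atan2(dy, dx))           # in-plane tilt of the face
    scale = target_eye_distance / math.hypot(dx, dy)  # resize factor
    return roll, scale


def normalize_point(point, left_eye, right_eye, target_eye_distance=60.0):
    """Map an image point into a frame centered between the eyes."""
    roll, scale = eye_alignment(left_eye, right_eye, target_eye_distance)
    cx = (left_eye[0] + right_eye[0]) / 2
    cy = (left_eye[1] + right_eye[1]) / 2
    theta = math.radians(-roll)  # rotate by the opposite angle to undo the tilt
    x, y = point[0] - cx, point[1] - cy
    xr = (x * math.cos(theta) - y * math.sin(theta)) * scale
    yr = (x * math.sin(theta) + y * math.cos(theta)) * scale
    return xr, yr
```

Applying `normalize_point` to every pixel coordinate (or to every landmark) gives a face in which both eyes sit on a horizontal line, symmetric about the origin.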
Lighting is also normalized to compensate for the differences in exposure and reflection. Some advanced normalization techniques can even neutralize expressions and remove occlusions, like eye-glasses. Face normalization works closely with the face recognition step.
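One simple and widely known way to compensate for exposure differences is histogram equalization, which spreads the grayscale intensities of an image over the full available range. The sketch below is a minimal version of that technique on a flat list of pixel values. It is offered as an illustration only; the article does not say which lighting normalization a given product uses, and the function name is hypothetical.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of integer intensities in [0, levels)."""
    if not pixels:
        return []
    # Count how often each intensity occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution: number of pixels at or below each intensity.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:
        # Every pixel has the same value, so there is nothing to stretch.
        return list(pixels)
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))
            for p in pixels]
```

An under-exposed face whose intensities all fall in a narrow dark band would come out using the full range from 0 to 255, which makes images taken under different lighting more comparable.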
Lastly, the normalized face is passed to a system that is trained to look at subtle differences between faces. The system is built with, what else, deep neural networks (DNNs), and more specifically deep convolutional neural networks (CNNs).
As a start, the CNN is fed millions of labelled face pictures. By repeatedly comparing images of the same and of different people, the system learns over a hundred facial features that it can use to tell people apart. These are built into a neural network model. When your face is enrolled into the system, your facial features become part of the model, which is then used for identification and matching purposes.
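To make the matching step more concrete, here is a minimal sketch that assumes each enrolled face has already been reduced by the network to a list of numeric features. Comparing such feature vectors by cosine similarity against a threshold is one common approach, but it is an assumption here rather than something the services discussed in this article document; the function names, the gallery layout and the 0.8 threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two feature vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(probe, gallery, threshold=0.8):
    """Return the best-matching enrolled name, or None if nobody is close enough.

    gallery maps a person's name to the feature vector stored at enrollment.
    """
    best_name, best_score = None, threshold
    for name, features in gallery.items():
        score = cosine_similarity(probe, features)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

The threshold controls the trade-off between wrongly accepting a stranger and wrongly rejecting an enrolled person, which is what the benchmarks in the next section measure.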
Two key elements impact the performance of face recognition: the size of the neural network and the size of the data. This is an important reason why internet and social media giants like Google, Facebook, Baidu and Tencent usually lead in face recognition performance. Whoever owns a large database of face data and ample compute resources is more likely to dominate the field.
Misc. Analysis Using Your Face
Similar to face recognition, the system can also perform other types of classification, such as age, gender and ethnicity. Some face SaaS platforms, like Microsoft Face API (Figure 4), also provide analysis of emotional state, makeup, eyewear, etc.
Benchmarks for Face Recognition
A few projects maintain large face image data sets to benchmark the performance of face recognition algorithms. One of the most prominent is Labeled Faces in the Wild (LFW). The project reports Receiver Operating Characteristic (ROC) curves for the algorithms of participating research teams, using labelled face pictures of known individuals. [Note: ROC is commonly used to measure the performance of binary classifiers, which basically give true or false results.]
Figure 5 shows ROC curves, which plot the true positive rate (a.k.a. recall) against the false positive rate. They measure how well different algorithms can match the correct faces within the database. The bigger the area under the curve, the better.
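For readers who want to see how such a curve is produced, the sketch below computes ROC points and the area under the curve from a list of match scores and ground-truth labels. The function names are illustrative, and the scores in the usage note are made-up toy values, not benchmark results.

```python
def roc_curve(scores, labels):
    """Return (false_positive_rate, true_positive_rate) points.

    scores: higher means the system is more confident of a match.
    labels: 1 for a genuine match, 0 for a non-match. Both classes must
    be present, otherwise the rates are undefined.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    # Lower the decision threshold one distinct score at a time.
    while i < len(ranked):
        score = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A system that ranks every genuine match above every non-match gets an area of 1.0, while random guessing gives about 0.5.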
As a comparison, Figure 7 shows the performance of an actual human using the same set of images.
The top performers in Figure 5 all use deep neural networks. In fact, in some cases their performance is better than that of humans.
More recently, newer projects have started to measure other common usage scenarios. The MegaFace Project, for example, measures how well face recognition can find a face among many distractors.
The elements of the face processing pipeline have been around for many years, with governments and enterprises as the main users of the technology. Several factors are now helping to make the technology usable in much broader scopes.
For quite a while, the accuracy of machine face recognition lagged meaningfully behind human performance. A lot of manual work was therefore needed to review the results from machines, making the technology impractical to use at large scale.
As seen in the previous section, DNNs have gradually matched human performance, at least in some cases. This allows us to design face recognition into previously unthinkable applications.
SaaS platforms are able to deliver face applications at large scale using well-established infrastructure, together with the other services they offer. They are therefore able to offer the service at a reasonable margin and cost.
Meanwhile, competition is heating up, as small companies are also able to leverage advancements in deep learning to achieve reasonable performance in face recognition. The consequence is commoditization of basic face recognition and analysis services.
The rapid improvements in compute performance allow face recognition and various types of analysis to be executed in real time, so immediately actionable intelligence can be generated. This opens up many new applications like real-time threat detection, targeted marketing, etc.
Face processing and analysis technology can greatly enhance the usability of many products and boost the productivity of many industries. Developers now have a wide selection of face technology solutions to choose from. The solutions can largely be divided into cloud-based platforms and offline SDKs.
When you are evaluating the use of face technology in prototypes (e.g. experimenting with user experience), a cloud-based platform is a good starting option. These platforms usually have few system performance requirements and are free for evaluation and development.
|Provider|Features|
|Microsoft Azure Cognitive Service Face API||
|Google Vision API|Face Detection (emotion, head wear)|
|Affectiva|Emotion Analysis (age, gender, ethnicity classifiers)|
Table 1: Cloud based face technology platforms (Sept 2017)
Table 1 lists the feature sets provided by the leading face technology providers. Microsoft Azure and Kairos provide a rich set of features, which will be useful when you are building an end-to-end ambient intelligence solution. Domain expertise is also important, and some providers focus on a specific domain. As an example, Affectiva provides the most in-depth emotion analysis.
It is also noteworthy that some of the platforms, like Face++ and Sighthound, also provide other computer vision services such as body and vehicle recognition.
All platforms provide a free tier which limits the number of transactions per month. They also limit the throughput of API calls (e.g. 1 per second) based on the pricing tier chosen. The free tier is only good for experimentation. The paid tiers are charged by the number of API calls per month.
It is important to look at the API calls required for the function you want to implement. Take face recognition as an example. One platform may need a single API call with the image in the request body, while another may need two API calls: the face detection API followed by the actual face recognition API.
Some platforms may also charge different prices for specific API calls. The actual cost will therefore depend on your application and input data. It is best to create a prototype workflow for your product to get a better cost estimate on different platforms.
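A back-of-the-envelope estimate along these lines can be scripted before any integration work begins. The sketch below is a hypothetical calculator: the function names, the free-tier size, the price per 1,000 calls and the rate limit are placeholder values invented for illustration, not the pricing of any real platform.

```python
def monthly_cost(images_per_month, calls_per_image,
                 free_calls, price_per_1000_calls):
    """Estimate the monthly bill when charges are based on API call count."""
    calls = images_per_month * calls_per_image
    billable = max(0, calls - free_calls)  # the free tier is used up first
    return billable / 1000 * price_per_1000_calls


def hours_at_rate_limit(images_per_month, calls_per_image, calls_per_second):
    """Minimum time needed to push all calls through a throttled API."""
    calls = images_per_month * calls_per_image
    return calls / calls_per_second / 3600


# Hypothetical comparison: same workload, one call vs. two calls per image.
one_call = monthly_cost(100_000, 1, free_calls=30_000, price_per_1000_calls=1.0)
two_calls = monthly_cost(100_000, 2, free_calls=30_000, price_per_1000_calls=1.0)
```

With these placeholder numbers the platform that needs two calls per image costs more than twice as much, because the free tier covers a smaller share of the total calls.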
A cloud-based platform can have some major drawbacks. Firstly, the service is not accessible if the internet connection or the SaaS platform is down. Secondly, you need to send pictures to a third party, which is not desirable in a lot of cases. Even though cloud service providers emphasize that they always “de-identify” the images, facial feature information is still in the system, so customer privacy could be a concern. Offline face recognition SDKs are an alternative.
Table 2 has a list of SDK providers. The actual features provided on specific platforms depend heavily on the capability of the hardware.
Table 2: Offline face SDK (Sept 2017)
OpenCV and DLIB are very popular libraries that computer vision developers use to build complete face recognition applications. Those libraries can also be used to augment the above solutions when building full applications. We will discuss that in the next part of the series.
In this first part of the series, we gave an overview of the face technology pipeline and its building blocks. Developers can now choose from a good selection of face technology software providers, and we covered what to consider when choosing your solution. In the next part of the series, we will use a case study to gain more insight into the design and development process of integrating face technology into a product.