A Practical Guide To Using Face Technology (Part I)

What are the key components behind commonly used face technology?

Frank Lee

Overview

In recent years, face technology has been making its way into the mainstream. Advances in deep neural networks (DNNs) and face-scanning cameras have greatly increased the accuracy of face detection and recognition. Furthermore, a growing list of companies offer developer-friendly face APIs through their SaaS platforms, making it much easier and cheaper to incorporate state-of-the-art face technology into different types of products.

This is the first part of a series that will give you a good understanding of the key components behind commonly used face technology, as well as their capabilities and constraints. Though we will focus on facial recognition, other related face technology functions will be mentioned as well. We will also talk about some prominent services currently available in the market.

In Part II of this series, we will use a case study to walk through the design and development considerations in integrating the technology into a final product.

The Plumbing

A typical face analytic pipeline is illustrated in Figure 1.

Fig 1. Basic Face Processing Pipeline, Image Credit: Frank Lee

Usually, the face is captured through a webcam or security camera. The positioning and selection of the camera are important to the performance of the face pipeline in the usage scenario. We will talk about that in a later section.

The captured face data can also be three-dimensional if a depth camera is available, which makes the system even more robust. Nevertheless, the overall procedure remains similar.

Face Detection

The face processing pipeline starts by detecting faces within the camera view, so that only relevant face pixels are passed to subsequent analysis stages. Without going into too much detail, the basic idea is to slide a “face detector” window across the image at multiple scales. The face detector is usually a classifier trained to look for predefined features (e.g. gradients and edges). When a face is detected, the coordinates and size of the face box are returned.
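
To make the sliding-window idea concrete, here is a minimal Python sketch. It only enumerates the candidate boxes; the classifier is passed in as a callback because the article does not specify one. The names `sliding_windows`, `detect_faces` and `is_face`, as well as the default window size, scale step and stride, are illustrative assumptions rather than any particular library's API.

```python
from typing import Callable, Iterator, List, Tuple

Box = Tuple[int, int, int]  # (x, y, size) of a square window, in pixels


def sliding_windows(width: int, height: int, min_size: int = 24,
                    scale: float = 1.5, stride_ratio: float = 0.5) -> Iterator[Box]:
    """Yield square candidate windows over the image at increasing sizes."""
    if min_size < 2 or scale <= 1.0:
        raise ValueError("min_size must be >= 2 and scale must be > 1")
    size = min_size
    while size <= min(width, height):
        # Step by a fraction of the window size so neighbouring windows overlap.
        stride = max(1, int(size * stride_ratio))
        for y in range(0, height - size + 1, stride):
            for x in range(0, width - size + 1, stride):
                yield (x, y, size)
        size = int(size * scale)  # move on to the next, larger scale


def detect_faces(width: int, height: int,
                 is_face: Callable[[Box], bool]) -> List[Box]:
    """Return every window that the (hypothetical) classifier accepts."""
    return [box for box in sliding_windows(width, height) if is_face(box)]


if __name__ == "__main__":
    # Stand-in classifier for illustration: "fires" on one known box only.
    def fake_classifier(box: Box) -> bool:
        return box == (12, 12, 24)

    print(len(list(sliding_windows(48, 48))))
    print(detect_faces(48, 48, fake_classifier))
```

A real detector would also merge overlapping hits for the same face, which this sketch leaves out.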

Modern face detectors are quite efficient and effective. However, accuracy drops significantly with small faces, occluded faces, low resolution and poor camera angles. State-of-the-art face detectors use deep learning to overcome those challenges (Figure 2), and could make their way into the market in the next few years.

Fig 2. CVPR 2017 Finding Tiny Faces, Peiyun Hu, Deva Ramanan, CMU

Facial Landmarks Detection

After the location of a face is determined, facial landmark detection localizes salient regions of a typical face such as the eyebrows, eyes, nose, mouth and jawline. As an example, a commonly used open source image processing toolkit called DLIB provides a face shape predictor model that tracks 68 key points on the face and provides face pose (tilt, rotation, etc.) estimation (Figure 3).

Fig 3. Image Credit: Cole Murray at HackerNoon Blog

The landmarks and pose are used for normalizing the face image before face recognition. Facial landmarks and pose are also useful for fun applications like Snapchat’s Face Swap and Lenses.
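
As a rough illustration of how 68 key points can feed a pose estimate, the sketch below groups the points by facial region and derives the in-plane tilt (roll) from the two eye centers. It assumes the index layout of the widely used 68-point annotation scheme (the one DLIB's 68-point model follows) and takes the landmarks as a plain list of (x, y) tuples; the helper names are made up for this example, and running the actual landmark model is out of scope here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

# Index ranges of the 68-point scheme (0-based, end exclusive).
# "right"/"left" are from the subject's point of view.
REGIONS: Dict[str, range] = {
    "jaw": range(0, 17),
    "right_eyebrow": range(17, 22),
    "left_eyebrow": range(22, 27),
    "nose": range(27, 36),
    "right_eye": range(36, 42),
    "left_eye": range(42, 48),
    "mouth": range(48, 68),
}


def region_points(landmarks: Sequence[Point], name: str) -> List[Point]:
    """Return the landmarks that belong to one named facial region."""
    if len(landmarks) != 68:
        raise ValueError("expected exactly 68 landmarks")
    return [landmarks[i] for i in REGIONS[name]]


def centroid(points: Sequence[Point]) -> Point:
    """Average position of a set of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(points), sum(ys) / len(points))


def roll_angle_degrees(landmarks: Sequence[Point]) -> float:
    """In-plane tilt: angle of the line joining the two eye centers.

    Image coordinates are assumed (x to the right, y downward), so a level
    face gives 0 degrees.
    """
    rx, ry = centroid(region_points(landmarks, "right_eye"))
    lx, ly = centroid(region_points(landmarks, "left_eye"))
    return math.degrees(math.atan2(ly - ry, lx - rx))
```

Roll is only one component of pose; estimating yaw and pitch generally needs a 3D face model, which this sketch does not attempt.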

Face Normalization

Normalization is an important technique that enhances the accuracy of face recognition. Face images come in with different poses, sizes, expressions and lighting. Before they are used for training and identification, the face images are normalized geometrically to a centered, full frontal orientation.
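
One simple form of geometric normalization, shown here only as a sketch, is to compute the rotation, scale and shift that move the detected eye centers onto fixed target positions. The target coordinates below (eyes at (30, 40) and (70, 40) in a 100x100 crop) and the function name `eye_alignment` are assumptions for illustration; production systems may use more landmarks or a 3D model.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]


def eye_alignment(src_right: Point, src_left: Point,
                  dst_right: Point = (30.0, 40.0),
                  dst_left: Point = (70.0, 40.0)) -> Callable[[Point], Point]:
    """Build a similarity transform that maps the source eyes onto the targets.

    Points are treated as complex numbers, so the transform z -> a*z + b
    covers rotation and uniform scale (a) plus translation (b).
    """
    s1, s2 = complex(*src_right), complex(*src_left)
    d1, d2 = complex(*dst_right), complex(*dst_left)
    if s1 == s2:
        raise ValueError("the two source eye positions must differ")
    a = (d2 - d1) / (s2 - s1)  # rotation and scale
    b = d1 - a * s1            # translation

    def transform(point: Point) -> Point:
        z = a * complex(*point) + b
        return (z.real, z.imag)

    return transform


if __name__ == "__main__":
    # A tilted face: the eyes lie on a 45-degree line in the source image.
    align = eye_alignment((120.0, 80.0), (180.0, 140.0))
    print(align((120.0, 80.0)))   # lands on the right-eye target
    print(align((150.0, 110.0)))  # midpoint between the eyes
```

The same transform would then be applied to the image pixels with an image library's affine warp, for example OpenCV's `warpAffine`.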

Lighting is also normalized to compensate for differences in exposure and reflection. Some advanced normalization techniques can even neutralize expressions and remove occlusions such as eyeglasses. Face normalization works closely with the face recognition step.
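
The article does not say which lighting normalization the services use. As one classical example, histogram equalization spreads the gray levels of an under- or over-exposed image across the full range. The sketch below works on a grayscale image given as a list of rows of integers from 0 to 255; the function name is illustrative.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, pixel values 0..255


def equalize_histogram(image: Image) -> Image:
    """Spread pixel intensities across 0..255 using the cumulative histogram."""
    pixels = [value for row in image for value in row]
    if not pixels:
        return []

    # Count how often each gray level occurs.
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    # Cumulative distribution: number of pixels at or below each level.
    cdf = []
    running = 0
    for count in histogram:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to stretch, return a copy.
        return [list(row) for row in image]

    # Lookup table mapping each old level to its equalized level.
    lookup = [round(max(0, c - cdf_min) / (total - cdf_min) * 255) for c in cdf]
    return [[lookup[value] for value in row] for row in image]


if __name__ == "__main__":
    dim_face_patch = [[100, 110], [120, 130]]  # low-contrast toy input
    print(equalize_histogram(dim_face_patch))
```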

Face Recognition

Lastly, the normalized face is passed to a system that is trained to look at subtle differences between faces. The system is built with, what else, deep neural networks (DNNs). More specifically, a deep convolutional neural network (CNN).

To start, the CNN is fed millions of labelled face pictures. By repeatedly comparing images of the same and of different people, the system learns over a hundred facial features that it can use to tell people apart. This is built into a neural network model. When your face is enrolled into the system, your facial features become part of the model, which is then used for identification and matching.
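
A common way to picture enrollment and matching, offered here as an assumption rather than a description of any specific product, is that the trained network turns each face into a fixed-length feature vector, enrollment stores that vector under a name, and matching compares vectors with a similarity score and a threshold. The class name `FaceGallery`, the three-element vectors and the 0.8 threshold are all made up for illustration; real systems use vectors with a hundred or more elements.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity of two feature vectors, in the range -1 to 1."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return dot / norm


class FaceGallery:
    """Stores enrolled feature vectors and matches new ones against them."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # hypothetical value, tuned per system
        self._enrolled: Dict[str, List[float]] = {}

    def enroll(self, name: str, features: Sequence[float]) -> None:
        self._enrolled[name] = list(features)

    def verify(self, name: str, features: Sequence[float]) -> bool:
        """One-to-one matching: is this face the named person?"""
        return cosine_similarity(self._enrolled[name], features) >= self.threshold

    def identify(self, features: Sequence[float]) -> Optional[str]:
        """One-to-many search: best match above the threshold, or None."""
        best_name, best_score = None, self.threshold
        for name, enrolled in self._enrolled.items():
            score = cosine_similarity(enrolled, features)
            if score >= best_score:
                best_name, best_score = name, score
        return best_name


if __name__ == "__main__":
    gallery = FaceGallery()
    gallery.enroll("alice", [1.0, 0.0, 0.0])
    gallery.enroll("bob", [0.0, 1.0, 0.0])
    print(gallery.identify([0.9, 0.1, 0.0]))
    print(gallery.identify([0.5, 0.5, 0.7]))
```

Verification and identification correspond to the "matching" and "identification" features that the platforms discussed later list separately.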

There are two key elements here that impact the performance of face recognition: the size of the neural network and the size of the data. That is an important reason why internet and social media giants like Google, Facebook, Baidu and Tencent usually lead the curve in face recognition performance. Whoever owns a large database of face data and ample compute resources is more likely to dominate the field.

Misc. Analysis Using Your Face

Similar to face recognition, the system can also perform different types of classification, such as age, gender and ethnicity. Some face SaaS platforms, like Microsoft Face API (Figure 4), also provide analysis of emotional state, makeup, eyewear, etc.

Benchmarks for Face Recognition 

There are a few projects that maintain large face image data sets to benchmark the performance of face recognition algorithms. One of the most prominent is Labeled Faces in the Wild (LFW). The project reports Receiver Operating Characteristic (ROC) curves for the algorithms of participating research teams, using labelled pictures of known individuals. [Note: ROC is commonly used to measure the performance of binary classifiers, which give true or false results.]

Figure 5 shows ROC curves, which plot the true positive rate (a.k.a. recall) against the false positive rate. They measure how well different algorithms can match the correct faces within the database. The larger the area under the curve, the better.
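
To show how such a curve is produced, the sketch below builds ROC points from a list of match scores and ground-truth labels, then computes the area under the curve with the trapezoid rule. The setup assumed here is pair matching: each score is the similarity of one pair of images, and the label says whether the pair really shows the same person. The toy scores are invented for illustration.

```python
from typing import List, Sequence, Tuple


def roc_curve(scores: Sequence[float],
              labels: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points.

    The decision threshold is swept from the highest score downward; pairs
    with equal scores are accepted together.
    """
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative pair")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoid-rule area under a list of (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6]         # similarity of four image pairs
    labels = [True, True, False, True]    # True = same person
    curve = roc_curve(scores, labels)
    print(curve)
    print(area_under_curve(curve))
```

An area of 1.0 means every same-person pair scored higher than every different-person pair; 0.5 is what random guessing achieves.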

Fig 5. LFW ROC Unrestricted, Labelled Outside Data (Retrieved Sept, 2017)
Fig 6. LFW mean classification accuracy (Retrieved Sept, 2017)

As a comparison, Figure 7 shows the performance of an actual human using the same set of images.

Fig 7. Real human LFW ROC face match performance (Retrieved Sept, 2017)

The top performers in Figure 5 all use deep neural networks. In fact, in some cases their performance is better than that of humans.

More recently, newer projects have started to measure other common usage scenarios. The MegaFace project, for example, measures how well face recognition finds a face among many distractors.

Why Now?

The elements of the face processing pipeline have been around for many years, with governments and enterprises as the main users of the technology. Several factors now make the technology usable in much broader scopes.

Accuracy

For quite a while, the accuracy of machine face recognition had a meaningful gap from human performance. As a result, a lot of manual work was still needed to review the results from machines, making the technology impractical to use at large scale.

As seen in the previous section, DNNs have gradually matched human performance, at least in some cases. This allows us to design the technology into previously unthinkable applications.

Cost

SaaS platforms are able to deliver face applications at large scale using well-established infrastructure, together with the other services they offer. So, they are able to offer the service at a reasonable margin and cost.

Meanwhile, competition is heating up, as small companies are also able to leverage advancements in deep learning to achieve reasonable performance in face recognition. The consequence is the commoditization of basic face recognition and analysis services.

Speed

The rapid improvements in compute performance allow face recognition and various types of analysis to be executed in real time, so immediately actionable intelligence can be generated. This opens up many new applications like real-time threat detection, targeted marketing, etc.

Available Solutions

Face processing and analysis technology can greatly enhance the usability of many products and boost the productivity of many industries. Developers now have a wide selection of face technology solutions to choose from. The solutions can largely be divided into cloud-based platforms and offline SDKs.

Cloud Based

When you are evaluating the use of face technology in prototypes (e.g. experimenting with user experience), a cloud-based platform is a good starting option. These platforms usually have few system performance requirements and are free for evaluation and development.

Platform | Feature Set
Microsoft Azure Cognitive Service Face API
  • Face Detection
    • Age
    • Gender
    • Ethnicity
    • Facial landmark detection
    • Emotion analysis
    • Head wear analysis
  • Face Recognition
    • Verification/matching
    • Face search
    • Grouping
    • Identification
Amazon Rekognition
  • Face Analysis
    • Gender
    • Age
  • Face Recognition
  • Celebrity Recognition
Kairos
  • Face Analysis
    • Age
    • Gender
    • Facial landmark detection
    • Emotion analysis
    • Sentiment analysis
  • Face Recognition
    • Verification/matching
    • Identification
Face++
  • Face Detection
    • Facial landmark detection
    • Gender
    • Age
    • Ethnicity
    • Emotion analysis
  • Face Recognition
    • Comparison
    • Searching
Google Vision API
  • Face Detection (emotion, head wear)
Affectiva
  • Emotion Analysis (age, gender, ethnicity classifiers)
Sighthound
  • Face Detection
    • Age
    • Gender
    • Emotion analysis
    • Facial landmark detection
  • Face Recognition
    • Grouping
    • Authentication

Table 1: Cloud based face technology platforms (Sept 2017)

Features

Table 1 lists the feature sets provided by the leading face technology providers. Microsoft Azure and Kairos provide a rich set of features, which is useful when you are building an end-to-end ambient intelligence solution. Domain expertise is also important, and some providers focus on a specific domain. As an example, Affectiva provides the most in-depth emotion analysis.

It is also noteworthy that some platforms, like Face++ and Sighthound, also provide other computer vision services like body and vehicle recognition.

Pricing

All platforms provide a free tier which limits the number of transactions per month. They also limit the throughput of API calls (e.g. 1 per second) based on the pricing tier chosen. The free tier is only good for experimentation. The paid tiers are charged by the number of API calls per month.
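
When a tier caps throughput, the client has to space out its own requests. The sketch below is a minimal client-side throttle; the class name `Throttle` and the injectable clock and sleep functions are illustrative choices (the injection just makes the behavior easy to test), and the actual limits depend on the provider and tier.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Spaces successive calls at least 1 / calls_per_second seconds apart."""

    def __init__(self, calls_per_second: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if calls_per_second <= 0:
            raise ValueError("calls_per_second must be positive")
        self._interval = 1.0 / calls_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds waited."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._clock()  # re-read in case the sleep overshot
        self._next_allowed = now + self._interval
        return delay


if __name__ == "__main__":
    throttle = Throttle(calls_per_second=1.0)
    for frame_number in range(3):
        waited = throttle.wait()
        # A real client would send the face API request here.
        print(f"frame {frame_number}: waited {waited:.2f}s before calling")
```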

It is important to look at the API calls required for the function you want to implement. Take face recognition as an example. One platform may need a single API call with the image in the request body, while another may need two API calls: a face detection call followed by the actual face recognition call.

Some platforms may also charge different prices for specific API calls. So, the actual cost will depend on your application and input data. It is best to create a prototype workflow for your product to better estimate the cost on different platforms.
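
A back-of-the-envelope estimate along these lines can be sketched in a few lines of Python. Every number below (free quota, price per 1,000 calls, calls needed per recognition) and both platform names are hypothetical placeholders, not real price lists; substitute the figures from each provider's pricing page.

```python
from dataclasses import dataclass


@dataclass
class PricingPlan:
    name: str
    free_calls_per_month: int     # calls included at no charge
    price_per_1000_calls: float   # price beyond the free quota
    calls_per_recognition: int    # API calls needed for one recognition


def monthly_cost(plan: PricingPlan, recognitions_per_month: int) -> float:
    """Estimated monthly bill for a given number of recognitions."""
    calls = recognitions_per_month * plan.calls_per_recognition
    billable = max(0, calls - plan.free_calls_per_month)
    return billable / 1000 * plan.price_per_1000_calls


if __name__ == "__main__":
    # Hypothetical plans: B looks cheaper per call but needs two calls
    # (detection, then recognition) for every face.
    plans = [
        PricingPlan("Platform A", 30_000, 1.50, 1),
        PricingPlan("Platform B", 5_000, 1.00, 2),
    ]
    for plan in plans:
        cost = monthly_cost(plan, 100_000)
        print(f"{plan.name}: {cost:.2f} per month")
```

With these made-up figures, the platform with the lower per-call price ends up more expensive because it needs twice as many calls, which is exactly the effect worth checking before committing to a provider.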

Offline SDK

A cloud-based platform can have some major drawbacks. Firstly, the service is not accessible if internet access or the SaaS platform is down. Secondly, you need to send pictures to a third party, which is undesirable in many cases. Even though cloud service providers emphasize that they always “de-identify” the images, facial feature information is still in the system, so customer privacy could be a concern. Offline face recognition SDKs are an alternative.

Table 2 lists SDK providers. The actual features provided on a specific platform depend heavily on the capability of the hardware.

Provider | Features | Platforms
Face++
  • Facial Landmark
  • Face Compare
  Platforms: iOS, Android
Kairos
  • Face Analysis
    • Age
    • Gender
    • Facial landmark detection
    • Emotion analysis
    • Sentiment analysis
  • Face Recognition
    • Verification/Matching
    • Identification
  Platforms: Windows, Linux, OSX, Android, iOS, ChromeOS
Affectiva
  • Emotion Analysis
    • Age
    • Gender
    • Ethnicity
  Platforms: Windows, Linux, OSX, Android, iOS, Pi, ChromeOS
Luxand
  • Face recognition
  • Face detection
  • Facial landmark detection
  Platforms: Linux, Windows, iOS, Android, OSX
Sighthound Sentry
  • Face Detection
    • Age
    • Gender
    • Emotion analysis
    • Facial landmark detection
  • Face Recognition (including grouping and authentication)
  Platforms: iOS, Windows, Linux, OSX

Table 2: Offline face SDK (Sept 2017)

Open Source

OpenCV and DLIB are very popular libraries that computer vision developers use to develop complete face recognition applications. Those libraries can also be used to augment the above solutions in building full applications. We will discuss that in the next part of the series.

Summary

In this first part of the series, we gave an overview of the face technology pipeline and its building blocks. Developers can now choose from a good selection of face technology software providers, and we covered what to consider in choosing your solution. In the next part of the series, we will use a case study to gain more insight into the design and development process of integrating face technology into a product.

Author
Frank Lee
Frank Lee is the co-founder and CEO of Eurika Solutions (www.eurika.ai), developing a platform that empowers IoT devices with cognitive technology. He has over 20 years of experience in leading product developments in media processing, computer vis...