Touch-less Natural User Interface
A Natural User Interface (NUI) lets users immerse themselves in an application quickly and master its controls with minimal learning. It is critically important for AR/VR applications and ambient intelligence systems. In burgeoning applications such as autonomous drone control and in-car infotainment navigation, an NUI can greatly improve usability.
One key contributor to NUI is touch-less gesture control, which allows users to manipulate virtual objects much as they would physical ones. It removes the dependency on mechanical input devices such as a keyboard or mouse.
Gesture Control Devices
In the 1970s, wired gloves were invented to capture hand gestures and motions. The gloves use tactile switches or optical or resistance sensors to measure the bending of joints. Early gloves had a clumsy setup, which limited their use to research.
One of the earliest commercial wired gloves for the consumer market was the Power Glove (Fig. 1), released in 1989. It was used as a control device for the Nintendo game console.
Over the years, more accurate and lightweight wired gloves were developed. One advantage of wired gloves is that they require less computing power. They are useful where haptic feedback is important, as in industrial robot control. However, requiring the user to put on a glove is a barrier to mass-market adoption.
Vision Based Gesture Recognition
Vision-based gesture recognition uses a conventional camera, a range camera, or both to capture the hand and derive its gesture. It requires more processing power than a wired glove. There are several methods for camera-based gesture recognition.
With a conventional 2D camera, simple gesture recognition can be implemented using functions provided by commercial or open source computer vision libraries such as OpenCV (Fig. 2). The pipeline uses skin tone detection to find hands within a constrained area. It then detects the convex hull points and convexity defect points of each hand.
This simple approach can handle basic tasks such as finger counting. However, it is not suitable for more complex use cases, and its reliability is strongly affected by factors such as lighting and skin tone.
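The pipeline above can be sketched in a few lines of Python with OpenCV. This is a minimal illustration, not the implementation shown in Fig. 2: the HSV skin range, the depth and angle thresholds, and the function names `hand_defects`, `is_finger_gap` and `count_fingers` are all assumptions chosen for the example and would need tuning for a real camera and lighting setup.

```python
import math


def is_finger_gap(start, end, far, depth_px, min_depth_px=20.0, max_angle_deg=90.0):
    """Treat a convexity defect as a gap between two fingers when it is deep
    enough and the angle at its deepest point is acute (assumed thresholds)."""
    a = math.dist(start, end)   # hull edge spanning the defect
    b = math.dist(start, far)   # one side of the valley
    c = math.dist(end, far)     # other side of the valley
    if b == 0 or c == 0:
        return False
    # Law of cosines gives the angle at the deepest ("far") point.
    cos_angle = (b * b + c * c - a * a) / (2 * b * c)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return depth_px >= min_depth_px and angle <= max_angle_deg


def count_fingers(defects):
    """N gaps between fingers imply N + 1 raised fingers. With no gaps this
    heuristic cannot tell a fist from a single finger, so it returns 0."""
    gaps = sum(1 for d in defects if is_finger_gap(*d))
    return gaps + 1 if gaps else 0


def hand_defects(frame_bgr, lower_hsv=(0, 40, 60), upper_hsv=(25, 255, 255)):
    """Return (start, end, far, depth_px) tuples for the largest skin-coloured
    blob in a BGR frame. The HSV range is a rough assumption."""
    # Imported here so the geometry helpers above work without OpenCV installed.
    import cv2
    import numpy as np

    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lower_hsv, np.uint8), np.array(upper_hsv, np.uint8))
    # [-2] selects the contour list in both OpenCV 3.x and 4.x return layouts.
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    if not contours:
        return []
    hand = max(contours, key=cv2.contourArea)
    hull = cv2.convexHull(hand, returnPoints=False)  # indices into the contour
    raw = cv2.convexityDefects(hand, hull)
    if raw is None:
        return []
    defects = []
    for s, e, f, fixpt_depth in raw[:, 0]:
        # OpenCV reports depth as a fixed-point value scaled by 256.
        defects.append((tuple(hand[s][0]), tuple(hand[e][0]),
                        tuple(hand[f][0]), fixpt_depth / 256.0))
    return defects
```

Calling `count_fingers(hand_defects(frame))` on a camera frame gives the finger estimate. Because the first step is a fixed colour threshold, the sketch inherits the sensitivity to lighting and skin tone noted above.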
Another approach is the appearance-based method, which directly matches one or more hand images against a set of gesture templates. Combined with machine learning, it can deliver fairly robust gesture classification. The method supports simple gestures such as starting a program with an open palm, stopping it with a closed palm, or changing pages with a hand swipe.
However, this method does not provide the kinematic parameters of the hand joints, so it is not suitable for use cases that require a more detailed representation of how the hand interacts with virtual objects in 3D space.
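To make the appearance-based idea concrete, here is a minimal sketch that classifies a hand image by comparing it with stored templates using normalized cross-correlation. It is a deliberately simple stand-in for the trained classifiers a real system would use. The function names, the tiny 4x4 "images" and the 0.5 acceptance threshold are assumptions made for illustration.

```python
import numpy as np


def normalize(img):
    """Flatten an image, remove its mean and scale it to unit length so that
    a dot product between two results is a normalized cross-correlation."""
    v = np.asarray(img, dtype=np.float64).ravel()
    v = v - v.mean()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def classify(img, templates, min_score=0.5):
    """Return (label, score) of the best-matching template, or (None, score)
    when even the best match is weaker than min_score."""
    query = normalize(img)
    best_label, best_score = None, -1.0
    for label, template in templates.items():
        score = float(np.dot(query, normalize(template)))
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_score:
        return None, best_score
    return best_label, best_score


# Toy templates: real ones would be cropped, size-normalized hand images.
templates = {
    "open_palm": [[1, 0, 1, 0]] * 4,
    "closed_palm": [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]],
}

# A noisy observation: the open-palm pattern with one pixel flipped.
observed = [[1, 1, 1, 0], [1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 1, 0]]
label, score = classify(observed, templates)
```

The output is only a gesture label and a similarity score. Nothing in it describes individual joint positions or angles, which is the limitation noted above.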
3D cameras that can perceive depth have become much cheaper and more widely available in recent years (see “Choosing a 3D Vision Camera”). In 2010, Microsoft released the Kinect V1 motion controller, using technology from PrimeSense. It provides strong three-dimensional body and hand motion capture in real time, freeing game players from physical input devices such as keyboards and joysticks. Kinect also supports multiple users in a small room setting, and it lets non-gamers of all ages easily take part in games such as sports and dancing.
Beyond gaming, the platform supports many other interesting applications. In sports, companies such as Swing Guru developed professional coaching applications for golf and baseball. The alternative, placing motion trackers on the user’s body, is relatively expensive and inconvenient.
While Kinect primarily focuses on capturing body pose, Leap Motion developed a short-range gesture capture device using a stereo infrared camera.
Leap Motion software can track fine gestures of both hands at a high frame rate. It enables applications such as drawing and manipulating small objects in virtual space. Some PC vendors partnered with Leap Motion to offer an NUI for desktop applications such as computer-aided design (CAD).
As discussed in the article “Choosing a 3D Vision Camera”, there is a growing number of low cost cameras that can perceive three dimensional space. Here is a list of some examples that people are using to develop gesture recognition software:
As mobile and embedded devices become more powerful, some software vendors have also developed strong gesture recognition software stacks that run on typical smartphones. These are suitable for use cases where hand motions are confined to a small, well-defined space, such as menu selection in VR applications or interaction with an automobile’s infotainment/navigation system while driving.
A number of software vendors also provide SDKs or middleware that let application developers easily integrate gesture and pose recognition into their applications. Here is a list of examples:
Gesture Control Use Cases
Besides AR/VR/MR, touch-less gesture control has a broad range of applications.
Digital signage and display walls in retail are expected to keep growing rapidly over the next few years. Rather than just rotating predefined digital content, gesture control enables digital signage to engage with customers during their shopping process.
In combination with face technology, digital signage could effectively function as a digital sales representative and provide a bridge between the online and offline experience. This will have a positive impact on sales conversion, which is particularly important for brick-and-mortar businesses nowadays.
Driver distraction is becoming a huge problem for traffic safety. Automobile manufacturers are coming up with more natural ways to control the infotainment system while keeping the driver’s eyes on the road.
Voice control is one way, but it may not be desirable in certain situations (for example, when the driver has a sore throat). A touch-less hand gesture interface reduces the need for drivers to reach out to the dashboard control panel. BMW’s camera-based gesture control system is one example.
Drone manufacturers such as DJI are making photo-taking drones that can take off autonomously from the user’s hand and return without a remote control. Hand gestures are a viable way to guide drone operations outdoors, such as summoning the drone back by waving (Fig. 8).
In the age of IoT, a touch-less natural user interface is critical for everyday users to engage with intelligent devices and environments. In smart buildings, carefully designed interfaces that recognize common user gestures will greatly enhance user experience, productivity and safety.