Biometrics have proven to be a solid alternative to conventional security measures: a person can use unique characteristics such as a fingerprint, iris, voice, or face to prove they are authorized to enter a certain space. MobiDev engineers shared this roadmap for designing a flexible AI security system for an office entrance.
For the dataset, the engineers chose the most lightweight solution, an SQLite database. With SQLite, all the data is stored in a single file that is easy to browse and back up, and the learning curve for data science engineers is shorter. Voice data wasn’t available initially, so people were asked to record 20-second clips. A voice verification model was then used to obtain a vector for each person, and the vectors were stored in the database.
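As a rough illustration of the storage step, the sketch below keeps one voice vector per person in a single SQLite file using only the Python standard library. The table name `voice_embeddings`, its columns, and the helper names are assumptions made for this example, not the schema MobiDev used.

```python
import sqlite3
from array import array


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS voice_embeddings ("
        "person_id TEXT PRIMARY KEY, vector BLOB NOT NULL)"
    )
    return conn


def save_embedding(conn, person_id, vector):
    """Store a vector as packed 32-bit floats, replacing any earlier one."""
    blob = array("f", vector).tobytes()
    conn.execute(
        "INSERT OR REPLACE INTO voice_embeddings (person_id, vector) VALUES (?, ?)",
        (person_id, blob),
    )
    conn.commit()


def load_embedding(conn, person_id):
    """Return the stored vector as a list of floats, or None if not enrolled."""
    row = conn.execute(
        "SELECT vector FROM voice_embeddings WHERE person_id = ?", (person_id,)
    ).fetchone()
    if row is None:
        return None
    values = array("f")
    values.frombytes(row[0])
    return list(values)
```

Passing a file path instead of `":memory:"` produces the single-file database described above. Values come back with 32-bit float precision, which is typically sufficient for embedding comparison.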
For face detection, the engineers used the RetinaFace model with a MobileNet backbone from the InsightFace project. For each face detected in an image, this model outputs four bounding-box coordinates as well as five facial landmarks. Images captured at different angles or with different optics can change the proportions of the face due to distortion. This may cause the model to struggle to identify the person, so the facial landmarks were used to warp each face into a consistent alignment.
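One common way to warp a face from five landmarks is to fit a similarity transform (rotation, uniform scale, translation) that maps the detected landmarks onto a fixed template. The sketch below fits that transform by least squares using complex arithmetic. The `TEMPLATE` coordinates are illustrative positions for a 112x112 crop and, like the function names, are assumptions for this example rather than values from the source.

```python
# Illustrative template (assumption): left eye, right eye, nose tip,
# left mouth corner, right mouth corner in a 112x112 aligned crop.
TEMPLATE = [(38.3, 51.7), (73.5, 51.5), (56.0, 71.7), (41.5, 92.4), (70.7, 92.2)]


def estimate_similarity(src_points, dst_points):
    """Least-squares fit of dst ~ a * src + b, with points as complex numbers.

    The complex factor `a` encodes rotation and uniform scale; `b` is the shift.
    """
    src = [complex(x, y) for x, y in src_points]
    dst = [complex(x, y) for x, y in dst_points]
    n = len(src)
    src_mean = sum(src) / n
    dst_mean = sum(dst) / n
    num = sum((s - src_mean).conjugate() * (d - dst_mean) for s, d in zip(src, dst))
    den = sum(abs(s - src_mean) ** 2 for s in src)
    if den == 0:
        raise ValueError("source landmarks must not all coincide")
    a = num / den
    b = dst_mean - a * src_mean
    return a, b


def apply_transform(a, b, points):
    """Apply the fitted transform to a list of (x, y) points."""
    moved = [a * complex(x, y) + b for x, y in points]
    return [(z.real, z.imag) for z in moved]


def to_affine_matrix(a, b):
    """Express the same transform as a 2x3 affine matrix."""
    return [[a.real, -a.imag, b.real], [a.imag, a.real, b.imag]]
```

In a real pipeline the 2x3 matrix would be handed to an image-warping routine such as OpenCV's `cv2.warpAffine` to produce the aligned crop; the code here only computes the transform and moves points.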
At the face identification stage, the model has to identify the person in the captured image. Identification is done with the help of references (ground truth data), which here are the initial photos of employees. The model compares the vector of the captured face with a reference vector and measures the distance between the two to tell whether it is the same person standing before the camera.
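The comparison can be sketched as a nearest-reference search with a distance cutoff. The source does not say which distance metric or threshold was used, so cosine distance and the `threshold=0.4` default below are assumptions chosen for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("vectors must be non-zero")
    return 1.0 - dot / (norm_u * norm_v)


def identify(query, references, threshold=0.4):
    """Find the closest reference vector and accept it only under the threshold.

    `references` maps a person id to that person's reference vector.
    Returns (person_id, distance), with person_id set to None when nobody
    is close enough.
    """
    best_id, best_distance = None, float("inf")
    for person_id, reference in references.items():
        distance = cosine_distance(query, reference)
        if distance < best_distance:
            best_id, best_distance = person_id, distance
    if best_distance <= threshold:
        return best_id, best_distance
    return None, best_distance
```

The threshold trades false accepts against false rejects, so in practice it would be tuned on real enrollment data rather than fixed up front.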
Voice verification follows almost the same basic logic as the face identification stage: two vectors are compared by the distance between them to decide whether they are similar enough. The difference is that there is already a hypothesis about who is trying to pass, supplied by the preceding face identification module.
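Because the identity hypothesis already exists, the voice check reduces to a one-to-one comparison against a single stored reference instead of a search over everyone. A minimal sketch follows, assuming cosine similarity as the score and a hypothetical `min_similarity` cutoff of 0.7; neither detail is given in the source.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_u * norm_v)


def verify_speaker(claimed_id, voice_vector, enrolled, min_similarity=0.7):
    """Check a voice vector against the one person the face module proposed.

    `enrolled` maps a person id to that person's stored voice vector.
    An unknown id is rejected outright rather than falling back to a search.
    """
    reference = enrolled.get(claimed_id)
    if reference is None:
        return False
    return cosine_similarity(voice_vector, reference) >= min_similarity
```

Rejecting unknown ids keeps the voice stage from ever widening the set of people the face stage considered.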
The final layer of security was added with speech-to-text anti-spoofing built on QuartzNet from the NeMo framework. This model provides a decent user experience and is suitable for real-time scenarios. To measure how close what the person says is to what the system expects, the system calculates the Levenshtein distance between the two phrases.
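The phrase check can be sketched with a standard dynamic-programming Levenshtein distance over the normalized texts. The normalization rules and the `max_ratio=0.2` tolerance are assumptions for this example; the source only states that Levenshtein distance is used.

```python
import re


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(cleaned.split())


def phrase_matches(expected, transcript, max_ratio=0.2):
    """Accept the transcript if its edit distance to the expected phrase,
    relative to the expected phrase length, stays within max_ratio."""
    expected_norm = normalize(expected)
    transcript_norm = normalize(transcript)
    distance = levenshtein(expected_norm, transcript_norm)
    return distance / max(len(expected_norm), 1) <= max_ratio
```

Allowing a small ratio rather than demanding an exact match leaves room for minor speech-to-text errors while still rejecting a different phrase.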
“By adding multiple edge devices, the system can be scaled to different locations or easily modified. We can directly configure Jetson through the network, set up a connection with low-level devices via GPIO interface, and upgrade it with new hardware quite easily, compared to a regular computer. We can also integrate with any digital security system that has a web API. But the main benefit here is that we can collect data for improving the system right from the device, since it appears convenient to gather data on the entrance, without any specific interruption,” says Daniil Liadov, Python Engineer at MobiDev.