Subject: Face recognition
Data Science Areas: Computer Vision, Object Detection, Face Detection, Face Recognition
Tools: Python, Dlib
The Challenge
Our clients approached us looking for a solution that could match a person's photo against the photo in that person's identity document. Given a photo of a person, the solution should determine whether the same person appears in the document photo.
What about the data?
The most common problem in Data Science projects is a lack of data. To avoid it here, we created our own dataset: we built sample documents from blank templates and photos of real people taken from the Kaggle dataset for face recognition.
We limited our scope to passports, driver’s licenses, and visas to keep the PoC as simple as possible. To make the dataset representative, we used samples from different countries. Each kind of document has its own characteristics, such as unique labels, patterns, and structures, and these also differ from country to country. Nevertheless, we managed to deliver a solution that works on all of the document types listed above, regardless of the country they come from.
The Solution
After analyzing all the aspects that may affect the outcome of our project, we identified certain limitations. What made our work easier was the fact that a document usually contains only one face in its photo.
To establish a connection between the face in the person’s photo and the face in the document, we first had to locate the face in each image. Thus, part of the solution was detecting and localizing the face both in the document and in the photo.
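The detection step can be sketched as follows. This is a minimal illustration rather than the project's actual code: it assumes Dlib's built-in HOG-based frontal face detector, and the helper names `detect_faces` and `pick_main_face` are hypothetical. Choosing the largest box is also an assumption, based on the observation that a document usually contains a single main face.

```python
from typing import List, Optional, Tuple

# A face bounding box as (left, top, right, bottom) pixel coordinates.
Box = Tuple[int, int, int, int]


def detect_faces(image_path: str) -> List[Box]:
    """Return bounding boxes of faces found by Dlib's frontal face detector."""
    # Imported lazily so that pick_main_face can be used without Dlib installed.
    import dlib

    detector = dlib.get_frontal_face_detector()
    image = dlib.load_rgb_image(image_path)
    # The second argument upsamples the image once, which helps with small faces.
    rects = detector(image, 1)
    return [(r.left(), r.top(), r.right(), r.bottom()) for r in rects]


def pick_main_face(boxes: List[Box]) -> Optional[Box]:
    """Pick the largest box, assuming the main face is the biggest one found."""
    if not boxes:
        return None
    return max(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
```

In such a setup, `pick_main_face(detect_faces(path))` would be called once for the document scan and once for the person's photo, and a `None` result would mean that no face was found.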
To determine whether both photos show the same person, it is necessary to process the faces and identify similarities. To do this, we needed to identify key traits: unique characteristics that distinguish one person from others, such as how big the eyes are, how long the face is, etc.
After locating the faces in both photos, we encoded each face. Using these encodings, we mapped the facial images to a compact Euclidean space, where distances correspond to the degree of face similarity: the smaller the distance, the more similar the faces.
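The comparison of two encodings can be sketched as below. It is an illustration under stated assumptions, not the project's actual code: the function names are hypothetical, the encodings are assumed to be plain numeric vectors (for example, the 128-dimensional descriptors produced by Dlib's face recognition model), and the 0.6 cutoff is the value suggested in Dlib's own examples for that model, so it would need tuning for any other encoder.

```python
import math
from typing import Sequence

# Assumed cutoff: Dlib's examples suggest 0.6 for its 128-D face descriptors.
SAME_PERSON_THRESHOLD = 0.6


def face_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two face encodings of equal length."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_same_person(
    a: Sequence[float],
    b: Sequence[float],
    threshold: float = SAME_PERSON_THRESHOLD,
) -> bool:
    """Treat two encodings as the same person if they are close enough."""
    return face_distance(a, b) < threshold
```

A lower threshold makes the check stricter (fewer false matches, more false rejections), so the value is a trade-off that depends on the use case.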
The Result
With our solution, we fully implemented the customer’s idea. The accuracy of recognizing a face and determining whether the same person is in both photos is 99.38%. The solution makes it easy to detect fake photos on documents, and it can be used as part of document verification or personal identification. It can also be exposed as an API or integrated into a web application.