Intro to Liveness Detection with React Native

Osama Qarem
25 min read

It's been over a year since the pandemic began, and social distancing measures are still ongoing for good reason. Naturally, this has been quite the hardship for establishments that depend on the physical presence of customers. It has also been the leading factor pushing companies toward a more digital operating model.

“Over the last few months, we’ve seen years-long digital transformation roadmaps compressed into days and weeks in order to adapt to the new normal as a result of COVID-19.” - Glenn Weinstein, CCO at Twilio

If a business has yet to digitalize its services, it is likely missing out on revenue. However, different businesses have different concerns: some may be quick to adapt while others remain wary. Financial institutions in particular are typically nervous about digital identity fraud.

Are You a Robot?

Most banks still require in-person meetings to open an account. Few allow you to open one remotely, and those that do limit what the account can do. The reason is risk. Your physical presence with documents up close is more trustworthy than an online application, where everything is easier to fake.

Enter eKYC (electronic Know Your Customer), the category of techniques and approaches to digital identity verification. One of these is Liveness Detection.

The Turing Test could be described as a challenge that determines whether a machine could be mistaken for a human in a specific mode of interaction. In contrast, Liveness Detection is a test to uncover a machine that is pretending to be human.

In 2019, Facebook terminated 5.4 billion fake accounts. Adding a liveness test to an onboarding process would certainly reduce spambots.

But how would it prevent digital identity fraud? Unfortunately, it does not. Not on its own. When it comes to digital identity verification, there is no silver bullet.

Rather, liveness detection can make it harder for someone to use your identity. It's usually not used alone and can be part of a larger verification process, typically in the onboarding stage. For a high level of confidence, it's very important that this process covers all bases. Liveness is one of them.

What would that online onboarding process look like? Besides the registration form, it could consist of a liveness test, facial verification and document authenticity checks. A user gets a score for each test, and their overall risk level can be calculated. That's the 'typical' eKYC procedure.
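The scoring step can be pictured as a weighted average of the individual test scores. The sketch below illustrates that idea under stated assumptions. The check names, the 0 to 1 score scale, the weights and the risk cut-offs are all hypothetical. They are not taken from any particular eKYC product.

```typescript
// Hypothetical result of one verification check (liveness, face match, document...).
export interface CheckResult {
  name: string
  score: number // assumed scale: 0 = almost certainly fraudulent, 1 = almost certainly genuine
  weight: number // assumed relative importance of this check
}

export type RiskLevel = "low" | "medium" | "high"

// Weighted average of all check scores.
export function overallScore(results: CheckResult[]): number {
  const totalWeight = results.reduce((sum, r) => sum + r.weight, 0)
  if (totalWeight <= 0) {
    throw new Error("At least one check needs a positive weight.")
  }
  const weightedSum = results.reduce((sum, r) => sum + r.score * r.weight, 0)
  return weightedSum / totalWeight
}

// Hypothetical cut-offs: a high combined score means low risk.
export function riskLevel(results: CheckResult[]): RiskLevel {
  const score = overallScore(results)
  if (score >= 0.8) return "low"
  if (score >= 0.5) return "medium"
  return "high"
}

const applicant: CheckResult[] = [
  { name: "liveness", score: 0.9, weight: 1 },
  { name: "faceMatch", score: 0.9, weight: 1 },
  { name: "document", score: 0.3, weight: 1 }
]
// The weak document check pulls the average down to about 0.7.
console.log(riskLevel(applicant)) // medium
```

A weighted average is only one way to combine scores. A real procedure might instead fail the applicant outright when any single check falls below a hard minimum.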

However, there are different approaches to liveness detection, and not all of them are useful against identity fraud. Each has its own strengths and weaknesses when it comes to security, accessibility and user experience.

A Familiar Challenge

CAPTCHAs are a method of liveness detection capable of deterring simple bots. They're common on the web and come in different forms: text-based, image-based or even a simple slider. Google's reCAPTCHA can be a single button click. As a matter of fact, it works by tracking your browsing activity and assigning you a risk score. If you're a privacy-conscious user who clears all cookies, you're likely to be labeled as high risk. The improved user experience comes at the cost of your own privacy.

Meme making fun of a CAPTCHA. Please laugh.

We've all seen how impressive AI can be. It's not only getting more impressive but also more accessible to learn and use. While that's great, it's not so great for CAPTCHAs, which advanced AI can easily subvert.

“CAPTCHA tests may persist in this world, too. Amazon received a patent in 2017 for a scheme involving optical illusions and logic puzzles humans have great difficulty in deciphering. Called Turing Test via failure, the only way to pass is to get the answer wrong.” - Why CAPTCHAs have gotten so difficult, The Verge

CAPTCHAs are intended for catching the average spambot. They are not capable of identifying advanced or tailor-made AI. In a process that aims to shield against digital identity fraud, they wouldn't be adequate. So what's the alternative?

Face-based Liveness Detection

Face CAPTCHAs are, in fact, a thing. This liveness detection model boils down to requesting a selfie from the user and applying image processing algorithms to determine whether it shows a real person.

To give you an idea of how such a process works, I built a proof-of-concept app in React Native, and I'll guide you through creating it. The goal here is an introductory-level app. In fact, this implementation isn't very hard to spoof, as I'll demonstrate in the App Discussion section.

Guide

I value your time, so if you want to jump straight to the source code, find it here. Feel free to skip ahead to the App Discussion section as well.

Getting Started

Expo makes it easy to prototype and share code for React Native. It's my go-to if I want to build something fast. I also really prefer TypeScript, so the code here is going to have types.

Create a new project using the blank TypeScript template and cd into it:

expo init liveness-detection
? Choose a template:
  ----- Managed workflow -----
  blank                 a minimal app as clean as an empty canvas
> blank (TypeScript)    same as blank but with TypeScript configuration
  tabs                  several example screens and tabs using react-navigation
  ----- Bare workflow -----
  minimal               bare and minimal, just the essentials to get you started
  minimal (TypeScript)  same as minimal but with TypeScript configuration

We're going to need the following modules for working with the camera, face detection and view masking, plus react-native-svg, which the circular progress component depends on:

expo install expo-camera expo-face-detector @react-native-community/masked-view react-native-svg

Additionally, we will use this component, which abstracts away the circular progress animation:

npm i react-native-circular-progress

Now we're ready.

expo start
Initial screen with a blank expo project

User Interface

First comes the layout. The following pseudocode represents the final view hierarchy we're going to build:

JSX
<View>
  <MaskedView>
    <Camera>
      <AnimatedCircularProgress />
    </Camera>
  </MaskedView>
  <Instructions />
</View>

We will make the camera preview and mask cover the entire screen using absolute fill. The cutout shape is determined by the styles of the mask element. We will use the following constants as a reference for its dimensions:

App.tsx
import { Dimensions } from "react-native"
const { width: windowWidth } = Dimensions.get("window")
const PREVIEW_SIZE = 325
const PREVIEW_RECT = {
  minX: (windowWidth - PREVIEW_SIZE) / 2,
  minY: 50,
  width: PREVIEW_SIZE,
  height: PREVIEW_SIZE
}

Basically, a square at the horizontal center of the screen with a small offset from the top.

minX refers to the left margin and minY to the top margin. So maxX would refer to minX plus the width. I borrowed this naming from native iOS development as it made the most sense.
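The derived edges follow the same arithmetic as the CGRect properties on iOS (maxX, maxY, midX, midY). The helpers below are a small sketch for our Rect shape. They are hypothetical: the app itself computes these values inline. The 375-point screen width is only an assumed example.

```typescript
// Same shape as PREVIEW_RECT: top-left origin plus size.
export interface Rect {
  minX: number
  minY: number
  width: number
  height: number
}

// Right and bottom edges.
export const maxX = (rect: Rect): number => rect.minX + rect.width
export const maxY = (rect: Rect): number => rect.minY + rect.height

// Horizontal and vertical centers.
export const midX = (rect: Rect): number => rect.minX + rect.width / 2
export const midY = (rect: Rect): number => rect.minY + rect.height / 2

// Example: PREVIEW_RECT on an assumed 375-point-wide screen.
const windowWidth = 375
const PREVIEW_SIZE = 325
const previewRect: Rect = {
  minX: (windowWidth - PREVIEW_SIZE) / 2, // 25
  minY: 50,
  width: PREVIEW_SIZE,
  height: PREVIEW_SIZE
}
console.log(maxX(previewRect), maxY(previewRect), midX(previewRect)) // 350 375 187.5
```

Because the leftover width is split evenly on both sides, midX always equals windowWidth / 2. That is what centers the cutout horizontally.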

Note that children of the camera component are drawn on top of the preview, and we want the circular progress to form a ring around the preview cutout. We will need to reference PREVIEW_RECT and PREVIEW_SIZE in several style objects.

App.tsx
import * as React from "react"
import MaskedView from "@react-native-community/masked-view"
import { Camera } from "expo-camera"
import { Dimensions, SafeAreaView, StyleSheet, Text, View } from "react-native"
import { AnimatedCircularProgress } from "react-native-circular-progress"
const { width: windowWidth } = Dimensions.get("window")
const PREVIEW_SIZE = 325
const PREVIEW_RECT = {
  minX: (windowWidth - PREVIEW_SIZE) / 2,
  minY: 50,
  width: PREVIEW_SIZE,
  height: PREVIEW_SIZE
}
export default function App() {
  return (
    <SafeAreaView style={StyleSheet.absoluteFill}>
      <MaskedView
        style={StyleSheet.absoluteFill}
        maskElement={<View style={styles.mask} />}
      >
        <Camera
          style={StyleSheet.absoluteFill}
          type={Camera.Constants.Type.front}
        >
          <AnimatedCircularProgress
            style={styles.circularProgress}
            size={PREVIEW_SIZE}
            width={5}
            backgroundWidth={7}
            fill={0}
            tintColor="#3485FF"
            backgroundColor="#e8e8e8"
          />
        </Camera>
      </MaskedView>
      <View style={styles.instructionsContainer}>
        <Text style={styles.instructions}>Instructions</Text>
        <Text style={styles.action}>Action to perform</Text>
      </View>
    </SafeAreaView>
  )
}
const styles = StyleSheet.create({
  mask: {
    borderRadius: PREVIEW_SIZE / 2,
    height: PREVIEW_SIZE,
    width: PREVIEW_SIZE,
    marginTop: PREVIEW_RECT.minY,
    alignSelf: "center",
    backgroundColor: "white"
  },
  circularProgress: {
    width: PREVIEW_SIZE,
    height: PREVIEW_SIZE,
    marginTop: PREVIEW_RECT.minY,
    marginLeft: PREVIEW_RECT.minX
  },
  instructions: {
    fontSize: 20,
    textAlign: "center",
    top: 25,
    position: "absolute"
  },
  instructionsContainer: {
    flex: 1,
    justifyContent: "center",
    alignItems: "center",
    marginTop: PREVIEW_RECT.minY + PREVIEW_SIZE
  },
  action: {
    fontSize: 24,
    textAlign: "center",
    fontWeight: "bold"
  }
})

That's about it for the initial UI. But we forgot one important task — we need to handle camera permissions:

App.tsx
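const [hasPermission, setHasPermission] = React.useState<boolean | null>(null)
React.useEffect(() => {
  const requestPermissions = async () => {
    // Newer expo-camera versions name this Camera.requestCameraPermissionsAsync().
    const { status } = await Camera.requestPermissionsAsync()
    setHasPermission(status === "granted")
  }
  requestPermissions()
}, [])
// Keep these early returns below every other hook call in App() (rules of hooks).
if (hasPermission === null) {
  // Still waiting for the user to answer the permission prompt.
  return <View />
}
if (hasPermission === false) {
  return <Text>No access to camera</Text>
}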
User interface

Face Detector

Let's get down to business. The face detector module integrates with the camera module using props. It's quite straightforward to configure:

App.tsx
import * as FaceDetector from "expo-face-detector"
// `onFacesDetected` callback should be defined inside `App()`.
// We will implement it later.
<Camera
  style={StyleSheet.absoluteFill}
  type={Camera.Constants.Type.front}
  onFacesDetected={onFacesDetected}
  faceDetectorSettings={{
    mode: FaceDetector.Constants.Mode.fast, // favor speed over accuracy
    detectLandmarks: FaceDetector.Constants.Landmarks.none,
    runClassifications: FaceDetector.Constants.Classifications.all, // eye-open and smiling probabilities
    minDetectionInterval: 125, // milliseconds between detection events
    tracking: false
  }}
>

faceDetectorSettings can be configured to provide different detections as needed. There are many landmarks we could obtain, such as the position of the mouth, nose and eyes. By analyzing these kinds of points together, we can detect the desired expressions and gestures.

Detection Criteria

Let's talk about the onFacesDetected callback. This is where all the data processing happens. To avoid issues with bad data, we need some rules to make sure that the user is holding the device properly:

  1. There is only a single face in the detection results.
  2. The face is fully contained within the camera preview.
  3. The face is not as big as the camera preview (user is too close to the camera).

It would also be good to verify that the user is looking straight at the device as a fourth rule, but I'll leave that for you to try.
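One way to try that fourth rule is to reject frames where the head is turned too far sideways. In this minimal sketch, isLookingStraight and the 10-degree tolerance are hypothetical. It checks only yawAngle, for two reasons: the FaceDetection type shown below has no pitch angle, and rollAngle varies with how the device is held.

```typescript
// Subset of the FaceDetection interface used in App.tsx.
interface FaceAngles {
  yawAngle: number
}

// Hypothetical tolerance in degrees; tune it by trial and error like the other thresholds.
const MAX_YAW_FOR_STRAIGHT = 10

// Returns true when the face is turned no more than `maxYaw` degrees left or right.
export function isLookingStraight(
  face: FaceAngles,
  maxYaw: number = MAX_YAW_FOR_STRAIGHT
): boolean {
  return Math.abs(face.yawAngle) <= maxYaw
}

console.log(isLookingStraight({ yawAngle: 4 })) // true
console.log(isLookingStraight({ yawAngle: -18 })) // false
```

If you add a check like this, apply it only before the detections start (for example, while state.faceDetected is "no"). Otherwise it would block the head-turn actions, which intentionally go past 15 degrees.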

In the callback, we receive a result containing every detected face. This is the type signature for each face detection:

App.tsx
interface FaceDetection {
  rollAngle: number
  yawAngle: number
  smilingProbability: number
  leftEyeOpenProbability: number
  rightEyeOpenProbability: number
  bounds: {
    origin: {
      x: number
      y: number
    }
    size: {
      width: number
      height: number
    }
  }
}

To check condition #2, we need a function that determines whether one rectangle is within another. We can do that by checking whether all corners of the inside rectangle (the face) are within the outside one (the preview cutout).

Diagram showing rectangular outlines of views
contains.ts
export interface Rect {
  minX: number
  minY: number
  width: number
  height: number
}
interface Contains {
  outside: Rect
  inside: Rect
}
/**
 * @returns `true` if `outside` rectangle contains the `inside` rectangle.
 * */
export function contains({ outside, inside }: Contains) {
  const outsideMaxX = outside.minX + outside.width
  const insideMaxX = inside.minX + inside.width
  const outsideMaxY = outside.minY + outside.height
  const insideMaxY = inside.minY + inside.height
  if (inside.minX < outside.minX) {
    return false
  }
  if (insideMaxX > outsideMaxX) {
    return false
  }
  if (inside.minY < outside.minY) {
    return false
  }
  if (insideMaxY > outsideMaxY) {
    return false
  }
  return true
}

It's time to implement onFacesDetected. It will handle the detection criteria for now:

App.tsx
// Add new imports
import { Camera, FaceDetectionResult } from "expo-camera"
import { contains, Rect } from "./contains"
...
const onFacesDetected = (result: FaceDetectionResult) => {
  // 1. There is only a single face in the detection results.
  if (result.faces.length !== 1) {
    return
  }
  const face = result.faces[0]
  const faceRect: Rect = {
    minX: face.bounds.origin.x,
    minY: face.bounds.origin.y,
    width: face.bounds.size.width,
    height: face.bounds.size.height
  }
  // 2. The face is fully contained within the camera preview.
  const edgeOffset = 50
  const faceRectSmaller: Rect = {
    width: faceRect.width - edgeOffset,
    height: faceRect.height - edgeOffset,
    minY: faceRect.minY + edgeOffset / 2,
    minX: faceRect.minX + edgeOffset / 2
  }
  const previewContainsFace = contains({
    outside: PREVIEW_RECT,
    inside: faceRectSmaller
  })
  if (!previewContainsFace) {
    return
  }
  // 3. The face is not as big as the camera preview.
  const faceMaxSize = PREVIEW_SIZE - 90
  if (faceRect.width >= faceMaxSize && faceRect.height >= faceMaxSize) {
    return
  }
  // TODO: Process results at this point.
}

When checking whether the user's face is in the camera preview, you'll notice that we created a smaller rectangle, faceRectSmaller, inset by edgeOffset / 2 on each side. The reason is that the face detection rectangle we get is actually as big as the entire head:

Face detection rectangle - original
Face detection rectangle - better face fit
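The faceRectSmaller computation amounts to insetting a rectangle while keeping it centered, and it can be factored into a helper. In this sketch, insetRect is a hypothetical name, and the clamp to zero is an addition that the original code doesn't have.

```typescript
export interface Rect {
  minX: number
  minY: number
  width: number
  height: number
}

// Shrinks `rect` by `inset` points on each axis by taking inset / 2 from every side,
// so the result stays centered. Sizes are clamped at zero for boxes smaller than the inset.
export function insetRect(rect: Rect, inset: number): Rect {
  return {
    minX: rect.minX + inset / 2,
    minY: rect.minY + inset / 2,
    width: Math.max(0, rect.width - inset),
    height: Math.max(0, rect.height - inset)
  }
}

// A head-sized detection box shrunk by edgeOffset = 50.
const headRect: Rect = { minX: 100, minY: 120, width: 200, height: 260 }
console.log(insetRect(headRect, 50)) // { minX: 125, minY: 145, width: 150, height: 210 }
```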

Modeling State

Before we work on the rest of onFacesDetected, we need to come up with the possible states. Let's note down what the app should do. Describing the process in detail will help us come up with the state model:

  1. User opens the liveness detector screen. They should see a prompt of what to do here.
  2. If there is more than a single face in the preview, we don't proceed.
  3. If the user's face is not in the preview at all, we let them know.
  4. If the user's face is in the preview but it's too close, we let them know.
  5. We want to detect user actions. Those will be:
    • Blinking both eyes.
    • Turning head to the left.
    • Turning head to the right.
    • Nodding.
    • Smiling.
  6. If the processing conditions are met, we prompt the user to perform an action from the above list.
  7. As the user completes actions, the circular progress fills.
  8. If the user's face leaves the preview after processing starts, we reset the process.
  9. If the user completes all the required actions in sequence, they pass.

Alright. Let's put the above into code. We can start by defining the prompt text to be shown before the processing starts:

App.tsx
const instructionsText = {
  initialPrompt: "Position your face in the circle",
  performActions: "Keep the device still and perform the following actions:",
  tooClose: "You're too close. Hold the device further away."
}

We're also going to need a list of detections that must be performed. Each action has a threshold or a probability to compare against.

App.tsx
const detections = {
  BLINK: { instruction: "Blink both eyes", minProbability: 0.3 },
  TURN_HEAD_LEFT: { instruction: "Turn head left", maxAngle: -15 },
  TURN_HEAD_RIGHT: { instruction: "Turn head right", minAngle: 15 },
  NOD: { instruction: "Nod", minDiff: 1.5 },
  SMILE: { instruction: "Smile", minProbability: 0.7 }
}

These thresholds are determined through trial and error. If an action's threshold is met in the onFacesDetected callback, it means the user is performing that action.
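When tuning these thresholds, it can help to have the single-frame checks as a pure function that takes a face and reports whether the action was performed. The sketch below mirrors the switch statement used later in onFacesDetected. isActionPerformed is a hypothetical name. NOD is left out because it needs several frames of history.

```typescript
// Fields from the FaceDetection interface that single-frame actions need.
interface FaceSample {
  yawAngle: number
  smilingProbability: number
  leftEyeOpenProbability: number
  rightEyeOpenProbability: number
}

// Thresholds copied from the `detections` object (NOD omitted).
const thresholds = {
  BLINK: { minProbability: 0.3 },
  TURN_HEAD_LEFT: { maxAngle: -15 },
  TURN_HEAD_RIGHT: { minAngle: 15 },
  SMILE: { minProbability: 0.7 }
}

export type SingleFrameAction = keyof typeof thresholds

// Returns true if one detection frame meets the threshold for `action`.
export function isActionPerformed(action: SingleFrameAction, face: FaceSample): boolean {
  switch (action) {
    case "BLINK":
      // Both eyes must be (probably) closed in the same frame.
      return (
        face.leftEyeOpenProbability <= thresholds.BLINK.minProbability &&
        face.rightEyeOpenProbability <= thresholds.BLINK.minProbability
      )
    case "TURN_HEAD_LEFT":
      return face.yawAngle <= thresholds.TURN_HEAD_LEFT.maxAngle
    case "TURN_HEAD_RIGHT":
      return face.yawAngle >= thresholds.TURN_HEAD_RIGHT.minAngle
    case "SMILE":
      return face.smilingProbability >= thresholds.SMILE.minProbability
  }
}

const neutral: FaceSample = {
  yawAngle: 0,
  smilingProbability: 0.1,
  leftEyeOpenProbability: 0.95,
  rightEyeOpenProbability: 0.9
}
console.log(isActionPerformed("SMILE", neutral)) // false
console.log(isActionPerformed("SMILE", { ...neutral, smilingProbability: 0.85 })) // true
```

Logging real face samples through a function like this makes it easier to see how close a given user gets to each threshold.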

We can track the current action to perform by declaring the actions as an array, which is easy to index and iterate through:

App.tsx
type DetectionActions = keyof typeof detections
const detectionsList: DetectionActions[] = [
  "BLINK",
  "TURN_HEAD_LEFT",
  "TURN_HEAD_RIGHT",
  "NOD",
  "SMILE"
]

The final state model would be:

App.tsx
const initialState = {
  faceDetected: "no" as "yes" | "no",
  faceTooBig: "no" as "yes" | "no",
  detectionsList,
  currentDetectionIndex: 0,
  progressFill: 0,
  processComplete: false
}

Since we have several pieces of state changing together, it's best to use React.useReducer:

App.tsx
const [state, dispatch] = React.useReducer(detectionReducer, initialState)
App.tsx
interface Actions {
  FACE_DETECTED: "yes" | "no"
  FACE_TOO_BIG: "yes" | "no"
  NEXT_DETECTION: null
}
interface Action<T extends keyof Actions> {
  type: T
  payload: Actions[T]
}
type PossibleActions = {
  [K in keyof Actions]: Action<K>
}[keyof Actions]
const detectionReducer = (
  state: typeof initialState,
  action: PossibleActions
): typeof initialState => {
  switch (action.type) {
    case "FACE_DETECTED":
      if (action.payload === "yes") {
        return {
          ...state,
          faceDetected: action.payload,
          progressFill: 100 / (state.detectionsList.length + 1)
        }
      } else {
        // Reset
        return initialState
      }
    case "FACE_TOO_BIG":
      return { ...state, faceTooBig: action.payload }
    case "NEXT_DETECTION": {
      // Next detection index
      const nextDetectionIndex = state.currentDetectionIndex + 1
      // Skip 0 index
      const progressMultiplier = nextDetectionIndex + 1
      const newProgressFill =
        (100 / (state.detectionsList.length + 1)) * progressMultiplier
      if (nextDetectionIndex === state.detectionsList.length) {
        // Passed
        return { ...state, processComplete: true, progressFill: newProgressFill }
      }
      // Next detection
      return {
        ...state,
        currentDetectionIndex: nextDetectionIndex,
        progressFill: newProgressFill
      }
    }
    default:
      throw new Error("Unexpected action type.")
  }
}

We calculate the progress fill based on the number of successfully completed detections. We also count the user placing their face in the preview correctly as a successful detection, which increases the progress fill (good job 🌟).

Once the user has gone through all detections in detectionsList the process will complete.
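The progress arithmetic in the reducer reduces to one formula. With n detections there are n + 1 steps: the face placement plus one step per detection. Each completed step adds 100 / (n + 1) percent. progressFill below is a hypothetical standalone version of what the reducer computes inline.

```typescript
// Percentage of the ring to fill after `completedSteps` steps,
// where placing the face in the preview counts as the first step.
export function progressFill(completedSteps: number, detectionCount: number): number {
  const stepSize = 100 / (detectionCount + 1)
  return stepSize * completedSteps
}

// With the five detections in detectionsList, each step fills one sixth of the ring.
for (let steps = 1; steps <= 6; steps++) {
  console.log(steps, progressFill(steps, 5).toFixed(2))
}
// 1 16.67
// 2 33.33
// 3 50.00
// 4 66.67
// 5 83.33
// 6 100.00
```

In the reducer, FACE_DETECTED corresponds to one completed step. NEXT_DETECTION from currentDetectionIndex i corresponds to i + 2 steps, which is the progressMultiplier. The last detection therefore lands exactly on 100.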

Our views remain static. The next step is to make them respond to state changes:

App.tsx
<SafeAreaView style={StyleSheet.absoluteFill}>
  <MaskedView
    style={StyleSheet.absoluteFill}
    maskElement={<View style={styles.mask} />}
  >
    <Camera
      style={StyleSheet.absoluteFill}
      type={Camera.Constants.Type.front}
      onFacesDetected={onFacesDetected}
      faceDetectorSettings={{
        mode: FaceDetector.Constants.Mode.fast,
        detectLandmarks: FaceDetector.Constants.Landmarks.none,
        runClassifications: FaceDetector.Constants.Classifications.all,
        minDetectionInterval: 125,
        tracking: false
      }}
    >
      <AnimatedCircularProgress
        style={styles.circularProgress}
        size={PREVIEW_SIZE}
        width={5}
        backgroundWidth={7}
        fill={state.progressFill}
        tintColor="#3485FF"
        backgroundColor="#e8e8e8"
      />
    </Camera>
  </MaskedView>
  <View style={styles.instructionsContainer}>
    <Text style={styles.instructions}>
      {state.faceDetected === "no" &&
        state.faceTooBig === "no" &&
        instructionsText.initialPrompt}
      {state.faceTooBig === "yes" && instructionsText.tooClose}
      {state.faceDetected === "yes" &&
        state.faceTooBig === "no" &&
        instructionsText.performActions}
    </Text>
    <Text style={styles.action}>
      {state.faceDetected === "yes" &&
        state.faceTooBig === "no" &&
        detections[state.detectionsList[state.currentDetectionIndex]]
          .instruction}
    </Text>
  </View>
</SafeAreaView>

Let's come back to onFacesDetected. We need to update it so that it dispatches actions that update the state:

App.tsx
const onFacesDetected = (result: FaceDetectionResult) => {
  // 1. There is only a single face in the detection results.
  if (result.faces.length !== 1) {
    dispatch({ type: "FACE_DETECTED", payload: "no" })
    return
  }
  const face = result.faces[0]
  const faceRect: Rect = {
    minX: face.bounds.origin.x,
    minY: face.bounds.origin.y,
    width: face.bounds.size.width,
    height: face.bounds.size.height
  }
  // 2. The face is fully contained within the camera preview.
  // Shrink the head-sized rectangle by edgeOffset / 2 on every side so it stays centered.
  const edgeOffset = 50
  const faceRectSmaller: Rect = {
    width: faceRect.width - edgeOffset,
    height: faceRect.height - edgeOffset,
    minY: faceRect.minY + edgeOffset / 2,
    minX: faceRect.minX + edgeOffset / 2
  }
  const previewContainsFace = contains({
    outside: PREVIEW_RECT,
    inside: faceRectSmaller
  })
  if (!previewContainsFace) {
    dispatch({ type: "FACE_DETECTED", payload: "no" })
    return
  }
  if (state.faceDetected === "no") {
    // 3. The face is not as big as the camera preview.
    const faceMaxSize = PREVIEW_SIZE - 90
    if (faceRect.width >= faceMaxSize && faceRect.height >= faceMaxSize) {
      dispatch({ type: "FACE_TOO_BIG", payload: "yes" })
      return
    }
    if (state.faceTooBig === "yes") {
      dispatch({ type: "FACE_TOO_BIG", payload: "no" })
    }
  }
  if (state.faceDetected === "no") {
    dispatch({ type: "FACE_DETECTED", payload: "yes" })
  }
  // TODO: Next section
}

Once the user has their face in the preview they should now see a prompt to perform the first action!

Initial action prompt

Processing Gestures

With the state model done, we will now look into gesture processing. Since we know all the detection actions, a switch statement is ideal. We match the current action and check the corresponding threshold. If the threshold is met, the user passes and moves on to the next detection.

App.tsx
// onFacesDetected continued.
const detectionAction = state.detectionsList[state.currentDetectionIndex]
switch (detectionAction) {
  case "BLINK": {
    // A lower probability means the eye is closed
    const leftEyeClosed =
      face.leftEyeOpenProbability <= detections.BLINK.minProbability
    const rightEyeClosed =
      face.rightEyeOpenProbability <= detections.BLINK.minProbability
    if (leftEyeClosed && rightEyeClosed) {
      dispatch({ type: "NEXT_DETECTION", payload: null })
    }
    return
  }
  case "NOD":
    // TODO: We will implement this next.
    // Return for now so this case doesn't fall through to TURN_HEAD_LEFT.
    return
  case "TURN_HEAD_LEFT":
    // A negative yaw angle means the face is turned left
    if (face.yawAngle <= detections.TURN_HEAD_LEFT.maxAngle) {
      dispatch({ type: "NEXT_DETECTION", payload: null })
    }
    return
  case "TURN_HEAD_RIGHT":
    // A positive yaw angle means the face is turned right
    if (face.yawAngle >= detections.TURN_HEAD_RIGHT.minAngle) {
      dispatch({ type: "NEXT_DETECTION", payload: null })
    }
    return
  case "SMILE":
    // A higher probability means the user is smiling
    if (face.smilingProbability >= detections.SMILE.minProbability) {
      dispatch({ type: "NEXT_DETECTION", payload: null })
    }
    return
}

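One tricky part here is the nodding gesture. People don't normally hold their phones perfectly level with their heads like robots. To mitigate this, we need to normalize roll angle values: we track the last few values with React.useRef and use their average as the current baseline angle. Keep in mind that rollAngle measures head tilt within the image plane (toward a shoulder) rather than up-and-down pitch. Since the FaceDetection results above don't include a pitch angle, this check approximates a nod by looking for a sudden change in roll.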

App.tsx
const rollAngles = React.useRef<number[]>([])
...
case "NOD": {
  // Collect roll angle data in the ref
  rollAngles.current.push(face.rollAngle)
  // Don't keep more than 10 roll angles (10 detection frames)
  if (rollAngles.current.length > 10) {
    rollAngles.current.shift()
  }
  // If there isn't enough roll angle data yet, don't process
  if (rollAngles.current.length < 10) return
  // Average the collected data, excluding the current angle
  const rollAnglesExceptCurrent = rollAngles.current.slice(0, -1)
  // Summation
  const rollAnglesSum = rollAnglesExceptCurrent.reduce((prev, curr) => {
    return prev + Math.abs(curr)
  }, 0)
  // Average
  const avgAngle = rollAnglesSum / rollAnglesExceptCurrent.length
  // If the difference between the current angle and the average is above the threshold, pass.
  const diff = Math.abs(avgAngle - Math.abs(face.rollAngle))
  if (diff >= detections.NOD.minDiff) {
    dispatch({ type: "NEXT_DETECTION", payload: null })
  }
  return
}
...

Note that this adds a delay of roughly minDetectionInterval * 10 (over a second with the 125 ms interval used here) before nod detection can work. An alternative implementation would be preferred in a real app.
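One way to make this logic easier to test, and easier to swap for a better implementation later, is to move it out of the component. The class below is a refactor sketch of the same rolling-average idea, not the original code. NodDetector is a hypothetical name. It keeps the 10-sample window and the minDiff of 1.5, so it has the same start-up delay.

```typescript
export class NodDetector {
  private angles: number[] = []
  private readonly windowSize: number
  private readonly minDiff: number

  // Defaults match the article: a 10-frame window and detections.NOD.minDiff = 1.5.
  constructor(windowSize = 10, minDiff = 1.5) {
    this.windowSize = windowSize
    this.minDiff = minDiff
  }

  // Feed one roll angle per detection frame. Returns true when the latest angle differs
  // from the average absolute value of the previous (windowSize - 1) angles by at least minDiff.
  push(rollAngle: number): boolean {
    this.angles.push(rollAngle)
    if (this.angles.length > this.windowSize) {
      this.angles.shift()
    }
    if (this.angles.length < this.windowSize) {
      return false // not enough history yet
    }
    const previous = this.angles.slice(0, -1)
    const baseline = previous.reduce((sum, angle) => sum + Math.abs(angle), 0) / previous.length
    return Math.abs(baseline - Math.abs(rollAngle)) >= this.minDiff
  }

  // Clear the history, e.g. whenever the process resets.
  reset(): void {
    this.angles = []
  }
}

const detector = new NodDetector()
let anyDetected = false
for (let i = 0; i < 10; i++) {
  anyDetected = detector.push(2) || anyDetected // a steady 2-degree tilt is not a nod
}
console.log(anyDetected) // false
console.log(detector.push(6)) // true: 6 is 4 away from the baseline of 2
```

Calling reset() when the face leaves the preview ensures that stale angles from a previous attempt don't skew the baseline. The inline version above never clears its ref.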

To track process completion, we can use an effect (😀). Since the circular progress animation has a default duration of 500 ms, we should wait for the last animation to finish before handling completion (e.g. navigating away to another screen).

App.tsx
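React.useEffect(() => {
  if (!state.processComplete) {
    return
  }
  const timeout = setTimeout(() => {
    // It's very important that the user feels fulfilled by
    // witnessing the progress fill up to 100%.
    // Handle completion here, e.g. navigate to the next screen.
  }, 500)
  // Cancel the pending callback if the component unmounts first.
  return () => clearTimeout(timeout)
}, [state.processComplete])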

You made it. The final source code is available here.

App Discussion

While there are many improvements that could be made, the main thing I would like to talk about is the robustness of the detector.

While this app is more difficult to spoof than a typical CAPTCHA, its processing pipeline is missing two important components:

  • Distinguishing 3D from 2D images (a regular image vs. an image of another image).
  • Image manipulation detection, e.g. deepfakes.

Exploiting the first weakness:

With a high-resolution screen, the process can be spoofed. In the video above, I'm recording myself from a different device and playing the recording on a screen in real time.

Being able to complete the process through a screen is a big no-no. This is a case of detecting a face in an image of another image: the screen shows the recording device's footage, and the app's camera captures the screen. Since this succeeded, it wouldn't be hard to spoof with a deepfake as well.

The implementation we have here is only a single method of liveness detection: movement. There are detectors out there that address these issues but in turn suffer from others. To distinguish 2D from 3D and detect manipulated images, some detectors train a model on a dataset of authentic and spoofed images. Others analyze image features to infer properties such as depth.

Captcha meme

Each approach to face-based liveness detection has its own pros and cons. For example, a model trained to detect spoofed images may require little user collaboration but the model is only as good as its training set. To reduce edge cases, combining techniques is the way to go.

Imagine if we enhanced this detector with the missing features. They could work in the background without any extra input from the user. This would, however, require novel work on our part rather than relying entirely on a single library. Maybe in a future tutorial I could explore that angle, as I've never had a better excuse to try TensorFlow.js for React Native.

Conclusion

While the pandemic makes the benefits of going fully digital very apparent, onboarding new users online introduces the risk of digital identity fraud. To mitigate this risk, a digital identity verification procedure must be implemented. Liveness detection is typically part of that procedure. However, while text and image CAPTCHAs are typically used there, face-based liveness detection is a better alternative. Face-based liveness detection makes it harder for someone to use your photo when registering. In spite of that, measures must be taken to ensure that the approach implemented is adequately robust. This can be achieved by combining several liveness detectors into one, as a different approach may work where another fails.


I hope you found this post useful and informative. Have a great day.
