Skip to main content

Frame Processors

What are frame processors?

Frame Processors are JavaScript functions that are called for each Frame the Camera "sees".

Inside those functions you can analyze the Frame in realtime using native Frame Processor Plugins or draw directly onto the Frame using Skia Frame Processors.

For example, the vision-camera-image-labeler plugin can detect objects at 60+ FPS:

const frameProcessor = useFrameProcessor((frame) => {
'worklet'
const objects = detectObjects(frame)
const label = objects[0].name
console.log(`You're looking at a ${label}.`)
}, [])

return <Camera frameProcessor={frameProcessor} />

Due to their extensible plugin-based nature, Frame Processors can be used for all sorts of things, like:

  • ML for face recognition
  • Using Tensorflow/TFLite, MLKit Vision, Apple Vision or other libraries
  • Creating realtime video-chats using WebRTC to directly send the camera frames over the network
  • Creating scanners for custom codes such as Snapchat's SnapCodes or Apple's AppClips
  • Creating snapchat-like filters, e.g. draw a dog-mask filter over the user's face
  • Creating color filters with depth-detection
  • Drawing boxes, text, overlays, or colors on the screen in realtime
  • Rendering filters and shaders such as Blur, inverted colors, beauty filter, or more on the screen

Installation

Frame Processors require react-native-worklets-core 1.0.0 or higher. Install it:

npm i react-native-worklets-core

And add the plugin to your babel.config.js:

module.exports = {
plugins: [
['react-native-worklets-core/plugin'],
],
}

The Frame

A Camera Frame is a GPU-based pixel buffer, usually in YUV or RGB format. The Frame object contains these GPU-based pixel buffers, and exposes them to JavaScript. For example, to log information about the Frame such as it's resolution or pixel format, simply access it's properties:

const frameProcessor = useFrameProcessor((frame) => {
'worklet'
console.log(`Frame: ${frame.width}x${frame.height} (${frame.pixelFormat})`)
}, [])

To directly access the Frame's pixel data use toArrayBuffer():

const frameProcessor = useFrameProcessor((frame) => {
'worklet'
if (frame.pixelFormat === 'rgb') {
const buffer = frame.toArrayBuffer()
const data = new Uint8Array(buffer)
console.log(`Pixel at 0,0: RGB(${data[0]}, ${data[1]}, ${data[2]})`)
}
}, [])

While you can process the Frame's pixel data in JavaScript, it is recommended to use native Frame Processor Plugins instead for better performance and GPU-acceleration.

tip

At 4k resolution, a raw Frame is roughly 12MB in size, so if your Camera is running at 60 FPS, roughly 700MB are flowing through your Frame Processor per second.

Such amounts of data cannot be copied or serialized fast enough, so VisionCamera uses JSI to directly expose the GPU-based buffers from C++ to JavaScript.

Native Frame Processor Plugins

Usually JavaScript is not fast enough to process heavy amounts of data in realtime as it does not have access to GPU-acceleration directly. Instead, you should use native languages (Objective-C/Swift, Java/Kotlin or C++) to write the actual heavy processing code, such as face detection or object detection.

VisionCamera provides a plugin system to directly and synchronously call into native languages, run any processing on the Frame, and return back to JavaScript with almost no noticeable overhead at all.

This plugin-based approach ensures maximum flexibility while still guaranteeing maximum performance. For example, the sensitivity value can be tweaked on-the-fly via fast-refresh in development, or shipped with over-the-air updates in production - without a native rebuild needed.

const sensitivity = 0.8
const frameProcessor = useFrameProcessor((frame) => {
'worklet'
const faces = detectFaces(frame, { sensitivity: sensitivity })
// ...
}, [sensitivity])

Using Community Plugins

Community Frame Processor Plugins are distributed through npm. To install the vision-camera-image-labeler plugin, run:

npm i vision-camera-image-labeler
cd ios && pod install

That's it! 🎉 Now you can use it:

const { labelImage } = useImageLabeler()

const frameProcessor = useFrameProcessor((frame) => {
'worklet'
const labels = labelImage(frame)
const label = labels[0].name
console.log(`You're looking at a ${label}.`)
}, [labelImage])

Check out Frame Processor community plugins to discover available community plugins.

Creating native Frame Processor Plugins

VisionCamera provides an easy-to-use API for creating native Frame Processor Plugins, which you can use to either wrap existing algorithms (e.g. the iOS/Android "MLKit Face Detection" API), or to build your own custom algorithms.

A Frame Processor Plugin is a single native class which contains an initializer and a callback function that receives the native Frame (CMSampleBuffer or Image):

FaceDetectorPlugin.swift
@objc(FaceDetector)
public class FaceDetectorPlugin: FrameProcessorPlugin {
private let sensitivity: Double

public override init(proxy: VisionCameraProxyHolder, options: [AnyHashable: Any]) {
super.init(proxy: proxy, options: options)
sensitivity = options["sensitivity"] as Double
}

public override func callback(_ frame: Frame, withArguments args: [AnyHashable: Any]) -> Any {
let imageBuffer = CMSampleBufferGetImageBuffer(frame.buffer)
let faces = MLKit.detectFaces(imageBuffer, sensitivity)
return faces.map { face in face.toJson() }
}
}

The native plugin can accept parameters (e.g. for configuration) and return any kind of values for result, which are automatically converted to- and from JavaScript values.

🚀 See: "Creating Frame Processor Plugins" for more information.

Performance

The Frame Processor runtime is built with C++ and bridged to JavaScript with low-overhead using JSI.

Considerations

As Frame Processors run synchronously in the video pipeline, you need to complete your processing before the next Frame arrives. At 30 FPS you have 33ms, at 60 FPS you only have 16ms per frame.

tip

The lower the video resolution of your format, the faster your Frame Processor can execute. For example, if your ML model is trained on 1280x720, use the format closest to that resolution:

const format = useCameraFormat(device, [
{ videoResolution: { width: 1280, height: 720 } }
])

Benchmarks

Frame Processors are really fast. I have used MLKit Vision Image Labeling to label 4k Camera frames in realtime, and measured the following results:

  • Fully natively (written in pure Objective-C, no React interaction at all), I have measured an average of 68ms per call.
  • As a Frame Processor Plugin (written in Objective-C, called through a JS Frame Processor function), I have measured an average of 69ms per call.

This means that the Frame Processor API only takes ~1ms longer than a fully native implementation, making it the fastest and easiest way to run any sort of Frame Processing in React Native.

Disabling Frame Processors

The Frame Processor API spawns a secondary JavaScript Runtime which consumes a small amount of extra CPU and RAM. Additionally, compile time increases since Frame Processors are written in native C++. If you're not using Frame Processors at all, you can disable them:

Android

Inside your gradle.properties file, add the enableFrameProcessors flag and set it to false:

VisionCamera_enableFrameProcessors=false

Then, clean and rebuild your project.

iOS

Inside your Podfile, add the VCEnableFrameProcessors flag and set it to false:

$VCEnableFrameProcessors = false
info

When react-native-worklets-core is not installed, Frame Processors are automatically disabled.


🚀 Next section: Zooming (or creating a Frame Processor Plugin)