Engineering Real-Time Animated Profile Pictures
This article was initially published as a thread of posts on X. For the sake of readability and long-term availability, it is now available here as a single-page article.
Post 1
In the amo apps, profile pictures work a bit differently. You record “Your Face”: a transparent, animated, boomerang-like cutout. It feels more personal. You see your friends come to life on-screen, not through overly curated or filtered camera-roll photos.
From an engineering perspective, building “Your Face” was an exciting challenge. Here’s what we needed to achieve.
A thread 🧵
Post 2
Here were the general product requirements:
- Transparent cutouts
- Back-and-forth playback
- Instant on-screen display
- Real-time playback
- Support for 20+ cutouts on-screen simultaneously
And of course, it all had to be done fast 😅
Post 3
“Your Face” has two key phases: recording and playback. Let’s start with how we handle data transfer.
To reduce bandwidth and leverage hardware decoders (availability is highly device-dependent), we used H.265 (HEVC) in an MP4 container. The cutouts are 256x320px, but the video is actually double that height: 256x640px.
Since videos with transparency aren’t widely supported, we created a custom solution. We merged two “planes” into one video: the original video on top and the mask video below.
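As an illustration, here is a minimal Core Image sketch of that composition step, assuming the camera frame and its mask are already available as 256x320 `CIImage`s. The names and the Core Image approach are our assumptions, not necessarily amo’s actual code:

```swift
import CoreImage

// Stack the camera frame and its mask into one 256x640 plane.
// Core Image's origin is bottom-left, so "on top" means a higher y offset.
func stackPlanes(frame: CIImage, mask: CIImage) -> CIImage {
    // Shift the camera frame up so it occupies y in [320, 640).
    let top = frame.transformed(by: CGAffineTransform(translationX: 0, y: 320))
    // The mask stays at y in [0, 320); compositing yields the full 256x640 frame.
    return top.composited(over: mask)
}
```

The stacked frames can then be fed to a hardware HEVC encoder, for example through AVAssetWriter with AVVideoCodecType.hevc.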
Post 4
Recording involved several steps. First, we captured the video, running real-time face detection to ensure it was actually a face. Next, we performed selfie segmentation to compute an accurate contour of the face. Finally, we synchronized and merged the original and mask videos into one. We encoded the video with H.265 before sending it to our servers, to meet transport requirements.
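A sketch of that per-frame analysis, using Apple’s Vision framework as one plausible implementation (the post doesn’t say which detector or segmenter amo uses, so treat the API choice as an assumption):

```swift
import Vision
import CoreVideo

// Per-frame analysis: detect a face, then compute a person-segmentation mask.
// Returns the soft mask only when at least one face is present.
func analyze(pixelBuffer: CVPixelBuffer) throws -> CVPixelBuffer? {
    let faceRequest = VNDetectFaceRectanglesRequest()
    let segmentationRequest = VNGeneratePersonSegmentationRequest()
    segmentationRequest.qualityLevel = .balanced  // trade accuracy for real-time speed

    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    try handler.perform([faceRequest, segmentationRequest])

    // Gate on face detection: no face, no cutout.
    guard let faces = faceRequest.results, !faces.isEmpty else { return nil }
    // The mask is a single-channel buffer outlining the person's contour.
    return segmentationRequest.results?.first?.pixelBuffer
}
```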
Post 5
Playback of “Your Face” was more complex. The main challenge came from requirements like back-and-forth and real-time playback. Video decoders are built to decode forward, not backward, and decoding is computationally expensive: most hardware can’t handle decoding two videos in parallel, let alone the 20+ we needed on-screen.
We designed a queue-based system to manage this. For each request, we:
- Decode each video frame
- Apply the lower half of the frame as a mask to the upper half
- Store each frame as a bitmap in memory
- Use an infinite loop to “rotate” between these bitmaps
We prioritized rendering performance over memory use.
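Here is a condensed sketch of that pipeline in Swift, under the assumptions above (AVFoundation decoding, Core Image masking, pre-rendered CGImage frames). The class and method names are illustrative:

```swift
import AVFoundation
import CoreImage

final class YourFacePlayer {
    private(set) var frames: [CGImage] = []
    private let context = CIContext()

    // Decode every frame once, masking as we go, so playback never
    // touches the decoder again.
    func load(asset: AVAsset) throws {
        guard let track = asset.tracks(withMediaType: .video).first else { return }
        let reader = try AVAssetReader(asset: asset)
        let output = AVAssetReaderTrackOutput(
            track: track,
            outputSettings: [kCVPixelBufferPixelFormatTypeKey as String:
                             kCVPixelFormatType_32BGRA])
        reader.add(output)
        reader.startReading()

        while let sample = output.copyNextSampleBuffer(),
              let buffer = CMSampleBufferGetImageBuffer(sample) {
            let full = CIImage(cvPixelBuffer: buffer)          // 256x640
            let half = full.extent.height / 2
            // Upper half: camera frame, shifted back down to y in [0, 320).
            let video = full
                .cropped(to: CGRect(x: 0, y: half, width: full.extent.width, height: half))
                .transformed(by: CGAffineTransform(translationX: 0, y: -half))
            // Lower half: the mask plane.
            let mask = full
                .cropped(to: CGRect(x: 0, y: 0, width: full.extent.width, height: half))
            // Punch out the background: white mask pixels keep the video,
            // black pixels fall through to a transparent background.
            let cutout = video.applyingFilter("CIBlendWithMask", parameters: [
                kCIInputMaskImageKey: mask,
                kCIInputBackgroundImageKey: CIImage(color: .clear).cropped(to: video.extent),
            ])
            if let cg = context.createCGImage(cutout, from: video.extent) {
                frames.append(cg)  // trade memory for render speed
            }
        }
    }

    // Ping-pong index for back-and-forth playback:
    // 0, 1, ..., n-1, n-2, ..., 1, 0, 1, ...
    func frame(at tick: Int) -> CGImage? {
        guard frames.count > 1 else { return frames.first }
        let period = 2 * (frames.count - 1)
        let i = tick % period
        return frames[i < frames.count ? i : period - i]
    }
}
```

Pre-rendering every frame is what makes backward playback essentially free: the decoder only ever runs forward, once, and a display-timer (such as a CADisplayLink) just walks the ping-pong index over in-memory bitmaps.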
Post 6
These systems have been running smoothly for over a year on both Android and iOS. We have plenty of ideas for improvement: optimizing decoding time, reducing the memory footprint, adding disk caching, better leveraging depth sensors, and using adaptive scalable texture compression (ASTC).
In the end, “Your Face” is a great example of how engineering turns constraints into creativity and solutions into innovation.
Thanks to Iggy for serving as a guinea pig 🐹