Mosh: Using AI to Predict 3D Human Pose & Shape from a Single Image
February 15, 2017
Introducing Mosh, a Body Labs mobile photo experiment empowering new forms of creative expression. Mosh is the first app to use artificial intelligence to instantly detect your pose in a photo, and it offers a set of 3D filters that interact with and react to you. This gives you the power to add stunning new 3D effects, seamlessly drape animated outfits on yourself, and create amazing 3D environments that respond to you. With Mosh we want to empower users to create something truly magical. And we’re just getting started.
Create visual magic with Mosh
Add stunning 3D effects
Mosh knows where your head, hands, and feet are in a photo. This lets you automatically add 3D effects like summoning fire, transforming friends into mythical sea creatures, and more. You can also use gestures to control each effect while you record.
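Anchoring effects to detected body parts and reading a simple gesture can be sketched as follows. This is a minimal illustration, not Mosh's actual gesture logic: it assumes a hypothetical keypoint dictionary keyed by part name with (x, y) pixel coordinates, and the "hands raised" rule and the names `hands_raised` and `effect_positions` are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical keypoint format: part name -> (x, y) in pixels, with y growing downward.
Point = Tuple[float, float]


def hands_raised(kp: Dict[str, Point]) -> bool:
    """Assumed gesture rule: both hands are above the head (smaller y means higher)."""
    head_y = kp["head"][1]
    return kp["left_hand"][1] < head_y and kp["right_hand"][1] < head_y


def effect_positions(kp: Dict[str, Point], parts: List[str], lift: float = 20.0) -> List[Point]:
    """Anchor one effect sprite `lift` pixels above each requested body part that was detected."""
    return [(kp[p][0], kp[p][1] - lift) for p in parts if p in kp]


if __name__ == "__main__":
    keypoints = {"head": (100.0, 50.0), "left_hand": (60.0, 30.0), "right_hand": (140.0, 40.0)}
    print(hands_raised(keypoints))
    print(effect_positions(keypoints, ["left_hand", "right_hand"]))
```

Keying effects by body part rather than by fixed screen position is what lets an effect follow the person wherever they stand in the frame.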
Seamlessly try on animated outfits
Use Mosh’s character creator to swipe through animated outfits that conform to you or your friends. Mix and match different styles or character features to make each one your own. Each outfit also comes with live animations to bring it to life.
3D environments that respond to you
Mosh can recognize your 3D shape to unlock environments and objects that react to you. This lets you enhance your photos with realistic interactions that respond to your pose and shape.
Paint or write freely in the background
Mosh automatically separates your friends from the background, so you can paint or write freely in front of or behind them. Quickly toggle between the background and foreground to customize your masterpiece.
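Painting "in front of" or "behind" a person comes down to layer order once a person mask is available. The sketch below assumes a hypothetical soft person mask in [0, 1] (1 where the subject is) and uses standard "over" alpha compositing with straight alpha; it illustrates the idea and is not Mosh's rendering code.

```python
import numpy as np


def over(top_rgb: np.ndarray, top_alpha: np.ndarray, bottom_rgb: np.ndarray) -> np.ndarray:
    """'Over' compositing of a straight-alpha layer onto an opaque image.

    top_rgb, bottom_rgb: (H, W, 3) floats in [0, 1]; top_alpha: (H, W) in [0, 1].
    """
    a = top_alpha[..., None]  # broadcast alpha across the color channels
    return a * top_rgb + (1.0 - a) * bottom_rgb


def composite(photo, person_mask, paint_rgb, paint_alpha, paint_in_front: bool) -> np.ndarray:
    """Place paint strokes in front of or behind the subject, using an assumed person mask."""
    painted = over(paint_rgb, paint_alpha, photo)
    if paint_in_front:
        return painted  # strokes cover everything, including the person
    # Strokes go behind: paste the original subject back on top using the mask.
    return over(photo, person_mask, painted)
```

Toggling between foreground and background painting is then just a matter of flipping `paint_in_front`; the stored strokes and the mask do not change.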
How the magic is made
To make Mosh possible, we combined deep learning with Body Labs’ generative 3D body models. We start by predicting the locations of major joints, toes, and facial features. From this foundation, we predict 3D landmark locations and fit our generative 3D body model, SMPL (Skinned Multi-Person Linear Model), to them. We do this by estimating the model’s 3D pose and shape parameters from the 3D landmark locations as well as from each landmark’s projected location in the original 2D image.
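The fitting step can be illustrated with a small least-squares problem. This is a toy stand-in under stated assumptions, not Body Labs’ method: a hypothetical linear shape model (template plus blend-shape basis) replaces SMPL, a single global rotation about the vertical axis plus a translation replaces articulated pose, and a pinhole camera with an assumed focal length and principal point produces the 2D projections. The two terms described above, 3D landmark error and 2D reprojection error, are stacked and minimized with Gauss-Newton using a finite-difference Jacobian.

```python
import numpy as np

# Toy stand-in for SMPL (hypothetical): landmarks = R_y(theta) @ (template + basis @ beta) + t.


def rotation_y(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def model_landmarks(params, template, basis):
    """params = [theta, tx, ty, tz, beta_1..beta_B]; template (K, 3); basis (K, 3, B)."""
    theta, t, beta = params[0], params[1:4], params[4:]
    shaped = template + basis @ beta         # linear shape blend -> (K, 3)
    return shaped @ rotation_y(theta).T + t  # rotate each landmark row, then translate


def project(points, focal, center):
    """Pinhole projection of (K, 3) camera-space points to (K, 2) pixel coordinates."""
    return focal * points[:, :2] / points[:, 2:3] + center


def residuals(params, template, basis, target_3d, target_2d, focal, center, w2d):
    pts = model_landmarks(params, template, basis)
    r3d = (pts - target_3d).ravel()                                # 3D landmark term
    r2d = w2d * (project(pts, focal, center) - target_2d).ravel()  # 2D reprojection term
    return np.concatenate([r3d, r2d])


def numeric_jacobian(fun, x, eps=1e-6):
    f0 = fun(x)
    J = np.empty((f0.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (fun(x + dx) - fun(x - dx)) / (2.0 * eps)  # central difference
    return J


def fit(template, basis, target_3d, target_2d, focal, center, w2d=0.01, iters=50):
    """Gauss-Newton on the stacked 3D + 2D residuals."""
    params = np.zeros(4 + basis.shape[2])
    params[1:4] = target_3d.mean(axis=0) - template.mean(axis=0)  # initial translation guess
    fun = lambda p: residuals(p, template, basis, target_3d, target_2d, focal, center, w2d)
    for _ in range(iters):
        r = fun(params)
        step = np.linalg.lstsq(numeric_jacobian(fun, params), -r, rcond=None)[0]
        params = params + step
        if np.linalg.norm(step) < 1e-12:
            break
    return params


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, B = 12, 3
    template = rng.uniform(-1.0, 1.0, size=(K, 3))
    basis = 0.1 * rng.standard_normal((K, 3, B))
    true = np.array([0.3, 0.1, -0.2, 5.0, 0.5, -0.3, 0.2])
    center = np.array([320.0, 240.0])
    target_3d = model_landmarks(true, template, basis)
    target_2d = project(target_3d, 500.0, center)
    est = fit(template, basis, target_3d, target_2d, 500.0, center)
    print(np.abs(est - true).max())
```

With noise-free synthetic landmarks the fit recovers the generating parameters; on real predictions the weight `w2d` (an assumed value here) trades off trust in the 3D landmarks against agreement with the image.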
Our CNN (convolutional neural network) runs on a backend server and returns the predicted 3D pose and shape, aligned over the subject in the photo. From there, the Mosh app renders 3D graphics locally on the iPhone, so the user can swipe through a series of filters efficiently without further calls to the server. The server is only required for the 3D pose estimation of the subject.
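The split described above, one server round trip per photo and local rendering for every filter, can be sketched as a small client-side cache. Everything here is hypothetical: the JSON payload, the class names `BodyEstimate` and `FilterSession`, and the injected `estimate_remote` callable stand in for whatever Mosh actually uses. SMPL-style sizes (72 pose values, i.e. 24 joints times 3 axis-angle components, and 10 shape coefficients) appear only for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List


@dataclass
class BodyEstimate:
    """Hypothetical server payload: SMPL-style pose and shape parameters."""
    pose: List[float]
    shape: List[float]

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(text: str) -> "BodyEstimate":
        return BodyEstimate(**json.loads(text))


class FilterSession:
    """Client side: estimate once per photo on the server, then render every filter locally."""

    def __init__(self, estimate_remote: Callable[[bytes], str]):
        self._estimate_remote = estimate_remote  # assumed network call returning JSON
        self._cache: Dict[bytes, BodyEstimate] = {}

    def estimate(self, photo: bytes) -> BodyEstimate:
        if photo not in self._cache:  # only the first request for a photo hits the server
            self._cache[photo] = BodyEstimate.from_json(self._estimate_remote(photo))
        return self._cache[photo]

    def apply_filter(self, photo: bytes, filter_name: str) -> str:
        body = self.estimate(photo)
        # Placeholder for on-device 3D rendering driven by the cached pose and shape.
        return f"{filter_name}: rendered with {len(body.pose)} pose and {len(body.shape)} shape values"
```

Swiping through several filters on the same photo reuses the cached estimate, so the server sees a single request.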
We see 3D body shape and pose estimation from conventional images and commodity RGB sensors as an emerging technology. Detecting, tracking, and augmenting faces is already common in many mobile applications. Bodies are much harder: they are highly articulated, occlude themselves, vary widely in shape, and can be obscured by clothing.
By combining deep learning with our 3D body models we address these issues and create a platform for all sorts of amazing interactions, controls, and novel 3D content. We see Mosh as just a first step towards mobile applications that augment the body for fun, games, communication, shopping, and a whole lot more.