Larian’s Swen Vincke Talks Costs of Voiced Dialogue and Motion Capture

Larian Studios’ Swen Vincke has penned a follow-up to last year’s blog post on the costs of voice-acted dialogue. As it turns out, Larian has decided to go with facial motion capture, and the studio even has a new video showcasing the brand-new system it bought for this kind of work:

And an excerpt from the blog post:

As you can see, we bought a facial capture system. I guess it’s obvious from the video that some in our team were quite excited about this momentous event.
Our plan is to put the cameras you see in the video in the voice recording booth, and then use the captured data to put emotion into the faces of our 3D protagonists and antagonists.
This facial capture data will then be overlaid on top of motion-captured body animations (we also have a motion capture system in the office), and the end result should be believable dialogue when talking to any of the characters in Dragon Commander.
At least that’s the plan.
The decision to do it this way came after checking plenty of other solutions, ranging from trying to set something up ourselves with Kinect devices (cheapest) to hiring studios that do simultaneous body and facial capture (most expensive).
The latter quoted prices in the range of $1,000 to $2,000 per minute, which would have cost us between US$0.5M and US$1M. I actually contemplated this for some time, but then decided against it. I figured that in the end we’d be best served by a homebrew solution, even if it causes a bit more pain and might not give us the highest-quality result in the short term.
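The two quoted ranges are consistent with each other only for a particular amount of dialogue: dividing each total by its per-minute rate gives the same figure at both ends. The short sketch below does that division. The roughly 500-minute result is our own inference from the quoted numbers, not a figure stated in the post, and the function names are hypothetical.

```python
# Hedged sketch: infer the dialogue length implied by the quoted price ranges.
# The ~500-minute result is an inference, not a number from the original post.

RATE_USD_PER_MINUTE = (1_000, 2_000)   # quoted studio rates (low, high)
TOTAL_COST_USD = (500_000, 1_000_000)  # quoted total cost (low, high)


def implied_minutes(total_usd: float, rate_usd_per_minute: float) -> float:
    """Minutes of captured dialogue that a total cost implies at a given rate."""
    return total_usd / rate_usd_per_minute


def estimated_cost(minutes: float, rate_usd_per_minute: float) -> float:
    """Total studio cost for a given number of captured minutes."""
    return minutes * rate_usd_per_minute


low_end = implied_minutes(TOTAL_COST_USD[0], RATE_USD_PER_MINUTE[0])
high_end = implied_minutes(TOTAL_COST_USD[1], RATE_USD_PER_MINUTE[1])
print(low_end, high_end)  # 500.0 500.0
```

Both ends of the range point to the same script length, which suggests the totals were computed for about 500 minutes of dialogue at the cheapest and the most expensive rate respectively.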
My thinking was that for whatever game we do, we’ll always need to hire voice actors, so in all cases that’s a cost we’ll have to carry. Now, while they are acting, they are actually generating the data we need; we just need the ability to extract that data and project it onto 3D characters.
The equipment we bought allows us to record facial marker data at 100 frames per second from seven directions. Should we discover for some reason that that’s not enough, we can always add extra cameras, but from the looks of it, the raw data is good enough to work with.
So if we organize ourselves so that in every future recording session we record the actors’ facial expressions in addition to their voices, we should have sufficient base material to work from. Obviously, this does cause extra complications in the recording booth, as we’re increasing actor and studio time and thus recording cost, but from the tests we’ve done, it looks like it should be manageable.