Optimizing the Audio Localization Process For Games

Better Loc Next Time, Part 2: Optimizing the Audio Localization Process

PTW and SIDE’s own Olivier Deslandes, SVP of Audio and Speech Technology, gave an insightful talk at the most recent GameSoundCon, the game music and sound design conference.

In addition to outlining the main stages of audio localization, Olivier’s purpose was to illuminate common pitfalls that studios encounter during this process and provide viable solutions to well-known problems.

Before partnering with a vendor for localization, and specifically audio localization, game studios should know what to expect in terms of cost, the processes for different types of VO, casting briefs, and script formatting, to name but a few aspects. Being prepared in these areas leads to a more optimized Audio Loc process and a happier working partnership for all involved.

Let’s dive into the details.

Audio localization costs

Localizing audio represents a whopping 50-70% of a game’s total localization budget. It requires actors, voice directors, sound engineers, production managers, and sophisticated facilities and equipment.

However expensive, the investment is worthwhile, since bad voice acting and poor-quality recording will result in an inferior product. The budget should go to professional dubbing actors, a voice director who understands video games, and an experienced production team. A dedicated production management team can best handle talent contracting, scheduling, and asset management, especially if you are working with multiple vendors.

Factors that influence cost are word or line count, the time constraint and its associated recording velocities (which are influenced by the sync status, discussed later), and the quality of voice talent. It’s wise to be as prepared as possible before recording begins, as any delays and changes occurring during production will impact timeline and budget.

In order to provide a detailed quote, your vendor will need the following information:

1. Total number of lines and words to record
2. Breakdown of lines and words per category: wild, time-constrained, lip-sync
3. Character count, character types (main/minor/incidental), and wordcount per character
4. Genre of the game, as it relates to character briefs (e.g., do you need authentic historical accents for a medieval game?)
5. Database vs. audition casting (the latter being more expensive)
6. Number of actors to cast (won’t necessarily be the same as for the original audio)
7. Number of platforms the game will be published on
8. List of languages to localize into
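As a rough illustration, the first three items can be computed directly from a script export. The sketch below assumes a hypothetical `ScriptLine` record and category labels ("wild", "tc", "lipsync") that are not taken from any particular tool. It counts words by splitting on whitespace, which is only an approximation and will not work for source languages written without spaces.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ScriptLine:
    character: str
    category: str  # assumed labels: "wild", "tc", "lipsync"
    text: str


def quote_breakdown(lines):
    """Summarize line and word counts per category and per character."""
    lines_by_category = Counter()
    words_by_category = Counter()
    words_by_character = Counter()
    for line in lines:
        n_words = len(line.text.split())  # whitespace split: an approximation
        lines_by_category[line.category] += 1
        words_by_category[line.category] += n_words
        words_by_character[line.character] += n_words
    return {
        "total_lines": len(lines),
        "total_words": sum(words_by_category.values()),
        "lines_by_category": dict(lines_by_category),
        "words_by_category": dict(words_by_category),
        "words_by_character": dict(words_by_character),
    }


# Hypothetical sample data, for illustration only.
sample = [
    ScriptLine("Captain", "lipsync", "Hold the line until dawn."),
    ScriptLine("Captain", "wild", "Move!"),
    ScriptLine("Guard", "wild", "Who goes there?"),
]
summary = quote_breakdown(sample)
print(summary)
```

Your vendor may count words differently (for example, excluding stage directions), so treat figures like these as a starting point for the quote conversation rather than a final number.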

Types of VO lines

Depending on the length of the script and the number of incidental or non-narrative lines, the type of voiceover line can be a cost factor.

The sync status is the defining trait: whether the VO needs to be timed to the game’s original audio or synched to visuals. Types of non-sync VO include barks (grunts of pain or effort), one-liners, and ambient dialogue from NPCs. On the other hand, scripted events and cutscenes need synched VO.

The cost differential also depends on how long each line runs in different languages. Structural differences between languages lead to differences in line length and delivery speed. For example, French tends to run longer than English, and synching for tonal languages like Chinese can be challenging.

This is where defining all your recording constraints is important. If your game mechanics allow your localized VO to be wild, this means there are no constraints, and the dialogue can be as long as it needs to be. Then there is time-constrained (TC) and strict time-constrained (STC) voiceover: TC allows files to be longer or shorter than the source recordings by a set margin (typically 10%), but for STC, the localized audio must perfectly match the source recording length.

Finally, sound sync is the same as STC, except that all of the silences must also mirror those of the source. Lip sync is when the audio perfectly matches the lip movement in the visuals, for instance, when close facial shots are featured in cutscenes.
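The duration rules for wild, TC, and STC lines can be expressed as a simple check. This is a minimal sketch, not a vendor tool: the function name, the constraint labels, and the `tolerance_ms` parameter are assumptions, and the 10% default is the typical margin mentioned above. Sound sync and lip sync are left out because they depend on silences and visuals, not just overall length.

```python
def fits_constraint(source_ms, localized_ms, constraint,
                    margin=0.10, tolerance_ms=0):
    """Check a localized file's duration against its source recording.

    constraint: "wild", "tc", or "stc" (labels assumed for this sketch).
    margin: allowed TC deviation as a fraction of the source duration.
    tolerance_ms: slack allowed for STC; 0 means an exact match.
    """
    difference = abs(localized_ms - source_ms)
    if constraint == "wild":
        return True  # no length constraint at all
    if constraint == "tc":
        return difference <= margin * source_ms
    if constraint == "stc":
        return difference <= tolerance_ms
    raise ValueError(f"unknown constraint: {constraint!r}")


# A 2.0 s source line with a 2.15 s localized take.
print(fits_constraint(2000, 2150, "wild"))  # True
print(fits_constraint(2000, 2150, "tc"))    # True: within the 10% margin
print(fits_constraint(2000, 2150, "stc"))   # False: length differs
```

Running a check like this over a whole batch before delivery helps flag lines that need a retake or an edit while the actor is still available.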

[Figures: Wild; Time-constrained (TC); Strict time-constrained (STC); Sound sync/lip sync]

Casting brief

Proper documentation can also save a lot of time and prevent miscommunication. One of the more complex hurdles is the casting brief, the document that holds all the information needed to localize the characters. At first glance it might not seem like much of a hurdle: one simply replaces the original actors with local talent. However, subtleties crop up that make casting more fraught than it first appears.

For example, an accent may be a character’s feature for comedic effect, but that accent may not correspond to the language that you’re localizing into, and the humor may no longer work.

The same holds true for foreign-language lines that give local flavor, as in war games set in a particular country. When the game is localized into that same language, the line no longer stands out as foreign and the effect is lost. These are conversations to have with your vendor of choice.

A character’s age can also be difficult to navigate when casting. Castilian Spanish voices can sound older to non-native players, while Japanese female voices tend to sound younger to non-Japanese players. Even using actual children as voice actors can pose a challenge, as different countries have different child labor laws that pose scheduling and productivity risks. Instead, a frequent substitution is an adult actor who can mimic a child’s voice.

Script formatting

Script formatting is a key component of keeping recordings on track, as it involves the status of individual lines, which can undergo changes during sessions.

Typically, an Excel spreadsheet with macros is the format of choice, as it easily handles elements like script changes, alternative takes, time stamps, and version tracking. Important aspects of the script include unique line IDs that preserve chronological order; story context and director’s notes, so that it’s always clear at what point the dialogue takes place; character names, which must always be spelled and spoken consistently; the type of line (sync or not); feeder lines, which are links that play the preceding line to give the actor context when recording; and unique filenames.
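Some of these requirements, such as unique line IDs, unique filenames, and consistent character names, can be checked automatically once the spreadsheet is exported. The sketch below assumes each row has been read into a dictionary with hypothetical column names (`line_id`, `character`, `filename`); your own script template will likely use different headers.

```python
from collections import Counter, defaultdict


def duplicate_values(rows, field):
    """Return the values of `field` that appear on more than one row."""
    counts = Counter(row[field] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def inconsistent_names(rows, field="character"):
    """Group spellings that differ only by case or surrounding spaces."""
    variants = defaultdict(set)
    for row in rows:
        variants[row[field].strip().casefold()].add(row[field])
    return sorted(sorted(group) for group in variants.values()
                  if len(group) > 1)


# Hypothetical rows, for illustration only.
rows = [
    {"line_id": "L0001", "character": "Mara", "filename": "mara_0001.wav"},
    {"line_id": "L0002", "character": "MARA", "filename": "mara_0002.wav"},
    {"line_id": "L0002", "character": "Guard", "filename": "guard_0001.wav"},
]
print(duplicate_values(rows, "line_id"))   # ['L0002']
print(duplicate_values(rows, "filename"))  # []
print(inconsistent_names(rows))            # [['MARA', 'Mara']]
```

A check like this only catches mechanical errors. Whether a name is spoken consistently still has to be verified by ear in the session.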

Finally, it’s vital to produce an As-Recorded script, which tracks as closely as possible what the actors actually recorded on the day. As previously mentioned, there are often deviations from the original script, so an As-Rec script will contain all the changes and additional comments that QA and post-production will need for subtitling and alternate takes.
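To make the idea concrete, here is a minimal sketch of how deviations between the original script and the As-Rec script might be listed for the subtitling and QA teams. It assumes both scripts are available as dictionaries keyed by line ID; the function name and data are hypothetical.

```python
def as_recorded_changes(script, recorded):
    """List lines whose recorded text differs from the scripted text.

    Both arguments map line IDs to text. Returns a list of
    (line_id, scripted_text, recorded_text) tuples in script order.
    Lines that were not recorded at all are skipped.
    """
    changes = []
    for line_id, scripted_text in script.items():
        recorded_text = recorded.get(line_id)
        if recorded_text is not None and recorded_text != scripted_text:
            changes.append((line_id, scripted_text, recorded_text))
    return changes


# Hypothetical data: the actor said "till" instead of "until".
script = {
    "L0001": "Hold the line until dawn.",
    "L0002": "Who goes there?",
}
recorded = {
    "L0001": "Hold the line till dawn.",
    "L0002": "Who goes there?",
}
for line_id, before, after in as_recorded_changes(script, recorded):
    print(f"{line_id}: '{before}' -> '{after}'")
```

Each reported line is a candidate for a subtitle update, so that on-screen text matches what players actually hear.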

Post-production and QA needs

Robust asset management is critical in the post-production phase due to the volume of files and tight turn-around times. The integrity of deliveries can make or break submission. For this reason, your vendor needs to understand all your delivery needs in terms of:

File formats
Naming conventions
Sample and bit rates for files
Folder hierarchy
“Cut and clean” rules: what and where to cut to avoid unnecessary silences, clicks, or other extraneous sounds
Compression, normalization, and F/X
Loudness levels for mastering
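Several of these requirements can be verified automatically before delivery. The following sketch uses Python's standard `wave` module to check a PCM WAV file against a delivery spec. The spec values (48 kHz, 16-bit, mono) and the naming pattern are placeholders, not recommendations, and "bit rate" is interpreted here as bit depth. Loudness and cut-and-clean checks need dedicated audio tools and are not covered.

```python
import io
import re
import wave

# Placeholder spec and naming convention; substitute your project's own.
SPEC = {"sample_rate": 48000, "bit_depth": 16, "channels": 1}
NAME_PATTERN = re.compile(r"^[a-z0-9]+_[a-z]{2}_\d{4}\.wav$")  # mara_fr_0001.wav


def check_delivery(filename, fileobj, spec=SPEC, name_pattern=NAME_PATTERN):
    """Return a list of problems found in one delivered WAV file."""
    problems = []
    if not name_pattern.match(filename):
        problems.append("filename does not match naming convention")
    with wave.open(fileobj, "rb") as wav:
        if wav.getframerate() != spec["sample_rate"]:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() * 8 != spec["bit_depth"]:
            problems.append(f"bit depth is {wav.getsampwidth() * 8}")
        if wav.getnchannels() != spec["channels"]:
            problems.append(f"channel count is {wav.getnchannels()}")
    return problems


def make_silent_wav(sample_rate, sample_width_bytes, channels, frames=480):
    """Build a short silent WAV in memory, used here only for the demo."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width_bytes)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * sample_width_bytes * channels * frames)
    buf.seek(0)
    return buf


print(check_delivery("mara_fr_0001.wav", make_silent_wav(48000, 2, 1)))
print(check_delivery("Mara FR 1.wav", make_silent_wav(44100, 2, 1)))
```

In practice `fileobj` would be a path or an open file from the delivery folder. The in-memory files above just keep the example self-contained.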

Of course, a good loc partner will already know to ask for your specific needs and understand that the loc files should match the source files across the technical requirements.

If, for whatever reason, some changes haven’t been tracked properly in the studio, the post-production team needs a process to implement those changes in the As-Rec script during QA. The more languages localized, the more files will be given back for integration simultaneously or at short intervals. That can be overwhelming, so it’s best to ensure the resources are ready toward the end of the loc process.

Communication is key

Finally, never underestimate the power of open communication with the localization team.

If your language service provider doesn’t have the full picture, cost overruns and missed deadlines can easily follow. If the team doesn’t fully understand the importance of a key item or character in the game and localizes it improperly, you may need a pick-up session or a first-patch fix.

So rather than making assumptions, be ready to provide context by enabling free communication between translators, localization project managers, audio producers, and game developers.

Taking all of these elements into account, you’re sure to have a more optimized audio loc process on future game projects.


This article is Part 2 of a series. Find Part 1 here.
