Better Loc Next Time, Part 2: Optimizing the Audio Localization Process
PTW and SIDE’s own Olivier Deslandes, SVP of Audio and Speech Technology, gave an insightful talk at the most recent GameSoundCon, the game music and sound design conference.
In addition to outlining the main stages of audio localization, Olivier’s purpose was to illuminate common pitfalls that studios encounter during this process and provide viable solutions to well-known problems.
Before partnering with a vendor for localization, and specifically audio localization, game studios should know what to expect in terms of cost, the processes for different types of VO, casting briefs, and script formatting, to name but a few aspects. Being prepared in these areas leads to a more optimized Audio Loc process and a happier working partnership for all involved.
Let’s dive into the details.
Localizing audio represents a whopping 50-70% of a game’s total localization budget. It requires actors, voice directors, sound engineers, production managers, and sophisticated facilities and equipment.
However expensive, the investment is worth the outlay, since bad voice acting and poor-quality recording will result in an inferior product. The budget should go to professional dubbing actors, a voice director who understands video games, and an experienced production team. A dedicated production management team can best handle talent contracting, scheduling, and asset management, especially if you are working with multiple vendors.
Factors that influence cost are the word or line count, the time constraints and their associated recording velocities (which are influenced by the sync status; more on this later), and the quality of voice talent. It’s wise to be as prepared as possible before recording begins, as any delays or changes during production will impact both timeline and budget.
In order to provide a detailed quote, your vendor will need the following information:
1. Total number of lines and words to record
2. Breakdown of lines and words per category: wild, time-constrained, lip-sync
3. Character count, character types (main/minor/incidental), and wordcount per character
4. Genre of the game, as it relates to character briefs (e.g., do you need authentic historical accents for a medieval game?)
5. Database vs. audition casting (the latter being more expensive)
6. Number of actors to cast (won’t necessarily be the same as for the original audio)
7. Number of platforms the game will be published on
8. List of languages to localize into
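As a rough illustration of items 1 to 3, the line and word breakdowns can be pulled straight from the script before requesting a quote. The sketch below is a minimal example under stated assumptions: the `summarize_for_quote` helper and the `(character, category, text)` row layout are hypothetical, and words are counted by splitting on whitespace, which will not match every vendor's counting rules.

```python
from collections import defaultdict


def summarize_for_quote(rows):
    """Count lines and words overall, per category, and per character.

    rows: iterable of (character, category, text) tuples. This layout is an
    assumption made for illustration, not a standard script format.
    """
    totals = {"lines": 0, "words": 0}
    per_category = defaultdict(lambda: {"lines": 0, "words": 0})
    per_character = defaultdict(lambda: {"lines": 0, "words": 0})

    for character, category, text in rows:
        words = len(text.split())  # naive whitespace-based word count
        for bucket in (totals, per_category[category], per_character[character]):
            bucket["lines"] += 1
            bucket["words"] += words

    return totals, dict(per_category), dict(per_character)


rows = [
    ("Aria", "wild", "Watch out!"),
    ("Aria", "lip-sync", "We leave at dawn."),
    ("Guard", "wild", "Halt!"),
]
totals, per_category, per_character = summarize_for_quote(rows)
print(totals)
print(per_category)
print(per_character)
```

Keeping the per-character word counts alongside the per-category ones also helps when deciding which characters count as main, minor, or incidental.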
Depending on the length of the script and the number of incidental or non-narrative lines, the type of voiceover line can be a cost factor.
The sync status is the defining trait: whether the VO needs to be timed to the game’s original audio or synched to visuals. Types of non-sync VO include barks (grunts of pain or effort), one-liners, and ambient dialogue from NPCs. On the other hand, scripted events and cutscenes need synched VO.
The cost differential also depends on each line’s duration across different languages. Structural differences between languages account for discrepancies in delivery speed. For example, French tends to run longer than English, and synching for tonal languages like Chinese can be challenging.
This is where defining all your recording constraints is important. If your game mechanics allow your localized VO to be wild, this means there are no constraints, and the dialogue can be as long as it needs to be. Then there is time-constrained (TC) and strict time-constrained (STC) voiceover: TC allows files to be longer or shorter than the source recordings by a set margin (typically 10%), but for STC, the localized audio must perfectly match the source recording length.
Finally, sound sync is the same as STC, except that all of the silences must also mirror those of the source. Lip sync is when the audio perfectly matches the lip movements in the visuals, for instance when cutscenes feature close facial shots.
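The duration-based constraints described above (wild, TC, and STC) lend themselves to a simple automated check on delivered files. The following is a minimal sketch, not a production tool: the `SyncType` enum and `fits_constraint` function are hypothetical names, the 10% TC margin is the typical figure quoted above, and the small STC tolerance is an assumption, since real audio files rarely match to the exact sample. Sound sync and lip sync cannot be verified from duration alone, so they are left out.

```python
from enum import Enum


class SyncType(Enum):
    WILD = "wild"
    TC = "time-constrained"
    STC = "strict time-constrained"


def fits_constraint(source_s, localized_s, sync_type,
                    tc_margin=0.10, stc_tolerance_s=0.05):
    """Return True if a localized file's duration satisfies its constraint.

    source_s / localized_s: durations in seconds.
    tc_margin: allowed deviation for TC, as a fraction of the source length.
    stc_tolerance_s: assumed absolute tolerance (seconds) for an STC "match".
    """
    if source_s <= 0 or localized_s <= 0:
        raise ValueError("durations must be positive")

    deviation = abs(localized_s - source_s)
    if sync_type is SyncType.WILD:
        return True  # no length constraint at all
    if sync_type is SyncType.TC:
        return deviation <= tc_margin * source_s
    if sync_type is SyncType.STC:
        return deviation <= stc_tolerance_s
    raise ValueError(f"unsupported sync type: {sync_type}")


# A 4.0 s source line recorded at 4.3 s passes TC but fails STC.
print(fits_constraint(4.0, 4.3, SyncType.TC))
print(fits_constraint(4.0, 4.3, SyncType.STC))
```

Running a check like this on each batch of delivered files can flag over-length lines before they reach integration, when a re-record is still cheap.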
Proper documentation can also save a lot of time and miscommunication. One of the more complex hurdles is the casting brief, the document that holds all the information needed to localize the characters. At first glance it might not seem like much of a hurdle: one simply replaces the original actors with local talent. However, subtleties crop up that make casting more fraught than it first appears.
For example, an accent may be a character’s feature for comedic effect, but that accent may not correspond to the language that you’re localizing into, and the humor may no longer work.
The same holds true for foreign-language lines that add local flavor, as in war games set in a particular country. When the game is localized into that very language, those lines no longer stand out as foreign, and the effect is lost. These are conversations to have with your vendor of choice.
A character’s age can also be difficult to navigate when casting. Castilian Spanish voices can sound older to non-native players, while Japanese female voices tend to sound younger to non-Japanese players. Even using actual children as voice actors can pose a challenge, as different countries have different child labor laws that pose scheduling and productivity risks. Instead, a frequent substitution is an adult actor who can mimic a child’s voice.
Script formatting is a key component of keeping recordings on track, as it involves the status of individual lines, which can undergo changes during sessions.
Typically, an Excel spreadsheet with macros is the format of choice, as it easily handles elements like script changes, alternative takes, time stamps, and version tracking. Important aspects of the script include unique line IDs that preserve chronological order; story context and director’s notes, so that it’s always clear at what point the dialogue takes place; character names, which must always be spelled and spoken consistently; the type of line (sync or not); feeder lines, which are links that play the preceding line to give the actor context when recording; and unique filenames.
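Several of these script requirements can be checked mechanically before a session. The sketch below assumes the script has been exported from the spreadsheet into simple records; the `ScriptLine` fields and the `find_script_problems` helper are hypothetical names chosen for illustration, not part of any particular tool. It flags duplicate line IDs, duplicate filenames, feeder links that point to missing lines, and character names whose spelling differs only by case or surrounding whitespace.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScriptLine:
    line_id: str
    character: str
    text: str
    sync_type: str                         # e.g. "wild", "TC", "lip-sync"
    filename: str
    context: str = ""                      # story context / director's notes
    feeder_line_id: Optional[str] = None   # ID of the preceding line, if any


def find_script_problems(lines):
    """Return a list of human-readable problems found in the script."""
    problems = []

    id_counts = Counter(line.line_id for line in lines)
    for line_id, count in id_counts.items():
        if count > 1:
            problems.append(f"duplicate line ID: {line_id}")

    file_counts = Counter(line.filename for line in lines)
    for filename, count in file_counts.items():
        if count > 1:
            problems.append(f"duplicate filename: {filename}")

    for line in lines:
        feeder = line.feeder_line_id
        if feeder is not None and feeder not in id_counts:
            problems.append(f"{line.line_id}: feeder line {feeder} not found")

    # Group names that differ only by case or surrounding whitespace.
    spellings = {}
    for line in lines:
        key = line.character.strip().casefold()
        spellings.setdefault(key, set()).add(line.character)
    for variants in spellings.values():
        if len(variants) > 1:
            problems.append(
                "inconsistent character name: " + ", ".join(sorted(variants))
            )

    return problems


script = [
    ScriptLine("L001", "Aria", "We leave at dawn.", "lip-sync", "aria_001.wav"),
    ScriptLine("L002", "Guard", "Halt!", "wild", "guard_001.wav",
               feeder_line_id="L001"),
    ScriptLine("L003", "aria", "Run!", "wild", "aria_001.wav",
               feeder_line_id="L999"),
]
for problem in find_script_problems(script):
    print(problem)
```

A check like this only catches structural slips; whether a name is pronounced consistently still has to be confirmed in the booth.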
Finally, it’s vital to produce an As-Recorded script, which tracks as closely as possible what the actors actually recorded on the day. As previously mentioned, there are often deviations from the original script, so an As-Rec script will contain all the changes and additional comments that QA and post-production will need for subtitling and alternate takes.