20 December 2010
Matching dialogue to a character's mouth in order to create the illusion of speech is known as lip-syncing. This effort often consumes more time than any other animation task because you need to make adjustments on nearly every frame. In fact, lip-syncing even a short animation (only 1–2 minutes long) can involve hours of tedious labor.
This article explains how to make lip-syncing in Adobe Flash Professional CS5 as painless as possible by utilizing the SmartMouth extension to automatically analyze audio content and assign corresponding mouth shapes. That's right: you can sit back and relax your hands, back, neck, and eyes while SmartMouth processes the audio in the time that it takes the audio to play back, and then matches each frame using a speech algorithm. You'll also learn how to use the free FrameSync extension to quickly make manual adjustments and tweaks to your character animation.
The basic unit of speech is known as a phoneme. The mouth shape and facial contortions that correspond to phonemes are known as visemes. Animators generally refer to phonemes and visemes interchangeably, even though they are technically different concepts. The standard set of about seven phonemes/visemes (mouth shapes)—not including a closed mouth—is sufficient to create the illusion of speech on an animated character. The SmartMouth Flash extension includes a sample set of those shapes (see Figure 1).
This small set of phonemes corresponds to a wide array of spoken sounds. The full range of sounds covered by this small set of mouth shapes is listed in Table 1.
| Abbreviated phoneme | Full letter list | Corresponding examples |
| --- | --- | --- |
| – | none | No vocal sound (or inaudible) |
| A | A, I | ah as in "cat," a as in "say," i as in "kite" |
| O | O, U, (W), (R) | o as in "boat," u as in "clue" |
| E | E, (I) | e as in "street," eh as in "trek" |
| S | C, D, G, K, N, R, S, T, TH, Y, Z | s as in "stress," t as in "tent" |
| L | L | l as in "lull" |
| M | M, B, P | m as in "might," b as in "back," p as in "pass" |
| F | F, V | f and v as in "favor" |
Note: Letters in parentheses denote that some sounds made by those letters (for example, W and R) produce a viseme—specifically puckering of the lips, similar to the visemes produced by U and O sounds.
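The mapping in Table 1 can be expressed as a simple lookup table, which is handy if you ever script your own pass over dialogue text. Here is a minimal, runnable sketch; the function name and data structure are my own for illustration and are not part of SmartMouth:

```javascript
// Map each letter (or digraph) to the abbreviated phoneme from Table 1.
// This is an illustrative sketch, not SmartMouth's actual algorithm.
var VISEME_FOR_LETTER = {
  A: "A", I: "A",
  O: "O", U: "O", W: "O",            // W puckers the lips like O/U sounds
  E: "E",
  C: "S", D: "S", G: "S", K: "S", N: "S",
  R: "S",                            // (R) can also pucker like the O group
  S: "S", T: "S", TH: "S", Y: "S", Z: "S",
  L: "L",
  M: "M", B: "M", P: "M",
  F: "F", V: "F"
};

function visemeFor(letter) {
  var key = String(letter).toUpperCase();
  // Anything unrecognized falls back to "-" (closed/neutral mouth).
  return VISEME_FOR_LETTER[key] || "-";
}
```

Note that a real analyzer works on sounds, not spelling, so a lookup like this is only a rough stand-in for audio analysis.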
By reusing a small set of mouth shapes, you avoid reinventing the wheel on every frame. Instead, you can simply leverage the repository of mouth shapes that you've already created.
Flash animators have developed many techniques to speed up lip-syncing in Flash. The three most common methods are swapping symbols, nesting, and nesting with labels. Each of these three methods utilizes reusable mouth shapes for phonemes.
The swapping method involves creating each phoneme in its own symbol and then swapping the symbol shown on each frame to match the audio at that frame.
The nesting method involves placing all of the phonemes along the Timeline within a single graphic symbol. By manipulating the First (frame) value in the Property inspector, you can control which frame of a graphic symbol's Timeline is displayed (as shown in Figure 2). The nesting method is a bit more organized and efficient than the swapping method, but it does require that you memorize (or write down) the frame numbers that correspond with each mouth shape.
The nesting with labels method is identical to the nesting method, except that frame labels are added to the graphic symbol's Timeline to identify each distinct shape. The labels make the symbol's Timeline easier to navigate—and they can be read by SmartMouth and FrameSync, eliminating the need to memorize frame numbers.
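The advantage of labels over raw frame numbers can be shown with a small runnable sketch. In JSFL, a label lives on a keyframe (the frame's `name` property); here the nested symbol's Timeline is modeled as a plain array so the idea runs anywhere. The specific labels and frame numbers below are hypothetical:

```javascript
// Model of a nested mouth symbol's Timeline: each keyframe has a label.
// (Frames are spread out, as in the sample file, so labels stay readable.)
var mouthTimeline = [
  { frame: 1,  label: "closed" },
  { frame: 3,  label: "A" },
  { frame: 5,  label: "O" },
  { frame: 7,  label: "E" },
  { frame: 9,  label: "S" },
  { frame: 11, label: "L" },
  { frame: 13, label: "M" },
  { frame: 15, label: "F" }
];

// Resolve a label to the frame number you'd set as the graphic
// instance's First (frame) value - no memorization required.
function firstFrameForLabel(timeline, label) {
  for (var i = 0; i < timeline.length; i++) {
    if (timeline[i].label === label) return timeline[i].frame;
  }
  return 1; // fall back to the first frame (closed mouth)
}
```

With plain nesting you would have to remember that the M shape sits on frame 13; with labels, "M" is all you need.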
The sample project in this article focuses on using the nesting with labels method. To add a frame label, select a keyframe and enter the desired label into the Name field in the Property inspector (see Figure 3).
All manual lip-syncing in Flash relies on scrubbing. Scrubbing refers to the act of dragging the playhead in the Timeline across one frame at a time. Any audio layers that have their Sync property set to Stream will play the scrubbed frame's audio (assuming Control > Mute Sounds is off), allowing you to hear a small fragment of speech. Scrubbing is a time-consuming user action because it requires a lot of precision using a mouse, track pad, or stylus. It may take a couple of scrubs on a single frame to identify the sound. Using one of the three methods described previously, you can scrub and update each frame to use the correct mouth shape symbol until the entire Timeline has been synced.
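It's worth seeing just how small a fragment of speech one scrubbed frame contains. The slice of streamed audio belonging to a frame is determined by the document's frame rate and the sound's sample rate; the following sketch (my own helper, not a Flash API) does the arithmetic:

```javascript
// Which slice of a streamed sound does frame n cover?
// At 24 fps and 44,100 Hz, each frame spans 44100 / 24 = 1837.5 samples,
// i.e. about 42 milliseconds of audio per scrub.
function sampleRangeForFrame(frameNumber, fps, sampleRate) {
  var samplesPerFrame = sampleRate / fps;
  var start = Math.floor((frameNumber - 1) * samplesPerFrame);
  var end = Math.floor(frameNumber * samplesPerFrame);
  return { start: start, end: end }; // half-open range [start, end)
}
```

Identifying a phoneme from roughly 40 milliseconds of sound is exactly why manual scrubbing is slow and why automating the analysis pays off.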
SmartMouth works natively with the three methods discussed previously and edits your Timeline directly, as if you'd made the edits by hand; the Timeline and symbols remain completely editable. This flexibility allows you to make desired customizations, adding personality and subtlety to your character, just as you would manually, but in a fraction of the development time.
The file setup required for SmartMouth is identical to that of a file that you lip-sync manually. The file must contain mouth shapes (in a single symbol or in several symbols) and a layer with the audio set to Stream. For SmartMouth to distinguish speech properly, your targeted audio layer should contain only one character speaking at a time with no music or sound effects. Before setting up your file, install the SmartMouth and FrameSync extensions so that they can be accessed in the Flash authoring environment.
The Adobe Extension Manager makes it really easy to install extensions. Follow these steps:

1. Download the SmartMouth and FrameSync extension files.
2. Double-click each extension file to open it in Adobe Extension Manager (or choose File > Install Extension in Extension Manager and browse to the file).
3. Accept the license agreement to complete the installation.
4. If Flash Professional is running, restart it so that the extensions become available.
You're almost ready to begin lip-syncing. First, let's review the setup in the provided FLA file; if you prefer, you can create some mouth shapes and add a custom audio file to replicate your own version of the project.
In the smartmouth_adc_demo_start.fla file, double-click the symbol in the mouth layer to enter symbol-editing mode and view the mouth symbol's Timeline. You'll find that the mouth symbol contains artwork for several mouth shapes on separate frames—and that each frame is labeled with its corresponding sound or purpose. Several frames have been added between the keyframes to make the labels visible on the Timeline (see Figure 5).
In the next section, you'll see how these frame labels facilitate a smooth workflow in both SmartMouth and FrameSync.
The other essential component for lip-syncing in Flash is streaming audio. If you return to the main Timeline (Edit > Edit Document) in the provided FLA file and select a frame on the audio layer, you'll see that the audio's Sync property has been set to Stream (see Figure 6).
To recreate these settings from scratch, you would follow these steps:

1. Choose File > Import > Import to Library and select your audio file.
2. Create a new layer for the audio and select its first frame.
3. In the Property inspector, choose the audio file from the Sound menu.
4. Set the Sync property to Stream.
5. Insert enough frames in the layer (press F5) to hold the full duration of the audio.
Once your project includes the mouth shapes and a streaming audio layer, you can begin lip-syncing.
Now comes the easy part. After ensuring that the mouth shapes and streaming audio elements are in place, the SmartMouth extension can do the heavy lifting:

1. Click the mouth layer's name in the Timeline to select the entire layer.
2. Launch SmartMouth from within Flash; the SmartMouth dialog box opens with most settings detected automatically.
3. Verify the settings (covered in the next section) and run the analysis.
Note: If you have not yet purchased a license for SmartMouth, the trial will limit you to 60 frames per run. For this example, that means that you'll need to run SmartMouth a second time, with a new frame selection, to sync the entire 97-frame audio stream. To purchase a license, you can right-click (or Control-click) inside the SmartMouth interface and choose the option to purchase a license. Alternatively, visit the Ajar Productions site.
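The arithmetic behind that note is a simple ceiling division, sketched below (the helper is mine, purely for illustration):

```javascript
// How many trial runs does it take to cover a clip, at a fixed
// frames-per-run limit? Any partial remainder still costs a full run.
function runsNeeded(totalFrames, framesPerRun) {
  return Math.ceil(totalFrames / framesPerRun);
}
```

For the 97-frame sample at the trial's 60-frame limit, that works out to two runs.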
You may have noticed that SmartMouth has discovered most of your desired settings automatically (see Figure 7). Each setting is worth a closer look.
The input settings determine which parts of the Flash document will be analyzed (see Figure 8).
The Audio Layer menu determines which audio layer will be analyzed. Since the sample file only has one audio layer, this setting is obvious. However, when working on files that contain multiple audio tracks, always check that the Audio Layer is set correctly.
The Mouth Layer determines which layer will be referenced and updated with new mouth shapes. By selecting the mouth layer first, you ensured that SmartMouth would automatically select this layer.
Additionally, by selecting the entire mouth layer before running SmartMouth, you automatically populated the Start Frame and End Frame fields. These fields determine which frames of the Flash project are analyzed by the SmartMouth extension.
The Action menu determines the output method (see Figure 9).
Actions determine the behavior that takes place when the audio analysis is complete. Available actions include the following:
For this example, leave the Action set to the default setting: Overwrite keyframes.
The Mode setting displayed in Figure 9 controls which lip-syncing technique will be applied. The Symbols, Frame #, and Labels modes correspond to the swapping, nesting, and nesting with labels techniques, respectively. The Labels mode has automatically been activated because the symbol on the Start Frame of the mouth layer contains frame labels.
The "Limit to" menu is only active in Symbols mode. It allows you to limit the items in the phoneme menus to a single folder in the Library.
Since frame labels have been provided for the various mouth shapes, SmartMouth has automatically matched those labels to the available phonemes and rendered previews of each mouth shape (see Figure 10).
You can change any of the mouth shapes that will be applied by choosing a new item from the corresponding menu. This selection causes the preview to instantly update. There's no need to change any of these settings if you're following along with the sample file. Click the Tell me, SmartMouth button to analyze the file (see Figure 11).
Once the analysis is complete, the keyframes are added to the mouth layer (see Figure 12).
You may still choose to touch up a few keyframes, but these tweaks are usually minimal. Play Figure 13 to see how well SmartMouth analyzed and applied the mouth shapes to the sample file. In the example, only one keyframe was altered (the last frame was changed to a smile to reflect the character's pride in reciting the quote).
Figure 13. See how well SmartMouth can lip-sync automatically (click to play the animation).
If you'd like to make manual changes to your lip-syncing project, there's a tool to speed that up, too.
Making changes is no different than lip-syncing from scratch, except that some of the keyframes are already laid out. Since the sample file includes frame labels (or you added them yourself when you created a new FLA file), you can take advantage of the FrameSync panel that was installed with the FrameSync extension.
The FrameSync extension speeds up manual lip-syncing by providing quick access to a graphic symbol's Timeline labels, so the displayed mouth shape can be altered with a single click. Follow these steps to see how it works:

1. Open the FrameSync panel in Flash.
2. On the main Timeline, select a keyframe on the mouth layer.
Now you'll see all of the frame labels displayed in the FrameSync panel. Updating a mouth shape is as easy as selecting an item in the FrameSync panel (see Figure 15).
If you're using Flash Professional CS5, you can also take advantage of some of the newer features provided in the FrameSync extension. The frame controls make scrubbing a thing of the past (see Figure 16). Simply click the play audio button (middle button) to play a single frame of audio. After listening to the frame, you can choose the appropriate mouth shape and click the next frame button (>) to move on to the next frame.
Now that you're familiar with using the SmartMouth and FrameSync extensions, you can save yourself hours of time while lip-syncing in Flash. For more information on the tools and techniques mentioned in this article, visit the resources listed below.
General lip-syncing techniques
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License. Permissions beyond the scope of this license, pertaining to the examples of code included within this work, are available at Adobe.