Though H.264 codecs come from different vendors, they use the same general encoding techniques and typically present similar encoding options. Here I review the most common H.264 encoding options.
According to the aforementioned article, Overview of the H.264/AVC Video Coding Standard, a profile "defines a set of coding tools or algorithms that can be used in generating a conforming bitstream, whereas a level "places constraints on certain key parameters of the bitstream." In other words, a profile defines specific encoding techniques that you can or can't utilize when encoding the files (such as B-frames), while the level defines details such as the maximum resolutions and data rates.
Take a look at Figure 1, which is a filtered screen capture of a features table from Wikipedia's description of H.264. On top are H.264 profiles, including the Baseline, Main, High, and High 10 profiles that Flash Player supports. On the left are the different encoding techniques available, with the table detailing those supported by the respective profiles.

Figure 1. Encoding techniques enabled by profile (source: Wikipedia)
As you would guess, the higher-level profiles use more advanced encoding algorithms and produce better quality (see Figure 2). To produce this comparison, I encoded the same source file to the same encoding parameters. The file on the left uses the Main Profile; the files on the right uses the Baseline. A quick check of the chart in Figure 1 reveals that the Main Profile enables B slices (also called B-frames) and the higher-quality CABAC encoding, which I define later in this article. As you can see, these do help the Main Profile deliver higher-quality video than the Baseline.

Figure 2. File encoded using the Main profile (left) retaining much more quality than a file encoded using the Baseline profile (right)
So, the Main and High profiles deliver better quality than the Baseline Profile; what's the catch? The catch is, as you use more advanced encoding techniques, the file becomes more difficult to decompress, and may not play smoothly on older, slower computers.
This observation illustrates one of the two trade-offs typically presented by H.264 encoding parameters. One trade-off is better quality for a file that is harder to decompress. The other trade-off is a parameter that delivers better quality at the expense of encoding time. In some rare instances, as with the decision to include B-frames in the stream, you trigger both trade-offs, increasing both decoding complexity and encoding time.
To return to profiles: At a high level, think about profiles as a convenient point of agreement for device manufacturers and video producers. Mobile phone vendor A wants to build a phone that can play H.264 video but needs to keep the cost, heat, and size requirements down. So the crafty chief of engineering searches and finds the optimal processor that's powerful enough to play H.264 files produced to the Baseline Profile. If you're a video producer seeking to create video for that device, you know that if you encode using the Baseline profile, the video will play.
Accordingly, when producing H.264 video, the general rule is to use the maximum profile supported by the target playback platform, since that delivers the best quality at any given data rate. If producing for mobile devices, this typically means the Baseline Profile, but check the documentation for that device to be sure. If producing for Flash Player consumption on Windows or Macintosh computers, this means the High Profile.
This sounds nice and tidy, but understand this: While encoding using the Baseline Profile ensures smooth playback on your target mobile device, using the High Profile for files bound for computer playback doesn't provide the same assurance. That's because the High Profile supports H.264 video produced at a maximum resolution of 4096 × 2048 pixels and a data rate of 720 Mbps. Few desktop computers could display a complete frame, much less play back that stream at 30 frames per second.
Accordingly, while producing for devices is all about profile, producing for computers is all about your video configuration. Here, the general rule is that decoding H.264 video is about as computationally intense as VP6—or Windows Media, for that matter. So long as you produce your H.264 video at a similar resolution and data rate as the other two codecs, it should play fine on the same class of computer. (For comparative playback statistics for H.264, VP6 and VC-1, check out my StreamingMedia.com article, Decoding the Truth About Hi-Def Video Production.)
In general, this means that as long as you're producing SD video at 640 × 480 resolution and lower, it should play fine on most post–2003 computers. If you're producing at 720p or higher, these streams won't play smoothly on many of these computers. You should consider offering an alternative SD stream for these viewers.
What about H.264 levels? If producing for mobile devices with limited screen resolution and bandwidth, you also have to choose the correct level, which again should be specified by the device manufacturer. However, since Flash Player can handle any level supported by any of the supported profiles, you don't have to worry about levels when producing for Flash Player playback on a personal computer.
When you select the Main or High Profiles, some encoding tools will give you two options for entropy coding mode (see Figure 3):
Of the two, CAVLC is the lower-quality, easier-to-decode option, while CABAC is the higher-quality, harder-to-decode option.

Figure 3. Your entropy coding choices: CABAC and CAVLC
Though results are source-dependent, CABAC is generally regarded as being between 5–15% more efficient than CAVLC. This means that CABAC should deliver equivalent quality at a 5–15% lower data rate, or better quality at the same data rate. In my own tests, CABAC produced noticeably better quality, though only in HD test clips encoding to very low data rates. This is shown in Figure 4, from a 720p file produced with CABAC on the left and CAVLC on the right, both to the same 800 kbps video data rate. Figure 4 shows a portion of a frame cut from a 16:9 720p video. Now 800 kbps is very low for 720p footage; by way of comparison, YouTube encodes H.264 720p footage at 2 Mbps, over 2.5 times the data rate.

Figure 4. 720p file produced using CABAC on the left, CAVLC on the right
Though neither image would win an award for clarity, the ballerina's face and other details are clearly more visible on the left. The bottom line is that CABAC should deliver better quality, however modest the difference. Now the question becomes, How much harder is the file to decompress and play?
Not that much, it turns out. I tested this on two of the less-powerful multiple-core computers in my office, one a Hewlett-Packard notebook with a Core 2 Duo processor, and the other a Power PC-based Apple PowerMac. As you can see in Table 2, the CABAC file increased the CPU load by less than 1% on the HP notebook, and less than 2% on the Mac. Based on the improved quality and minimal difference in the required playback CPU, I recommend choosing CABAC whenever the option is available.
Table 2. CPU consumed when playing back H.264 files encoded using CABAC and CAVLC
| Computer | CABAC | CAVLC | Difference |
|---|---|---|---|
| HP Compaq 8710w Mobile Workstation – Core 2 duo |
31.1% | 30.5% | 0.6% |
| Apple PowerMac – Dual 2.7 GHz PPC G5 | 35.5 | 33.7 | 1.8% |
It's common knowledge that talking-head footage, where very little changes from frame to frame, encodes at higher quality than dynamic, motion-filled video. That's because H.264, like all high-quality motion codecs, is designed to take advantage of redundancies between video frames. The more redundancy, the higher the quality at any given bit rate.
To leverage this redundancy, H.264 streams include three types of frames (see Figure 5):

Figure 5. I, P, and B-frames in an H.264-encoded stream
Now that you know the function of each frame type, I'll show you how to optimize their usage.
Though I-frames are the least efficient from a compression perspective, they do perform two invaluable functions. First, all playback of an H.264 video file has to start at an I-frame because it's the only frame type that doesn't refer to any other frames during encoding.
Since almost all streaming video may be played interactively, with the viewer dragging a slider around to different sections, you should include regular I-frames to ensure responsive playback. This is true when playing a video streamed from Flash Media Server, or one distributed via progressive download. While there is no magic number, I typically use an I-frame interval of 10 seconds, which means one I-frame every 300 frames when producing at 30 frames per second (and 240 and 150 for 24 fps and 15 fps video, respectively).
The other function of an I-frame is to help reset quality at a scene change. Imagine a sharp cut from one scene to another. If the first frame of the new scene is an I-frame, it's the best possible frame, which is a better starting point for all subsequent P and B-frames looking for redundant information. For this reason, most encoding tools offer a feature called "scene change detection," or "natural key frames," which you should always enable.
Figure 6 shows the I-frame related controls from Flash Media Encoding Server. You can see that Enable Scene Change detection is enabled, and that the size of the Coded Video Sequence is 300, as in 300 frames. This would be simpler to understand if it simply said "I-frame interval," but it's easy enough to figure out.

Figure 6. I-frame related controls from Flash Media Encoding Server
Specifically, the Coded Video Sequence refers to a "Group of Pictures" or GOP, which is the building block of the H.264 stream—that is, each H.264 stream is composed of multiple GOPs. Each GOP starts with an I-frame and includes all frames up to, but not including, the next I-frame. By choosing a Coded Video Sequence size of 300, you're telling Flash Media Encoding Server to create a GOP of 300 frames, or basically the same as an I-frame interval of 300.
I'll describe the Number of B-Pictures setting further on, and I've addressed Entropy Coding Mode already; but I wanted to explain the Minimum IDR interval and IDR frequency. I'll start by defining an IDR frame.
Briefly, the H.264 specification enables two types of I-frames: normal I-frames and IDR frames. With IDR frames, no frame after the IDR frame can refer back to any frame before the IDR frame. In contrast, with regular I-frames, B and P-frames located after the I-frame can refer back to reference frames located before the I-frame.
In terms of random access within the video stream, playback can always start on an IDR frame because no frame refers to any frames behind it. However, playback cannot always start on a non-IDR I-frame because subsequent frames may reference previous frames.
Since one of the key reasons to insert I-frames into your video is to enable interactivity, I use the default setting of 1, which makes every I-frame an IDR frame. If you use a setting of 0, only the first I-frame in the video file will be an IDR frame, which could make the file sluggish during random access. A setting of 2 makes every second I-frame an IDR frame, while a setting of 3 makes every third I-frame an IDR frame, and so on. Again, I just use the default setting of 1.
Minimum IDR interval defines the minimum number of frames in a group of pictures. Though you've set the Size of Codec Video Sequence at 300, you also enabled Scene Change Detection, which allows the encoder to insert an I-frame at scene changes. In a very dynamic MTV-like sequence, this could result in very frequent I-frames, which could degrade overall video quality. For these types of videos, you could experiment with extending the minimum IDR interval to 30–60 frames, to see if this improved quality. For most videos, however, the default interval of 1 provides the encoder with the necessary flexibility to insert frequent I-Frames in short, highly dynamic periods, like an opening or closing logo. For this reason, I also use the default option of 1 for this control.
B-frames are the most efficient frames because they can search both ways for redundancies. Though controls and control nomenclature varies from encoder to encoder, the most common B-frame related control is simply the number of B-frames, or "B-Pictures" as shown in Figure 6. Note that the number in Figure 6 actually refers to the number of B-frames between consecutive I-frames or P-frames.
Using the value of 2 found in Figure 6, you would create a GOP that looks like this:
IBBPBBPBBPBB......all the way to frame 300. If the number of B-Pictures was 3, the encoder would insert three B-frames between each I-frame and/or P-frame. While there is no magic number, I typically use two sequential B-frames.
How much can B-frames improve the quality of your video? Figure 7 tells the tale. By way of background, this is a frame at the end of a very-high-motion skateboard sequence, and also has significant detail, particularly in the fencing behind the skater. This combination of high motion and high detail is unusual, and makes this frame very hard to encode. As you can see in the figure, the video file encoded using B-frames retains noticeably more detail than the file produced without B-frames. In short, B-frames do improve quality.

Figure 7. File encoded with B-frames (left) and no B-frames (right)
What's the performance penalty on the decode side? I ran a battery of cross-platform tests, primarily on older, lower-power computers, measuring the CPU load required to play back a file produced with the Baseline Profile (no B-frames), and a file produced using the High Profile with B-frames. The maximum differential that I saw was 10 percent, which isn't enough to affect my recommendation to always use the High Profile except when producing for devices that support only the Baseline Profile.
Adobe Flash Media Encoding Server also includes the B and P-frame related controls shown in Figure 8. Adaptive B-frame placement allows the encoder to override the Number of B-Pictures value when it will enhance the quality of the encoded stream; for instance, when it detects a scene change and substitutes an I-frame for the B. I always enable this setting.

Figure 8. Other B-frame related options
Reference B-Pictures lets the encoder to use B-frames as a reference frame for P frames, while Allow pyramid B-frame coding lets the encoder use B-frames as references for other B-frames. I typically don't enable these options because the quality difference is negligible, and I've noticed that these options can cause playback to become unstable in some environments.
Reference frames is the number of frames that the encoder can search for redundancies while encoding, which can impact both encoding time and decoding complexity; that is, when producing a B-frame or P-frame, if you used a setting of 10, the encoder would search until it found up to 10 frames with redundant information, increasing the search time. Moreover, if the encoder found redundancies in 10 frames, each of those frames would have to be decoded and in memory during playback, which increases decode complexity.
Intuitively, for most videos, the vast majority of redundancies are located in the frames most proximate to the frame being encoded. This means that values in excess of 4 or 5 increase encoding time while providing little value. I typically use a value of 4.
Finally, though it's not technically related to B-frames, consider the number of Slices per picture, which can be 1, 2, or 4. At a value of 4, the encoder divides each frame into four regions and searches for redundancies in other frames only within the respective region. This can accelerate encoding on multicore computers because the encoder can assign the regions to different cores. However, since redundant information may have moved to a different region between frames—say in a panning or tilting motion—encoding with multiple slices may miss some redundancies, decreasing the overall quality of the video.
In contrast, at the default value of 1, the encoder treats each frame as a whole, and searches for redundancies in the entire frame of potential reference frames. Since it's harder to split this task among multiple cores, this setting is slower, but also maximizes quality. Unless you're in a real hurry, I recommend the default value of 1.
Once you get beyond I and B-frame related controls, H.264 enables a range of additional encoding parameters, as I will soon describe. To put these in perspective, I would estimate that all the options described up to this point account for 90-95% of the quality available in H.264. The settings discussed in this section can deliver only the remaining 5%, which means that most users can accept the defaults without noticing the difference. Still, if you want to try to eke out the ultimate in H.264 quality,you can use the functions that the settings shown in Figure 9 control.

Figure 9. Other H.264 encoding parameters available in Flash Media Encoding Server
First is search shape, which can be either 16 × 16 or 8 × 8. The latter (8 × 8) is the higher-quality option, with the trade-off being longer encoding time. The next three "fast" options allow you to speed encoding time at the possible cost of quality. I typically disable these options.
Adaptive Quantization Mode and Quantization Strength are advanced settings that reallocate bits of data within a frame using one of the three selected criteria: brightness, contrast, or complexity. I would only experiment with these settings when areas in the video frame are noticeably blocky. Unfortunately, operation is extremely content-specific, which makes it impossible to offer general advice regarding which techniques and values to use.
Both the rate distortion optimization and Hadamard transformation settings can improve quality but lengthen encoding time; I usually enable both. Finally, the Motion estimation subpixel mode defines the granularity of the search for redundancies: Quarter pixel represents the highest-quality option, though the slowest to encode, and Full pixel represents the fastest but lowest quality. In my low-volume environment, I always use the Quarter pixel option.