The web is the ultimate nonlinear medium because users control when things happen. They are accustomed to jumping from place to place and consuming content in small portions. As the web has evolved from its static-text and graphics-based roots, video has become enormously popular.
Unfortunately, technology for web video has not yet embraced the nonlinear nature of the web; viewers are generally limited to a simple Play/Pause interface. As a result, videos on the web tend to be short, whether user-generated clips or excerpts of longer originals. For example, the typical video on YouTube is about two minutes long.
The wild success that web video has achieved in the last year has left behind the majority of existing video content—millions of videos longer than two minutes. This article describes how to address this issue by putting users in control to create a nonlinear experience consistent with the web.
Linear content consumption occurs when a viewer does not affect the sequence of a media presentation; the content producer is in total control of what the viewer sees first, second, third, and so on. The viewer may be able to affect the pacing of the presentation by pausing playback but, when he starts it again, it continues playing from the exact point at which he paused it earlier.
Television viewers and moviegoers are accustomed to watching linear video programming. Viewers typically sit back idly while the content comes to them; they cannot control the content or add to it. Only one sequence of events is possible: the one created by the producer of the video program.
Recently, technology advances have popularized the digital video recorder (DVR), which allows viewers to time-shift content easily; that is, to record scheduled programming and watch it later. Unlike video cassette recorders, DVRs can skip sections of a broadcast automatically, typically commercials. Millions of television viewers have now come to rely on this capability, pioneered by TiVo, and DVR usage continues to grow rapidly. In effect, the DVR brings to television a first taste of the nonlinear capabilities that web users take for granted.
Ironically, the majority of web users viewing a video online today must consume that video linearly. The predominant user interface for watching a web video, the Play/Pause interface, provides only a Play/Pause button and a slider to move the playhead through the video. Figure 1 shows the video player for YouTube.

Figure 1. Play/Pause web video interface as implemented on YouTube
Users start the video by clicking the Play button. They can change the sequencing of the video presentation by moving the slider back and forth. In practice, however, users rarely do this because a slider alone doesn't convey any information about what content lies at different points in the video. Sliding the playhead to a different point in a video is a fairly useless activity because nothing indicates what will appear at that new point. So while the common Play/Pause interface does afford some degree of random access to a video, it doesn't offer the information that users need to consume a video nonlinearly.
Nonlinear content consumption, on the other hand, occurs when the viewer takes an active role in controlling the sequencing of the media presentation. For some period of time (seconds or minutes), the content plays sequentially as created by the producer but, periodically, the viewer takes an action to change the sequence of events.
Streaming video allows the user to watch any part of the video at any time. With progressive download, the video file is downloaded to the user's computer sequentially, from beginning to end. The user can start viewing the video while download occurs—that is, before the file is completely downloaded.
Streaming video is more appropriate for nonlinear consumption than progressive download. Only with streaming can users access any point in a video as soon as they open the web page with the video. Nonetheless, even though streaming is appropriate for nonlinear consumption, video delivered in this manner has been consumed linearly until now. This is because all videos, whether delivered via streaming or progressive download, use the simple Play/Pause interface shown previously in Figure 1.
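The difference between the two delivery methods can be made concrete with a minimal sketch. The class and field names below are hypothetical and are not part of any player API; the sketch only models the rule described above, that streaming can reach any point immediately while progressive download can reach only what has already arrived.

```python
from dataclasses import dataclass


@dataclass
class PlaybackState:
    """Hypothetical snapshot of a player; field names are illustrative."""
    delivery: str              # "streaming" or "progressive"
    duration_s: float          # total length of the video in seconds
    downloaded_s: float = 0.0  # seconds already fetched (progressive only)


def can_seek(state: PlaybackState, target_s: float) -> bool:
    """Return True if the playhead can jump to target_s right now."""
    if not 0 <= target_s <= state.duration_s:
        return False
    if state.delivery == "streaming":
        # The server can start sending from any requested offset.
        return True
    # Progressive download arrives front to back, so only the part
    # that has already been downloaded is reachable.
    return target_s <= state.downloaded_s
```

For a two-hour video of which only five minutes have downloaded, a jump to the 1:55:07 mark succeeds in the streaming case and fails in the progressive case.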
Web users are as active and in control of their content consumption as television viewers and moviegoers are passive and powerless. There are many reasons for this difference in behavior.
First, the ergonomics of web users differ from those of TV/movie viewers. Web users sit leaning forward, less than two feet from the monitor, with one hand perched on a mouse and the other on a keyboard, devices designed with the sole intention of allowing users to control their experience.
Second, web users are faced with numerous opportunities inside the browser to click away from any given web page, including the Back button, a search box, and numerous links on the page. Indeed, there are so many choices simply one click away that web users spend, on average, only a few seconds on each page.
Third, web users frequently interrupt their consumption of web content by paying attention to things outside their web browser: other applications on their computer and distractions outside their computer. On their computer, web users frequently hop between their web browsers and multiple applications (e-mail, chat, word processor, and the like).
Besides computer distractions, web users frequently interrupt their web activities to talk on the phone or to watch television. According to a survey by Universal-McCann, 18–49 year-olds in the US spend 20 percent of their total weekly TV-watching time simultaneously using a computer. In another study by Burst Media, 64 percent of college students reported that they watch television while using their computer.
A useful way to conceptualize web users' behavior with a web browser is in terms of whether the user or the content is in control. When users are in discovery mode—scanning elements on a page, clicking links or controls, or performing a text search—they are in control. When users view content in the exact order that it is presented to them—reading text word for word or watching a video from beginning to end without clicking—the content is in control.
Considerable evidence suggests that web users are almost constantly in discovery mode when they use a web browser. They rarely surrender control to content, and then for only 15 or 30 seconds at a time and only when the content is particularly compelling. Otherwise, they're discovering: either looking for new, interesting web content or trying to get their bearings after an interruption that pulled them away from their browser.
The web is a nonlinear medium because of the user interface of web browsers. The underlying protocol of the web, HTTP, is a protocol for transferring files on request, which gives it the potential to enable nonlinear consumption. Web browsers augment HTTP with an easy-to-use interface for hypertext links, buttons, forms, and so on to make the nonlinear experience happen.
While browsers make it possible to consume text and graphics nonlinearly, the fact that video is viewed in a web browser does not necessarily imply that it is consumed nonlinearly as well. In fact, as I previously explained, the common Play/Pause interface for web video does not provide a nonlinear experience.
Everyone knows that the web is a nonlinear medium but many people—television and movie industry executives in particular—believe that "compelling" linear content can counteract this fact. Those who harbor this notion grossly underestimate the strength of web culture.
Nonlinear content consumption is now more than just a habit; it has become a vital part of the culture of those on the web. This is particularly true of people in their teens and twenties. Burst Media's recent study reports that "about 34 percent of college students...reported spending more than 10 hours a week online, while only around 19 percent say they devote at least the same amount of time to watching television or listening to the radio." Members of this generation of Americans, born after The Godfather and Star Wars were made, are much more likely to "lean forward" than "lean back," regardless of the content they're consuming, and regardless of where they are. This also means they're likely to have a pointing device in their hand wherever they are (a mouse if they're at a desk, a video game controller if they're sitting on a couch) and that they'll use it frequently.
Those who belong to older generations, especially Generation X and Baby Boomers, are as likely to lean forward as lean back because they spend about as much time watching television as they spend online. When they're sitting in front of a computer, however, they're much more likely to lean forward and discover with their mouse and keyboard.
The web is a nonlinear medium that has spawned a nonlinear culture; yet videos are viewed mostly linearly. If web videos are linear and the web is nonlinear, how can it be that the popularity of web video has exploded in the past year?
Videos on the web have achieved this wild success "on one leg," so to speak. The explosion of broadband created the opportunity. Web videos have become extremely popular despite the absence of a nonlinear viewing interface. Instead, almost all interface development has gone into the nonlinear experience (in the HTML or Ajax) that surrounds the video.
In order to compensate for the shortcomings of the Play/Pause interface, video websites like YouTube emphasize videos of very short duration. One minute or less is the norm. For videos this short, watching linearly is a satisfying experience for most web users.
Unfortunately, this approach puts an enormous amount of control in the hands of one person (the video's editor) and takes control away from users. Most videos don't start out as 30-second events. Rather, an editor cuts up a longer video presentation—a football game, political speech, or TV episode—into short sound bites. These preselected edits preclude web users from identifying sets of clips they find interesting on their own. Users are left with one person's point of view.
But today's web is all about offering users the fruits of multiple perspectives. This is the secret of the success of Wikipedia, blogs, and social networking sites like del.icio.us and digg.com. Certainly, not all users want to identify and comment on interesting points in a video but more people want to do this than are currently able to. There is a very active, vocal subset of the total web population that wants to get involved in content rather than just consume it. Witness what bloggers have done for news reporting. Twenty years ago, practically all news reporting was done by a handful of powerful news organizations. Today thousands of blogs perform an integral function in the news delivery ecosystem.
Bloggers uncover news that wouldn't be uncovered by conventional news organizations. To cite just one example, in 2004 a blogger discovered that a 60 Minutes episode alleging that President George W. Bush received special treatment from the National Guard was based on fraudulent documents. Other bloggers report from parts of the world where mainstream news organizations either do not cover the news adequately or cannot because of threats to safety.
Play/Pause video players are not well suited for bloggers to comment on particular sections of a video. It is very difficult, if not impossible, to provide a hypertext link from a blog comment to a specific point in a video shown in a Play/Pause video player.
The web needs a nonlinear player. In the remainder of this article, I describe an alternative approach to consuming and commenting on web video, overcoming the limitations of existing web video technology.
To offer a nonlinear user experience for viewing video on the web, two key components are necessary:
Figure 2 demonstrates how this can be accomplished. In this example, the Click.TV player shows the 2006 FIFA World Cup Final between Italy and France.

Figure 2. Video player with user-generated comments
The next sections describe two different ways to make web video a nonlinear experience.
As I mentioned earlier, it is important to provide the capability for web users to comment on, or blog about, web video. Most video players on the web today make this task awkward because there is no way to tie comments to specific points in time in the video.
Unlike the Play/Pause interface, Click.TV allows visitors to comment inside the video. These comments also serve as hyperlinks into the video because users can click on a comment to go instantly to the appropriate point in the video. Flash Media Server makes it possible to serve up that portion of the video instantly. Users view comments that others make about what's happening inside a video and they "talk back," engaging in a digital conversation about that event in the video.
To exploit the nonlinear nature of video viewing further, the Click.TV player permits users to navigate to places in the video by hovering over the video itself, revealing a matrix of dots over the video, as shown in Figure 3. Each row represents a track, which is a collection of comments, and each dot represents a comment. Clicking a dot in the matrix is another way of moving the video to a comment.
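One way to picture the matrix is as a mapping from track names to time-ordered comments, where a click on a dot resolves to a playhead position. The sketch below is illustrative only: the `Comment` fields, the function names, and the choice to order dots by start time are assumptions, not a description of the Click.TV implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    track: str     # row of the matrix this comment belongs to
    start_s: int   # point in the video the comment links to, in seconds
    text: str


def build_matrix(comments):
    """Group comments into rows (tracks), each row sorted by start time."""
    rows = defaultdict(list)
    for c in comments:
        rows[c.track].append(c)
    return {track: sorted(row, key=lambda c: c.start_s) for track, row in rows.items()}


def seek_target(matrix, track, dot_index):
    """Return the playhead position for a click on one dot."""
    return matrix[track][dot_index].start_s
```

Clicking the first dot in a row would then move the playhead to the earliest comment in that track.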
The Click.TV approach allows web users to navigate around the video instantly via comments entered by the producer of the video, subject matter experts, or even other users. Certainly users can also click Play and just lean back and watch in linear fashion because Play/Pause is part of Click.TV, but "lean-forward" web users can decide to jump to specific portions of the video presentation and take control of it.
With hypertext video browsing, described previously, the user clicks to a specific point in a video, after which the video plays linearly. With dynamic video editing, however, the user plays a new version of a video by playing a sequence of comments created by other users. Comments in Click.TV are like video clips with text attached to them; they have "start points" and "end points." Thus in dynamic video editing the user plays multiple comments, one after the other. The result is a new video presentation assembled from parts of the original.
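The idea of playing comments one after the other can be sketched as a small playback plan. The `Clip` type and the "seek" and "play" step names are hypothetical, and playing clips in chronological order is an assumption made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    start_s: int   # comment "start point", seconds from the beginning
    end_s: int     # comment "end point"
    text: str


def playback_plan(clips):
    """Yield (action, seconds) steps that play each clip and skip the gaps."""
    for clip in sorted(clips, key=lambda c: c.start_s):
        yield ("seek", clip.start_s)               # jump over unselected video
        yield ("play", clip.end_s - clip.start_s)  # play linearly to the end point
```

Each "seek" step is where a nonlinear player departs from Play/Pause behavior: the video between one clip's end point and the next clip's start point is never shown.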
Users can take advantage of dynamic video editing in a number of ways.
A user can play another user's track, in effect playing that other user's "highlight reel" (see Figure 3).

Figure 3. Playing the "edman" track
We see from the matrix overlaying the video that the user is playing the "edman" track, a collection of comments created by the user edman. The full information about each comment is shown in the comments panel below the video. The comment currently playing is at the top of that panel. Once it is played from 1:55:07 to 1:55:17 (a 10-second clip), the next comment plays from 1:57:34 to 1:57:44, skipping the video between these two clips. This goes on until the last commented clip is played.
By playing the edman track, the user is playing a compressed version of the video as created by edman. The video is over two hours long but the edman track is less than two minutes long. Users can play other tracks, such as "Mike Lanza" in Figure 3, to get another person's perspective of the World Cup Final.
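The arithmetic behind this compression is easy to check using the two clips quoted above. The helper name `to_seconds` is illustrative, and only those two clips are used because the rest of the track is not listed in the text.

```python
def to_seconds(timecode: str) -> int:
    """Convert an "H:MM:SS" timecode to seconds."""
    hours, minutes, seconds = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# The two clips quoted in the text; the rest of the track is not listed.
clips = [("1:55:07", "1:55:17"), ("1:57:34", "1:57:44")]

clip_lengths = [to_seconds(end) - to_seconds(start) for start, end in clips]
skipped = to_seconds(clips[1][0]) - to_seconds(clips[0][1])

print(clip_lengths)  # [10, 10]
print(skipped)       # 137 seconds of video skipped between the two clips
```

Twenty seconds of playback stand in for roughly two and a half minutes of the original, which is how a track of short clips can condense a two-hour match into under two minutes.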
Finally, note that a user can click the EMAIL button to send the edman track to someone else in an e-mail or click the BLOG button to post it in a blog.
A Click.TV user can run a search on a video and view the search results as a track. In Figure 4, the user has run a search for the word "zidane" in the 2006 FIFA World Cup Final video and is viewing the Zidane search track. Shown are the eight comments that have that term, including France star Zinedine Zidane's penalty kick goal, his headbutt, and a few plays in between.
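A search track can be thought of as a filter over the comments followed by the same clip-by-clip playback. In the sketch below, the `Comment` fields and the function name are hypothetical, and case-insensitive substring matching is an assumption; the text does not say how Click.TV matches terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    start_s: int
    end_s: int
    text: str


def search_track(comments, term):
    """Return comments whose text contains term, in playback order."""
    needle = term.casefold()
    hits = [c for c in comments if needle in c.text.casefold()]
    return sorted(hits, key=lambda c: c.start_s)
```

Given a set of made-up comments, a search for "zidane" would keep only those mentioning him and order them by where they occur in the video, ready to be played as a track.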
As in the case of the edman track, this user can share the Zidane search track via e-mail or on a blog by clicking Share.

Figure 4. Playing the Zidane search track
The two examples of dynamic video editing mentioned previously require a mere click of a button. In the case of a mash-up of clips, such as the capability offered by Jumpcut.com, users can add clips from one or more videos into a list of clips. They can also drag those clips around to reorder them. This is similar to the kind of functionality offered by video editing programs like Adobe Premiere Pro and Apple iMovie or Final Cut Pro. Click.TV does not offer this functionality at the present time.
Figure 5 helps explain how the Click.TV player makes nonlinear video viewing possible.

Figure 5. Click.TV initialization architecture
The Click.TV player is initialized as follows:
Step 3: The user's browser encounters the Click.TV code inside the web page’s code (see the following sample <object> tag) and finds a reference to the Click.TV player, which is a SWF file:
<object type="application/x-shockwave-flash" data="http://www.click.tv/ctss.swf/?rtmpurl=rtmp://www.click.tv/movies/Italy_France.flv& ... mid=06949f00870420c50501e82e01f1c9df"> ... </object>
It requests the player from the Click.TV server.
The player reads the rtmpurl parameter in the Click.TV code (sample above) and uses it to request a video stream from Flash Media Server.
The player reads the mid parameter (movie ID) in the Click.TV code (sample above) and uses it to request comment data from the Click.TV server.
Once initialized, the Click.TV player continues to communicate with Flash Media Server and the Click.TV server. It streams video from Flash Media Server when the video is playing or when the playhead is moved. It sends comment data to the Click.TV server when the user saves a new or edited comment.
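The parameter handling described above can be illustrated with a short sketch. The actual player is a SWF file, so Python stands in here purely for illustration; the URL is a shortened form of the sample, with the elided parameters left out rather than guessed.

```python
from urllib.parse import parse_qs, urlsplit

# Shortened form of the sample URL: the parameters elided ("...") in the
# sample are left out here rather than guessed.
player_url = (
    "http://www.click.tv/ctss.swf/"
    "?rtmpurl=rtmp://www.click.tv/movies/Italy_France.flv"
    "&mid=06949f00870420c50501e82e01f1c9df"
)

params = parse_qs(urlsplit(player_url).query)
rtmp_url = params["rtmpurl"][0]   # where to request the video stream
movie_id = params["mid"][0]       # key for requesting comment data

print(rtmp_url)  # rtmp://www.click.tv/movies/Italy_France.flv
print(movie_id)  # 06949f00870420c50501e82e01f1c9df
```

The two values go to different servers: the stream address to Flash Media Server and the movie ID to the Click.TV server.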
Perhaps the key feature of the Click.TV architecture is that the video file is stored totally separately from the Click.TV player and comment data, so that they need only come together on the user’s computer. Thus, all that needs to be done to put a video into the Click.TV player is to drop the proper code on a web page and reference the video file in a parameter. Integration is really that simple.
To read more about the issue of making long-form content work on the web, see the following blogs:
The following resources can help you get up to speed with Flash video and Flash Media Server:
Mike Lanza is the former CEO of Click.TV. Prior to that, Mike founded and served as CEO of four software companies, the last two in online finance: 1 View Network, which he sold to Digital Insight, and Avolent Corporation. In addition to his software and Internet experience from his previous companies, Mike holds an MBA from Stanford University, as well as MA and BA degrees in Economics from Stanford. You can contact him at mlanza@click.tv.