23 November 2009
Requirements: external code libraries; drawing and rendering programs (optional)
I developed the idea for animated business cards when I read about a new concept that utilizes AR (augmented reality) technology to display 3D animations on a screen based on 2D block patterns. The finished product allows you to create business cards with unique color codes printed on the back. When you hold the color code up to a web camera, an animated character appears on the screen and starts talking! We named these digitally enhanced business cards BOW cARds. You can see how they work by watching this YouTube video we made about our augmented reality business cards.
This article describes the process that our team used to develop the BOW cARds. We began working with AR content in 2008, just after we formed our agency, BOW. It was then that our team decided to focus on creating business cards that display animations using a standard laptop with a webcam, because we realized that business cards are a natural vehicle for distributing printed 2D marker patterns.
We decided to use the FLARToolKit, which was growing in popularity at the time. We completed our first round of testing without encountering any issues. The next step involved creating three unique 3D models that appeared onscreen when the camera captured the corresponding marker patterns. With our proof of concept complete, we embarked on a full-scale development cycle to create interactive business cards so that people could customize their avatars and exchange them.
We faced two obstacles in turning this idea into reality. In order to make the animated avatars appear, all of the marker patterns that can be captured by the camera must be registered with the standard multiple-marker recognition mechanism when the FLARToolKit is initialized. (To accomplish this, we used the FLARMultiMarkerDetector, which is included in the FLARToolKit.) This limitation makes it difficult to add marker patterns dynamically, and it was clear that even if this became possible later, performance would suffer as the number of registered patterns increased.
Instead of searching for registered patterns, we developed a strategy to introduce common principles into the recognition patterns. Then we scanned these patterns into the system in order to identify the business cards and display unique, customized 3D models.
The process of extracting the data by scanning the marker patterns is similar to the process used to scan barcodes or other two-dimensional codes. For a while, we considered using QR codes, but we found that they were too finely detailed to be recognized reliably by low-resolution web cameras.
After conducting many tests, we concluded that QR codes are not suited to AR use when the receiving camera has to be able to move freely. It was imperative that the business card codes could be recognized reliably on any laptop, so we started developing our own proprietary design. As we defined our code system, we considered the following aspects:
As we investigated different configurations of marker patterns, we discovered that monochrome codes make it easier to maintain recognition accuracy, while colored blocks make it possible to increase the amount of data stored in each cell.
After another round of testing, we decided to use four-color codes in the marker patterns. This approach ensures that the pattern remains as simple as possible while reducing the number of grid squares. We also based our color patterns on the appearance of generic business cards—printing the designs with 100% CMYK colors to ensure that a clear contrast could be produced using a consumer-grade home printer.
When we studied the issue of orientation identification, we found that markers captured by a camera must be recognized even when rotated by 90, 180, or 270 degrees. If the top edge of the marker cannot be identified, the code cannot be decoded, and it is impossible to determine the direction in which to display the 3D image. We decided to follow a simple rule for all patterns: in every marker, of the four corner cells, the top-left cell is always cyan and the remaining three corners are always black or yellow. This makes it easy to establish the correct orientation of a marker captured by the webcam.
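As a rough sketch of how this corner rule resolves orientation — written here in TypeScript with illustrative names and cell values, not BOW's actual ActionScript — a scanner can simply rotate the captured grid until the cyan cell lands in the top-left corner:

```typescript
// Hypothetical sketch of the orientation rule described above: rotate the
// captured 5x5 grid until the cyan cell sits in the top-left corner.
// The Cell type and helper names are our own illustration.

type Cell = "C" | "M" | "Y" | "K"; // cyan, magenta, yellow, black

// Rotate a square grid 90 degrees clockwise.
function rotate90<T>(grid: T[][]): T[][] {
  const n = grid.length;
  const out: T[][] = Array.from({ length: n }, () => new Array<T>(n));
  for (let r = 0; r < n; r++)
    for (let c = 0; c < n; c++)
      out[c][n - 1 - r] = grid[r][c];
  return out;
}

// Try all four orientations; return the one whose top-left corner is cyan,
// or null if no orientation satisfies the rule (i.e., a scan error).
function normalizeOrientation(grid: Cell[][]): Cell[][] | null {
  let g = grid;
  for (let i = 0; i < 4; i++) {
    if (g[0][0] === "C") return g;
    g = rotate90(g);
  }
  return null;
}
```

Because exactly one corner may be cyan, at most one of the four rotations passes the check, so the marker's top edge is unambiguous.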
In order to implement error detection and correction, we included additional data as a mechanism for recognizing errors (detection) and restoring original data (correction) whenever a portion of the data is not correctly captured. When we tested the recognition of prototype markers, there were occasional failures in reading the color of a cell, but the error correction proved effective. We discussed numerous ways to perform error correction, but in the end we decided to use Hamming code because of its ease of implementation and fluid operation.
Finally, we determined the number of grid squares to use. We began by weighing where the codes would be used against the volume of data we wanted to store. After testing several iterations, we settled on a 5 × 5 grid that stores 33 bits of information: after excluding the fixed cyan orientation corner, the 25 cells yield 45 bits of raw data (the three remaining corner cells are limited to black or yellow and carry 1 bit each, while the other 21 cells carry 2 bits each, for 3 + 42 = 45 bits), and three blocks of a (15,11) Hamming code turn those 45 coded bits into 33 data bits. Testing confirmed that this configuration provides sufficient capacity to represent a business card ID.
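The (15,11) Hamming code itself is a standard construction; a textbook single-error-correcting sketch (in TypeScript, not BOW's actual implementation) looks like this:

```typescript
// A minimal (15,11) Hamming encoder/decoder, the code family the article
// names. Bits are kept as 0/1 arrays for clarity; three such blocks cover
// the marker's 45 coded bits / 33 data bits.

// Encode 11 data bits into a 15-bit codeword. Using 1-based positions
// 1..15, the parity bits sit at the power-of-two positions 1, 2, 4, 8.
function hammingEncode(data: number[]): number[] {
  const code = new Array(16).fill(0); // index 0 unused
  let d = 0;
  for (let i = 1; i <= 15; i++) {
    if ((i & (i - 1)) !== 0) code[i] = data[d++]; // non-power-of-two: data bit
  }
  for (const p of [1, 2, 4, 8]) {
    let parity = 0;
    for (let i = 1; i <= 15; i++) {
      if (i !== p && (i & p)) parity ^= code[i];
    }
    code[p] = parity;
  }
  return code.slice(1);
}

// Correct up to one flipped bit and return the 11 data bits.
function hammingDecode(received: number[]): number[] {
  const code = [0, ...received];
  let syndrome = 0;
  for (const p of [1, 2, 4, 8]) {
    let parity = 0;
    for (let i = 1; i <= 15; i++) if (i & p) parity ^= code[i];
    if (parity) syndrome |= p;
  }
  if (syndrome) code[syndrome] ^= 1; // the syndrome is the error position
  const data: number[] = [];
  for (let i = 1; i <= 15; i++) {
    if ((i & (i - 1)) !== 0) data.push(code[i]);
  }
  return data;
}
```

With this scheme, a single misread cell color within a block — one flipped bit — is detected and corrected transparently, which matches the occasional per-cell failures the prototype testing showed.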
Next, our team focused on developing the color code specification. We came up with the idea of giving each team member's color code a visual meaning. To begin, we designed several dot-art images that fit into the 5 × 5 cells. However, because these images were drawn freehand, they did not decode to valid IDs the way properly encoded patterns do. To resolve this issue, we created a special tool that, during the drawing process, displays the decoded and re-encoded color code alongside the work in progress; we then adjusted the marker colors until the left and right pictures were identical. As a result of this process, my coworkers Yoshikawa and Takahashi ended up with smiley faces on their cards, while Ishizaki had an "Ishi (stone)" character on his card (see Figure 1).
Since business cards are a tool for networking and making connections, we came up with the idea to make the avatars interact with each other and say hello. When only one business card is scanned, the avatar usually just stands motionless on the business card. We wanted to take this a step further, so that when you place two business cards side by side and scan them, they rush over to say hello.
In other projects, you could use the orientation matrices that FLARToolKit calculates from a marker to determine when another avatar is present, but this method did not work for us. Testing confirmed that when two business cards are placed next to each other in real space, the slightest misidentification often returned two orientation matrices that didn't mesh at all. We decided to abandon 3D determination because it was not well suited to our goals.
We resolved this issue by calculating the screen coordinates of the four corners of each marker to produce a "view box" in 2D. Once the view box is established, the program can determine whether or not the avatars exist within the boundary of the markers (see Figure 2).
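A minimal sketch of that 2D containment test — assuming the four corner screen coordinates are already available from the marker detector, and with illustrative names throughout:

```typescript
// Given a marker's four projected corners, decide whether a screen point
// (e.g., another avatar's position) falls inside the resulting "view box".
// For a convex quad this is a sign test on the cross product along each edge.

interface Pt { x: number; y: number; }

// Returns true if p lies inside the convex quad whose corners are given
// in winding order (either direction works).
function insideQuad(corners: [Pt, Pt, Pt, Pt], p: Pt): boolean {
  let sign = 0;
  for (let i = 0; i < 4; i++) {
    const a = corners[i], b = corners[(i + 1) % 4];
    const cross = (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
    if (cross !== 0) {
      const s = Math.sign(cross);
      if (sign === 0) sign = s;
      else if (s !== sign) return false; // p is on the wrong side of an edge
    }
  }
  return true;
}
```

Because the test works entirely in screen space, it sidesteps the inconsistent 3D orientation matrices that two adjacent markers produced.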
Next, we added functionality to make the avatar speak. We accomplished this by displaying messages sent by the business card owner over Twitter. When a second avatar appears onscreen, the Twitter messages sent by its owner are displayed (see Figure 3).
As we defined the scope of our project further, we decided to create an online system that allows site visitors to control the design of their own avatar. We felt that personalization would make the avatars more likable, and encourage more users to participate in the BOW cARd project.
Unlike typical FLARToolKit applications, BOW cARd calls the 3D drawing API in Adobe Flash Player 10 directly instead of using Papervision3D. We made this choice to achieve more fluid operation, and also to keep our options open for integrating new functionality in future versions. We weighed the rendering design carefully, balancing the strengths and limitations of the Flash drawing API.
We realized that calling the Graphics.drawTriangles method multiple times erodes its performance benefits and complicates z-sorting across calls, so each avatar is drawn with a single call. The 3D modeling of the avatars is carried out using Metasequoia, and then we use a Perl script to convert the MQO file into an ActionScript 3 class file.
All of the body part variations (including the torso, legs, and ears) are stored as a single vertex vector. During rendering, only the vertex indices for the necessary parts are merged, z-sorted, and passed to the drawTriangles method. This approach makes it possible to achieve a wide range of different characteristics when displaying the customized avatar models (see Figure 4).
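The merge-then-sort step might be sketched like this — in TypeScript with illustrative names, whereas the real code builds input for Graphics.drawTriangles in ActionScript 3:

```typescript
// Every part variation shares one vertex list; the selected parts
// contribute their triangle indices, and the merged triangles are
// z-sorted so a single drawTriangles-style call renders them correctly.

interface Part { name: string; indices: number[]; } // index triples into the shared list

// vertices: flat [x, y, z, x, y, z, ...] for ALL part variations.
// Returns one index list for the selected parts, sorted back to front.
function buildDrawIndices(vertices: number[], parts: Part[], selected: string[]): number[] {
  const tris: number[][] = [];
  for (const part of parts) {
    if (!selected.includes(part.name)) continue;
    for (let i = 0; i < part.indices.length; i += 3) {
      tris.push([part.indices[i], part.indices[i + 1], part.indices[i + 2]]);
    }
  }
  // Painter's algorithm: sort by average z of the three vertices,
  // farthest triangles first so nearer ones are drawn over them.
  const depth = (t: number[]) =>
    (vertices[t[0] * 3 + 2] + vertices[t[1] * 3 + 2] + vertices[t[2] * 3 + 2]) / 3;
  tris.sort((a, b) => depth(b) - depth(a));
  return tris.flat(); // pass this as one batch to the draw call
}
```

Swapping a body part then means swapping which index ranges are merged, while the vertex data stays untouched.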
Every day, the number of custom avatars using the BOW cARd technology increases, but the range of individuality that can be displayed by simply combining body parts is limited. To make the avatars more customizable, we decided to add a picture drawing function. It is common practice to display the texture as an unwrapped (development) view and let users draw onto that area. But for this project, we decided to let users draw directly onto the avatars appearing in 3D space.
The first step of the drawing process is identifying the mouse coordinates. Next, each polygon is scanned, beginning with those closest to the camera. If the mouse coordinates fall within the screen-space triangle formed by a polygon's three vertices, the texture coordinates are obtained by interpolating the UV values of those three vertices at the mouse position.
We arranged for this processing to run immediately after the avatar is rendered. Because z-sorting has just completed, the polygons are already lined up in order of proximity to the camera, and because the screen coordinates of each polygon's three vertices have already been calculated, the polygon scan can be performed with extremely low overhead.
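The polygon scan and UV interpolation can be sketched with barycentric weights — again an illustrative TypeScript reconstruction, not the production ActionScript:

```typescript
// Walk the triangles front to back, find the first whose screen-space
// triangle contains the mouse point, and interpolate the UVs of its three
// vertices with barycentric weights to get the texture coordinate to paint.

interface P2 { x: number; y: number; }
interface ScreenVert { x: number; y: number; u: number; v: number; }

// Barycentric coordinates of p in triangle (a, b, c); null if degenerate.
function barycentric(a: P2, b: P2, c: P2, p: P2): [number, number, number] | null {
  const det = (b.y - c.y) * (a.x - c.x) + (c.x - b.x) * (a.y - c.y);
  if (det === 0) return null;
  const w0 = ((b.y - c.y) * (p.x - c.x) + (c.x - b.x) * (p.y - c.y)) / det;
  const w1 = ((c.y - a.y) * (p.x - c.x) + (a.x - c.x) * (p.y - c.y)) / det;
  return [w0, w1, 1 - w0 - w1];
}

// Triangles are assumed already sorted nearest-to-camera first (the order
// the z-sort leaves them in right after rendering). Returns the texture
// coordinate under the mouse, or null if no triangle is hit.
function pickUV(
  tris: [ScreenVert, ScreenVert, ScreenVert][],
  mouse: P2
): { u: number; v: number } | null {
  for (const [a, b, c] of tris) {
    const w = barycentric(a, b, c, mouse);
    if (w && w[0] >= 0 && w[1] >= 0 && w[2] >= 0) {
      return {
        u: w[0] * a.u + w[1] * b.u + w[2] * c.u,
        v: w[0] * a.v + w[1] * b.v + w[2] * c.v,
      };
    }
  }
  return null;
}
```

Scanning nearest-first means the hit triangle is automatically the visible one, so no separate occlusion test is needed.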
By brainstorming and prototyping the functionality, we developed the strategy that forms the basis of the direct drawing mechanism. We did not consider the reverse transformation of each avatar's projection, so there are times when the drawing on the avatar model doesn't appear exactly as intended—but drawing directly onto the image is definitely more fun. And this new feature allows users to truly customize each avatar to create unique personalities (see Figure 5).
As you can see, our initial idea to create innovative business cards kept developing. Before we knew it, we incorporated a whole variety of other customization options.
There are many other features that we are planning to add. For example, we hope to introduce a commemorative photo function that works in conjunction with Flickr. We are also developing an Apple iPhone app that allows you to create marker patterns without a printer. We invite you to visit our site and create your own business card avatar.
To learn more about working on AR projects, read Samuel Asher Rivello's article, Augmented reality using a webcam and Flash. Also be sure to find more tutorials and ActionScript 3 sample projects in the Flash Developer Center.
Note: This article appeared originally in the Japanese edition of the Edge newsletter.