13 February 2005
Note: This article was written for Dreamweaver 8. However, the content in general is still valid.
You may have been introduced to XML recently in a variety of ways, and it can be difficult to glean quickly how the language applies to your own projects and workflows. Although there's plenty of material available online to help you learn the basics of XML, I'd like to take a step back and look at what XML is (and is not) and how it's used in a wide range of applications, from consumer to developer.
As connected applications gain a greater foothold in the marketplace and require standard means of communicating with one another, XML will play a key role in brokering this information exchange. In some ways, the revolution has already quietly taken hold. Let's take a look.
At its core, XML (eXtensible Markup Language) represents a standardized framework for defining markup languages, along with an associated set of generic tools for processing XML-based documents. It's a way for you to define structured information of all kinds—content, object data, inter-application messages, or syndicated content summaries.
RSS (Really Simple Syndication), arguably the most popular content syndication format on the web today, is simply an XML-based schema. XHTML is essentially the HTML markup language you're already familiar with, adjusted to conform to XML structure, syntax, and validation, so it is yet another XML-based language. WML is the XML-based markup for WAP services. The list goes on. By being built upon XML, these case-specific schemas inherit XML's rich infrastructure for free. No complaints here!
Outside of the many structured language implementations of XML, the most obvious role of XML as a stand-alone "file format" is to represent data apart from visual markup. But why would you want to store content in XML as opposed to a database? From one point of view, XML is a more neutral and accessible container for your data, one that requires fewer database adapters and less local configuration. Although XML is also commonly streamed from a connected application, serving as an environment-neutral communications vehicle, it is designed to be equally effective stored as a flat file.
XML is very effective at providing aggregate views of content, such as syndicating a range of documents. RSS is one format widely used for syndication, and in most cases it is autogenerated by the system managing the site content. Weblog frameworks like Movable Type and Blogger generate and update a static RSS file on your server whenever a new post is published, which RSS-savvy clients can then read, in most cases applying an internal stylesheet to lay out the structured content of the feed. Here XML is not serving as a replacement for a database but as a specific view of the data in it, one that a much wider variety of clients can read and process than a fixed web application could.
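Because an RSS feed is ordinary XML, a generic XML parser can read it without any RSS-specific library. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and a tiny, made-up RSS 2.0 feed (the titles and example.com URLs are hypothetical), that lists each item's title and link:

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical RSS 2.0 feed; titles and URLs are made up for illustration.
FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <link>http://www.example.com/</link>
    <item>
      <title>First post</title>
      <link>http://www.example.com/posts/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://www.example.com/posts/2</link>
    </item>
  </channel>
</rss>"""

def feed_items(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    # Items live under <rss><channel>; the relative path walks that structure.
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.findall("./channel/item")]

for title, link in feed_items(FEED):
    print(title, "->", link)
```

Nothing in the parsing code knows about RSS; it relies only on the element structure the feed's schema defines.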
As I noted earlier, if XML has become the standard for representing web-based information, then you need a way both to turn generic XML data into other XML-based documents and to format that information visually for web browsers, mobile phones, and so on. XML transformation lets you define rules that convert one class of XML document into another, and by using generic XML tools (XSLT, XQuery, and XPath) you can build reasonably complex aggregate views of your content.
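Python's standard library has no XSLT processor, so the sketch below is not XSLT; it hand-codes the same idea (match elements in one class of document, emit elements of another) using xml.etree.ElementTree. The `<message>` input and the function name `message_to_xhtml` are hypothetical, chosen only for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical source document in a simple message format.
SOURCE = "<message><from>Jane Doe</from><body>Are the files finished yet?</body></message>"

def message_to_xhtml(xml_text):
    """Map one class of XML document (<message>) onto another (an XHTML fragment).

    These hand-written rules stand in for an XSLT stylesheet: each
    source element is matched and rewritten as a target element.
    """
    src = ET.fromstring(xml_text)
    div = ET.Element("div", {"class": "message"})
    # Rule 1: <from> becomes a heading.
    ET.SubElement(div, "h2").text = "From: " + src.findtext("from", default="")
    # Rule 2: <body> becomes a paragraph.
    ET.SubElement(div, "p").text = src.findtext("body", default="")
    return ET.tostring(div, encoding="unicode")

print(message_to_xhtml(SOURCE))
```

In a real workflow these rules would live in an XSL document, so they can change without touching code or content.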
The structure of an XML document is hierarchical: a labeled and ordered tree of data. Most nodes of this tree are one of two types: character data nodes, which contain text strings, and element nodes, which correspond to the structured data of your document. An element node consists of a name identifying the element's type and a set of attributes that further describe it.
Think of the HTML <img> tag, for example, which carries several attributes that further define it: height, width, alt text, border size, and so on. XML trees contain other kinds of nodes as well, such as comments, processing instructions (directions to the application processing the document), and, probably most importantly, a reference to the schema to which your document conforms. The schema lets consumers of the document know both how it's to be parsed and what constitutes its validity.
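To see the tree concretely, the short sketch below parses a hypothetical snippet with Python's xml.etree.ElementTree and prints each element's name, attributes, and text, indented by depth. Note that ElementTree's default parser discards comments and processing instructions, so only element and text data appear:

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet: one element with attributes, in the spirit of <img>.
DOC = '<page><img src="logo.png" width="120" height="40" alt="Logo"/><p>Hello</p></page>'

def describe(element, depth=0):
    """Yield one line per element: its name, its attributes, and any text it holds."""
    text = (element.text or "").strip()
    yield "  " * depth + f"{element.tag} {element.attrib}" + (f" text={text!r}" if text else "")
    for child in element:  # children are visited in document order
        yield from describe(child, depth + 1)

lines = list(describe(ET.fromstring(DOC)))
for line in lines:
    print(line)
```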
Unlike HTML, which has a somewhat lax structure despite being a markup language itself, XML is a much stricter environment and requires its documents to be well-formed. So although you may not use XML directly in practice today, it's important to at least become comfortable with its conventions and start writing well-formed code early on.
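How strict is "well-formed"? A minimal sketch, assuming Python's xml.etree.ElementTree, which raises ParseError on any well-formedness violation. The sample strings are hypothetical examples of lax HTML habits that browsers tolerate but an XML parser rejects:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Each sample maps to whether an XML parser should accept it.
samples = {
    "<p>Closed<br/></p>": True,       # empty element written as <br/>
    "<p>Unclosed<br></p>": False,     # <br> is never closed
    "<b><i>Overlap</b></i>": False,   # improperly nested tags
    "<img src=logo.png/>": False,     # unquoted attribute value
    "<P>Case</p>": False,             # tag names are case-sensitive
}
for text, expected in samples.items():
    print(f"{text!r:28} well-formed={is_well_formed(text)} (expected {expected})")
```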
The actual structure of an XML document is straightforward. The following XML document is the data representation of a message from Jane to John:
```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<message>
  <from>Jane Doe</from>
  <to>John Doe</to>
  <date>February 14, 2006</date>
  <body>Are the annual report files finished yet?</body>
  <priority>high</priority>
  <attachments>
    <attachment type="jpg">http://intranet.foo.com/files/file_1.jpg</attachment>
    <attachment type="pdf">http://intranet.foo.com/files/file_2.pdf</attachment>
  </attachments>
</message>
```
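Note first that this XML document starts with an XML declaration that defines both the XML version and the character encoding used within the document itself; in XML 1.0 the declaration is optional but recommended. The root element of this particular document is <message>, which is wholly appropriate given its content, and that <message> contains several child elements: <from>, <to>, <date>, <body>, <priority>, and <attachments>, which in turn holds one <attachment> element per file, each with a type attribute.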
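As a minimal sketch of consuming this document, assuming Python's standard xml.etree.ElementTree module, the code below loads Jane's message and pulls out individual fields. The document is supplied as bytes so the parser honors the ISO-8859-1 encoding named in the declaration:

```python
import xml.etree.ElementTree as ET

# Jane's message, as bytes so the declared ISO-8859-1 encoding is respected.
MESSAGE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<message>
  <from>Jane Doe</from>
  <to>John Doe</to>
  <date>February 14, 2006</date>
  <body>Are the annual report files finished yet?</body>
  <priority>high</priority>
  <attachments>
    <attachment type="jpg">http://intranet.foo.com/files/file_1.jpg</attachment>
    <attachment type="pdf">http://intranet.foo.com/files/file_2.pdf</attachment>
  </attachments>
</message>"""

root = ET.fromstring(MESSAGE)
print(root.tag)                                          # the root element's name
print(root.findtext("from"), "->", root.findtext("to"))  # direct children by name
print("Priority:", root.findtext("priority"))
for att in root.iter("attachment"):                      # every <attachment>, at any depth
    print(att.get("type"), att.text)
```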
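Syntax-wise, XML is strict in its interpretation, yet reasonably easy to learn and use. Here are a few guiding principles: a document has exactly one root element; every start tag needs a matching end tag, or must be written as an empty-element tag such as <br/>; elements must be properly nested, never overlapping; element and attribute names are case-sensitive; attribute values must always be quoted; and comments use the same syntax as in HTML (<!-- comment -->).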
In a nutshell, an XML file is simply a text file that wraps data into logical constructs. How those constructs are parsed is another matter. XML schemas (representing classes of documents like an RSS feed or WML deck) are the current model for describing the structure of information within a class of XML documents, defining the constraints that a particular XML-based language or implementation follows and the structure of the data encapsulated within. Without a schema, machine validation has no larger context in which to determine whether your data structure is valid or to translate that data into other forms.
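Python's standard library cannot validate against a real XML Schema, so the following is only a toy stand-in showing the idea: rules stated outside the document decide whether its structure is acceptable. The REQUIRED_CHILDREN table and check_structure function are hypothetical; an actual schema also constrains types, ordering, and cardinality:

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written stand-in for a schema: which children each root must contain.
REQUIRED_CHILDREN = {"message": ["from", "to", "body"]}

def check_structure(xml_text):
    """Return a list of problems; an empty list means the document passed."""
    root = ET.fromstring(xml_text)
    expected = REQUIRED_CHILDREN.get(root.tag)
    if expected is None:
        return [f"no rules for root element <{root.tag}>"]
    problems = []
    for name in expected:
        # Compare with None explicitly: an Element with no children is falsy.
        if root.find(name) is None:
            problems.append(f"<{root.tag}> is missing required <{name}>")
    return problems

print(check_structure("<message><from>Jane</from><to>John</to><body>Hi</body></message>"))
print(check_structure("<message><from>Jane</from></message>"))
```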
As you can see, XML not only lays out its data in a very clear, structured format, but it's also incredibly verbose and self-explanatory to the various clients and applications that may access it.
A good example of how XML can be used as a clean content model in a shipping product today is the new (and relatively undiscovered) XSLT Transform feature in Dreamweaver 8. You can read a more detailed overview of it in my blog posting, FOTD 22: Dreamweaver 8 – XSLT Authoring. A site could easily be constructed by combining an XML document that holds the content data (your page text, perhaps pointers to specific image files and their dimensions, site navigation, headers/footers, and so on) with an XSL (eXtensible Stylesheet Language) document and a CSS style sheet that together define how the page is visually constructed.
Why is this important? By separating your content from its presentation, you can work on each without affecting the other. For example, content teams could be updating the content and image pointers in the XML file at the same time that visual designers are tweaking the CSS style sheet (visual layout) and an integration engineer is fine-tuning the XSL document that pulls it all together. No collisions from three parties trying to check out or open the same HTML file at the same time. Development harmony achieved.
Earlier in this article, I stated that RSS feeds are one of the most popular XML-based schemas in practice today. It's become very common to see the XML icons alongside content in popular news and content sites these days, pointing to syndicated summaries of new and updated content. More and more web-based applications like Google's Gmail and the social bookmarking site del.icio.us allow you to receive RSS-based summary information whenever new e-mail reaches your account, or new bookmarks in your subscriptions are entered.
Dreamweaver 8, with its new XSLT transformation feature, is one of the first front-end design applications to really dig into the depth of XML and the flexibility it offers when transforming that XML into an XHTML-based representation for browsers. Although you can read more about it in my blog posting, the feature essentially allows you to access the hierarchical structure of a local or remote XML source and manipulate it much as you would the results of a database query.
Besides autogenerating the XSL file for your transform, Dreamweaver provides a front end to XPath, the generic XML query language, which lets you build conditional logic into your transformation by pulling specific elements out of an XML file and acting on them. With the combination of an XML data source, XPath queries to find content-specific data within that XML source, XSLT to help transform that content into browser-readable XHTML, and a CSS style sheet to add the visual look and feel to it, you can build a web page (and website) without a single HTML file in view. Very cool.
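Outside Dreamweaver, the same kind of XPath-driven conditional logic can be sketched in a few lines. This assumes Python's xml.etree.ElementTree, which supports only a limited subset of XPath (descendant search and simple attribute predicates are included); the attachments list reuses the URLs from the message example:

```python
import xml.etree.ElementTree as ET

# Data source: the attachments list from the message example.
DATA = """<attachments>
  <attachment type="jpg">http://intranet.foo.com/files/file_1.jpg</attachment>
  <attachment type="pdf">http://intranet.foo.com/files/file_2.pdf</attachment>
</attachments>"""

root = ET.fromstring(DATA)

# XPath subset: ".//" searches all descendants, "[@type='pdf']" filters on an attribute.
pdfs = [a.text for a in root.findall(".//attachment[@type='pdf']")]

# Conditional logic driven by the query result, as a transform might do.
if pdfs:
    print("PDF attachments:", pdfs)
else:
    print("No PDF attachments")
```

A full XSLT processor would accept richer XPath expressions; the point here is only that a query selects data and the surrounding logic decides what to emit.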
Furthermore, many traditional desktop-based applications are starting to embrace XML in a number of ways. The Apple Mac OS X operating system makes use of XML-based PLIST files to store application preferences and data. The file formats in Microsoft Office are now based on XML. New hybrid connected applications like the Delicious Monster Library manage datasets in XML and let you publish your data easily.
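To get a feel for the Mac OS X property-list format mentioned above, Python's standard plistlib module can write and read it; its default output format is Apple's XML plist. The preference keys and values below are hypothetical:

```python
import plistlib

# Hypothetical preferences for an imaginary application.
prefs = {"ShowToolbar": True, "RecentFiles": ["report.pdf", "notes.txt"], "FontSize": 12}

# dumps() produces XML by default (fmt=plistlib.FMT_XML).
xml_bytes = plistlib.dumps(prefs)
print(xml_bytes.decode("utf-8"))

# Reading the XML back yields equivalent Python data.
print(plistlib.loads(xml_bytes) == prefs)
```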
With the ubiquity of a flexible content markup environment like XML, it makes less and less sense to bury data in proprietary formats, and more and more sense to standardize the data and let the applications—whether web-based, device-based, service-oriented, or desktop-based—handle the rest.
I hope this article has broadened your perspective of XML and given you a high-level overview of the need it fills in the next-generation, Web 2.0 world of connected applications and content. But I also hope it has provided some ideas as to how you can start considering XML in future projects, and as solutions to the problems you face today.
It's not a leap of faith to say that you can expect to see more and more data exchange implementations move towards XML-based solutions in the future. So if you've been putting off learning more about XML, now's the time!