8 August 2005
You are not required to have previous knowledge of XML or XSL. This article is intended to introduce you to XML and the fundamentals of XML-based application development.
I recommend that you have previous knowledge of HTML, WWW, and building web pages.
Note: This article was originally written for Dreamweaver 8. However, you can follow along with the examples in this article if you have Dreamweaver MX 2004 or later installed, or you can simply read the examples. You can also use any other text editor to use the examples. Other text editors, however, do not include the native XML support that Dreamweaver offers.
Whether your team contracted a project that requires the use of XML or you simply want to play with the technology, this article introduces you to the basics of XML. Many companies have started using XML as a way to exchange data. Governments have standardized on XML as a data-exchange format, and web developers are being required to learn how to work with XML. So it really is in your interest to learn about XML and enhance your skill set. Plus, the soon-to-be-released Macromedia Dreamweaver 8 has native support for creating and editing XML and XSL documents and it allows you to perform client-side and server-side XSL transformations. In this article, I will introduce you to XML, so you are ready to hit the ground running when Dreamweaver 8 ships. I will start by explaining the technology, some of its advantages and disadvantages, and give you examples of where and how to use it.
By now, you probably know that XML stands for EXtensible Markup Language. OK, so it's a markup language, much like HTML, which means it uses tags. But what does it do, and why do your clients ask for it? Surprisingly, XML doesn't do anything. It simply describes information and distributes it in a platform-independent format.
XML achieves platform-independence by not using a specific language. XML tags are not predefined, which means you get to write your own tags. The advantage that comes with this is that XML is self-descriptive.
Believe it or not, you've been using XML all along, without even noticing it. When you read news headlines directly in your e-mail client, or when you visit web pages from your cell-phone, you're actually using XML-based technology.
First of all, XML is not a replacement for HTML and it has a totally different purpose. XML was designed to describe, store, and exchange data, while HTML was designed to present data in a human-readable format. HTML uses a fixed set of predefined elements (usually known as tags and attributes) to define visual aspects of a document, such as page layout, text formatting, and to include links to documents or images. In HTML, you are confined to using this limited set of tags, therefore the type of information you can display is limited. Displaying a mathematical formula with HTML, for instance, can cause you a lot of hassle. XML solves such problems through extensibility: you can "invent" your own tags and your own document structure. You can add and remove elements without affecting the overall structure of the document.
<department>
  <employee>
    <name>John Doe</name>
    <job>Software Analyst</job>
    <salary>2000</salary>
  </employee>
  <employee>
    <name>Jane Fletcher</name>
    <job>Designer</job>
    <salary>2500</salary>
  </employee>
</department>
You can try typing or copying this text into a new XML document in Dreamweaver, and then preview it in a browser. To create a new XML document in Dreamweaver MX 2004, click the File menu and select New. In the New Document window, select the Basic page category, then select XML from the list of basic pages.
Then, click the Create button. Dreamweaver will create a document that contains a line similar to this:
<?xml version="1.0" encoding="iso-8859-1"?>
This is the XML declaration. It should appear at the very beginning of each XML document (XML 1.0 makes it optional, but it is strongly recommended). It specifies the XML version and the character set used in the document.
Type the text from the above example in the new XML document after the first line. Notice that Dreamweaver natively supports syntax coloring for XML documents:
Note: Most browsers have XML support by default. To see how different browsers display and handle XML documents, please visit this page of the W3Schools website.
Clicking the minus sign next to each tag will collapse the element. To expand an element, you'll have to click the plus sign next to it.
The same example would be written in HTML like this:
<table>
  <tr>
    <td>John Doe</td>
    <td>Software Analyst</td>
    <td>2000</td>
  </tr>
  <tr>
    <td>Jane Fletcher</td>
    <td>Designer</td>
    <td>2500</td>
  </tr>
</table>
If you load the page in the browser, it will look like a classic HTML table (I added a border for visibility).
The tags in the example above were specifically designed to describe information about a company's employees. By comparing the two examples, you can clearly see that XML is content-driven, while HTML is format-driven—the XML tag names describe the data in the tags, whereas the HTML tags describe the presentation of the data in the tags.
The example above is meant to illustrate some key differences between XML and HTML. However, you should be aware that XML was not designed as a replacement for HTML and not every XML document can be translated into an HTML document.
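To make the "content-driven" point concrete, here is a minimal Python sketch that reads the department document with the standard xml.etree.ElementTree module. The function name read_employees is an illustrative assumption, not something from the article; the point is only that the data can be retrieved by the names of the tags that describe it.

```python
import xml.etree.ElementTree as ET

DEPARTMENT_XML = """
<department>
  <employee>
    <name>John Doe</name>
    <job>Software Analyst</job>
    <salary>2000</salary>
  </employee>
  <employee>
    <name>Jane Fletcher</name>
    <job>Designer</job>
    <salary>2500</salary>
  </employee>
</department>
"""


def read_employees(xml_text):
    """Return one dict per <employee>, keyed by the child tag names."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in employee}
        for employee in root.findall("employee")
    ]


employees = read_employees(DEPARTMENT_XML)
for employee in employees:
    # The keys come straight from the tag names in the document.
    print(employee["name"], "-", employee["job"], "-", employee["salary"])
```

Doing the same with the HTML table would mean relying on cell positions, because <td> says nothing about what a cell contains.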
At this point, things might seem a bit blurry with all these new buzzwords floating around you. While XHTML is beyond the scope of this article, I will briefly explain what it is, to make things clearer.
XHTML stands for Extensible Hyper Text Markup Language and it's a cleaner, XML-based version of HTML. It is a stricter specification of HTML, designed to eventually replace HTML. XHTML contains almost the same tags as HTML, but rewritten to observe the XML syntax rules, which will be presented in greater detail further in this article.
Since XML stores and describes data, it resembles a database. You've probably already heard the words "schema" or "query language" mentioned in relation to XML. What, then, makes XML different from a typical DBMS (database management system)?
First of all, XML is portable. While databases rely on the specific database language they were designed for in order to be correctly interpreted, XML carries its meaning in its tags. XML is self-describing, which means it carries both the structure and the semantics of the data, while databases can only define the structure of data.
Furthermore, XML can represent data as hierarchical trees, as you have seen in the previous section. For instance, the <employee> element is a child of the <department> element, and is embedded as such in the XML tree.
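The parent-child structure can be made visible with a short Python sketch. The helper name outline is an assumption for illustration; it simply walks the tree and indents each tag according to its depth.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return the tag names of a subtree as indented lines, parents first."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(
    "<department><employee><name>John Doe</name>"
    "<job>Software Analyst</job></employee></department>"
)
tree_lines = outline(root)
print("\n".join(tree_lines))
```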
Of course XML also has its drawbacks when compared to databases, but this is normal, since they were designed with different goals in mind. The most obvious disadvantage is that XML lacks database-specific features such as triggers, multi-user access, efficient storage, indexes, security, transactions, data integrity checks, or queries across multiple documents. This is normal, considering that a DBMS is designed for manipulating, storing, and retrieving data in a fast and secure way, while XML was designed to exchange data across platforms.
Consequently, searching in XML documents is also slower, because of the lack of indexes and search optimization features, common to databases.
Also, XML is more verbose, requiring a pair of tags and/or attributes for each data item.
While the data-centric/document-centric divide is somewhat obsolete, it is nevertheless important for understanding the XML philosophy and the main concepts behind the XML technology.
The next section looks at some of the advantages and disadvantages of XML and illustrates some real-life situations where you would be better off by using XML.
In order to figure out where XML should be used, you should bear in mind its key advantages. I briefly mentioned some of them earlier. This section takes an in-depth look at XML advantages over other markup languages or similar technologies.
When working with XML, people often forget the X stands for "Extensible." Extensibility means you can define any number of extra tags, without crashing the application. For instance, you can add children to this element:
<employee>
  <name>John Doe</name>
  <job>Software Analyst</job>
  <salary>2000</salary>
</employee>
To make it look like this:
<employee>
  <name>John Doe</name>
  <job>Software Analyst
    <responsibility>Write technical specifications</responsibility>
    <responsibility>Translate client requirements into software requirements</responsibility>
  </job>
  <hire_date>Jun 25, 2005</hire_date>
  <salary>2000</salary>
</employee>
The application reading this XML document will still understand which employee is referred to.
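A small Python sketch illustrates why the extra tags do no harm. The function name_and_salary is a hypothetical stand-in for "the application": it looks up only the elements it knows about, so it returns the same result for the original and the extended document.

```python
import xml.etree.ElementTree as ET

ORIGINAL = (
    "<employee><name>John Doe</name><job>Software Analyst</job>"
    "<salary>2000</salary></employee>"
)

EXTENDED = """<employee>
  <name>John Doe</name>
  <job>Software Analyst
    <responsibility>Write technical specifications</responsibility>
    <responsibility>Translate client requirements into software requirements</responsibility>
  </job>
  <hire_date>Jun 25, 2005</hire_date>
  <salary>2000</salary>
</employee>"""


def name_and_salary(xml_text):
    """Read only the two fields this (hypothetical) application cares about."""
    root = ET.fromstring(xml_text)
    return root.findtext("name"), int(root.findtext("salary"))


# Elements the code does not ask for are simply ignored.
print(name_and_salary(ORIGINAL))
print(name_and_salary(EXTENDED))
```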
Portability comes from the fact that you get to define the tags and attributes. No special libraries or application servers are required to "read" an XML document (although your development environment needs some configuring in order to become XML-aware, as I will explain in a future article). XML documents are plain-text files, hence they do not require proprietary software to interpret them, as most binary files do. This means you can use Notepad to open and edit an XML file.
Therefore, XML is the natural choice when information needs to be exchanged across several (incompatible) hardware or software platforms or applications. This has led to several applications of XML in communication technology, including the popular WML (Wireless Markup Language), which is used with WAP (Wireless Application Protocol). WML is an XML-based language used to mark up Internet applications for hand-held devices, such as phones or PDAs. To learn more about WML, you can read this tutorial from the W3 Consortium.
Portability also comes in handy in business-to-business applications, where many companies need to exchange a large amount of financial information in a platform-independent way. Different applications use SOAP (Simple Object Access Protocol), a popular XML-based protocol, to exchange such information over the Internet. These XML-based information sharing applications are called web services.
With XML, you can reduce the risk of content redundancy. Your clients will concentrate on using HTML and CSS for defining the layout and presentation, which will not be affected by any changes in the underlying information stored separately in an XML file.
For instance, a content management system might supply documents to end-users in a variety of formats: HTML, PDF, etc. However, there is no point in storing a separate version of each document for each format. The content would be duplicated and would take up valuable disk space, making the CMS clogged with redundant information and slower to use. Using an engine based on XML, the content can be stored only once and then extracted and displayed in the desired format.
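As a rough sketch of this "store once, display many ways" idea, the Python below renders one XML source as an HTML table and as plain text. This is ordinary Python rather than the XSL approach the article goes on to recommend, and the function names rows, as_html and as_text are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

SOURCE = (
    "<department>"
    "<employee><name>John Doe</name><job>Software Analyst</job></employee>"
    "<employee><name>Jane Fletcher</name><job>Designer</job></employee>"
    "</department>"
)


def rows(xml_text):
    """Extract (name, job) pairs from the single stored copy of the content."""
    root = ET.fromstring(xml_text)
    return [(e.findtext("name"), e.findtext("job")) for e in root.findall("employee")]


def as_html(xml_text):
    """Render the content as an HTML table."""
    body = "".join(
        f"<tr><td>{escape(name)}</td><td>{escape(job)}</td></tr>"
        for name, job in rows(xml_text)
    )
    return f"<table>{body}</table>"


def as_text(xml_text):
    """Render the same content as plain text, one employee per line."""
    return "\n".join(f"{name}: {job}" for name, job in rows(xml_text))


print(as_html(SOURCE))
print(as_text(SOURCE))
```

Changing the stored content changes both outputs, and changing one renderer leaves the content untouched.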
Any processing or formatting requirements should be handled by a separate XSL (Extensible Stylesheet Language) document. An XSL style sheet specifies the presentation of data contained by an XML file. XML and XSL are combined at output time to apply the required formatting to the data in the same way that Cascading Style Sheets (CSS) let you style HTML.
I will cover XSL in more detail in my next articles, and I will explain how to use the two together to produce formatted output for your application. Dreamweaver 8 offers you a quick, visual way to create your own XSL style sheets for a custom XML file. You’ll learn how simple it is to import an RSS feed in your site and provide your visitors with the latest news or updates.
If you look at the three important benefits of XML, you'll understand immediately where XML should be used:
XML is already widely used to transfer data between different database applications. Most database tools (including Microsoft Access and phpMyAdmin) already allow exporting database tables as XML files.
RSS (Really Simple Syndication) is one of the most popular applications of XML. RSS is just another XML document format designed for syndicating news and news-like content. Popular websites that rely on RSS include community sites like Wired and Slashdot, or personal blogs. Macromedia makes its Developer Center articles available as an RSS feed. The main advantage of RSS is that people can read your content in the format they want and import it into their own websites. I will show how to consume an RSS feed in one of my next articles, using Dreamweaver 8.
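To give an idea of what such an export involves, here is a minimal Python sketch that dumps a table from an in-memory SQLite database as XML. It is not how Access or phpMyAdmin implement their export; the function table_to_xml, the table layout and the element names are all assumptions chosen to match the employee examples in this article.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(connection, table, row_tag):
    """Export every row of `table` as a <row_tag> element, one child per column.

    The table name is interpolated into the SQL, so it must come from
    trusted code, never from user input.
    """
    cursor = connection.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    root = ET.Element(table)
    for row in cursor:
        item = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            ET.SubElement(item, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE department (name TEXT, job TEXT, salary INTEGER)")
connection.execute(
    "INSERT INTO department VALUES ('John Doe', 'Software Analyst', 2000)"
)
exported = table_to_xml(connection, "department", "employee")
print(exported)
```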
It's easy to figure out where you shouldn't use XML, just by looking at some of its disadvantages.
If you followed the previous example, you should have an idea of what an XML document looks like. The syntax is pretty straightforward: its rules are clear and very simple. An XML document is made up of an XML declaration and a root element or tag containing several nested elements. I will briefly present the most important syntax rules below:
XML documents should start with the XML declaration; XML 1.0 makes it optional, but it is strongly recommended. If you use Dreamweaver to create your XML documents, the XML declaration is added automatically. The XML declaration is used by the applications calling the XML document to correctly read and interpret the information. By default, Dreamweaver creates XML documents that conform to the 1.0 specification and use the iso-8859-1 (Latin-1/West European) character set. The XML declaration is not an element and is not part of the document's element tree.
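The role of the declaration can be sketched in Python (3.8 or later, because of the xml_declaration argument). The employee name "José" is an invented value, chosen only because it contains a character that is encoded differently in iso-8859-1 and UTF-8; the parser relies on the declaration to decode it correctly.

```python
import xml.etree.ElementTree as ET

root = ET.Element("department")
ET.SubElement(root, "employee").text = "José"  # hypothetical name with a non-ASCII letter

# Serialize to bytes in Latin-1 and ask for the XML declaration explicitly.
document = ET.tostring(root, encoding="iso-8859-1", xml_declaration=True)
first_line = document.split(b"\n", 1)[0]
print(first_line.decode("ascii"))

# When reading the bytes back, the parser uses the declared encoding.
round_trip = ET.fromstring(document).findtext("employee")
print(round_trip)
```

Note that ElementTree writes the declaration with single quotes, which is equivalent to the double-quoted form Dreamweaver generates.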
Next, the document should contain a single root element. In the previous example, the root element is <department>. Suppose, however, the company has more than one department. Could a second <department> element be added to the document, like this?
<?xml version="1.0" encoding="iso-8859-1"?>
<department>
</department>
<department>
</department>
No. You will need to define a new root element in this case: <company>. The new root element can now have any number of child elements, that is, departments:
<company>
  <department>
    <employee>
      <name>John Doe</name>
      <job>Software Analyst</job>
      <salary>2000</salary>
    </employee>
    <employee>
      <name>Jane Fletcher</name>
      <job>Designer</job>
      <salary>2500</salary>
    </employee>
  </department>
  <department>
    <employee>
    </employee>
  </department>
</company>
All other child elements must be enclosed within the root tag.
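The single-root rule is easy to check with any XML parser. In this Python sketch, the helper name is_well_formed is an assumption; it reports whether the standard library parser accepts the text.

```python
import xml.etree.ElementTree as ET

TWO_ROOTS = "<department></department><department></department>"
ONE_ROOT = (
    "<company><department></department><department></department></company>"
)


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(TWO_ROOTS))  # the second <department> is rejected
print(is_well_formed(ONE_ROOT))   # wrapping both in <company> fixes it

department_count = len(ET.fromstring(ONE_ROOT).findall("department"))
```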
While in HTML single-tagged elements such as <br> are allowed, in XML all elements must have a closing tag (an empty element can be closed in place, as in <br />). If you omit a closing tag, your browser will throw an error similar to this one:
The following tags were not closed: department. Error processing resource 'http://www.domain.org/company.xml'.
One of the new features in Dreamweaver 8 is the default code completion, which also works for XML files. If I type the following:
<company> <department> <employees>
Dreamweaver 8 will automatically close the tags correctly when I type </. In the example above, the first time I type </, Dreamweaver will insert </employees>. The next time I type </, Dreamweaver will insert </department>. The next time I type </, Dreamweaver will insert </company>. Dreamweaver 8 understands where you are in the page and closes the tag appropriately. Code completion can really help you produce well-formed XML documents, especially if you're not a coding guru.
Moreover, tag names are case-sensitive, which means <Department> is a totally different element than <DEPARTMENT>. Obviously, the opening and the closing tags of the same element must be written in the same case. The following is an example of an illegal tag pair in XML:
<JOB>Software Analyst </job>
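Both the closing-tag rule and the case-sensitivity rule can be observed with a short Python sketch. The helper well_formedness_error is a hypothetical name; it returns the parser's message for the first problem it finds, or None when the text is well-formed. The exact wording of the message depends on the parser, so the sketch does not rely on it.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return str(error)
    return None


UNCLOSED = "<company><department><employees></employees></company>"
MISMATCHED_CASE = "<JOB>Software Analyst </job>"
CORRECT = "<job>Software Analyst</job>"

for sample in (UNCLOSED, MISMATCHED_CASE, CORRECT):
    print(sample, "->", well_formedness_error(sample))
```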
As I mentioned previously, XML elements are related through child-parent relationships. In the above example, <employee> is a child of <department>, which, in its turn, is a child of the unique root element, <company>. In order to preserve these relationships, elements must be properly nested. While browsers tolerate HTML tags that cross, as in the following example, in XML all elements must be properly nested within each other.
<b>This text is <i> emphasized </b> and italic</i>.
Browsers tolerate this in HTML and still display the text, but it is not well-formed XML.
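An XML parser, by contrast, stops at the first crossed tag. The Python sketch below reports where that happens; the surrounding <p> element is an assumption added so that each sample has a single root, and first_error is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

CROSSED = "<p><b>This text is <i> emphasized </b> and italic</i>.</p>"
NESTED = "<p><b>This text is <i> emphasized </i></b><i> and italic</i>.</p>"


def first_error(xml_text):
    """Return the (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return error.position
    return None


print(first_error(CROSSED))  # a position on line 1, where </b> crosses <i>
print(first_error(NESTED))
```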
In XML, content, or actual information, is carried by elements and/or their attributes. An element can contain simple text, other elements or both. For instance, the following element:
<employee>
  <name>John Doe</name>
  <job>Software Analyst</job>
  <salary>2000</salary>
</employee>
can be rewritten as:
<employee> John Doe <job>Software Analyst</job> <salary>2000</salary> </employee>
This means the element employee contains mixed content: simple text and other elements.
Empty elements are also allowed. The next element could be read as "we have a job offering, but we're still looking for the right person":
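Here is a minimal sketch of how mixed content looks to a program, using Python's ElementTree. In that library, the text before the first child is stored in the parent's text attribute, while the children remain ordinary elements.

```python
import xml.etree.ElementTree as ET

employee = ET.fromstring(
    "<employee> John Doe <job>Software Analyst</job> "
    "<salary>2000</salary> </employee>"
)

# Text that appears directly inside <employee>, before its first child.
leading_text = employee.text.strip()

# The child elements are still available as usual.
child_tags = [child.tag for child in employee]

print(leading_text, child_tags)
```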
The same element could be re-written using attributes:
<employee job="Software Analyst"> John Doe <salary>2000</salary> </employee>
Attributes in XML are the properties of an element. They describe its characteristics. You can use single quotes (' ') or double quotes (" ") to mark attribute values. As you can see from the previous series of examples, the same data can be stored as either child elements or attributes. So which method is better? Ideally, you should use attributes only to give extra information about data, that is, when you need metadata. For instance:
<employee id="31">
  <name>John Doe</name>
  <job>Software Analyst</job>
  <salary>2000</salary>
</employee>
The employee ID is not relevant in this case for the actual data. This ID, however, can be used by XML processing software to identify the corresponding employee faster. Such information is called metadata, that is, data about data.
Using attributes instead of elements also has some disadvantages. The overall structure of the XML document becomes less clear and less expandable. Also, attributes cannot have multiple values and are more difficult to work with. Imagine if the information about an employee were stored like this:
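As an illustration of using the ID as metadata, this Python sketch looks up an employee by its id attribute. The second employee's ID (32) and the helper name find_by_id are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

department = ET.fromstring("""
<department>
  <employee id="31"><name>John Doe</name></employee>
  <employee id="32"><name>Jane Fletcher</name></employee>
</department>
""")


def find_by_id(root, employee_id):
    """Return the <employee> child whose id attribute matches, or None."""
    return root.find(f"employee[@id='{employee_id}']")


match = find_by_id(department, "31")
print(match.get("id"), match.findtext("name"))
```

The attribute selects the record, while the data itself stays in child elements.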
<employee name="John Doe" job="Software Analyst" salary="2000"></employee>
This would defeat the whole purpose of an XML document: making information clearly structured and easy to exchange.
At this point, you may ask yourself this legitimate question: "OK, if I get to define my own tags, can I really use anything for an element?" The answer is yes and no. You can use anything for an element name, since there are no reserved words in XML, BUT you must observe these simple naming rules:
While it is OK to use "." and "-" characters within element names, I don't recommend it. The application processing the XML file might interpret these signs as operators. You can replace them with "_" characters if you need to use a longer name, as in the following example:
<employee>
  <first_name>John</first_name>
  <last_name>Doe</last_name>
  <job>Software Analyst</job>
  <salary>2000</salary>
</employee>
Yes, basically anything. Non-English characters are also supported, but make sure you set the correct character set and that the client application processing the XML document supports non-English content. Also, white space inside the content will be preserved, unlike in HTML. This means you can write consecutive spaces and they will not be stripped or removed.
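Both points can be checked with a few lines of Python. The sample values, including the name "Zoë Müller", are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Consecutive spaces inside the content survive parsing unchanged.
spaced = ET.fromstring("<name>John     Doe</name>")
spaced_text = spaced.text

# Non-English characters are ordinary content.
accented = ET.fromstring("<name>Zoë Müller</name>")
accented_text = accented.text

print(repr(spaced_text), repr(accented_text))
```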
All markup or programming languages allow comments. XML does too! The syntax is the same as in HTML:
<!-- This employee deserves a raise. -->
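Comments are meant for people reading the file, not for the application. As a small sketch of this, Python's default ElementTree parser skips comments entirely, so the data an application sees is the same with or without them.

```python
import xml.etree.ElementTree as ET

employee = ET.fromstring(
    "<employee><!-- This employee deserves a raise. -->"
    "<name>John Doe</name></employee>"
)

# Only real elements appear among the children; the comment is not one of them.
child_tags = [child.tag for child in employee]
print(child_tags)
```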
Earlier in this article, I mentioned RSS as an application of XML. The next example shows a simplified version of the Macromedia Developer Center feed, in order to illustrate the structure of XML documents and the syntax rules that were applied.
<?xml version="1.0" encoding="utf-8"?>
<rss version="1.0">
  <channel>
    <title>Macromedia Developer Center RSS Feed June 27, 2005</title>
    <link>http://www.macromedia.com/devnet/</link>
    <description>Macromedia Developer Center is your center for the tutorials, articles, and sample applications you need to master Macromedia products.</description>
    <item>
      <title>Creating a Dynamic Playlist for Progressive Flash Video</title>
      <link>http://www.macromedia.com/devnet/flash/articles/prog_download.html</link>
      <description>Learn how to create an XML-driven playlist for viewing progressive FLV files.</description>
    </item>
    <item>
      <title>Chatting Through IM Gateways in ColdFusion MX 7</title>
      <link>http://www.macromedia.com/devnet/coldfusion/articles/imgateway.html</link>
      <description>Learn the fundamentals of gateway apps as you build a sample chat application that monitors your ColdFusion server status</description>
    </item>
  </channel>
</rss>
Notice how RSS carries list-oriented information (items, or articles). Of course, RSS was designed to exchange large quantities of similar items, but for simplicity, I only included two items in the above example. The items make up a "channel" of information, which is characterized by a title, a link, and a description. Each item represents an article published on the Developer Center and has as child elements the title of the article, the link to it and a short description. Additionally, elements can be added for the author, the publishing date and the subject of each article, as you will see in my next articles.
In this topic, I have covered what is necessary for XML to be well formed, meaning the rules to make it syntactically correct. When you become more familiar with XML, you should also understand XML document type definitions (DTD) and XML schemas, which are the rules you make for the XML-based languages you create and use. A DTD specifies the legal elements that can be used in an XML document. To learn more about validating your XML documents against a DTD, you can read this tutorial from W3Schools. An XML schema serves the same purpose as a DTD, but it is itself written in XML and supports richer data types.
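Going back to the simplified feed shown earlier, here is a rough Python sketch of how a program might list its articles. The feed text is abbreviated (the XML declaration and the descriptions are left out), and the function name list_items is an assumption for the example.

```python
import xml.etree.ElementTree as ET

FEED = """<rss version="1.0">
  <channel>
    <title>Macromedia Developer Center RSS Feed June 27, 2005</title>
    <link>http://www.macromedia.com/devnet/</link>
    <item>
      <title>Creating a Dynamic Playlist for Progressive Flash Video</title>
      <link>http://www.macromedia.com/devnet/flash/articles/prog_download.html</link>
    </item>
    <item>
      <title>Chatting Through IM Gateways in ColdFusion MX 7</title>
      <link>http://www.macromedia.com/devnet/coldfusion/articles/imgateway.html</link>
    </item>
  </channel>
</rss>"""


def list_items(feed_text):
    """Return (title, link) for every <item> in the feed's channel."""
    channel = ET.fromstring(feed_text).find("channel")
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]


items = list_items(FEED)
for title, link in items:
    print(title, "->", link)
```

Because every item has the same structure, the same few lines work whether the channel carries two items or two hundred.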
This article was meant to introduce you to XML. It is by no means a comprehensive description of the technology or a complete guide on developing XML-based applications. Instead, I encourage you to complement your knowledge of XML and explore the different ways XML can be used for web development by visiting these additional resources:
If you have already started working with XML, this XML validator from W3Schools will let you know if your XML documents are well-formed.
My next article will introduce you to XSL, the language for formatting and presenting XML data. You can read it here. You should also take a look at my articles on using the new XML and XSL features in Dreamweaver 8, on how to consume an RSS feed in your site using Dreamweaver 8, and on how to configure your server for server-side XSL transformations.