18 June 2012
Why do you write HTML using the <h1> tag and the <p> tag? Why not just use <div> and <span> tags for everything? Why use any specific HTML tags at all?
The reason is that <p> and <h1> tags convey extra information about the content. They say "this is a paragraph" and "this is a heading at the first level", respectively. This is semantic HTML, or HTML for which the author makes every effort to ensure that the markup organizes and structures the content. To some degree all web developers practice it. This article will explain semantic HTML, and explain why you should go deeper into it.
Put simply, semantic HTML is HTML that uses the correct element or tag for the job. Take the <h1> element as an example. What is it for? The semantic answer is "For the top-level heading on the page." Using an <h1> element just to make text larger is the textbook example of nonsemantic HTML. In addition to the heading tags (<h1> through <h6>), there is a slew of HTML elements, from the common <p> element to the rarely encountered <cite> and <dfn> elements. Each has a specific meaning, and each can be used to give your HTML a better structure to style with CSS and manipulate with JavaScript.
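As a minimal sketch of the <h1> point above, the page below keeps <h1> for the top-level heading and makes a promotional line large with CSS instead. The page text, the promo class name, and the book title inside <cite> are illustrative assumptions, not taken from a real site.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Spring catalog</title>
  <style>
    /* Size is a presentation concern, so it lives in CSS (class name is hypothetical) */
    .promo { font-size: 2em; font-weight: bold; }
  </style>
</head>
<body>
  <!-- Nonsemantic pattern to avoid: <h1>Sale ends Friday!</h1> used only for big text -->

  <!-- <h1> marks the top-level heading of the page -->
  <h1>Spring catalog</h1>

  <!-- Large text that is not a heading is a styled paragraph -->
  <p class="promo">Sale ends Friday!</p>

  <!-- <dfn> marks the term being defined; <cite> marks the title of a work -->
  <p>A <dfn>perennial</dfn> is a plant that lives for more than two years,
     as described in <cite>The Hypothetical Gardener</cite>.</p>
</body>
</html>
```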
Semantic HTML is also about labeling content by what it is rather than what it looks like. For example, consider the ubiquitous blog archive panel. It's the list of links to other pages in the blog that often sits on the right side of the page. When giving the archive an ID or a class for CSS styling, you can use a name that reflects where it is on the page; for example, rightpanel. Alternatively, you can use a name that indicates the role it plays in the content; for example, sidebar or, better yet, archive.
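A rough sketch of the archive panel described above, named for its role rather than its position. The link targets and month names are placeholders; the point is only that the CSS can move the panel without making the ID misleading.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Blog archive panel</title>
  <style>
    /* Position is a styling decision; changing "right" to "left" leaves the name accurate */
    #archive { float: right; width: 12em; }
  </style>
</head>
<body>
  <!-- Position-based alternative: id="rightpanel" stops being true if the panel moves -->
  <div id="archive">
    <h2>Archive</h2>
    <ul>
      <!-- Hypothetical URLs, for illustration only -->
      <li><a href="/2012/06/">June 2012</a></li>
      <li><a href="/2012/05/">May 2012</a></li>
      <li><a href="/2012/04/">April 2012</a></li>
    </ul>
  </div>
</body>
</html>
```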
Semantic HTML is HTML in which:
A paragraph is contained in a <p> element.
A sequential list is contained in an <ol> element.
A long quotation is contained in a <blockquote> element.
<h1> contains headings; it is not for making text bigger.
<blockquote> contains a long quote; it is not for indenting text.
An empty paragraph element (<p></p>) is not used to skip lines.
Presentational tags such as <font> or <center> are not used.
The purpose of all this is so that consumers of your code, be they people, browsers, or screen readers, can consume the content and easily parse it, both objectively for machines and subjectively for people.
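To make the list above concrete, here is a hedged before-and-after sketch. The first half of the body breaks several of those rules on purpose; the second half carries the same content with elements chosen for meaning and the appearance moved into CSS. All text and class names are made up for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Visiting hours</title>
  <style>
    /* Appearance for the semantic version lives here, not in the markup */
    h2 { text-align: center; }
    .notice { margin-left: 2em; }
    ol { margin-top: 2em; }
  </style>
</head>
<body>
  <!-- BEFORE (nonsemantic): format tags, blockquote as indent, empty paragraph as spacer -->
  <center><font size="5">Visiting hours</font></center>
  <blockquote>We are open every weekday.</blockquote>
  <p></p>
  <p>1. Sign in at the front desk.<br>2. Collect a visitor badge.</p>

  <!-- AFTER (semantic): heading, paragraph, and ordered list say what the content is -->
  <h2>Visiting hours</h2>
  <p class="notice">We are open every weekday.</p>
  <ol>
    <li>Sign in at the front desk.</li>
    <li>Collect a visitor badge.</li>
  </ol>

  <!-- <blockquote> reserved for an actual long quotation -->
  <blockquote>
    <p>Visitors must be accompanied by a staff member at all times while inside the building.</p>
  </blockquote>
</body>
</html>
```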
Traditionally, there are five main arguments in favor of semantic HTML.
Semantic HTML styled by CSS typically requires less code than HTML formatted by tables. However, it's worth noting that you can write table-less HTML that isn't semantic. You'll still probably reduce the size of your code, but you won't make the code any easier to understand.
Accessibility enables people with disabilities to consume your site. Semantic code tends to be more accessible. When you properly label aspects of your pages as titles, headings, paragraphs, and lists, you make it easier for screen readers and other assistive technologies to parse and present the content in a form that a disabled person can understand. However, note the emphasis on the word tends. Semantic HTML is not a magical solution to make your site compliant with accessibility guidelines; it just makes building accessible sites a little easier.
As with accessibility, semantic HTML tends to improve search engine optimization (SEO) by making your site content easier for software to parse. Search engines scan the HTML text contained in your HTML files. They don't render CSS; they don't run JavaScript. If your important content isn't in the HTML, search engines may never see it, and won't rank you accordingly. Also, by removing HTML cruft from your page, and having only markup that describes your content, you make it easier for a search engine to get at what your site is really about. This technique is considered "white hat" SEO. It's perfectly acceptable, and no search engine is going to penalize you because you semantically optimized your page. (In contrast, using hidden text to increase your relevance on a particular topic is considered "black hat" SEO.)
It should be noted that there is no guarantee that semantic HTML is better for SEO. Web developers think search engines favor semantic HTML, and Google's input into HTML5 suggests that they do. However, search engines closely guard their algorithms, and have to allow for the fact that extremely relevant content may be placed in nonsemantic HTML.
Semantic HTML takes advantage of the fact that a news item will always be a news item, and an archive will always be an archive, no matter where they are positioned on the page. However, a rightbar won't always be on the right side. Additionally, if you are syndicating your content via an RSS feed and including HTML in it, the less markup the better. However, most sites and blogs don't syndicate their content straight from prepared HTML. Feeds are usually built separately, and the syndication format is handled to make sure that other consumers understand the content.
You may have noticed a theme in the arguments for semantic HTML covered thus far. All of these sound like good reasons to use semantic HTML, but none of them individually really seals the deal for me. Perhaps all of them add up to be enough to justify it to you. If so, that's great. But I contend that there is a really good reason to write semantic HTML today. Coding is communication, both to a computer (which is the easy part) and to other developers. Semantic HTML is easier for humans to understand than nonsemantic HTML. A div element with a class of r1c4 is not as easy to figure out as one named pullquote. By using semantic HTML you help other developers and HTML authors understand what your code is doing.
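The r1c4 versus pullquote comparison above can be sketched as follows. Reading r1c4 as "row 1, column 4" is an assumption made here for illustration; the quoted text and the style rules are placeholders.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Pull quote</title>
  <style>
    /* The class names the content's role, so the rule is readable on its own */
    .pullquote { float: right; width: 40%; font-size: 1.4em; font-style: italic; }
  </style>
</head>
<body>
  <!-- Harder to read: class="r1c4" presumably records a grid position, not a purpose -->
  <div class="r1c4">
    <p>Coding is communication.</p>
  </div>

  <!-- Easier to read: another developer can tell what this block is for at a glance -->
  <div class="pullquote">
    <p>Coding is communication.</p>
  </div>
</body>
</html>
```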
It's important to note that there is a great deal of subjectivity in this space. In his article About HTML semantics and front-end architecture, Nicolas Gallagher makes the case that classes and IDs can't be nonsemantic because semantics are about meaning, and anything you put in a class or ID has meaning. His logic is sound, but in my opinion this is not the best way of looking at things.
Semantics are not binary. You cannot be entirely semantic, nor can you be completely nonsemantic. Being semantic is a matter of degree: web content sits somewhere on a continuum between completely nonsemantic and completely semantic, and neither pole is reachable. With that in mind, as you increase the ease with which consumers of web content can understand what all the pieces of your web content mean, you move the content along the continuum from nonsemantic toward semantic.
One of the more contentious parts of HTML semantics is the naming of IDs and classes. Part of the problem has been the gap between what is defined in the HTML specs and what shows up on real-world pages. For example, consider site navigation. Since the mid-1990s the navigation menu has been nearly ubiquitous, and yet for the bulk of that time there has not been a single standardized way to mark up navigation. The same holds for page headers, footers, and content in the form of articles or posts. Because there has been no standard way to do it, and lots of possible ways to do it, there's been considerable disagreement on how to do it properly.
Some of this has been alleviated by HTML5. During the development of the HTML5 spec, leading web content companies shared analysis of the most commonly used ID and class names, and used this to guide them in the creation of new HTML5 elements that would be useful in marking up many of the commonly used pieces of content on the web. The result is a list of new elements that includes:
<header>
<footer>
<nav>
<article>
There are many more elements, which are detailed at the W3C's list of differences between HTML4 and HTML5.
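A minimal page skeleton using the four elements listed above might look like this. The site name, link targets, and post text are placeholders; the sketch only shows where each element typically sits, replacing what would once have been divs with IDs such as header, nav, or footer.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Example blog</title>
</head>
<body>
  <!-- Introductory content for the page: site title, logo, and so on -->
  <header>
    <h1>Example blog</h1>
  </header>

  <!-- The main navigation links (hypothetical URLs) -->
  <nav>
    <ul>
      <li><a href="/">Home</a></li>
      <li><a href="/archive/">Archive</a></li>
      <li><a href="/about/">About</a></li>
    </ul>
  </nav>

  <!-- A self-contained piece of content, such as a single post -->
  <article>
    <h2>First post</h2>
    <p>The body of the post goes here.</p>
  </article>

  <!-- Closing information for the page -->
  <footer>
    <p>Contact: editor@example.com</p>
  </footer>
</body>
</html>
```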
The point of these is to create a more standard way of marking up content. The more consistent the markup, the easier it is for both humans and devices to consume it.
I think one of the biggest problems with semantics in HTML is the tendency for people to take the features of semantic HTML as objective rules instead of guidelines. Writing HTML with good semantics is something you should do, not something you must do. If you are going to use semantic HTML, doing it to enhance the content will yield better results than doing it to follow a set of best practices without understanding the theory behind them.
Semantics make HTML easier to understand. This means you should take care that your <h1> tags, <h2> tags, and <h3> tags are properly nested. It means you should spare a moment of thought to ensure that <section> is the right tag to wrap some content. It doesn't mean that you should get into a week-long email thread with forty messages discussing whether or not that <div> on that page should have an ID of news or breaking. Pick one, after a bit of consideration, and then move on with your life. As long as thought has gone into communicating what your intentions for the content were, you should be fine. Another developer will come by, look at the content, understand it, and be able to jump in. Suppose you chose news. There's a good chance that a semantically minded developer will wonder why in the world you named it news when it clearly should be breaking. There's almost nothing you can do to prevent that sort of questioning in such a subjective field. However, they won't confuse it for a header, footer, or navigation bar, and that's a more important achievement.
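One reading of "properly nested" headings, sketched below, is that each heading level sits under the level directly above it, with no level skipped. That interpretation, along with the headings and text, is an assumption for illustration. The news ID is the one from the paragraph above; breaking would communicate the same thing.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Heading outline</title>
</head>
<body>
  <h1>Semantic HTML</h1>

  <!-- <section> groups content under one theme and carries its own heading -->
  <section>
    <h2>Why it matters</h2>

    <!-- <h3> headings are subtopics of the <h2> above; no level is skipped -->
    <h3>Accessibility</h3>
    <p>Screen readers can move through the page heading by heading.</p>

    <h3>Search engines</h3>
    <p>Crawlers read the markup to work out what the page is about.</p>
  </section>

  <!-- Either "news" or "breaking" tells the next developer this is not a header or footer -->
  <div id="news">
    <h2>News</h2>
    <p>Version 2 of the style guide has been published.</p>
  </div>
</body>
</html>
```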
Semantics are valuable, and can aid in accessibility, SEO, reusability, and developer-to-developer communication. They exist along a continuum and your goal should be to move along that continuum to more semantic content, but not kill yourself trying to reach an unattainable goal of perfect semantics.
Below are a few resources to help you improve your use of semantic HTML.
The World Wide Web Consortium (W3C) is the standards body for the web. As such they are responsible for the definitions of all elements. They have strong ideas (or absolute, depending on whom you ask) on the use of HTML elements. Each one of the elements in HTML has detailed documentation, which includes semantic usage of the element.
WhichElement.com is a site that I started to discuss proper semantics for content. For example, if you are looking for a way to mark up a calendar semantically, WhichElement.com has the answer for you.
HTML5 Doctor is a resource to help developers use HTML5 as it stands in the ever-evolving present. The site has lots of great information on the semantic usage of HTML5 elements.
Stephanie (Sullivan) Rewis has a great series of articles, Understanding HTML5 semantics, that can help you jump into using new semantic HTML elements.
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License. Permissions beyond the scope of this license, pertaining to the examples of code included within this work are available at Adobe.