Semantic HTML5?

Written by Ian Elliot

Friday, 09 September 2011

Semantic markup - it's an idea whose time has come in HTML5. You will often encounter beginners being instructed on how to use tags to indicate the meaningful structure of documents. But is this a good idea? Is markup about meaning or about layout?

The story of semantic markup is an interesting one. At first it looks as if it is an arbitrary decision to be imposed on markup languages just to confuse beginners. However, when you look into it, the reason that HTML5 has gone semantic is not down to a choice. It is almost a law of nature that when you introduce a separation of concerns you create a semantic categorization on one of the "concerns". But first let's take a more naive view of the situation.

HTML started out as a simple markup language. You used tags to indicate how parts of the text were to be displayed and to incorporate other resources such as images. It wasn't a very powerful language but it worked.

HTML5 has introduced a whole set of semantic tags, for example <article>, <aside>, <figcaption>, <figure>, <footer>, <header>, <hgroup>, <mark>, <nav> and <section>. (Semantics, from Greek sēmantiká, neuter plural of sēmantikós, is the study of meaning.) We are supposed to use these to indicate the type of the content and not how it is to be displayed. HTML5 is semantic markup. The reason put forward is that the use of these tags makes it possible for programs to extract the information in a page and reuse it to create some sort of "mashup".
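As a rough illustration of what such markup looks like, here is a minimal page sketch that uses several of the tags listed above. The headings, link targets and the image file name chart.png are placeholders invented for the example, not taken from any real site.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Semantic page sketch</title>
</head>
<body>
  <!-- Each tag says what the content is, not how it should look. -->
  <header>
    <h1>Site title</h1>
    <nav>
      <a href="/">Home</a>
      <a href="/articles">Articles</a>
    </nav>
  </header>
  <article>
    <h2>Article heading</h2>
    <p>Body text with a <mark>highlighted</mark> phrase.</p>
    <figure>
      <!-- chart.png is a hypothetical file name -->
      <img src="chart.png" alt="Example chart">
      <figcaption>Caption describing the chart.</figcaption>
    </figure>
    <aside>A related note that is tangential to the article.</aside>
  </article>
  <footer>Copyright notice and contact details.</footer>
</body>
</html>
```

Nothing in this markup states where the navigation sits or how wide the article is; those decisions are left to a style sheet.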

The danger of mashups

This seems like a good idea but there are a number of obvious problems with it. The first is that my meaningful tags are probably not going to be your meaningful tags and certainly not just the limited set provided by HTML5. The usual solution to this is to use divs with meaningful class names such as name, address and so on. Standardizing these class names leads on to the wonderful world of microformats and, yes, this does sort of work. It does allow search engines to index information more accurately. On the other hand, improvements in search engine technology should make semantic markup unnecessary.
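To make the class-name approach concrete, here is a small sketch based on the hCard microformat. Note that hCard standardizes the class names as fn (formatted name) and adr (address) rather than the literal name and address; the person and address shown are made up for the example.

```html
<!-- "vcard" marks the whole block as a contact card. -->
<div class="vcard">
  <!-- "fn" is hCard's class for the formatted name. -->
  <span class="fn">Jane Example</span>
  <!-- "adr" groups the parts of a postal address. -->
  <div class="adr">
    <span class="street-address">1 Example Street</span>
    <span class="locality">Exampletown</span>
    <span class="country-name">Exampleland</span>
  </div>
</div>
```

A program that knows the hCard vocabulary can pick out the name and address from these class names without understanding anything else about the page.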

It is also often stated that one of the advantages of semantic markup is that it allows programs to scan a web page, extract the information and create a "mashup". While semantic markup brings search engine advantages, having your site used in a mashup doesn't thrill everyone. Perhaps marking up your site in a non-meaningful way is a protection against having the information sucked out of it. This is just the old practice of program obfuscation applied to web sites.

The table tag

On the whole semantic markup is a weak and fairly blunt tool - especially when used to beat beginners into submission. There is one irritating case, however, where semantics seems to have gone mad - the table tag. You will often find that HTML experts lay down rules like

"Don't use tables if you just want to create a grid layout".

The beginner is often told that there are other, more meaningful ways of creating a grid layout using CSS. However, these usually seem complex and difficult compared to the direct route of using a table with rows and columns.

We are told that tables are for data that is naturally and meaningfully laid out in a table format and not for arbitrary presentational schemes like adding columns to a page. This explanation, that tables should be meaningful and the table tag is a semantic not a layout tag, is often difficult for the beginner to swallow.

So HTML5 is about semantic tagging - but who forced that decision on us?

The simple answer is that no one did.

You have to convert HTML into a semantic tagging system if you are going to invent CSS, Cascading Style Sheets.

Separation of concerns

The idea is that anything to do with layout and presentation should be in the CSS file and not in the HTML. This is an example of the "separation of concerns" principle that runs through programming. Whenever you can separate out different aspects of creating a system the result is a simplification. In this case we separate out the layout and presentation aspects of markup into CSS.

The fact of the matter is that HTML became a semantic markup language as soon as CSS was invented.

In this case all matters of layout and presentation are to be removed from HTML and placed in CSS. So what is left for HTML to mark up?

The answer is that HTML has the task of trying to identify similar entities that can be presented in the same sort of way. Imagine if every Button object were to be treated in an entirely different way - this would make the task of CSS next to impossible, with a style having to be defined for every entity in an HTML file. Separating layout and presentation into CSS only makes sense if the HTML is used to mark up semantically similar entities - and so it is with all separations of concerns. Whenever you abstract away an aspect of the system, what is left behind acquires a meaning that reflects the aspect that has been factored out.
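A minimal sketch of the point about similar entities: because every button below is marked up with the same tag, a single CSS rule is enough to present them all. The button labels and the particular style values are arbitrary choices for the illustration.

```html
<style>
  /* One rule styles every button on the page. */
  button {
    padding: 0.5em 1em;
    border: 1px solid gray;
    border-radius: 4px;
  }
</style>

<button type="button">Save</button>
<button type="button">Cancel</button>
<button type="button">Help</button>
```

If each of these were an unrelated, anonymous element, the style sheet would need a separate rule for each one.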

This is deep, but it is easy to see how it works with respect to HTML. If you find that any aspect of an HTML tag implies a particular aspect of a layout then this must be a layout that is intrinsic to the object and cannot be changed by CSS. Let's look at how this applies to the most controversial of tags - the table. If you have a table of data with 10 rows and 3 columns then the table object that you use to represent it has 10 rows and 3 columns. Hence the table is set to 10x3 in the HTML and it is unthinkable that the CSS could change this aspect of the presentation. It is clear that in this case the 10x3 table is a semantic entity and not something that should be shifted over to the CSS.
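For illustration, here is a shortened sketch of such a data table, showing only two of the ten rows. The column names and values are invented; the point is that the three columns belong to the data itself, so they appear in the HTML.

```html
<table>
  <thead>
    <tr>
      <th scope="col">Product</th>
      <th scope="col">Quantity</th>
      <th scope="col">Price</th>
    </tr>
  </thead>
  <tbody>
    <!-- Made-up sample rows; a real table would list all ten. -->
    <tr>
      <td>Widget</td>
      <td>4</td>
      <td>2.50</td>
    </tr>
    <tr>
      <td>Gadget</td>
      <td>1</td>
      <td>9.99</td>
    </tr>
  </tbody>
</table>
```

A style sheet can change fonts, borders and colors here, but the row and column structure is part of what the data means.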

Now consider a table tag used to simply reformat a page to have two columns. The fact that the table has two columns is decided as part of the presentation logic and as such it should be something handed over to the CSS and not part of the HTML. Instead, part of the HTML, rather than the style sheet, is determining an aspect of the presentation, and this is a failure of the separation of concerns.

So if you want a page with two columns, this should be something that you can determine in CSS and not a part of the HTML. If you don't agree with this then you are rejecting the fundamental aim of placing presentation in the CSS. The rule against using a table for presentational reasons follows directly from the need for semantic markup, because semantic markup is all that is left once you move presentation to the control of the CSS.
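The two-column case can be sketched with the column decision living entirely in the style sheet. This uses floats, which is only one of several possible CSS techniques; the class names page, main and sidebar and the widths are assumptions made for the example.

```html
<style>
  /* The two-column decision is made here, not in the markup. */
  .main {
    float: left;
    width: 65%;
  }
  .sidebar {
    float: right;
    width: 30%;
  }
  /* Make the container enclose its floated children. */
  .page::after {
    content: "";
    display: table;
    clear: both;
  }
</style>

<div class="page">
  <div class="main">Main content of the page.</div>
  <div class="sidebar">Secondary content shown alongside.</div>
</div>
```

Changing to one column, or swapping the sides, means editing only the CSS rules; the HTML stays the same.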

Put like this it all seems very reasonable. As a principle it is hard to argue with.

Of course there are some things you can complain about. Why is it that grid layouts are still not obvious enough for a beginner to achieve without looking longingly at the forbidden table tag?

HTML5, or rather CSS3, has done a lot to make the layout problem easier. You can now have multiple columns within a block element, for example, a <div>, and these columns can even be automatically balanced. There is also a grid layout specification, but at the moment it is only in draft form. So for now the quickest and easiest way to a grid layout is still to use a table.
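The multiple-column feature mentioned above can be sketched as follows. The class name columns and the values are illustrative, and the example assumes a browser that supports the unprefixed CSS multi-column properties; older browsers may need vendor-prefixed versions.

```html
<style>
  .columns {
    column-count: 3;      /* flow the content into three columns */
    column-gap: 2em;      /* space between adjacent columns */
    column-fill: balance; /* keep the columns roughly equal in height */
  }
</style>

<div class="columns">
  <p>First paragraph of a longer piece of text.</p>
  <p>Second paragraph, which flows on into the next column as needed.</p>
  <p>Third paragraph, placed wherever the balancing puts it.</p>
</div>
```

The markup is an ordinary block of paragraphs; the number of columns can be changed in the style sheet without touching it.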


Last Updated ( Tuesday, 13 September 2011 )