Programmer's Introduction to XML
Written by Ian Elliot   

XML is a general purpose markup language that can be used to describe the structure of data. Although many prefer the simplicity of JSON, XML still has many advantages. What makes it so good?

 


 

XML is very similar to HTML, which is not surprising as they both stem from the same base technology, SGML (Standard Generalized Markup Language). XML was designed to be both human and machine readable and, as a result, it can be verbose and less compact than you might like as a way of packaging data. It can also be criticized for being too much to type, but in most cases XML generation and consumption is done by programs, with humans only getting involved when things go wrong.

If you know HTML then you will certainly recognize XML. There is a sense in which XML is a general purpose version of HTML. You use HTML to mark up web pages, but you can use XML to mark up and give structure to any data you care to think of. There was even an attempt to redefine HTML as an XML language, XHTML, but this has largely been abandoned in favor of HTML5.

Even though XML may not have taken over the data markup world as was intended, it still has enough advantages to be preferred in many situations, and this means that as a programmer you should know something about it. It does have a reputation for being difficult to process from scratch, but most languages have XML libraries that make working with it just a matter of making the right function calls.

So let's see what it is all about.

Tags, nothing but tags

EXtensible Markup Language, to give XML its full title, is a way of indicating where different parts of a document start and end.

For example, if you were keeping a list of your favorite books you might well use something like:

Title: Life of Pi
Author: Yann Martel
Publisher: Canongate

You are using the convention that a colon separates a “field name” that describes and identifies the data from the actual data. XML uses a different, but just as obvious, convention to do the same thing. It uses field names enclosed in “angle brackets”, or tags, and in this case the convention is that the actual data is between an opening and closing tag.

If you know HTML this will seem obvious and, as in the case of HTML5, XML's tags are to be interpreted as providing the semantic structure of the data. That is, the tags tell you what the data is, not how it should be treated. Of course, if you know what the data is about, its semantics, then what you do with it often follows immediately.

For example, the same book data in XML would be something like:

<Title> Life of Pi </Title>
<Author> Yann Martel </Author>
<Publisher> Canongate </Publisher>

You can see from this that for each opening tag there is a closing tag of the same name but starting with /.
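The move from the colon convention to tags can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree module. This is only an illustration: the helper name to_tagged_lines is made up for this example, and it assumes every line has the form "Name: value" with a name that is a legal XML tag name.

```python
import xml.etree.ElementTree as ET

# The colon-separated record used earlier in the text.
record = """Title: Life of Pi
Author: Yann Martel
Publisher: Canongate"""

def to_tagged_lines(text):
    """Turn each 'Name: value' line into '<Name>value</Name>' (hypothetical helper)."""
    lines = []
    for line in text.splitlines():
        name, value = line.split(":", 1)    # split on the first colon only
        element = ET.Element(name.strip())  # the field name becomes the tag
        element.text = value.strip()        # the data goes between the tags
        lines.append(ET.tostring(element, encoding="unicode"))
    return lines

tagged = to_tagged_lines(record)
print("\n".join(tagged))
```

Letting the library write the tags, rather than gluing strings together, means characters such as < and & in the data are escaped for you.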

The beauty of this system is that the layout of the document doesn’t change its structure, and data can include line breaks without any problem. A parser does pass the extra whitespace through to the program, but once that is trimmed, this version of the document means exactly the same thing as the previous one:

<Title>
Life of Pi
</Title>
<Author>
Yann Martel
</Author>
<Publisher>
Canongate
</Publisher>

The whole point is that XML can represent the structure in the data without it having to be laid out in any particular way.
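A quick way to convince yourself of this is to parse the same element laid out in two different ways and compare the results. The sketch below uses Python's xml.etree.ElementTree and assumes, as most applications do, that whitespace around the data is not significant and can be trimmed.

```python
import xml.etree.ElementTree as ET

compact = "<Title> Life of Pi </Title>"
spread_out = "<Title>\nLife of Pi\n</Title>"

a = ET.fromstring(compact)
b = ET.fromstring(spread_out)

# The parser hands over the whitespace untouched...
print(repr(a.text))  # ' Life of Pi '
print(repr(b.text))  # '\nLife of Pi\n'

# ...and once the program trims it, the two layouts carry the same data.
same = a.tag == b.tag and a.text.strip() == b.text.strip()
print(same)          # True
```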

For example, you could read the XML document out letter by letter over a phone connection and the person at the other end could write it down as one long string of text. It still means the same thing and the ability to “serialize” XML documents makes it possible to store them on disk or transfer them byte-by-byte, or even bit-by-bit, over a network without any special processing.
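The phone-line idea can be imitated in Python. The following sketch serializes one element to bytes with xml.etree.ElementTree and then feeds those bytes to a parser one at a time, as a stand-in for a slow connection. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

title = ET.Element("Title")
title.text = "Life of Pi"

# Serialize: the whole element becomes one flat sequence of bytes.
payload = ET.tostring(title, encoding="utf-8")

# Simulate a slow connection by handing the parser one byte at a time.
parser = ET.XMLParser()
for i in range(len(payload)):
    parser.feed(payload[i:i + 1])
received = parser.close()  # returns the rebuilt element

print(received.tag, received.text)
```

The receiver never needs to know how the bytes were chunked. The tags alone are enough to rebuild the structure.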

Nesting

Things can be a little more interesting than the example above suggests because you can use tags within the content of other tags.

In particular, to be strictly correct, the XML example given earlier needs an outer pair of tags that enclose everything. That is:

<Books>
    <Title>
        Life of Pi
    </Title>
    <Author>
        Yann Martel
    </Author>
    <Publisher>
        Canongate
    </Publisher>
</Books>

Notice that indenting has been used to show clearly that all of the other tags are within the <Books></Books> pair. By now you already know that layout is irrelevant to the meaning of an XML document, but it helps to make it readable.
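To see what a program makes of this nested document, here is a minimal Python sketch using xml.etree.ElementTree. It parses the Books document, then collects each inner tag and its trimmed content into a dictionary. The dictionary is just one convenient choice for this example, not something XML requires.

```python
import xml.etree.ElementTree as ET

document = """<Books>
    <Title>
        Life of Pi
    </Title>
    <Author>
        Yann Martel
    </Author>
    <Publisher>
        Canongate
    </Publisher>
</Books>"""

root = ET.fromstring(document)

# Iterating over an element yields the tags nested directly inside it.
fields = {child.tag: child.text.strip() for child in root}

print(root.tag)  # Books
print(fields)
```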

An XML document always has one outermost tag that encloses everything, a so-called “top level” tag, also known as the root element.
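Parsers enforce this rule. In the Python sketch below, is_well_formed is a small helper written for this example. It relies on xml.etree.ElementTree raising ParseError when the text is not a well-formed document.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

without_root = "<Title>Life of Pi</Title><Author>Yann Martel</Author>"
with_root = "<Books>" + without_root + "</Books>"

print(is_well_formed(without_root))  # False: two top level tags
print(is_well_formed(with_root))     # True: one tag encloses everything
```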

You can repeat the “tag within tag” idea as often as you like and this is one of the many things that makes XML powerful. It also makes XML look more complicated than it is.

This nesting structure means that XML is capable of representing any data that forms a hierarchy or tree structure.
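The tree can be made visible with a short recursive walk. The Python sketch below uses a deeper version of the book example. The Book, First and Last tags are invented here purely to add more levels and are not part of the earlier document.

```python
import xml.etree.ElementTree as ET

# Hypothetical deeper structure: Book, First and Last are made-up tags.
document = """<Books>
  <Book>
    <Title>Life of Pi</Title>
    <Author>
      <First>Yann</First>
      <Last>Martel</Last>
    </Author>
    <Publisher>Canongate</Publisher>
  </Book>
</Books>"""

def outline(element, depth=0):
    """Return the tag names as indented lines, one level per nesting depth."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(document))
print("\n".join(tree_lines))
```

Each call handles one tag and then hands its children to the same function, which is exactly the shape of the data: a tree of any depth needs no more code than a tree of depth one.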

 




 
 

   
Copyright © 2014 i-programmer.info. All Rights Reserved.