Programmer's Introduction to XML
Written by Ian Elliot   
Thursday, 27 January 2022

Where Next?

Now you know how XML works to describe data so that it can be exchanged, transferred and displayed in a completely universal way.

Of course, we have only scratched the surface. Things get even more interesting, and more detailed, when you move into any one of the specific XML application areas, where you have to deal not only with the ideas of XML but also with the additional specifications that have been built on top of it. For a programmer using existing standards, the best next step is to learn the DOM (Document Object Model) API in whatever language you use most.
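To give a flavour of what working with the DOM looks like, here is a minimal sketch in Python, used here only as an example language, with the standard library's xml.dom.minidom module, a lightweight DOM implementation. The book/title/year document and the helpers read_titles and add_year are hypothetical, made up for illustration.

```python
from xml.dom.minidom import parseString

# A small hypothetical document used only for illustration.
SOURCE = """<book>
  <title>XML for Programmers</title>
  <author>A. N. Other</author>
</book>"""

def read_titles(xml_text):
    """Parse the text into a DOM tree and return the text of every <title> element."""
    dom = parseString(xml_text)
    titles = []
    for element in dom.getElementsByTagName("title"):
        # The text lives in child Text nodes, so join their data together.
        text = "".join(
            node.data for node in element.childNodes
            if node.nodeType == node.TEXT_NODE
        )
        titles.append(text)
    return titles

def add_year(xml_text, year):
    """Use the DOM to append a <year> child to the root element and serialize the result."""
    dom = parseString(xml_text)
    root = dom.documentElement
    year_element = dom.createElement("year")
    year_element.appendChild(dom.createTextNode(str(year)))
    root.appendChild(year_element)
    return root.toxml()

print(read_titles(SOURCE))        # ['XML for Programmers']
print(add_year("<book/>", 2022))  # <book><year>2022</year></book>
```

The same operations (find elements by name, create nodes, append children) exist under similar names in DOM implementations for other languages, which is what makes the DOM worth learning once.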

 


A Short XML Glossary

Tag

An XML document contains pairs of opening and closing tags surrounding text that you can regard as the data. An opening tag is simply a name in angle brackets, such as <start>, and a closing tag is identical but with a forward slash added before the name. So the closing tag for <start> is </start>. It is also useful to know that you can’t include spaces within a tag name.

Tags always occur in pairs unless you don’t need to include data between them, in which case you can close a tag by adding a forward slash just before its closing angle bracket. For example, <start/> is an opening tag that is also its own closing tag.
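Any XML parser enforces these rules. A minimal sketch, assuming Python 3 and its standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper that simply reports whether the parser accepts the text.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# An explicit pair of tags and the self-closing form give the same empty element.
paired = ET.fromstring("<start></start>")
self_closing = ET.fromstring("<start/>")
print(paired.tag, self_closing.tag)    # start start
print(len(paired), len(self_closing))  # 0 0 (no child elements)

# A closing tag must use a forward slash and match its opening tag.
print(is_well_formed("<start>data</start>"))   # True
print(is_well_formed("<start>data<\\start>"))  # False: backslash instead of slash
print(is_well_formed("<start>data</stop>"))    # False: mismatched closing tag
# A space ends the tag name, so "start" is read as a malformed attribute.
print(is_well_formed("<my start>data</my start>"))  # False
```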

XML Documents

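A well-formed XML document has a single top-level (or root) element: it starts with one opening tag and ends with the matching closing tag, optionally preceded by an XML declaration such as <?xml version="1.0"?>. All of the other tags within the document have to be nested between these opening and closing “top level” tags.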
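The single root rule can be checked the same way. In this sketch, again assuming Python's xml.etree.ElementTree, root_name is a hypothetical helper; the library/book names are made up for illustration.

```python
import xml.etree.ElementTree as ET

def root_name(text):
    """Parse a document and return the name of its single top-level element."""
    return ET.fromstring(text).tag

# One top-level element with everything else nested inside it: accepted.
print(root_name("<library><book/><book/></library>"))  # library
# An XML declaration may come before the root element.
print(root_name('<?xml version="1.0"?><library/>'))    # library

# Two top-level elements: rejected, because a document may have only one root.
try:
    root_name("<book/><book/>")
except ET.ParseError as error:
    print("rejected:", error)
```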

Attributes

An opening tag can contain “attributes” that record information about the type of data between the tags or modify how the tags are interpreted. Attributes always take the form name="value", with the value in single or double quotes, and you can invent attributes just as freely as you can invent XML tags.
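Reading attributes from code is straightforward. A minimal sketch with Python's xml.etree.ElementTree; the order/price document and its currency attribute are hypothetical, and parses is a made-up helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the "currency" attribute is invented for illustration.
SOURCE = '<order><price currency="GBP">12.50</price></order>'

root = ET.fromstring(SOURCE)
price = root.find("price")
print(price.attrib)                   # {'currency': 'GBP'}
print(price.get("currency"))          # GBP
print(price.get("discount", "none"))  # none (attribute absent, so the default is used)
print(price.text)                     # 12.50 (the data between the tags)

def parses(text):
    """Return True if the parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Attribute values must be quoted.
print(parses('<price currency="GBP">12.50</price>'))  # True
print(parses("<price currency=GBP>12.50</price>"))    # False
```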

Schema and DTD

A schema, or a DTD (Document Type Definition, the older technology), is a document that describes the grammar of an XML document: which elements and attributes are allowed and how they can be nested. If you give an XML-aware application both a schema and a document, then in most cases it can check whether the document conforms to that grammar, that is, whether it is valid, even though it might not “know” anything else about the way you are using XML.
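Python's standard library parsers check that XML is well-formed but do not validate it against a DTD or XML Schema (third-party libraries such as lxml do). So, rather than guess at a validation API, the sketch below hand-codes a tiny content model to show the kind of rule a schema expresses. ALLOWED_CHILDREN and grammar_errors are hypothetical, and this is not DTD or XML Schema syntax.

```python
import xml.etree.ElementTree as ET

# A hand-written stand-in for a schema: for each element name, the child
# elements it may contain. It only illustrates the idea of a grammar.
ALLOWED_CHILDREN = {
    "book": {"title", "author"},
    "title": set(),
    "author": set(),
}

def grammar_errors(text, root_name="book"):
    """Return messages describing where the document breaks the toy grammar."""
    root = ET.fromstring(text)  # well-formedness is still checked by the parser
    errors = []
    if root.tag != root_name:
        errors.append(f"root element is <{root.tag}>, expected <{root_name}>")
    for parent in root.iter():  # visits every element, root first
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            errors.append(f"<{parent.tag}> is not declared")
            continue
        for child in parent:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> is not allowed inside <{parent.tag}>")
    return errors

print(grammar_errors("<book><title>XML</title><author>Ian</author></book>"))  # []
print(grammar_errors("<book><title>XML</title><price>10</price></book>"))
# ['<price> is not allowed inside <book>', '<price> is not declared']
```

A real schema language also covers attributes, ordering, repetition and data types, which is why it pays to use a proper validating library rather than rules like these.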

Namespaces

One of the most mysterious parts of XML is the concept of a “namespace”. The big problem with all systems that allow users to invent their own identifiers is that we tend to invent the same identifiers over and over again. For example, in many XML documents we are likely to invent a <name> tag, but not all <name> tags are going to mean the same thing.

To avoid name clashes you can use a namespace declaration in the form of the xmlns attribute. To set a namespace for the entire document you might use something like:

 <book xmlns="http://www.mywebsite">

Following this, the namespace "http://www.mywebsite" applies to the <book> element and to every element nested within it, unless of course an inner tag declares its own namespace.

Notice that the namespace is a URL (strictly, a URI). The only reason for this is that you are assumed to own a unique URL, such as your own domain name, that no one else will use. There is no requirement for the URL to correspond to a relevant web page (although it can); it’s just a tricky way of getting a unique identifier.

With a namespace applied, you can think of every element name between the tags it applies to as being prefixed with the namespace, for example:

http://www.mywebsite:name
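Python's xml.etree.ElementTree makes this prefixing literal: it stores each namespaced name in the form {namespace}localname. A minimal sketch using the book example; the <name> child element is added for illustration.

```python
import xml.etree.ElementTree as ET

SOURCE = '<book xmlns="http://www.mywebsite"><name>XML</name></book>'
root = ET.fromstring(SOURCE)

# Every element name now carries the namespace in front of it.
print(root.tag)     # {http://www.mywebsite}book
print(root[0].tag)  # {http://www.mywebsite}name

# A search for a bare "name" finds nothing, because the full name includes the namespace.
print(root.find("name"))                             # None
print(root.find("{http://www.mywebsite}name").text)  # XML
```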

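If you want to make explicit which namespace an identifier belongs to, you can write it as a qualified name, prefix:name, after binding the prefix to the namespace with an attribute of the form xmlns:prefix="namespace", for example <b:book xmlns:b="http://www.mywebsite">. The prefix itself is only a short local alias; it is the namespace it stands for that is supposed to be unique, and that is what makes all the qualified names used in the document unique.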
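The following sketch, again assuming Python's xml.etree.ElementTree, shows two vocabularies that both define <name> kept apart by prefixes. The order document and both namespace URIs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The prefixes b and p are bound to two different (hypothetical) namespace URIs.
SOURCE = """<order xmlns:b="http://www.mywebsite/books"
                 xmlns:p="http://www.example.com/people">
  <b:name>Programmer's Guide</b:name>
  <p:name>Alice</p:name>
</order>"""

root = ET.fromstring(SOURCE)
for child in root:
    print(child.tag, "=", child.text)
# {http://www.mywebsite/books}name = Programmer's Guide
# {http://www.example.com/people}name = Alice

# Prefixes are only local aliases: a search may use any prefix mapped to the same URI.
ns = {"book": "http://www.mywebsite/books"}
print(root.find("book:name", ns).text)  # Programmer's Guide

# A different prefix bound to the same URI gives the same qualified name.
same = ET.fromstring('<x:name xmlns:x="http://www.mywebsite/books"/>')
print(same.tag == root[0].tag)  # True
```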

 


 


 

Last Updated ( Friday, 28 January 2022 )