XML in C#
Tuesday, 30 November 2010


XML, which is all about tree structures, and Linq, which is all about querying collections, might not seem to fit together but Mike James explains that they work together just fine.

 

Linqing to XML

BY MIKE JAMES

 

Linq isn’t just about SQL and it isn’t even just about databases. After looking last month in some detail at the basic idea behind Linq, it is instructive to examine probably its second most common application: working with XML.

 

There always was good XML support in .NET but Linq adds a set of classes that makes it easier to work with XML, particularly if you’re not an XML specialist. There are a number of standard protocols and ways of working with XML, such as XPath, SAX, DOM and so on. All of them are good but each focuses on one particular aspect of XML and one particular way of getting the job done. Linq’s version of XML goes “back to basics”. It is important to realise that much of what we are about to investigate can be used without any reference to Linq. It happens to be a good way to work with XML, and Linq is almost just a bonus.

 

Even if you aren’t interested in working with XML, looking at how Linq handles a more complicated data structure, a tree in this case, is instructive. It has a lot to teach you about the way Linq is designed, how it works and how you might extend it to other data structures.

 

XElement

 

The core of XML is the tag as in

 

 an opening tag <Record>

 a closing tag </Record>

 

The rules for XML are simple – tags occur, almost always, in matched pairs and you can nest tags as if they were brackets. The only exception to the matched pairs rule is a tag that is its own closing tag – as in <Record/> which opens and closes the tag in one go.

 

It’s not difficult to see that you can use tags to build a general tree structure and all you need to represent it in a program is a class that has a collection of itself as a property. This is exactly how the XNode class, and the more useful XElement descended from it via XContainer, operates. The important point is that XElement has a Nodes collection (returned by its Nodes method) which stores an element’s child nodes. A simple example will make this clear.

 

First we need a root node:

 

XElement root = new XElement("Record");

 

The string “Record” is automatically converted to an XName object and this is used to set the Name property of the new XElement. An XName is used instead of a simple string because XML names have some additional behaviour because of namespaces, of which more later.

 

Now that we have a root for our tree, let’s create a leaf node:

 

XElement child1 = new XElement("Name");

 

and hang it off the tree:

 

root.Add(child1);

 

If you place a textbox on a form you can see the XML that the tree represents using:

 

textBox1.Text = root.ToString();

 

What you will see is:

 

<Record>

     <Name />

</Record>

 

You can carry on in the same way to build up a tree of any complexity you like. For example:

 

XElement root=new XElement("Record");

XElement child1=new XElement("Name");

root.Add(child1);

XElement child2=new XElement("First");

XElement child3=new XElement("Second");

child1.Add(child2);

child1.Add(child3);

XElement child4=new XElement("Address");

root.Add(child4);

 

creates the following XML:

 

<Record>

     <Name>

          <First />

          <Second />

     </Name>

     <Address />

</Record>

 

The idea of nesting XElements within XElements is fairly obvious but there are neater ways of achieving the same result. For example, you can combine the two Add methods into a single call:

 

child1.Add(child2,child3);

 

This works because Add has an overload that accepts a variable number of arguments:

 

public void Add(params object[] content);

 

You can, of course, construct a list of child objects to insert into multiple XElements if you want to.
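As a minimal sketch of that idea, the following builds one list of children and adds it to two different parents. The "Alias" element is a made-up name for illustration only. It assumes C# 9 or later (top-level statements) and relies on the fact that a node that already has a parent is copied when it is added somewhere else.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// One list of child elements that we want to reuse.
var children = new List<XElement>
{
    new XElement("First"),
    new XElement("Second")
};

var name = new XElement("Name");
var alias = new XElement("Alias");   // hypothetical second parent

// Add enumerates the list and attaches each item to "name".
name.Add(children);

// The items now have a parent, so this second Add attaches copies
// and leaves the originals where they are.
alias.Add(children);

Console.WriteLine(name);
Console.WriteLine(alias);
```

After this runs, both parents have a First and a Second child, and the objects in the list still belong to the first parent.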

 

Another style of XML tree construction is based on the use of the XElement constructor. One overloaded version allows you to specify the XElement’s content. So to create an XElement with two children you would use:

 

XElement root = new XElement("Record",

      new XElement("Name"),

      new XElement("Address"));

 

You can continue this nested construction to any level you need to. For example, the following creates the same XML tree we had earlier:

 

XElement root = new XElement("Record",
      new XElement("Name",
            new XElement("First"),
            new XElement("Second")),
      new XElement("Address"));

 

This is generally referred to as “functional construction”. If you format it correctly it looks like the tree it is constructing, and it has the advantage that you can pass it directly to any method that cares to make use of it. Of course, in this style of construction you don’t get variables to keep track of each node, but in most cases you don’t need them.

 

There are two additional very easy ways of converting XML into an XElement tree: the static Load and Parse methods. Load takes a file specification as a URI, or a TextReader or XmlReader, and parses the text stream into an XElement tree. The Parse method does the same but accepts a string of XML. For example, to construct the same XML tree given earlier:

 

string XML = @"<Record>

      <Name>

            <First />

            <Second />

      </Name>

      <Address />

</Record>";

XElement root= XElement.Parse(XML) ;

 

If the XML you try to load or parse is syntactically incorrect then you will have to handle the resulting exception.  If you want to go the other way then there is a Save method and you can specify options on the Save and the ToString methods that control some aspects of formatting.
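A minimal sketch of both points, assuming C# 9 or later: malformed input surfaces as an XmlException, and SaveOptions.DisableFormatting is one of the formatting options that ToString accepts. The sample strings are invented for illustration.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

string good = "<Record><Name /></Record>";
string bad = "<Record><Name></Record>";   // Name is never closed

XElement root = XElement.Parse(good);

// The default ToString indents the output; this option keeps it on one line.
string compact = root.ToString(SaveOptions.DisableFormatting);
Console.WriteLine(compact);

bool failed = false;
try
{
    XElement.Parse(bad);
}
catch (XmlException ex)
{
    // Syntactically incorrect XML is reported as an XmlException.
    failed = true;
    Console.WriteLine("Parse failed: " + ex.Message);
}
```

Load and Save behave in the same way, except that they can also fail for the usual file-related reasons.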

 

Content

 

Now you can see how to build an XElement tree, but what about content? A tree of XML tags isn’t usually the whole story. The main data payload in XML is supposed to be anything you put between opening and closing tags. In the XElement tree any text between tags is stored as an XText node in the node collection, i.e. it is just another type of child node. As such you can add XText nodes using all of the methods described earlier. If you try to add a string object as a child then the methods simply convert it to an XText object and add it to the collection of nodes. For example:

 

XElement root = new XElement("Record", "Address Record",
      new XElement("Name",
            new XElement("First", "Mike"),
            new XElement("Second")),
      new XElement("Address"));

 

This adds “Address Record” and “Mike” as text between the specified tags. It is also worth knowing that if an XElement object has only a single XText child then this is also its string Value property.

 

Text between the tags isn’t the only sort of data carried by XML. You can also specify any number of name-value pairs within the tags themselves as attributes. Attributes are supposed to be used as “metadata”, i.e. they describe the nature of the data between the tags (formatting, time zone, context etc.), and within the XElement tree they are stored as XAttribute objects within the Attributes collection. The reason for keeping them separate is that when you enumerate the tree you list each of the tags and the text between tags but not the attributes. Once you have found a tag, i.e. an XElement object, you can enumerate its attributes. Viewed in this way it should be clear that the main distinction between attributes and other nodes is that attributes are not part of the tree structure, being simply stored at a node.

 

The rule that you have to keep in mind is that if you add an XAttribute object to an XElement object then it will automatically be added to the Attributes collection. For example:

 

XAttribute Att1 =

      new XAttribute("Epoc", 2000);

root.Add(Att1);

 

adds an attribute Epoc="2000" to the Attributes collection. The XML generated reads:

 

<Record Epoc="2000"> … </Record>

 

You can achieve the same result using the functional method of constructing an XML tree:

 

XElement root = new XElement("Record",
      new XAttribute("Epoc", 2000),
      new XElement("Name",
            new XElement("First", "Mike"),
            new XElement("Second")),
      new XElement("Address"));

 

What the constructor or the Add method does with an object you supply to it depends on the object’s type. The rules are:

 

·        If it’s null, nothing happens.

·        If it’s descended from XNode then it is added to the Nodes collection.

·        If it’s an XAttribute then it is added to the Attributes collection.

·        If it’s a string then it is converted to XText and added to Nodes.

 

Two less obvious behaviours complete the picture. If you supply an object that implements IEnumerable then it is enumerated and the resulting objects are treated as above. If an object can’t be handled by any of these rules it is converted to a string, then to an XText object, and added to Nodes. The IEnumerable behaviour is very handy because it means you can add a collection of objects in one step.
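The rules above can be seen working together in one constructor call. This is only a sketch, assuming C# 9 or later; the phones array and the Tel and Count element names are invented for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

string[] phones = { "123", "456" };   // hypothetical sample data

var root = new XElement("Record",
    null,                                        // null: nothing happens
    new XAttribute("Epoc", 2000),                // XAttribute: goes to Attributes
    new XElement("Name", "Mike"),                // XNode: goes to Nodes
    phones.Select(p => new XElement("Tel", p)),  // IEnumerable: each item is added
    new XElement("Count", 42));                  // other object: becomes the text "42"

Console.WriteLine(root);
```

The resulting Record element has one attribute, one Name, two Tel children and a Count element whose text is "42".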

 

Manipulating the tree

 

Constructing a tree is usually just the beginning of the problem. You usually want to process it, i.e. examine and change the tree as a result. As an XElement tree is dynamic this is fairly easy. You can of course use Add to add additional elements of any kind and there are also two Remove methods, one defined on XNode and one on XAttribute. These simply remove the object that they belong to from the parent’s collection. For example:

 

root.Add(child4);

child4.Remove();

 

This first adds child4 to root’s node collection and then removes it.

 

A particularly useful method is SetElementValue as this will modify an element’s value and create a new object if it doesn’t already exist. For example, the instruction:

 

root.SetElementValue("Tel","123");

 

will set an existing XElement child of root with XName “Tel” to a value of “123” or, if such an XElement doesn’t exist, it will first create a new instance. There is some apparently odd behaviour here in that the value “123” is placed between a new pair of tags, <Tel>123</Tel>, but it also shows up at the end of root’s Value property as if it were new text located directly between root’s own opening and closing tags.

 

The reason for this is that the Value property is the concatenation of all the XText objects contained within its opening and closing tags, at any depth. This makes the Value property often less than helpful. Notice that if you add some additional text in the form of a string then it is simply concatenated onto an existing XText object when that is the element’s last child. If you first explicitly create XText objects and add these then you get new child objects, which seems reasonable behaviour.
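A short sketch, assuming C# 9 or later, makes the effect on Value visible. It also shows that passing null to SetElementValue removes the child element again.

```csharp
using System;
using System.Xml.Linq;

var root = new XElement("Record", "Address Record",
    new XElement("Name", "Mike"));

// Value is the text of root plus the text of every descendant.
string before = root.Value;

root.SetElementValue("Tel", "123");   // no Tel yet, so <Tel>123</Tel> is created
string afterAdd = root.Value;

root.SetElementValue("Tel", "456");   // Tel exists, so its content is replaced
string afterUpdate = root.Value;

root.SetElementValue("Tel", null);    // null removes the Tel element

Console.WriteLine(before);
Console.WriteLine(afterAdd);
Console.WriteLine(afterUpdate);
```

To read just the telephone number you would navigate to the element first, for example with root.Element("Tel"), and read its Value.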

 

The SetAttributeValue method works in much the same way but on the Attributes collection. For example:

 

root.SetAttributeValue("Epoc", "2008");

 

updates or adds an Epoc attribute.  As an attribute generally has only one value, its Value property is much more useful.

 

Another useful pair of methods are AddBeforeSelf and AddAfterSelf which, as their names suggest, add content as siblings immediately before and after the current node in the tree. There are lots of other methods that modify the tree structure but they are all fairly obvious and contain no surprises, so check the documentation for details.
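As a small sketch, assuming C# 9 or later, here is how the two methods place new siblings around an existing element. The Id element is an invented name.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var name = new XElement("Name");
var root = new XElement("Record", name);

// Both calls add siblings of Name, i.e. further children of Record.
name.AddBeforeSelf(new XElement("Id"));        // hypothetical element
name.AddAfterSelf(new XElement("Address"));

string order = string.Join(",", root.Elements().Select(e => e.Name.LocalName));
Console.WriteLine(order);
```

Note that both methods need the node to have a parent already; calling them on a detached node throws an exception.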

 

The value of it all

 

XML is about a standard format for data and just occasionally we need to actually get at the data.

As already mentioned, if an XElement has a single XText child node then you can access this data as a string using the Value property. However, if it has multiple XText child objects, perhaps contained within other XElement child objects, then the Value property contains a concatenation of these strings.

This makes processing data contained within tags a matter of navigating down to the XElement object which corresponds to the last pair of tags that enclose the text in question. Again, as already mentioned, XAttributes are easier to deal with in this sense because they always correspond to a single name value pair. In this case the Value property always contains the data we are interested in.

For simplicity the following examples will deal with XAttributes but the same methods work with XElement objects.

 

You could just assign a new value to the Value property but it is usually easier to use the SetValue method because this performs automatic type conversions. For example:

 

XAttribute Att1 =

            new XAttribute("Epoc", 2000);

Att1.Value = "2008";

 

works and sets the attribute to a string “2008”.

However:

 

Att1.Value = 2008;

 

generates a compile-time error, because Value is a string and an int needs an explicit conversion. For example:

 

Att1.Value = 2008.ToString();

 

The good news is that:

 

Att1.SetValue(2008);

 

works without the need for an explicit conversion, as does:

 

Att1.SetValue(DateTime.Now);

 

Going the other way is almost as easy, but you have to use an explicit cast on the attribute itself rather than on its Value. For example:

 

DateTime T = Att1.Value;

 

doesn’t work, and neither does:

 

DateTime T = (DateTime) Att1.Value;

 

but, perhaps surprisingly:

 

  DateTime T = (DateTime) Att1;

 

does. You can cast an XAttribute to string, to any numeric type, and to bool, DateTime, TimeSpan and Guid. You can also cast to nullable versions of each of these data types, which can make handling missing data easier.
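The following sketch, assuming C# 9 or later, shows the casts in action, including a nullable cast on an attribute that doesn’t exist. The Created and Missing attribute names and the date string are invented for the example.

```csharp
using System;
using System.Xml.Linq;

var root = new XElement("Record",
    new XAttribute("Epoc", 2000),
    new XAttribute("Created", "2008-06-01T00:00:00"));   // hypothetical attribute

// Cast the attribute object itself, not its Value string.
int epoc = (int)root.Attribute("Epoc");
DateTime created = (DateTime)root.Attribute("Created");

// Attribute() returns null when there is no match; the nullable cast
// turns that into a null int? instead of throwing.
int? missing = (int?)root.Attribute("Missing");

Console.WriteLine(epoc);
Console.WriteLine(created.Year);
Console.WriteLine(missing.HasValue);
```

The non-nullable casts throw if the attribute is null, which is why the nullable versions are the safer choice when the data may be absent.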

 

Converting to real XML

 

There is more to XML than a node tree. You can add many of the XML “decorations” by wrapping the root XElement node in an XDocument object. This can have only one XElement object as the root but also a single XDeclaration, single XDocumentType and any number of XProcessingInstruction and XComment objects. As you can easily work out, each of these objects adds a corresponding “meta” tag to the XML that the XDocument represents. There are various, very reasonable, rules about when declarations are emitted, default declarations, and other bookkeeping concerns – all very obvious.
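As an illustration, the sketch below wraps a root element in an XDocument with a declaration and a comment. It assumes C# 9 or later and the comment text is made up. One of the bookkeeping rules is visible here: ToString leaves the declaration out, whereas Save writes it.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

var doc = new XDocument(
    new XDeclaration("1.0", "utf-8", "yes"),
    new XComment("Address book export"),          // hypothetical comment
    new XElement("Record",
        new XElement("Name", "Mike")));

// ToString gives the comment and the element tree only.
string viaToString = doc.ToString();

// Save emits the declaration as well. When saving to a TextWriter the
// encoding in the declaration follows the writer, not the XDeclaration.
var writer = new StringWriter();
doc.Save(writer);
string viaSave = writer.ToString();

Console.WriteLine(viaToString);
Console.WriteLine(viaSave);
```

The root element remains available as doc.Root, so everything described earlier for XElement still applies.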

 

Namespaces are also fairly simple but deserve a simple example. All XNames are created by default with an empty namespace. There are lots of different ways of adding a namespace specifier but the most useful is via the XNamespace class. For example, to add a namespace to a name you could use:

 

XNamespace ns = "http://www.vsj.co.uk";

XName fullname = ns + "root";

 

Notice that an XName has LocalName, Namespace, and NamespaceName properties to allow you to work more creatively with XML names. Also remember that all strings that you use in XElement, XAttribute etc. names are automatically converted to XName objects. If you use a namespace then you have to explicitly include it when creating each element of the tree and you have to use it when searching for elements with specific names.
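A minimal sketch of what that means in practice, assuming C# 9 or later and reusing the namespace from the example above: an element created with a namespace is only found when the search name includes the same namespace.

```csharp
using System;
using System.Xml.Linq;

XNamespace ns = "http://www.vsj.co.uk";
XName fullName = ns + "Record";

var root = new XElement(fullName,
    new XElement(ns + "Name", "Mike"));

// A plain string means "Name in no namespace", which doesn't match.
XElement plain = root.Element("Name");

// Including the namespace finds the element.
XElement qualified = root.Element(ns + "Name");

Console.WriteLine(fullName.LocalName);
Console.WriteLine(fullName.NamespaceName);
Console.WriteLine(plain == null);
Console.WriteLine(qualified.Value);
```

Forgetting the namespace in a search is a common reason for a query that unexpectedly returns nothing.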

 

Linq

 

So far everything described is completely non-Linq-specific and can be used to work with XML in other contexts. Now we turn our attention to the Linq aspects of the new XML support. If you recall from last month’s article, Linq is a very simple idea with a fairly simple implementation. Linq queries are provided by extension methods applied to objects that implement the generic IEnumerable interface. In this case the main objects that we have been examining don’t implement IEnumerable but some of their methods return objects that do. This is a slight expansion of the Linq idea, but a fairly obvious one. For example, the Elements method returns an IEnumerable-supporting collection of all the child elements of the object. This means you can write a foreach loop to step through each of the child elements:

 

foreach( XElement ele in root.Elements())

{

      textBox1.Text += ele.ToString();

}

 

You can also make use of the usual Linq extension methods – although this isn’t the most common way of explaining how Linq to XML works. For example, you can use the Where method to filter the collection of child nodes:

 

var q = root.Elements().

            Where<XElement>(E=>E.Name=="Address");

foreach( XElement ele in q)

{

      textBox1.Text += ele.ToString();

}

 

which, of course, selects only those child elements that are called “Address”. You can chain together a set of Linq extension methods to produce something more complicated and you can use the syntactic shortcuts introduced into C# to make it even easier. For example the previous query can be written as:

 

var q = from E in root.Elements()

      where  E.Name == "Address"

      select E;

 

and the compiler translates it back into the method calls. If you understand the general workings of Linq then the only new element is using a method, i.e. Elements, that returns an IEnumerable collection rather than an object that implements IEnumerable. This may appear to be a small difference but it does alter the “flavour” of using Linq ever so slightly. The point is that the XML tree is quite a complicated data structure and there are lots of different ways that its nodes or attributes could be enumerated. This is the reason why it doesn’t just implement the IEnumerable interface in its own right and why it is preferable to delegate the enumeration to other methods – called in the Linq to XML jargon, XML Axis methods.

 

This small difference gives us a lot of power but it can also be confusing because it often provides more than one way of doing things. For example, most Linq to XML instructors would not demonstrate finding an XElement with a specific name using the Where method. The reason is simply that the Elements method comes with the ability to construct a collection of child nodes that are restricted to a single name. For example, you can return a collection of elements named “Address” in one simple step:

 

var q=root.Elements("Address");

 

No need for Linq proper here as the axis method does the job of picking out the specific objects and returns them as a collection. Notice, however, that this isn’t returned as a standard collection type. The axis method adheres to the “deferred” execution model of Linq by returning an iterator object, reported as an XContainer.GetElements type, which is only enumerated when the enumeration is really needed.
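Deferred execution is easy to demonstrate. In this sketch, which assumes C# 9 or later and uses invented address text, an element added after the query was created still turns up when the query is finally enumerated.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var root = new XElement("Record",
    new XElement("Address", "1 High St"));

// Nothing is enumerated here; q just remembers what to look for.
var q = root.Elements("Address");

// Change the tree after the query has been constructed.
root.Add(new XElement("Address", "2 Low Rd"));

// Enumeration happens now, so both Address elements are seen.
int count = q.Count();
Console.WriteLine(count);
```

If you need a snapshot that is unaffected by later changes, call ToList or ToArray on the query at the point where you want the results fixed.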

 

Another slightly confusing issue that is solved by Axis methods is determining which type of object needs to be returned. For example:

 

  var q = root.Attributes();

 

is a query that returns all of the attributes set on the root object. Once you have constructed the query you can step through it in the usual way using a foreach loop.

 

Most of the Axis methods allow the user to specify some simple filtering conditions that often mean that you don’t need to use a full Linq query at all. Some Axis methods are so specific that they return a single element. For example, FirstNode and LastNode return the first and last node respectively. Similarly Element("name") returns the first matching element, which should be contrasted with Elements("name") which returns all child elements that match. As well as working with sequences of elements that go “down” the tree you can work back up to the topmost level using the Ancestors methods. For example:

 

var q = root.LastNode.Ancestors();

 

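returns the ancestors of root’s last child node. Note that this does not visit every element in the tree: it works upwards from the starting node only, so in our example, where the last child is Address, the result is just the Record element.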
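To see exactly what an Ancestors call yields, here is a small sketch that starts deeper in the example tree. It assumes C# 9 or later.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var root = XElement.Parse(
    "<Record><Name><First /><Second /></Name><Address /></Record>");

XElement second = root.Element("Name").Element("Second");

// Ancestors works upwards from the parent, nearest first.
string up = string.Join(",",
    second.Ancestors().Select(e => e.Name.LocalName));

// AncestorsAndSelf does the same but starts with the element itself.
string upWithSelf = string.Join(",",
    second.AncestorsAndSelf().Select(e => e.Name.LocalName));

Console.WriteLine(up);
Console.WriteLine(upWithSelf);
```

Like Elements, both methods also accept a name if you only want ancestors of a particular kind.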

 

Now what about querying sub-trees? This is very easy and almost doesn’t need any thought. All you have to do is find the node that is the root of the sub-tree and use its Descendants method. For example:

 

var q=root.Element("Name").Descendants();

 

This returns all of the elements below the Name XElement in the tree, i.e. First and Second in our earlier example. Notice that Descendants is “recursive” in the sense that it returns all of the child elements of the first node specified, then the child elements of each of those and so on. The order in which they are returned is described as “document” order, i.e. the order in which the tags appear when the XML is listed down a page.
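The difference between a whole-tree and a sub-tree query, and the meaning of document order, can be checked with a short sketch. It assumes C# 9 or later and uses the example tree from earlier.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var root = XElement.Parse(
    "<Record><Name><First /><Second /></Name><Address /></Record>");

// Everything below the root, in the order the opening tags appear.
string all = string.Join(",",
    root.Descendants().Select(e => e.Name.LocalName));

// Only the sub-tree below Name; Name itself is not included.
string belowName = string.Join(",",
    root.Element("Name").Descendants().Select(e => e.Name.LocalName));

// Descendants also takes a name and then searches at any depth.
int seconds = root.Descendants("Second").Count();

Console.WriteLine(all);
Console.WriteLine(belowName);
Console.WriteLine(seconds);
```

Elements("Second") on the root would find nothing here, because Elements only looks at direct children.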

 

Notice that if you use a Linq query to return an element you automatically get its “deep” value – i.e. all of the child nodes it contains. In this sense the query:

 

var q2 = from E in root.Elements()

      where E.Name == "Name"

      select E;

 

returns a sub-tree starting at the XElement “Name”. It is slightly different to the previous example because it includes the “Name” node and not just the sub-tree below it.

 

You can also chain Axis methods just as you can chain standard Linq methods. For example:

 

var q =

      root.Element("Name").Attributes();

 

finds the first element that matches “Name” and then returns a sequence of its attributes, if any.

 

Some things are much easier to do with axis methods which are designed to work with a tree structure and some are easier using standard Linq queries which are designed to work with flat collections. Sometimes a combination of the two works even better. For example consider:

 

var q = from E in root.Elements()

      where (E.Element("First")!=null)

      select E;

 

This selects all of the elements that have at least one “First” child element. Again as a deep value is returned, you actually get the subtree below any node that has a “First” child node.

 

Once you start to follow the relentless logic of IEnumerable and its Linq methods it becomes almost fun to try and work out the most “interesting” way of obtaining a result. Not necessarily good programming practice but a good way to master the techniques.

 

Select and the projection

 

The Select part of a Linq expression normally works as a projection operator that “reduces” the size of the data structure returned by a query. For instance, it can be used to select which “columns” are returned from a SQL query which otherwise would return a complete record. Similarly, the query

 

var q = from E in root.Elements()

      where E.Name == "Name"

      select E;

 

returns a deep copy in the sense that the XElement corresponding to Name brings with it an entire subtree. Suppose you want a “shallow” copy, i.e. just the node and its text value. In this case we can project the XElement to a string that contains all of the XText in the subtree:

 

var q = from E in root.Elements()

      where E.Name == "Name"

      select E.Value;

 

However, projection can be used to create a larger or completely different type using the data returned by the query. For example, you could perform a Linq to SQL query to return some values and then package them up into an XML tree. To understand how this works all you really have to do is focus on the role of the “range” variable. It is set to each of the objects that match the selection criterion and eventually this can be used to construct any other data type. For example, suppose you wanted to reconfigure the XML tree that stored the first and second names in our example into a different XML tree. You could do the job using a slightly more advanced select projection:

 

var q = from E in root.Elements()

      where E.Name == "Name"

      select new XElement("NameRecord",

            new XElement("Name1",

                        E.Element("First").Value),

            new XElement("Name2",

                        E.Element("Second").Value)

                            );

 

In this case we take each of the selected elements and build a new XML tree using their values, and it is a collection of these new XML trees that is returned. While this isn’t a particularly useful transformation it gives you the basic idea, i.e. use the result of the query to build new types, and you can generalise it to other situations. In more complex transformations it can even be useful to work out intermediate results to be used later within the select clause.
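One practical caveat: a projection like this reads .Value on the result of Element, which throws a NullReferenceException if a child is missing. A hedged alternative, sketched below for C# 9 or later with invented sample data, is to cast the element to string, which gives null for a missing element.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical data: the first Name has no Second child.
var root = XElement.Parse(
    "<Records>" +
    "<Name><First>Mike</First></Name>" +
    "<Name><First>Ann</First><Second>Lee</Second></Name>" +
    "</Records>");

var q = from e in root.Elements("Name")
        select new XElement("NameRecord",
            // The string cast returns null rather than throwing.
            new XElement("Name1", (string)e.Element("First")),
            // Fall back to empty text when Second is missing.
            new XElement("Name2", (string)e.Element("Second") ?? ""));

foreach (XElement record in q)
{
    Console.WriteLine(record);
}
```

Whether an empty element is the right substitute for missing data depends on the application; the point is only that the projection no longer fails part way through.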

 

To create a variable within a Linq expression you simply use the let keyword. For example the previous query can be re-written:

 

var q = from E in root.Elements()

  where E.Name == "Name"

   let z1=E.Element("First")

   let z2 = E.Element("Second")

  select

   new XElement("NameRecord",

     new XElement("Name1",z1.Value),

     new XElement("Name2",z2.Value));

 

In this case the let keywords create two new variables z1 and z2 which are used to hold the child elements that match the two sub-queries. Notice that the result is exactly the same if you don’t use the temporary variables.

 

Notice that you can even use the select to modify the existing XML tree using SetValue and so on, but be warned: a tree is a complicated structure and you need to be sure that what you are trying to do can be done in every case. In most cases it is much better to use the functional approach to build a new tree as part of the select clause. Notice that this is made even more powerful by the simple fact that the constructors can accept IEnumerable sequences and will automatically iterate through all of the objects, thus adding them to the new tree. For example, consider:

 

var q = from E in root.Elements()

  where E.Name == "Name"

  select

   new XElement("NameRecord", root.Attributes(),

     new XElement("Name1",

       E.Element("First").Value),

     new XElement("Name2",

       E.Element("Second").Value));

 

Notice that the use of root.Attributes adds all of the attributes in the collection to the new XElement corresponding to the tag <NameRecord>.

 

Linq, and Linq to XML in particular, provides so many ways of doing things that it can leave you feeling slightly queasy about the whole thing and it is certain that you can write code that is deep and impenetrable – don’t. It is nearly always true that simple is better than clever.

 

 

Dr. Mike James’ programming career has spanned many languages, starting with Fortran. He is the author of Foundations of Programming and has always been interested in the latest developments and the synergy between different languages.

 

 



Last Updated ( Monday, 06 December 2010 )
 
 

   
Copyright © 2014 i-programmer.info. All Rights Reserved.