XML, which is all about tree structures, and Linq, which is all about querying collections, might not seem to fit together but they work together just fine.
Linq isn’t just about SQL and it isn’t even just about databases. After looking in some detail at the basic idea behind Linq, it is instructive to examine what is probably its second most common application - working with XML.
The basics of working with the new XML facilities are covered in XML in C# and for this article it is assumed that you know all about the basic XML facilities.
Linq is a very simple idea with a fairly simple implementation - see The LINQ principle.
Linq queries are provided by extension methods applied to objects that implement the generic IEnumerable interface.
In the case of XML the main objects don’t implement IEnumerable but some of their methods return objects that do.
This is a slight expansion of the Linq idea, but a fairly obvious one.
For example, the Elements method returns a collection of all the child elements of the object, and that collection supports IEnumerable.
This means you can write a foreach loop to step through each of the child elements:
foreach (XElement ele in root.Elements())
    textBox1.Text += ele.ToString();
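As a minimal runnable sketch of this loop, the following console program (top-level statements, C# 9 or later) steps through the children of a small tree. The Record tree is a hypothetical example, and a StringBuilder stands in for textBox1.Text so that no form is needed.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

// Hypothetical sample tree, assumed purely for illustration.
var root = new XElement("Record",
    new XElement("Name", "John"),
    new XElement("Address", "1 Main Street"));

// StringBuilder plays the role of textBox1.Text in the article's snippet.
var text = new StringBuilder();
foreach (XElement ele in root.Elements())
    text.Append(ele.ToString());

// Prints: <Name>John</Name><Address>1 Main Street</Address>
Console.WriteLine(text);
```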
You can also make use of the usual Linq extension methods – although this isn’t the most common way of explaining how Linq to XML works.
For example, assuming we have an XML tree whose root has several child elements, some of them named “Address”, you can use the Where method to filter the collection of child nodes:
var q = root.Elements().Where(E => E.Name == "Address");
foreach (XElement ele in q)
    textBox1.Text += ele.ToString();
which, of course, selects only those child elements that are called “Address”.
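A self-contained sketch of the Where filter might look like the following. The Record tree and its contents are assumptions made for the example, and the output goes to the console rather than a text box.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample tree with two Address children.
var root = new XElement("Record",
    new XElement("Name", "John"),
    new XElement("Address", "1 Main Street"),
    new XElement("Phone", "123"),
    new XElement("Address", "2 Side Street"));

// Where is the standard Linq extension method; the string "Address"
// is implicitly converted to an XName for the comparison.
var q = root.Elements().Where(e => e.Name == "Address");

foreach (XElement ele in q)
    Console.WriteLine(ele);
```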
You can chain together a set of Linq extension methods to produce something more complicated and you can use the syntactic shortcuts introduced into C# to make it even easier.
For example the previous query can be written as:
var q = from E in root.Elements()
        where E.Name == "Address"
        select E;
and the compiler translates it back into the method calls.
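One way to convince yourself that the two forms are equivalent is to run both against the same tree and compare the results. This is only a sketch, using a hypothetical Record tree.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample tree.
var root = new XElement("Record",
    new XElement("Name", "John"),
    new XElement("Address", "1 Main Street"),
    new XElement("Address", "2 Side Street"));

// Query expression form.
var viaQuery = from e in root.Elements()
               where e.Name == "Address"
               select e;

// Equivalent explicit method call.
var viaMethods = root.Elements().Where(e => e.Name == "Address");

// Both sequences yield the same XElement objects in the same order.
Console.WriteLine(viaQuery.SequenceEqual(viaMethods)); // True
```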
If you understand the general workings of Linq then the only new element is that the query source is the result of a method call, i.e. Elements, which returns an IEnumerable collection, rather than an object that itself implements IEnumerable.
This may appear to be a small difference but it does alter the “flavour” of using Linq ever so slightly.
The point is that the XML tree is quite a complicated data structure and there are lots of different ways that its nodes or attributes could be enumerated. This is the reason why it doesn’t just implement the IEnumerable interface in its own right and why it is preferable to delegate the enumeration to other methods – called in the Linq to XML jargon, XML Axis methods.
This small difference gives us a lot of power but it can also be confusing because it often provides more than one way of doing things.
For example, most Linq to XML instructors would not demonstrate finding an XElement with a specific name using the Where method. The reason is simply that the Elements method comes with the ability to construct a collection of child nodes that are restricted to a single name.
For example, calling root.Elements("Address") returns a collection of elements named “Address” in one simple step.
No need for Linq proper here as the axis method does the job of picking out the specific objects and returns them as a collection.
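A short sketch of the name-restricted axis method, again on a hypothetical Record tree:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample tree.
var root = new XElement("Record",
    new XElement("Name", "John"),
    new XElement("Address", "1 Main Street"),
    new XElement("Phone", "123"),
    new XElement("Address", "2 Side Street"));

// The axis method does the filtering itself; no Where call is needed.
var q = root.Elements("Address");

foreach (XElement ele in q)
    Console.WriteLine(ele.Value);
```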
Notice, however, that this isn’t returned as a standard collection type. The axis method adheres to the “deferred” execution model of Linq by returning an iterator object, which the debugger reports as an XContainer.GetElements type, and which is only enumerated when the enumeration is really needed.
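Deferred execution is easy to observe: if the tree changes after the query is built but before it is enumerated, the results reflect the change. The sketch below assumes a hypothetical Record tree.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample tree with two Address children.
var root = new XElement("Record",
    new XElement("Address", "1 Main Street"),
    new XElement("Address", "2 Side Street"));

// Building the query does not walk the tree yet.
var q = root.Elements("Address");

// Modify the tree after the query has been created.
root.Add(new XElement("Address", "3 New Road"));

// Enumeration happens here, so the newly added element is included.
Console.WriteLine(q.Count()); // 3
```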
Another slightly confusing issue that is solved by Axis methods is determining which type of object needs to be returned.
var q = root.Attributes();
is a query that returns all of the attributes set on the root object. Once you have constructed the query you can step through it in the usual way using a foreach loop.
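As a sketch, the loop over the attributes could look like this. The id and lang attributes are hypothetical, chosen only to give the query something to return.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical root element carrying two attributes.
var root = new XElement("Record",
    new XAttribute("id", "42"),
    new XAttribute("lang", "en"),
    new XElement("Name", "John"));

// Attributes() yields XAttribute objects, not XElement objects.
var q = root.Attributes();

foreach (XAttribute att in q)
    Console.WriteLine($"{att.Name} = {att.Value}");
```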
Most of the Axis methods allow the user to specify some simple filtering conditions that often mean that you don’t need to use a full Linq query at all.
Some Axis methods are so specific that they return a single element.
For example, the FirstNode and LastNode properties return the first and last child node respectively. Similarly, Element("name") returns the first matching child element, or null if there is none, which should be contrasted with Elements("name"), which returns all child elements that match.
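The difference between the single-result members and their collection counterparts can be sketched as follows, using a hypothetical Record tree:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample tree.
var root = new XElement("Record",
    new XElement("Name", "John"),
    new XElement("Address", "1 Main Street"),
    new XElement("Address", "2 Side Street"));

// Element returns only the first match (or null if nothing matches).
var first = root.Element("Address");

// Elements returns every matching child.
var all = root.Elements("Address");

// FirstNode and LastNode are properties that return a single XNode.
XNode firstNode = root.FirstNode;
XNode lastNode = root.LastNode;

Console.WriteLine(first.Value);   // 1 Main Street
Console.WriteLine(all.Count());   // 2
Console.WriteLine(firstNode);     // <Name>John</Name>
Console.WriteLine(lastNode);      // <Address>2 Side Street</Address>
```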
As well as working with sequences of elements that go “down” the tree, you can work back up to the topmost level using the Ancestors methods. For example:
var q = root.LastNode.Ancestors();
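returns a collection of the elements that lie above the last child node, working from its parent back up to the top of the tree. Because LastNode is a direct child of root, that collection contains root itself plus any ancestors that root has.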
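To see what Ancestors yields, the following sketch lists the ancestors of a nested element, nearest first. The Record, Address and City names are assumptions made for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical tree with one level of nesting below Address.
var root = new XElement("Record",
    new XElement("Name", "John"),
    new XElement("Address",
        new XElement("City", "London")));

// Pick out the nested City element, then walk back up the tree.
var city = root.Descendants("City").First();
var names = city.Ancestors().Select(a => a.Name.LocalName).ToList();

foreach (string name in names)
    Console.WriteLine(name);
// Prints:
// Address
// Record
```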