An Extensive Examination of LINQ: Querying and Searching XML Documents Using LINQ to XML

Introduction
XML is an increasingly popular way to encode documents, data, and electronic messages. Over the years Microsoft has offered a variety of libraries to facilitate creating, modifying, querying, and searching XML documents. LINQ to XML is a relatively new set of XML-related classes in the .NET Framework (found in the System.Xml.Linq namespace) that enable developers to work with XML documents using LINQ's features, syntax, and semantics. As discussed in an earlier article, Introducing LINQ to XML, LINQ to XML is a simpler, easier-to-use API than previous libraries. Because LINQ to XML can utilize LINQ's query syntax and assortment of standard query operators, LINQ to XML code is usually very terse and readable.

This article continues our look at LINQ to XML. Specifically, we explore how to query XML documents using axis methods as well as how to search and filter XML documents using both LINQ's Where method and XPath expressions. Read on to learn more!

Retrieving Child Elements
As discussed in Introducing LINQ to XML, the most frequently used class in the LINQ to XML API is the XElement class, which represents an XML element. This class is used when programmatically constructing an XML document, when loading an XML document, and when searching, filtering, or otherwise enumerating the elements within an XML document. Its Load method loads an XML document from disk or over the Internet and returns the root element of the just-loaded document. Its Value property returns the concatenated text content of the element and all of its descendants.
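To make the Value behavior concrete, here is a small sketch. To keep it self-contained it uses XElement.Parse on an inline string rather than loading a file, and the <note> markup is made up purely for illustration.

```csharp
// C#
using System;
using System.Xml.Linq;

// Hypothetical markup: the text is split between <note> and its child <b>.
var note = XElement.Parse("<note>Eat <b>more</b> fiber</note>");

// Value concatenates the text of the element and all of its descendants.
string text = note.Value;
Console.WriteLine(text);   // Eat more fiber
```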

When working with an XML document we are often interested in a particular element or attribute value or a particular subset of elements and attribute values. The XElement class has a number of helpful methods that we can use to retrieve such data. Let's start by looking at two of the most commonly used methods, Elements and Element. The Elements method returns all of the child elements of the current element. You can optionally pass in an element name, in which case only those child elements with a matching name are returned. The Element method requires a name as an input parameter and returns the first child element with that name.
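The following sketch contrasts the three calls on a trimmed-down copy of the Avocado Dip <food> element (parsed from a string with XElement.Parse so no file is needed). Note in particular that Element and Elements only look at children, so asking the <food> element for <a> returns null even though an <a> element exists further down.

```csharp
// C#
using System;
using System.Linq;
using System.Xml.Linq;

// A trimmed-down copy of one <food> element from NutritionInfo.xml.
var food = XElement.Parse(
    "<food><name>Avocado Dip</name><mfr>Sunnydale</mfr>" +
    "<vitamins><a>0</a><c>0</c></vitamins></food>");

// Elements() with no argument: every child element, in document order.
int childCount = food.Elements().Count();                 // 3 (<name>, <mfr>, <vitamins>)

// Elements(name): only the children whose name matches.
int vitaminBlocks = food.Elements("vitamins").Count();    // 1

// Element(name): the first matching child, or null if there is none.
string firstName = food.Element("name").Value;            // "Avocado Dip"
var missing = food.Element("a");                          // null: <a> is a grandchild, not a child

Console.WriteLine($"{childCount} {vitaminBlocks} {firstName} {missing == null}");
```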

The Elements and Element methods - along with a number of other methods we'll be examining in this article - are referred to as axis methods and operate relative to the current node. To hammer home this point, let's look at an example. For this example and others in this article I will be using an XML file named NutritionInfo.xml. This XML file can be found in the App_Data folder in the demo available for download at the end of this article.

The NutritionInfo.xml document contains nutritional information about a variety of food items. Here is a snippet of this XML document:

<nutrition>
   <daily-values>
      <total-fat units="g">65</total-fat>
      <saturated-fat units="g">20</saturated-fat>
      <cholesterol units="mg">300</cholesterol>
      <sodium units="mg">2400</sodium>
      <carb units="g">300</carb>
      <fiber units="g">25</fiber>
      <protein units="g">50</protein>
   </daily-values>

   <food>
      <name>Avocado Dip</name>
      <mfr>Sunnydale</mfr>
      <serving units="g">29</serving>
      <calories total="110" fat="100"/>
      <total-fat>11</total-fat>
      <saturated-fat>3</saturated-fat>
      <cholesterol>5</cholesterol>
      <sodium>210</sodium>
      <carb>2</carb>
      <fiber>0</fiber>
      <protein>1</protein>
      <vitamins>
         <a>0</a>
         <c>0</c>
      </vitamins>
      <minerals>
         <ca>0</ca>
         <fe>0</fe>
      </minerals>
   </food>

   ...
</nutrition>

The <nutrition> element is the root element. Its first child is the <daily-values> element, which spells out the recommended daily allotments for the various nutritional metrics provided by each <food> item; there is only one <daily-values> element in the document. Following this sole <daily-values> element are a number of <food> elements that spell out the nutritional information for individual food items. The snippet above shows a single <food> element describing the nutritional information for Avocado Dip.

Now, imagine that we wanted to retrieve the name of the first food item in the XML document. To accomplish this we'd need to start by loading the XML document. Recall that the Load method returns the root of the document as an XElement object (in this example, <nutrition>).

// C#
var root = XElement.Load(Server.MapPath("~/App_Data/NutritionInfo.xml"));

Now that we have a reference to the root we can get the first <food> element using the following syntax:

// C#
var firstFoodElement = root.Element("food");

This syntax says, in English, "Get me the root's first child element named <food>." (If there are no <food> child elements then root.Element("food") will return null.) Once we have the first <food> element we can get its <name> child element using the same syntax:

// C#
var nameElement = firstFoodElement.Element("name");

Note that to get the <name> element we call the firstFoodElement XElement object's Element method. Had we accidentally used root.Element("name") we'd get back a null value, because the root element does not have any <name> child elements (it only has <daily-values> and <food> child elements).

Now that we have the <name> element (of the first <food> element) we can get its text value ("Avocado Dip", in this example) by using the Value property.

// C#
var theActualNameOfTheFirstFoodItem = nameElement.Value;
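Putting these steps together, the sketch below runs them against a small in-memory stand-in for NutritionInfo.xml instead of calling Server.MapPath. The second <food> element is invented for the example; it is there only to show that Element returns the first match, and that the root itself has no <name> child.

```csharp
// C#
using System;
using System.Xml.Linq;

// Trimmed-down stand-in for NutritionInfo.xml; the second <food> is made up.
var root = XElement.Parse(
    "<nutrition>" +
    "<daily-values><total-fat units='g'>65</total-fat></daily-values>" +
    "<food><name>Avocado Dip</name></food>" +
    "<food><name>Hypothetical Snack</name></food>" +
    "</nutrition>");

// Element returns only the FIRST matching child, so this is Avocado Dip's <food>.
var firstFoodElement = root.Element("food");
var theActualNameOfTheFirstFoodItem = firstFoodElement.Element("name").Value;

// Axis methods are relative: <name> is a grandchild of the root, not a child.
var wrongLevel = root.Element("name");

Console.WriteLine(theActualNameOfTheFirstFoodItem); // Avocado Dip
Console.WriteLine(wrongLevel == null);              // True
```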

Reading Attributes
Another important class in the LINQ to XML API is the XAttribute class, which represents an XML attribute. The XElement class has two methods that return XAttribute values:

Attribute(attributeName) - returns an XAttribute object for a specific attribute, and
Attributes - two overloads; the first accepts no input parameters and returns all attributes of the XElement; the second overload accepts an attribute name and returns a collection of attributes of the XElement with a matching name.

Like XElement, the XAttribute class has a Value property, which returns the value of the attribute.
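The sketch below exercises all three members on the <calories> element from the Avocado Dip snippet, parsed from a string for convenience.

```csharp
// C#
using System;
using System.Linq;
using System.Xml.Linq;

// The <calories> element for Avocado Dip, copied from the snippet above.
var calories = XElement.Parse("<calories total='110' fat='100'/>");

// Attributes() with no argument: every attribute, in document order.
foreach (XAttribute attr in calories.Attributes())
    Console.WriteLine($"{attr.Name} = {attr.Value}");   // total = 110, then fat = 100

// Attributes(name): at most one match, since XML forbids duplicate attribute names.
int fatCount = calories.Attributes("fat").Count();

// Attribute(name): a single XAttribute (or null); its Value is always a string.
string total = calories.Attribute("total").Value;

Console.WriteLine($"{fatCount} {total}");
```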

Let's look at using the Attribute method to retrieve calorie information for the first food item (Avocado Dip). NutritionInfo.xml specifies calorie information using a <calories> element with two attributes - total and fat - which hold the total calories and the calories from fat, respectively. To retrieve these values programmatically we could use the following code:

// C#
var totalCal = root.Element("food").Element("calories").Attribute("total").Value;
var fatCal = root.Element("food").Element("calories").Attribute("fat").Value;

This syntax, I think, is pretty readable. For example, to get the total calories for the first food item we say, "Hey, root, give me your first <food> element; from that, give me its first <calories> element; from that, get the total attribute; and then give me its value." In the case of Avocado Dip, this returns a value of "110".

While the above syntax is quite terse and readable, it makes a number of assumptions - namely that there will be at least one <food> child element under the root, that this <food> element will have a <calories> child, and that the <calories> element will have a total attribute specified. If any of these elements or attributes are missing, the above code will throw a NullReferenceException, because the Element and Attribute methods return null when no match is found. To query the XML document more safely you would need to get the pieces one at a time and ensure that a null value was not returned; the code in the demo available for download has a sample of this more careful syntax.
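The demo's careful version isn't reproduced here, but one way to get the same safety in a single expression, assuming C# 6 or later, is to combine the null-conditional operator (?.) with LINQ to XML's explicit conversion operators. Casting an XAttribute to int? yields null when the attribute itself is null, so missing pieces produce null rather than an exception. The input below is made up so that the fat attribute is absent.

```csharp
// C#
using System;
using System.Xml.Linq;

// A <food> element whose <calories> child lacks the fat attribute (made up for illustration).
var root = XElement.Parse(
    "<nutrition><food><name>Avocado Dip</name><calories total='110'/></food></nutrition>");

// ?. stops the chain at the first null, and the explicit (int?) conversion on
// XAttribute maps a missing attribute to null instead of throwing.
int? totalCal = (int?)root.Element("food")?.Element("calories")?.Attribute("total");
int? fatCal   = (int?)root.Element("food")?.Element("calories")?.Attribute("fat");

Console.WriteLine(totalCal.HasValue ? totalCal.Value.ToString() : "n/a"); // 110
Console.WriteLine(fatCal.HasValue ? fatCal.Value.ToString() : "n/a");     // n/a
```

Keep in mind that this only guards against missing nodes; an attribute that is present but not a valid integer (say, total="abc") still throws a FormatException.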

Returning Descendant and Ancestor Elements
The Element and Elements methods only search the set of child elements. For XML documents that specify a hierarchical structure, such as the XML format of the Web.sitemap file, there may be elements with the same name buried at arbitrary depths. To search across all descendants of the current node (and not just its children), use the Descendants method. The Descendants method has two overloads. The first accepts no input parameters and returns all descendant elements. The second accepts a name and returns only those descendants whose name matches.
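Here is a small sketch of the difference, using nested markup shaped loosely like a Web.sitemap file (the titles are made up, and the namespace is left out here to keep the focus on the axis). Descendants walks the whole subtree in document order, while Elements stops at the direct children.

```csharp
// C#
using System;
using System.Linq;
using System.Xml.Linq;

// Nested markup shaped loosely like Web.sitemap (no namespace, for simplicity).
var root = XElement.Parse(
    "<siteMap>" +
    "<siteMapNode title='Home'>" +
    "<siteMapNode title='Products'><siteMapNode title='Widgets'/></siteMapNode>" +
    "<siteMapNode title='About'/>" +
    "</siteMapNode>" +
    "</siteMap>");

int children    = root.Elements("siteMapNode").Count();     // 1: only Home is a direct child
int descendants = root.Descendants("siteMapNode").Count();  // 4: every <siteMapNode>, at any depth

// Descendants yields elements in document order (parents before their children).
string titles = string.Join(",",
    root.Descendants("siteMapNode").Select(n => (string)n.Attribute("title")));

Console.WriteLine($"{children} {descendants} {titles}");   // 1 4 Home,Products,Widgets,About
```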

The following snippet of code shows how to use the Descendants method to determine how many <siteMapNode> elements exist in the Web.sitemap file. (If you are unfamiliar with the Web.sitemap file, it is an XML-formatted file that developers can create to define a logical structure for their site. Once defined, navigation web controls like the Menu or TreeView can be used to display this site structure. The Web.sitemap file is composed of an arbitrary number of <siteMapNode> elements, where each <siteMapNode> element represents a section on the site. These elements can be (and often are) nested. See Examining ASP.NET's Site Navigation for more information on this file and ASP.NET's site map functionality.)

// C#
// Load the site map...
var root = XElement.Load(Server.MapPath("~/Web.sitemap"));

// Define the namespace used by the Web.sitemap file...
XNamespace siteMapNS = "http://schemas.microsoft.com/AspNet/SiteMap-File-1.0";

// Count the number of <siteMapNode> elements...
var totalSections = root.Descendants(siteMapNS + "siteMapNode").Count();

The code here is a little bit more involved than previous examples because the Web.sitemap file uses XML namespaces. If you examine the Web.sitemap file you'll find that its root element (<siteMap>) declares a default namespace, "http://schemas.microsoft.com/AspNet/SiteMap-File-1.0", which applies to every unprefixed element in the file:

<siteMap xmlns="http://schemas.microsoft.com/AspNet/SiteMap-File-1.0">
   ...
</siteMap>

Querying an XML document that uses namespaces requires that the namespaces be included in the querying syntax. This is accomplished by creating an XNamespace object that specifies the namespace name and then including it as part of the name in the XElement's methods. In the above example this is accomplished by creating an XNamespace object named siteMapNS and then including it when calling the Descendants method: root.Descendants(siteMapNS + "siteMapNode").
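To see why the XNamespace is required, the sketch below parses a minimal sitemap that declares the same default namespace and counts <siteMapNode> elements with and without it. Because the namespace is declared as the default (no prefix), every element in the file belongs to it, so a plain "siteMapNode" name matches nothing.

```csharp
// C#
using System;
using System.Linq;
using System.Xml.Linq;

// Minimal sitemap that declares the same default namespace as Web.sitemap.
var root = XElement.Parse(
    "<siteMap xmlns='http://schemas.microsoft.com/AspNet/SiteMap-File-1.0'>" +
    "<siteMapNode title='Home'><siteMapNode title='About'/></siteMapNode>" +
    "</siteMap>");

XNamespace siteMapNS = "http://schemas.microsoft.com/AspNet/SiteMap-File-1.0";

// Without the namespace, "siteMapNode" matches nothing: the real element names
// are {http://schemas.microsoft.com/AspNet/SiteMap-File-1.0}siteMapNode.
int withoutNs = root.Descendants("siteMapNode").Count();            // 0

// XNamespace + string produces the fully qualified XName.
int withNs = root.Descendants(siteMapNS + "siteMapNode").Count();   // 2

Console.WriteLine($"{withoutNs} {withNs}");
```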

Along with the Descendants method, the XElement class also offers an Ancestors method. This method is the inverse of Descendants - rather than returning the elements (or matching elements) beneath the current element, it returns its parent element, its grandparent element, and so forth, all the way up to the root. See the demo available for download for a demo using the Ancestors method.
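As a rough illustration (not the demo's code), the sketch below starts at the nested <c> vitamin element and walks upward with Ancestors. Ancestors also has an overload that takes a name, which is handy for finding the enclosing <food> element of a deeply nested value.

```csharp
// C#
using System;
using System.Linq;
using System.Xml.Linq;

var root = XElement.Parse(
    "<nutrition><food><name>Avocado Dip</name>" +
    "<vitamins><a>0</a><c>0</c></vitamins></food></nutrition>");

// Start from a deeply nested element: the <c> vitamin value.
XElement vitaminC = root.Descendants("c").First();

// Ancestors walks upward: parent first, then grandparent, up to the root.
string path = string.Join(" <- ", vitaminC.Ancestors().Select(a => a.Name.LocalName));
Console.WriteLine(path);   // vitamins <- food <- nutrition

// Going "back up": which food does this vitamin value belong to?
string foodName = vitaminC.Ancestors("food").First().Element("name").Value;
Console.WriteLine(foodName);   // Avocado Dip
```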

Searching / Filtering an XML Document
Because the LINQ to XML API gives us full access to LINQ's standard query operators, searching or filtering an XML document is very straightforward. As discussed in previous installments of this article series, the Where extension method can operate on an enumeration and filter certain elements out of that enumeration using lambda expressions.

For example, use the following Where clause to retrieve only those food items with less than 300 total calories:

// C#
var root = XElement.Load(Server.MapPath("~/App_Data/NutritionInfo.xml"));

var lowCalorieFoods = root.Elements("food")
                          .Where(f => Convert.ToDecimal(f.Element("calories").Attribute("total").Value) < 300M);

The code here says, in English, "Give me all <food> child elements off the root and then only return those whose <calories> element's total attribute has a value less than 300." Bear in mind that in the lambda expression in the Where method we are dealing with <food> elements; in other words, each f here is an XElement that represents a particular <food> element in the XML document. Consequently, to retrieve the calorie information for each <food> element we use f.Element("calories") to get a reference to the <calories> element and then Attribute("total").Value to get the value of the total attribute. The XAttribute class's Value property returns a string, so we need to convert this string into a decimal value in order to compare it to a numeric value, in this case 300.
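Two variations on the same filter may be worth knowing. LINQ to XML defines explicit conversion operators on XAttribute and XElement, so (decimal)attribute can replace Convert.ToDecimal(attribute.Value); the cast parses with XmlConvert, which makes it independent of the current culture. The same filter can also be written with LINQ's query syntax. The second <food> element below is invented so that the filter has something to exclude.

```csharp
// C#
using System;
using System.Linq;
using System.Xml.Linq;

// Two <food> elements; the second one (and its calorie count) is made up.
var root = XElement.Parse(
    "<nutrition>" +
    "<food><name>Avocado Dip</name><calories total='110' fat='100'/></food>" +
    "<food><name>Hypothetical Feast</name><calories total='450' fat='200'/></food>" +
    "</nutrition>");

// Method syntax, using XAttribute's explicit (decimal) conversion instead of
// Convert.ToDecimal; the cast parses with XmlConvert, so it is culture-invariant.
var lowCalorieFoods = root.Elements("food")
                          .Where(f => (decimal)f.Element("calories").Attribute("total") < 300M);

// The same filter written with LINQ's query syntax.
var lowCalorieFoodsQuery = from f in root.Elements("food")
                           where (decimal)f.Element("calories").Attribute("total") < 300M
                           select f;

foreach (var food in lowCalorieFoods)
    Console.WriteLine(food.Element("name").Value);   // Avocado Dip
```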

In addition to searching and filtering XML documents using the LINQ standard query operators, you can use XPath expressions. XPath is a standardized (W3C) language for selecting nodes in an XML document. To filter documents using XPath expressions use the XPathSelectElements method, which is an extension method defined in the System.Xml.XPath namespace. The following example uses an XPath expression to return only those food items with less than 300 calories:

// C#
using System.Xml.XPath;   // brings the XPathSelectElements extension method into scope

var root = XElement.Load(Server.MapPath("~/App_Data/NutritionInfo.xml"));

var lowCalorieFoods = root.XPathSelectElements("food[calories/@total < 300]");

Personally, I prefer using LINQ's standard query operators. Using the standard query operators and lambda expressions you get IntelliSense and compile-time checking. Moreover, the same standard query operators can be used with LINQ to Objects, LINQ to SQL, LINQ to Entities, or any other LINQ provider. XPath expressions, on the other hand, are opaque strings. There is no compile-time checking - you need to actually execute the code to see if the XPath expression is valid and returns the expected results. And XPath's syntax is specific to XML.
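To illustrate both points, the sketch below runs the low-calorie filter both ways on a small in-memory document (the second <food> item is made up), checks that the two approaches return the same elements, and then shows that a malformed XPath string compiles without complaint and only fails when it is evaluated. It assumes the System.Xml.XPath namespace is available, as it is in the .NET Framework and in current .NET.

```csharp
// C#
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;   // XPathSelectElements extension method, XPathException

var root = XElement.Parse(
    "<nutrition>" +
    "<food><name>Avocado Dip</name><calories total='110' fat='100'/></food>" +
    "<food><name>Hypothetical Feast</name><calories total='450' fat='200'/></food>" +
    "</nutrition>");

// LINQ: the filter is C# code, checked by the compiler.
var viaLinq = root.Elements("food")
                  .Where(f => (decimal)f.Element("calories").Attribute("total") < 300M)
                  .ToList();

// XPath: the filter is a string, only checked when it runs.
var viaXPath = root.XPathSelectElements("food[calories/@total < 300]").ToList();

// Both return the very same XElement instances from the tree.
Console.WriteLine(viaLinq.SequenceEqual(viaXPath));   // True

// A truncated XPath string compiles fine and only fails at run time.
try { root.XPathSelectElements("food[calories/@total <").ToList(); }
catch (XPathException ex) { Console.WriteLine("Invalid XPath: " + ex.Message); }
```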

Check out the demo for more searching and filtering code examples. The demo includes a web page that allows the user to search for food items that meet a variety of criteria, including upper bounds for the calories, grams of fat, and milligrams of sodium, as well as the presence of certain vitamins or minerals. The screen shot below shows this page from the demo in action and includes code showing how to filter using the standard query operators and using XPath expressions.

[Screenshot: Users can search for foods that meet specific nutritional values.]
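The demo's search page isn't reproduced here, but a filter like it can be sketched by chaining one Where call per active criterion. In the sketch below the criteria variables (maxCalories, maxFatGrams, maxSodiumMg, requireVitaminC) are hypothetical stand-ins for the page's inputs, the upper bounds are treated as inclusive, the second <food> item is made up, and I've assumed that a vitamin counts as "present" when its value is above zero.

```csharp
// C#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Stand-in data: Avocado Dip from the snippet above plus one made-up item.
var root = XElement.Parse(
    "<nutrition>" +
    "<food><name>Avocado Dip</name><calories total='110' fat='100'/>" +
    "<total-fat>11</total-fat><sodium>210</sodium>" +
    "<vitamins><a>0</a><c>0</c></vitamins></food>" +
    "<food><name>Hypothetical Salad</name><calories total='90' fat='20'/>" +
    "<total-fat>2</total-fat><sodium>150</sodium>" +
    "<vitamins><a>40</a><c>60</c></vitamins></food>" +
    "</nutrition>");

// Hypothetical search criteria; null means "no limit" for that field.
decimal? maxCalories = 300M;
decimal? maxFatGrams = 15M;
decimal? maxSodiumMg = null;
bool requireVitaminC = true;   // assumption: "present" means a value above zero

IEnumerable<XElement> results = root.Elements("food");

// Each active criterion adds one more Where; nothing runs until enumeration.
if (maxCalories.HasValue)
    results = results.Where(f => (decimal)f.Element("calories").Attribute("total") <= maxCalories.Value);
if (maxFatGrams.HasValue)
    results = results.Where(f => (decimal)f.Element("total-fat") <= maxFatGrams.Value);
if (maxSodiumMg.HasValue)
    results = results.Where(f => (decimal)f.Element("sodium") <= maxSodiumMg.Value);
if (requireVitaminC)
    results = results.Where(f => ((decimal?)f.Element("vitamins")?.Element("c") ?? 0M) > 0M);

var names = results.Select(f => f.Element("name").Value).ToList();
Console.WriteLine(string.Join(", ", names));   // Hypothetical Salad
```

Because Where is lazily evaluated, the chained filters are applied in a single pass when results is enumerated, and criteria the user leaves blank add no work at all.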

Looking Forward...
At this point we have examined how to create, query, and filter XML documents using the LINQ to XML API. In a future installment we'll see how to edit existing XML documents by modifying existing values and by adding and removing XML elements.

Until then... Happy Programming!
