Html Agility Pack

Html Agility Pack - API
Parser
Selectors
Manipulation
Traversing
Writer
Utilities
Attributes

HTML Parser

HTML Parser allow you to parse HTML and return an HtmlDocument.

Html Parser
Name Description
From File Loads an HTML document from a file.
From String Loads the HTML document from the specified string.
From Web Gets an HTML document from an Internet resource.
From Browser Gets an HTML document from a WebBrowser.

Load Html From String

HtmlDocument.LoadHtml method loads the HTML document from the specified string.

Example

The following example loads an Html from the specified string.

var html = @"<!DOCTYPE html>
<html>
<body>
<h1>This is <b>bold</b> heading</h1>
<p>This is <u>underlined</u> paragraph</p>
<h2>This is <i>italic</i> heading</h2>
</body>
</html> ";

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

var htmlBody = htmlDoc.DocumentNode.SelectSingleNode("//body");

Console.WriteLine(htmlBody.OuterHtml);



HTML Selectors

Selectors allow you to select HTML node from HtmlDocument.

Methods
Name Description
SelectNodes() Selects a list of nodes matching the XPath expression.
SelectSingleNode(String) Selects the first XmlNode that matches the XPath expression.

HTML SelectSingleNode

SelectSingleNode Method

Selects first HtmlNode matching the HtmlAgilityPack.HtmlNode.XPath expression.

Parameters:

xpath: The XPath expression. May not be null.

Returns:

The first HtmlAgilityPack.HtmlNode that matches the XPath query or a null reference if no matching node was found.

Examples

The following example selects the first node matching the XPath expression using SelectNodes method.

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

string name = htmlDoc.DocumentNode
.SelectSingleNode("//td/input")
.Attributes["value"].Value;

///如果用child.SelectSingleNode("//*[@class="titlelnk"]").InnerText这样的方式查询,是永远以整个document为基准来查询,
///这点就不好,理应以当前child节点的html为基准才对。

Write(sw, String.Format("推荐:{0}", hn.SelectSingleNode("//*[@class="diggnum"]").InnerText));
Write(sw, String.Format("标题:{0}", hn.SelectSingleNode("//*[@class="titlelnk"]").InnerText));
Write(sw, String.Format("介绍:{0}", hn.SelectSingleNode("//*[@class="post_item_summary"]").InnerText));
Write(sw, String.Format("信息:{0}", hn.SelectSingleNode("//*[@class="post_item_foot"]").InnerText));

HTML Manipulation

Traversing allow you to traverse through HTML node.

Properties
Name Description
InnerHtml Gets or Sets the HTML between the start and end tags of the object.
InnerText Gets the text between the start and end tags of the object.
OuterHtml Gets the object and its content in HTML.
ParentNode Gets the parent of this node (for nodes that can have parents).
Methods
Name Description
AppendChild() Adds the specified node to the end of the list of children of this node.
AppendChildren() Adds the specified node to the end of the list of children of this node.
Clone() Creates a duplicate of the node
CloneNode(Boolean) Creates a duplicate of the node.
CloneNode(String) Creates a duplicate of the node and changes its name at the same time.
CloneNode(String, Boolean) Creates a duplicate of the node and changes its name at the same time.
CopyFrom(HtmlNode) Creates a duplicate of the node and the subtree under it.
CopyFrom(HtmlNode, Boolean) Creates a duplicate of the node.
CreateNode() Creates an HTML node from a string representing literal HTML.
InsertAfter() Inserts the specified node immediately after the specified reference node.
InsertBefore Inserts the specified node immediately before the specified reference node.
PrependChild Adds the specified node to the beginning of the list of children of this node.
PrependChildren Adds the specified node list to the beginning of the list of children of this node.
Remove Removes node from parent collection
RemoveAll Removes all the children and/or attributes of the current node.
RemoveAllChildren Removes all the children of the current node.
RemoveChild(HtmlNode) Removes the specified child node.
RemoveChild(HtmlNode, Boolean) Removes the specified child node.
ReplaceChild() Replaces the child node oldChild with newChild node.


HTML Traversing

Traversing allow you to traverse through HTML node.

Properties
Name Description
ChildNodes Gets all the children of the node.
FirstChild Gets the first child of the node.
LastChild Gets the last child of the node.
NextSibling Gets the HTML node immediately following this element.
ParentNode Gets the parent of this node (for nodes that can have parents).
Methods
Name Description
Ancestors() Gets all the ancestor of the node.
Ancestors(String) Gets ancestors with matching name.
AncestorsAndSelf() Gets all anscestor nodes and the current node.
AncestorsAndSelf(String) Gets all anscestor nodes and the current node with matching name.
DescendantNodes Gets all Descendant nodes for this node and each of child nodes
DescendantNodesAndSelf Returns a collection of all descendant nodes of this element, in document order
Descendants() Gets all Descendant nodes in enumerated list
Descendants(String) Get all descendant nodes with matching name
DescendantsAndSelf() Returns a collection of all descendant nodes of this element, in document order
DescendantsAndSelf(String) Gets all descendant nodes including this node
Element Gets first generation child node matching name
Elements Gets matching first generation child nodes matching name

HTML Writer

Save HtmlDocument && Write HtmlNode

HtmlDocument - Methods
Name Description
Save(Stream) Saves the HTML document to the specified stream.
Save(StreamWriter) Saves the HTML document to the specified StreamWriter.
Save(TextWriter) Saves the HTML document to the specified TextWriter.
Save(String) Saves the mixed document to the specified file.
Save(XmlWriter) Saves the HTML document to the specified XmlWriter.
Save(Stream, Encoding) Saves the HTML document to the specified stream.
Save(String, Encoding) Saves the mixed document to the specified file.
HtmlNode - Methods
Name Description
WriteContentTo() Saves all the children of the node to a string.
WriteContentTo(TextWriter) Saves all the children of the node to the specified TextWriter.
WriteTo() Saves the current node to a string.
WriteTo(TextWriter) Saves the current node to the specified TextWriter.
WriteTo(XmlWriter) Saves the current node to the specified XmlWriter.


HTML Utilities

HtmlDocument Utilities

HtmlDocument Methods
Name Description
DetectEncoding(Stream) Detects the encoding of an HTML stream.
DetectEncoding(TextReader) Detects the encoding of an HTML text provided on a TextReader.
DetectEncoding(String) Detects the encoding of an HTML file.
DetectEncodingAndLoad(String) Detects the encoding of an HTML document from a file first, and then loads the file.
DetectEncodingAndLoad(String, Boolean) Detects the encoding of an HTML document from a file first, and then loads the file.


HTML Attributes

Traversing allow you to traverse through HTML node.

Methods
Name Description
Add(HtmlAttribute) Adds supplied item to collection
Add(String, String) Adds a new attribute to the collection with the given values
Append(String) Creates and inserts a new attribute as the last attribute in the collection.
Append(HtmlAttribute) Inserts the specified attribute as the last attribute in the collection.
Append(String, string) Creates and inserts a new attribute as the last attribute in the collection.
Remove() Removes all attributes from the collection
Remove(String) Removes an attribute from the list, using its name. If there are more than one attributes with this name, they will all be removed.
Remove(HtmlAttribute) Removes a given attribute from the list.
RemoveAll() Remove all attributes in the list.
RemoveAt() Removes the attribute at the specified index.
SetAttributeValue() Helper method to set the value of an attribute of this node. If the attribute is not found, it will be created automatically.

原文地址:https://www.cnblogs.com/micro-chen/p/8085024.html