A collection of regular expressions useful in web spiders

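1. Get the link URLs on a web page (the URL is captured in the named group "href", whether it is single-quoted, double-quoted or unquoted):
      string matchString = @"<a[^>]+href=\s*(?:'(?<href>[^']+)'|""(?<href>[^""]+)""|(?<href>[^>\s]+))\s*[^>]*>";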
2. Get the page title:
      string matchString = @"<title>(?<title>.*)</title>";   // "." does not match line breaks, so a title split across lines is missed
3. Strip all HTML tags from a page:
      string temp = Regex.Replace(html, "<[^>]*>", "");   // html is the document whose HTML tags are to be removed
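4. Get the page title, including titles that span several lines:
      string matchString = @"<title>([\s\S]*?)</title>";   // lazy match; [\s\S] matches any character, including line breaks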
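The snippets above can be wrapped into a small helper for a crawler. What follows is a minimal sketch, assuming .NET's System.Text.RegularExpressions. The class name SpiderRegex and its method names are hypothetical and chosen only for illustration. The sketch reuses the link pattern from item 1 and the tag pattern from item 3 as they are written. It also adds RegexOptions.IgnoreCase so that uppercase markup such as <A HREF=...> also matches. For the title it uses a lazy pattern in the spirit of item 4, written as [\s\S]*?. The \t in [\S\s\t] is redundant, because \s already matches tabs.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class wrapping the spider regexes; names are illustrative only.
public static class SpiderRegex
{
    // Link pattern from item 1; IgnoreCase is an added assumption so <A HREF=...> matches too.
    private static readonly Regex LinkRegex = new Regex(
        @"<a[^>]+href=\s*(?:'(?<href>[^']+)'|""(?<href>[^""]+)""|(?<href>[^>\s]+))\s*[^>]*>",
        RegexOptions.IgnoreCase);

    // Lazy title pattern: stops at the first </title> and may span line breaks.
    private static readonly Regex TitleRegex = new Regex(
        @"<title>([\s\S]*?)</title>",
        RegexOptions.IgnoreCase);

    // Tag pattern from item 3: everything from '<' up to the next '>' counts as a tag.
    private static readonly Regex TagRegex = new Regex("<[^>]*>");

    // Returns the href value of every <a> tag, in document order.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in LinkRegex.Matches(html))
        {
            // Only one of the three alternatives participates, so "href" holds that value.
            links.Add(m.Groups["href"].Value);
        }
        return links;
    }

    // Returns the trimmed title text, or null when the page has no <title> element.
    public static string ExtractTitle(string html)
    {
        Match m = TitleRegex.Match(html);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }

    // Removes every tag and keeps only the text between tags.
    public static string StripTags(string html)
    {
        return TagRegex.Replace(html, "");
    }
}
```

For example, ExtractLinks applied to `<a class=x href=/c>C</a>` returns a list containing "/c". StripTags does not decode entities such as &amp;, and it also keeps the contents of <script> and <style> blocks. For pages with malformed or complex markup, a real HTML parser is usually more robust than regular expressions.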

Original article: https://www.cnblogs.com/xinzhyu/p/1207815.html