First post: doing some simple things with VB.NET (1)

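Most of the web crawler articles you can find online use Python, and a few use C# (said quietly: so VB.NET can do crawling too). This post covers the first step: fetching a web page.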

Before using the code, import the following namespaces:

Imports System.IO, System.IO.Compression, System.Text, System.Net

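Before writing any code, open a browser (I use Chrome), load any web page, and press F12 to inspect the request headers. Copy the User-Agent value for later. You can also copy Accept, Cookie, and so on (they are not needed in this post).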
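As a rough illustration of where the copied header values end up, here is a minimal sketch. The helper name `BuildRequest`, the URL, and the cookie string are placeholders of mine, not values from this post. On .NET Framework, User-Agent and Accept have to be set through their dedicated properties, while Cookie can go in through `Headers.Add`.

```vbnet
Imports System
Imports System.Net

Module HeaderSketch
    ' Hypothetical helper: builds a request that carries headers copied from the browser.
    Function BuildRequest(url As String) As HttpWebRequest
        Dim req As HttpWebRequest = WebRequest.CreateHttp(url)
        ' User-Agent and Accept are set through properties, not Headers.Add.
        req.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
        req.Accept = "*/*"
        ' Placeholder cookie; paste the real string from the developer tools if a site needs it.
        req.Headers.Add("Cookie", "name=value")
        Return req
    End Function

    Sub Main()
        ' Nothing is sent here; the request object is only configured and inspected.
        Dim req As HttpWebRequest = BuildRequest("https://example.com/")
        Console.WriteLine(req.UserAgent)
        Console.WriteLine(req.Headers("Cookie"))
    End Sub
End Module
```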

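Create the request with HttpWebRequest and get the response with HttpWebResponse. Then deal with compression: the server may send the body gzip- or deflate-compressed, and that is what Imports System.IO.Compression is for.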
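This post decompresses the body by hand. As an alternative (not the method used in this post), HttpWebRequest can do it itself through its `AutomaticDecompression` property, which also sends the matching Accept-Encoding header. A minimal sketch, with `FetchText` and the URL as placeholder names of mine:

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module AutoDecompressSketch
    ' Hypothetical helper: lets the framework handle gzip/deflate transparently.
    Function FetchText(url As String) As String
        Dim req As HttpWebRequest = WebRequest.CreateHttp(url)
        req.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate
        Using resp As HttpWebResponse = DirectCast(req.GetResponse(), HttpWebResponse)
            ' The stream returned here is already decompressed.
            Using sr As New StreamReader(resp.GetResponseStream(), Encoding.UTF8)
                Return sr.ReadToEnd()
            End Using
        End Using
    End Function

    Sub Main()
        Dim html As String = FetchText("https://example.com/")
        Console.WriteLine(html.Length)
    End Sub
End Module
```

With this setting there is no need to check ContentEncoding or wrap the stream in a GZipStream or DeflateStream.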

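Finally, read the real (decompressed) response stream with a StreamReader, and the page's HTML source is downloaded. This approach works for pages encoded as UTF-8. For pages encoded as GB2312 or GBK, one change is needed: pass Encoding.GetEncoding("gbk") as the second argument to StreamReader.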
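To see the encoding change in isolation, here is a small sketch that reads GBK bytes from an in-memory stream instead of a network response. The helper `ReadAllText` and the sample string are mine. On .NET Core and .NET 5+, I believe the code page encodings must be registered first with `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`; on .NET Framework "gbk" is available out of the box.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module GbkSketch
    ' Hypothetical helper: reads a whole stream using the named encoding.
    Function ReadAllText(body As Stream, encodingName As String) As String
        Dim enc As Encoding = Encoding.GetEncoding(encodingName)
        Using sr As New StreamReader(body, enc)
            Return sr.ReadToEnd()
        End Using
    End Function

    Sub Main()
        ' Assumption: uncomment on .NET Core / .NET 5+ before asking for "gbk".
        ' Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)
        Dim sample As String = "你好"
        Dim bytes As Byte() = Encoding.GetEncoding("gbk").GetBytes(sample)
        Using ms As New MemoryStream(bytes)
            Dim text As String = ReadAllText(ms, "gbk")
            ' Compare instead of printing, since the console code page may not show Chinese text.
            Console.WriteLine(text = sample)
        End Using
    End Sub
End Module
```

Reading the same bytes with Encoding.UTF8 would produce replacement characters, which is the garbled text you see when the wrong encoding is used on a real page.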

Here is the code:

Public Function GetHttpContent(url As String) As String
    Try
        Dim req As HttpWebRequest = WebRequest.CreateHttp(url)
        With req
            .UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
            .Accept = "*/*"
            .Method = "GET"
            .Timeout = 300000 ' milliseconds, i.e. 5 minutes
            ' Tell the server that compressed responses are acceptable.
            .Headers.Add("Accept-Encoding", "gzip, deflate")
        End With
        ' Using closes the response and its connection even if reading fails.
        Using resp As HttpWebResponse = DirectCast(req.GetResponse(), HttpWebResponse)
            Dim body As Stream = resp.GetResponseStream()
            ' Wrap the raw stream in a decompressor if the server compressed the body.
            Select Case resp.ContentEncoding.ToLowerInvariant()
                Case "gzip"
                    body = New GZipStream(body, CompressionMode.Decompress)
                Case "deflate"
                    body = New DeflateStream(body, CompressionMode.Decompress)
            End Select
            ' For GB2312/GBK pages, pass Encoding.GetEncoding("gbk") instead of Encoding.UTF8.
            Using sr As New StreamReader(body, Encoding.UTF8)
                Return sr.ReadToEnd()
            End Using
        End Using
    Catch ex As Exception
        ' Any failure (timeout, HTTP error, bad URL) yields an empty string.
        Return ""
    End Try
End Function
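One possible refinement, offered only as a sketch: the function above returns an empty string for every failure, so the caller cannot tell a timeout from a 404. The variant below (the name `FetchOrReport`, the 10-second timeout, and the URL are my own choices) catches WebException and reports the HTTP status or the connection-level status before giving up. It skips the compression handling to stay short.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module ErrorSketch
    ' Hypothetical helper: returns Nothing on failure and writes the reason to stderr.
    Function FetchOrReport(url As String) As String
        Try
            Dim req As HttpWebRequest = WebRequest.CreateHttp(url)
            req.Timeout = 10000 ' milliseconds
            Using resp As HttpWebResponse = DirectCast(req.GetResponse(), HttpWebResponse)
                Using sr As New StreamReader(resp.GetResponseStream(), Encoding.UTF8)
                    Return sr.ReadToEnd()
                End Using
            End Using
        Catch ex As WebException
            ' For HTTP errors (4xx/5xx) the exception carries the server's response.
            Dim errorResp As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
            If errorResp IsNot Nothing Then
                Console.Error.WriteLine(String.Format("HTTP {0} for {1}", CInt(errorResp.StatusCode), url))
                errorResp.Close()
            Else
                ' No response at all: timeout, DNS failure, refused connection, and so on.
                Console.Error.WriteLine(String.Format("{0} for {1}", ex.Status, url))
            End If
            Return Nothing
        End Try
    End Function

    Sub Main()
        Dim html As String = FetchOrReport("https://example.com/")
        If html Is Nothing Then
            Console.WriteLine("fetch failed")
        Else
            Console.WriteLine(html.Length)
        End If
    End Sub
End Module
```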

(My skills are limited. If anything in the code is incomplete or wrong, feel free to point it out.)

Original post: https://www.cnblogs.com/woshilxcdexuesheng/p/11414764.html