Web Crawler Basics

1. URL: A URL is the address of a web page. It is shown in the Location or URL box near the top of the browser window.
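To make the parts of a URL concrete, here is a minimal sketch that splits an address with the standard-library urllib.parse.urlsplit. The helper name describe_url and the query string in the sample address are assumptions made for illustration only.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into its main components (describe_url is a hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,   # protocol, e.g. "http"
        "host": parts.hostname,   # server name
        "port": parts.port,       # None when the URL does not name a port
        "path": parts.path,       # resource path on the server
        "query": parts.query,     # text after "?"
    }


# The query string below is made up for the example.
info = describe_url("http://www.baidu.com/s?wd=python")
print(info)
```

The port comes back as None here because the address does not name one explicitly.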

2. Every transfer protocol has a default port; the default port for HTTP is 80.
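The default-port rule can be sketched as a small lookup: when a URL does not name a port, fall back to the default for its protocol. The table DEFAULT_PORTS and the helper effective_port are assumptions for illustration; besides HTTP on 80, it lists the well-known defaults for HTTPS (443) and FTP (21).

```python
from urllib.parse import urlsplit

# Well-known default ports; this table is an illustrative assumption, not a complete list.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def effective_port(url):
    """Return the explicit port of a URL, or the default port of its protocol."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port                      # the URL names a port itself
    return DEFAULT_PORTS.get(parts.scheme)     # None for protocols not in the table


print(effective_port("http://www.baidu.com"))
print(effective_port("http://localhost:8080/"))
```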

Downloading web page data

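import urllib.request  # standard-library module for opening URLs

# Open the site; urlopen returns a response object, which is assigned to response
response = urllib.request.urlopen("http://www.baidu.com")
html = response.read()       # read the response body as bytes
html = html.decode("utf-8")  # decode the bytes as UTF-8 text
print(html)                  # print the page source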
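The snippet above assumes the page is encoded as UTF-8. As a hedged sketch, the encoding can instead be taken from the Content-Type header when the server sends one. The helpers charset_from_content_type and decode_body are hypothetical names; the header values used are samples, not output captured from a real server.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Extract the charset parameter from a Content-Type value, or use a default."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def decode_body(raw, content_type):
    """Decode response bytes with the declared charset, replacing undecodable bytes."""
    return raw.decode(charset_from_content_type(content_type), errors="replace")


# Sample bytes and header made up for the example.
text = decode_body("你好".encode("gbk"), "text/html; charset=GBK")
print(text)
```

With a live response, response.headers.get_content_charset() gives the same information directly, since the headers object is built on the same Message class.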

Simulating a browser to download a simple image

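# Download a picture of a cat

import urllib.request

response = urllib.request.urlopen("http://placekitten.com/g/600/600")
cat_img = response.read()  # image data as raw bytes

# Write in binary mode so the bytes are saved unchanged
with open('cat.jpg', 'wb') as f:
    f.write(cat_img)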
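A plain urlopen call identifies itself as a Python client. One common way to look more like a browser is to send a browser-style User-Agent header through urllib.request.Request. The sketch below assumes that approach; the User-Agent string and the helper names build_request and download are illustrative assumptions.

```python
import urllib.request

# Illustrative browser-style User-Agent string (an assumption, not a required value).
BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")


def build_request(url, user_agent=BROWSER_UA):
    """Create a Request object that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, path, timeout=10):
    """Fetch url and write the raw bytes to path; returns the number of bytes saved."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        data = response.read()
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


# Building the request does not touch the network; download() does.
req = build_request("http://placekitten.com/g/600/600")
print(req.get_header("User-agent"))
```

Calling download("http://placekitten.com/g/600/600", "cat.jpg") would then perform the same download as the snippet above, with the extra header attached.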

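# Youdao Translate
import urllib.request
import urllib.parse
import json

message = input("Enter the text to translate: ")
url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=http://www.baidu.com/link"

# Form fields sent in the POST body
data = {}
data['type'] = "AUTO"
data['doctype'] = "json"
data['keyfrom'] = "fanyi.web"
data['typoResult'] = "true"
data['ue'] = "UTF-8"
data['xmlVersion'] = "1.8"
data['i'] = message  # the text entered by the user
# URL-encode the fields and convert them to bytes, as urlopen requires
data = urllib.parse.urlencode(data).encode('utf-8')

# Passing data makes urlopen send a POST request instead of a GET
response = urllib.request.urlopen(url, data)
html = response.read().decode('utf-8')

# The response is JSON; parse it and pick out the translated text
target = json.loads(html)
print("Translation: %s" % (target['translateResult'][0][0]['tgt']))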
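The two data-handling steps of the translation script, encoding the form and reading the JSON reply, can be tried without a network connection. The sketch below is a hedged illustration: build_form_body and extract_translation are hypothetical helpers, and the sample reply is a made-up string that only mirrors the translateResult shape the script indexes into.

```python
import json
import urllib.parse


def build_form_body(fields):
    """URL-encode form fields and return them as UTF-8 bytes for a POST body."""
    return urllib.parse.urlencode(fields).encode("utf-8")


def extract_translation(response_text):
    """Return the translated text, or None if the reply lacks the expected shape."""
    target = json.loads(response_text)
    try:
        return target["translateResult"][0][0]["tgt"]
    except (KeyError, IndexError, TypeError):
        return None


body = build_form_body({"doctype": "json", "i": "你好"})
print(body)

# Made-up reply for illustration; not captured from the real service.
sample = '{"translateResult": [[{"src": "你好", "tgt": "hello"}]]}'
print(extract_translation(sample))
```

Returning None for an unexpected reply keeps the script from crashing with a KeyError when the service answers with an error object instead of a translation.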




Original article: https://www.cnblogs.com/bixiaopengblog/p/6338022.html