Proxy operations

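Role of the downloader middleware: it intercepts each request before it is sent, so the IP the request goes out from can be replaced with a proxy.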

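Workflow:

  1. Define a custom downloader middleware class

    a) Inherit from object

    b) Override the process_request(self, request, spider) method

  2. Enable the downloader middleware in the configuration file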

  middlewares.py

# -*- coding: utf-8 -*-

# Define here the models for your spider middleware
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/spider-middleware.html

from scrapy import signals


class middleadd(object):

    def process_request(self, request, spider):
        # Route this request through a proxy; Scrapy expects a full URL including the scheme
        request.meta["proxy"] = "http://157.65.31.220:3128"
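
Since the point of the middleware is to swap the outgoing IP, a natural variation is to choose a different proxy for each request. The following is only a minimal sketch under stated assumptions: the class name RandomProxyMiddleware and the second address in PROXIES are hypothetical placeholders, not part of the original project, and the listed proxies are not guaranteed to work.

```python
import random


class RandomProxyMiddleware(object):
    """Hypothetical variant of middleadd that picks a proxy at random."""

    # Placeholder addresses (assumptions); replace with proxies you have verified.
    PROXIES = [
        "http://157.65.31.220:3128",
        "http://203.0.113.10:8080",
    ]

    def process_request(self, request, spider):
        # Scrapy sends the request through the proxy URL stored in request.meta["proxy"]
        request.meta["proxy"] = random.choice(self.PROXIES)
        # Returning None tells Scrapy to keep processing the request normally
        return None
```

Like middleadd, this class only takes effect once it is enabled in settings.py.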

Enable the middleware in settings.py
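
The original does not show the settings entry, so here is a rough sketch of what it might look like. The package name myproject is an assumption standing in for the real project name, and 543 is simply the priority value used in Scrapy's generated settings template; any value that orders the middleware appropriately would do.

```python
# settings.py (fragment)
# "myproject" is a placeholder (assumption) for the actual project package name.
DOWNLOADER_MIDDLEWARES = {
    # Dotted path to the custom middleware class, mapped to its priority
    "myproject.middlewares.middleadd": 543,
}
```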

spider/midtest.py

import scrapy


class MidtestSpider(scrapy.Spider):
    name = 'midtest'
    # allowed_domains = ['www.baidu.com']
    start_urls = ["https://www.baidu.com/s?wd=ip"]

    def parse(self, response):
        # Save the page so the IP it reports can be checked; "with" closes the file
        with open("record.html", "w", encoding="utf-8") as fp:
            fp.write(response.text)

Free proxies can be obtained from www.goubanjia.com

Original article: https://www.cnblogs.com/cjj-zyj/p/10144106.html