Python Crawler Notes (7) Web Crawler Frameworks (2): The Scrapy Framework (Example 1)

1. A first Scrapy example

1.1 Create a Scrapy crawler project

scrapy startproject python123demo
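
Running this command creates a project directory from Scrapy's default template. The layout looks roughly like this (comments mine; details may vary slightly across Scrapy versions):

python123demo/
    scrapy.cfg            # deployment configuration file
    python123demo/        # the Python module for the project
        __init__.py
        items.py          # item definitions
        middlewares.py    # spider/downloader middlewares
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # directory where spider code lives
            __init__.py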

1.2 Generate a Scrapy spider inside the project

(1) Generate a spider named demo

scrapy genspider demo python123.io
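
The second argument is the domain the spider is allowed to crawl. This command creates python123demo/spiders/demo.py containing a skeleton DemoSpider class (name 'demo', with the domain filled into allowed_domains and start_urls); section 1.3 shows that generated file after editing.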

 

1.3 Configure the generated spider

# -*- coding: utf-8 -*-
import scrapy


class DemoSpider(scrapy.Spider):
    name = 'demo'
#    allowed_domains = ['python123.io']
    # Changed to the URL we want to crawl
    start_urls = ['http://python123.io/ws/demo.html']

    # Override the parsing method
    # response: the object holding the content returned from the network
    def parse(self, response):
        fname = response.url.split('/')[-1]
        with open(fname, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s.' % fname)
        
        

1.4 Run the spider and fetch the page

(1) Run the demo spider

scrapy crawl demo 

(Success)
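
On success, the parse callback writes the response body to demo.html in the current working directory, and the crawl log should contain a line like 'Saved file demo.html.' produced by the self.log call above.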

2. Full spider configuration code
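
The start_urls list in 1.3 is a shorthand: by default, Scrapy's start_requests() turns each URL in that list into a scrapy.Request with parse as the callback. The full version below makes this explicit by overriding start_requests() as a generator, which is handy when the URLs are numerous or computed at runtime.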

# -*- coding: utf-8 -*-
import scrapy


class DemoSpider(scrapy.Spider):
    name = 'demo'

    # Explicit replacement for the start_urls shorthand:
    # yield one Request per URL, each handled by self.parse
    def start_requests(self):
        urls = [
                'http://python123.io/ws/demo.html'
                ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    # Override the parsing method
    # response: the object holding the content returned from the network
    def parse(self, response):
        fname = response.url.split('/')[-1]
        with open(fname, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s.' % fname)
        
        

3. The yield keyword
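
A function that contains yield is a generator: calling it returns an iterator that produces one value at a time, suspending the function between values instead of building the whole result in memory at once. That is why start_requests() above can yield one scrapy.Request per URL without creating them all up front. A minimal sketch of the idea (the gen function is illustrative, not part of the Scrapy project):

def gen(n):
    # Generator: produces the squares 0, 1, 4, ... one at a time
    for i in range(n):
        yield i ** 2

for value in gen(5):
    print(value)  # prints 0, 1, 4, 9, 16 on separate lines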

4. Steps for using a Scrapy spider
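
As demonstrated above, using a Scrapy spider comes down to four steps:

(1) Create a project: scrapy startproject <project-name>
(2) Generate a spider inside the project: scrapy genspider <spider-name> <domain>
(3) Configure the spider: set start_urls (or override start_requests()) and write the parse() callback
(4) Run the crawl: scrapy crawl <spider-name>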

  

Original article: https://www.cnblogs.com/douzujun/p/12247467.html