爬虫

单爬虫运行

import sys
from scrapy.cmdline import execute
 
if __name__ == '__main__':
    execute(["scrapy","crawl","chouti","--nolog"])

然后右键运行py文件即可运行名为‘chouti‘的爬虫

同时运行多个爬虫

步骤如下:

- 在spiders同级创建任意目录,如:commands
- 在其中创建 crawlall.py 文件 (此处文件名就是自定义的命令)
- 在settings.py 中添加配置 COMMANDS_MODULE = '项目名称.目录名称'
- 在项目目录执行命令:scrapy crawlall


代码如下:
 1 from scrapy.commands import ScrapyCommand
 2     from scrapy.utils.project import get_project_settings
 3  
 4     class Command(ScrapyCommand):
 5  
 6         requires_project = True
 7  
 8         def syntax(self):
 9             return '[options]'
10  
11         def short_desc(self):
12             return 'Runs all of the spiders'
13  
14         def run(self, args, opts):
15             spider_list = self.crawler_process.spiders.list()
16             for name in spider_list:
17                 self.crawler_process.crawl(name, **opts.__dict__)
18             self.crawler_process.start()
19  
20 crawlall.py
原文地址:https://www.cnblogs.com/zhaohuhu/p/9264216.html