Scrapy raises "ValueError: Missing scheme in request url: h" when downloading images

2021-03-12 23:34:24 [scrapy.core.scraper] ERROR: Error processing {'article_name': '灰塔的黎明',
 'article_path': 'D:/data/python/23us',
 'article_url': 'https://www.23us.com/html/76/76788/',
 'article_url_id': '1a8113fbd8d765f4d58002506ece2c2a',
 'article_webaddr': 'https://www.23us.com/book/76788',
 'author': '\xa0湖中羊',
 'chapter_fullnum': '\xa03191844字',
 'chapter_lastest_name': ' 第五百三十一章 活着的城市',
 'chapter_lastest_url': 'https://www.23us.com/html/76/76788/',
 'chapter_num': '湖中羊',
 'collect_num': '\xa023',
 'content_validity': '嘿!对,就是你!你这么行色匆匆的要去哪里啊?哦,我知道,我知道生活不易,不过也别太拼命了。你问我在这里干什么?哈哈,我只是坐在这里,讲一些老掉牙的故事,关于巫师,巨龙……你知道的,那些曾经在我们梦里出现过的东西。嘿,你猜怎么着,如果你不那么着急的话,为什么不坐下来听听它们呢?我虽然自认不是个好的说书人,可我敢保证这故事我绝对用了心!来听听吧,也许,它能让你重新梦到,那些早就被我们忘了的……传奇。书友群:193123031欢迎前来催稿',
 'front_image_url': 'https://www.23us.com/files/article/image/76/76788/76788s.jpg',
 'full_click_num': '\xa0551',
 'full_recommend_num': '3',
 'image_list': 'https://www.23us.com/files/article/image/76/76788/76788s.jpg',
 'mon_click_num': '\xa016',
 'mon_recommend_num': '1',
 'novel_classify': '玄幻魔法',
 'status': 1,
 'update': '\xa02021-03-12',
 'webaddr': 'https://www.23us.com',
 'webaddr_id': 'c2affe5b45bdf9396163dec0fdcea696',
 'webname': '23us',
 'week_click_num': '\xa04',
 'week_recommend_num': '3'}
Traceback (most recent call last):
  File "D:\data\python\environment\zhaopin\lib\site-packages\twisted\internet\defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "D:\data\python\environment\zhaopin\lib\site-packages\scrapy\utils\defer.py", line 150, in f
    return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
  File "D:\data\python\environment\zhaopin\lib\site-packages\scrapy\pipelines\media.py", line 88, in process_item
    dlist = [self._process_request(r, info, item) for r in requests]
  File "D:\data\python\environment\zhaopin\lib\site-packages\scrapy\pipelines\media.py", line 88, in <listcomp>
    dlist = [self._process_request(r, info, item) for r in requests]
  File "D:\data\python\zhaopin\WEB-INF\ArticleSpider\ArticleSpider\pipelines.py", line 51, in get_media_requests
    yield Request(image_url,meta={'webname': item['webname'], 'jpg_num': jpg_num,
  File "D:\data\python\environment\zhaopin\lib\site-packages\scrapy\http\request\__init__.py", line 25, in __init__
    self._set_url(url)
  File "D:\data\python\environment\zhaopin\lib\site-packages\scrapy\http\request\__init__.py", line 73, in _set_url
    raise ValueError(f'Missing scheme in request url: {self._url}')
ValueError: Missing scheme in request url: h

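Cause: image storage is enabled in settings.py through ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 301}, and the image pipeline expects the item's image field to hold a list of URLs, which it loops over. The Spider, however, returned a single URL string in that field. Looping over a string yields one character at a time, so the first "URL" the pipeline sees is just the h of "https://...". A Request built from "h" has no scheme, which produces the error above.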
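The difference between looping over a string and looping over a list can be reproduced without Scrapy. The following is a minimal sketch: has_scheme is a hypothetical, simplified stand-in for the scheme check and is not Scrapy's actual code; the URL is the image_list value from the log above.

```python
# Hypothetical, simplified stand-in for the scheme check (not Scrapy's code).
def has_scheme(url):
    return '://' in url


image_url = 'https://www.23us.com/files/article/image/76/76788/76788s.jpg'

# Looping over a bare string yields one character at a time.
from_string = [u for u in image_url]

# Looping over a list yields whole URLs.
from_list = [u for u in [image_url]]

print(from_string[0], has_scheme(from_string[0]))  # h False
print(from_list[0], has_scheme(from_list[0]))
```

The first loop hands back 'h' as its first element, which is exactly the value reported in the ValueError.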

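Fix: make sure the image URL is written with double square brackets, that is, wrap the URL in its own pair of brackets when assigning it to the item's image field, so the field holds a one-element list rather than a bare string (this was the part marked with a red box in the original screenshot).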
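A minimal sketch of the fix, assuming the field read by the pipeline is image_list (the field that holds a bare string in the log above). A plain dict stands in for the Scrapy item, and as_url_list is a hypothetical helper, not part of Scrapy, that a custom get_media_requests could use as an extra safeguard.

```python
def as_url_list(value):
    """Hypothetical helper: always return a list of URL strings."""
    if isinstance(value, str):
        # A bare string is wrapped so that looping yields the whole URL.
        return [value]
    return list(value)


image_url = 'https://www.23us.com/files/article/image/76/76788/76788s.jpg'

item = {}  # stand-in for the Scrapy item

# Wrong: a bare string; the pipeline would loop over its characters.
# item['image_list'] = image_url

# Right: the URL wrapped in a list ("double square brackets").
item['image_list'] = [image_url]

# Defensive variant: normalize before looping, whichever form was stored.
for url in as_url_list(item['image_list']):
    print(url)
```

Fixing the assignment in the Spider is enough on its own; the helper only guards against the same mistake recurring in another spider.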

Original post: https://www.cnblogs.com/laonicc/p/14527069.html