scrapy-shell

https://segmentfault.com/a/1190000013199636?utm_source=tag-newest
shell
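Startup
- Linux: press Ctrl+Alt+T to open a terminal, then run scrapy shell "url:xxxx"
- Windows: scrapy shell "url:xxx"
- On startup, the shell automatically downloads the page at the given URL
- When the download finishes, the page content is stored in the response variable, which we use whenever we need that content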
response
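- The crawled content is stored in response
- response.body is the page source (raw bytes)
- response.headers holds the HTTP headers of the response
- response.xpath() selects content using XPath syntax
- response.css() selects content using CSS selector syntax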
selector
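- A selector lets the user pick out exactly the content they want
- response.selector.xpath: response.xpath is a shortcut for response.selector.xpath
- response.selector.css: response.css is a shortcut for response.selector.css
- selector.extract: returns the content of the selected nodes as unicode strings
- selector.re: selects content with a regular expression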