Python lookbehind and lookahead regex examples

2010-07-09

Simulated login and form submission in Python

Category: Python programming

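This article takes content from a CSDN blog and publishes it to a Baidu blog, working through three steps in turn: scraping the post, simulating a login, and submitting a form. Each step is explained in the comments of the code below.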
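
The lookbehind and lookahead patterns named in the title do most of the work in the scraping step, so a minimal sketch on a made-up page may help before reading the full script. The sample HTML is an assumption for illustration, not the real CSDN markup; the three patterns are the ones the script uses. The sketch is written for Python 3, but these `re` calls behave the same way in Python 2.

```python
import re

# Hypothetical sample page standing in for the downloaded blog HTML.
html = (
    '<title>My post - someuser - CSDN Blog</title>'
    '<div id="blogstory"><script>var x = 1;</script>Hello, world.'
    '<p class="right artical">footer</p></div>'
)

# (?<=...) is a lookbehind and (?=...) a lookahead: both must match,
# but neither is included in the text that findall returns.
body = re.findall('(?<=blogstory">).*(?=<p class="right artical)', html, re.S)

# Keep only what follows the last </script> tag inside the body.
text = re.findall('<script.*>.*</script>(.*)', body[0], re.S)

# Capture the part of <title> that precedes the "-... - CSDN..." suffix.
title = re.findall('(?<=<title>)(.*)-.* - CSDN.*(?=</title>)', html, re.S)

print(body)   # ['<script>var x = 1;</script>Hello, world.']
print(text)   # ['Hello, world.']
print(title)  # ['My post ']
```

Note that the captured title keeps its trailing space; calling `.strip()` on it is one way to tidy that up before submitting the form.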

Python code (Python 2)
# -*- coding: utf-8 -*-  
import re  
import urllib  
import urllib2  
import cookielib  
  
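# Fetch the title and body of the CSDN blog post
url = "http://blog.csdn.net/[username]/archive/2010/07/05/5712850.aspx"
sock = urllib.urlopen(url)
html = sock.read()
sock.close()
# Lookbehind (?<=...) and lookahead (?=...) anchor the match without being part of it
content = re.findall('(?<=blogstory">).*(?=<p class="right artical)', html, re.S)
# Keep only what follows the last </script> tag
content = re.findall('<script.*>.*</script>(.*)', content[0], re.S)
# Capture the part of <title> that precedes the "-... - CSDN..." suffix
title = re.findall('(?<=<title>)(.*)-.* - CSDN.*(?=</title>)', html, re.S)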
# Build the form values from the content fetched above
blog = {'spBlogTitle': title[0].decode('utf-8').encode('gbk'),  # Baidu blog title
        'spBlogText': content[0].decode('utf-8').encode('gbk'),  # Baidu blog body
        'ct': "1",
        'cm': "1"}
del content
del title
  
# Simulate the login
cj = cookielib.CookieJar()
# Username and password
post_data = urllib.urlencode({'username': '[username]', 'password': '[password]', 'pwd': '1'})
# Login URL
path = 'https://passport.baidu.com/?login'
# The cookie processor stores the session cookies and resends them on later requests
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
opener.addheaders = [('User-agent', 'Opera/9.23')]
urllib2.install_opener(opener)
req = urllib2.Request(path, post_data)
conn = urllib2.urlopen(req)
  
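# Fetch the authentication token Baidu requires for publishing a post
bd = urllib2.urlopen(urllib2.Request('http://hi.baidu.com/[username]/creat/blog')).read()
bd = re.findall('(?<=bdstoken\" value=\").*(?=ct)', bd, re.S)
# The token is the first 32 characters of the match
blog['bdstoken'] = bd[0][:32]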
# Set the category name
blog['spBlogCatName'] = 'php'
# Submit the form to publish the post
req2 = urllib2.Request('http://hi.baidu.com/[username]/commit', urllib.urlencode(blog))
  
# Print the response returned after the form is submitted
print urllib2.urlopen(req2).read()
  
# Replace [username]/[password] with your real username and password
# All done.
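
The script above targets Python 2; in Python 3, `urllib2` and `cookielib` became `urllib.request` and `http.cookiejar`, and `urlencode` moved to `urllib.parse`. The following is a rough sketch of the same cookie-session and form-encoding steps under Python 3, not a tested replacement for the login flow. The helper names `build_session` and `build_post` and the `example.invalid` URL are hypothetical, and the request is only built, never sent.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_session(user_agent="Opera/9.23"):
    """Return (opener, jar); the opener stores cookies in jar and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-agent", user_agent)]
    return opener, jar


def build_post(url, fields, encoding="gbk"):
    """Build (but do not send) a POST request with form-encoded fields.

    Values are percent-encoded from their GBK bytes, mirroring the
    .encode('gbk') step in the Python 2 script.
    """
    data = urllib.parse.urlencode(fields, encoding=encoding).encode("ascii")
    return urllib.request.Request(url, data=data)


opener, jar = build_session()
# "\u6807\u9898" is a two-character Chinese string used as a sample title.
fields = {"spBlogTitle": "\u6807\u9898", "ct": "1"}
req = build_post("http://example.invalid/commit", fields)

print(req.get_method())  # POST, because the request carries a body
print(req.data)          # the GBK percent-encoded form body as bytes
```

Passing `req` to `opener.open(...)` would send it with whatever cookies the jar has collected from earlier responses, which is the role `urllib2.install_opener` plays in the original script.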

14:47