字符串、文件操作和英文词频统计预处理

作业要求详情：https://edu.cnblogs.com/campus/gzcc/GZCC-16SE2/homework/2646

1.字符串操作：

解析身份证号：生日、性别、出生地等。

代码如下：

#仅限广东省广州市内
str="""
440000     广东省 
440100 　　广州市
440103 　　荔湾区
440104 　　越秀区
440105 　　海珠区
440106 　　天河区
440111 　　白云区
440112 　　黄埔区
440113 　　番禺区
440114 　　花都区
440115 　　南沙区
440116 　　萝岗区
440183 　　增城区
440184 　　从化区
"""
ID = input('请输入十八位身份证号码: ')
if len(ID) == 18:
    print("你的身份证号码是 " + ID)
else:
    print("错误的身份证号码")

ID_add = ID[0:6]
ID_birth = ID[6:14]
ID_sex = ID[14:17]
ID_check = ID[17]

print("出生地方："+str[str.find(ID_add)+9:str.find(ID_add)+12])
year = ID_birth[0:4]
moon = ID_birth[4:6]
day = ID_birth[6:8]
print("生日: " + year + '年' + moon + '月' + day + '日')

if int(ID_sex) % 2 == 0:
    print('性别：女')
else:
    print('性别：男')

截图：

凯撒密码编码与解码

代码如下：

plaincode=input('')
for i in plaincode:
    print(chr(ord(i)+3),end='')
plaincode=input('')
s=ord('a')
t=ord('z')
for i in plaincode:
    if s<= ord(i)<=t:
        print(chr(s+(ord(i)-s+3)%26), end='')
    else:
        print(i,end='')

截图：

网址观察

代码如下：

for i in range(2,4):
 url = 'http://news.gzcc.cn/html/xiaoyuanxinwen/{}.html'.format(i)
print(url)

截图：

批量生成

代码如下：

import webbrowser as  web  #引入第三方库，并用as取名
url='http://news.gzcc.cn/html/xiaoyuanxinwen/'
web.open_new_tab(url)
for i in range(2,4):
    web.open_new_tab('http://news.gzcc.cn/html/xiaoyuanxinwen/'+str(i)+'.html')

截图：

2.英文词频统计预处理

下载一首英文的歌词或文章或小说，保存为utf8文件。
从文件读出字符串。
将所有大写转换为小写
将所有其他做分隔符（,.？！）替换为空格
分隔出一个一个的单词
并统计单词出现的次数。

代码如下：

file=open(r'E:Big Big World.txt','r',encoding='UTF-8')
news=file.read()
file.close()
sep=''',.?!":()'''
for i in sep:
    news=news.replace(i,' ')
wordList=news.lower().split()
wordDict={}
wordSet=set(wordList)
wordCutSet={'i','we','the','you','of','in','and','that'}
wordSet=wordSet-wordCutSet
for w in wordSet:
    wordDict[w]=wordList.count(w)
sortWord=sorted(wordDict.items(),key=lambda e:e[1],reverse=True)
save=open(r'E:Big Big World..txt','w',encoding='UTF-8')for w in range(20):
    save.write(str(sortWord[w])+"
")
save.close()

截图：