第一次个人编程作业

第一次个人编程作业

我的github

计算模块接口的设计与实现过程

具体算法流程图如下

模块介绍

基本思想:余弦相似度算法 参考博客
one_hot用于onehot编码
    def one_hot(word_dict, keywords):  
         cut_code = [word_dict[word] for word in keywords]
        cut_code = [0]*len(word_dict)
        for word in keywords:
            cut_code[word_dict[word]] += 1
        return cut_code
def extract_keyword用于提取关键词
    def extract_keyword(content):  
        re_exp = re.compile(r'(<style>.*?</style>)|(<[^>]+>)', re.S)
        content = re_exp.sub(' ', content)
        content = html.unescape(content)
        seg = [i for i in jieba.cut(content, cut_all=True) if i != '']
        # 提取关键词
        keywords = jieba.analyse.extract_tags("|".join(seg), topK=200, withWeight=False)
        return keywords

计算模块接口部分的性能改进

消耗最大的部分


如图所示,main.py消耗最大

性能分析图

计算模块部分单元测试展示

测试结果:


基本都在0.8左右,上下浮动,较为符合预期。

部分测试代码:
if __name__ == '__main__':
    with open('F:/qq/sim_0.8/orig.txt', 'r', encoding="UTF-8") as x1, open('F:/qq/sim_0.8/orig_0.8_add.txt', 'r',
                                                                          encoding="UTF-8") as y1:
        content_x1 = x1.read()
        content_y1 = y1.read()
        similarity = CosineSimilarity(content_x1, content_y1)
        similarity = similarity.main()
        print('相似度: %.2f%%
' % (similarity * 100))
    with open('F:/qq/sim_0.8/orig.txt', 'r', encoding="UTF-8") as x2, open('F:/qq/sim_0.8/orig_0.8_del.txt', 'r',
                                                                          encoding="UTF-8") as y2:
        content_x2 = x2.read()
        content_y2 = y2.read()
        similarity = CosineSimilarity(content_x2, content_y2)
        similarity = similarity.main()
        print('相似度: %.2f%%
' % (similarity * 100))
    with open('F:/qq/sim_0.8/orig.txt', 'r', encoding="UTF-8") as x3, open('F:/qq/sim_0.8/orig_0.8_dis_1.txt', 'r',
                                                                          encoding="UTF-8") as y3:
        content_x3 = x3.read()
        content_y3 = y3.read()
        similarity = CosineSimilarity(content_x3, content_y3)
        similarity = similarity.main()
        print('相似度: %.2f%%
' % (similarity * 100))
    with open('F:/qq/sim_0.8/orig.txt', 'r', encoding="UTF-8") as x4, open('F:/qq/sim_0.8/orig_0.8_dis_3.txt', 'r',
                                                                               encoding="UTF-8") as y4:
        content_x4 = x4.read()
        content_y4 = y4.read()
        similarity = CosineSimilarity(content_x4, content_y4)
        similarity = similarity.main()
        print('相似度: %.2f%%
' % (similarity * 100))

    with open('F:/qq/sim_0.8/orig.txt', 'r', encoding="UTF-8") as x6, open('F:/qq/sim_0.8/orig_0.8_dis_7.txt', 'r',
                                                                          encoding="UTF-8") as y6:
        content_x6 = x6.read()
        content_y6 = y6.read()
        similarity = CosineSimilarity(content_x6, content_y6)
        similarity = similarity.main()
        print('相似度: %.2f%%
' % (similarity * 100))
    with open('F:/qq/sim_0.8/orig.txt', 'r', encoding="UTF-8") as x7, open('F:/qq/sim_0.8/orig_0.8_dis_10.txt', 'r',
                                                                          encoding="UTF-8") as y7:
        content_x7 = x7.read()
        content_y7 = y7.read()
        similarity = CosineSimilarity(content_x7, content_y7)
        similarity = similarity.main()
        print('相似度: %.2f%%
' % (similarity * 100))
    with open('F:/qq/sim_0.8/orig.txt', 'r', encoding="UTF-8") as x8, open('F:/qq/sim_0.8/orig_0.8_dis_15.txt', 'r',
                                                                          encoding="UTF-8") as y8:
        content_x8 = x8.read()
        content_y8 = y8.read()
        similarity = CosineSimilarity(content_x8, content_y8)
        similarity = similarity.main()
        print('相似度: %.2f%%
' % (similarity * 100))
    with open('F:/qq/sim_0.8/orig.txt', 'r', encoding="UTF-8") as x9, open('F:/qq/sim_0.8/orig_0.8_mix.txt', 'r',
                                                                          encoding="UTF-8") as y9:
        content_x9 = x9.read()
        content_y9 = y9.read()
        similarity = CosineSimilarity(content_x9, content_y9)
        similarity = similarity.main()
        print('相似度: %.2f%%
' % (similarity * 100))
    with open('F:/qq/sim_0.8/orig.txt', 'r', encoding="UTF-8") as x0, open('F:/qq/sim_0.8/orig_0.8_rep.txt', 'r',
                                                                          encoding="UTF-8") as y0:
        content_x0 = x0.read()
        content_y0 = y0.read()
        similarity = CosineSimilarity(content_x0, content_y0)
        similarity = similarity.main()
        print('相似度: %.2f%%
' % (similarity * 100))

计算模块部分异常处理说明

设计空白对比文档和完全一致的文档

  空白文档的结果:


没有异常。

完全一致文档的结果:


没有异常。
时间有限,暂时没有发现模块异常。

PSP表格如下

PSP2.1 Personal Software Process Stages 预估耗时(分钟) 实际耗时(分钟)
Planning 计划 30 40
Estimate 估计这个任务需要多少时间 20 20
Development 开发 480 300
Analysis 需求分析 (包括学习新技术) 300 200
Design Spec 生成设计文档 60 30
Design Review 设计复审 30 20
Coding Standard 代码规范 (为目前的开发制定合适的规范) 30 30
Design 具体设计 60 60
Coding 具体编码 300 200
Code Review 代码复审 30 30
Test 测试(自我测试,修改代码,提交修改) 120 90
Reporting 报告 60 80
Test Repor 测试报告 30 20
Size Measurement 计算工作量 30 15
Postmortem & Process Improvement Plan 事后总结, 并提出过程改进计划 40 35
Total 合计 1620 1140

小总结

  第一次做这种作业,没有经验,难度有点高。只能在网上论坛上找找别人的东西,参考了很多才完成作业。自己还是有很多不足,希望以后再接再厉。
原文地址:https://www.cnblogs.com/gaoyichao/p/13682075.html