Data Structures

Naming each element of a tuple to make code more readable:

In a student information system, each record has a fixed format:
(name, age, gender, email address, ...)

There are many students, so to save memory each student's information is stored as a tuple:
("Jim",16,"male","jim8721@gmail.com")
("LiLei",17,"male","leili@qq.com")
...

Fields can be accessed by index, but code full of bare indexes is hard to read.

Solution 1: (similar to an enum, i.e., define a series of numeric constants)

student = ("Jim",16,"male","jim8721@gmail.com")

# Conventional approach:
# student[0] is the name
# student[1] is the age
# ...


# Solution:
name,age,gender,email = range(4)  # unpack range(4) to give each column a constant index; no need to wrap range(4) in list()

# From now on, get the name with student[name]
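Solution 1 imitates an enum with plain integer constants. As a hedged alternative that is not part of the original, the standard library's `enum.IntEnum` makes the enum explicit; the class name `StudentField` is hypothetical. Because `IntEnum` members are also ints, they can index a tuple directly.

```python
from enum import IntEnum

# Hypothetical enum naming the column positions of a student tuple
class StudentField(IntEnum):
    NAME = 0
    AGE = 1
    GENDER = 2
    EMAIL = 3

student = ("Jim", 16, "male", "jim8721@gmail.com")

# IntEnum members are int subclasses, so they work as tuple indexes
print(student[StudentField.NAME])
print(student[StudentField.EMAIL])
```

Compared with `name,age,gender,email = range(4)`, the names are grouped under one type and cannot be accidentally reassigned like ordinary module-level variables.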

Solution 2: (use collections.namedtuple from the standard library instead of the built-in tuple)

To be continued...
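A minimal sketch of Solution 2, assuming the same four fields as the record format above; the type name `Student` is hypothetical. `collections.namedtuple` builds a tuple subclass whose fields can be read by name, so records stay as compact as plain tuples while bare indexes disappear from the code.

```python
from collections import namedtuple

# Define a tuple subclass with named fields matching the record format
Student = namedtuple("Student", ["name", "age", "gender", "email"])

# Records can be created positionally or by keyword
s1 = Student("Jim", 16, "male", "jim8721@gmail.com")
s2 = Student(name="LiLei", age=17, gender="male", email="leili@qq.com")

print(s1.name, s1.age)         # access by field name
print(s2[0])                   # index access still works
print(isinstance(s1, tuple))   # a namedtuple instance is still a tuple
```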

Counting how often elements occur in a sequence:

1. Given a random sequence [12,5,21,6,5,7,8,5,...], find the 3 elements that occur most often. How many times does each occur?

Approach 1: use dict.fromkeys() to create a new dict

from random import randint
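# First generate a random list of numbers
data = [randint(1,10) for _ in range(30)]
print(data)
# Goal: count how often each list element occurs and store the result in a dict
# like {10: 2, 8: 1, ...} (key = list element, value = number of occurrences).
# So first create a dict whose keys are the elements of data, with every value set to 0.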
count_dict = dict.fromkeys(data,0)

print(count_dict)

# Loop over data and add 1 to each element's count in count_dict
for x in data:
    count_dict[x] += 1

print(count_dict)

# Output (data is random, so each run differs):
# [6, 4, 8, 5, 3, 4, 9, 2, 3, 2, 4, 9, 5, 2, 1, 10, 6, 9, 9, 6, 8, 6, 4, 3, 2, 6, 1, 9, 3, 5]
# {6: 0, 4: 0, 8: 0, 5: 0, 3: 0, 9: 0, 2: 0, 1: 0, 10: 0}
# {6: 5, 4: 4, 8: 2, 5: 3, 3: 4, 9: 5, 2: 4, 1: 2, 10: 1}
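The dict.fromkeys() step exists only so that every key starts at 0 before counting. As a swapped-in alternative not used in the original, `collections.defaultdict(int)` creates that 0 the first time a key is accessed, so the pre-fill pass can be skipped. A minimal sketch with a small fixed list:

```python
from collections import defaultdict

data = [6, 4, 8, 6, 4, 6]

# A missing key is created with int(), i.e. 0, on first access
count_dict = defaultdict(int)
for x in data:
    count_dict[x] += 1

print(dict(count_dict))
```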

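Note: the method above does not output the 3 most frequent elements. To get them, sort the dict items by value, as described in the section "Sorting dictionary items by value" below.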
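A minimal sketch of that idea applied to the count_dict from Approach 1, using the sample counts shown in the output above as fixed data. The `heapq.nlargest` variant is a swapped-in technique, not from the original; it returns the same top items without sorting the whole list.

```python
import heapq

# Fixed counts, copied from the sample output of Approach 1
count_dict = {6: 5, 4: 4, 8: 2, 5: 3, 3: 4, 9: 5, 2: 4, 1: 2, 10: 1}

# Sort (element, count) pairs by count, largest first, and keep the first 3
top3_sorted = sorted(count_dict.items(), key=lambda kv: kv[1], reverse=True)[:3]

# heapq.nlargest picks the 3 largest pairs by count directly
top3_heap = heapq.nlargest(3, count_dict.items(), key=lambda kv: kv[1])

print(top3_sorted)
print(top3_heap)
```

Because sorted() is stable, elements with equal counts keep their dict order, so 6 comes before 9 here.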

Approach 2: use Counter

from random import randint
from collections import Counter

data = [randint(1,10) for _ in range(30)]

# Count how often each element occurs with Counter
count_dict = Counter(data)
print(count_dict)

# Counter.most_common(n) returns the n most frequent elements as (element, count) pairs
top_count = count_dict.most_common(3)   # the 3 most frequent elements in count_dict
print(count_dict[6])    # count_dict[6] returns how many times the element 6 occurs
print(top_count)

# Output (data is random, so each run differs):
#  Counter({6: 5, 4: 5, 9: 4, 8: 4, 10: 3, 3: 2, 5: 2, 2: 2, 7: 2, 1: 1})
#  5
# [(6, 5), (4, 5), (9, 4)]
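One detail behind `count_dict[6]`: a Counter returns 0 for an element it has never seen instead of raising KeyError, and that lookup does not add the element. A small sketch with fixed data:

```python
from collections import Counter

data = [1, 2, 2, 3, 3, 3]
count_dict = Counter(data)

print(count_dict[3])      # count of an element that occurs
print(count_dict[99])     # a missing element counts as 0, no KeyError
print(99 in count_dict)   # the lookup above did not insert 99
print(count_dict.most_common(1))
```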

2. Count the word frequencies in an English article: find the 10 most frequent words. How many times does each occur?

A file named "introduction.txt" contains the following:

While The Python Language Reference describes the exact syntax and semantics of the Python language, this library reference manual describes the standard library that is distributed with Python. It also describes some of the optional components that are commonly included in Python distributions.
Python’s standard library is very extensive, offering a wide range of facilities as indicated by the long table of contents listed below. The library contains built-in modules (written in C) that provide access to system functionality such as file I/O that would otherwise be inaccessible to Python programmers, as well as modules written in Python that provide standardized solutions for many problems that occur in everyday programming. Some of these modules are explicitly designed to encourage and enhance the portability of Python programs by abstracting away platform-specifics into platform-neutral APIs.
The Python installers for the Windows platform usually include the entire standard library and often also include many additional components. For Unix-like operating systems Python is normally provided as a collection of packages, so it may be necessary to use the packaging tools provided with the operating system to obtain some or all of the optional components.

The code:

import re
from collections import Counter

with open("introduction.txt", "r", encoding="utf-8") as f:
    data = f.read()   # read the whole file into memory
# Split on runs of non-word characters (\W+) to get the individual words; the split is case-sensitive
data_list = re.split(r"\W+", data)
data_count = Counter(data_list)
top_count = data_count.most_common(10)
print(top_count)

# Output ("the" and "The" are counted as different words):
# [('the', 11), ('Python', 10), ('of', 8), ('that', 6), ('library', 5), ('in', 5), ('as', 5), ('to', 5), ('The', 3), ('describes', 3)]
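To count "the" and "The" as the same word, the text can be lowercased before splitting. A minimal sketch; the helper name `top_words` and the sample sentence are hypothetical. It uses `re.findall(r"\w+", ...)` instead of `re.split`, which also avoids the empty string that `re.split` produces when the text ends with punctuation.

```python
import re
from collections import Counter

def top_words(text, n):
    """Return the n most common words, ignoring case (hypothetical helper)."""
    # Lowercase first so "The" and "the" count as the same word
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(n)

sample = "The Python library. The library is large, and the Python docs describe the library."
print(top_words(sample, 3))
```

For the article above, `top_words(data, 10)` would then report one combined count for "the".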

Sorting dictionary items (key, value) by value:

Approach 1: use zip to turn the dict data into tuples

from random import randint

# First generate a random dict to work with
name_score = {i:randint(70,100) for i in "xyzabc"}

# Get the keys and the values separately (dict views; both iterate in the same order)
name = name_score.keys()
score = name_score.values()

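# Pair each value with its key in a small tuple, value first and key second: tuples compare
# element by element, so the first elements are compared first and later ones only break ties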
score_name = list(zip(score,name))

# Sort the tuples in score_name; tuples compare like (96, "a") > (69, "b")
score_sort = sorted(score_name,reverse=True)   # reverse=True sorts in descending order (largest first)

print(score_sort)

# Output (random, varies per run):
# [(100, 'x'), (93, 'z'), (90, 'c'), (78, 'y'), (77, 'a'), (71, 'b')]
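The value-first tuple order matters when two scores are equal: the key then decides. A small sketch with hypothetical fixed scores in which "a" and "b" tie:

```python
# Hypothetical fixed scores with a tie between "a" and "b"
name_score = {"x": 90, "a": 85, "b": 85, "c": 70}

# Same zip step as above: (value, key) tuples
score_name = list(zip(name_score.values(), name_score.keys()))

# Descending sort; the tied scores are ordered by comparing the names
score_sort = sorted(score_name, reverse=True)
print(score_sort)

# max() uses the same tuple comparison, so it returns the top (score, name) pair
print(max(score_name))
```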


Approach 2: pass a key argument to sorted

from random import randint
# Create a random dict with a dict comprehension
name_score = {i:randint(70,100) for i in "xyzabc"}

# dict.items() gives a view of the dict's (key, value) tuples
name_score_tuple = name_score.items()

# Use sorted's key parameter to choose what each item is compared by
score_sort = sorted(name_score_tuple,key=lambda x:x[1])   # sorted passes each (key, value) tuple to the lambda as x; x[1], the value, is what gets compared

print(score_sort)

# Output (random, varies per run):
#  [('a', 73), ('z', 76), ('c', 79), ('b', 87), ('x', 88), ('y', 96)]
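For descending order and a reusable key function, `operator.itemgetter(1)` can replace `lambda x: x[1]`. A minimal sketch with hypothetical fixed scores; since dicts keep insertion order (Python 3.7+), the sorted pairs can also be turned back into a dict ordered by score.

```python
from operator import itemgetter

# Hypothetical fixed scores
name_score = {"x": 88, "y": 96, "z": 76, "a": 73}

# Sort (key, value) pairs by value (index 1), largest first
ranked = sorted(name_score.items(), key=itemgetter(1), reverse=True)
print(ranked)

# Rebuild a dict whose iteration order follows the ranking
ranked_dict = dict(ranked)
print(list(ranked_dict))
```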

Note: sorted(dict) sorts only by the dict's keys, because iterating over a dict yields its keys, not its values.
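A short sketch of that behavior with hypothetical fixed scores: passing the dict itself sorts the keys, while `key=name_score.get` keeps returning keys but orders them by their values.

```python
name_score = {"x": 88, "y": 96, "z": 76, "a": 73}

print(sorted(name_score))                       # keys only, in key order
print(sorted(name_score, key=name_score.get))   # keys, ordered by their values
print(sorted(name_score.values()))              # the values themselves
```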

Original post: https://www.cnblogs.com/neozheng/p/8474045.html