Python Data Structures

Course contents:

Using list and tuple
Operations on str
Using dict and set

1. Using list and tuple

list

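A list is one of Python's built-in data types. It is an ordered, mutable collection: elements can be added, removed, and replaced after the list has been created.

Here we define a list that holds the names of some students: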

>>> students = ['Eric','Jack','Michael']
>>> students
['Eric', 'Jack', 'Michael']

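The variable students is a list. Its elements are accessed by index, and indexes start at 0:

>>> students[0]
'Eric'
>>> students[1]
'Jack'
>>> students[-1]  # negative indexes count from the end, starting at -1
'Michael'
>>> students[-2]
'Jack'
>>> students[3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

When an index is out of range, Python raises an IndexError, so make sure the index stays within bounds.

The len() function returns the number of elements in a list:

>>> len(students)
3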
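
The indexing rules above can be checked with a short script. This is only a sketch: get_or_default is a hypothetical helper introduced here for illustration, not a built-in.

```python
students = ['Eric', 'Jack', 'Michael']
n = len(students)

# Valid indexes run from -n to n - 1; a negative index i refers to position n + i.
assert students[-1] is students[n - 1]
assert students[-n] is students[0]


def get_or_default(items, index, default=None):
    """Hypothetical helper: return items[index], or default if index is out of range."""
    try:
        return items[index]
    except IndexError:
        return default


print(get_or_default(students, 1))          # Jack
print(get_or_default(students, 3, 'n/a'))   # n/a
```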

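The following sections walk through the common list operations.

Append (to the end): append

>>> students.append('Bob')
>>> students
['Eric', 'Jack', 'Michael', 'Bob']
>>> students.append('最后一个')  # the string means "the last one"
>>> students
['Eric', 'Jack', 'Michael', 'Bob', '最后一个']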

Insert (at a given position): insert

>>> students
['Eric', 'Jack', 'Michael', 'Bob', '最后一个']
>>> students.insert(2,'我要当第二')  # inserted before index 2; the string means "I want to be number two"
>>> students
['Eric', 'Jack', '我要当第二', 'Michael', 'Bob', '最后一个']

Delete: del, remove, pop

>>> del students[2]  # delete the element at the given index
>>> students
['Eric', 'Jack', 'Michael', 'Bob', '最后一个']
>>> students.remove('Jack')  # delete the first element equal to the given value
>>> students
['Eric', 'Michael', 'Bob', '最后一个']
>>> students.pop()  # remove the last element and return it
'最后一个'
>>> students
['Eric', 'Michael', 'Bob']
>>> students.pop(1)  # remove the element at the given index and return it
'Michael'
>>> students
['Eric', 'Bob']
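
The three deletion forms fail in different ways when there is nothing to delete. The following sketch, using an illustrative list with a repeated name, shows the cases that most often surprise people:

```python
students = ['Eric', 'Jack', 'Michael', 'Jack']

# remove() deletes only the first matching element.
students.remove('Jack')
print(students)  # ['Eric', 'Michael', 'Jack']

# remove() raises ValueError when the value is absent, so check membership first.
if 'Sarah' in students:
    students.remove('Sarah')

# pop() on an empty list raises IndexError.
empty = []
try:
    empty.pop()
except IndexError as exc:
    print('cannot pop:', exc)

# del also accepts a slice, deleting several elements at once.
del students[0:2]
print(students)  # ['Jack']
```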

Extend and concatenate

>>> students=['Eric','Jack','Michael']
>>> L=[1,2,3]
>>> students.extend(L)   # extend: append every element of L to the end of students, in place
>>> students
['Eric', 'Jack', 'Michael', 1, 2, 3]
>>>
>>> students=['Eric','Jack','Michael']
>>> L=[1,2,3]
>>> L+students   # concatenate: builds a new list, and the operand order matters
[1, 2, 3, 'Eric', 'Jack', 'Michael']
>>> students+L
['Eric', 'Jack', 'Michael', 1, 2, 3]
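
The practical difference between extend() and + is whether the original list is modified. A small sketch with illustrative variable names, which also contrasts extend() with append():

```python
a = [1, 2]
b = [3, 4]

c = a + b          # new list; a and b are unchanged
print(c, a)        # [1, 2, 3, 4] [1, 2]

alias = a
a.extend(b)        # modifies a in place, so alias sees the change
print(alias)       # [1, 2, 3, 4]

d = [1, 2]
d.append(b)        # append adds b itself as a single nested element
print(d)           # [1, 2, [3, 4]]
```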

Replace

>>> students=['Eric','Jack','Michael']
>>> students[1]='Sarah'   # assign a new value to the given index
>>> students
['Eric', 'Sarah', 'Michael']

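Multi-dimensional (nested) lists

>>> p=['C++','C#']
>>> language=['C',p,'Java','PHP','Python']
>>> language
['C', ['C++', 'C#'], 'Java', 'PHP', 'Python']

To get 'C#' you can write p[1] or language[1][1], so language can be viewed as a two-dimensional array. Three-dimensional, four-dimensional, and deeper arrays work the same way, but they are rarely needed.

>>> language[1][1]
'C#'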
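
One consequence of nesting is worth seeing in code: the outer list stores a reference to p, not a copy of it. The sketch below uses a shortened version of the list above, and the grid variable is an illustrative addition:

```python
p = ['C++', 'C#']
language = ['C', p, 'Java']

# language[1] and p are the same object, so a change made through one name
# is visible through the other.
p.append('Objective-C')
print(language[1])        # ['C++', 'C#', 'Objective-C']
print(language[1] is p)   # True

# A regular 2-D grid is usually built with a list comprehension,
# so that each row is a separate list.
grid = [[0] * 3 for _ in range(2)]
grid[0][1] = 5
print(grid)               # [[0, 5, 0], [0, 0, 0]]
```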

Count: count

>>> students=['Eric','Jack','Michael','Bob','Jack',11,12]
>>> students.count('Jack')   # count how many times 'Jack' occurs
2

Sort and reverse: sort & reverse

>>> students=['Eric','Jack','Michael','Bob','Jack',11,12]
>>> students.sort()   # values of different types (int and str) cannot be ordered against each other
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'int' and 'str'
>>> students[-1]='12'
>>> students[-2]='11'
>>> students   # the failed sort() above had already reordered part of the list
['Bob', 'Eric', 'Jack', 'Jack', 'Michael', '11', '12']
>>> students.sort()
>>> students
['11', '12', 'Bob', 'Eric', 'Jack', 'Jack', 'Michael']
>>>
>>> students.reverse()   # reverse the list in place
>>> students
['Michael', 'Jack', 'Jack', 'Eric', 'Bob', '12', '11']
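
Both sort() and reverse() modify the list in place. When the original order must be kept, or when the elements need a custom ordering, the built-in sorted() with a key function is one option. A sketch, with key=str chosen here only to show that mixed types can be made comparable:

```python
students = ['Eric', 'Jack', 'Michael', 'Bob', 'Jack', 11, 12]

# sorted() returns a new list and leaves the original untouched.
# key=str compares every element by its string form, so int and str can be mixed.
by_text = sorted(students, key=str)
print(by_text)   # [11, 12, 'Bob', 'Eric', 'Jack', 'Jack', 'Michael']

names = ['Eric', 'Jack', 'Michael', 'Bob']
print(sorted(names, key=len))        # ['Bob', 'Eric', 'Jack', 'Michael']
print(sorted(names, reverse=True))   # ['Michael', 'Jack', 'Eric', 'Bob']
print(names[::-1])                   # ['Bob', 'Michael', 'Jack', 'Eric']
```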

Get an index: index

>>> students
['Michael', 'Jack', 'Jack', 'Eric', 'Bob', '12', '11']
>>> students.index('Jack')   # returns only the index of the first match
1
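
Because index() stops at the first match, finding every position takes a little more work. One possible approach uses enumerate(); all_indexes is a hypothetical helper name, not a list method:

```python
students = ['Michael', 'Jack', 'Jack', 'Eric', 'Bob', '12', '11']


def all_indexes(items, value):
    """Hypothetical helper: return every index at which value occurs in items."""
    return [i for i, item in enumerate(items) if item == value]


print(all_indexes(students, 'Jack'))   # [1, 2]
print(all_indexes(students, 'Sarah'))  # []

# index() raises ValueError for a missing value, so guard with "in" when unsure.
position = students.index('Eric') if 'Eric' in students else -1
print(position)                        # 3
```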

Copy: copy

>>> students=['Eric','Jack','Bob','Sarah','Michael']
>>> students1=students.copy()
>>> students1
['Eric', 'Jack', 'Bob', 'Sarah', 'Michael']

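copy() is less simple than it looks: it makes a shallow copy, so nested mutable elements are still shared with the original. For details, see the post "Python 列表深浅复制详解" (a detailed explanation of shallow and deep copying of Python lists).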
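
A minimal sketch of the difference between a shallow and a deep copy, using the standard copy module and a nested list like the one from the earlier section:

```python
import copy

p = ['C++', 'C#']
language = ['C', p, 'Java']

shallow = language.copy()        # new outer list, same inner objects
deep = copy.deepcopy(language)   # new outer list and new inner lists

p.append('Objective-C')          # changes the shared inner list
language[0] = 'Go'               # changes only the original outer list

print(shallow)  # ['C', ['C++', 'C#', 'Objective-C'], 'Java']
print(deep)     # ['C', ['C++', 'C#'], 'Java']
```

The shallow copy sees the change to the inner list because it still points at p, while the deep copy is fully independent.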

tuple

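The other ordered sequence type is the tuple. A tuple is very similar to a list, but it cannot be modified once it has been created. For example, the same student names as a tuple:

>>> students = ('Eric','Jack','Michael')

The tuple students can no longer be changed. It has no methods for appending, inserting, replacing, deleting, or sorting elements; the only methods it offers are count() and index().

What is the point of an immutable tuple? Because a tuple cannot change, code that uses it is safer: the data cannot be modified by accident. Whenever a tuple can do the job of a list, prefer the tuple.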
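
The following sketch shows what immutability means in practice and where it stops. The record and seats variables are illustrative additions:

```python
students = ('Eric', 'Jack', 'Michael')

try:
    students[1] = 'Sarah'          # item assignment is not supported
except TypeError as exc:
    print('TypeError:', exc)

print(students.count('Jack'), students.index('Michael'))  # 1 2

# Immutability is shallow: a list stored inside a tuple can still change.
record = ('Eric', [90, 85])
record[1].append(70)
print(record)                      # ('Eric', [90, 85, 70])

# Tuples are hashable (when their elements are), so they can serve as dict keys.
seats = {(0, 0): 'Eric', (0, 1): 'Jack'}
print(seats[(0, 1)])               # Jack
```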

One thing to watch with tuples: the elements must be fixed at the moment the tuple is defined, for example:

>>> t=(6,8)
>>> t
(6, 8)

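However, if you want a tuple with only one element and write it like this:

>>> t=(6)
>>> t
6

what you have defined is not a tuple but the number 6. Parentheses () can denote a tuple, but they can also be the grouping parentheses of an arithmetic expression, which is ambiguous. Python resolves the ambiguity by treating this case as grouping parentheses, so the result is simply 6.

For that reason, a one-element tuple must be written with a trailing comma to remove the ambiguity:

>>> t=(6,)
>>> t
(6,)

Python also prints the trailing comma when it displays a one-element tuple, so that you do not mistake it for parentheses in the arithmetic sense.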
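
The rule can be verified by inspecting the types directly. A short sketch:

```python
a = (6)
b = (6,)
c = 6,          # the comma, not the parentheses, makes the tuple

print(type(a).__name__, type(b).__name__, type(c).__name__)  # int tuple tuple
print(len(b), b == c)                                        # 1 True

empty = ()      # the empty tuple is the one case written with parentheses alone
print(len(empty))                                            # 0
```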

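2. Operations on str

>>> s='pyTHON'    # named s rather than str, so the built-in str type is not shadowed
>>> s.capitalize()  # return a copy with the first character uppercased and the rest lowercased
'Python'
>>> a='1aPPLE'
>>> a.capitalize()    # if the first character is not a letter, nothing is uppercased; the rest is still lowercased
'1apple'
>>> s.center(20,'-')  # center the string in a field of width 20, padded with '-'
'-------pyTHON-------'
>>> s.casefold()  # lowercase for caseless comparison; like lower(), but more aggressive for some non-ASCII letters
'python'
>>> s.count('T')   # count the occurrences of a substring
1
>>> s.encode()    # encode to bytes (UTF-8 by default)
b'pyTHON'
>>> s.find('T')   # return the index of the first occurrence of a substring
2
>>> s.find('A')    # return -1 if the substring is not found
-1
>>> s.index('H')   # like find(): return the index of the first occurrence
3
>>> s.index('A')    # unlike find(): raise ValueError if the substring is not found
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
>>> s.isdigit()   # check whether the string consists only of digits
False
>>> b='123'
>>> b.isdigit()
True
>>> s.upper()    # convert all letters to uppercase
'PYTHON'
>>> c='   pyt  hon'
>>> c.strip()   # remove the given characters from both ends (whitespace by default); characters in the middle are untouched
'pyt  hon'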
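
A few details of these methods are easier to see in a script. The sketch below uses illustrative strings; 'Straße' is chosen only because it is a well-known case where lower() and casefold() differ:

```python
s = 'pyTHON'

# str is immutable: every method returns a new string and leaves s unchanged.
upper = s.upper()
print(upper, s)                    # PYTHON pyTHON

# strip() accepts a set of characters to remove from both ends.
print('--pyt-hon--'.strip('-'))    # pyt-hon
print('  pyt  hon\n'.strip())      # pyt  hon

# find() returns -1 on a miss, so compare against -1 explicitly:
# index 0 is a valid result but is falsy in a plain "if pos:" test.
pos = s.find('p')
if pos != -1:
    print('found at', pos)         # found at 0

# casefold() and lower() differ for some non-ASCII letters.
print('Straße'.lower(), 'Straße'.casefold())   # straße strasse
```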

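3. Using dict and set

dict

Python has built-in support for dictionaries. dict is short for dictionary, known as a map in some other languages, and it stores data as key-value pairs.

Without further preamble, let's define a dictionary that maps items to prices:

>>> shop={'shoes':240,'T-shirt':160,'pants':210}
>>> shop['pants']
210

The price is looked up directly by item name, and on average the lookup takes about the same time no matter how large the table grows.

Why is dict lookup so fast?

To answer that, first consider a list: the larger the list, the slower a search by value, because the list is scanned from the first element onward until a match is found.

A dictionary instead computes a hash of the given key, uses it to work out where the corresponding value is stored, and fetches the value directly.

It is like two people looking up a character in the Xinhua Dictionary: one leafs through it page by page, while the other uses the radical index to go straight to the right page.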
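
To make the idea concrete, here is a toy hash table. It is an illustration only: ToyHashTable and its methods are hypothetical names, and CPython's real dict is implemented differently and far more efficiently. What it shares with a real dict is the principle that hash(key) selects where to look, so no full scan is needed.

```python
class ToyHashTable:
    """Illustration only: CPython's real dict is considerably more sophisticated."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        # hash(key) decides which bucket to look in, so no full scan is needed.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # existing key: overwrite the value
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default


table = ToyHashTable()
table.put('shoes', 240)
table.put('pants', 210)
table.put('pants', 215)
print(table.get('pants'))   # 215
print(table.get('skirt'))   # None
```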

Add:

>>> shop['hat']=60
>>> shop
{'shoes': 240, 'T-shirt': 160, 'pants': 210, 'hat': 60}

A key in a dictionary maps to exactly one value. If you assign to the same key more than once, the earlier value is overwritten, which is also how a value is modified:

>>> shop['scarf']=120
>>> shop
{'shoes': 240, 'T-shirt': 160, 'pants': 210, 'hat': 60, 'scarf': 120}
>>> shop['scarf']=130
>>> shop
{'shoes': 240, 'T-shirt': 160, 'pants': 210, 'hat': 60, 'scarf': 130}

If the key does not exist, dict raises a KeyError:

>>> shop['skirt']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'skirt'

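There are two ways to avoid the missing-key error. One is to test whether the key exists with in; the other is to use the get() method that dict provides, which returns None, or a value you specify, when the key does not exist.

>>> 'skirt' in shop
False
>>>
>>> shop.get('skirt')   # note: the interactive shell prints nothing when the result is None
>>> shop.get('skirt',-1)
-1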
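
A common use of get() is to supply a fallback while aggregating. A sketch with an illustrative cart list:

```python
shop = {'shoes': 240, 'T-shirt': 160, 'pants': 210}
cart = ['shoes', 'skirt', 'pants', 'shoes']

# Sum the prices of known items; unknown items count as 0 instead of raising KeyError.
total = sum(shop.get(item, 0) for item in cart)
print(total)     # 690

# Count how often each item appears, using get() to supply the starting value.
counts = {}
for item in cart:
    counts[item] = counts.get(item, 0) + 1
print(counts)    # {'shoes': 2, 'skirt': 1, 'pants': 1}
```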

Delete: pop & del

>>> shop.pop('pants')    # remove the key and return its value (pop is the recommended way)
210
>>> shop
{'shoes': 240, 'T-shirt': 160, 'hat': 60, 'scarf': 130}
>>>
>>> del shop['shoes']    # remove the key; nothing is returned
>>> shop
{'T-shirt': 160, 'hat': 60, 'scarf': 130}
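
Both forms raise KeyError for a missing key, but pop() can be given a default to avoid that. A brief sketch:

```python
shop = {'T-shirt': 160, 'hat': 60}

# pop() with a second argument returns that default instead of raising KeyError.
print(shop.pop('skirt', None))   # None
print(shop.pop('hat', None))     # 60

# del has no default form, so guard it when the key may be missing.
if 'skirt' in shop:
    del shop['skirt']

print(shop)                      # {'T-shirt': 160}
```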

Nested multi-level dictionaries and how to work with them

dic={
   '河南':{
         '郑州':
             ['金水区', '二七区'],
         '洛阳':
             ['涧西区', '洛龙区'],
         '信阳':
             ['浉河区', '平桥区']
    },
   '山东':{
          '济南':
              ['槐荫区', '历下区'],
          '菏泽':
              ['牡丹区', '定陶区'],
          '淄博':
              ['临淄区', '淄川区']
    },
   '湖北':{
         '武汉':
              ['江汉区', '汉阳区'],
          '咸宁':
              ['咸安区', '赤壁市'],
          '黄冈':
              ['黄州区', '鄂城区']
    }
}
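
The original shows only the structure, so here is a sketch of typical operations on it. It uses a shortened copy of dic (province, then city, then a list of districts); the entries added during the example are illustrative.

```python
dic = {
    '河南': {'郑州': ['金水区', '二七区'], '洛阳': ['涧西区', '洛龙区']},
    '山东': {'济南': ['槐荫区', '历下区']},
}

# Read: chain one subscript per level (province -> city -> district index).
print(dic['河南']['郑州'][0])            # 金水区

# Modify: append to an inner list, or add a new city under a province.
dic['山东']['济南'].append('市中区')
dic['山东']['青岛'] = ['市南区']

# Traverse every level.
for province, cities in dic.items():
    for city, districts in cities.items():
        print(province, city, districts)

# Safe lookup when a level may be missing.
print(dic.get('湖北', {}).get('武汉', []))   # []
```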

Five methods

# values(): returns a view of all the values in the dictionary
>>> shop={'shoes':240,'T-shirt':160,'pants':210}
>>> shop.values()
dict_values([240, 160, 210])


# keys(): returns a view of the keys; use list() to convert it to a list
# Note: in Python 2.x, keys() returns a list directly
>>> shop.keys()
dict_keys(['shoes', 'T-shirt', 'pants'])
>>> list(shop.keys())   # call list() to convert the view to a list
['shoes', 'T-shirt', 'pants']


# setdefault(key, default): if key is in the dictionary, return its value; otherwise insert key with the value default and return default
>>> shop.setdefault('shoes',300)
240
>>> shop
{'shoes': 240, 'T-shirt': 160, 'pants': 210}
>>> shop.setdefault('hat',90)
90
>>> shop
{'shoes': 240, 'T-shirt': 160, 'pants': 210, 'hat': 90}
>>> shop.setdefault('gloves')   # default is None when omitted
>>> shop
{'shoes': 240, 'T-shirt': 160, 'pants': 210, 'hat': 90, 'gloves': None}


# update(dict2): add the key-value pairs of dict2 to the dictionary, overwriting the values of keys that already exist
>>> shop={'shoes': 240, 'T-shirt': 160, 'pants': 210, 'hat': 90, 'gloves': None}
>>> info={'Eric':'男'}   # the string '男' means "male"
>>> shop.update(info)
>>> shop
{'shoes': 240, 'T-shirt': 160, 'pants': 210, 'hat': 90, 'gloves': None, 'Eric': '男'}


# items(): returns an iterable view of the (key, value) pairs
>>> shop.items()
dict_items([('shoes', 240), ('T-shirt', 160), ('pants', 210), ('hat', 90), ('gloves', None), ('Eric', '男')])
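
These methods are mostly used in loops. A sketch of three common patterns; the by_length grouping is an illustrative example, not part of the original:

```python
shop = {'shoes': 240, 'T-shirt': 160, 'pants': 210}

# Iterate over key-value pairs.
for name, price in shop.items():
    print(f'{name}: {price}')

# Views are live: they reflect later changes to the dictionary.
keys = shop.keys()
shop['hat'] = 60
print(list(keys))                 # ['shoes', 'T-shirt', 'pants', 'hat']

# setdefault() is handy for grouping values under a key.
by_length = {}
for name in shop:
    by_length.setdefault(len(name), []).append(name)
print(by_length)                  # {5: ['shoes', 'pants'], 7: ['T-shirt'], 3: ['hat']}
```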

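A note on ordering: since Python 3.7, a dict preserves the order in which its keys were inserted. In earlier versions the internal order had no guaranteed relationship to insertion order, so code that must run on older interpreters should not rely on it.

Compared with a list, a dict has the following characteristics:

Lookup and insertion are very fast (constant time on average) and do not slow down as the number of keys grows;
It needs a lot of memory, much of which is overhead.

A list is the opposite:

Searching for a value, or inserting anywhere but at the end, takes longer as the number of elements grows;
It takes little space and wastes very little memory.

So a dict is a way of trading space for time.

A dict is useful wherever fast lookup is needed, and it appears almost everywhere in Python code. Using it correctly matters, and the first rule to remember is that a dict key must be an immutable object (more precisely, a hashable one), such as a str, a number, or a tuple of such values.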
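
The key rule is easy to confirm. A sketch with an illustrative prices dictionary:

```python
prices = {}
prices['hat'] = 60                 # str key
prices[42] = 'answer'              # int key
prices[('hat', 'red')] = 65        # tuple of immutable values

try:
    prices[['hat', 'red']] = 65    # a list is mutable, hence unhashable
except TypeError as exc:
    print('TypeError:', exc)       # TypeError: unhashable type: 'list'

# Equal keys have equal hashes, which is what lets dict find the entry again.
print(hash(('hat', 'red')) == hash(('hat', 'red')))   # True
print(prices[('hat', 'red')])                          # 65
```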

set

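A set is similar to a dict, except that it is a collection of keys only and stores no values. Because keys cannot repeat, a set contains no duplicate elements, and a set is unordered.

A set can be created by passing a list (or any other iterable) to set(), or by writing a literal such as {1, 3, 5, 7}:

>>> set1=set([1,3,5,7])   # note the form: a list passed to set()
>>> set1
{1, 3, 5, 7}

If there are duplicate keys when a set is defined, the set filters them out automatically.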

>>> set2=set([1,3,5,7,3,5])
>>> set2
{1, 3, 5, 7}

Add and remove: add & remove

>>> set1.add(9)
>>> set1
{1, 3, 5, 7, 9}
>>>
>>>
>>> set1.remove(3)
>>> set1
{1, 5, 7, 9}
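
A few edge cases of these operations, sketched below. discard() is a standard set method that the text above does not mention; it is shown here as the non-raising counterpart of remove().

```python
set1 = {1, 5, 7, 9}

set1.add(5)            # adding an element that is already present changes nothing
print(len(set1))       # 4

try:
    set1.remove(3)     # remove() raises KeyError for a missing element
except KeyError as exc:
    print('KeyError:', exc)

set1.discard(3)        # discard() silently ignores a missing element
print(sorted(set1))    # [1, 5, 7, 9]

# Removing duplicates from a list; sorted() gives the result a definite order.
print(sorted(set([1, 3, 5, 7, 3, 5])))   # [1, 3, 5, 7]

empty = set()          # {} creates an empty dict, not an empty set
print(type({}).__name__, type(empty).__name__)   # dict set
```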

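A set can be regarded as an unordered collection of unique elements in the mathematical sense, so two sets support mathematical operations such as intersection and union:

>>> s=set([1,2,3,4])
>>> s
{1, 2, 3, 4}
>>> t=set([3,4,5,6])
>>> t
{3, 4, 5, 6}
# union
>>> s.union(t)                      # alternative form: s | t
{1, 2, 3, 4, 5, 6}

# intersection
>>> s.intersection(t)               # alternative form: s & t
{3, 4}

# difference (in s but not in t)
>>> s.difference(t)                 # alternative form: s - t
{1, 2}

# symmetric difference (in s or in t, but not in both)
>>> s.symmetric_difference(t)       # alternative form: s ^ t
{1, 2, 5, 6}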
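
The method and operator forms can be checked against each other in a script. The sketch also shows one difference between them and two related operations (subset test and in-place union) that follow the same pattern:

```python
s = {1, 2, 3, 4}
t = {3, 4, 5, 6}

assert s | t == s.union(t) == {1, 2, 3, 4, 5, 6}
assert s & t == s.intersection(t) == {3, 4}
assert s - t == s.difference(t) == {1, 2}
assert s ^ t == s.symmetric_difference(t) == {1, 2, 5, 6}

# The methods accept any iterable; the operators require both sides to be sets.
print(s.union([7, 8]) == {1, 2, 3, 4, 7, 8})   # True

# Subset test and in-place union.
print({3, 4} <= s)      # True
s |= t                  # same effect as s.update(t)
print(sorted(s))        # [1, 2, 3, 4, 5, 6]
```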

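Note: the only difference between a set and a dict is that a set stores no values. A set works on the same principle as a dict, so it likewise cannot hold mutable objects: a mutable object such as a list is not hashable, and if an element could change after being added, the set could no longer guarantee that it "contains no duplicate elements". Let's try putting a list into a set: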

>>> L=[6,8]
>>> L
[6, 8]
>>> s=set([1,2,3,L,4])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

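If you put a list into a set, Python raises an error saying that the list type is not supported as a set element, because lists are unhashable.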
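
When list-like data does need to go into a set, one common workaround is to convert it to an immutable type first. A sketch, with the groups variable as an illustrative addition:

```python
L = [6, 8]

# Convert the mutable list to an immutable tuple before adding it.
s = {1, 2, 3, tuple(L), 4}
print((6, 8) in s)                  # True

# A set is itself mutable, so a set of sets needs frozenset.
groups = {frozenset([1, 2]), frozenset([2, 1]), frozenset([3])}
print(len(groups))                  # 2

# A tuple that contains a list is still unhashable.
try:
    {(1, [2, 3])}
except TypeError as exc:
    print('TypeError:', exc)
```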

References:

Liao Xuefeng's official website (廖雪峰的官网)

Jinjiao Dawang's blog (金角大王的博客)
