Python Tutorial 学习(五)--Data Structures

5. Data Structures

这一章来说说Python的数据结构

5.1. More on Lists

之前的文字里面简单的介绍了一些基本的东西,其中就涉及到了list的一点点的使用.当然,它可不仅仅只有那么一点点,这里给出一个更详细一点的说明.来吧骚连,打开你的命令行窗口

>>>help(list)

看看会出来一些什么~~`

list.append(x)

向一个序列里面追加元素 x

a = []

a.append(x) # 假设x已经定义了

a[len(a):] = [x]

list.extend(L)

扩展另一个序列到当前的序列,用自己的话说,就是两个序列相连接

a[len(a):] = L # L是一个list

list.insert(i, x): 在list的第i个位置插入元素x,第一个变量是你需要在其前插入的东西的索引.因此,a.insert(0, x)就是在序列a的首位插入新元素x,而a.inser(len(a), x)则是在末尾.

list.remove(x): 移除序列中的第一个x元素,当然,如果x并不存在于该序列中的话,这会引发一个错误.

list.pop([i]): 如果你对堆栈的知识点不陌生,那么这里一定会让你感到熟悉.实际上这个也就相当于出栈的操作.可选的参数i用于指定出栈的元素的位置(索引),该操作返回的是pop出来的对应的呀元素. 当i没有指定的时候,默认会pop出的是序列的最后一个元素, (方括号括起来表示当前指定的参数是可选的,不是说你在指定这个参数的时候要用方括号括起来,这个用法在后面的说明里面会经常看到)

list.index(x): 返回序列中元素x的索引,比如['a', 'b', 'c'].index('b') 返回的就是1,需要注意的是当不存在该元素的时候,这会引发一个错误.

list.count(x): 这个返回的是指定的元素x在该序列中出现的次数.

list.sort(cmp=None, key=None, reverse=False)

Sort the items of the list in place (the arguments can be used for sort customization, see sorted() for their explanation).

这句我翻译不准确,直接上原文吧.需要注意的是,sort是直接对原始对象进行操作的,是没有返回值(返回None)的,如果你在前面的文章里面有过浏览.那么你肯定对一个栗子有印象:

pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
>>> pairs.sort(key=lambda pair: pair[1])
这个,用到了的时候,你就自然会明白这是有多么的妙.

list.reverse()

　　　不多说,也还是直接上栗子:

>>> lst=[1,2,3]
>>> l = lst.reverse()
>>> l
>>> lst
[3, 2, 1]

顺手附上官方的例程:

>>> a = [66.25, 333, 333, 1, 1234.5]
>>> print a.count(333), a.count(66.25), a.count('x')
2 1 0
>>> a.insert(2, -1)
>>> a.append(333)
>>> a
[66.25, 333, -1, 333, 1, 1234.5, 333]
>>> a.index(333)
1
>>> a.remove(333)
>>> a
[66.25, -1, 333, 1, 1234.5, 333]
>>> a.reverse()
>>> a
[333, 1234.5, 1, 333, -1, 66.25]
>>> a.sort()
>>> a
[-1, 1, 66.25, 333, 333, 1234.5]
>>> a.pop()
1234.5
>>> a
[-1, 1, 66.25, 333, 333]

You might have noticed that methods like insert, remove or sort that only modify the list have no return value printed – they return the default None. [1] This is a design principle for all mutable data structures in Python.

5.1.1. Using Lists as Stacks 将序列用作栈

合理运用list的append()和pop()方法,我们可以利用Python实现堆栈后进先出(LIFO:last in first out)的操作.

push操作就相当于append() #会在序列末尾追加元素

pop操作就相当于pop() #会将最尾部的元素pop出来

栗子在这里:

>>> stack = [3, 4, 5]
>>> stack.append(6)
>>> stack.append(7)
>>> stack
[3, 4, 5, 6, 7]
>>> stack.pop()
7
>>> stack
[3, 4, 5, 6]
>>> stack.pop()
6
>>> stack.pop()
5
>>> stack
[3, 4]

5.1.2. Using Lists as Queues 将序列用作队列

将序列当作队列使,实现先进先出的操作是同样可行的.当然,直接使用list显得不那么高效,因为append 和pop在序列的尾部操作起来飞快,可是操作起序列的首个元素就显得稍微慢了点点(对序列的首个元素进行变动会让其它所有的元素被动的被移动了一个位置).

To implement a queue, use collections.deque which was designed to have fast appends and pops from both ends. For example:

其他的不用看了,你只需要有一个东西可以用来实现这个就OK了,那就是collections.deque,这个会更快~

栗子:

>>> from collections import deque
>>> queue = deque(["Eric", "John", "Michael"])
>>> queue.append("Terry")           # Terry arrives
>>> queue.append("Graham")          # Graham arrives
>>> queue.popleft()                 # The first to arrive now leaves
'Eric'
>>> queue.popleft()                 # The second to arrive now leaves
'John'
>>> queue                           # Remaining queue in order of arrival
deque(['Michael', 'Terry', 'Graham'])

5.1.3. Functional Programming Tools 程序化的编程工具(对于我的英语翻译水平,我也只能呵呵了)

当使用到list的时候,有三个非常有用的内建函数: filter(), map(), and reduce().

filter(function, sequence) 简言之,filter就是用来筛选出符合条件的元素出来的.观察一下该函数的两个参数,第一个是一个函数,用来作为筛选条件的,第二个则是筛选的对象,用于从中获取到满足条件的东西.当 sequence 是一个字符串或者元组的时候, 返回的结果也是同其相同的类型;否则,filter过后总是返回一个序列.

这里给出一个栗子,用于从一个序列里面得到所有不能被2和3整除的数:

>>> def f(x): return x % 2 != 0 and x % 3 != 0
...
>>> filter(f, range(2, 25))
[5, 7, 11, 13, 17, 19, 23]

map(function, sequence)则是将sequence中的元素依次作为function函数的参数,并将返回的结果组成的列表返回,举个栗子:

>>> def cube(x): return x*x*x
...
>>> map(cube, range(1, 11))
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]

再给一个栗子:

>>> def add(a,b,c):return a+b+c
...
>>> map(add, [1,2,3],[4,5,6],[7,8,9])
[12, 15, 18]
>>>

所以你需要知道的就是,map后面的sequence的个数是跟function的参数是要一致的哟.

如果高斯学过Python,那他说不定直接用reduce就把题目给解了

reduce(function, sequence) redece首先将序列的前两个作为参数,然后将结果和序列的第三个作为参数,然后再将结果和序列的第四个作为参数然后......:

>>> def add(x,y): return x+y
...
>>> reduce(add, range(1, 11))
55

高斯菌的问题的解:

>>> reduce(lambda x, y: x+y,range(101))
5050
>>>

需要注意的是:参数sequence如果只有一个元素,那么该元素会被直接返回,如果sequence是个空序列,那么会引发一个错误.

我们可以为reduce指定第三个参数作为启动参数,

In this case the starting value is returned for an empty sequence, and the function is first applied to the starting value and the first sequence item, then to the result and the next item, and so on. For example,

>>> def sum(seq):
...     def add(x,y): return x+y
...     return reduce(add, seq, 0)
...
>>> sum(range(1, 11))
55
>>> sum([])
0

# Don’t use this example’s definition of sum(): since summing numbers is such a common need, a built-in function sum(sequence) is already provided, 
# and works exactly like this.

5.1.4. List Comprehensions

递推式构造列表(List Comprehensions)提供了一个非常简介的创建序列的方式. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition.

例如,我们想要创建一个平方数构成的序列,就像这样:

>>> squares = []
>>> for x in range(10):
...     squares.append(x**2)
...
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

我们可以通过另一种方式获取同样的结果:

squares = [x**2 for x in range(10)]

这种构建序列的方式和squares = map(lambda x: x**2, range(10))是等效的, 但前一种更加的简介明了,便于阅读理解.

递推构造序列由一个被括起来的带着for循环的表达式构成, 然后还可以再跟上零个或多个for循环或者if表达式. 这将根据for循环迭代,if判断得到一个新的序列.

下面的栗子里面的序列就是通过比较两个list,当二者不相等时,构成一个新的元素填充进去.

>>> [(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

上面的也可以这样表示:

>>> combs = []
>>> for x in [1,2,3]:
...     for y in [3,1,4]:
...         if x != y:
...             combs.append((x, y))
...
>>> combs
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

注意看for和if的顺序同上面的一段代码里面是相同的(Note how the order of the for and if statements is the same in both these snippets).

如果表达式是一个元组(就如同上面的栗子里面的那样),那么它得用括号括起来.

>>> vec = [-4, -2, 0, 2, 4]
>>> # 构建一个被双倍了的list
>>> [x*2 for x in vec]
[-8, -4, 0, 4, 8]
>>> # 删选得到正整数序列
>>> [x for x in vec if x >= 0]
[0, 2, 4]
>>> # 用函数将序列中的元素'过滤'一遍
>>> [abs(x) for x in vec]
[4, 2, 0, 2, 4]
>>> # 调用每个元素自身的某个方法
>>> freshfruit = ['  banana', '  loganberry ', 'passion fruit  ']
>>> [weapon.strip() for weapon in freshfruit]
['banana', 'loganberry', 'passion fruit']

只会比你想得到的更多...:

>>> from math import pi
>>> [str(round(pi, i)) for i in range(1, 6)]
['3.1', '3.14', '3.142', '3.1416', '3.14159']

5.1.4.1. Nested List Comprehensions

Python的多维数组:

>>> matrix = [
...     [1, 2, 3, 4],
...     [5, 6, 7, 8],
...     [9, 10, 11, 12],
... ]

转秩了:

>>> [[row[i] for row in matrix] for i in range(4)]
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

又转秩了:

>>> transposed = []
>>> for i in range(4):
...     transposed.append([row[i] for row in matrix])
...
>>> transposed
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

还可以这样转:

>>> transposed = []
>>> for i in range(4):
...     # the following 3 lines implement the nested listcomp
...     transposed_row = []
...     for row in matrix:
...         transposed_row.append(row[i])
...     transposed.append(transposed_row)
...
>>> transposed
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

现实生活中个,你可能会想要内建函数完成更复杂的操作. 这里举一个zip() 函数的小栗子:

>>> zip(*matrix)
[(1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12)]

See Unpacking Argument Lists for details on the asterisk in this line.

5.2. The `del` statement

有一种通过序列元素的索引而不是元素的值来删除指定元素的方法: del. 同pop()不同之处在于,pop操作将会返回被删除的元素的值. del同样可以用来删除序列中的子序列或者是整个序列 (早期我们可不是这么干的,而是给要删除的序列赋一个[]来实现). 参考一下下面的栗子:

>>> a = [-1, 1, 66.25, 333, 333, 1234.5]
>>> del a[0]
>>> a
[1, 66.25, 333, 333, 1234.5]
>>> del a[2:4]
>>> a
[1, 66.25, 1234.5]
>>> del a[:]
>>> a
[]

del 用来删除整个序列:

>>> del a

在这个操作(删除整个a)之后再调用a的时候将会引发一个错误. (at least until another value is assigned to it). 我们将在后文提到更多关于del 的使用.

5.3. Tuples and Sequences 元组和数列

序列和字符串有许多相同的特效, 比如说索引和切片操作. 这里有两个关于数列数据类型的使用的栗子 (see Sequence Types — str, unicode, list, tuple, bytearray, buffer, xrange). Since Python is an evolving language, other sequence data types may be added. There is also another standard sequence data type: the tuple.

A tuple consists of a number of values separated by commas, for instance:

>>> t = 12345, 54321, 'hello!'
>>> t[0]
12345
>>> t
(12345, 54321, 'hello!')
>>> # Tuples may be nested:
... u = t, (1, 2, 3, 4, 5)
>>> u
((12345, 54321, 'hello!'), (1, 2, 3, 4, 5))
>>> # Tuples are immutable:
... t[0] = 88888
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> # but they can contain mutable objects:
... v = ([1, 2, 3], [3, 2, 1])
>>> v
([1, 2, 3], [3, 2, 1])

As you see, on output tuples are always enclosed in parentheses, so that nested tuples are interpreted correctly; they may be input with or without surrounding parentheses, although often parentheses are necessary anyway (if the tuple is part of a larger expression). It is not possible to assign to the individual items of a tuple, however it is possible to create tuples which contain mutable objects, such as lists.

Though tuples may seem similar to lists, they are often used in different situations and for different purposes. Tuples are immutable, and usually contain an heterogeneous sequence of elements that are accessed via unpacking (see later in this section) or indexing (or even by attribute in the case of namedtuples). Lists are mutable, and their elements are usually homogeneous and are accessed by iterating over the list.

A special problem is the construction of tuples containing 0 or 1 items: the syntax has some extra quirks to accommodate these. Empty tuples are constructed by an empty pair of parentheses; a tuple with one item is constructed by following a value with a comma (it is not sufficient to enclose a single value in parentheses). Ugly, but effective. For example:

>>> empty = ()
>>> singleton = 'hello',    # <-- note trailing comma
>>> len(empty)
0
>>> len(singleton)
1
>>> singleton
('hello',)

The statement t = 12345, 54321, 'hello!' is an example of tuple packing: the values 12345, 54321 and 'hello!' are packed together in a tuple. The reverse operation is also possible:

>>> x, y, z = t

This is called, appropriately enough, sequence unpacking and works for any sequence on the right-hand side. Sequence unpacking requires the list of variables on the left to have the same number of elements as the length of the sequence. Note that multiple assignment is really just a combination of tuple packing and sequence unpacking.

5.4. Sets

Python同样支持集合. 集合是由无序的不重复的元素构成的. Basic uses include membership testing and eliminating duplicate entries. 集合同样支持xxx,xxx,xxx的操作.

花括号或者set()可以用来创建集合. 注意: 创建一个空的集合必须使用set()而不是{} (这是一个空的字典).

一个简明的小栗子:

>>> basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
>>> fruit = set(basket)               # create a set without duplicates
>>> fruit
set(['orange', 'pear', 'apple', 'banana'])
>>> 'orange' in fruit                 # fast membership testing
True
>>> 'crabgrass' in fruit
False

>>> # Demonstrate set operations on unique letters from two words
...
>>> a = set('abracadabra')
>>> b = set('alacazam')
>>> a                                  # unique letters in a
set(['a', 'r', 'b', 'c', 'd'])
>>> a - b                              # letters in a but not in b
set(['r', 'd', 'b'])
>>> a | b                              # letters in either a or b
set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'])
>>> a & b                              # letters in both a and b
set(['a', 'c'])
>>> a ^ b                              # letters in a or b but not both
set(['r', 'd', 'b', 'm', 'z', 'l'])

同递推式构造列表一样,我们也可以这样来构造集合:

>>> a = {x for x in 'abracadabra' if x not in 'abc'}
>>> a
set(['r', 'd'])

5.5. Dictionaries

另一个非常有用的内建Python数据类型是字典(see Mapping Types — dict). 在其他的一些程序语言中,字典可能又被称作 “associative memories” or “associative arrays”. 与由整数索引的序列不同的是,字典是通过只读(不可变)的键值(key)来索引的; 字符串和数字总是可以被充当为键值. 仅由字符串数字或者元组构成的元组也可以充当字典的key;但是如果元组中包含了可变的元素(不管是直接或间接的)的时候,将其充当字典的key都是不可行的. 序列不能被当作字典的key,因为它可变(我这解释,我也只能呵呵了).

字典的特性(字典中的key不可重复),使得将字典视为由key:value的组合构成的集合,想来也是极好的. 空字典的表示方式: {}.

使用del同样可以删除字典中指定的键:值组合. 为字典中已经存在的key赋新值的时候,字典中key对应的value将会被更新. 从字典中取一个不存在的key的value的时候,是会引发一个错误的.

字典的keys()返回该字典的所有的key构成的序列, in arbitrary order (if you want it sorted, just apply the sorted() function to it). in 关键字同样也可以用来判断字典中是否存在指定的key.

字典使用的一个小栗子:

>>> tel = {'jack': 4098, 'sape': 4139}
>>> tel['guido'] = 4127
>>> tel
{'sape': 4139, 'guido': 4127, 'jack': 4098}
>>> tel['jack']
4098
>>> del tel['sape']
>>> tel['irv'] = 4127
>>> tel
{'guido': 4127, 'irv': 4127, 'jack': 4098}
>>> tel.keys()
['guido', 'irv', 'jack']
>>> 'guido' in tel
True

函数dict() 可以将下面这种格式的序列直接转换为字典:

>>> dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])
{'sape': 4139, 'jack': 4098, 'guido': 4127}

此外,你也可以任性的构造出一个个由key:value组合而成的字典,就像下面这样:

>>> {x: x**2 for x in (2, 4, 6)}
{2: 4, 4: 16, 6: 36}

When the keys are simple strings, it is sometimes easier to specify pairs using keyword arguments:

>>> dict(sape=4139, guido=4127, jack=4098)
{'sape': 4139, 'jack': 4098, 'guido': 4127}

5.6. Looping Techniques

通过enumerate() 函数,我们可以遍历获取一个序列(字符串)对应的元素的索引和值.

>>> for i, v in enumerate(['tic', 'tac', 'toe']):
...     print i, v
...
0 tic
1 tac
2 toe

想要同时遍历两个或者更多的时候,那么 zip()就可以派上用场了.

>>> questions = ['name', 'quest', 'favorite color']
>>> answers = ['lancelot', 'the holy grail', 'blue']
>>> for q, a in zip(questions, answers):
...     print 'What is your {0}?  It is {1}.'.format(q, a)
...
What is your name?  It is lancelot.
What is your quest?  It is the holy grail.
What is your favorite color?  It is blue.
再来一发:

>>> for i, j, k in zip([1,2,3], [4,5,6], [7,8,9]):print i, j, k
...
1 4 7
2 5 8
3 6 9

reversed() 可以用来逆序.

>>> for i in reversed(xrange(1,10,2)):
...     print i
...
9
7
5
3
1

sorted() 会根据指定的条件直接将原始的序列排序(前面已经指出过,返回None).

>>> basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
>>> for f in sorted(set(basket)):
...     print f
...
apple
banana
orange
pear

使用iteritems()方法可以在遍历字典的时候同时取到字典中的key和对应的value:

>>> knights = {'gallahad': 'the pure', 'robin': 'the brave'}
>>> for k, v in knights.iteritems():
...     print k, v
...
gallahad the pure
robin the brave

想要在遍历一个序列的同时改变该序列, 一个很好的方法是首先遍历该序列的一个拷贝.切片是个不错的获取序列的拷贝的方法:

>>> words = ['cat', 'window', 'defenestrate']
>>> for w in words[:]:  # Loop over a slice copy of the entire list.
...     if len(w) > 6:
...         words.insert(0, w)
...
>>> words
['defenestrate', 'cat', 'window', 'defenestrate']

5.7. More on Conditions

The conditions used in while and if statements can contain any operators, not just comparisons.

The comparison operators in and not in check whether a value occurs (does not occur) in a sequence. The operators is and is not compare whether two objects are really the same object; this only matters for mutable objects like lists. All comparison operators have the same priority, which is lower than that of all numerical operators.

Comparisons can be chained. For example, a < b == c tests whether a is less than b and moreover b equals c.

Comparisons may be combined using the Boolean operators and and or, and the outcome of a comparison (or of any other Boolean expression) may be negated with not. These have lower priorities than comparison operators; between them, not has the highest priority and or the lowest, so that A and not B or C is equivalent to (A and (not B)) or C. As always, parentheses can be used to express the desired composition.

The Boolean operators and and or are so-called short-circuit operators: their arguments are evaluated from left to right, and evaluation stops as soon as the outcome is determined. For example, if A and C are true but B is false, A and B and C does not evaluate the expression C. When used as a general value and not as a Boolean, the return value of a short-circuit operator is the last evaluated argument.

It is possible to assign the result of a comparison or other Boolean expression to a variable. For example,

>>> string1, string2, string3 = '', 'Trondheim', 'Hammer Dance'
>>> non_null = string1 or string2 or string3
>>> non_null
'Trondheim'

Note that in Python, unlike C, assignment cannot occur inside expressions. C programmers may grumble about this, but it avoids a common class of problems encountered in C programs: typing = in an expression when == was intended.

5.8. Comparing Sequences and Other Types

Sequence objects may be compared to other objects with the same sequence type. The comparison uses lexicographical ordering: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted. If two items to be compared are themselves sequences of the same type, the lexicographical comparison is carried out recursively. If all items of two sequences compare equal, the sequences are considered equal. If one sequence is an initial sub-sequence of the other, the shorter sequence is the smaller (lesser) one. Lexicographical ordering for strings uses the ASCII ordering for individual characters. Some examples of comparisons between sequences of the same type:

(1, 2, 3)              < (1, 2, 4)
[1, 2, 3]              < [1, 2, 4]
'ABC' < 'C' < 'Pascal' < 'Python'
(1, 2, 3, 4)           < (1, 2, 4)
(1, 2)                 < (1, 2, -1)
(1, 2, 3)             == (1.0, 2.0, 3.0)
(1, 2, ('aa', 'ab'))   < (1, 2, ('abc', 'a'), 4)

Note that comparing objects of different types is legal. The outcome is deterministic but arbitrary: the types are ordered by their name. Thus, a list is always smaller than a string, a string is always smaller than a tuple, etc. [1] Mixed numeric types are compared according to their numeric value, so 0 equals 0.0, etc.

Footnotes

[1]	The rules for comparing objects of different types should not be relied upon; they may change in a future version of the language.