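NumPy Basics

2 NumPy Array Basics

2.1 The NumPy Array Object

The ndarray in NumPy is a multidimensional array object made up of two parts:

the actual data
metadata that describes the data

Many array operations, such as reshaping, transposing, or slicing, change only the metadata and leave the underlying data untouched.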
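The split between data and metadata can be observed directly. The following is a minimal sketch, not part of the original session; the variable names are chosen for illustration only. It checks that a reshaped and a transposed array still use the buffer of the array they came from.

```python
import numpy as np

a = np.arange(6)          # one block of six integers
m = a.reshape(2, 3)       # new shape metadata, same data buffer
t = m.T                   # new strides metadata, same data buffer

# Both derived arrays still point at the memory owned by `a`.
shares_reshape = np.shares_memory(a, m)
shares_transpose = np.shares_memory(a, t)

# Writing through one view is visible through the others.
m[0, 0] = 99
first_element = int(a[0])
print(shares_reshape, shares_transpose, first_element)
```

Because no data is copied, these operations stay cheap even for very large arrays.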

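NumPy arrays are generally homogeneous: every element has the same data type.

As in Python, NumPy array indices start at 0.

We create a one-dimensional array with the arange function and inspect its data type (the sessions below assume NumPy has been imported with import numpy as np). The int32 result comes from the Windows machine used here; on most 64-bit Linux and macOS systems the default integer type is int64: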

In [1]: a = np.arange(5)

In [2]: a.dtype
Out[2]: dtype('int32')

In [16]: a
Out[16]: array([0, 1, 2, 3, 4])

In [17]: a.shape
Out[17]: (5,)

2.2 Multidimensional Arrays

Passing a sequence of arrays to np.array creates a two-dimensional array:

In [18]: m = np.array([np.arange(2), np.arange(2)])

In [19]: m
Out[19]:
array([[0, 1],
       [0, 1]])

In [20]: m.shape
Out[20]: (2, 2)

2.2.1 Selecting Array Elements

First, create a 2x2 array:

In [21]: a = np.array([[1, 2], [3, 4]])

In [22]: a
Out[22]:
array([[1, 2],
       [3, 4]])

Each element can then be selected with a[row, column]:

In [23]: a[0, 0]
Out[23]: 1

In [24]: a[0, 1]
Out[24]: 2

In [25]: a[1, 0]
Out[25]: 3

In [26]: a[1, 1]
Out[26]: 4

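2.2.2 NumPy Data Types

NumPy provides the following numeric types: bool, int_, int8, int16, int32, int64, uint8, uint16, uint32, uint64, float16, float32, float64 (or float), complex64, complex128 (or complex). Each type name can also be called as a conversion function: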

In [28]: np.float64(42)
Out[28]: 42.0

In [29]: np.int8(42.0)
Out[29]: 42

In [30]: np.bool_(42)
Out[30]: True

In [31]: np.bool_(0)
Out[31]: False

In [32]: np.bool_(42.0)
Out[32]: True

In [33]: np.float64(True)
Out[33]: 1.0

In [34]: np.float64(False)
Out[34]: 0.0

Many NumPy functions accept a dtype argument that specifies the data type of the result:

In [35]: np.arange(7, dtype=np.uint16)
Out[35]: array([0, 1, 2, 3, 4, 5, 6], dtype=uint16)

In [36]: np.arange(7, dtype=float)
Out[36]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.])

In [37]: np.arange(7, dtype=np.float64)
Out[37]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.])

In [38]: np.arange(7, dtype=np.float32)
Out[38]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.], dtype=float32)

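Data types can also be specified with single-character codes (not recommended; the explicit names above are easier to read):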

In [40]: np.arange(7, dtype='f')
Out[40]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.], dtype=float32)

In [41]: np.arange(7, dtype='d')
Out[41]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.])

In [42]: np.arange(7, dtype='D')
Out[42]: array([ 0.+0.j,  1.+0.j,  2.+0.j,  3.+0.j,  4.+0.j,  5.+0.j,  6.+0.j])

In [43]: np.arange(7, dtype='i')
Out[43]: array([0, 1, 2, 3, 4, 5, 6], dtype=int32)
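To connect the character codes with the explicit type names, the sketch below compares the dtype objects each spelling produces and converts an existing array with astype. It is an illustration added here, not part of the original session, and the names code_matches, ints and floats are made up for the example.

```python
import numpy as np

# Each character code maps to the same dtype object as its explicit name.
code_matches = {
    'f': np.dtype('f') == np.dtype(np.float32),
    'd': np.dtype('d') == np.dtype(np.float64),
    'D': np.dtype('D') == np.dtype(np.complex128),
}

# astype converts an existing array to another dtype and returns a new array.
ints = np.arange(7)
floats = ints.astype(np.float32)

print(code_matches)
print(floats.dtype, floats.itemsize)
```

Spelling out np.float32 rather than 'f' keeps the intent visible to readers who do not know the codes by heart.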

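The complete list of NumPy data type names and aliases can be found in sctypeDict (the exact contents vary with the platform and the NumPy version):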

In [45]: np.sctypeDict
Out[45]:
{'?': numpy.bool_,
 0: numpy.bool_,
 'byte': numpy.int8,
 'b': numpy.int8,
 1: numpy.int8,
 'ubyte': numpy.uint8,
 'B': numpy.uint8,
 2: numpy.uint8,
 'short': numpy.int16,
 'h': numpy.int16,
 3: numpy.int16,
 'ushort': numpy.uint16,
 'H': numpy.uint16,
 4: numpy.uint16,
 'i': numpy.int32,
 5: numpy.int32,
 'uint': numpy.uint32,
 'I': numpy.uint32,
 6: numpy.uint32,
 'intp': numpy.int64,
 'p': numpy.int64,
 9: numpy.int64,
 'uintp': numpy.uint64,
 'P': numpy.uint64,
 10: numpy.uint64,
 'long': numpy.int32,
 'l': numpy.int32,
 7: numpy.int32,
 'L': numpy.uint32,
 8: numpy.uint32,
 'longlong': numpy.int64,
 'q': numpy.int64,
 'ulonglong': numpy.uint64,
 'Q': numpy.uint64,
 'half': numpy.float16,
 'e': numpy.float16,
 23: numpy.float16,
 'f': numpy.float32,
 11: numpy.float32,
 'double': numpy.float64,
 'd': numpy.float64,
 12: numpy.float64,
 'longdouble': numpy.float64,
 'g': numpy.float64,
 13: numpy.float64,
 'cfloat': numpy.complex128,
 'F': numpy.complex64,
 14: numpy.complex64,
 'cdouble': numpy.complex128,
 'D': numpy.complex128,
 15: numpy.complex128,
 'clongdouble': numpy.complex128,
 'G': numpy.complex128,
 16: numpy.complex128,
 'O': numpy.object_,
 17: numpy.object_,
 'S': numpy.bytes_,
 18: numpy.bytes_,
 'unicode': numpy.str_,
 'U': numpy.str_,
 19: numpy.str_,
 'void': numpy.void,
 'V': numpy.void,
 20: numpy.void,
 'M': numpy.datetime64,
 21: numpy.datetime64,
 'm': numpy.timedelta64,
 22: numpy.timedelta64,
 'bool8': numpy.bool_,
 'Bool': numpy.bool_,
 'b1': numpy.bool_,
 'float16': numpy.float16,
 'Float16': numpy.float16,
 'f2': numpy.float16,
 'float32': numpy.float32,
 'Float32': numpy.float32,
 'f4': numpy.float32,
 'float64': numpy.float64,
 'Float64': numpy.float64,
 'f8': numpy.float64,
 'complex64': numpy.complex64,
 'Complex32': numpy.complex64,
 'c8': numpy.complex64,
 'complex128': numpy.complex128,
 'Complex64': numpy.complex128,
 'c16': numpy.complex128,
 'object0': numpy.object_,
 'Object0': numpy.object_,
 'bytes0': numpy.bytes_,
 'Bytes0': numpy.bytes_,
 'str0': numpy.str_,
 'Str0': numpy.str_,
 'void0': numpy.void,
 'Void0': numpy.void,
 'datetime64': numpy.datetime64,
 'Datetime64': numpy.datetime64,
 'M8': numpy.datetime64,
 'timedelta64': numpy.timedelta64,
 'Timedelta64': numpy.timedelta64,
 'm8': numpy.timedelta64,
 'int32': numpy.int32,
 'uint32': numpy.uint32,
 'Int32': numpy.int32,
 'UInt32': numpy.uint32,
 'i4': numpy.int32,
 'u4': numpy.uint32,
 'int64': numpy.int64,
 'uint64': numpy.uint64,
 'Int64': numpy.int64,
 'UInt64': numpy.uint64,
 'i8': numpy.int64,
 'u8': numpy.uint64,
 'int16': numpy.int16,
 'uint16': numpy.uint16,
 'Int16': numpy.int16,
 'UInt16': numpy.uint16,
 'i2': numpy.int16,
 'u2': numpy.uint16,
 'int8': numpy.int8,
 'uint8': numpy.uint8,
 'Int8': numpy.int8,
 'UInt8': numpy.uint8,
 'i1': numpy.int8,
 'u1': numpy.uint8,
 'complex_': numpy.complex128,
 'int0': numpy.int64,
 'uint0': numpy.uint64,
 'single': numpy.float32,
 'csingle': numpy.complex64,
 'singlecomplex': numpy.complex64,
 'float_': numpy.float64,
 'intc': numpy.int32,
 'uintc': numpy.uint32,
 'int_': numpy.int32,
 'longfloat': numpy.float64,
 'clongfloat': numpy.complex128,
 'longcomplex': numpy.complex128,
 'bool_': numpy.bool_,
 'unicode_': numpy.str_,
 'object_': numpy.object_,
 'bytes_': numpy.bytes_,
 'str_': numpy.str_,
 'string_': numpy.bytes_,
 'int': numpy.int32,
 'float': numpy.float64,
 'complex': numpy.complex128,
 'bool': numpy.bool_,
 'object': numpy.object_,
 'str': numpy.str_,
 'bytes': numpy.bytes_,
 'a': numpy.bytes_}


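2.3 Custom Data Types

A custom data type is heterogeneous: it can be thought of as the structure of one row in a spreadsheet or database table.

As an example, we will create a data type that stores a shop's inventory. The item name is a string of up to 40 characters, the number of items in stock is a 32-bit integer, and the price is a 32-bit single-precision float. The steps are as follows.

(1) Create the data type: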

In [47]: t = np.dtype([('name', np.str_, 40), ('numitems', np.int32), ('price', np.float32)])

In [48]: t
Out[48]: dtype([('name', '<U40'), ('numitems', '<i4'), ('price', '<f4')])

(2) Inspect the data type (a single field can be inspected as well):

In [49]: t['name']
Out[49]: dtype('<U40')

(3) Create an array of this type:

In [50]: itemz = np.array([('Meaning of life DVD', 42, 3.14), ('Butter', 13, 2.72)], dtype=t)

In [51]: itemz[1]
Out[51]: ('Butter', 13,  2.72000003)
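Once such an array exists, its fields behave like named columns. The sketch below rebuilds the same dtype and array and shows field access and filtering. It is a hedged illustration rather than part of the original session; the price threshold of 3.0 is an arbitrary value picked for the example.

```python
import numpy as np

t = np.dtype([('name', np.str_, 40), ('numitems', np.int32), ('price', np.float32)])
itemz = np.array([('Meaning of life DVD', 42, 3.14), ('Butter', 13, 2.72)], dtype=t)

# Indexing with a field name returns that column for every record.
names = itemz['name']
total_items = int(itemz['numitems'].sum())

# Boolean masks built from one field can select whole records.
cheap = itemz[itemz['price'] < 3.0]

print(names, total_items, cheap['name'])
```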

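2.4 Indexing and Slicing One-Dimensional Arrays

Slicing a one-dimensional array works much like slicing a Python list.

Basic slicing, using an array that holds the integers 0 through 8:

In [52]: a = np.arange(9)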

In [53]: a[3:7]
Out[53]: array([3, 4, 5, 6])

We can also select the elements from index 0 up to, but not including, 7 with a step of 2:

In [54]: a[:7:2]
Out[54]: array([0, 2, 4, 6])

As in Python, a negative step reverses the array:

In [55]: a[::-1]
Out[55]: array([8, 7, 6, 5, 4, 3, 2, 1, 0])
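One difference from Python lists is worth seeing in code: a NumPy slice is a view, not a copy. The sketch below is an added illustration with made-up variable names; it shows a write through a slice reaching the original array, and copy() producing an independent array.

```python
import numpy as np

a = np.arange(9)
window = a[3:7]           # a view onto elements 3..6 of `a`

window[0] = -1            # writes through to the original array
independent = a[3:7].copy()
independent[1] = -2       # the copy is detached, so `a` is unaffected

print(a)
```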

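2.5 Indexing and Slicing Multidimensional Arrays

ndarray supports slicing along every dimension. For convenience, an ellipsis (...) can stand in for all the remaining dimensions.

For example:

(1) First create an array with arange and reshape it into a three-dimensional array: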

In [62]: b = np.arange(24).reshape(2,3,4)

In [63]: b.shape
Out[63]: (2, 3, 4)

In [64]: b
Out[64]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

(2) Select single elements by index:

In [65]: b[0,0,0]
Out[65]: 0

In [66]: b[1,0,0]
Out[66]: 12

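(3) Picture the array as a two-story building in which each floor has 3 rows and 4 columns of rooms, so the three indices are floor, row, and column. If we do not care about the floor, that is, we want the room in the first row and first column on every floor, we replace the first index with a colon (:):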

In [68]: b[:,0,0]
Out[68]: array([ 0, 12])

Select the first floor:

In [69]: b[0]
Out[69]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

This can also be written as:

In [70]: b[0, :, :]
Out[70]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

Several colons can be replaced by a single ellipsis (...), so the code above is equivalent to:

In [71]: b[0, ...]
Out[71]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
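The ellipsis can also appear at the front of an index, and trailing full slices can be left out altogether. The sketch below is an added illustration on the same 2x3x4 array; the names last_axis and middle are chosen for the example.

```python
import numpy as np

b = np.arange(24).reshape(2, 3, 4)

# The ellipsis expands to as many full slices as needed.
last_axis = b[..., 1]          # same as b[:, :, 1]
middle = b[:, 1]               # trailing axes are implied: b[:, 1, :]

print(last_axis)
print(middle)
```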

2.6 Changing the Shape of an Array

(1) ravel: the ravel function flattens an array:

In [76]: b
Out[76]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

In [77]: b.ravel()
Out[77]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])

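(2) flatten: as its name suggests, flatten does the same job as ravel. The difference is that flatten always allocates new memory for the result, whereas ravel returns a view of the original array whenever possible: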

In [78]: b.flatten()
Out[78]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])
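The view-versus-copy difference between ravel and flatten can be checked with a short experiment. The sketch below is an added illustration; it assumes a freshly created contiguous array, which is the case where ravel can return a view.

```python
import numpy as np

b = np.arange(24).reshape(2, 3, 4)

r = b.ravel()      # view for a contiguous array like this one
f = b.flatten()    # always a fresh copy

r[0] = 100         # visible in b
f[1] = 200         # not visible in b

print(b[0, 0, :2], np.shares_memory(b, r), np.shares_memory(b, f))
```

Use flatten when the result must be safe to modify, and ravel when avoiding a copy matters more.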

(3) Setting the shape with a tuple: besides calling reshape, we can assign a tuple of positive integers directly to the shape attribute:

In [79]: b.shape = (6, 4)

In [80]: b
Out[80]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

(4) transpose: transposing a matrix is a common operation in linear algebra, and it works for multidimensional arrays as well:

In [81]: b.transpose()
Out[81]:
array([[ 0,  4,  8, 12, 16, 20],
       [ 1,  5,  9, 13, 17, 21],
       [ 2,  6, 10, 14, 18, 22],
       [ 3,  7, 11, 15, 19, 23]])

(5) resize: the resize method changes the shape like reshape, but it modifies the array in place instead of returning a new array:

In [82]: b.resize((2,12))

In [83]: b
Out[83]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]])
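The contrast between reshape and resize is easiest to see side by side. The sketch below is an added illustration with made-up variable names. It uses a separate array for the resize call, a cautious choice so that no other array refers to the one being resized.

```python
import numpy as np

b = np.arange(24)

reshaped = b.reshape(6, 4)     # returns a new array object; b keeps its shape
shape_after_reshape = b.shape

inferred = b.reshape(2, -1)    # -1 asks NumPy to work out that dimension

c = np.arange(24)
result = c.resize((2, 12))     # changes c itself and returns None
shape_after_resize = c.shape

print(shape_after_reshape, inferred.shape, shape_after_resize, result)
```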

2.7 Stacking Arrays

(0) Create the arrays:

In [84]: a = np.arange(9).reshape(3,3)

In [85]: a
Out[85]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [86]: b = 2 * a

In [87]: b
Out[87]:
array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

(1) hstack: horizontal stacking

In [89]: np.hstack((a, b))
Out[89]:
array([[ 0,  1,  2,  0,  2,  4],
       [ 3,  4,  5,  6,  8, 10],
       [ 6,  7,  8, 12, 14, 16]])

The concatenate function achieves the same result:

In [90]: np.concatenate((a, b), axis=1)
Out[90]:
array([[ 0,  1,  2,  0,  2,  4],
       [ 3,  4,  5,  6,  8, 10],
       [ 6,  7,  8, 12, 14, 16]])

(2) vstack: vertical stacking

In [91]: np.vstack((a, b))
Out[91]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

Likewise, calling concatenate with axis=0, which is also the default value of the axis parameter, gives the same result:

In [92]: np.concatenate((a, b), axis=0)
Out[92]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

(3) dstack: depth-wise stacking

In [93]: np.dstack((a, b))
Out[93]:
array([[[ 0,  0],
        [ 1,  2],
        [ 2,  4]],

       [[ 3,  6],
        [ 4,  8],
        [ 5, 10]],

       [[ 6, 12],
        [ 7, 14],
        [ 8, 16]]])
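The three stacking functions differ mainly in the axis along which the inputs are joined, which shows up in the result shapes. The sketch below is an added illustration; the comparison with np.stack is included only to show where the new axis goes and is not part of the original text.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = 2 * a

h = np.hstack((a, b))          # shape (3, 6): columns side by side
v = np.vstack((a, b))          # shape (6, 3): rows on top of each other
d = np.dstack((a, b))          # shape (3, 3, 2): paired along a new last axis

# For 2-D inputs, dstack matches np.stack along a new third axis.
same_as_stack = np.array_equal(d, np.stack((a, b), axis=2))

print(h.shape, v.shape, d.shape, same_as_stack)
```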

(4) column_stack: column stacking

For one-dimensional arrays, column_stack stacks them as the columns of a two-dimensional array:

In [96]: oned = np.arange(2)

In [97]: oned
Out[97]: array([0, 1])

In [98]: twice_oned = 2 * oned

In [99]: twice_oned
Out[99]: array([0, 2])

In [100]: np.column_stack((oned, twice_oned))
Out[100]:
array([[0, 0],
       [1, 2]])

For two-dimensional arrays, column_stack and hstack give the same result:

In [104]: np.column_stack((a, b)) == np.hstack((a, b))
Out[104]:
array([[ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True]], dtype=bool)

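(5) row_stack: row stacking

This is the counterpart of column_stack. Two one-dimensional arrays are stacked directly on top of each other to form a two-dimensional array. (row_stack is an alias of vstack and is deprecated as of NumPy 2.0, so new code should call vstack.)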

In [106]: np.row_stack((oned, twice_oned))
Out[106]:
array([[0, 1],
       [0, 2]])

For two-dimensional arrays, row_stack and vstack give the same result:

In [107]: np.row_stack((a, b))
Out[107]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

In [108]: np.row_stack((a, b)) == np.vstack((a, b))
Out[108]:
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]], dtype=bool)

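2.8 Splitting Arrays

NumPy arrays can be split horizontally, vertically, or depth-wise with the functions hsplit, vsplit, dsplit, and split. We can either split an array into equally sized sub-arrays or give the positions at which it should be split.

(1) hsplit: horizontal splitting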

In [110]: np.hsplit(a, 3)
Out[110]:
[array([[0],
        [3],
        [6]]), array([[1],
        [4],
        [7]]), array([[2],
        [5],
        [8]])]
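Splitting at given positions, mentioned above, uses a list of indices in place of a section count. The sketch below is an added illustration on the same 3x3 array; the variable names are chosen for the example. It also shows the general split function with an explicit axis.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# An integer asks for that many equal parts; a list gives the cut positions.
left, right = np.hsplit(a, [1])      # columns [0:1] and [1:3]

# split with an explicit axis covers both hsplit and vsplit.
cols = np.split(a, 3, axis=1)        # same pieces as np.hsplit(a, 3)
rows = np.split(a, 3, axis=0)        # same pieces as np.vsplit(a, 3)

print(left.shape, right.shape, len(cols), len(rows))
```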

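(2) vsplit: vertical splitting

(3) dsplit: depth-wise splitting

Comparing the three splits, first on a three-dimensional array and then on a two-dimensional one, where dsplit fails because there is no third axis: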

In [112]: c = np.arange(27).reshape(3, 3, 3)

In [113]: c
Out[113]:
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

In [114]: np.dsplit(c, 3)
Out[114]:
[array([[[ 0],
         [ 3],
         [ 6]],

        [[ 9],
         [12],
         [15]],

        [[18],
         [21],
         [24]]]), array([[[ 1],
         [ 4],
         [ 7]],

        [[10],
         [13],
         [16]],

        [[19],
         [22],
         [25]]]), array([[[ 2],
         [ 5],
         [ 8]],

        [[11],
         [14],
         [17]],

        [[20],
         [23],
         [26]]])]

In [115]: np.hsplit(c, 3)
Out[115]:
[array([[[ 0,  1,  2]],

        [[ 9, 10, 11]],

        [[18, 19, 20]]]), array([[[ 3,  4,  5]],

        [[12, 13, 14]],

        [[21, 22, 23]]]), array([[[ 6,  7,  8]],

        [[15, 16, 17]],

        [[24, 25, 26]]])]

In [116]: np.vsplit(c, 3)
Out[116]:
[array([[[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]]]), array([[[ 9, 10, 11],
         [12, 13, 14],
         [15, 16, 17]]]), array([[[18, 19, 20],
         [21, 22, 23],
         [24, 25, 26]]])]

In [121]: c = np.arange(9).reshape(3, 3)

In [122]: c
Out[122]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [123]: np.hsplit(c, 3)
Out[123]:
[array([[0],
        [3],
        [6]]), array([[1],
        [4],
        [7]]), array([[2],
        [5],
        [8]])]

In [124]: np.vsplit(c, 3)
Out[124]: [array([[0, 1, 2]]), array([[3, 4, 5]]), array([[6, 7, 8]])]

In [125]: np.dsplit(c, 3)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-125-aa2ba1054587> in <module>()
----> 1 np.dsplit(c, 3)

c:\python\python362\lib\site-packages\numpy\lib\shape_base.py in dsplit(ary, indices_or_sections)
    665     """
    666     if len(_nx.shape(ary)) < 3:
--> 667         raise ValueError('dsplit only works on arrays of 3 or more dimensions')
    668     return split(ary, indices_or_sections, 2)
    669

ValueError: dsplit only works on arrays of 3 or more dimensions

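2.11 Array Attributes

Besides shape and dtype, ndarray objects have several other attributes, listed below.

ndim: the number of dimensions (axes) of the array
size: the total number of elements
itemsize: the number of bytes each element occupies in memory
nbytes: the total storage used by the array's elements, equal to b.size * b.itemsize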

In [127]: b = np.arange(24).reshape(2,12)

In [128]: b
Out[128]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]])

In [129]: b.ndim
Out[129]: 2

In [130]: b.size
Out[130]: 24

In [131]: b.itemsize
Out[131]: 4

In [132]: b.nbytes
Out[132]: 96
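The itemsize and nbytes values above depend on the element type, which is int32 on the machine used for this session. The sketch below is an added illustration; it fixes the dtype explicitly so the numbers do not depend on the platform default.

```python
import numpy as np

# An explicit dtype keeps the numbers below the same on every platform.
b = np.arange(24, dtype=np.int32).reshape(2, 12)

info = {
    'ndim': b.ndim,
    'size': b.size,
    'itemsize': b.itemsize,
    'nbytes': b.nbytes,
}

# The same 24 elements stored as float64 take twice the space.
wider = b.astype(np.float64)

print(info, wider.nbytes)
```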

The T attribute gives the same result as the transpose function:

In [133]: b.resize(6,4)

In [134]: b
Out[134]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

In [135]: b.T
Out[135]:
array([[ 0,  4,  8, 12, 16, 20],
       [ 1,  5,  9, 13, 17, 21],
       [ 2,  6, 10, 14, 18, 22],
       [ 3,  7, 11, 15, 19, 23]])

For a one-dimensional array, the T attribute is just the original array.
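Since transposing a one-dimensional array changes nothing, a column vector has to be requested explicitly. The sketch below is an added illustration using np.newaxis; the variable names are made up for the example.

```python
import numpy as np

v = np.arange(3)

unchanged = v.T                 # still shape (3,)
column = v[:, np.newaxis]       # shape (3, 1): an explicit column vector
row = v[np.newaxis, :]          # shape (1, 3): an explicit row vector

print(unchanged.shape, column.shape, row.shape, row.T.shape)
```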

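The flat attribute returns a numpy.flatiter object. This is the only way to obtain a flatiter, because its constructor is not accessible. This "flat iterator" lets us iterate over any multidimensional array as if it were one-dimensional: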

In [136]: b = np.arange(4).reshape(2,2)

In [137]: b
Out[137]:
array([[0, 1],
       [2, 3]])

In [138]: f = b.flat

In [139]: f
Out[139]: <numpy.flatiter at 0x2cc108e1280>

In [140]: for item in f: print(item)
0
1
2
3

We can also use flat to fetch a single element directly:

In [141]: b.flat[2]
Out[141]: 2

In [142]: b.flat[3]
Out[142]: 3

Or several elements at once:

In [143]: b.flat[[1, 3]]
Out[143]: array([1, 3])

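The flat attribute is assignable. Assigning a single value to flat overwrites every element of the array, while assigning through an index changes only the selected elements: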

In [144]: b.flat = 7

In [145]: b
Out[145]:
array([[7, 7],
       [7, 7]])

In [146]: b.flat[[1, 3]] = 1

In [147]: b
Out[147]:
array([[7, 1],
       [7, 1]])
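The order in which flat visits elements follows the array as it is indexed, not the layout of the underlying memory. The sketch below is an added illustration; it compares an array with its transpose and writes one element through the transposed view.

```python
import numpy as np

b = np.arange(4).reshape(2, 2)

# flat walks the array row by row as it is currently indexed,
# so the transposed array yields its elements in a different order.
order_original = [int(x) for x in b.flat]
order_transposed = [int(x) for x in b.T.flat]

# Writing through flat on the transpose reaches the shared data.
b.T.flat[1] = 50               # b.T[0, 1], which is b[1, 0]

print(order_original, order_transposed, b.tolist())
```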

tolist converts a NumPy array to a Python list:

In [148]: b.tolist()
Out[148]: [[7, 1], [7, 1]]

3 Common Functions

3.1 Reading and Writing Text Files

Create an identity matrix and save it with savetxt:

In [149]: i2 = np.eye(2)

In [150]: i2
Out[150]:
array([[ 1.,  0.],
       [ 0.,  1.]])

In [151]: np.savetxt('d:/cache/eye.txt', i2)
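A file written by savetxt can be read back with loadtxt. The sketch below is an added illustration, not part of the original session. It writes to a temporary directory instead of the d:/cache path used above, an assumption made so that the example runs on any machine.

```python
import os
import tempfile

import numpy as np

i2 = np.eye(2)

# Write to a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'eye.txt')
    np.savetxt(path, i2)            # one row per line, space-separated
    with open(path) as fh:
        text = fh.read()
    loaded = np.loadtxt(path)       # parsed back as float64

print(text)
print(np.array_equal(loaded, i2))
```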