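Learning Python: Data Basics

In Python, everything is an object, including numbers, strings, and lists. Objects are created from classes, and one benefit of this is that every object can use the methods defined in the class that created it.

Inspecting objects and methods

1) You can inspect them directly in the interactive interpreter: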

>>> a='I am a string'
>>> type(a)
<class 'str'>

type() shows the class of a variable (object). Once you know the class, dir() lists the attributes and methods it defines:

>>> dir(str)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
>>>

You can also use help() to read the detailed documentation for a class or a method:

>>> help(str)
>>> help(str.upper)
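
The dir() output above mixes special (double-underscore) names with ordinary methods. As a minimal sketch, the helper below filters that list down to the public names; public_names is a hypothetical helper introduced only for this illustration, not part of Python.

```python
# public_names is an illustrative helper, not a built-in.
def public_names(cls):
    """Return the names from dir(cls) that do not start with an underscore."""
    return [name for name in dir(cls) if not name.startswith("_")]

value = "I am a string"
names = public_names(type(value))

print(type(value).__name__)   # str
print("upper" in names)       # True
print("__add__" in names)     # False
```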

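2) You can also browse them in an IDE, which is often more convenient.

Naming variables

1 By convention, variable names use only the following characters (Python 3 also accepts non-ASCII letters):
• lowercase letters (a-z)
• uppercase letters (A-Z)
• digits (0-9)
• underscore (_)
2 A name may not begin with a digit.

3 A variable name cannot be one of Python's reserved keywords, listed below (Python 3.7 and later also reserve async and await):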

False     class     finally   is        return
None      continue  for       lambda    try
True      def       from      nonlocal  while
and       del       global    not       with
as        elif      if        or        yield
assert    else      import    pass
break     except    in        raise
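
The three naming rules can be checked programmatically. The sketch below combines str.isidentifier() with the standard keyword module; is_valid_variable_name is a hypothetical helper written for this example.

```python
import keyword

# is_valid_variable_name is an illustrative helper, not a standard function.
def is_valid_variable_name(name):
    """True if name is a legal identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["count", "_total", "2nd_value", "class", "my-var"]:
    print(candidate, is_valid_variable_name(candidate))
# count True
# _total True
# 2nd_value False
# class False
# my-var False
```

keyword.kwlist holds the full keyword list for the interpreter you are running, which is more reliable than a hand-copied table.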

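Below are Python's basic data types and their associated methods.

Numbers

In Python, the main numeric types are int and float (the long type was merged into int in 3.x).

In Python 2, an int was a fixed-size machine integer of at least 32 bits, which holds values from -2,147,483,648 to 2,147,483,647 (on most 64-bit Unix builds it was 64 bits wide).
A long could hold integers of arbitrary size, and an int result that overflowed was promoted to long automatically.
In Python 3 the long type no longer exists, and int itself can store integers of any size, well beyond 64 bits.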
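
A quick way to see the arbitrary-size int of Python 3 in action is to compute a value that does not fit in 64 bits. This is only a small illustration; sys.maxsize is the largest container index on the running platform, not a limit on int.

```python
import sys

big = 2 ** 100

print(big)                  # 1267650600228229401496703205376
print(type(big).__name__)   # int
print(big.bit_length())     # 101
print(big > sys.maxsize)    # True
```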

Numbers support the basic arithmetic operations, for example:

>>> 5+3       # addition
8
>>> 0.8*4     # multiplication
3.2
>>> 2**5      # exponentiation
32
>>> 5/2        # true division
2.5
>>> 5//2       # floor division
2
>>> 9%2       # modulo
1
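
Floor division and modulo behave in a way that surprises some readers when an operand is negative: // rounds toward negative infinity, and the remainder takes the sign of the divisor. A short sketch of that behavior:

```python
q, r = divmod(-7, 2)

print(7 // 2, 7 % 2)      # 3 1
print(-7 // 2, -7 % 2)    # -4 1
print((q, r))             # (-4, 1)
print(q * 2 + r)          # -7   (quotient * divisor + remainder)
print(int(-7 / 2))        # -3   (int() truncates toward zero instead)
```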

In more detail:

The commonly used methods of the int class are listed below (float offers mostly similar ones):

abs(x)   # absolute value (a built-in function that calls x.__abs__())
>>> abs(-3)
3

__add__(self, *args, **kwargs): # addition, same as +
>>> a=3
>>> a.__add__(1)
4

__and__(self, *args, **kwargs): # bitwise AND, same as &
>>> a=3
>>> a.__and__(2)
2

__bool__(self, *args, **kwargs): # truth value; False only for 0
>>> a.__bool__()
True

__divmod__(self, *args, **kwargs): # quotient and remainder, same as divmod()
>>> b=9
>>> b.__divmod__(7)
(1, 2)

__eq__(self, *args, **kwargs): # equality test, same as ==
>>> b=9
>>> a=3
>>> b.__eq__(a)
False

__float__(self, *args, **kwargs): # convert to float; called by float()
>>> b
9
>>> float(b)
9.0

__floordiv__(self, *args, **kwargs): # floor division, same as //

__ge__(self, *args, **kwargs): # greater than or equal, same as >=
>>> a
3
>>> b
9
>>> b.__ge__(a)
True

__gt__(self, *args, **kwargs): # greater than, same as >
>>> a
3
>>> b
9
>>> b.__gt__(a)
True

__le__(self, *args, **kwargs): # less than or equal, same as <=
>>> a
3
>>> b
9
>>> b.__le__(a)
False

__lshift__(self, *args, **kwargs): # bitwise left shift, same as <<; shifting by 1 doubles the value
>>> b
9
>>> b.__lshift__(1)
18

__lt__(self, *args, **kwargs): # less than, same as <

__mod__(self, *args, **kwargs): # modulo, same as %

__mul__(self, *args, **kwargs): # multiplication, same as *

__neg__(self, *args, **kwargs): # negation, same as unary -
>>> b.__neg__()
-9

__ne__(self, *args, **kwargs): # inequality test, same as !=

__or__(self, *args, **kwargs): # bitwise OR, same as | (not the logical or)

__pos__(self, *args, **kwargs): # unary plus, same as +x

__pow__(self, *args, **kwargs): # exponentiation, same as **

__round__(self, *args, **kwargs): # rounding an int returns the int itself
>>> c=6
>>> c.__round__()
6

__sizeof__(self, *args, **kwargs): # size in memory, in bytes

__str__(self, *args, **kwargs): # convert to a string; called by str()

__truncate__ is not defined; __trunc__(self, *args, **kwargs): # truncating an int returns the int itself

__xor__(self, *args, **kwargs): # bitwise XOR, same as ^
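
The special methods listed above are rarely called by name; the operators invoke them. The sketch below checks a few operator/method pairs on two sample values (12 and 10 are arbitrary choices for the illustration) and contrasts the bitwise | with the logical or.

```python
a, b = 12, 10   # 0b1100 and 0b1010

print(a + b == a.__add__(b))      # True
print(a & b, a.__and__(b))        # 8 8
print(a | b, a.__or__(b))         # 14 14
print(a ^ b, a.__xor__(b))        # 6 6
print(a << 1, a.__lshift__(1))    # 24 24

# The logical `or` is different: it returns the first truthy operand.
print(a or b)                     # 12
```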

Strings (str)

Commonly used methods of the str class:

capitalize(self): # uppercase the first character and lowercase the rest
>>> str1="this is a string"
>>> str1.capitalize()
'This is a string'

casefold(self): # convert to lowercase aggressively, for caseless comparison
>>> str2="This Is A String"
>>> str2.casefold()
'this is a string'

center(self, width, fillchar=None): # center the string within width, padding with fillchar (a space by default)
>>> str3="center"
>>> str3.center(18,'*')
'******center******'

count(self, sub, start=None, end=None): # count the non-overlapping occurrences of a substring
>>> str3.count('ce')
1

encode(self, encoding='utf-8', errors='strict'): # encode the string to bytes

endswith(self, suffix, start=None, end=None): # test whether the string ends with suffix
>>> str3
'center'
>>> str3.endswith('er')
True

expandtabs(self, tabsize=8): # replace tabs with spaces, with tab stops every 8 columns by default

find(self, sub, start=None, end=None): # return the index of the first occurrence of a substring, or -1 if it is not found
>>> str3.find('er')
4
>>> str3.find('nr')
-1

format(*args, **kwargs): # format a string
>>> '{0}, is a {1}'.format('This','string')
'This, is a string'

index(self, sub, start=None, end=None): # like find, but raises ValueError if the substring is not found
>>> str3
'center'
>>> str3.index('nr')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found

isalnum(self): # True if the string is non-empty and every character is a letter or a digit
>>> a=''
>>> a.isalnum()
False

isalpha(self): # True if the string is non-empty and every character is a letter
>>> str3
'center'
>>> str3.isalpha()
True

isdecimal(self): # True if the string is non-empty and contains only decimal characters

isdigit(self): # True if the string is non-empty and every character is a digit
>>> a='123'
>>> a.isdigit()
True

isidentifier(self): # test whether the string is a valid identifier (keywords such as 'def' pass too)
>>> a='123def'
>>> a.isidentifier()
False
>>> a='def'
>>> a.isidentifier()
True

islower(self): # True if all cased characters are lowercase and there is at least one

isnumeric(self): # True if every character is numeric (broader than isdigit; also accepts characters such as fractions)

isspace(self): # True if the string is non-empty and every character is whitespace

istitle(self): # True if the string is title-cased, i.e. every word starts with an uppercase letter
>>> a='This Is A String'
>>> a.istitle()
True

isupper(self): # True if all cased characters are uppercase and there is at least one

join(self, iterable): # concatenate the strings in iterable, using this string as the separator
>>> a
'This Is A String'
>>> b='*'
>>> b.join(a)
'T*h*i*s* *I*s* *A* *S*t*r*i*n*g'

ljust(self, width, fillchar=None): # left-justify the string within width, padding with fillchar
>>> str4='left'
>>> str4.ljust(20,'*')
'left****************'

lower(self): # convert all uppercase letters to lowercase

lstrip(self, chars=None): # strip leading whitespace, or any leading characters that appear in chars

partition(self, sep): # split the string around the first occurrence of sep
S.partition(sep) -> (head, sep, tail)
Search for the separator sep in S, and return the part before it,
the separator itself, and the part after it.  If the separator is not
found, return S and two empty strings.

replace(self, old, new, count=None): # replace occurrences of a substring
>>> a
'This Is A String'
>>> a.replace('ring','o')
'This Is A Sto'

rfind(self, sub, start=None, end=None): # return the highest index at which a substring is found, or -1 on failure
S.rfind(sub[, start[, end]]) -> int
The optional arguments start and end are interpreted as in slice notation.

rindex(self, sub, start=None, end=None): # like rfind, but raises ValueError if the substring is not found
S.rindex(sub[, start[, end]]) -> int

rjust(self, width, fillchar=None): # right-justify the string within width, padding with fillchar (a space by default)
S.rjust(width[, fillchar]) -> str

rpartition(self, sep): # like partition, but searches for sep starting at the end of the string
S.rpartition(sep) -> (head, sep, tail)
If the separator is not found, return two empty strings and S.

rstrip(self, chars=None): # strip trailing whitespace, or any trailing characters that appear in chars

split(self, sep=None, maxsplit=-1): # split the string on the given separator

startswith(self, prefix, start=None, end=None): # test whether the string starts with prefix

strip(self, chars=None): # strip whitespace, or any characters that appear in chars, from both ends

swapcase(self): # convert uppercase to lowercase and lowercase to uppercase

title(self): # title-case the string, i.e. capitalize the first letter of every word
>>> a='this is a string'
>>> a.title()
'This Is A String'

upper(self): # convert to uppercase

zfill(self, width): # pad the string on the left with zeros up to width

__add__(self, *args, **kwargs): # string concatenation, same as +

__eq__(self, *args, **kwargs): # equality test, same as ==

__len__(self, *args, **kwargs): # length of the string; called by len()
>>> a
'this is a string'
>>> len(a)
16

Indexing: a string can be indexed to get the character at a given position, for example:

>>> a
'this is a string'
>>> a[-1]
'g'
>>> a[0]
't'

Strings also support slicing, for example:

>>> a[:5]
'this '
>>> a[7:]
' a string'
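
Several of the methods listed above have no example. The sketch below exercises a few of them together on a made-up input line (the "name = Alice Smith" text is invented purely for illustration).

```python
line = "  name = Alice Smith  \n"   # sample input, invented for this example

cleaned = line.strip()                      # drop surrounding whitespace
key, sep, value = cleaned.partition("=")    # split at the first '='
key, value = key.rstrip(), value.lstrip()

print(repr(key), repr(value))       # 'name' 'Alice Smith'
print(value.split())                # ['Alice', 'Smith']
print(value.swapcase())             # aLICE sMITH
print(value.startswith("Al"))       # True
print(value.rfind("i"))             # 8
print("42".zfill(5))                # 00042
print(repr("a\tb".expandtabs(4)))   # 'a   b'
```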

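Note: avoid building a string with repeated '+' concatenation, for example inside a loop. Each + creates a new string and copies the old contents, which can waste time and memory; join() is usually the better choice.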
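
To make the note above concrete, here is a small sketch that builds the same string both ways. The word list is arbitrary sample data; with only four items the cost difference is negligible, so the example shows the pattern rather than measuring it.

```python
words = ["this", "is", "a", "string"]

# Repeated += creates a new string object on each iteration.
slow = ""
for word in words:
    slow += word + " "
slow = slow.rstrip()

# join() builds the result in one call.
fast = " ".join(words)

print(fast)           # this is a string
print(slow == fast)   # True
```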

The * operator repeats a string:

>>> 'ab'*4
'abababab'

For more on string slicing, see here.

Lists (list)

A list is an ordered collection of items. It supports iteration as well as adding, removing, updating, and looking up elements.

The list class provides the following commonly used methods:

append(self, p_object): # add an element to the end of the list
>>> li
[1, 2, 3, 4]
>>> li.append(5)
>>> li
[1, 2, 3, 4, 5]

clear(self): # remove all elements from the list
>>> li
[1, 2, 3, 4, 5]
>>> li.clear()
>>> li
[]

copy(self): # make a shallow copy of the list
>>> li=[1,2,3,4]
>>> li2=li.copy()
>>> li2
[1, 2, 3, 4]

count(self, value): # count how many times a value occurs
>>> li2.append(2)
>>> li2
[1, 2, 3, 4, 2]
>>> li2.count(2)
2

extend(self, iterable): # append every element of an iterable to the list
>>> li3
['a', 'b', 'c']
>>> li
[1, 2, 3, 4]
>>> li.extend(li3)
>>> li
[1, 2, 3, 4, 'a', 'b', 'c']

index(self, value, start=None, stop=None): # return the index of the first occurrence of a value
>>> li
[1, 2, 3, 4, 'a', 'b', 'c']
>>> li.index('b')
5

insert(self, index, p_object): # insert an element before position index
>>> li
[1, 2, 3, 4, 'a', 'b', 'c']
>>> li.insert(4,'d')
>>> li
[1, 2, 3, 4, 'd', 'a', 'b', 'c']

pop(self, index=None): # remove and return the element at index (the last element by default)
>>> li
[1, 2, 3, 4, 'd', 'a', 'b', 'c']
>>> li.pop(4)
'd'
>>> li
[1, 2, 3, 4, 'a', 'b', 'c']

remove(self, value): # remove the first occurrence of a value
>>> li
[1, 2, 3, 4, 'a', 'b', 'c']
>>> li.remove('a')
>>> li
[1, 2, 3, 4, 'b', 'c']

reverse(self): # reverse the list in place
>>> li
[1, 2, 3, 4, 'b', 'c']
>>> li.reverse()
>>> li
['c', 'b', 4, 3, 2, 1]

sort(self, key=None, reverse=False): # sort the list in place
>>> li=[4,3,2,1]
>>> li.sort()
>>> li
[1, 2, 3, 4]

len(li): # length of the list (a built-in function, not a method)
>>> li
[1, 2, 3, 4]
>>> len(li)
4
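
The sort() entry above only shows the default ordering. As a hedged illustration of its key and reverse parameters, the sketch below sorts a small list of made-up names in three ways and contrasts the in-place sort() with the built-in sorted(), which returns a new list.

```python
names = ["bob", "Alice", "dave", "Carol"]   # sample data for the illustration

names.sort()                        # uppercase letters sort before lowercase
print(names)                        # ['Alice', 'Carol', 'bob', 'dave']

names.sort(key=str.lower)           # case-insensitive ordering
print(names)                        # ['Alice', 'bob', 'Carol', 'dave']

names.sort(key=len, reverse=True)   # longest first; ties keep their order
print(names)                        # ['Alice', 'Carol', 'dave', 'bob']

original = [3, 1, 2]
in_order = sorted(original)         # sorted() leaves the input untouched
print(in_order, original)           # [1, 2, 3] [3, 1, 2]
```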

List elements can be accessed by index, but watch the bounds: an out-of-range index raises IndexError, for example:

>>> li
[1, 2, 3, 4]
>>> li[3]
4
>>> li[5]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
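
One common way to handle the bounds problem is to catch the IndexError and fall back to a default. The sketch below is one possible approach; get_item is a hypothetical helper, since lists have no built-in equivalent of dict.get().

```python
# get_item is an illustrative helper, not a list method.
def get_item(items, index, default=None):
    """Return items[index], or default when the index is out of range."""
    try:
        return items[index]
    except IndexError:
        return default

li = [1, 2, 3, 4]
print(get_item(li, 3))       # 4
print(get_item(li, 5))       # None
print(get_item(li, 5, 0))    # 0
print(li[-1], li[5:])        # 4 []   (slices never raise IndexError)
```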

List comprehensions

A comprehension builds a new list, set, or dict by evaluating an expression for each item of a sequence:

>>> M=[[1,2,3],[4,5,6],[7,8,9]]
>>> col2= [row[1] for row in M]
>>> col2
[2, 5, 8]

>>> {x*2 for x in [1,2,3,4]}
{8, 2, 4, 6}

>>> {x:(x/2) for x in [1,2,3,4]}
{1: 0.5, 2: 1.0, 3: 1.5, 4: 2.0}
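
Comprehensions can also filter items and nest loops. The sketch below reuses the matrix M from the example above and shows a filtered comprehension next to the explicit loops it replaces; the variable names are chosen only for this illustration.

```python
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Comprehension with a filter: even values only, flattened row by row.
evens = [value for row in M for value in row if value % 2 == 0]

# The same result written with explicit loops.
evens_loop = []
for row in M:
    for value in row:
        if value % 2 == 0:
            evens_loop.append(value)

diagonal = [M[i][i] for i in range(len(M))]
squares = {x: x ** 2 for x in range(1, 5)}

print(evens)                 # [2, 4, 6, 8]
print(evens == evens_loop)   # True
print(diagonal)              # [1, 5, 9]
print(squares)               # {1: 1, 2: 4, 3: 9, 4: 16}
```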

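Copying lists

Any of the following creates a new list holding the same values as the original. Afterwards, adding, removing, or replacing elements in the original does not affect the new list. All three make shallow copies, so mutable elements such as nested lists are still shared between the two:
• the list copy() method
• the list() conversion function
• the list slice [:]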
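
The three copying techniques can be compared side by side. The sketch below also includes copy.deepcopy from the standard library, which the text above does not mention, to show what changes when nested elements are copied too.

```python
import copy

original = [1, 2, [3, 4]]
by_copy = original.copy()
by_list = list(original)
by_slice = original[:]
deep = copy.deepcopy(original)   # also copies the nested list

original[0] = 'changed'          # replace a top-level element
original[2].append(5)            # mutate the nested list

print(original)   # ['changed', 2, [3, 4, 5]]
print(by_copy)    # [1, 2, [3, 4, 5]]   nested list is shared
print(by_list == by_copy == by_slice)   # True
print(deep)       # [1, 2, [3, 4]]      fully independent
```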

Tuples (tuple)

A tuple is similar to a list, but it is immutable. A tuple also takes up slightly less memory than a list holding the same elements.

>>> t1=(1,2,3,4)
>>> t1
(1, 2, 3, 4)
>>> t1[2]
3
>>> t1[2]=5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

Commonly used methods of the tuple class:

count(self, value): # count how many times a value occurs
>>> t1
(1, 2, 3, 4)
>>> t1.count(4)
1

index(self, value, start=None, stop=None): # return the index of the first occurrence of a value
>>> t1
(1, 2, 3, 4)
>>> t1.index(2)
1

A tuple's elements can themselves be tuples, dicts, or lists:

>>> t1=([1,2],(3,4),{'k1':'v1'})
>>> t1
([1, 2], (3, 4), {'k1': 'v1'})
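
Immutability applies to the tuple's own slots, not to the objects stored in them. The sketch below reuses t1 from the example above to show that a list or dict inside a tuple can still be modified, and makes a rough size comparison with sys.getsizeof (exact byte counts depend on the interpreter, so none are printed).

```python
import sys

t1 = ([1, 2], (3, 4), {'k1': 'v1'})

t1[0].append(3)       # allowed: the list inside the tuple is mutable
t1[2]['k2'] = 'v2'    # allowed: so is the dict
print(t1)             # ([1, 2, 3], (3, 4), {'k1': 'v1', 'k2': 'v2'})

try:
    t1[0] = [9]       # not allowed: the tuple's own slots are fixed
    reassigned = True
except TypeError as exc:
    reassigned = False
    print('TypeError:', exc)

# Compare the container overhead of a tuple and a list of the same items.
print(sys.getsizeof((1, 2, 3, 4)), sys.getsizeof([1, 2, 3, 4]))
```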

Dictionaries (dict)

A dict is a collection of key: value pairs.

clear(self): # remove all items from the dict
>>> dic
{'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
>>> dic.clear()
>>> dic
{}

copy(self): # make a shallow copy of the dict
>>> dic
{'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
>>> dic2=dic.copy()
>>> dic2
{'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}

fromkeys(*args, **kwargs): # class method: build a new dict from the given keys, all mapped to the same value
>>> key=[1,2,3,4]
>>> newdic=dict.fromkeys(key,'value')
>>> newdic
{1: 'value', 2: 'value', 3: 'value', 4: 'value'}

get(self, k, d=None): # return the value for key k; if k is missing, return d (None by default)
>>> dic
{'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
>>> dic.get('k1')
'v1'
>>> dic.get('k4')
>>> dic.get('k5',9)
9

items(self): # return a view of the dict's (key, value) pairs
>>> dic
{'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
>>> dic.items()
dict_items([('k1', 'v1'), ('k2', 'v2'), ('k3', 'v3')])

keys(self): # return a view of the dict's keys
>>> dic
{'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
>>> dic.keys()
dict_keys(['k1', 'k2', 'k3'])

pop(self, k, d=None): # remove key k and return its value; if k is missing, return d if given, otherwise raise KeyError
>>> dic
{'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
>>> dic.pop('k5')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'k5'
>>> dic.pop('k5',0)
0

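popitem(self): # remove and return one (key, value) pair: the most recently inserted pair since Python 3.7, an arbitrary one before that

setdefault(self, k, d=None): # return the value for key k; if k is missing, first insert it with the value d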
>>> dic
{'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
>>> dic.setdefault('k5','v5')
'v5'
>>> dic
{'k5': 'v5', 'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}

(The key order shown here comes from an interpreter older than Python 3.7. From 3.7 on, dicts preserve insertion order, so 'k5' would be listed last.)

update(self, E=None, **F): # update the dict with the items from another dict or from keyword arguments
>>> dic
{'k5': 'v5', 'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
>>> dic3={'key':'value'}
>>> dic3
{'key': 'value'}
>>> dic.update(dic3)
>>> dic
{'k5': 'v5', 'k1': 'v1', 'k2': 'v2', 'k3': 'v3', 'key': 'value'}

values(self): # return a view of the dict's values
>>> dic
{'k5': 'v5', 'k1': 'v1', 'k2': 'v2', 'k3': 'v3', 'key': 'value'}
>>> dic.values()
dict_values(['v5', 'v1', 'v2', 'v3', 'value'])

In Python 3.x, keys(), values(), and items() all return view objects. To use list operations such as indexing on them, convert them to a list first with list().
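
A short sketch of what "view object" means in practice: the view tracks later changes to the dict, cannot be indexed, and can be turned into an independent list. The keys and values are sample data in the style of the examples above.

```python
dic = {'k1': 'v1', 'k2': 'v2'}
keys = dic.keys()

dic['k3'] = 'v3'                  # the view reflects later changes
print(len(keys), 'k3' in keys)    # 3 True

try:
    keys[0]                       # views do not support indexing
    indexable = True
except TypeError as exc:
    indexable = False
    print('TypeError:', exc)

key_list = list(keys)             # a list is an independent snapshot
key_list.sort(reverse=True)
print(key_list)                   # ['k3', 'k2', 'k1']

for key, value in dic.items():    # items() is the usual way to loop over pairs
    print(key, value)
```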

Also, 3.x removed the has_key() method (which tested whether a key exists); the in operator does the same job:

>>> dic
{'k5': 'v5', 'k1': 'v1', 'k2': 'v2', 'k3': 'v3', 'key': 'value'}
>>> 'k5' in dic
True
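
get(), setdefault(), and in cover most everyday dict patterns. The sketch below uses them for counting and grouping a small made-up word list; the printed key order assumes Python 3.7 or later, where dicts keep insertion order.

```python
words = ['apple', 'bee', 'ant', 'bat', 'apple']   # sample data

counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1   # missing keys start at 0

by_letter = {}
for word in words:
    by_letter.setdefault(word[0], []).append(word)

print(counts)      # {'apple': 2, 'bee': 1, 'ant': 1, 'bat': 1}
print(by_letter)   # {'a': ['apple', 'ant', 'apple'], 'b': ['bee', 'bat']}
print('cat' in counts, 'bee' in counts)   # False True
```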

Sets (set)

A set is unordered and does not allow duplicate elements.

Creating sets

>>> empty_set = set()
>>> empty_set
set()
>>> even_numbers = {0, 2, 4, 6, 8}
>>> even_numbers
{0, 8, 2, 4, 6}
>>> odd_numbers = {1, 3, 5, 7, 9}
>>> odd_numbers
{9, 3, 1, 5, 7}
>>> a = {1,1,2,3,4}
>>> a
{1, 2, 3, 4}

set() converts lists and tuples to sets. Calling set() on a dict produces a set that contains only the dict's keys.
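
The conversions described above can be checked directly. Because set display order is arbitrary, the sketch compares against set literals instead of printing the sets, and it ends with one possible way to remove duplicates while keeping the original order.

```python
print(set([1, 2, 2, 3]) == {1, 2, 3})                   # True
print(set(('a', 'b', 'a')) == {'a', 'b'})               # True
print(set({'k1': 'v1', 'k2': 'v2'}) == {'k1', 'k2'})    # True
print(set('letters') == {'l', 'e', 't', 'r', 's'})      # True

# Removing duplicates from a list while keeping first-seen order.
items = [3, 1, 3, 2, 1]
seen = set()
unique = []
for item in items:
    if item not in seen:
        seen.add(item)
        unique.append(item)
print(unique)   # [3, 1, 2]
```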

Intersection: &

>>> a={1,2,3,4}
>>> b={3,4,5,6,7}
>>> a&b
{3, 4}
>>> a.intersection(b)
{3, 4}

Union: |

>>> a
{1, 2, 3, 4}
>>> b
{3, 4, 5, 6, 7}
>>> a|b
{1, 2, 3, 4, 5, 6, 7}
>>> a.union(b)
{1, 2, 3, 4, 5, 6, 7}

Difference: -

>>> a
{1, 2, 3, 4}
>>> b
{3, 4, 5, 6, 7}
>>> a-b
{1, 2}
>>> b-a
{5, 6, 7}
>>> a.difference(b)
{1, 2}
>>> b.difference(a)
{5, 6, 7}

Other set operations

Use ^ or symmetric_difference() to get the symmetric difference of two sets (the elements that appear in exactly one of them):

>>> a
{1, 2, 3, 4}
>>> b
{3, 4, 5, 6, 7}
>>> a^b
{1, 2, 5, 6, 7}
>>> a.symmetric_difference(b)
{1, 2, 5, 6, 7}
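
The operators and the named methods are not fully interchangeable: the methods accept any iterable, while the operators require both operands to be sets. A brief sketch of the difference, with arbitrary sample values:

```python
a = {1, 2, 3, 4}

print(a.union([4, 5]) == {1, 2, 3, 4, 5})       # True  (list argument)
print(a.intersection((3, 4, 5)) == {3, 4})      # True  (tuple argument)
print(a.difference(range(3)) == {3, 4})         # True  (range argument)

try:
    a | [4, 5]                                  # operators need a set on both sides
    operator_ok = True
except TypeError as exc:
    operator_ok = False
    print('TypeError:', exc)
```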

Use <= or issubset() to test whether one set is a subset of another (every element of the first set also appears in the second):

>>> a
{1, 2, 3, 4}
>>> b
{3, 4, 5, 6, 7}
>>> c
{1, 2, 3}
>>> a<=b
False
>>> c<=b
False
>>> c<=a
True
>>> c.issubset(a)
True

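When the second set contains every element of the first set and at least one more, the first set is called a proper subset of the second. Use < to test for this.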
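
The difference between <= and < only shows up when the two sets are equal. A minimal sketch using the same a and c values as the examples above:

```python
a = {1, 2, 3, 4}
c = {1, 2, 3}

print(c <= a, c < a)   # True True    c is a proper subset of a
print(a <= a, a < a)   # True False   a set is a subset of itself, but not a proper one
print(a > c)           # True         a is a proper superset of c
print(c.issubset(a) and c != a)   # True   the same test as c < a
```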

A superset is the opposite of a subset (every element of the second set also appears in the first). Use >= or issuperset() to test for it:

>>> a
{1, 2, 3, 4}
>>> b
{3, 4, 5, 6, 7}
>>> c
{1, 2, 3}
>>> b>=a
False
>>> a>=c
True
>>> a.issuperset(c)
True
>>> a.issuperset(a)
True

Formatting data

Data formatting is covered here.