Using Python

matplotlib.pyplot as plt

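This module is used for plotting. It has two main objects: the figure and the subplot. The figure is the canvas, and a subplot is a plot drawn on that canvas; one canvas can hold several plots. In practice you work mostly with the plot, and its commonly used settings are:

Color: plt.plot(color=)
Marker: plt.plot(marker=)
Line style: plt.plot(linestyle=)
Line width: plt.plot(linewidth=)
Axis range: plt.xlim([]) (tick positions themselves are set with plt.xticks)
Legend: plt.plot(label=) together with plt.legend(loc=0), where loc sets the position of the legend; 0 is the usual choice and means the "best" position
Saving the image: plt.savefig(fname='name.png')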

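Each call to plt.plot() adds a new line to the current plot. If every plot should instead be a separate image, one common approach is to call plt.figure() before each plt.plot() so that it starts on a new canvas.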
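The settings above can be tried out with a minimal sketch like the following. It assumes the non-interactive Agg backend so that it runs without a display; the data values and the temporary output path are made up for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]

# First figure: two plt.plot() calls draw two lines on the same axes.
plt.figure()
plt.plot(x, [0, 1, 4, 9], color="red", marker="o",
         linestyle="--", linewidth=2, label="squares")
plt.plot(x, [0, 1, 2, 3], color="blue", label="identity")
plt.xlim([0, 3])       # visible x range
plt.legend(loc=0)      # loc=0 is equivalent to loc="best"
lines_in_fig1 = len(plt.gca().get_lines())

# Save the current figure; the file name is illustrative.
out_path = os.path.join(tempfile.gettempdir(), "name.png")
plt.savefig(out_path)

# Second figure: plt.figure() starts a fresh canvas, so this line stands alone.
plt.figure()
plt.plot(x, [3, 2, 1, 0], color="green")
lines_in_fig2 = len(plt.gca().get_lines())

plt.close("all")       # release both figures
```

Calling plt.savefig before starting the second figure matters here, since it always writes the figure that is current at that moment.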

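Is a function still callable?

max is a Python built-in function that returns the largest of its arguments. Once the name max is rebound to some other value, however, it no longer refers to the function and can no longer be called. For example: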

max = max(binDS['bin'])  # largest bin value, i.e. the largest bin number; used to generate new bin numbers
# max is now a number (say max = 1), so calling max again raises TypeError: 'int' object is not callable
callable(max)  # False
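The shadowing described here can be reproduced and undone with a short script. This is a sketch that assumes it runs at module level (as a script or in a fresh interpreter); the list values is made up.

```python
import builtins

values = [3, 7, 5]

before = callable(max)        # True: max still names the built-in function

max = max(values)             # rebinding: the global name max now holds the int 7
after = callable(max)         # False

try:
    max(values)
except TypeError as exc:
    error_message = str(exc)  # 'int' object is not callable

# The built-in itself is untouched and still reachable via the builtins module.
largest = builtins.max(values)

# Deleting the shadowing global makes the plain name resolve to the built-in again.
del max
restored = max(values)
```

The simplest way to avoid the problem is to pick a different variable name, such as max_bin, instead of reusing the name of a built-in.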

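Python time-handling module: datetime

Time handling usually comes in three forms:

Converting a string to a time object
Converting a time object to a string
Computing the number of days between two times

The datetime module provides five main classes:

datetime.date, datetime.datetime, datetime.time, datetime.timedelta, and datetime.tzinfo (Python 3 also provides datetime.timezone, a subclass of tzinfo). Of these, datetime.date and datetime.datetime are used most often.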

>>> test = "2018-01-03"
>>> type(test)
<class 'str'>
>>> from datetime import *
>>> test1 = datetime.strptime(test,"%Y-%m-%d")
>>> type(test1)
<class 'datetime.datetime'>
>>> test1
datetime.datetime(2018, 1, 3, 0, 0)
>>> print(test1)
2018-01-03 00:00:00
>>> test1.date()
datetime.date(2018, 1, 3)
>>> test3 = date(year = 2019,month = 4,day = 5)
>>> test4 = test1.date()

>>> test5 = test4-test3
>>> test5
datetime.timedelta(days=-457)

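Note that datetime.date has no strptime method, so to turn a string into a date you first use datetime.datetime.strptime to get a datetime and then call its date() method to get a date. A datetime.timedelta stores only days, seconds, and microseconds (available as .days, .seconds, and .microseconds); it has no month or year fields, and subtracting two date objects always gives a whole number of days.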
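The three conversions listed earlier can be sketched with the standard library alone. The dates reuse the ones from the session above; the output format string and the two datetimes used for span are arbitrary choices for illustration.

```python
from datetime import date, datetime

# 1. String -> datetime -> date (date itself has no strptime).
parsed = datetime.strptime("2018-01-03", "%Y-%m-%d")
day = parsed.date()

# 2. Date -> string, using any strftime format.
text = day.strftime("%Y/%m/%d")        # '2018/01/03'

# 3. Days between two dates: subtraction returns a timedelta.
delta = day - date(2019, 4, 5)
gap_days = delta.days                  # -457

# Subtracting datetimes keeps the time-of-day part in .seconds.
span = datetime(2018, 1, 3, 12, 0) - datetime(2018, 1, 1, 6, 30)
span_days = span.days                  # 2
span_seconds = span.seconds            # 19800, i.e. 5 h 30 min
span_total = span.total_seconds()      # 192600.0
```

Use abs(delta.days) when only the size of the gap matters and not its direction.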

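np.where conditions

Usage 1: np.where(condition) returns the positions of the elements that satisfy the condition. Note that these are positions, not index labels.

Given the following DataFrame df: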
   name  score
0    x      1
2    z      1
4    y      1
6    x      3

np.where(df['score']<3)[0]
The result is array([0, 1, 2]) (positions),
not array([0, 2, 4]) (index labels).
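To make the position-versus-label distinction concrete, here is a small sketch that rebuilds the DataFrame above and selects the same rows both ways. The variable names are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Same data as above, with the non-consecutive index 0, 2, 4, 6.
df = pd.DataFrame(
    {"name": ["x", "z", "y", "x"], "score": [1, 1, 1, 3]},
    index=[0, 2, 4, 6],
)

mask = df["score"] < 3

positions = np.where(mask)[0]   # integer positions of the True entries
labels = df.index[mask]         # index labels of the same rows

# Positions go with iloc, labels go with loc.
rows_by_position = df.iloc[positions]
rows_by_label = df.loc[labels]
```

Passing the positions to df.loc instead of df.iloc would fail here with a KeyError, because 1 is not a label in this index.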

Usage 2: np.where(condition, Y, N) works element by element, returning Y where the condition is true and N otherwise.

Given the following Series:
0    1.0
1    2.0
2    3.0
3    NaN

np.where(Series>1,'123','321') gives
array(['321', '123', '123', '321'], dtype='<U3')
The NaN entry gets '321' because NaN > 1 evaluates to False.
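Because a NaN silently falls into the "false" branch, it can help to handle missing values explicitly. The following sketch reproduces the result above and then nests a second np.where for that purpose; the label 'missing' is an invented placeholder.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, np.nan])

# Three-argument form: pick '123' where s > 1, otherwise '321'.
result = np.where(s > 1, "123", "321")

# NaN > 1 is False, so the last entry lands in the '321' branch.
# Nesting np.where lets missing values get their own label instead.
labelled = np.where(s.isna(), "missing", np.where(s > 1, "123", "321"))
```

Note that np.where returns a plain NumPy array, so the Series index is dropped; wrap the result in pd.Series(result, index=s.index) if the index is needed.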

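Comparing np.nan and None

What they have in common: both are used in Python to represent missing values. In some operations a None is replaced by np.nan, which is a float. The reason is that None cannot take part in many NumPy computations, and pandas is built on NumPy, so pandas performs the conversion.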

test = pd.Series([1,None,np.nan])
test.isnull() returns:
0    False
1     True
2     True

test.fillna(0) returns:
0    1.0
1    0.0
2    0.0
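The conversion of None to NaN can be observed directly. This sketch assumes default pandas behaviour when building a Series from a list; the dtype=object variant is included only as a contrast.

```python
import numpy as np
import pandas as pd

# Default construction: the values are numeric, so None becomes NaN (float64).
numeric = pd.Series([1, None, np.nan])
numeric_dtype = str(numeric.dtype)          # 'float64'
none_became_nan = np.isnan(numeric.iloc[1])

# With dtype=object nothing is converted, and None stays None.
as_object = pd.Series([1, None, np.nan], dtype=object)
none_kept = as_object.iloc[1] is None

# isna() treats both markers as missing in either case.
missing_numeric = numeric.isna().tolist()
missing_object = as_object.isna().tolist()

filled = numeric.fillna(0).tolist()
```

isnull() is an alias of isna(), so either spelling gives the same result.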

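There are more differences; two matter most:

The types differ: type(None) is NoneType, whereas type(np.nan) is float. As a result np.nan can take part in numeric comparisons (although every such comparison returns False), whereas None raises an error.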

>>> np.nan > 1
False
>>> np.nan < 1
False
>>> np.nan+1
nan
>>> None > 1
Traceback (most recent call last):
  File "<pyshell#69>", line 1, in <module>
    None > 1
TypeError: '>' not supported between instances of 'NoneType' and 'int'

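Equality tests also behave differently: np.nan is never equal to itself under ==, although is returns True because both sides are the same object.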

>>> np.nan == np.nan
False
>>> np.nan is np.nan
True
>>> None == None
True
>>> None is None
True
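Since == never matches NaN and is only matches the very same object, neither is a reliable missing-value test. A small helper can cover both markers; is_missing is a hypothetical name introduced here, not a library function.

```python
import math

import numpy as np


def is_missing(value):
    """Return True for None and for any float NaN (hypothetical helper)."""
    if value is None:
        return True
    # np.nan is an ordinary Python float, so math.isnan handles it.
    return isinstance(value, float) and math.isnan(value)


other_nan = float("nan")

# A NaN created elsewhere is a different object, so identity fails too.
same_object = other_nan is np.nan
equal = other_nan == np.nan

checks = [is_missing(v) for v in (np.nan, other_nan, None, 0.0, "nan")]
```

When pandas is already in use, pd.isna(value) performs the same check and also works element-wise on Series and arrays.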