Python学习笔记：实现SQL中case when构建新列功能

要实现一个类似于 SQL 中的 case when 功能，为用户打上标签。

例如：

select tj_month,
       name,      
       online_time,
       case when online_time < 12 then '(3,12]'
                 when online_time >= 12 and online_time < 24 then '[12,24)'
                 when online_time >= 24 and online_time < 36 then '[24,36)'
                 when online_time >= 36 and online_time < 48 then '[36,48)'
                 when online_time >= 48 and online_time < 60 then '[48,60)'
            else '>60' end as online_time_cut
from table_name
where tj_month = '202106';

一、利用pandas.DataFrame.loc直接筛选

构造测试数据框。

import numpy as np
import pandas as pd
data = np.array([[np.nan, 0], [2, 0], [np.nan, 1]])
df = pd.DataFrame(data=data, columns=['a', 'b'])
'''
	a	b
0	NaN	0.0
1	2.0	0.0
2	NaN	1.0
'''

直接筛选符合条件数据进行打标。

# 此方法已不推荐 不支持 建议使用loc/iloc定位
df[(df['a'].isnull()) & (df['b'] == 0)]['c'] = 1

# loc定位
df['c'] = 0
df.loc[(df['a'].isnull()) & (df['b'] == 0), 'c'] = 1
'''

    a	b	c
0	NaN	0.0	1.0
1	2.0	0.0	NaN
2	NaN	1.0	NaN
'''

二、利用np.where筛选

# 满足条件 输出x 否则输出y
np.where(condition, x, y)

np.where(df.isnull(), 100, 5)
'''
array([[100,   5],
       [  5,   5],
       [100,   5]])
'''

# 打标签
df['c'] = np.where((df['a'].isnull()) & (df['b'] == 0), 1, 0)

One more嵌套判断的例子：

df['class'] = np.where(df['score'].between(0, 60, inclusive=False), '不及格',
                      np.where(df['score'].between(60, 80, inclusive=True), '良好', '优秀'))

三、利用np.select筛选

np.select 函数可以根据某些条件筛选某些元素，使用语法为：

np.select(condition_list, choice_list, default=0)
# 条件列表、执行操作列表、缺失值
# 返回列表

实操：

df['c'] = np.select([(df['a'].isnull()) & (df['b'] == 0),
                    (df['a'].isnull()) & (df['b'] == 1),
                    (df['a'] == 2) & (df['b'] == 0)],
                    ['one', 'two', 'three'],
                    default = 'XXX')
'''
	a	b	c
0	NaN	0.0	one
1	2.0	0.0	three
2	NaN	1.0	two
'''

四、利用apply函数与if语句

apply 应用在 dataframe 上，用于对行或者列进行计算。

axis=1 指定按行计算
lambda匿名函数判断满足条件为1，不满足为0

df['c'] = df.apply(lambda x: 1 if np.isnan(x[0]) and x[1] == 0 else 0, axis=1)
df
'''
	a	b	c
0	NaN	0.0	1
1	2.0	0.0	0
2	NaN	1.0	0
'''

另外一个简单的例子：

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,11,size=(1000000,5)), columns=('a','b','c','d','e'))

def func(a,b,c,d,e):
    if e == 10:
        return c*d
    elif (e < 10) and (e >= 5):
        return c+d
    elif e < 5:
        return a+b

df['new'] = df.apply(lambda x: func(x['a'], x['b'], x['c'], x['d'], x['e']), axis=1)
df
'''
	a	b	c	d	e	new
0	2	0	5	7	5	12
1	9	3	3	0	2	12
2	2	0	9	10	3	2
3	5	8	3	8	9	11
4	1	10	0	2	0	11
'''

# 例子
def function(x):
    if x['数学'] != 0:
        s = x['语文']/x['数学']
    else:
        s = 0
    return s

data['result'] = data.apply(lambda x: function(x), axis=1)
data

参考链接：Pandas等价于创建新变量的SQL case when语句

参考链接：在 pandas 中如何实现 sql 查询中 case when then end 的功能？

参考链接：pandas apply处理两个参参数如何写case when then else end