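- How to compute the Euclidean distance between two series

      import numpy as np
      import pandas as pd

      p = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
      q = pd.Series([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])

      # Method 1: square root of the sum of squared differences
      sum((p - q)**2)**.5

      # Method 2: L2 norm of the difference vector
      np.linalg.norm(p - q)
      #> 18.16590212458495
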
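Both methods subtract the series first, and pandas aligns that subtraction on index labels rather than on position. The sketch below illustrates what that means for the distance. `euclidean_distance` and `q_relabelled` are hypothetical names introduced here, not part of the original exercise; the helper converts both series to NumPy arrays so the distance is computed position by position even when the labels differ.

```python
import numpy as np
import pandas as pd


def euclidean_distance(a: pd.Series, b: pd.Series) -> float:
    """Positional Euclidean distance (hypothetical helper for illustration)."""
    x = a.to_numpy(dtype=float)
    y = b.to_numpy(dtype=float)
    if x.shape != y.shape:
        raise ValueError("both series must have the same length")
    return float(np.sqrt(np.sum((x - y) ** 2)))


p = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
q = pd.Series([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])

# Same values as q, but labelled 10..19 instead of the default 0..9
q_relabelled = pd.Series(q.to_numpy(), index=range(10, 20))

dist_same_labels = euclidean_distance(p, q)
dist_other_labels = euclidean_distance(p, q_relabelled)

# Label-aligned subtraction finds no common labels, so every entry is NaN
aligned_diff = p - q_relabelled
```

With mismatched labels, `p - q_relabelled` contains only NaN, so a distance built on it would not be meaningful, while the positional helper still returns the same value as for `p` and `q`.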
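- How to find local maxima in a numeric series

  A local maximum is a point where the first difference changes sign from positive to negative, so the difference of those signs (a discrete second derivative) equals -2 there.

      import numpy as np
      import pandas as pd

      ser = pd.Series([2, 10, 3, 4, 9, 10, 2, 7, 3])

      # Discrete second derivative: difference of the signs of the first differences
      dd = np.diff(np.sign(np.diff(ser)))

      # dd == -2 marks a peak; add 1 to map back to positions in ser
      peak_locs = np.where(dd == -2)[0] + 1
      peak_locs
      #> array([1, 5, 7], dtype=int64)
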
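As a cross-check on the sign-difference approach, here is a minimal sketch of an alternative that compares each value with its two neighbours directly. It also reuses the sign-difference idea to locate local minima, where the sign changes from -1 to +1. The names `is_peak` and `trough_locs` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

ser = pd.Series([2, 10, 3, 4, 9, 10, 2, 7, 3])

# Peaks via neighbour comparison: strictly greater than both neighbours.
# shift() puts NaN at the ends, and comparisons against NaN are False,
# so the first and last points are never reported.
is_peak = (ser > ser.shift(1)) & (ser > ser.shift(-1))
peak_locs = np.flatnonzero(is_peak.to_numpy())

# Troughs via the sign-difference approach: a change from -1 to +1 gives +2
dd = np.diff(np.sign(np.diff(ser.to_numpy())))
trough_locs = np.where(dd == 2)[0] + 1
```

For this series both approaches agree on the peaks at positions 1, 5 and 7, and the troughs fall at positions 2 and 6. Neither variant reports flat-topped peaks, because a plateau produces a sign of 0 and a strict comparison fails on equal neighbours.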
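- How to replace spaces with the least frequent character

      import pandas as pd

      my_str = 'dbc deb abed gade'

      # Split the string into a series of characters
      ser = pd.Series(list(my_str))

      # Count how often each character occurs (sorted by descending count)
      freq = ser.value_counts()
      print(freq)

      # The last index entry is the least frequent character;
      # the order among tied characters (here 'c' and 'g') is not guaranteed
      least_freq = freq.dropna().index[-1]

      # Replace the spaces with that character
      "".join(ser.replace(' ', least_freq))
      #> d    4
      #>      3
      #> b    3
      #> e    3
      #> a    2
      #> c    1
      #> g    1
      #> dtype: int64
      #> 'dbcgdebgabedggade'
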
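Because two characters tie for the lowest count in this string, the result depends on which of them `value_counts` happens to list last. The following sketch shows one way to make the choice deterministic. `replace_spaces_with_rarest` is a hypothetical helper, and breaking ties alphabetically is an assumption made here, not something the original exercise specifies.

```python
import pandas as pd


def replace_spaces_with_rarest(text: str) -> str:
    """Hypothetical helper: replace spaces with the rarest non-space character."""
    freq = pd.Series(list(text)).value_counts()
    # Never pick the space itself as the replacement
    freq = freq.drop(" ", errors="ignore")
    if freq.empty:
        return text  # nothing to substitute with
    # Among characters tied for the lowest count, take the alphabetically first
    rarest = min(freq[freq == freq.min()].index)
    return text.replace(" ", rarest)


result = replace_spaces_with_rarest("dbc deb abed gade")
# result == 'dbccdebcabedcgade'
```

With this tie-breaking rule the replacement character is `c` rather than `g`, so the output differs from the one shown above even though both characters occur exactly once.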
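- How to compute the autocorrelations of a numeric series

      import numpy as np
      import pandas as pd

      ser = pd.Series(np.arange(20) + np.random.normal(1, 10, 20))

      # Autocorrelation of the series at each lag i (lag 0 is always 1)
      autocorrelations = [ser.autocorr(i).round(2) for i in range(11)]
      print(autocorrelations[1:])

      # Pick the lag (1 to 10) with the largest absolute autocorrelation
      print('Lag having highest correlation: ', np.argmax(np.abs(autocorrelations[1:]))+1)

      # The data is random, so the output differs from run to run
      #> [0.33, 0.41, 0.48, 0.01, 0.21, 0.16, -0.11, 0.05, 0.34, -0.24]
      #> Lag having highest correlation:  3
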
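To make the lag search reproducible and to show what `autocorr` computes, here is a minimal sketch. It assumes a seeded NumPy generator (the seed value 0 is arbitrary) and compares `autocorr(lag)` with the Pearson correlation between the series and a shifted copy of itself. No output is listed because the values depend on the generated data.

```python
import numpy as np
import pandas as pd

# Seeded generator so repeated runs use the same data (seed chosen arbitrarily)
rng = np.random.default_rng(0)
ser = pd.Series(np.arange(20) + rng.normal(1, 10, 20))

lag = 3
# autocorr(lag) is the Pearson correlation between the series and a lagged copy
via_autocorr = ser.autocorr(lag)
via_shift = ser.corr(ser.shift(lag))

# Search lags 1..10 for the largest absolute autocorrelation
lags = range(1, 11)
abs_autocorr = [abs(ser.autocorr(k)) for k in lags]
best_lag = lags[int(np.argmax(abs_autocorr))]
```

Indexing `lags` with the `argmax` position avoids the manual `+ 1` offset that is needed when lag 0 is sliced off a list.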
- How to perform arithmetic operations on series

      # Arithmetic between two series
      import pandas as pd

      series1 = pd.Series([3, 4, 4, 4], ['index1', 'index2', 'index3', 'index4'])
      series2 = pd.Series([2, 2, 2, 2], ['index1', 'index2', 'index33', 'index44'])

      # Addition
      series_add = series1 + series2
      print(series_add)

      # Subtraction
      series_minus = series1 - series2
      # series_minus

      # Multiplication
      series_multi = series1 * series2
      # series_multi

      # Division
      series_div = series1 / series2
      series_div

  Series arithmetic is aligned on the index: pandas matches values by index label, and any label that appears in only one of the two series yields NaN in the result. The results are:

      # Addition:
      index1     5.0
      index2     6.0
      index3     NaN
      index33    NaN
      index4     NaN
      index44    NaN
      dtype: float64

      # Division:
      index1     1.5
      index2     2.0
      index3     NaN
      index33    NaN
      index4     NaN
      index44    NaN
      dtype: float64
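When NaN for unmatched labels is not what you want, the method forms of the operators accept a `fill_value`. The sketch below reuses the two series from the example. Treating a missing label as 0 for addition and as 1 for multiplication is an assumption that suits some data and not others.

```python
import pandas as pd

series1 = pd.Series([3, 4, 4, 4], ['index1', 'index2', 'index3', 'index4'])
series2 = pd.Series([2, 2, 2, 2], ['index1', 'index2', 'index33', 'index44'])

# Treat a label missing from one series as 0 before adding
added = series1.add(series2, fill_value=0)

# Treat a label missing from one series as 1 before multiplying
multiplied = series1.mul(series2, fill_value=1)

print(added)
print(multiplied)
```

Labels present in both series behave as before (`index1` gives 5 and 6 respectively), while labels present in only one series keep that series' value instead of becoming NaN.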