featuretools in practice

Features derived from null values are still null

Code:

import featuretools as ft
import pandas as pd

# x1 has a missing value in the first row
df = pd.DataFrame(data={"x1": [None, 2, 3], "x2": [4, 5, 6]})

es = ft.EntitySet(id='es_hypernets_fit')
# make_index=True creates a new index column with the name given by `index`
es.entity_from_dataframe(entity_id='e_hypernets_ft', dataframe=df, make_index=True, index='e_hypernets_ft_index')
# Derive features with the add and subtract transform primitives only
feature_matrix, feature_defs = ft.dfs(entityset=es, target_entity="e_hypernets_ft",
                                      ignore_variables={"e_hypernets_ft": []},
                                      return_variable_types="all",
                                      trans_primitives=['add_numeric', 'subtract_numeric'],
                                      max_depth=1,
                                      features_only=False,
                                      max_features=-1)
print(feature_matrix)

Output:

                       x1  x2  x1 + x2  x1 - x2
e_hypernets_ft_index                           
0                     NaN   4      NaN      NaN
1                     2.0   5      7.0     -3.0
2                     3.0   6      9.0     -3.0

Every derived column that involves a null value comes out as NaN, so it is advisable to fill null values before deriving features.
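A minimal sketch of filling nulls before derivation, using only pandas. The median is an assumption chosen for illustration (the original does not name a fill strategy), and the plain column arithmetic stands in for the add_numeric and subtract_numeric primitives rather than calling featuretools.

```python
import pandas as pd

df = pd.DataFrame(data={"x1": [None, 2, 3], "x2": [4, 5, 6]})

# Fill each column's missing values with that column's median.
# The median is an illustrative choice; a mean or a constant may suit other data better.
df_filled = df.fillna(df.median())

# Stand-in for the add_numeric / subtract_numeric primitives:
# with no nulls in the inputs, the derived columns contain no NaN either.
derived = pd.DataFrame({
    "x1 + x2": df_filled["x1"] + df_filled["x2"],
    "x1 - x2": df_filled["x1"] - df_filled["x2"],
})
print(derived)
```

The filled frame df_filled can then be passed to entity_from_dataframe in place of df.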

Derivation can produce abnormal values

In a case such as n/0, the result is inf.

import featuretools as ft
import pandas as pd

# x2 contains a zero in the first row, so x1 / x2 divides by zero there
df = pd.DataFrame(data={"x1": [1, 2, 3], "x2": [0, 5, 6]})

es = ft.EntitySet(id='es_hypernets_fit')
# make_index=True creates a new index column with the name given by `index`
es.entity_from_dataframe(entity_id='e_hypernets_ft', dataframe=df, make_index=True, index='e_hypernets_ft_index')
# Derive features with the divide transform primitive only
feature_matrix, feature_defs = ft.dfs(entityset=es, target_entity="e_hypernets_ft",
                                      ignore_variables={"e_hypernets_ft": []},
                                      return_variable_types="all",
                                      trans_primitives=['divide_numeric'],
                                      max_depth=1,
                                      features_only=False,
                                      max_features=-1)
print(feature_matrix)

Result:

                      x1  x2  x1 / x2  x2 / x1
e_hypernets_ft_index                          
0                      1   0      inf      0.0
1                      2   5      0.4      2.5
2                      3   6      0.5      2.0

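If the downstream algorithm cannot handle extremely large or small values, it is advisable to replace them after derivation.

These experiments were run on featuretools==0.18.1.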
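One possible way to do that replacement, sketched with pandas and numpy only. The frame below reproduces the derived columns by plain division instead of calling featuretools, and replacing inf with the column median is an assumed strategy for illustration, not something the original prescribes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={"x1": [1, 2, 3], "x2": [0, 5, 6]})

# Stand-in for the divide_numeric output: pandas yields inf for 1 / 0.
feature_matrix = df.assign(**{
    "x1 / x2": df["x1"] / df["x2"],
    "x2 / x1": df["x2"] / df["x1"],
})

# Step 1: turn +inf and -inf into NaN so they can be handled like missing values.
cleaned = feature_matrix.replace([np.inf, -np.inf], np.nan)

# Step 2: fill the resulting NaN with each column's median of the remaining values
# (an illustrative choice; clipping to a bound or dropping the row are alternatives).
cleaned = cleaned.fillna(cleaned.median())
print(cleaned)
```

Converting inf to NaN first keeps the infinite value out of the median computation, so the fill value is based only on finite entries.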

Original article: https://www.cnblogs.com/oaks/p/13565835.html