ALINK (28): Feature Engineering (7) Feature Combination and Crossing (2) Cross Feature Prediction/Training (CrossFeaturePredictBatchOp)

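Cross Feature Prediction (CrossFeaturePredictBatchOp)

Java class name: com.alibaba.alink.operator.batch.feature.CrossFeaturePredictBatchOp

Python class name: CrossFeaturePredictBatchOp

Overview

The feature-crossing algorithm combines the selected discrete columns into a single vector-typed column. This operator applies a model trained by CrossFeatureTrainBatchOp and writes the resulting vector to the output column.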
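As a rough illustration of what combining discrete columns into one vector can mean, the sketch below maps each combination of column values to a single position in a one-hot sparse vector using a mixed-radix index. The helper names `cross_index` and `to_sparse_string` are hypothetical, and the per-column ids in `vocabs` are inferred from the sample output shown later in this article; this is not Alink's actual implementation.

```python
from typing import Dict, Hashable, List, Sequence


def cross_index(row: Sequence[Hashable], vocabs: List[Dict[Hashable, int]]) -> int:
    """Mixed-radix index: the first column is the least significant digit."""
    index, stride = 0, 1
    for value, vocab in zip(row, vocabs):
        index += vocab[value] * stride
        stride *= len(vocab)
    return index


def to_sparse_string(index: int, size: int) -> str:
    """Render a one-hot vector in the '$size$index:value' style seen in the output."""
    return f"${size}${index}:1.0"


# Assumption: per-column value ids inferred from the sample output, not from Alink source.
vocabs = [
    {"1.0": 0, "2.0": 1, "0.0": 2},        # f0
    {"1.0": 0, "3.0": 1, "0.0": 2},        # f1
    {1.0: 0, 0.0: 1, None: 2, 2.0: 3},     # f2 (null treated as its own category)
]

# The vector size is the product of the per-column category counts.
size = 1
for vocab in vocabs:
    size *= len(vocab)

for row in [("1.0", "1.0", 1.0), ("2.0", "3.0", None), ("0.0", "1.0", 2.0)]:
    print(row, to_sparse_string(cross_index(row, vocabs), size))
```

With these assumed ids, the three rows map to positions 0, 22, and 29 of a 36-dimensional vector, which agrees with the corresponding rows of the sample output.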

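Parameters

Name | Display name | Description | Type | Required? | Default
outputCol | Output result column name | Name of the output result column (required) | String | Yes | (none)
numThreads | Number of threads | Number of threads used by the component | Integer | No | 1
modelStreamFilePath | Model stream file path | File path of the model stream | String | No | null
modelStreamScanInterval | Model path scan interval | Interval between scans of the model path, in seconds | Integer | No | 10
modelStreamStartTime | Model stream start time | Start time of the model stream. By default, reading starts from the current time. Use the format yyyy-mm-dd hh:mm:ss.fffffffff; see Timestamp.valueOf(String s) | String | No | null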
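The modelStreamStartTime parameter expects a string in the `yyyy-mm-dd hh:mm:ss[.f...]` form accepted by Java's `Timestamp.valueOf`. A minimal sketch of producing such a string from Python follows; the helper name `to_jdbc_timestamp` is hypothetical, and it emits microsecond precision (six fractional digits), which fits within the up-to-nine digits that format allows.

```python
from datetime import datetime


def to_jdbc_timestamp(moment: datetime) -> str:
    """Format a datetime as 'yyyy-mm-dd hh:mm:ss.ffffff' (microsecond precision)."""
    return moment.strftime("%Y-%m-%d %H:%M:%S.%f")


# Example start time; the date itself is arbitrary.
start_time = to_jdbc_timestamp(datetime(2021, 6, 18, 9, 30, 0))
print(start_time)  # 2021-06-18 09:30:00.000000
```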

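Code example

Python code

from pyalink.alink import *
import pandas as pd

# Run Alink in a local environment with parallelism 1.
useLocalEnv(1)

# Sample data: two string columns, one double column (with a missing value), and a label.
df = pd.DataFrame([
    ["1.0", "1.0", 1.0, 1],
    ["1.0", "1.0", 0.0, 1],
    ["1.0", "0.0", 1.0, 1],
    ["1.0", "0.0", 1.0, 1],
    ["2.0", "3.0", None, 0],
    ["2.0", "3.0", 1.0, 0],
    ["0.0", "1.0", 2.0, 0],
    ["0.0", "1.0", 1.0, 0]])
data = BatchOperator.fromDataframe(df, schemaStr="f0 string, f1 string, f2 double, label bigint")

# Train the cross-feature model on the selected columns.
train = CrossFeatureTrainBatchOp().setSelectedCols(['f0', 'f1', 'f2']).linkFrom(data)

# Apply the model; the crossed vector is written to the "cross" column.
result = CrossFeaturePredictBatchOp().setOutputCol("cross").linkFrom(train, data).collectToDataFrame()
print(result)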

Java code

import com.alibaba.alink.operator.batch.BatchOperator;
import com.alibaba.alink.operator.batch.feature.CrossFeaturePredictBatchOp;
import com.alibaba.alink.operator.batch.feature.CrossFeatureTrainBatchOp;
import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
import org.apache.flink.types.Row;
import org.junit.Test;

import java.util.Arrays;
import java.util.List;

public class CrossFeaturePredictBatchOpTest {
  @Test
  public void testCrossFeaturePredictBatchOp() throws Exception {
    // Same eight rows as the Python example; f2 contains one null.
    List<Row> df = Arrays.asList(
      Row.of("1.0", "1.0", 1.0, 1),
      Row.of("1.0", "1.0", 0.0, 1),
      Row.of("1.0", "0.0", 1.0, 1),
      Row.of("1.0", "0.0", 1.0, 1),
      Row.of("2.0", "3.0", null, 0),
      Row.of("2.0", "3.0", 1.0, 0),
      Row.of("0.0", "1.0", 2.0, 0),
      Row.of("0.0", "1.0", 1.0, 0)
    );
    BatchOperator<?> data = new MemSourceBatchOp(df, "f0 string, f1 string, f2 double, label int");
    // Train the cross-feature model on the selected columns.
    BatchOperator<?> train = new CrossFeatureTrainBatchOp().setSelectedCols("f0", "f1", "f2").linkFrom(data);
    // Apply the model and print the rows with the new "cross" column.
    new CrossFeaturePredictBatchOp().setOutputCol("cross").linkFrom(train, data).print();
  }
}

Output

f0 | f1 | f2 | label | cross
1.0 | 1.0 | 1.0000 | 1 | $36$0:1.0
1.0 | 1.0 | 0.0000 | 1 | $36$9:1.0
1.0 | 0.0 | 1.0000 | 1 | $36$6:1.0
1.0 | 0.0 | 1.0000 | 1 | $36$6:1.0
2.0 | 3.0 | null | 0 | $36$22:1.0
2.0 | 3.0 | 1.0000 | 0 | $36$4:1.0
0.0 | 1.0 | 2.0000 | 0 | $36$29:1.0
0.0 | 1.0 | 1.0000 | 0 | $36$2:1.0
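Each value in the cross column reads as a sparse vector: a size between dollar signs, followed by index:value entries. A minimal sketch of parsing that text in plain Python follows. The function name `parse_sparse_vector` is hypothetical, and the sketch assumes that multiple entries, when present, are separated by whitespace.

```python
from typing import Dict, Tuple


def parse_sparse_vector(text: str) -> Tuple[int, Dict[int, float]]:
    """Parse '$size$i:v i:v ...' into (size, {index: value})."""
    # Splitting on the first two '$' yields: '', the size, and the entry list.
    _, size_part, body = text.split("$", 2)
    entries: Dict[int, float] = {}
    for token in body.split():
        index, value = token.split(":")
        entries[int(index)] = float(value)
    return int(size_part), entries


size, entries = parse_sparse_vector("$36$22:1.0")
print(size, entries)  # 36 {22: 1.0}
```

Read this way, every row above is a 36-dimensional one-hot vector, and rows with identical (f0, f1, f2) values, such as the third and fourth, share the same index.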

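Cross Feature Training (CrossFeatureTrainBatchOp)

Java class name: com.alibaba.alink.operator.batch.feature.CrossFeatureTrainBatchOp

Python class name: CrossFeatureTrainBatchOp

Overview

The feature-crossing algorithm combines the selected discrete columns into a single vector-typed column. This operator trains the model from the selected columns; the model is then applied with CrossFeaturePredictBatchOp.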
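To get a feel for what training has to learn, the sketch below counts the distinct values in each selected column of the sample data used in this article and multiplies the counts. It is plain Python, not Alink code, and the function name `distinct_counts` is hypothetical. Counting null as its own category is an inference from the sample output, whose vectors have size 36, rather than something the documentation states.

```python
import math
from typing import Hashable, List, Sequence, Tuple

# The (f0, f1, f2) values of the eight sample rows.
rows: List[Tuple[Hashable, ...]] = [
    ("1.0", "1.0", 1.0), ("1.0", "1.0", 0.0),
    ("1.0", "0.0", 1.0), ("1.0", "0.0", 1.0),
    ("2.0", "3.0", None), ("2.0", "3.0", 1.0),
    ("0.0", "1.0", 2.0), ("0.0", "1.0", 1.0),
]


def distinct_counts(data: Sequence[Sequence[Hashable]]) -> List[int]:
    """Number of distinct values per column, with None counted as a value."""
    return [len(set(column)) for column in zip(*data)]


counts = distinct_counts(rows)
dimension = math.prod(counts)  # requires Python 3.8+
print(counts, dimension)  # [3, 3, 4] 36
```

Because the dimension is a product, it grows quickly with column cardinality, which is worth keeping in mind when choosing selectedCols.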

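Parameters

Name | Display name | Description | Type | Required? | Default
selectedCols | Selected column names | List of the column names to use in the computation | String[] | Yes | (none)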

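Code example

Python code

from pyalink.alink import *
import pandas as pd

# Run Alink in a local environment with parallelism 1.
useLocalEnv(1)

# Sample data: two string columns, one double column (with a missing value), and a label.
df = pd.DataFrame([
    ["1.0", "1.0", 1.0, 1],
    ["1.0", "1.0", 0.0, 1],
    ["1.0", "0.0", 1.0, 1],
    ["1.0", "0.0", 1.0, 1],
    ["2.0", "3.0", None, 0],
    ["2.0", "3.0", 1.0, 0],
    ["0.0", "1.0", 2.0, 0],
    ["0.0", "1.0", 1.0, 0]])
data = BatchOperator.fromDataframe(df, schemaStr="f0 string, f1 string, f2 double, label bigint")

# Train the cross-feature model on the selected columns.
train = CrossFeatureTrainBatchOp().setSelectedCols(['f0', 'f1', 'f2']).linkFrom(data)

# Apply the trained model to check its effect; the crossed vector goes to "cross".
result = CrossFeaturePredictBatchOp().setOutputCol("cross").linkFrom(train, data).collectToDataFrame()
print(result)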

Java code

import com.alibaba.alink.operator.batch.BatchOperator;
import com.alibaba.alink.operator.batch.feature.CrossFeaturePredictBatchOp;
import com.alibaba.alink.operator.batch.feature.CrossFeatureTrainBatchOp;
import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
import org.apache.flink.types.Row;
import org.junit.Test;

import java.util.Arrays;
import java.util.List;

public class CrossFeatureTrainBatchOpTest {
  @Test
  public void testCrossFeatureTrainBatchOp() throws Exception {
    // Same eight rows as the Python example; f2 contains one null.
    List<Row> df = Arrays.asList(
      Row.of("1.0", "1.0", 1.0, 1),
      Row.of("1.0", "1.0", 0.0, 1),
      Row.of("1.0", "0.0", 1.0, 1),
      Row.of("1.0", "0.0", 1.0, 1),
      Row.of("2.0", "3.0", null, 0),
      Row.of("2.0", "3.0", 1.0, 0),
      Row.of("0.0", "1.0", 2.0, 0),
      Row.of("0.0", "1.0", 1.0, 0)
    );
    BatchOperator<?> data = new MemSourceBatchOp(df, "f0 string, f1 string, f2 double, label int");
    // Train the cross-feature model on the selected columns.
    BatchOperator<?> train = new CrossFeatureTrainBatchOp().setSelectedCols("f0", "f1", "f2").linkFrom(data);
    // Apply the trained model and print the rows with the new "cross" column.
    new CrossFeaturePredictBatchOp().setOutputCol("cross").linkFrom(train, data).print();
  }
}

Output

f0 | f1 | f2 | label | cross
1.0 | 1.0 | 1.0000 | 1 | $36$0:1.0
1.0 | 1.0 | 0.0000 | 1 | $36$9:1.0
1.0 | 0.0 | 1.0000 | 1 | $36$6:1.0
1.0 | 0.0 | 1.0000 | 1 | $36$6:1.0
2.0 | 3.0 | null | 0 | $36$22:1.0
2.0 | 3.0 | 1.0000 | 0 | $36$4:1.0
0.0 | 1.0 | 2.0000 | 0 | $36$29:1.0
0.0 | 1.0 | 1.0000 | 0 | $36$2:1.0

Original article: https://www.cnblogs.com/qiu-hua/p/14901491.html