union

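As I understand it, union merges one RDD with another specified RDD into a single RDD, and the partitions of both inputs are kept as they are.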

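val data1 = sc.parallelize(1 to 20)      // numSlices not given, so sc.defaultParallelism is used
data1.partitions.length
val data2 = sc.parallelize(25 to 30)
data2.partitions.length

val data3 = data1.union(data2)           // combines the partitions of data1 and data2
data3.partitions.length
data3.collect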

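Both data1 and data2 use the default number of partitions, which was 2 here. When numSlices is not passed, parallelize falls back to sc.defaultParallelism, so the value depends on how the shell was started.

After the union, data3 has 4 partitions: 2 from data1 plus 2 from data2. For RDDs like these, which have no partitioner, union simply concatenates the partitions of its inputs without shuffling any data.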

data3.collect returns:

Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 26, 27, 28, 29, 30)
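To see why the partition count is 4 and why collect returns the elements in this order, here is a toy model in plain Scala with no Spark dependency. An RDD is represented as a vector of partitions. `UnionModel`, `parallelize`, `union` and `collect` below are hypothetical stand-ins, not Spark's API. The model assumes the default of 2 partitions seen above, and its slicing is meant to mirror (as far as I know) the index arithmetic Spark's ParallelCollectionRDD uses for ranges.

```scala
// Toy, Spark-free model of how union treats partitions.
// All names here are hypothetical; this is not Spark's API.
object UnionModel {
  // An "RDD" in this model is just an ordered list of partitions.
  type Partitions = Vector[Vector[Int]]

  // Split a range into numSlices contiguous chunks:
  // slice i covers indices [i * len / numSlices, (i + 1) * len / numSlices).
  def parallelize(r: Range, numSlices: Int): Partitions = {
    val len = r.length.toLong
    (0 until numSlices).toVector.map { i =>
      val start = ((i * len) / numSlices).toInt
      val end = (((i + 1) * len) / numSlices).toInt
      r.slice(start, end).toVector
    }
  }

  // union without a shared partitioner: append b's partitions after a's.
  // No element moves between partitions and duplicates are not removed.
  def union(a: Partitions, b: Partitions): Partitions = a ++ b

  // collect: read the partitions in order and flatten them.
  def collect(p: Partitions): Vector[Int] = p.flatten

  def main(args: Array[String]): Unit = {
    val data1 = parallelize(1 to 20, 2)   // Vector(1..10), Vector(11..20)
    val data2 = parallelize(25 to 30, 2)  // Vector(25, 26, 27), Vector(28, 29, 30)
    val data3 = union(data1, data2)
    println(data3.length)                 // prints 4
    println(collect(data3).mkString(", "))
  }
}
```

In spark-shell, `data3.glom().collect` returns one array per partition, which is a quick way to compare the real partition contents with this model.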

Original article: https://www.cnblogs.com/hark0623/p/4494856.html