Spark Shared Variables: Broadcast Variables and Accumulators

Shared Variables

Spark provides two limited types of shared variables for two common usage patterns: broadcast variables and accumulators.

Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks.

Broadcast variables are created from a variable v by calling SparkContext.broadcast(v). The broadcast variable is a wrapper around v, and its value can be accessed by calling the value method.    

val broadcastVar = sc.broadcast(Array(123))
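
As a rough illustration of how a broadcast value is read inside a task, the sketch below broadcasts a small lookup table and resolves codes against it. It assumes a spark-shell session where `sc` is already defined; `countryNames`, `namesBc`, `codes`, and `resolved` are hypothetical names with made-up sample data.

```scala
// Assumes spark-shell, where `sc` (the SparkContext) already exists.
// The lookup table and the codes below are hypothetical sample data.
val countryNames = Map("CN" -> "China", "US" -> "United States", "FR" -> "France")

// Cache the table on each machine once instead of shipping it with every task.
val namesBc = sc.broadcast(countryNames)

val codes = sc.parallelize(Seq("CN", "FR", "US", "XX"))

// Tasks read the cached copy through `.value`; unknown codes map to "unknown".
val resolved = codes.map(code => namesBc.value.getOrElse(code, "unknown")).collect()

resolved.foreach(println)
```

Referring to `namesBc.value` inside the closure, rather than to `countryNames` directly, is what lets the tasks use the cached copy.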

Accumulators are variables that are only “added” to through an associative and commutative operation and can therefore be efficiently supported in parallel. They can be used to implement counters (as in MapReduce) or sums. Spark natively supports accumulators of numeric types, and programmers can add support for new types.

scala> val accnum=sc.longAccumulator("ggg")
accnum: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 264, name: Some(ggg), value: 0)

scala> sc.parallelize(Array(1,2,3,4,5)).foreach(x=>accnum.add(x))

scala> accnum
res14: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 264, name: Some(ggg), value: 15)
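
Support for a new accumulator type can be added by subclassing `AccumulatorV2`. The following is only a minimal sketch, assuming a spark-shell session where `sc` is defined (use `:paste` mode to enter it); `StringSetAccumulator`, `seen`, and the sample words are hypothetical names chosen for illustration. Set union is associative and commutative, so it fits the requirement stated above.

```scala
import org.apache.spark.util.AccumulatorV2

// Hypothetical accumulator that gathers the distinct strings added to it.
class StringSetAccumulator extends AccumulatorV2[String, Set[String]] {
  private var items: Set[String] = Set.empty

  // True while nothing has been added.
  override def isZero: Boolean = items.isEmpty

  // Returns a new accumulator holding the same elements.
  override def copy(): StringSetAccumulator = {
    val acc = new StringSetAccumulator
    acc.items = items
    acc
  }

  override def reset(): Unit = items = Set.empty

  override def add(v: String): Unit = items += v

  // Combines partial results from another accumulator of the same type.
  override def merge(other: AccumulatorV2[String, Set[String]]): Unit =
    items ++= other.value

  override def value: Set[String] = items
}

val seen = new StringSetAccumulator
sc.register(seen, "seenWords") // must be registered before use in tasks

sc.parallelize(Seq("a", "b", "a", "c")).foreach(w => seen.add(w))

println(seen.value)
```

As with the built-in `LongAccumulator`, tasks only add to the accumulator, and the result is read on the driver through `value`.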

In summary: accumulators are used to aggregate information, while broadcast variables are used to distribute large objects efficiently.

Original article: https://www.cnblogs.com/playforever/p/9408109.html