Spark pitfalls

Note: everything here was run locally.

<spark-streaming>

1. When reading from Kafka in local mode, Spark warns:

  spark.master should be set as local[n], n > 1 in local mode if you have receivers to get data,
  otherwise Spark jobs will not get resources to process the received data.  
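In other words, a receiver occupies one thread, so local mode needs at least two. A minimal sketch of the fix (the app name and batch interval below are placeholders, not from the original post):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object LocalReceiverDemo {
  def main(args: Array[String]): Unit = {
    // local[1] would give the receiver the only thread and starve
    // the jobs that process the received data; use local[n], n > 1.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("KafkaLocalDemo") // placeholder name
    val ssc = new StreamingContext(conf, Seconds(5))
    // ... define streams here, then:
    // ssc.start(); ssc.awaitTermination()
  }
}
```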
 
2. When actions actually run: `requirement failed: No output operations registered, so nothing to execute`
   Transformations are lazily evaluated; a job only executes once it contains an action, such as reduce(), collect(), count(), first(), take(),
   saveAsTextFile(path), foreach(), countByKey(), etc. In Spark Streaming specifically, the error above means no output operation (print(), foreachRDD(), saveAsTextFiles(), ...) was registered on any DStream before start().
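For example, a DStream pipeline built only from transformations throws that error on start(); registering an output operation makes it run. A sketch (the socket source, host, and port are placeholders for illustration):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object OutputOpDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("OutputOpDemo")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // Placeholder source; any DStream source behaves the same way.
    val lines  = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

    // Transformations alone build a lineage but execute nothing.
    // Without the next line, start() fails with:
    // "requirement failed: No output operations registered, so nothing to execute"
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```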

3. If no messages can be read from Kafka, the per-record logic chained on the kafkaStream does nothing for that batch; processing simply moves on to the next step.
 
4. Two ways to read Kafka messages:
    @1 Receiver-based Approach   Created with KafkaUtils.createStream(). You cannot control the parallelism of message processing: only one receiver.
    @2 Direct Approach           Created with KafkaUtils.createDirectStream(). Advantage: Simplified Parallelism (one RDD partition per Kafka partition, so processing parallelism follows the topic's partitioning).
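A sketch of the direct approach, assuming the spark-streaming-kafka 0.8 connector from the Spark 1.x era of this post; the broker address and topic name are placeholders:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("DirectStreamDemo")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // Direct approach: no receiver, so no long-running thread is tied up,
    // and each Kafka partition maps to one RDD partition.
    val kafkaParams = Map[String, String]("metadata.broker.list" -> "localhost:9092")
    val topics      = Set("test") // placeholder topic

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    stream.map(_._2).print() // (key, value) pairs; print the values

    ssc.start()
    ssc.awaitTermination()
  }
}
```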
That's all for today...
 
   
Original post: https://www.cnblogs.com/yimapingchuan/p/5381696.html