Caching (Cache)

1. Spark's cache (storage) levels are defined in [org.apache.spark.storage.StorageLevel.scala]:
  new StorageLevel(_useDisk, _useMemory, _useOffHeap, _deserialized, _replication: Int = 1)
    val NONE = new StorageLevel(false, false, false, false)
    val DISK_ONLY = new StorageLevel(true, false, false, false)
    val DISK_ONLY_2 = new StorageLevel(true, false, false, false, 2)
    val MEMORY_ONLY = new StorageLevel(false, true, false, true)
    val MEMORY_ONLY_2 = new StorageLevel(false, true, false, true, 2)
    val MEMORY_ONLY_SER = new StorageLevel(false, true, false, false)
    val MEMORY_ONLY_SER_2 = new StorageLevel(false, true, false, false, 2)
    val MEMORY_AND_DISK = new StorageLevel(true, true, false, true)
    val MEMORY_AND_DISK_2 = new StorageLevel(true, true, false, true, 2)
    val MEMORY_AND_DISK_SER = new StorageLevel(true, true, false, false)
    val MEMORY_AND_DISK_SER_2 = new StorageLevel(true, true, false, false, 2)
    val OFF_HEAP = new StorageLevel(true, true, true, false, 1)
  Default cache level: def persist(): this.type = persist(StorageLevel.MEMORY_ONLY)
  That is, by default persist() caches the data as deserialized objects in the JVM heap.
  To remove a cached RDD, call RDD.unpersist().
  To set an explicit storage level, call e.g. RDD.persist(StorageLevel.MEMORY_AND_DISK).
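The persist/unpersist calls above can be sketched end to end. This is a minimal, hedged example assuming a local Spark session; the app name and RDD contents are illustrative, not from the original notes:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CacheDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session, just for demonstration
    val spark = SparkSession.builder()
      .appName("cache-demo")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val rdd = sc.parallelize(1 to 100000)

    // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY):
    // deserialized objects on the JVM heap
    rdd.cache()

    // Explicit level: keep in memory, spill remaining partitions to disk
    val doubled = rdd.map(_ * 2)
    doubled.persist(StorageLevel.MEMORY_AND_DISK)

    println(doubled.count()) // first action materializes the cache
    println(doubled.sum())   // subsequent actions reuse the cached partitions

    // Release the cached partitions when no longer needed
    doubled.unpersist()
    spark.stop()
  }
}
```

Note that a storage level must be set before the first action on the RDD; once an RDD is persisted, its level cannot be changed without calling unpersist() first.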

Original article: https://www.cnblogs.com/lyr999736/p/9562425.html