Spark SQL Performance Tuning

Main user-level tuning techniques:

https://www.jianshu.com/p/048aa1cac43c

Resource tuning:

https://www.jianshu.com/p/c853997ea1f6

Parameter tuning:

https://blog.csdn.net/yuanbingze/article/details/97368552

Data locality:

https://blog.csdn.net/zy_zhengyang/article/details/78714346

Speculative execution:

https://blog.csdn.net/wangpei1949/article/details/88927332
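
Speculative execution is controlled by a small group of `spark.speculation.*` settings. A minimal sketch of how they might be passed to `spark-submit`: the values are illustrative rather than recommendations, and `to_submit_args` is a hypothetical helper, not part of Spark.

```python
# Illustrative speculative-execution settings (values are assumptions,
# tune them per workload). spark.speculation is off unless enabled.
SPECULATION_CONF = {
    "spark.speculation": "true",            # re-launch suspiciously slow tasks
    "spark.speculation.interval": "100ms",  # how often to check for slow tasks
    "spark.speculation.multiplier": "1.5",  # "slow" = this many times the median
    "spark.speculation.quantile": "0.75",   # fraction of tasks done before checking
}


def to_submit_args(conf):
    """Flatten a config dict into ['--conf', 'key=value', ...].

    Hypothetical helper for building a spark-submit command line.
    """
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Build the full command; "app.py" is a placeholder application file.
command = ["spark-submit"] + to_submit_args(SPECULATION_CONF) + ["app.py"]
```

Joining `command` with spaces gives a command line that can be pasted into a shell; the same keys can also go into `spark-defaults.conf`.
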

Tuning tips from the official documentation:

https://spark.apache.org/docs/3.0.0-preview/sql-performance-tuning.html  

https://spark.apache.org/docs/3.0.0-preview/tuning.html

Databricks video:

https://databricks.com/session/scalable-monitoring-using-prometheus-with-apache-spark-clusters

Books on tuning:

https://github.com/vaquarkhan/vaquarkhan/blob/master/high-performance-spark.pdf

Spark memory architecture and management:

https://www.jianshu.com/p/02fca6460c37

https://www.jianshu.com/p/395fc098eedf

Spark performance analysis, REST API:

https://spark.apache.org/docs/latest/monitoring.html#rest-api
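
The monitoring REST API serves JSON under `/api/v1`, typically at `http://localhost:4040` for a running application or on port 18080 for the history server. A minimal sketch of pulling the stage list and ranking stages by executor run time: the helper names are hypothetical, and the sketch assumes the stage objects carry `stageId`, `name` and `executorRunTime` fields.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def stages_url(base_url, app_id):
    """Build the /api/v1 stages endpoint URL for one application."""
    return f"{base_url.rstrip('/')}/api/v1/applications/{quote(app_id, safe='')}/stages"


def fetch_stages(base_url, app_id, timeout=10):
    """Download the stage list as parsed JSON (needs a reachable Spark UI)."""
    with urlopen(stages_url(base_url, app_id), timeout=timeout) as resp:
        return json.load(resp)


def slowest_stages(stages, top_n=3):
    """Return (stageId, name, executorRunTime) for the longest-running stages."""
    ranked = sorted(stages, key=lambda s: s.get("executorRunTime", 0), reverse=True)
    return [
        (s["stageId"], s.get("name", ""), s.get("executorRunTime", 0))
        for s in ranked[:top_n]
    ]
```

With a live application, `slowest_stages(fetch_stages("http://localhost:4040", app_id))` gives a quick first look at where the time goes; the application id itself can be listed from `/api/v1/applications`.
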

Spark performance analysis, monitoring tools:

https://github.com/netdata/netdata/issues/4853

Spark basics:

https://zhuanlan.zhihu.com/p/76518708

https://www.jianshu.com/p/330ec1347423 (cluster architecture)



# Deep Dive into Spark SQL with Advanced Performance Tuning
Refer to `https://databricks.com/session/scalable-monitoring-using-prometheus-with-apache-spark-clusters`


---
### Topics covered in this video
- API selection
- Optimizing the metadata catalog
- Cache manager
- Whole-stage code generation
- Data sources (e.g. vectorized Parquet reads)
- Partitioning and bucketing to avoid shuffles (http://dbricks.co/2oG6ZBL)

---
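Bucketing avoids a shuffle when both sides of a join are bucketed on the join key with the same number of buckets. A minimal sketch that builds the Spark SQL DDL for such a table: `bucketed_table_ddl` is a hypothetical helper, and the table and column names are made up for illustration.

```python
def bucketed_table_ddl(table, columns, bucket_col, num_buckets, fmt="parquet"):
    """Build a CREATE TABLE statement for a bucketed Spark SQL table.

    columns is a list of (name, type) pairs; bucket_col must be one of them.
    """
    names = [name for name, _ in columns]
    if bucket_col not in names:
        raise ValueError(f"bucket column {bucket_col!r} is not in the schema")
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE {table} ({cols}) USING {fmt} "
        f"CLUSTERED BY ({bucket_col}) INTO {num_buckets} BUCKETS"
    )


# Hypothetical tables: both are bucketed on the join key with the same
# bucket count, so a join on customer_id can skip the shuffle.
orders_ddl = bucketed_table_ddl(
    "orders", [("order_id", "BIGINT"), ("customer_id", "BIGINT")], "customer_id", 16
)
customers_ddl = bucketed_table_ddl(
    "customers", [("customer_id", "BIGINT"), ("name", "STRING")], "customer_id", 16
)
```

Each string can be passed to `spark.sql(...)` in a session with a persistent catalog; the bucket count of 16 is arbitrary.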

### Databricks official optimization plan

- Catalyst optimization phase (https://databricks.com/glossary/catalyst-optimizer https://databricks.com/session/a-deep-dive-into-the-catalyst-optimizer https://databricks.com/session/a-deep-dive-into-the-catalyst-optimizer-hands-on-lab?utm_campaign=Spark%20Summit%20EU%202016&utm_content=34985851&utm_medium=social&utm_source=twitter)
  - Rules
  - Strategies (e.g. using hints)
  - etc.
- Tungsten execution phase (https://databricks.com/glossary/tungsten)
  - Memory management and binary processing
  - Cache-aware computation
  - Code generation
    - No virtual function dispatches
    - Intermediate data in memory vs CPU registers
    - Loop unrolling and SIMD
---
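The "use HINT" item above refers to join strategy hints, which nudge the planner toward a particular physical join. A minimal sketch that builds a hinted query string: `join_with_hint` is a hypothetical helper, the table names are invented, and the hint names listed are the join hints documented for Spark 3.0.

```python
# Join strategy hints documented for Spark 3.0.
JOIN_HINTS = {"BROADCAST", "MERGE", "SHUFFLE_HASH", "SHUFFLE_REPLICATE_NL"}


def join_with_hint(hint, left, right, on, hinted_alias):
    """Build a SELECT that joins two aliased tables with a join strategy hint.

    left and right are "table alias" strings; hinted_alias names the
    relation the hint applies to.
    """
    hint = hint.upper()
    if hint not in JOIN_HINTS:
        raise ValueError(f"unsupported join hint: {hint}")
    return f"SELECT /*+ {hint}({hinted_alias}) */ * FROM {left} JOIN {right} ON {on}"


# Hypothetical example: ask the planner to broadcast the small dimension table.
query = join_with_hint(
    "broadcast", "orders o", "customers c", "o.customer_id = c.customer_id", "c"
)
```

A hint is only a request: the planner may not honor it if the chosen strategy does not support the join type.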

Original post: https://www.cnblogs.com/mangoczp/p/12518030.html