Spark Quick Start (Part 1)


A simple example:

Code:

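from pyspark.sql import SparkSession

# Raw string so the backslashes in the Windows path are kept literally
logFile = r"G:\spark\Spark\spark-2.2.0-bin-hadoop2.7\README.md"
# appName sets the application name; master('local[2]') runs Spark locally with 2 threads
spark = SparkSession.builder.appName('hello').master('local[2]').getOrCreate()
# Read the file as a DataFrame with one row per line (column "value") and cache it
logData = spark.read.text(logFile).cache()
# Count the lines that contain the letter 'a'
numAs = logData.filter(logData.value.contains('a')).count()
print(numAs)
# 61
# Count the lines that contain the letter 'b'
numBs = logData.filter(logData.value.contains('b')).count()
print(numBs)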
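
Note that the two counts are numbers of lines, not numbers of letters: a line with several matching letters is still counted once. The following is a minimal plain-Python sketch of the same logic, without Spark. The function name count_lines_containing and the sample lines are hypothetical and only stand in for the real contents of README.md.

```python
def count_lines_containing(lines, needle):
    """Count the lines that contain `needle` at least once (case-sensitive)."""
    return sum(1 for line in lines if needle in line)


# Hypothetical sample lines; they are NOT the real contents of README.md.
sample_lines = [
    "Apache Spark",
    "built on top",
    "banana bread",  # several 'a' characters, but still only one line
    "data",
    "",
]

num_as = count_lines_containing(sample_lines, "a")
num_bs = count_lines_containing(sample_lines, "b")
print(num_as, num_bs)
```

In the Spark version, filter(...) keeps the matching rows and count() returns how many rows remain, which is what the generator expression inside sum(...) does here.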



While the application is running, you can open the Spark UI. Its default address is localhost:4040.

Original article: https://www.cnblogs.com/BigStupid/p/8399222.html