Accessing Greenplum with python3 + greenplum-spark-connector

I downloaded greenplum-spark-connector today.
The official introduction (https://greenplum.cn/2020/03/27/greenplum-spark-connector/) accesses Greenplum from Java.

I tried it from Python and it works as well:

import os
from pyspark.sql import SparkSession
# Set the Spark home and the Python interpreter to run with; both can also be configured in the environment
os.environ["SPARK_HOME"] = "/opt/cloudera/parcels/CDH/lib/spark" #-6.2.0-1.cdh6.2.0.p0.967373
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"

spark = SparkSession.builder.appName('local').getOrCreate()
# At this point the greenplum-spark_2.11-1.6.2.jar driver must already be in /opt/cloudera/parcels/CDH/lib/spark/jars on every node
url = 'jdbc:postgresql://192.168.1.214:5432/xxgl'
table = 'users'
properties = {"user":"gpadmin","password":"11111111"}
df = spark.read.jdbc(url, table, properties=properties)
df.show()
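
One thing worth noting: spark.read.jdbc is Spark's generic JDBC reader, so the snippet above talks to Greenplum through a plain PostgreSQL-style JDBC connection. As far as I understand the connector documentation, its own data source is selected with format("greenplum") and takes options such as url, dbschema, dbtable, user, password and partitionColumn. I have not verified this against 1.6.2, so treat the following as a rough sketch under those assumptions. The helper names greenplum_options and read_greenplum_table, the "public" schema and the "id" partition column are all made up for illustration.

```python
def greenplum_options(host, port, database, table, user, password,
                      schema="public", partition_column=None):
    """Build the option dict for the connector's data source.

    The option names (url, dbschema, dbtable, user, password,
    partitionColumn) are assumed from the connector documentation.
    """
    options = {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbschema": schema,   # assumption: the table lives in this schema
        "dbtable": table,
        "user": user,
        "password": password,
    }
    if partition_column is not None:
        # Hypothetical column the connector would use to split the read
        options["partitionColumn"] = partition_column
    return options


def read_greenplum_table(spark, options):
    """Load a table via the data source short name "greenplum" (assumed)."""
    return spark.read.format("greenplum").options(**options).load()
```

With the spark session created above, usage would be along the lines of read_greenplum_table(spark, greenplum_options("192.168.1.214", 5432, "xxgl", "users", "gpadmin", "11111111", partition_column="id")).show(), where "id" stands in for whatever column is suitable for partitioning in your table.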

Original article: https://www.cnblogs.com/zsfishman/p/12587242.html