Trying out an Apache Spark deployment on Kubernetes

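Spark is a solid platform that supports RDD-based analysis, stream processing, machine learning, and more.
The following describes how to deploy it on Kubernetes, along with a few things to watch out for.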

The container images are prebuilt by others (the bde2020 images from the big-data-europe/docker-spark project).

Deployment YAML file

deploy-k8s.yaml

apiVersion: extensions/v1beta1
kind: Deployment
metadata:  
  name: spark-master
  namespace: big-data
  labels:
    app: spark-master
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: spark-master
    spec:
      containers:
      - name: spark-master
        image: bde2020/spark-master:2.3.1-hadoop2.7
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 7077
        - containerPort: 8080
        env:
        - name: ENABLE_INIT_DAEMON
          value: "false"
        - name: SPARK_MASTER_PORT
          value: "7077"

---

apiVersion: v1
kind: Service
metadata:
  name: spark-master-service
  namespace: big-data
spec:
  type: NodePort
  ports:
    - port: 7077
      targetPort: 7077
      protocol: TCP
      name: master
  selector:
    app: spark-master

---


apiVersion: v1
kind: Service
metadata:
  name: spark-webui-service
  namespace: big-data
spec:
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
      name: ui
  selector:
    app: spark-master
  type: NodePort


---

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: spark-webui-ingress
  namespace: big-data
spec:
  rules:
  - host: spark-webui.data.com
    http:
      paths:
      - backend:
          serviceName: spark-webui-service
          servicePort: 8080
        path: /

---


apiVersion: extensions/v1beta1
kind: Deployment
metadata:  
  name: spark-worker
  namespace: big-data
  labels:
    app: spark-worker
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: spark-worker
    spec:
      containers:
      - name: spark-worker
        image: bde2020/spark-worker:2.3.1-hadoop2.7
        imagePullPolicy: IfNotPresent
        env:
        - name: SPARK_MASTER
          value: spark://spark-master-service:7077
        - name: ENABLE_INIT_DAEMON
          value: "false"
        - name: SPARK_WORKER_WEBUI_PORT
          value: "8081"
        ports:
        - containerPort: 8081

---

apiVersion: v1
kind: Service
metadata:
  name: spark-worker-service
  namespace: big-data
spec:
  type: NodePort
  ports:
    - port: 8081
      targetPort: 8081
      protocol: TCP
      name: worker
  selector:
    app: spark-worker

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: spark-worker-ingress
  namespace: big-data
spec:
  rules:
  - host: spark-worker.data.com
    http:
      paths:
      - backend:
          serviceName: spark-worker-service
          servicePort: 8081
        path: /

Deploy and run

  • Deploy
kubectl apply -f deploy-k8s.yaml
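  • Result

    The Spark master web UI is reachable through the Ingress at the host name spark-webui.data.com (the name has to resolve to the Ingress controller, for example via a hosts-file entry).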
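
The deploy step above can be wrapped in a small script. This is only a sketch: it assumes the big-data namespace may not exist yet and creates it first (the manifests reference it but do not define it), then waits for both Deployments to roll out. The function name deploy_spark and the KUBECTL override variable are made up for this example; setting KUBECTL=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: deploy the manifests and wait for the master and worker to come up.
# deploy_spark and the KUBECTL override are illustrative names, not part of
# the original setup.
deploy_spark() {
  kubectl_bin="${KUBECTL:-kubectl}"

  # The manifests use the big-data namespace but do not create it.
  # Any error here (typically "already exists") is ignored.
  "$kubectl_bin" create namespace big-data 2>/dev/null || true

  "$kubectl_bin" apply -f deploy-k8s.yaml || return 1

  # Block until each Deployment reports a successful rollout.
  "$kubectl_bin" -n big-data rollout status deployment/spark-master || return 1
  "$kubectl_bin" -n big-data rollout status deployment/spark-worker || return 1

  # List what was created.
  "$kubectl_bin" -n big-data get pods,svc,ingress
}
```

Calling deploy_spark runs the whole sequence; KUBECTL=echo deploy_spark is a dry run that only prints the kubectl arguments.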


Notes

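  • Naming
My usual habit is to give a Deployment and its Service the same name, but that causes a problem here: by default Kubernetes injects environment variables for every Service into the pods, and one of them collides with a variable the image already defines.
A Service named spark-master makes Kubernetes inject SPARK_MASTER_PORT=tcp://<cluster-ip>:7077, which replaces the plain port number the image expects.
Solution: rename the Service (spark-master-service in the manifest above) and redeploy.
The variable in question is set in the image's Dockerfile:
ENV SPARK_MASTER_PORT 7077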
  • Running Spark jobs
For running jobs, refer to the official demos; I will add examples here later.
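
To make the naming conflict above concrete: for each Service, Kubernetes injects Docker-links-style variables whose names are the Service name in upper case with dashes turned into underscores, followed by suffixes such as _PORT. The sketch below only reproduces that naming rule for the _PORT variable; svc_port_var is a hypothetical helper written for this note, not a Kubernetes tool.

```shell
#!/bin/sh
# svc_port_var is a hypothetical helper that mimics how Kubernetes names the
# <SERVICE>_PORT variable it injects for a Service: upper-case the Service
# name and replace dashes with underscores.
svc_port_var() {
  printf '%s_PORT\n' "$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')"
}

# Service named like the Deployment: the result matches the variable name
# that the image's Dockerfile already sets.
svc_port_var spark-master

# Renamed Service from deploy-k8s.yaml: the result is a different name.
svc_port_var spark-master-service
```

On a running cluster, kubectl -n big-data exec <pod> -- env lists the variables that were actually injected into a pod (<pod> stands for a real pod name).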

References

https://github.com/rongfengliang/spark-k8s-deploy
https://github.com/big-data-europe/docker-spark

Original post: https://www.cnblogs.com/rongfengliang/p/9560329.html