Using dbt seed and ephemeral base models

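Seeds are a convenient way to load data: small amounts of static data, as well as test data, can be imported from CSV files.
The base models are set to ephemeral (transient), which is also what the official best practices recommend.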

The GitLab data this project depends on can be set up by following https://github.com/rongfengliang/graphql-engine-gitlab
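
As a rough sketch of the seed workflow described above, the snippet below creates a small CSV under `data/` (the directory named by `data-paths` in the project config). The file name `gitlab_visibility_levels.csv` and its columns are illustrative assumptions, not part of the original project; the values mirror GitLab's usual visibility levels.

```shell
# Hypothetical seed file: the name and columns are assumptions for illustration.
mkdir -p data
cat > data/gitlab_visibility_levels.csv <<'EOF'
visibility_level,visibility_name
0,private
10,internal
20,public
EOF

# Show what would be loaded; dbt names the seed table after the file.
cat data/gitlab_visibility_levels.csv
```

Running `dbt seed --show` inside the project should then load the file into a table called `gitlab_visibility_levels`, which models can reference with `{{ ref('gitlab_visibility_levels') }}`.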

Example project

  • Initialize
dbt init gitlab-data
  • Configure the project (dbt_project.yml)
# Name your package! Package names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'gitlab'
version: '1.0'

# This setting configures which "profile" dbt uses for this project. Profiles contain
# database connection information, and should be configured in the ~/.dbt/profiles.yml file
profile: 'default'

# These configurations specify where dbt should look for different types of files.
# The `source-paths` config, for example, states that source models can be found
# in the "models/" directory. You probably won't need to change these!
source-paths: ["models"]
analysis-paths: ["analysis"] 
test-paths: ["tests"]
data-paths: ["data"] # seed data (CSV files) goes here
macro-paths: ["macros"]

target-path: "target" # directory which will store compiled SQL files
clean-targets: # directories to be removed by `dbt clean`
    - "target"
    - "dbt_modules"

# You can define configurations for models in the `source-paths` directory here.
# Using these configurations, you can enable or disable models, change how they
# are materialized, and more!

# In this config, we tell dbt to build all models in the gitlab/base/ directory
# as ephemeral. Models without an explicit setting are built as views (the default).
models:
  gitlab:
    gitlab:
      base:
        materialized: ephemeral  # base models are best configured as ephemeral
  • Add models
models/gitlab/base/gitlab_projectinfo.sql:
select * from projects

models/gitlab/transform/gitlab_project_counts.sql:
select * from {{ref('gitlab_projectinfo')}}
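
The model step above can be sketched as a few shell commands that create both files. The paths assume the models live under `models/`, matching `source-paths` in the project config.

```shell
# Create the directory layout that the `models: gitlab: gitlab: base:` config targets.
mkdir -p models/gitlab/base models/gitlab/transform

# Base model: configured as ephemeral, so dbt builds no view or table for it.
cat > models/gitlab/base/gitlab_projectinfo.sql <<'EOF'
select * from projects
EOF

# Transform model: depends on the base model through ref().
cat > models/gitlab/transform/gitlab_project_counts.sql <<'EOF'
select * from {{ ref('gitlab_projectinfo') }}
EOF
```

Because the base model is ephemeral, dbt should inline it into `gitlab_project_counts` as a common table expression when compiling, roughly `with __dbt__CTE__gitlab_projectinfo as (select * from projects) select * from __dbt__CTE__gitlab_projectinfo`. The exact CTE name varies between dbt versions; `dbt compile` writes the generated SQL under `target/` if you want to check it.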

Profile configuration

~/.dbt/profiles.yml
default:
  target: dev
  outputs:
    dev:
      type: postgres
      host: 127.0.0.1
      user: postgres
      pass: password
      port: 5432
      dbname: gitlabhq_production
      schema: public
      threads: 3
pg:
  target: dev
  outputs:
    dev:
      type: postgres
      host: 127.0.0.1
      user: postgres
      pass: password
      port: 5433
      dbname: gitlabhq_production
      schema: public
      threads: 3
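
The `profile` value in dbt_project.yml has to match one of the top-level keys in profiles.yml (`default` or `pg` here). A minimal sketch of a check for that follows; `profile_exists` is a hypothetical helper, not a dbt command, and it assumes the simple one-line `profile: 'name'` form used in this project.

```shell
# Hypothetical helper: succeed if the profile named in a dbt_project.yml
# appears as a top-level key in a profiles.yml.
profile_exists() {
  project_file="$1"
  profiles_file="$2"
  # Take the value of the top-level `profile:` key and strip quotes and whitespace.
  name=$(sed -n 's/^profile:[[:space:]]*//p' "$project_file" | head -n 1 | tr -d "'\"[:space:]")
  # A profile is an unindented `name:` line in profiles.yml.
  [ -n "$name" ] && grep -q "^${name}:" "$profiles_file"
}

# Run the check only when both files are present.
if [ -f dbt_project.yml ] && [ -f "$HOME/.dbt/profiles.yml" ]; then
  profile_exists dbt_project.yml "$HOME/.dbt/profiles.yml" && echo "profile found"
fi
```

To use the second profile (the database on port 5433) without editing dbt_project.yml, the profile can be selected on the command line, for example `dbt run --profile pg`.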

Run, test, and generate docs

  • Run
dbt run && dbt seed --show && dbt docs generate && dbt docs serve


References

https://github.com/rongfengliang/graphql-engine-gitlab
https://docs.getdbt.com/docs/configuring-models
https://docs.getdbt.com/docs/best-practices
https://docs.getdbt.com/reference#seed

Original post: https://www.cnblogs.com/rongfengliang/p/9828675.html