dbt-snowflake-datops-materilizations

This repo contains the materializations for the Data Engineers DataOps Framework.

It provides the following materializations for Snowflake:

  • File Formats
  • Stages
  • Stored Procedures
  • Tasks
  • Streams
  • Tables
  • Materialized Views
  • User Defined Functions
  • Generic

It also adds the ability to auto-create raw tables based on the model yml file (see Tables below).

Stored Procedures

Usage

{{ 
    config(materialized='stored_procedure',
    preferred_language = 'sql',
    override_name = 'SAMPLE_STORE_PROC',
    parameters = 'status varchar',
    return_type = 'NUMBER(38, 0)')
}}
| property | description | required | default |
| -------- | ----------- | -------- | ------- |
| materialized | specifies the type of materialisation to run | yes | stored_procedure |
| preferred_language | describes the language the stored procedure is written in | no | sql |
| override_name | specifies the name of the stored procedure if this is an override stored procedure | no | model['alias'] |
| parameters | specifies the parameters that need to be passed when calling the stored procedure | no | |
| return_type | specifies the stored procedure return type | no | varchar |
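
Example

A sketch of a complete stored procedure model is shown below. The table and column names are hypothetical, and it assumes the SQL under the config block becomes the body of the generated CREATE PROCEDURE statement.

{{
    config(materialized='stored_procedure',
    preferred_language = 'sql',
    parameters = 'status varchar',
    return_type = 'NUMBER(38, 0)')
}}

-- hypothetical procedure body (Snowflake Scripting); raw_orders and order_status are illustrative names
declare
    order_count number(38, 0);
begin
    select count(*) into :order_count from raw_orders where order_status = :status;
    return order_count;
end;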

File Formats

Usage

{{
    config(materialized='file_format')
}}
| property | description | required | default |
| -------- | ----------- | -------- | ------- |
| materialized | specifies the type of materialisation to run | yes | file_format |
| preferred_language | describes the language the function is written in | no | sql |

View Snowflake create file format documentation for more information on the available options.

Example

{{ config(materialized='file_format') }}

    type = json
    null_if = ()
    compression = none
    ignore_utf8_errors = true

Tasks

Usage

{{ 
    config(materialized='task',
    is_serverless = true,
    warehouse_name_or_size = 'xsmall',
    schedule = 'using cron */2 6-20 * * * Pacific/Auckland',
    stream_name = 'stm_orders',
    is_enabled = false)
 }}
| property | description | required | default |
| -------- | ----------- | -------- | ------- |
| materialized | specifies the type of materialisation to run | yes | task |
| is_serverless | specifies if the warehouse should be serverless or dedicated | no | true |
| warehouse_name_or_size | specifies the warehouse size if serverless, otherwise the name of the warehouse to use | no | xsmall |
| schedule | specifies the schedule the task should run on, using a CRON expression | no* | |
| task_after | specifies the task which this task should run after | no* | |
| stream_name | specifies the stream that must have data available for the task to run | no | |
| is_enabled | specifies if the task should be enabled or disabled at the end of the run | yes | true |
  • * only one of schedule or task_after is required.
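
Example

A sketch of a task model that processes new rows from a stream every 30 minutes. The target table and cron expression are hypothetical, and it assumes the SQL under the config block becomes the statement the generated task executes.

{{
    config(materialized='task',
    warehouse_name_or_size = 'xsmall',
    schedule = 'using cron */30 * * * * UTC',
    stream_name = 'stm_orders',
    is_enabled = false)
}}

-- hypothetical task body; orders_history is an illustrative target table
insert into orders_history
select * from {{ ref('stm_orders') }}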

Streams

Usage

{{
    config(materialized='stream',
    source_schema='sales',
    source_model='raw_orders')
}}
| property | description | required | default |
| -------- | ----------- | -------- | ------- |
| materialized | specifies the type of materialisation to run | yes | stream |
| source_schema | specifies the source table or view schema if different to the current location | yes | |
| source_model | specifies the source table or view model name to add the stream to | yes | |
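
Once the stream model exists it can be referenced like any other model. A hypothetical downstream model reading the change records might look like:

-- column names are illustrative; metadata$action is the standard Snowflake stream column
select
    order_id,
    metadata$action as change_action
from {{ ref('stm_orders') }}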

Tables

Usage

    tables:
      - name: raw_customers
        description: Customer Information
        external:
          auto_create_table: true
| property | description | required | default |
| -------- | ----------- | -------- | ------- |
| auto_create_table | specifies if the table should be maintained by dbt or not | yes | false |
  • it's recommended that a separate stream object is created instead of setting up the stream via the table object, as a stream created via this method doesn't appear on the DAG and can't be referenced using the ref macro.
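
Putting this together, a fuller sources yml might look like the following sketch. The source, table, and column names are illustrative, and it assumes the column definitions with data types drive the generated DDL.

    sources:
      - name: sales
        schema: sales
        tables:
          - name: raw_customers
            description: Customer Information
            external:
              auto_create_table: true
            columns:
              - name: customer_id
                data_type: number
              - name: customer_name
                data_type: varchar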

To action the auto-creation of the tables, you need to add the following on-run-start hook to your dbt_project.yml:

on-run-start:
  - "{{ dbt_dataengineers_materilizations.stage_table_sources() }}"

Stages

Usage

{{
    config(materialized='stage')
}}
| property | description | required | default |
| -------- | ----------- | -------- | ------- |
| materialized | specifies the type of materialisation to run | yes | stage |

View Snowflake create stage documentation for more information on the available options.

To action the auto-creation of the stages before the tables get created, you need to add the following on-run-start hook before the stage_table_sources hook:

on-run-start:
  - "{{ dbt_dataengineers_materilizations.stage_stages() }}"

Storage integrations need to be maintained separately, as the role used to set them up requires the CREATE INTEGRATION privilege and they are global to the Snowflake account rather than per database.

Example

{{ config(materialized='stage') }}

{% if target.name == 'prod' %}
  url='azure://xxxxxxprod.blob.core.windows.net/external-tables'
{% elif target.name == 'test' %}
  url='azure://xxxxxxtest.blob.core.windows.net/external-tables'
{% elif target.name == 'dev' %}
  url='azure://xxxxxxdev.blob.core.windows.net/external-tables'
{% else %}
  url='azure://xxxxxxsandbox.blob.core.windows.net/external-tables'
{% endif %} 
  storage_integration = DATAOPS_TEMPLATE_EXTERNAL

Generic

Usage

{{
    config(materialized='generic')
}}
| property | description | required | default |
| -------- | ----------- | -------- | ------- |
| materialized | specifies the type of materialisation to run | yes | generic |

Example

{{ config(materialized='generic') }}

CREATE OR REPLACE api integration SnowWatch_Prod_API
    api_provider = azure_api_management
    azure_tenant_id = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
    azure_ad_application_id = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
    api_allowed_prefixes = ('https://de-snowwatch-prod-ae-apim.azure-api.net')
    API_KEY = 'xxxxxxxxxxxxxxxxxxxx'
    enabled = true; 

User Defined Functions

A user defined function can be written in a number of different languages. The following languages are supported:

SQL

To create a user defined function using SQL, you need to add the following config to the top of your model:

{{ 
    config(materialized='user_defined_function',
    preferred_language = 'sql',
    is_secure = false,
    immutable = false,
    return_type = 'float')
}}
| property | description | required | default |
| -------- | ----------- | -------- | ------- |
| materialized | specifies the type of materialisation to run | yes | user_defined_function |
| preferred_language | specifies the language for the UDF | no | sql |
| is_secure | specifies whether the function is secure | no | false |
| immutable | specifies whether the function is immutable or volatile | no | false |
| return_type | specifies the data type of the return value | yes | |
| parameters | specifies the parameters for the function | no | |
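
Example

A sketch of a complete SQL UDF model. The parameter names and expression are hypothetical, and it assumes the SQL under the config block becomes the function body.

{{
    config(materialized='user_defined_function',
    preferred_language = 'sql',
    parameters = 'amount float, tax_rate float',
    return_type = 'float')
}}

-- hypothetical function body: a single SQL expression
amount * (1 + tax_rate)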

Parameters

Parameters are placed into the template with no parsing. To include multiple parameters, use the syntax:

{{ 
    config(materialized='user_defined_function',
    preferred_language = 'sql',
    is_secure = false,
    immutable = false,
    return_type = 'float',
    parameters = 'first int, next float, last varchar')
}}

In other words: provide all parameters as a single string enclosed in quotes, using the same format as you would in native SQL.

Javascript

To create a user defined function using Javascript, you need to add the following config to the top of your model:

{{ 
    config(materialized='user_defined_function',
    preferred_language = 'javascript',
    is_secure = True,
    immutable = false,
    return_type = 'float')
}}
| property | description | required | default |
| -------- | ----------- | -------- | ------- |
| materialized | specifies the type of materialisation to run | yes | user_defined_function |
| preferred_language | specifies the language for the UDF | yes | javascript |
| is_secure | specifies whether the function is secure | no | false |
| immutable | specifies whether the function is immutable or volatile | no | false |
| return_type | specifies the data type of the return value | yes | |
| parameters | specifies the parameters for the function | no | |
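
Example

A sketch of a complete Javascript UDF model. The parameter name and logic are hypothetical, and it assumes the code under the config block becomes the JavaScript function body. Note that Snowflake exposes JavaScript UDF arguments as upper-case variables inside the body.

{{
    config(materialized='user_defined_function',
    preferred_language = 'javascript',
    parameters = 'input_value float',
    return_type = 'float')
}}

// hypothetical body; arguments are referenced in upper case
return INPUT_VALUE * 2;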

Java

To create a user defined function using Java, you need to add the following config to the top of your model:

{{ 
    config(materialized='user_defined_function',
    preferred_language = 'java',
    is_secure = false,
    handler_name = "'testfunction.echoVarchar'",
    target_path = "'@~/testfunction.jar'",
    return_type = 'varchar',
    parameters = 'my_string varchar')
}}
| property | description | required | default |
| -------- | ----------- | -------- | ------- |
| materialized | specifies the type of materialisation to run | yes | user_defined_function |
| preferred_language | specifies the language for the UDF | yes | java |
| is_secure | specifies whether the function is secure | no | false |
| immutable | specifies whether the function is immutable or volatile | no | false |
| handler_name | specifies the combination of class and function name | yes | |
| target_path | specifies the path for the jar file | yes | |
| return_type | specifies the data type of the return value | yes | |
| parameters | specifies the parameters for the function | no | |
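
Example

A sketch of a complete Java UDF model. It assumes the code under the config block becomes the inline Java source, and that the class and method names must match handler_name.

{{
    config(materialized='user_defined_function',
    preferred_language = 'java',
    handler_name = "'testfunction.echoVarchar'",
    target_path = "'@~/testfunction.jar'",
    return_type = 'varchar',
    parameters = 'my_string varchar')
}}

// hypothetical inline handler matching handler_name above
class testfunction {
    public static String echoVarchar(String my_string) {
        return my_string;
    }
}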

Python

To create a user defined function using Python, you need to add the following config to the top of your model:

{{ 
    config(materialized='user_defined_function',
    preferred_language = 'python',
    is_secure= false,
    immutable=false,
    runtime_version = '3.8',
    packages = "('numpy','pandas','xgboost==1.5.0')",
    handler_name = 'udf',
    return_type = 'variant')
}}
| property | description | required | default |
| -------- | ----------- | -------- | ------- |
| materialized | specifies the type of materialisation to run | yes | user_defined_function |
| preferred_language | specifies the language for the UDF | yes | python |
| is_secure | specifies whether the function is secure | no | false |
| immutable | specifies whether the function is immutable or volatile | no | false |
| return_type | specifies the data type of the return value | yes | |
| parameters | specifies the parameters for the function | no | |
| runtime_version | specifies the version of Python to use | yes | |
| packages | specifies the packages required by the Python function | yes | |
| handler_name | specifies the handler name for the function | yes | |
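
Example

A sketch of a complete Python UDF model. The parameter name and handler logic are hypothetical, and it assumes the code under the config block becomes the Python source, with the function named to match handler_name.

{{
    config(materialized='user_defined_function',
    preferred_language = 'python',
    runtime_version = '3.8',
    packages = "('pandas')",
    handler_name = 'udf',
    parameters = 'order_payload variant',
    return_type = 'variant')
}}

# hypothetical handler; the function name matches handler_name
import pandas as pd

def udf(order_payload):
    # flatten the incoming VARIANT and return a dict, which Snowflake serialises back to VARIANT
    return pd.json_normalize(order_payload).to_dict(orient='records')[0]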

Materialized View

To create a Materialized View, you need to add the following config to the top of your model:

{{ 
    config(materialized='materialized_view',
    secure = false,
    cluster_by="<<your list of fields>>",
    automatic_clustering = false)
}}
| property | description | required | default |
| -------- | ----------- | -------- | ------- |
| materialized | specifies the type of materialisation to run | yes | materialized_view |
| secure | specifies that the view is secure | no | false |
| cluster_by | specifies an expression on which to cluster the materialized view | no | none |
| automatic_clustering | specifies if reclustering of the materialized view is automatically resumed | no | false |

Supported model configs: secure, cluster_by, automatic_clustering, persist_docs (relation only)
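
Example

A materialized view model is otherwise a normal select statement. A hypothetical example (table and column names are illustrative):

{{
    config(materialized='materialized_view',
    secure = false,
    cluster_by = 'order_date',
    automatic_clustering = false)
}}

select
    order_date,
    count(*) as order_count
from {{ ref('raw_orders') }}
group by order_date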

Snowflake Documentation for Materialized Views

❗ Note: Snowflake MVs are only enabled on enterprise accounts

❗ Although Snowflake does not have drop ... cascade, if the base table of an MV is dropped and recreated, the MV also needs to be dropped and recreated, otherwise the following error will appear:

Failure during expansion of view 'TEST_MV': SQL compilation error: Materialized View TEST_MV is invalid.
