dbt Testing: Ensuring Data Quality and Reliability
Data tests
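Generic tests are declared in YAML and can be applied across different models. unique, not_null, accepted_values, and relationships are built in; other checks come from packages or custom tests:
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique    # Ensures column values are unique
          - not_null  # Ensures no null values
      - name: amount
        tests:
          # accepted_values only matches a fixed list of literal values, so it cannot express "> 0".
          # A range check is used instead; this assumes the dbt_utils package is installed.
          - dbt_utils.accepted_range:
              min_value: 0
              inclusive: false  # amount must be strictly greater than 0
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')  # Referential integrity test
              field: customer_id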
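The intent behind the amount check (amounts greater than zero) can also be written as a singular data test: a SQL file in the project's tests directory that selects the rows violating the rule, and passes when it returns no rows. The following is a minimal sketch, assuming the orders model exposes order_id and amount as in the YAML above; the file name is hypothetical.

```sql
-- tests/assert_order_amount_is_positive.sql (hypothetical file name)
-- A singular test passes when this query returns zero rows.
-- Each returned row is an order whose amount is not strictly positive.
select
    order_id,
    amount
from {{ ref('orders') }}
where amount <= 0
```

Rows with a null amount are not returned by this query, so pair it with a not_null test on amount if nulls should also fail.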
Unit tests
- Model under test (dim_customers)
with customers as (
    select * from {{ ref('stg_customers') }}
),
accepted_email_domains as (
    select * from {{ ref('top_level_email_domains') }}
),
check_valid_emails as (
    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customers.email,
        coalesce(
            regexp_like(
                customers.email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$'
            ) = true
            and accepted_email_domains.tld is not null,
            false
        ) as is_valid_email_address
    from customers
    left join accepted_email_domains
        on customers.email_top_level_domain = lower(accepted_email_domains.tld)
)
select * from check_valid_emails
- Unit test file, placed in src tests/test_is_valid...md
unit_tests:
  - name: test_is_valid_email_address
    description: "Check my is_valid_email_address logic captures all known edge cases - emails without ., emails without @, and emails from invalid domains."
    model: dim_customers
    given:
      - input: ref('stg_customers')
        rows:
          - {email: cool@example.com, email_top_level_domain: example.com}
          - {email: cool@unknown.com, email_top_level_domain: unknown.com}
          - {email: badgmail.com, email_top_level_domain: gmail.com}
          - {email: missingdot@gmailcom, email_top_level_domain: gmail.com}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
    expect:
      rows:
        - {email: cool@example.com, is_valid_email_address: true}
        - {email: cool@unknown.com, is_valid_email_address: false}
        - {email: badgmail.com, is_valid_email_address: false}
        - {email: missingdot@gmailcom, is_valid_email_address: false}
Run the dbt test command:
$ dbt test
Found 3 models, 2 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 0 seed files, 0 sources
17:31:05 | Concurrency: 1 threads (target='learn')
17:31:05 |
17:31:05 | 1 of 2 START test not_null_order_order_id..................... [RUN]
17:31:06 | 1 of 2 PASS not_null_order_order_id........................... [PASS in 0.99s]
17:31:06 | 2 of 2 START test unique_order_order_id....................... [RUN]
17:31:07 | 2 of 2 PASS unique_order_order_id............................. [PASS in 0.79s]
17:31:07 |
17:31:07 | Finished running 2 tests in 7.17s.
Completed successfully
Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2