Core Concepts
Data Quality Checks
While schema constraints validate individual column structures, quality checks validate the data itself — row counts, freshness, statistical distributions, and custom SQL logic. All checks are defined under the quality key in your contract.
Where Checks Live
contract.yml
dataset: user_signups
version: 1.0.0
schema:
  # ... field definitions ...
quality:
  - type: freshness
    column: created_at
    max_age: 24h
    severity: error
  - type: row_count
    min: 1000
    severity: warning
Built-in Checks
| Check | Description | Config example |
|---|---|---|
| freshness | Data must have arrived within a given time window. | max_age: 24h |
| row_count | Total row count must be within a min/max range. | min: 1000, max: 10000000 |
| completeness | Fraction of non-null values must be at least the threshold. | threshold: 0.99 |
| uniqueness | Fraction of unique values must be at least the threshold. | threshold: 1.0 |
| referential_integrity | All values must exist in a referenced table. | ref: dim_users.user_id |
| distribution | Column value distribution must stay within z-score bounds. | z_score: 3 |
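To make the table concrete, here is a minimal Python sketch of how four of these checks could be evaluated over in-memory data. It is not ContractHQ's implementation. The function names are hypothetical, the duration format (`m`, `h`, `d` suffixes) is inferred from the `24h` example, and treating thresholds as inclusive fractions between 0 and 1 is an assumption.

```python
from datetime import datetime, timedelta, timezone

def parse_max_age(text):
    """Parse a duration such as '24h' into a timedelta (assumed m/h/d suffixes)."""
    units = {"m": "minutes", "h": "hours", "d": "days"}
    value, unit = int(text[:-1]), text[-1]
    return timedelta(**{units[unit]: value})

def freshness_ok(timestamps, max_age, now):
    """Pass if the newest timestamp is no older than max_age."""
    if not timestamps:
        return False
    return now - max(timestamps) <= parse_max_age(max_age)

def row_count_ok(n_rows, min_rows=None, max_rows=None):
    """Pass if n_rows lies within the inclusive [min_rows, max_rows] range."""
    if min_rows is not None and n_rows < min_rows:
        return False
    if max_rows is not None and n_rows > max_rows:
        return False
    return True

def completeness(values):
    """Fraction of values that are not None (1.0 for an empty column)."""
    if not values:
        return 1.0
    return sum(v is not None for v in values) / len(values)

def uniqueness(values):
    """Fraction of non-null values that are distinct (1.0 if none are present)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 1.0
    return len(set(non_null)) / len(non_null)

# Illustrative data: the newest row is 6 hours old.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
stamps = [
    datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc),
]
fresh_24h = freshness_ok(stamps, "24h", now)
fresh_4h = freshness_ok(stamps, "4h", now)
```

A check would then pass when, for example, `completeness(column) >= 0.99` or `uniqueness(column) >= 1.0`.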
Severity Levels
Every check must declare a severity. This controls what happens when the check fails in CI:
error
Blocks the PR merge. The pipeline is considered failed.
warning
Adds a comment to the PR but does not block the merge.
info
Logged and visible in the ContractHQ dashboard only.
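The three severity levels can be sketched as a small triage step over check results. This is only an illustration of the rules above: `CheckResult` and `triage` are hypothetical names, and whether a failed error-level check also produces a PR comment is not specified here, so the sketch only records that the merge is blocked.

```python
from dataclasses import dataclass

@dataclass
class CheckResult:
    name: str
    severity: str  # "error", "warning", or "info"
    passed: bool

def triage(results):
    """Group failed checks by the action their severity implies."""
    outcome = {"blocked": False, "pr_comments": [], "dashboard_only": []}
    for result in results:
        if result.passed:
            continue  # passing checks trigger no action
        if result.severity == "error":
            outcome["blocked"] = True  # pipeline is considered failed
        elif result.severity == "warning":
            outcome["pr_comments"].append(result.name)  # comment, no block
        elif result.severity == "info":
            outcome["dashboard_only"].append(result.name)  # logged only
        else:
            raise ValueError(f"unknown severity: {result.severity!r}")
    return outcome

results = [
    CheckResult("freshness", "error", passed=True),
    CheckResult("row_count", "warning", passed=False),
    CheckResult("distribution", "info", passed=False),
]
outcome = triage(results)
```

In this example the merge is not blocked, because the only error-level check passed.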
Custom SQL Checks
When built-in checks are not sufficient, write arbitrary SQL assertions using the custom_sql type. The query must return a single boolean column named passed.
contract.yml
quality:
  - type: custom_sql
    name: revenue_is_positive
    severity: error
    query: |
      SELECT (COUNT(*) = 0) AS passed
      FROM user_signups
      WHERE revenue < 0
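To try an assertion query locally before adding it to a contract, the sketch below runs the same SQL against an in-memory SQLite database using Python's standard `sqlite3` module. The `run_custom_check` helper, the table columns, and the sample rows are assumptions made for illustration. Your warehouse's SQL dialect and ContractHQ's own runner may behave differently.

```python
import sqlite3

def run_custom_check(conn, query):
    """Run an assertion query and return the value of its `passed` column."""
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    if columns != ["passed"] or len(rows) != 1:
        raise ValueError("query must return one row with one column named 'passed'")
    return bool(rows[0][0])

# Hypothetical sample table standing in for the real dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_signups (user_id INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO user_signups VALUES (?, ?)",
    [(1, 10.0), (2, 0.0)],
)

query = """
SELECT (COUNT(*) = 0) AS passed
FROM user_signups
WHERE revenue < 0
"""

ok_before = run_custom_check(conn, query)  # no negative revenue yet

conn.execute("INSERT INTO user_signups VALUES (3, -5.0)")
ok_after = run_custom_check(conn, query)  # one offending row now exists
```

Writing the assertion as `COUNT(*) = 0` over the offending rows keeps the result to a single boolean, which matches the required `passed` column.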
Tip: Custom SQL checks have access to all tables in your connected warehouse. Use them to validate cross-table consistency, for example that every order_id in the events table exists in the orders table.
Scheduling Checks
In addition to running checks on every PR, you can schedule them on a cron:
contract.yml
schedule:
  cron: "0 6 * * *" # 06:00 UTC daily
  notify:
    slack: "#data-quality-alerts"
    email: [data-platform@company.com]
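The expression `0 6 * * *` uses the standard five-field cron format (minute, hour, day of month, month, day of week), so it fires once a day at 06:00. As a rough sketch of what that means in practice, the hypothetical helper below computes the next fire time for a daily schedule of this shape. It handles only the `minute hour * * *` case and is not a general cron parser.

```python
from datetime import datetime, timedelta, timezone

def next_daily_run(now, hour=6, minute=0):
    """Next UTC time strictly after `now` that matches 'minute hour * * *'."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate

early = datetime(2024, 1, 1, 5, 0, tzinfo=timezone.utc)
late = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)

next_from_early = next_daily_run(early)  # same day, 06:00 UTC
next_from_late = next_daily_run(late)    # following day, 06:00 UTC
```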