Core Concepts

Data Quality Checks

While schema constraints validate individual column structures, quality checks validate the data itself — row counts, freshness, statistical distributions, and custom SQL logic. All checks are defined under the quality key in your contract.


Where Checks Live

contract.yml
dataset: user_signups
version: 1.0.0
schema:
  # ... field definitions ...

quality:
  - type: freshness
    column: created_at
    max_age: 24h
    severity: error
  - type: row_count
    min: 1000
    severity: warning

Built-in Checks

freshness

Data must have arrived within the given time window. Config example: max_age: 24h

row_count

Total row count must fall within a min/max range. Config example: min: 1000, max: 10000000

completeness

The percentage of non-null values must exceed a threshold. Config example: threshold: 0.99

uniqueness

The percentage of unique values must exceed a threshold. Config example: threshold: 1.0

referential_integrity

All values must exist in a referenced table. Config example: ref: dim_users.user_id

distribution

The column's value distribution must stay within z-score bounds. Config example: z_score: 3
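Several built-in checks can be combined under a single quality key. A sketch of column-level checks, assuming illustrative column and table names (email, user_id, dim_users):

```yaml
quality:
  - type: completeness
    column: email              # illustrative column name
    threshold: 0.99
    severity: error
  - type: uniqueness
    column: user_id
    threshold: 1.0
    severity: error
  - type: referential_integrity
    column: user_id
    ref: dim_users.user_id     # illustrative referenced table
    severity: warning
```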

Severity Levels

Every check must declare a severity. This controls what happens when the check fails in CI:

error

Blocks the PR merge. The pipeline is considered failed.

warning

Adds a comment to the PR but does not block the merge.

info

Logged and visible in the ContractHQ dashboard only.
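The three levels can be mixed freely within one contract. A sketch using checks from the table above (thresholds and times are illustrative):

```yaml
quality:
  - type: freshness
    column: created_at
    max_age: 24h
    severity: error      # stale data blocks the merge
  - type: row_count
    min: 1000
    severity: warning    # low volume adds a PR comment, merge proceeds
  - type: row_count
    max: 10000000
    severity: info       # logged to the dashboard only
```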

Custom SQL Checks

When built-in checks are not sufficient, write arbitrary SQL assertions using the custom_sql type. The query must return a single row with a single boolean column named passed.

contract.yml
quality:
  - type: custom_sql
    name: revenue_is_positive
    severity: error
    query: |
      SELECT (COUNT(*) = 0) AS passed
      FROM user_signups
      WHERE revenue < 0
💡 Custom SQL checks have access to all tables in your connected warehouse. Use them to validate cross-table consistency — e.g. that every order_id in the events table also exists in the orders table.
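A cross-table check of the kind described above might look like the following sketch; the events and orders table names and the order_id column are illustrative:

```yaml
quality:
  - type: custom_sql
    name: every_event_order_exists
    severity: error
    query: |
      -- passes only when no event references a missing order
      SELECT (COUNT(*) = 0) AS passed
      FROM events e
      LEFT JOIN orders o ON o.order_id = e.order_id
      WHERE o.order_id IS NULL
```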

Scheduling Checks

In addition to running checks on every PR, you can also run them on a cron schedule:

contract.yml
schedule:
  cron: "0 6 * * *"  # 06:00 UTC daily
  notify:
    slack: "#data-quality-alerts"
    email: [data-platform@company.com]
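The cron field uses standard five-field cron syntax (minute, hour, day of month, month, day of week). For instance, a sketch of an hourly weekday schedule, with an illustrative Slack channel:

```yaml
schedule:
  cron: "0 9-17 * * 1-5"   # top of each hour, 09:00-17:00 UTC, Mon-Fri
  notify:
    slack: "#data-quality-alerts"
```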