Core Concepts

What is a Data Contract?

A data contract is a formal, versioned agreement between the team that produces data and the teams that consume it. It specifies exactly what schema, quality level, and semantics the data must conform to.

The Problem It Solves

In most organisations, software engineers own the systems that generate data, while data analysts and scientists own the pipelines that consume it. Without a formal agreement, engineers can rename columns, change types, or drop tables without notice, causing downstream breakages that can take days to diagnose.

Common Failure Modes

💥

Silent schema drift

A column is renamed; dashboards go blank overnight.

🕳️

Null explosions

A new code path omits a required field; ML models receive nulls.

📉

Type coercion bugs

An integer field becomes a string; aggregations can silently return 0 or fail outright.

🔀

Undocumented semantics

The revenue field means net revenue to one team and gross revenue to another.
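
The first three failure modes can be caught mechanically. The following is a minimal Python sketch, not ContractHQ's implementation: it assumes a schema snapshot is a plain mapping from column name to type name, and the helper names diff_schemas and count_missing are hypothetical.

```python
from typing import Dict, List

# Assumption: a schema snapshot is a mapping of column name -> type name.
Schema = Dict[str, str]


def diff_schemas(old: Schema, new: Schema) -> List[str]:
    """Return the breaking changes between two schema snapshots."""
    problems: List[str] = []
    for column, old_type in old.items():
        if column not in new:
            # A rename is indistinguishable from a removal plus an addition.
            problems.append(f"column removed or renamed: {column}")
        elif new[column] != old_type:
            problems.append(f"type changed: {column} {old_type} -> {new[column]}")
    return problems


def count_missing(rows: List[dict], required: List[str]) -> Dict[str, int]:
    """Count rows in which a required field is absent or None."""
    return {
        field: sum(1 for row in rows if row.get(field) is None)
        for field in required
    }


old = {"order_id": "integer", "amount": "integer", "customer": "string"}
new = {"order_id": "integer", "amount": "string", "customer_name": "string"}

for problem in diff_schemas(old, new):
    print(problem)

rows = [{"order_id": 1, "amount": 10}, {"order_id": 2}, {"order_id": 3, "amount": None}]
print(count_missing(rows, ["order_id", "amount"]))
```

Undocumented semantics cannot be detected this way, because two teams can agree on the name and type of a column and still disagree on its meaning. That gap is what a written definition in the contract is meant to close.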

The Producer-Consumer Model

A data contract makes the implicit agreement explicit. The producer commits to a schema and quality bar; consumers can rely on it.


  ┌───────────────┐         ┌─────────────────┐         ┌───────────────────┐
  │   Producer    │         │  Data Contract  │         │    Consumers      │
  │ (Eng / App)   │─signs──▶│  (YAML in Git)  │◀─relies─│ Analysts / ML /   │
  │               │         │                 │         │ Data Scientists   │
  └───────────────┘         └────────┬────────┘         └───────────────────┘
                                     │
                               ContractHQ
                            validates on every
                               PR & schedule

Anatomy of a Data Contract

A contract YAML file has four top-level sections:

metadata:

Identity fields — dataset name, owner, version, SLA, and tags.

schema:

Column-by-column field definitions with types and constraints.

quality:

Checks on the data itself: freshness, completeness, and custom SQL assertions.

semantics:

Human-readable definitions — what the data means, not just its shape.
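
To make the four sections concrete, here is a rough sketch of a contract after it has been loaded from YAML into a Python dictionary. Only the four top-level section names come from the description above; every nested key and value (name, owner, freshness_hours, min_completeness, and so on) is a hypothetical placeholder, not the actual ContractHQ or DCS field set.

```python
REQUIRED_SECTIONS = ("metadata", "schema", "quality", "semantics")

# Illustrative contract. Nested keys are assumptions made for this example.
contract = {
    "metadata": {"name": "orders", "owner": "checkout-team", "version": "1.2.0"},
    "schema": [
        {"name": "order_id", "type": "integer", "required": True},
        {"name": "revenue", "type": "decimal", "required": True},
    ],
    "quality": {"freshness_hours": 24, "min_completeness": 0.99},
    "semantics": {"revenue": "Net revenue after refunds, in USD."},
}


def missing_sections(doc: dict) -> list:
    """List the top-level sections that a contract document lacks."""
    return [section for section in REQUIRED_SECTIONS if section not in doc]


def undocumented_columns(doc: dict) -> list:
    """List schema columns that have no definition under semantics."""
    described = doc.get("semantics", {})
    return [col["name"] for col in doc.get("schema", []) if col["name"] not in described]


print(missing_sections(contract))
print(undocumented_columns(contract))
```

In this example every section is present, but order_id has no entry under semantics, so the second check reports it. A check like this is one way to keep the "what the data means" part from lagging behind the schema.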

Data Contract vs. JSON Schema

Aspect          JSON Schema            Data Contract
--------------  ---------------------  ----------------------------
Purpose         Validate API payloads  Govern analytical datasets
Where enforced  At request time        At pipeline / PR time
Versioning      Manual / ad hoc        Semantic versioning built in
Quality checks  Structural only        Structural + statistical
Ownership       Not defined            First-class owner field
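
As an illustration of the versioning row, the sketch below maps a schema change to a semantic version bump. The rule it applies (removed or retyped columns are breaking, added columns are additive) is a common convention assumed for this example, not a documented ContractHQ policy, and the function names required_bump and bump are hypothetical.

```python
def required_bump(old: dict, new: dict) -> str:
    """Classify a schema change as a major, minor, or patch bump.

    Both arguments map column name -> type name (an assumed representation).
    """
    removed = old.keys() - new.keys()
    retyped = {col for col in old.keys() & new.keys() if old[col] != new[col]}
    added = new.keys() - old.keys()
    if removed or retyped:
        return "major"  # consumers relying on the old shape would break
    if added:
        return "minor"  # additive change; existing consumers are unaffected
    return "patch"      # no structural change


def bump(version: str, level: str) -> str:
    """Apply a bump level to a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


old = {"order_id": "integer", "amount": "integer"}
new = {"order_id": "integer", "amount": "integer", "currency": "string"}
print(bump("1.2.3", required_bump(old, new)))
```

Here the only change is a new currency column, so the sketch treats it as a minor bump from 1.2.3.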

ℹ️

ContractHQ extends the open-source Data Contract Specification (DCS) format, which means your contracts are portable across other tooling that supports DCS.
