Automation
Data quality checks are not paperwork. They are executable assumptions about your data. A good check says, in plain terms, what must be true for a pipeline, dashboard, report, model, or automation to be trusted. The goal is not to test everything. The goal is to catch the failures that would create bad decisions, broken customer workflows, or wasted operator time.
Field situation: the dashboard is wrong, but the pipeline is green
A common reliability problem starts like this: the scheduled pipeline ran successfully, the warehouse tables were updated, and the dashboard loaded. But the number on the dashboard is wrong.
The pipeline is green because the orchestration job completed. That only proves the code ran. It does not prove the source extract was complete, the schema was unchanged, the join still matched, the metric logic still made sense, or the result was safe to use.
This is where data quality checks matter. They close the gap between "the job ran" and "the data is fit for its purpose."
What data quality checks are
Data quality checks are repeatable tests that verify important properties of data at a point in the pipeline. They can be simple, such as checking that a table received rows today. They can also be business-specific, such as checking that paid orders never have a negative gross amount.
The most useful checks are tied to a real dependency. If a leadership dashboard, billing workflow, customer segmentation, inventory process, or AI feature depends on a field, table, or metric, then the assumptions behind that dependency deserve checks.
Plain definition: a data quality check is an automated question the system asks before people trust the data.
A pipeline run only proves that code executed. Data quality checks help prove that the result is safe enough to use.
The five check families most teams need first
Most teams do not need a huge testing framework on day one. They need coverage across a few reliable check families. Together, these catch many of the common early pipeline failures without creating a heavy maintenance burden.
- Freshness checks: Did the data arrive when expected?
- Volume checks: Did the number of rows look reasonable compared with recent history?
- Schema checks: Did the expected columns, types, or structures change?
- Validity checks: Do important values fall inside allowed ranges or formats?
- Relationship checks: Do keys, joins, and reconciliations still hold across tables?
These are durable concepts. Tools change, but these check families remain useful across warehouses, lakehouses, orchestration systems, reverse ETL pipelines, and analytics engineering workflows.
| Check family | Question it answers | Example failure |
|---|---|---|
| Freshness | Did expected data arrive on time? | The orders table has not updated since yesterday. |
| Volume | Is the amount of data plausible? | A daily customer export has 90% fewer rows than usual. |
| Schema | Did the structure change unexpectedly? | A source system renamed customer_id to account_id. |
| Validity | Do field values follow allowed rules? | Completed orders include negative totals. |
| Relationship | Do records still connect or reconcile correctly? | Payments no longer match completed orders. |
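As a rough illustration of the volume and schema rows in the table above, here is a minimal Python sketch. The function names, the 50% drop threshold, and the use of a recent median as the baseline are assumptions made for this example, not a prescribed method; a real check would read row counts and column names from your warehouse metadata.

```python
from statistics import median
from typing import List, Set


def volume_check(today_rows: int, recent_daily_rows: List[int], max_drop: float = 0.5) -> bool:
    """Pass if today's row count is not more than max_drop below the recent median.

    max_drop=0.5 is an illustrative threshold, not a recommendation.
    """
    baseline = median(recent_daily_rows)
    return today_rows >= baseline * (1 - max_drop)


def missing_columns(actual_columns: Set[str], expected_columns: Set[str]) -> Set[str]:
    """Return the expected columns that are absent; an empty set means the check passes."""
    return expected_columns - actual_columns


# A daily export with roughly 90% fewer rows than usual fails the volume check.
print(volume_check(100, [1000, 980, 1020]))  # False

# A source that renamed customer_id to account_id fails the schema check.
print(missing_columns({"account_id", "order_total"}, {"customer_id", "order_total"}))  # {'customer_id'}
```

Returning the missing column names, rather than a bare pass or fail, gives whoever responds a starting point for diagnosis.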
Examples of useful checks
A useful check is specific enough to act on. A vague check says "the data looks unusual." A useful check says which assumption failed and why that matters.
For example, a freshness check on an orders table might say: the latest order_created_at timestamp must be within the last 90 minutes during business hours. A validity check might say: order_total must be greater than or equal to zero for completed orders. A relationship check might say: every completed order must join to exactly one customer record.
Notice that these examples are not just technical. They encode how the business expects the data to behave.
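The three example rules above could be written as small Python functions over in-memory rows. This is only a sketch under stated assumptions: the dictionary keys (status, order_total, customer_id) and function names are hypothetical, the business-hours condition is left out for brevity, and customer IDs are assumed to be unique already, so "exactly one customer record" reduces to a membership test.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set

MAX_ORDER_AGE = timedelta(minutes=90)  # taken from the freshness rule in the text


def freshness_ok(latest_order_created_at: datetime, now: datetime,
                 max_age: timedelta = MAX_ORDER_AGE) -> bool:
    """Pass if the newest order is no older than max_age (business hours not modeled)."""
    return now - latest_order_created_at <= max_age


def negative_completed_totals(orders: List[Dict]) -> List[Dict]:
    """Return completed orders whose order_total is below zero."""
    return [o for o in orders if o["status"] == "completed" and o["order_total"] < 0]


def orders_without_customer(orders: List[Dict], customer_ids: Set[str]) -> List[Dict]:
    """Return completed orders whose customer_id has no matching customer record."""
    return [o for o in orders
            if o["status"] == "completed" and o["customer_id"] not in customer_ids]
```

Each function returns either a boolean or the offending rows, so a failure can name the specific records that broke the assumption.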
Where to run checks in the pipeline
Checks become more useful when they are placed near the failure they are designed to catch. If you only check final dashboard tables, you may discover the issue late and have to inspect every upstream step. If you only check raw source data, you may miss transformation errors.
A practical reliability pattern is to use lightweight checks at multiple layers. Raw ingestion checks confirm that source data arrived. Transformation checks confirm that modeling logic still works. Output checks confirm that trusted tables, metrics, or activation feeds are safe for consumption.
You do not need the same depth everywhere. Critical outputs deserve stronger checks. Low-risk exploratory tables may only need freshness and schema visibility.
| Pipeline layer | Useful checks | Primary purpose |
|---|---|---|
| Raw ingestion | Freshness, volume, schema | Confirm the source delivered usable data. |
| Staging or transformation | Validity, uniqueness, relationship checks | Catch modeling and transformation errors early. |
| Published marts or metrics | Reconciliation, accepted ranges, business rules | Protect dashboards, reports, and operational consumers. |
| Activation or automation outputs | Completeness, eligibility, suppression rules | Prevent bad data from triggering downstream actions. |
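One way to picture the layered pattern in the table above is a small runner that evaluates checks layer by layer and reports the first failure, so the failing layer is known before anything downstream is inspected. The layer names, check names, and the run_layers function are hypothetical; most orchestration tools offer their own mechanism for this.

```python
from typing import Callable, Dict, List, Optional, Tuple

Check = Callable[[], bool]  # a check takes no arguments and returns True on pass


def run_layers(layers: Dict[str, List[Tuple[str, Check]]]) -> Optional[Tuple[str, str]]:
    """Run checks in layer order; return (layer, check_name) for the first failure, else None.

    Relies on dicts preserving insertion order, so layers run in the order they are declared.
    """
    for layer, checks in layers.items():
        for name, check in checks:
            if not check():
                return (layer, name)
    return None


# Placeholder checks standing in for real queries against each layer.
layers = {
    "raw_ingestion": [("orders_freshness", lambda: True), ("orders_schema", lambda: True)],
    "staging": [("order_total_validity", lambda: False)],
    "published_marts": [("revenue_reconciliation", lambda: True)],
}

print(run_layers(layers))  # ('staging', 'order_total_validity')
```

Stopping at the first failing layer is a design choice for this sketch; a team might instead collect every failure in a layer before halting.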
Severity matters more than check count
Many teams make the mistake of measuring progress by the number of checks added. That can create noise. Reliability improves when checks are prioritized by consequence.
A critical check should stop, quarantine, or escalate a pipeline because bad data would cause material harm. A warning check should create visibility without waking anyone up. An informational check should help with diagnosis or trend monitoring.
Before adding a check, ask: what happens if this fails and nobody notices for one business day? The answer usually tells you the right severity.
A check without an owner and severity is usually documentation disguised as automation.
| Severity | When to use it | Typical action |
|---|---|---|
| Critical | Bad data could materially affect revenue, customers, compliance-sensitive workflows, or executive decisions. | Stop, quarantine, or escalate immediately. |
| Warning | The issue may matter but needs human review before interrupting operations. | Notify the owner and track resolution. |
| Informational | The signal is useful for diagnosis, trend monitoring, or future tuning. | Log, display, or include in routine review. |
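The severity table above can be encoded so that every check failure resolves to a defined response. This is a minimal sketch; the action labels and helper names are assumptions for illustration, and the real actions would call your orchestrator and alerting tools.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    WARNING = "warning"
    INFORMATIONAL = "informational"


# Hypothetical action labels mirroring the "Typical action" column.
ACTIONS = {
    Severity.CRITICAL: "stop_quarantine_or_escalate",
    Severity.WARNING: "notify_owner_and_track",
    Severity.INFORMATIONAL: "log_for_review",
}


def action_for(severity: Severity) -> str:
    """Look up the response for a failed check of the given severity."""
    return ACTIONS[severity]


def blocks_pipeline(severity: Severity) -> bool:
    """Only critical failures interrupt the pipeline in this sketch."""
    return severity is Severity.CRITICAL
```

Keeping the mapping in one place makes it easy to review which failures can interrupt operations and which only create visibility.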
How to automate checks without creating alert fatigue
Automation is useful only when it produces signals people trust. If every minor anomaly pages the same person, the team will eventually ignore the system.
Start with a small set of checks attached to high-value data products. Give each check a clear owner. Route failures to the team that can fix or triage the issue. Include enough context in the alert to avoid starting every incident from zero.
A good failure message should include what failed, when it failed, what data asset is affected, likely downstream impact, and the first diagnostic step. Without that context, automation simply moves confusion from the dashboard to the alert channel.
Every alert should make the next action easier. If an alert only says that something is wrong, it is not finished.
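The five pieces of context described above can be made mandatory by giving the alert a fixed shape. The sketch below is one possible structure, with hypothetical field names and example values; it is not tied to any particular alerting tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CheckFailure:
    check_name: str             # what failed
    failed_at: datetime         # when it failed
    asset: str                  # which data asset is affected
    downstream_impact: str      # likely downstream impact
    first_diagnostic_step: str  # where the responder should look first

    def to_message(self) -> str:
        """Render the failure as a plain-text alert body."""
        return "\n".join([
            f"Check failed: {self.check_name}",
            f"When: {self.failed_at.isoformat()}",
            f"Asset: {self.asset}",
            f"Likely impact: {self.downstream_impact}",
            f"First step: {self.first_diagnostic_step}",
        ])


failure = CheckFailure(
    check_name="orders_freshness",
    failed_at=datetime(2024, 1, 15, 9, 30),
    asset="analytics.orders",
    downstream_impact="Revenue dashboard will show stale totals",
    first_diagnostic_step="Check the last successful run of the orders extract",
)
print(failure.to_message())
```

Because every field is required, a check cannot raise an alert without someone having decided what the impact and first diagnostic step are.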
Common failure modes
Data quality checks can fail as a practice even when individual tests are technically correct. The usual causes are predictable.
- No owner: the check fails, but nobody is accountable for the response.
- No severity: every failure looks equally important, so important failures get buried.
- No business meaning: checks verify technical properties that do not protect any real decision or workflow.
- Too late in the pipeline: failures are discovered only after downstream tables have already been built.
- Too brittle: thresholds are so tight that normal seasonality creates false alarms.
- No retirement process: old checks remain active after the data product or business rule changes.
The fix is not always more tooling. Often it is better ownership, clearer severity, and fewer checks with stronger purpose.
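The "too brittle" failure mode often comes from comparing today's volume against a single all-days baseline. One possible mitigation, sketched here with assumed names and an illustrative 30% tolerance, is to compare against the same weekday in recent weeks so that a normal weekend dip does not raise an alarm.

```python
from datetime import date, timedelta
from statistics import median
from typing import Dict


def same_weekday_baseline(history: Dict[date, int], day: date, weeks: int = 4) -> float:
    """Median row count for the same weekday over the previous `weeks` weeks."""
    values = [
        history[day - timedelta(weeks=w)]
        for w in range(1, weeks + 1)
        if day - timedelta(weeks=w) in history
    ]
    if not values:
        raise ValueError("no same-weekday history available")
    return median(values)


def volume_ok(today_rows: int, baseline: float, tolerance: float = 0.3) -> bool:
    """Pass if today's count is within +/- tolerance of the baseline."""
    return abs(today_rows - baseline) <= tolerance * baseline
```

With weekday volumes near 1,000 rows and weekend volumes near 300, a Sunday count of 310 passes against the same-weekday baseline but would fail against an all-days median. Other forms of seasonality, such as holidays or month-end, would need their own handling.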
A practical starting plan
If you are repairing an unreliable data system, start with the assets people already argue about: executive dashboards, revenue reporting, billing exports, customer-facing data, operational queues, and model inputs.
For each asset, identify the upstream tables and the assumptions that must be true. Then add checks in this order: freshness, volume, schema, critical field validity, and relationship or reconciliation checks.
Keep the first version small. Ten well-owned checks on critical data are more valuable than one hundred checks nobody understands.
Operator checklist before adding a check
Use this checklist to keep data quality checks practical:
- What decision, workflow, dashboard, model, or customer experience does this protect?
- What specific assumption must be true?
- Where in the pipeline can that assumption first be validated?
- What severity should apply if it fails?
- Who owns the response?
- What should happen automatically: warn, stop, quarantine, retry, or escalate?
- What context should the failure message include?
- When should this check be reviewed or retired?
If you cannot answer these questions, the check may still be useful, but it is not ready to be treated as a reliability control.
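The checklist can also be kept next to the check itself as structured metadata, so that unanswered questions are visible before the check is treated as a reliability control. The field names below are hypothetical and map one-to-one onto the eight questions above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class CheckDefinition:
    protects: Optional[str] = None          # decision, workflow, dashboard, model, or experience
    assumption: Optional[str] = None        # the specific thing that must be true
    pipeline_stage: Optional[str] = None    # earliest point where it can be validated
    severity: Optional[str] = None          # critical, warning, or informational
    owner: Optional[str] = None             # who owns the response
    automatic_action: Optional[str] = None  # warn, stop, quarantine, retry, or escalate
    failure_context: Optional[str] = None   # what the failure message should include
    review_by: Optional[str] = None         # when to review or retire the check


def unanswered(check: CheckDefinition) -> List[str]:
    """Return the names of checklist fields that are still empty."""
    return [f.name for f in fields(check) if not getattr(check, f.name)]


draft = CheckDefinition(
    protects="Billing export",
    assumption="order_total >= 0 for completed orders",
    severity="critical",
)
print(unanswered(draft))
# ['pipeline_stage', 'owner', 'automatic_action', 'failure_context', 'review_by']
```

A draft with unanswered fields can still run as an informational check while ownership and response are being agreed.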
Key takeaways
- Data quality checks are executable assumptions, not generic test coverage.
- Start with high-value data assets and protect the assumptions that matter most.
- Cover freshness, volume, schema, validity, and relationship checks before adding advanced rules.
- Place checks close to where failures can happen, not only at the final dashboard layer.
- Automation only improves reliability when alerts have ownership, severity, and clear next steps.
Next step
Pick one trusted dashboard, operational export, or model input. Write down the five assumptions that must be true for it to be safe to use, then turn the top three into owned data quality checks with severity and a response path.