Data Quality Checks: Reliability Field Note


Data quality checks are not paperwork. They are executable assumptions about your data. A good check says, in plain terms, what must be true for a pipeline, dashboard, report, model, or automation to be trusted. The goal is not to test everything. The goal is to catch the failures that would create bad decisions, broken customer workflows, or wasted operator time.

Field situation: the dashboard is wrong, but the pipeline is green

A common reliability problem starts like this: the scheduled pipeline ran successfully, the warehouse tables were updated, and the dashboard loaded. But the number on the dashboard is wrong.

The pipeline is green because the orchestration job completed. That only proves the code ran. It does not prove the source extract was complete, the schema was unchanged, the join still matched, the metric logic still made sense, or the result was safe to use.

This is where data quality checks matter. They close the gap between "the job ran" and "the data is fit for its purpose."

What data quality checks are

Data quality checks are repeatable tests that verify important properties of data at a point in the pipeline. They can be simple, such as checking that a table received rows today. They can also be business-specific, such as checking that paid orders never have a negative gross amount.
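As a minimal sketch of those two examples, the snippet below uses plain Python over in-memory rows. The row layout and the names `loaded_on`, `status`, and `gross_amount` are assumptions made for illustration; in practice the rows would come from a warehouse query.

```python
from datetime import date

# Hypothetical rows standing in for a warehouse table.
orders = [
    {"order_id": 1, "status": "paid", "gross_amount": 120.0, "loaded_on": date(2024, 5, 2)},
    {"order_id": 2, "status": "paid", "gross_amount": -15.0, "loaded_on": date(2024, 5, 2)},
    {"order_id": 3, "status": "cancelled", "gross_amount": -5.0, "loaded_on": date(2024, 5, 1)},
]


def received_rows_on(rows, day):
    """Simple check: at least one row was loaded on the given day."""
    return any(row["loaded_on"] == day for row in rows)


def negative_paid_orders(rows):
    """Business-specific check: list paid orders with a negative gross amount."""
    return [
        row["order_id"]
        for row in rows
        if row["status"] == "paid" and row["gross_amount"] < 0
    ]


print(received_rows_on(orders, date(2024, 5, 2)))  # True
print(negative_paid_orders(orders))                # [2]
```

The second function returns the offending order IDs rather than a bare pass or fail, so a failure already points at the records to inspect.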

The most useful checks are tied to a real dependency. If a leadership dashboard, billing workflow, customer segmentation, inventory process, or AI feature depends on a field, table, or metric, then the assumptions behind that dependency deserve checks.

Plain definition: a data quality check is an automated question the system asks before people trust the data.

Reliability rule

A pipeline run only proves that code executed. Data quality checks help prove that the result is safe enough to use.

The five check families most teams need first

Most teams do not need a huge testing framework on day one. They need coverage across a few reliable check families. These catch many of the failures teams hit early, without creating a large maintenance burden.

Freshness checks: Did the data arrive when expected?
Volume checks: Did the number of rows look reasonable compared with recent history?
Schema checks: Did the expected columns, types, or structures change?
Validity checks: Do important values fall inside allowed ranges or formats?
Relationship checks: Do keys, joins, and reconciliations still hold across tables?

These are durable concepts. Tools change, but these check families remain useful across warehouses, lakehouses, orchestration systems, reverse ETL pipelines, and analytics engineering workflows.

Check family	Question it answers	Example failure
Freshness	Did expected data arrive on time?	The orders table has not updated since yesterday.
Volume	Is the amount of data plausible?	A daily customer export has 90% fewer rows than usual.
Schema	Did the structure change unexpectedly?	A source system renamed customer_id to account_id.
Validity	Do field values follow allowed rules?	Completed orders include negative totals.
Relationship	Do records still connect or reconcile correctly?	Payments no longer match completed orders.
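The volume and schema rows in the table can be sketched in a few lines. This is an illustration under assumptions, not a recommended implementation: the 50% drop threshold, the median baseline, and the column names are all hypothetical choices.

```python
from statistics import median


def volume_ok(today_count, recent_counts, max_drop=0.5):
    """Pass if today's row count is no more than max_drop below the recent median."""
    baseline = median(recent_counts)
    return today_count >= baseline * (1 - max_drop)


def missing_columns(expected, actual):
    """Return expected columns that are absent from the actual schema."""
    return sorted(set(expected) - set(actual))


# Hypothetical recent daily row counts for a customer export.
recent = [1000, 980, 1020, 1010, 990]

print(volume_ok(950, recent))  # True: close to the usual volume
print(volume_ok(100, recent))  # False: roughly 90% fewer rows than usual

# The source renamed customer_id to account_id.
print(missing_columns({"customer_id", "email"}, {"account_id", "email"}))  # ['customer_id']
```

A median is used here instead of a mean so that a single unusual day in the history does not shift the baseline much. That is a design choice, not a requirement.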

Examples of useful checks

A useful check is specific enough to act on. A vague check says data looks unusual. A useful check says which assumption failed and why that matters.

For example, a freshness check on an orders table might say: the latest order_created_at timestamp must be within the last 90 minutes during business hours. A validity check might say: order_total must be greater than or equal to zero for completed orders. A relationship check might say: every completed order must join to one customer record.

Notice that these examples are not just technical. They encode how the business expects the data to behave.
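The three example rules above could be written roughly as follows. The 90-minute window and the field names come from the examples. The business-hours range of 09:00 to 17:00 and the in-memory data layout are assumptions made for this sketch.

```python
from collections import Counter
from datetime import datetime, timedelta


def is_fresh(latest_order_created_at, now,
             max_age=timedelta(minutes=90), business_hours=(9, 17)):
    """Freshness: during business hours, the latest order must be recent enough.

    Outside business hours the rule does not apply, so the check passes.
    """
    start_hour, end_hour = business_hours  # assumed 09:00-17:00
    if not (start_hour <= now.hour < end_hour):
        return True
    return now - latest_order_created_at <= max_age


def invalid_totals(orders):
    """Validity: completed orders must have order_total >= 0."""
    return [o["order_id"] for o in orders
            if o["status"] == "completed" and o["order_total"] < 0]


def orders_without_single_customer(orders, customers):
    """Relationship: each completed order must match exactly one customer record."""
    matches = Counter(c["customer_id"] for c in customers)
    return [o["order_id"] for o in orders
            if o["status"] == "completed" and matches[o["customer_id"]] != 1]


# Hypothetical data: customer 11 is duplicated, customer 12 is missing.
customers = [{"customer_id": 10}, {"customer_id": 11}, {"customer_id": 11}]
orders = [
    {"order_id": 1, "status": "completed", "order_total": 50.0, "customer_id": 10},
    {"order_id": 2, "status": "completed", "order_total": -5.0, "customer_id": 11},
    {"order_id": 3, "status": "completed", "order_total": 20.0, "customer_id": 12},
    {"order_id": 4, "status": "pending", "order_total": -1.0, "customer_id": 99},
]

now = datetime(2024, 5, 2, 14, 0)
print(is_fresh(datetime(2024, 5, 2, 13, 0), now))         # True: 60 minutes old
print(is_fresh(datetime(2024, 5, 2, 12, 0), now))         # False: 120 minutes old
print(invalid_totals(orders))                             # [2]
print(orders_without_single_customer(orders, customers))  # [2, 3]
```

The pending order is ignored by both rules because each rule is scoped to completed orders, which is the business condition the examples state.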

Where to run checks in the pipeline

Checks become more useful when they are placed near the failure they are designed to catch. If you only check final dashboard tables, you may discover the issue late and have to inspect every upstream step. If you only check raw source data, you may miss transformation errors.

A practical reliability pattern is to use lightweight checks at multiple layers. Raw ingestion checks confirm that source data arrived. Transformation checks confirm that modeling logic still works. Output checks confirm that trusted tables, metrics, or activation feeds are safe for consumption.

You do not need the same depth everywhere. Critical outputs deserve stronger checks. Low-risk exploratory tables may only need freshness and schema visibility.

Pipeline layer	Useful checks	Primary purpose
Raw ingestion	Freshness, volume, schema	Confirm the source delivered usable data.
Staging or transformation	Validity, uniqueness, relationship checks	Catch modeling and transformation errors early.
Published marts or metrics	Reconciliation, accepted ranges, business rules	Protect dashboards, reports, and operational consumers.
Activation or automation outputs	Completeness, eligibility, suppression rules	Prevent bad data from triggering downstream actions.
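One way to picture layered checks is a small runner that walks the layers in pipeline order and reports the first layer where something fails. The layer names, check names, and the `first_failing_layer` helper are hypothetical; real checks would query data rather than return fixed values.

```python
# Layers in pipeline order, matching the table above (names are illustrative).
LAYER_ORDER = ["raw", "staging", "published", "activation"]


def first_failing_layer(checks_by_layer):
    """Run checks layer by layer; return (layer, failed check names) or None.

    Each check is a (name, callable) pair whose callable returns True on pass.
    """
    for layer in LAYER_ORDER:
        failed = [name for name, check in checks_by_layer.get(layer, [])
                  if not check()]
        if failed:
            return layer, failed
    return None


# Stand-in checks with fixed results, for illustration only.
checks_by_layer = {
    "raw": [("freshness", lambda: True), ("volume", lambda: True)],
    "staging": [("customer_id_unique", lambda: False)],
    "published": [("revenue_reconciles", lambda: False)],
}

print(first_failing_layer(checks_by_layer))  # ('staging', ['customer_id_unique'])
```

Here the published-layer reconciliation would also fail, but the runner stops at staging, which is closer to the likely cause and narrows the investigation.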

Severity matters more than check count

Many teams make the mistake of measuring progress by the number of checks added. That can create noise. Reliability improves when checks are prioritized by consequence.

A critical check should stop the pipeline, quarantine the affected data, or escalate immediately, because bad data would cause material harm. A warning check should create visibility without waking anyone up. An informational check should help with diagnosis or trend monitoring.

Before adding a check, ask: what happens if this fails and nobody notices for one business day? The answer usually tells you the right severity.

Operator warning

A check without an owner and severity is usually documentation disguised as automation.

Severity	When to use it	Typical action
Critical	Bad data could materially affect revenue, customers, compliance-sensitive workflows, or executive decisions.	Stop, quarantine, or escalate immediately.
Warning	The issue may matter but needs human review before interrupting operations.	Notify the owner and track resolution.
Informational	The signal is useful for diagnosis, trend monitoring, or future tuning.	Log, display, or include in routine review.
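The severity table maps naturally to a small routing function. The sketch below is one possible shape. The action labels and the `plan_response` helper are hypothetical names, and a real system would call an orchestrator or alerting tool instead of returning strings.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    WARNING = "warning"
    INFORMATIONAL = "informational"


# Illustrative action labels, one per severity level.
ACTIONS = {
    Severity.CRITICAL: "stop_and_escalate",
    Severity.WARNING: "notify_owner",
    Severity.INFORMATIONAL: "log_only",
}


def plan_response(failures):
    """Given (check_name, severity) pairs, decide whether to halt and what to do."""
    return {
        "halt": any(severity is Severity.CRITICAL for _, severity in failures),
        "actions": [(name, ACTIONS[severity]) for name, severity in failures],
    }


failures = [
    ("orders_freshness", Severity.WARNING),
    ("negative_totals", Severity.CRITICAL),
    ("row_count_trend", Severity.INFORMATIONAL),
]

plan = plan_response(failures)
print(plan["halt"])  # True: one failure is critical
for name, action in plan["actions"]:
    print(name, action)
```

Only the critical failure halts the run. The warning and informational failures still produce a record, so they stay visible without interrupting operations.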

How to automate checks without creating alert fatigue

Automation is useful only when it produces signals people trust. If every minor anomaly pages the same person, the team will eventually ignore the system.

Start with a small set of checks attached to high-value data products. Give each check a clear owner. Route failures to the team that can fix or triage the issue. Include enough context in the alert to avoid starting every incident from zero.

A good failure message should include what failed, when it failed, what data asset is affected, likely downstream impact, and the first diagnostic step. Without that context, automation simply moves confusion from the dashboard to the alert channel.
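The five pieces of context listed above can be made mandatory by giving the alert a fixed structure. This is a sketch under assumptions: the `FailureAlert` class, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class FailureAlert:
    what_failed: str
    failed_at: str
    affected_asset: str
    downstream_impact: str
    first_diagnostic_step: str

    def missing_context(self):
        """Return the names of any fields left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self):
        """Format the alert as one 'field: value' line per field."""
        return "\n".join(f"{f.name}: {getattr(self, f.name)}" for f in fields(self))


# Hypothetical alert with one piece of context still missing.
alert = FailureAlert(
    what_failed="order_total >= 0 for completed orders",
    failed_at="2024-05-02T14:05:00Z",
    affected_asset="analytics.orders",
    downstream_impact="Revenue dashboard may understate totals",
    first_diagnostic_step="",
)

print(alert.missing_context())  # ['first_diagnostic_step']
print(alert.render())
```

A sender could refuse to dispatch any alert whose `missing_context()` is non-empty, which forces the check author to supply the context before the check goes live.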

Practical checkpoint

Every alert should make the next action easier. If an alert only says that something is wrong, it is not finished.

Common failure modes

Data quality checks can fail as a practice even when individual tests are technically correct. The usual causes are predictable.

No owner: the check fails, but nobody is accountable for the response.
No severity: every failure looks equally important, so important failures get buried.
No business meaning: checks verify technical properties that do not protect any real decision or workflow.
Too late in the pipeline: failures are discovered only after downstream tables have already been built.
Too brittle: thresholds are so tight that normal seasonality creates false alarms.
No retirement process: old checks remain active after the data product or business rule changes.

The fix is not always more tooling. Often it is better ownership, clearer severity, and fewer checks with stronger purpose.
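For the "too brittle" failure mode specifically, one option is to compare today's volume with the same weekday in recent history rather than with all recent days. The sketch below assumes a 30% tolerance and made-up row counts; both are illustrative, not recommendations.

```python
from statistics import mean


def within_tolerance(today_count, baseline_counts, tolerance=0.3):
    """Pass if today's count is within tolerance of the baseline mean."""
    baseline = mean(baseline_counts)
    return abs(today_count - baseline) <= tolerance * baseline


# Hypothetical daily row counts: weekdays near 1000, Saturdays near 300.
weekday_history = [1000, 1020, 980, 1010]
saturday_history = [300, 310, 290, 320]
saturday_count = 305

# Baseline mixes all days: a normal Saturday looks like a failure.
print(within_tolerance(saturday_count, weekday_history + saturday_history))  # False

# Baseline uses prior Saturdays only: the same count passes.
print(within_tolerance(saturday_count, saturday_history))  # True
```

The threshold is identical in both calls. Only the baseline changes, which is often a cheaper fix than loosening the tolerance until the check stops catching real problems.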

A practical starting plan

If you are repairing an unreliable data system, start with the assets people already argue about: executive dashboards, revenue reporting, billing exports, customer-facing data, operational queues, and model inputs.

For each asset, identify the upstream tables and the assumptions that must be true. Then add checks in this order: freshness, volume, schema, critical field validity, and relationship or reconciliation checks.

Keep the first version small. Ten well-owned checks on critical data are more valuable than one hundred checks nobody understands.

Operator checklist before adding a check

Use this checklist to keep data quality checks practical:

What decision, workflow, dashboard, model, or customer experience does this protect?
What specific assumption must be true?
Where in the pipeline can that assumption first be validated?
What severity should apply if it fails?
Who owns the response?
What should happen automatically: warn, stop, quarantine, retry, or escalate?
What context should the failure message include?
When should this check be reviewed or retired?

If you cannot answer these questions, the check may still be useful, but it is not ready to be treated as a reliability control.
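The checklist can also be enforced mechanically by recording each answer as a field on the check definition. The `CheckDefinition` class and its field names below are hypothetical, chosen to mirror the eight questions in order.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CheckDefinition:
    protects: Optional[str] = None           # decision, workflow, or dashboard protected
    assumption: Optional[str] = None         # what must be true
    pipeline_location: Optional[str] = None  # earliest place it can be validated
    severity: Optional[str] = None
    owner: Optional[str] = None
    automatic_action: Optional[str] = None   # warn, stop, quarantine, retry, escalate
    failure_context: Optional[str] = None    # what the alert should include
    review_date: Optional[str] = None        # when to review or retire


def unanswered(check):
    """Return the checklist questions that still have no answer."""
    return [f.name for f in fields(check) if not getattr(check, f.name)]


# A draft check that answers only the first two questions.
draft = CheckDefinition(
    protects="revenue dashboard",
    assumption="order_total >= 0 for completed orders",
)

print(unanswered(draft))
# ['pipeline_location', 'severity', 'owner', 'automatic_action',
#  'failure_context', 'review_date']
```

A draft with unanswered fields can still run as an informational signal. Promoting it to a reliability control could require that `unanswered()` returns an empty list.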

Key takeaways

Data quality checks are executable assumptions, not generic test coverage.
Start with high-value data assets and protect the assumptions that matter most.
Cover freshness, volume, schema, validity, and relationship checks before adding advanced rules.
Place checks close to where failures can happen, not only at the final dashboard layer.
Automation only improves reliability when alerts have ownership, severity, and clear next steps.

Next step

Pick one trusted dashboard, operational export, or model input. Write down the five assumptions that must be true for it to be safe to use, then turn the top three into owned data quality checks with severity and a response path.

Recommended next reads

Read Data Quality Checks: Operator Checklist: A practical checklist for deciding which checks to add, where to run them, and how to respond when they fail.
Read How to Migrate Data Systems Without Breaking Reporting: A practical checklist for moving reporting to a new data system while proving parity, protecting history, and giving users a safe cutover path.
