AI-Ready Data
Source system drift is the gap between what your migration plan assumes about a source system and what the source system is actually doing now. It matters because migrations fail less often from one big technical problem than from many small changes that were never captured: renamed fields, repurposed columns, changed status logic, deleted history, new required values, or business teams using the system differently than before.
What source system drift is
Source system drift happens when an operational system, spreadsheet, application, or database changes over time while downstream data processes still rely on older assumptions.
In a migration, those assumptions are usually written into field mappings, transformation logic, validation checks, dashboards, and stakeholder expectations. If the source has moved but the migration plan has not, the target system may technically load data while still producing wrong numbers.
Common examples include:
- A field named customer_type used to mean market segment, but now means pricing tier.
- A spreadsheet column called Status gains new values that are not handled in the migration mapping.
- An operational tool allows users to overwrite timestamps that were previously system-generated.
- A source table keeps the same name but now excludes archived records.
- A team creates a workaround field because the official field no longer supports the real workflow.
The important point is that drift is not always a broken source. Often the business changed, the source system adapted informally, and the data migration was not updated to match.
A migration mapping is only as reliable as the source assumptions behind it. If the assumptions are stale, the target system can be clean and still be wrong.
Why drift breaks migrations
Most migration plans begin with a snapshot of understanding: a source inventory, sample files, schema exports, stakeholder interviews, and mapping documents. That snapshot starts aging immediately.
Drift breaks migrations because the target system is built around expectations. It expects certain columns to exist, values to mean specific things, records to be complete, and historical data to be comparable. When those expectations are wrong, you get quiet failures instead of obvious failures.
Quiet failures are more dangerous than failed jobs. A failed job tells you something needs attention. A quiet failure loads the data, passes basic checks, and then creates inaccurate reporting, broken automations, or misleading AI outputs.
For AI-ready data, source system drift is especially important. Models, retrieval systems, scoring logic, and automated workflows all depend on stable context. If the meaning of a field changes but the metadata, lineage, and tests do not, the system may treat inconsistent history as if it were reliable training or decision data.
The main types of drift to look for
Drift is easier to manage when you separate it into categories. Do not treat every difference as a generic data quality problem. A missing field, a changed business definition, and a new workflow workaround require different responses.
Use the categories below during source review, mapping, testing, and cutover planning. They are simple enough for business stakeholders to understand and specific enough for data teams to act on.
| Drift type | What changed | Migration risk | Example |
|---|---|---|---|
| Schema drift | Fields, tables, files, or data types changed. | Pipelines fail or mappings load into the wrong target fields. | A source column changes from free text to a controlled picklist. |
| Value drift | Allowed or common values changed. | Transformations misclassify records or send them to exception queues. | A status field gains a new value called "Paused" that the target does not recognize. |
| Definition drift | The business meaning of a field changed. | Historical trends become misleading because old and new records are not comparable. | "Active customer" used to mean paid account and now means logged in during the last 30 days. |
| Grain drift | The level represented by each row changed. | Duplicates, inflated counts, or broken joins appear downstream. | A table that was one row per order becomes one row per order line. |
| Completeness drift | Records or fields are missing compared with expectations. | The migration loses history or produces biased reporting. | Archived customers are no longer available through the standard export. |
| Process drift | Users changed how they work in the source system. | Unofficial fields, notes, and workarounds become business-critical but undocumented. | Sales reps store renewal intent in a notes field because no structured field exists. |
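
Grain drift is the easiest of these to show in a few lines. The sketch below uses two hypothetical extracts of the same orders, one taken before and one after the table moved from one row per order to one row per order line. The field names `order_id` and `amount` are assumptions made for illustration, not names from any real system.

```python
from collections import Counter


def rows_per_key(rows, key):
    """Count how many rows share each value of the expected grain key."""
    return Counter(row[key] for row in rows)


# Hypothetical extracts of the same two orders before and after the change.
before = [
    {"order_id": "A1", "amount": 50},
    {"order_id": "A2", "amount": 30},
]
after = [
    {"order_id": "A1", "amount": 20},  # order line 1 of A1
    {"order_id": "A1", "amount": 30},  # order line 2 of A1
    {"order_id": "A2", "amount": 30},
]

# At the expected grain, no key should appear more than once.
max_rows_before = max(rows_per_key(before, "order_id").values())
max_rows_after = max(rows_per_key(after, "order_id").values())

# A naive row count now overstates the number of orders.
naive_order_count = len(after)
distinct_order_count = len(rows_per_key(after, "order_id"))
```

Here the revenue total is unchanged, but any report that counts rows as orders is now inflated. That is why a grain check belongs next to the schema and value checks rather than being left to downstream reporting.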
Step 1: Freeze your current assumptions
Before you can detect drift, you need to know what you believe to be true. Many migrations skip this step because teams think the mapping document is enough. It usually is not.
Create a short source assumption register. For each important source object, capture:
- The source table, file, API, or spreadsheet tab name.
- The business owner who understands how it is used.
- The fields required for migration, reporting, automation, or AI use cases.
- The expected meaning of each field, not just the technical name.
- Allowed values, null rules, uniqueness rules, and date logic.
- The expected grain, such as one row per customer, order, subscription, invoice, or event.
- The extraction method and extraction date used to define the baseline.
This register does not need to be elegant. It needs to be explicit. Source system drift hides in assumptions that were never written down.
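A register like this can live in a spreadsheet, but keeping it in a structured form makes it easier to check against live data later. The following is a minimal sketch of one possible shape. The class names, field names, and the example entry for `crm.orders` are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class FieldAssumption:
    """What the migration believes about one source field."""
    name: str
    meaning: str                              # business meaning, not the technical name
    required: bool = False                    # True if nulls are not expected
    allowed_values: frozenset = frozenset()   # empty means "not a categorical field"


@dataclass
class SourceAssumption:
    """One entry in the source assumption register."""
    source_name: str          # table, file, API, or spreadsheet tab
    business_owner: str
    grain: tuple              # fields that should identify exactly one row
    extraction_method: str
    baseline_date: date
    fields: list = field(default_factory=list)

    def field_names(self):
        """Names of the fields the migration depends on."""
        return {f.name for f in self.fields}


# Hypothetical example entry; every value here is illustrative.
orders = SourceAssumption(
    source_name="crm.orders",
    business_owner="Sales Operations",
    grain=("order_id",),
    extraction_method="nightly CSV export",
    baseline_date=date(2024, 1, 15),
    fields=[
        FieldAssumption("order_id", "Unique order identifier", required=True),
        FieldAssumption(
            "status",
            "Fulfilment state of the order",
            required=True,
            allowed_values=frozenset({"Open", "Shipped", "Closed"}),
        ),
        FieldAssumption("customer_type", "Market segment of the buyer"),
    ],
)
```

The `meaning` field is the part most registers leave out, and it is the only place definition drift can be noticed: a profiler can confirm that `customer_type` still exists, but only a person comparing it to the written meaning can say it now holds pricing tiers.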
Step 2: Profile the live source, not just the sample
Sample files are useful for early discovery, but they are not enough for migration readiness. Drift often appears in edge cases, recent records, inactive records, historical periods, or team-specific workflows that a sample does not include.
Profile the live source against the assumption register. At minimum, check:
- Whether expected fields still exist.
- Whether field types and formats match the migration plan.
- Whether required fields are actually populated.
- Whether new values have appeared in categorical fields.
- Whether record counts match expected business volumes.
- Whether duplicates exist at the expected grain.
- Whether recent records behave differently from older records.
- Whether archived, deleted, or inactive records are available for migration.
For beginner teams, even simple SQL queries, spreadsheet pivots, or profiling reports can find major issues. The goal is not perfect observability on day one. The goal is to stop migrating based on stale assumptions.
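As an illustration of how small such a check can be, here is a sketch that covers several of the items above for rows already loaded as a list of dictionaries. It uses only the Python standard library. The function name `profile_source` and the sample rows are assumptions, and it is a starting point rather than a complete profiler.

```python
from collections import Counter


def profile_source(rows, expected_fields, required_fields, allowed_values, grain):
    """Compare live rows (a list of dicts) with the written assumptions.

    Returns a dict of findings. An empty list or dict under a key means
    that check found no drift.
    """
    seen_fields = set()
    for row in rows:
        seen_fields.update(row)

    findings = {
        "missing_fields": sorted(set(expected_fields) - seen_fields),
        "unexpected_fields": sorted(seen_fields - set(expected_fields)),
        "null_rates": {},
        "new_values": {},
        "duplicate_keys": [],
    }

    # Share of rows where a required field is empty.
    total = len(rows)
    for name in required_fields:
        blanks = sum(1 for row in rows if row.get(name) in (None, ""))
        if total and blanks:
            findings["null_rates"][name] = blanks / total

    # Categorical values that the mapping does not know about.
    for name, allowed in allowed_values.items():
        observed = {row.get(name) for row in rows} - {None, ""}
        extra = sorted(observed - set(allowed), key=repr)
        if extra:
            findings["new_values"][name] = extra

    # Keys that appear more than once at the expected grain.
    key_counts = Counter(tuple(row.get(k) for k in grain) for row in rows)
    findings["duplicate_keys"] = sorted(
        (key for key, n in key_counts.items() if n > 1), key=repr
    )
    return findings


# Hypothetical live rows.
rows = [
    {"order_id": 1, "status": "Open", "customer_type": "SMB"},
    {"order_id": 2, "status": "Paused", "customer_type": ""},
    {"order_id": 2, "status": "Closed", "customer_type": "Enterprise", "notes": "renewal?"},
]
findings = profile_source(
    rows,
    expected_fields=["order_id", "status", "customer_type"],
    required_fields=["order_id", "customer_type"],
    allowed_values={"status": {"Open", "Shipped", "Closed"}},
    grain=["order_id"],
)
```

On these three rows the sketch reports an unexpected `notes` field, a blank `customer_type` in one row out of three, a new `status` value of `Paused`, and a duplicate `order_id` of 2. To compare recent and older records, run the same function on each slice and compare the two results.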
Step 3: Classify drift by impact
Not all drift deserves the same response. Some changes are harmless. Some require a mapping update. Some should stop the migration until the business makes a decision.
Classify each drift finding by impact:
- Cosmetic: Names, labels, or formatting changed, but meaning is stable.
- Mapping: The target can still support the data, but transformation logic must change.
- Definition: The business meaning changed, so historical and current records may not be comparable.
- Completeness: Required records or fields are missing, unavailable, or inconsistently populated.
- Process: Users changed how they operate in the source system, often creating informal fields or workarounds.
- Governance: Ownership, access, retention, or approval rules are unclear enough to create risk.
This classification turns drift from a vague complaint into a migration decision queue.
Do not ask only, “Did the data change?” Ask, “Did the meaning, completeness, grain, or business process change?” That question finds the drift that breaks trust.
Step 4: Choose the right response
Once drift is classified, decide how to handle it. The wrong response is to force every source change into the target model without discussion. That creates a new system that preserves old confusion.
Use four response options:
- Accept: The drift is real but harmless. Document it and move on.
- Map: Update transformation logic, lookup tables, validation rules, or target fields.
- Backfill or repair: Fix missing or inconsistent source data before migration, or create a controlled remediation process.
- Escalate: Ask the business owner to decide because the drift changes definitions, reporting, compliance posture, or workflow design.
A good migration playbook makes escalation normal. If a field changed meaning halfway through the year, the data team should not silently decide how revenue, churn, customer status, or operational performance should be interpreted.
| Finding | Likely response | Who should approve |
|---|---|---|
| A field was renamed but meaning is unchanged. | Accept or map. | Migration lead or analytics engineer. |
| A new status value appears in recent records. | Map after business review. | Business owner and data owner. |
| A required field is blank for 20% of historical records. | Repair, backfill, or document exclusion. | Business owner, operations lead, and migration lead. |
| A KPI-driving field changed meaning mid-year. | Escalate before migration signoff. | Executive sponsor or metric owner. |
| The source no longer exposes archived records. | Escalate and decide whether to obtain history another way. | System owner and business owner. |
| Users rely on an unofficial workaround field. | Redesign workflow or explicitly map it. | Operations owner and target system owner. |
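
The impact classes from Step 3 and the response options above can be held in a small decision queue. The sketch below is one way to do that. The default routing from impact to response is an assumption for illustration, loosely following the table above, and a reviewer should be able to override it. All names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Impact(Enum):
    COSMETIC = "cosmetic"
    MAPPING = "mapping"
    DEFINITION = "definition"
    COMPLETENESS = "completeness"
    PROCESS = "process"
    GOVERNANCE = "governance"


class Response(Enum):
    ACCEPT = "accept"
    MAP = "map"
    REPAIR = "backfill or repair"
    ESCALATE = "escalate"


# Assumed starting defaults, not rules: a reviewer can override any of them.
DEFAULT_RESPONSE = {
    Impact.COSMETIC: Response.ACCEPT,
    Impact.MAPPING: Response.MAP,
    Impact.DEFINITION: Response.ESCALATE,
    Impact.COMPLETENESS: Response.REPAIR,
    Impact.PROCESS: Response.ESCALATE,
    Impact.GOVERNANCE: Response.ESCALATE,
}


@dataclass
class DriftFinding:
    source: str
    description: str
    impact: Impact
    response: Optional[Response] = None  # filled from the defaults when omitted
    approver: str = ""                   # empty means nobody has been assigned

    def __post_init__(self):
        if self.response is None:
            self.response = DEFAULT_RESPONSE[self.impact]


def blocks_signoff(finding):
    """An escalated finding with no named approver should hold up signoff."""
    return finding.response is Response.ESCALATE and not finding.approver


# Hypothetical queue with two findings.
queue = [
    DriftFinding("crm.orders", "status gained the value 'Paused'", Impact.MAPPING),
    DriftFinding(
        "crm.accounts", "'Active customer' changed meaning mid-year", Impact.DEFINITION
    ),
]
blocked = [f.description for f in queue if blocks_signoff(f)]
```

Only the definition change ends up in `blocked`, because it was escalated and has no approver yet. Recording the approver on the finding itself keeps the data team from silently deciding how a KPI should be interpreted.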
Step 5: Protect the cutover window
The most dangerous drift can happen between final testing and cutover. Teams often validate against one extract, then the source changes before the final load.
Protect the cutover window with practical controls:
- Agree on a source change freeze for critical fields where possible.
- Run a final drift check immediately before the migration load.
- Compare record counts, null rates, value distributions, and key business totals to the tested baseline.
- Log any source changes that occur during the freeze window.
- Define who can approve a late mapping change.
- Keep a rollback or reload plan for high-risk objects.
These controls do not need to be heavy. They need to make late source changes visible before they become production issues.
The final extract should not be treated as a routine reload. It is a new evidence point. Compare it to the tested baseline before you declare the migration ready.
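A final drift check can be as simple as reducing both extracts to the same few numbers and flagging anything that moved. The sketch below assumes rows are dictionaries with one numeric field and one categorical field. The 2% tolerance, the function names, and the sample data are all illustrative assumptions that a real migration would set per object.

```python
def summarize(rows, amount_field, category_field):
    """Reduce an extract (a list of dicts) to a few comparable numbers."""
    count = len(rows)
    blanks = sum(1 for row in rows if row.get(amount_field) in (None, ""))
    value_counts = {}
    for row in rows:
        value = row.get(category_field)
        value_counts[value] = value_counts.get(value, 0) + 1
    return {
        "record_count": count,
        "null_rate": blanks / count if count else 0.0,
        "business_total": sum(row.get(amount_field) or 0 for row in rows),
        "value_counts": value_counts,
    }


def relative_change(old, new):
    """Fractional change from old to new; any change from zero counts as 100%."""
    if old == 0:
        return 0.0 if new == 0 else 1.0
    return abs(new - old) / abs(old)


def drift_since_baseline(baseline, final, tolerance=0.02):
    """List the metrics that moved more than the tolerance since testing."""
    flagged = []
    for metric in ("record_count", "business_total"):
        if relative_change(baseline[metric], final[metric]) > tolerance:
            flagged.append(metric)
    if abs(final["null_rate"] - baseline["null_rate"]) > tolerance:
        flagged.append("null_rate")
    if set(final["value_counts"]) - set(baseline["value_counts"]):
        flagged.append("new_values")
    return flagged


# Hypothetical tested baseline and final extract.
tested = [
    {"status": "Open", "amount": 100},
    {"status": "Closed", "amount": 250},
]
final_extract = tested + [{"status": "Paused", "amount": None}]

flagged = drift_since_baseline(
    summarize(tested, "amount", "status"),
    summarize(final_extract, "amount", "status"),
)
```

In this example the business total is unchanged, so a totals-only reconciliation would pass. The record count, the null rate, and the new `Paused` value are what reveal that the final extract is not the data that was tested.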
Step 6: Keep drift monitoring after migration
Migration does not end the drift problem. It changes where the problem appears. Once the new system is live, upstream applications, manual processes, integrations, and business definitions will continue to evolve.
Turn the migration checks into lightweight operational monitoring:
- Track schema changes for critical source objects.
- Test accepted values for important status, type, and category fields.
- Monitor null rates and duplicate rates at the expected grain.
- Compare source counts to loaded counts.
- Document business definitions in a place analytics and operations teams actually use.
- Review drift findings during release planning or data quality reviews.
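Two of these checks, schema tracking and count reconciliation, need very little code once a schema snapshot and row counts are available. The sketch below assumes schemas are stored as simple name-to-type dictionaries and counts as name-to-integer dictionaries. How those snapshots are collected depends on the source system and is not shown. All names and values are hypothetical.

```python
def diff_schema(baseline, current):
    """Compare two {field_name: type_name} snapshots of a source object."""
    shared = set(baseline) & set(current)
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "retyped": sorted(n for n in shared if baseline[n] != current[n]),
    }


def reconcile_counts(source_counts, loaded_counts):
    """Return the objects whose loaded row count differs from the source."""
    mismatches = {}
    for name, expected in source_counts.items():
        actual = loaded_counts.get(name, 0)
        if actual != expected:
            mismatches[name] = {"source": expected, "loaded": actual}
    return mismatches


# Hypothetical snapshots taken at go-live and one release later.
at_go_live = {"order_id": "integer", "status": "text", "customer_type": "text"}
today = {"order_id": "integer", "status": "picklist", "renewal_intent": "text"}
schema_changes = diff_schema(at_go_live, today)

count_gaps = reconcile_counts(
    {"orders": 1200, "customers": 300},
    {"orders": 1200, "customers": 295},
)
```

Neither function decides whether a change is a problem. They only make it visible, so that each difference can go through the same classify-and-respond process used during the migration.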
This is where source system drift connects directly to AI-ready data. If you want data to support automation, machine learning, retrieval, or decision support, you need ongoing evidence that the source still means what your system thinks it means.
Common failure modes
Source system drift is rarely missed because people are careless. It is missed because the migration process rewards visible progress: mappings completed, pipelines built, dashboards drafted, and cutover dates scheduled.
Watch for these failure modes:
- Mapping from memory: A stakeholder describes how the source used to work, not how it works now.
- Overtrusting column names: A field keeps the same name after its meaning changes.
- Testing only happy paths: Validation focuses on clean recent records and ignores historical exceptions.
- Ignoring inactive data: Closed accounts, cancelled orders, archived users, or old products are excluded until reporting breaks.
- No business owner: The data team finds drift but no one can approve the correct interpretation.
- Late source changes: Operational teams continue changing workflows during final migration testing without notifying the migration team.
The practical answer is not more meetings. It is a shorter feedback loop between source profiling, business review, mapping updates, and migration testing.
Operator checklist for source system drift
Use this checklist before migration testing, before cutover, and after go-live for the most important source objects.
- Have we written down the source assumptions that the migration depends on?
- Do we know the business owner for each critical source object?
- Have we profiled live data, not only sample data?
- Have we checked recent records separately from older records?
- Have we reviewed new, rare, null, and invalid categorical values?
- Have we confirmed the expected grain and duplicate rules?
- Have we identified fields whose business meaning changed over time?
- Have we classified each drift finding by impact?
- Have we assigned each finding to accept, map, repair, or escalate?
- Have we defined the final drift check before cutover?
- Have we converted critical migration checks into ongoing monitoring?
If you cannot answer these questions for a source that drives reporting, billing, customer operations, or AI workflows, treat that source as a migration risk.
Key takeaways
- Source system drift is the difference between your migration assumptions and the current reality of the source system.
- The highest-risk drift is often semantic: the field still exists, but its meaning, grain, or business process changed.
- A simple assumption register, live source profiling, impact classification, and cutover drift check prevent many quiet migration failures.
- For AI-ready data, drift management is not optional. AI systems need stable definitions, lineage, and quality signals to use data responsibly.
- The goal is not to freeze the business forever. The goal is to make source changes visible, reviewed, and reflected in downstream systems.
Next step
Pick one source object that feeds an important migration, dashboard, automation, or AI use case. Write down its assumed fields, meanings, grain, and owner. Then profile the live source against those assumptions, classify every difference by impact, and assign each one a response: accept, map, repair, or escalate.