Automation
Data modeling during a migration is the work of deciding what each table means, how it connects to the business process, and what quality standard it must meet before people rely on it. The safest migration is not a lift-and-shift of every old table. It is a controlled rebuild of the models that matter most, with clear grain, source mapping, tests, and rollout rules.
Why Migration Is a Data Modeling Problem, Not Just a Moving Problem
A migration looks technical on the surface: move data from one warehouse, tool, or pipeline framework to another. But most of the risk is semantic. The team has to decide what counts as a customer, an active account, a booked order, a refunded transaction, a qualified lead, or a retained user.
If those definitions are hidden in legacy SQL, spreadsheet formulas, dashboard filters, or one analyst’s memory, the migration will expose the weakness. The new system may run faster and look cleaner while still answering the wrong question.
Good data modeling gives the migration a stable target. It defines the business entities, events, relationships, and measures that downstream reporting depends on. It also creates a practical contract between source systems, transformation logic, and dashboards.
For a beginner team, the key mental shift is simple: you are not migrating tables; you are migrating business meaning.
Start With the Business Process and Grain
Every useful model starts with grain. Grain means the level of detail represented by one row. Without grain, every metric becomes negotiable and every join becomes risky.
Before rebuilding a model, write a plain-English sentence for the grain. For example: one row per order, one row per order line item, one row per subscription per day, one row per customer account, or one row per invoice payment event.
Then connect that grain to a real business process. Orders, subscriptions, invoices, support tickets, product events, marketing touches, and sales opportunities are not interchangeable. Each process has its own lifecycle, timestamps, identifiers, and edge cases.
Useful migration questions include:
- What business process does this model describe?
- What does one row represent?
- What event creates a row?
- Can a row be updated, cancelled, merged, deleted, or reversed?
- Which timestamp should reporting use by default?
- Which identifier is stable enough to join on?
- Who owns the definition when there is a dispute?
If the team cannot answer these questions, pause before automating. The migration may still proceed, but that model should be marked as unresolved and not treated as a trusted reporting base.
Never approve a migrated table until someone can say what one row represents. Grain is the first reliability control.
Inventory Legacy Tables, Reports, and Hidden Logic
A migration playbook begins with an inventory. The goal is not to document everything forever. The goal is to separate what is critical, what is duplicated, what is obsolete, and what contains logic that must be preserved or rewritten.
Inventory more than database tables. Include dashboards, scheduled extracts, spreadsheet dependencies, reverse ETL jobs, finance reconciliations, executive metrics, and manually maintained reference files. Many important definitions live outside the warehouse.
For each asset, capture:
- Name and location.
- Primary users or business owner.
- Refresh frequency and freshness expectation.
- Upstream sources.
- Downstream dashboards, exports, or operational workflows.
- Key metrics or dimensions produced.
- Known issues, workarounds, and trust problems.
- Whether it should be migrated, redesigned, retired, or temporarily bridged.
This inventory protects the team from two common mistakes: migrating unused objects because they exist, and deleting important logic because nobody knew where it was used.
Decide What to Migrate, Redesign, Retire, or Bridge
Not every legacy model deserves a new version. Migration is an opportunity to reduce system surface area. The best teams are deliberate about what they carry forward.
Use four categories. Migrate when the model is trusted, well understood, and still useful. Redesign when the business question remains important but the legacy shape is confusing or brittle. Retire when the model has no clear owner or downstream value. Bridge when the business needs continuity while the proper model is being built.
Bridging is useful but dangerous. A bridge can keep old dashboards alive during rollout, but it should have an expiration date. Otherwise the team ends up maintaining both the old worldview and the new one.
A practical rule: if a table feeds an executive dashboard, revenue process, board metric, finance reconciliation, or customer-facing workflow, do not retire or redesign it casually. Put it through explicit acceptance checks with the business owner.
| Decision | Use when | Migration risk | Operator action |
|---|---|---|---|
| Migrate | The model is trusted, used, and has clear grain. | Old assumptions may still be hidden. | Rebuild with tests and reconcile against legacy outputs. |
| Redesign | The business question matters but the old model is confusing or brittle. | Users may expect old numbers to match exactly. | Document definition changes and get explicit acceptance. |
| Retire | The asset has no clear owner, use, or decision value. | Someone may depend on it silently. | Check lineage, dashboard usage, scheduled exports, and stakeholder memory before deletion. |
| Bridge | The business needs continuity while the new model is built. | Temporary compatibility becomes permanent debt. | Set an expiration date and track consumers that still use the bridge. |
Create a Source-to-Target Mapping Before Writing Final Models
A source-to-target mapping translates migration intent into implementation detail. It shows where each field in the new model comes from, how it is transformed, and how it should be tested.
This does not need to be a heavyweight enterprise document. A simple table is enough if it is used. The mapping should be readable by both the builder and the reviewer.
Include field name, source object, transformation rule, null behavior, valid values, owner, and test expectation. For measures, include the aggregation rule. For dates, include the business meaning of the timestamp. For identifiers, include whether the key is natural, surrogate, hashed, or generated during transformation.
The mapping becomes the shared contract for the migration. It reduces rework because disagreements happen before SQL, orchestration, and dashboards depend on the wrong assumption.
| Mapping field | Why it matters | Example |
|---|---|---|
| Target field name | Creates consistent language in the new model. | customer_created_at |
| Source field or table | Shows where the value originates. | crm.accounts.createddate |
| Transformation rule | Makes logic reviewable before deployment. | Convert to UTC and cast to timestamp. |
| Null behavior | Prevents silent metric drift. | Required for active accounts; null allowed for imported legacy accounts. |
| Valid values | Supports testing and dashboard filters. | active, paused, cancelled |
| Owner | Clarifies who resolves disputes. | Revenue operations owns lifecycle status. |
| Test expectation | Turns assumptions into controls. | Not null for current records; accepted values only. |
Use Layered Models to Control Complexity
Layering keeps migration logic understandable. A beginner-friendly pattern is to separate source cleaning, business modeling, and presentation models.
In a source or staging layer, keep close to the raw source. Rename fields, cast types, standardize timestamps, remove obvious technical duplicates, and document source quirks. Do not bury business metrics here.
In an intermediate or core layer, express business logic. This is where you define entities, events, statuses, lifecycle milestones, and relationships across systems.
In a presentation layer, shape data for common consumption patterns such as dashboards, finance reporting, customer health views, or operational monitoring.
Layering helps pipeline reliability because failures are easier to isolate. If a source changes, the staging layer should show it quickly. If a definition changes, the business model should show it clearly. If a dashboard needs a different shape, the presentation layer can change without rewriting the whole system.
Model for Pipeline Reliability, Not Just Query Convenience
Pipeline reliability improves when models make assumptions visible. Brittle systems often fail because important assumptions are implicit: a field is never null, an ID is unique, a status only has five values, or events always arrive in order.
During migration, turn those assumptions into checks. Test uniqueness where a key is supposed to be unique. Test accepted values where a status drives reporting. Test relationships where a child record should connect to a parent. Test freshness where dashboards depend on daily or hourly updates.
Also model late-arriving, corrected, and deleted records deliberately. Many source systems update history. Others append corrections. Some hard-delete records or merge customers. If the model ignores those behaviors, metrics will drift and nobody will know why.
A reliable model is not one that never changes. It is one where change is detected, explained, and handled before the wrong number reaches the business.
| Modeling choice | Reliability benefit | Failure it helps prevent |
|---|---|---|
| Declare grain | Makes duplicates and joins easier to detect. | Inflated revenue or user counts from accidental many-to-many joins. |
| Use stable keys | Keeps relationships durable across refreshes. | Broken history when display names or emails change. |
| Separate staging and business logic | Isolates source changes from metric definitions. | Hard-to-debug failures buried in one large query. |
| Test accepted values | Catches unexpected source behavior. | New status values disappearing from reports. |
| Track freshness | Makes stale data visible. | Executives making decisions from yesterday’s failed load. |
| Document metric definitions | Reduces dashboard disputes. | Two teams using the same metric name for different logic. |
Use Automation After the Rules Are Clear
Automation can accelerate a migration, especially for repetitive tasks such as schema profiling, column documentation, test generation, lineage extraction, deployment, and scheduled validation. But automation cannot decide what revenue means, which timestamp is authoritative, or whether cancelled orders should count in demand.
The safest sequence is: define the model, document the mapping, agree on acceptance checks, then automate the repeatable work. Reversing that order creates fast uncertainty.
Automation is especially useful when the migration has many similar source tables or many dashboard dependencies. It can detect type changes, missing columns, freshness delays, row-count anomalies, and unexpected null spikes. These checks do not replace review, but they give reviewers better evidence.
Be careful with auto-generated models. They often reproduce source structure instead of business structure. That may be acceptable for a staging layer, but it is rarely enough for trusted reporting.
Automation can move unclear logic faster than humans can review it. Use it to enforce decisions, not to avoid making them.
Validate With Reconciliation, Not Vibes
A migrated model is not ready because the query runs. It is ready when it passes agreed validation checks and the business owner accepts the differences.
Reconciliation usually compares the new model against a legacy report, source system export, or finance-approved number. The goal is not always a perfect match. Sometimes the new model intentionally fixes old logic. In that case, document the expected difference.
Useful validation checks include:
- Row counts by day, month, source, status, and owner.
- Metric totals compared with legacy dashboards or source exports.
- Duplicate checks on declared keys.
- Null-rate checks on required fields.
- Accepted-value checks for statuses and categories.
- Referential checks between facts and dimensions.
- Freshness checks against business expectations.
- Sample record walkthroughs for edge cases.
The strongest validation combines aggregate checks with record-level examples. Aggregates catch large differences. Record examples catch semantic mistakes that totals can hide.
When the new number differs from the old number, classify the difference: source timing, corrected logic, missing records, duplicate records, filter mismatch, or true defect.
Roll Out Models With Ownership and a Cutover Plan
Migration does not end when the new model is deployed. It ends when downstream users have switched, old logic is retired, monitoring is active, and ownership is clear.
Each production model should have an owner, a purpose, a freshness expectation, known consumers, and a basic change process. Without ownership, the new warehouse will slowly become another legacy system.
Plan cutover in stages. First, run the new model in parallel with the legacy version. Second, reconcile and document differences. Third, move a small group of consumers. Fourth, update dashboards and operational jobs. Fifth, retire or freeze the old asset. Sixth, monitor for breakage and confusion after launch.
Do not let both versions remain active indefinitely unless there is a clear reason. Parallel systems create metric disputes, duplicated maintenance, and unclear incident response.
Beginner Checklist for a Modeling Migration
Use this checklist before declaring a migrated model ready for trusted use.
- The model has a plain-English purpose.
- The grain is written down and reviewed.
- Primary keys and join keys are defined.
- Source-to-target mapping exists for important fields.
- Business definitions are documented for key metrics and dimensions.
- Known source quirks are documented.
- Data quality tests cover uniqueness, nulls, accepted values, relationships, and freshness where relevant.
- Legacy and new outputs have been reconciled.
- Expected differences are documented.
- A business owner has accepted the result.
- Downstream dashboards or jobs have a cutover plan.
- The old asset has a retirement or freeze decision.
If several of these are missing, the model may still be useful for exploration, but it should not be treated as a trusted reporting foundation.
Key takeaways
- A migration is a chance to repair business meaning, not just move old tables into a new tool.
- The first modeling decision is grain: define what one row represents before building downstream logic.
- Inventory dashboards, spreadsheets, exports, and hidden formulas, not only warehouse tables.
- Use migrate, redesign, retire, and bridge decisions to reduce legacy clutter deliberately.
- Source-to-target mapping turns assumptions into reviewable implementation detail.
- Automation is valuable after definitions and acceptance checks are clear; it should not replace modeling judgment.
- Reliable migrated models need tests, reconciliation, ownership, and a cutover plan.
Next step
Pick one high-value legacy dashboard or reporting table. Write down its grain, owner, source tables, key metrics, downstream consumers, known trust issues, and a migrate/redesign/retire/bridge decision. Then build the source-to-target mapping before writing the final replacement model.
- Read Data Modeling: Common Mistake: A beginner-friendly guide to the source-shaped modeling mistake that makes dashboards unreliable and pipelines harder to automate.
- Read Data Modeling: Reliability Field Note: A practical note on using data models to make metrics, pipelines, and dashboards more trustworthy.