Migration

Customer data modeling is the work of deciding what a customer means in your business, which systems are allowed to describe that customer, and how conflicting records should be resolved. If you skip those decisions, migrations become risky, dashboards disagree, and every team quietly builds its own version of the customer.

Why Customer Data Models Fail

Most customer models do not fail because someone forgot a column. They fail because the business never made several uncomfortable decisions explicit.

A useful customer model answers basic operating questions: Is a customer a person, an account, a household, a company, a billing entity, or a product workspace? Can one person belong to multiple accounts? Can one company have multiple contracts? Which system wins when CRM, billing, support, and product usage disagree?

During a migration, these questions become urgent because old assumptions get copied into a new stack. If the old data model was unclear, the new platform will usually make the confusion faster, not better.

  • Symptom: Sales reports count accounts, finance reports count subscriptions, and product reports count workspaces.
  • Root cause: The model uses the word customer for multiple grains.
  • Operator fix: Separate customer concepts before rebuilding pipelines or dashboards.

Checklist 1: Define the Customer Grain

The first checklist item is grain. Grain means the exact thing represented by one row in a table. A customer data model should not rely on the vague noun customer until the team agrees what the row actually represents.

Use this checklist before naming a table customers:

  • Decide whether the core customer entity is a person, account, organization, household, location, subscription, workspace, or another business object.
  • Document cases where one real-world customer can have multiple records at the chosen grain.
  • Document cases where multiple people or entities can share one customer record.
  • Separate buyer, user, payer, admin, and legal entity if they are not always the same thing.
  • Use table names that expose grain, such as customer_accounts, customer_contacts, or billing_customers, instead of hiding differences under one generic table.

A beginner-friendly rule: if two teams count customers differently and both can defend their answer, you probably have more than one customer grain.

Operator rule

Never approve a customer table until you can say what one row represents without using the word customer.

Customer concept Common grain When it matters
Person One individual human Consumer products, user-level behavior, identity, marketing consent, support contacts
Account One commercial relationship or sales account B2B sales, account ownership, pipeline, customer success coverage
Organization One legal or real-world company Enterprise hierarchies, parent-child companies, territory planning
Billing customer One payer or billing profile Invoices, subscriptions, payment methods, revenue reconciliation
Workspace or tenant One product environment SaaS usage, activation, product-led growth, admin permissions

Checklist 2: Map Each Source System to the Customer Model

Customer data usually arrives from several systems: CRM, billing, product analytics, support, marketing automation, data imports, and manual spreadsheets. Each system was built for a different workflow. None of them automatically owns the full truth.

Create a source-to-model map before migration or major cleanup work begins. For each source, record what it contributes, what it should not control, and which identifiers connect it to other sources.

  • List every system that creates, updates, merges, or deletes customer-related records.
  • Identify the native customer identifier in each source.
  • Identify cross-system identifiers, such as email, domain, account ID, billing customer ID, or CRM account ID.
  • Mark which fields are operationally owned by each system.
  • Mark which fields are derived in the warehouse and should not be edited manually in source systems.
  • Record known gaps, such as missing historical IDs, reused email addresses, shared inboxes, duplicate accounts, or merged CRM records.

The goal is not to make every source perfect. The goal is to stop pretending that every source means the same thing by customer.

Source system Usually reliable for Usually risky for
CRM Sales ownership, pipeline, account hierarchy, relationship notes Product activity, billing truth, duplicate-free identity
Billing system Invoices, subscriptions, payment status, payer details Sales territory, product engagement, support status
Product database Users, workspaces, feature usage, activation events Legal account ownership, revenue, offline sales context
Support system Tickets, contacts, issues, service history Canonical customer identity, revenue status, lifecycle state
Marketing platform Campaign engagement, forms, acquisition source Customer status, deduplication, account hierarchy

Checklist 3: Separate Identity Rules from Customer Attributes

A durable customer model separates identity from description. Identity rules decide whether two records refer to the same customer. Attributes describe that customer after the identity decision has been made.

This distinction matters because many teams accidentally use unstable attributes as identity. Email, company name, domain, phone number, and address can help matching, but they can also change, collide, or be shared.

  • Define the durable primary key for each modeled customer entity.
  • Keep source-system IDs alongside warehouse IDs so records remain traceable.
  • Do not overwrite source IDs during deduplication; preserve them in bridge or mapping tables.
  • Define matching confidence levels instead of treating every fuzzy match as certain.
  • Keep identity resolution rules versioned, especially during a migration.
  • Store merge history when records are combined, so downstream reports can be reconciled.

For a small business, identity rules may be simple. For a marketplace, B2B SaaS company, healthcare workflow, or franchise business, identity can become one of the most important parts of the model.

Warning

Do not make email the only customer identifier unless the business can tolerate shared inboxes, changed emails, aliases, recycled addresses, and multiple accounts per person.

Checklist 4: Model the Customer Lifecycle Explicitly

Customer status is often one of the most disputed fields in a data warehouse. A sales team may call someone a customer when a deal closes. Finance may wait for a paid invoice. Product may wait for activation. Support may care whether the account is entitled to help.

Instead of forcing one overloaded status field to serve every team, model lifecycle states with explicit definitions.

  • Define prospect, lead, opportunity, customer, active customer, inactive customer, churned customer, and reactivated customer if those states matter to the business.
  • Attach each lifecycle state to a triggering event or rule.
  • Record the effective date of each state change.
  • Decide whether lifecycle states are current-only or historically tracked.
  • Separate commercial status from product usage status when needed.
  • Define what happens when a customer downgrades, pauses, merges, transfers ownership, or reactivates.

The model should make lifecycle logic inspectable. If a dashboard says active customers declined, an operator should be able to trace which customers changed state and why.

Checklist 5: Choose the Core Customer Tables

Beginners often try to solve customer data modeling with one wide customer table. That can work for a narrow business, but it becomes brittle when the company has multiple products, contracts, users, or billing arrangements.

A more resilient pattern is to separate the stable entities from events and relationships. You do not need a complicated architecture to do this. You need clear tables with clear jobs.

  • Create entity tables for stable business objects, such as accounts, contacts, organizations, users, subscriptions, or locations.
  • Create relationship tables when many-to-many relationships exist, such as contacts to accounts or users to workspaces.
  • Create event tables for changes over time, such as signups, activations, purchases, invoices, support tickets, renewals, and cancellations.
  • Create mapping tables for source IDs and identity resolution.
  • Create derived summary tables only after the underlying entities and events are reliable.

The simplest good model is not the fewest tables. It is the model where each table has one clear grain and one clear purpose.

Checklist 6: Migration Readiness for Customer Data

Customer data migrations fail when teams move records before deciding what the records mean. The migration may technically complete while the business loses trust because counts, owners, statuses, or histories changed without explanation.

Before migrating customer data, complete these checks:

  • Inventory all customer-related entities in the old system and the target system.
  • Map old fields to target fields, including fields that will be dropped, renamed, split, merged, or derived.
  • Identify required fields in the target model and how missing values will be handled.
  • Define deduplication rules before loading production data.
  • Decide whether historical inactive, deleted, test, or spam records will migrate.
  • Preserve legacy IDs for reconciliation and support lookup.
  • Run sample migrations and compare record counts by segment, source, status, and lifecycle stage.
  • Get business signoff on known exceptions instead of hiding them as technical cleanup.

A migration is also a modeling opportunity. If a field has no owner, no definition, and no user, it may not deserve to be carried forward.

Migration checkpoint

If the team cannot reconcile old and new customer counts by meaningful segments, the migration is not ready for business signoff.

Checklist 7: Add Quality Checks That Match Business Risk

Customer data quality checks should protect the decisions the business actually makes. A generic null check is useful, but it is not enough if the real risk is duplicated accounts, broken billing links, or incorrect lifecycle states.

Start with checks that catch trust-breaking defects:

  • Each modeled customer entity has a unique primary key.
  • Required relationships exist, such as every subscription linking to a billing customer or every product workspace linking to an account when the business expects that relationship.
  • Duplicate detection runs on the identifiers that matter for your business, not only exact names.
  • Status fields contain only allowed values.
  • Lifecycle dates occur in a valid order.
  • Critical foreign keys do not point to missing records.
  • Counts by source, region, segment, plan, or lifecycle state do not move unexpectedly after migration.
  • Test, internal, demo, and spam records are labeled rather than silently mixed into customer metrics.

Good checks do not prove the model is perfect. They make important failures visible early enough to fix.

Failure mode Typical signal Useful check
Duplicate customers Counts are inflated and teams argue over which record is real Duplicate detection by source IDs, email, domain, account name, and known matching rules
Broken relationships Subscriptions, tickets, or users cannot be tied to accounts Foreign key coverage and orphan record checks
Conflicting statuses One dashboard says active while another says churned Single lifecycle definition with effective dates and allowed transitions
Lost history Migration looks clean but old trends cannot be explained Legacy ID preservation and before-after reconciliation by period
Mixed test data Internal or demo records appear in customer metrics Explicit record classification and exclusion rules

Checklist 8: Assign Ownership and Change Control

A customer model is a shared business contract, not just a database structure. If nobody owns definitions, every urgent request can quietly rewrite the model.

Assign ownership at three levels:

  • Business owner: approves definitions such as active customer, churned customer, account owner, and customer segment.
  • Technical owner: maintains pipelines, tests, table structure, and migration logic.
  • Data consumer owner: represents the dashboards, downstream tools, reverse ETL jobs, or operational workflows that depend on the model.

Change control does not need to be bureaucratic. It does need to be visible. When a customer definition changes, the team should know which reports, automations, and historical metrics will be affected.

Practical governance

A lightweight owner list beats a perfect data dictionary that nobody is accountable for maintaining.

Checklist 9: Define Acceptance Tests Before Launch

Do not wait until dashboard users complain to decide whether the customer model works. Define acceptance tests before launching a repaired model or migrated system.

Useful acceptance tests include:

  • Can a support operator search by legacy ID and find the migrated customer?
  • Can finance reconcile migrated billing customers to invoices?
  • Can sales reconcile account ownership and opportunity history?
  • Can product analytics connect users or workspaces to the correct account?
  • Can leadership reproduce the old headline customer count and explain any intentional differences?
  • Can analysts trace a dashboard metric back to source records and transformation rules?
  • Can the team identify all records excluded from migration and explain why?

The best acceptance test is not only a row count. It is a realistic business workflow that proves the model can be trusted in daily use.

Acceptance area Pass condition Who should review
Identity Known duplicate and merged records behave as expected Data owner and business operations
Revenue linkage Billing customers connect to invoices and subscriptions Finance or revenue operations
Sales continuity Accounts, owners, and opportunities reconcile to the prior system Sales operations
Product linkage Users or workspaces connect to the correct customer entity Product analytics or engineering
Executive metrics Top-level counts are reproducible or differences are documented Data leader and business sponsor

Key takeaways

  • Customer data modeling starts with defining the grain of customer, not choosing a tool or building a wider table.
  • Separate person, account, organization, billing customer, and product workspace when they represent different business realities.
  • During migration, preserve legacy IDs and reconcile counts by meaningful segments before asking the business to trust the new model.
  • Identity rules, lifecycle states, and ownership rules should be explicit, versioned, and testable.
  • A good customer model supports daily workflows: finance reconciliation, sales ownership, support lookup, product analysis, and executive reporting.

Next step

Pick one important customer metric, such as active customers or paying accounts. Trace it back to the source records, identity rules, lifecycle definition, and table grain. Any step that cannot be explained is the next part of the customer model to repair.

Controlled internal links