How to ace and structure your Data Modelling Interview

Prescriptive guidance for conducting your Data Modelling Interview

Canadian Data Guy

Jun 18, 2025

1. Understand the Requirements (Functional and Non-Functional)

Ask for Use Cases: Start by understanding the primary use cases for the data model. Ask questions like:
- What kind of questions or analyses will the data model need to answer?
- Who will use the data (e.g., business users, analysts, data scientists)?
Clarify Non-Functional Requirements (NFRs): Determine the expectations around performance, latency, and data freshness.
- Is the data needed in real-time, near real-time, or in batches?
- What are the expected data volumes and retention periods?

2. Define Entities and Relationships

Identify the key entities (e.g., Customers, Transactions, Products) and their relationships.
Use Entity-Relationship (ER) diagrams or similar visual tools to illustrate the relationships.
Explain cardinality (e.g., one-to-many, many-to-one) between the entities.
Tip: Start representing the relationship as you progress and validate it with the interviewer to see if it makes sense. This will help establish a common understanding between you, and if you have made any bad assumptions, the interviewer can correct you.

3. Design Fact and Dimension Tables

Fact Tables:
- Explain what events or transactions the fact table will capture.
- Include details like granularity (e.g., transactions at a daily/hourly level).
- Mention the primary key and any measures (e.g., sales amount, quantities).
Dimension Tables:
- Identify dimensions that provide context (e.g., time, products, customers).
- Mention the primary key and the attributes of each dimension.
- Discuss if surrogate keys are used for maintaining consistency.

4. Normalization vs. Denormalization

Explain your approach to normalization or denormalization, depending on the use case:
- Normalized tables are suitable for transactional systems to avoid data redundancy.
- Denormalized tables are better for analytical systems to improve query performance.
Justify your choice based on the expected query patterns and performance needs.

5. Design Aggregation Tables (if needed)

For reporting purposes, you might need aggregate tables that summarize data.
Explain how you would create these tables and what metrics they will store.
Use naming conventions like agg_, dim_, and fact_ for clarity.

6. Discuss Partitioning Strategy

Choose partitioning columns based on query patterns and data distribution:
- For time-based queries, consider partitioning by date.
- Explain the expected data volumes and how partitioning will improve performance.
Include how you would handle archiving and data retention policies.

One Tap Could Make All the Difference

One surefire way to never see this content again? Scroll past without engaging. Search algorithms heavily rely on signals like likes and comments to decide whether a piece of content deserves to surface again, for you or anyone else. If you found this valuable, even in a small way, do consider hitting the like button or dropping a quick comment. It not only supports the content but helps others discover it too.

7. Demonstrate with Sample Queries

Show how the model would work by writing or describing example queries:
- These queries should answer the business questions you gathered in Step 1.
- Highlight how your model minimizes joins and ensures efficient querying.
Aim for zero or minimal joins, especially in denormalized models.

8. Discuss Data Quality and Governance

Explain how you would maintain data quality in your model:
- What kind of checks would you apply at different stages (e.g., data integrity, row count checks)?
- Would you use tools like DBT, Airflow, or others for orchestration?
Address data governance considerations like data lineage, privacy, and compliance (e.g., GDPR).

9. Explain How the Model Can Scale

Discuss how the model will handle increasing data volumes and user queries:
- Consider indexing strategies, sharding, or using cloud-based data warehouses like Snowflake or BigQuery.
Explain if and how you would optimize performance through techniques like partitioning, caching, or indexing.

10. Wrap Up with Recap and Iteration

Summarize your approach and how it addresses the initial requirements.
Discuss potential areas for iteration or improvement if the requirements change.
Be open to feedback from the interviewer and discuss how you would adapt the design based on their inputs.

Template for Each Table Design

For each table you design, mention:

Name: Use naming conventions (e.g., dim_, fact_, agg_).
Primary Key: Specify the key.
Columns: List the key attributes and explain their purpose.
Partitioning: If applicable, explain why you chose a specific column for partitioning.
Estimated Row Count: Provide an estimate based on expected data volumes.
Sample Query: Show how to query the table to answer a business question.

This structure helps convey both your technical skills and your ability to think critically and design solutions that meet business needs.

Leave something rememberable

Before you wrap up, give your interviewer something they can revisit long after the conversation ends. Pair your polished ER diagram with crisp, layered documentation—entity definitions, key attributes, and the rationale behind every relationship. Treat it as a living blueprint: clear enough for newcomers, detailed enough for architects, and structured to mirror the depth of your own experience.

Humans forget roughly 80 % of new information within a day, so the most reliable way to stay top-of-mind is to hand them a reference they can’t ignore. The sharper your vision, the richer your notes, the louder your expertise will echo when the hiring panel reviews candidates. Turn your model into a one-page memory hook that brings your name back to the top of their list.

CanadianDataGuy’s No Fluff Newsletter

Discussion about this post

Ready for more?