CanadianDataGuy’s No Fluff Newsletter
Why I Materialize Delta History for Debugging
Just a Quick Tip
Nov 27 • Canadian Data Guy
Stop Waiting for Connectors: Stream ANYTHING into Spark (It's 4 Functions)
How to ingest data from any source into Apache Spark, demystified with a real-world example of blockchain ingestion
Nov 3 • Canadian Data Guy and Yogita Nesargi
October 2025
How to write your first Spark application with Stream-Stream Joins with working code
A Practical, Hands-On Guide to Joining Real-Time Data Streams in Spark Structured Streaming
Oct 15 • Canadian Data Guy
How Spark Structured Streaming Recovers After Failures
A deep dive into fault tolerance, checkpointing, and exactly-once semantics with Delta Lake
Oct 3 • Canadian Data Guy
September 2025
Build an Ethereum ETL Pipeline for Free Using Databricks Free Edition
Build a zero-infrastructure streaming pipeline: Step-by-step Ethereum data ingestion, schema evolution, and Delta storage
Sep 23 • Yogita Nesargi
July 2025
How Many Spark Streaming Jobs Can You REALLY Run on One Cluster?
Discover how to run 100 concurrent Spark Structured Streaming jobs on 1 machine. Learn best practices, trigger intervals, and cost-saving tips—all with…
Jul 24 • Canadian Data Guy
June 2025
How to ace and structure your Data Modelling Interview
Prescriptive guidance for conducting your Data Modelling Interview
Jun 18 • Canadian Data Guy
A Deep Dive into Skewed Joins, GroupBy Bottlenecks, and Smart Strategies to Keep Your Spark Jobs Flying
Unlock comprehensive, practical solutions to conquer data skew in Apache Spark—step-by-step from basics to advanced strategies for perfectly balanced…
Jun 6 • Canadian Data Guy
May 2025
Decode the Join: A Spark Data Engineer’s Visual Handbook
Understand when and why to use Broadcast, Shuffle, or Sort-Merge Joins in Spark, with clear visuals, real-world use cases, and strategy tips tailored…
May 9 • Canadian Data Guy and Harathi Pasam
How to Read Delta Log Statistics (and Why You Should)
Learn how to extract and validate column-level stats from your Delta Lake logs to optimize performance and debug configurations
May 2 • Canadian Data Guy
April 2025
When Data Engineering Met AI
Teaching AI to Play Nice in Data Engineering
Apr 26 • Canadian Data Guy
Why Your PySpark UDF Is Slowing Everything Down
An in-depth exploration of architecture, execution flow, bottlenecks, and optimization strategies for PySpark UDFs
Apr 24 • Canadian Data Guy