SQL Interview Prep · Lesson

Deduplicating Rows Safely

Removing exact and near-duplicate rows while keeping one canonical record.

The Deduplication Problem

"This table has duplicate rows. Remove them but keep one copy of each." Some flavor of this question comes up in almost every data-engineering interview. The challenge is doing it safely: keeping exactly one canonical row per group of duplicates, and not accidentally deleting distinct records that merely look similar.

We will cover detecting duplicates, choosing which copy to keep, deduplicating in a SELECT, and physically deleting duplicates from a table.
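As a rough preview, the four steps can be sketched end to end with Python's built-in sqlite3 module. The `users` table, its columns, the sample rows, and the rule "keep the most recently updated copy, lowest id on ties" are all assumptions made up for illustration; ROW_NUMBER requires SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical table and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO users (id, email, name, updated_at) VALUES (?, ?, ?, ?)",
    [
        (1, "a@x.com", "Ann", "2024-01-01"),
        (2, "a@x.com", "Ann B", "2024-03-01"),
        (3, "b@x.com", "Bob", "2024-02-01"),
        (4, "c@x.com", "Cy", "2024-01-05"),
        (5, "c@x.com", "Cy", "2024-01-05"),
    ],
)

# 1. Detect: which emails appear more than once?
dup_keys = conn.execute(
    "SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1 ORDER BY email"
).fetchall()

# 2. Choose which copy to keep: rank the copies within each email.
#    rn = 1 marks the keeper (assumed rule: newest updated_at, then lowest id).
RANKED = """
    SELECT id, email, name, updated_at,
           ROW_NUMBER() OVER (
               PARTITION BY email
               ORDER BY updated_at DESC, id ASC
           ) AS rn
    FROM users
"""

# 3. Deduplicate in a SELECT: read only the keepers, table unchanged.
kept_ids = [
    row[0]
    for row in conn.execute(f"SELECT id FROM ({RANKED}) WHERE rn = 1 ORDER BY id")
]

# 4. Physically delete the non-keepers inside one transaction
#    (the with-block commits on success and rolls back on error).
with conn:
    conn.execute(
        f"DELETE FROM users WHERE id IN (SELECT id FROM ({RANKED}) WHERE rn > 1)"
    )

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]
print(dup_keys, kept_ids, remaining_ids)
```

Running the SELECT in step 3 before the DELETE in step 4 is a cheap safety check: the rows it returns are the rows that should survive.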

Define Duplicate First

The first question to ask the interviewer: "What makes two rows duplicates?" Options include:

  • Exact duplicates: every column is identical.
  • Key duplicates: same business key (e.g. same email) but other columns may differ.

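The technique differs for each: exact duplicates can be collapsed by comparing every column, while key duplicates also need a rule for which of the differing rows to keep. Never assume. Clarifying the duplicate definition is the single most important step, and interviewers expect you to ask.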
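The two definitions can flag different rows in the same table. Here is a minimal sketch using Python's built-in sqlite3 module; the `contacts` table, its columns, and the sample rows are hypothetical, and email is assumed to be the business key.

```python
import sqlite3

# Hypothetical table and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (email TEXT, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [
        ("a@x.com", "Ann", "Oslo"),
        ("a@x.com", "Ann", "Oslo"),     # exact duplicate of the row above
        ("b@x.com", "Bob", "Rome"),
        ("b@x.com", "Robert", "Rome"),  # same email, different name
        ("c@x.com", "Cy", "Lima"),
    ],
)

# Exact duplicates: group by every column.
exact_dups = conn.execute(
    """
    SELECT email, name, city, COUNT(*) AS copies
    FROM contacts
    GROUP BY email, name, city
    HAVING COUNT(*) > 1
    """
).fetchall()

# Key duplicates: group by the business key only (assumed here to be email).
key_dups = conn.execute(
    """
    SELECT email, COUNT(*) AS copies
    FROM contacts
    GROUP BY email
    HAVING COUNT(*) > 1
    ORDER BY email
    """
).fetchall()

# SELECT DISTINCT collapses exact duplicates but leaves key duplicates alone.
distinct_rows = conn.execute(
    "SELECT DISTINCT email, name, city FROM contacts"
).fetchall()

print(exact_dups)          # only the a@x.com row is an exact duplicate
print(key_dups)            # both a@x.com and b@x.com share a key
print(len(distinct_rows))  # the two b@x.com rows both survive DISTINCT
```

In this sample, DISTINCT still returns both b@x.com rows. Under the key-duplicate definition that is one row too many, which is why the definition has to be settled before choosing a technique.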
