You import a CSV. Names are in ALL CAPS. Emails have trailing spaces. Dates are in three different formats. Phone numbers have dashes, parentheses, and spaces mixed randomly. Half the rows are duplicates.
Before you can VLOOKUP, QUERY, or dashboard anything, you need clean data. This guide covers every data cleanup technique in Google Sheets.
Remove Duplicates
Built-in tool
Select your data range, then: Data > Data cleanup > Remove duplicates. Choose which columns to check for duplicates.
Formula: Find duplicates first
=IF(COUNTIF(A$2:A2, A2) > 1, "DUPLICATE", "")
This marks the second (and subsequent) occurrences as "DUPLICATE". Put it in a helper column, then filter and delete the flagged rows.
Formula: Get unique values
=UNIQUE(A2:A)
=UNIQUE(A2:D)
Trim Whitespace
Extra spaces are the #1 cause of VLOOKUP failures. A cell that looks like "John Smith" might actually be " John Smith " with invisible leading/trailing spaces.
=TRIM(A2)
=TRIM(CLEAN(SUBSTITUTE(A2, CHAR(160), " ")))
CHAR(160) is the non-breaking space character that often comes from web copy-paste. TRIM doesn't catch it by default.
=ARRAYFORMULA(IF(A2:A="", "", TRIM(A2:A)))
Standardize Text Case
=UPPER(A2) → "JOHN SMITH"
=LOWER(A2) → "john smith"
=PROPER(A2) → "John Smith"
=PROPER(TRIM(LOWER(A2)))
=ARRAYFORMULA(IF(C2:C="", "", LOWER(TRIM(C2:C))))
Split Text into Columns
Built-in: Split text to columns
Select a column, then: Data > Split text to columns. Choose comma, space, or custom delimiter.
Formula: SPLIT
=SPLIT(A2, " ")
=SPLIT(C2, "@")
Extract specific parts with REGEXEXTRACT
=REGEXEXTRACT(C2, "@(.+)")
=REGEXEXTRACT(A2, "\d+")
=REGEXEXTRACT(D2, "\(?(\d{3})")
Fix Dates
Dates stored as text are the second most common data problem. If your date column is left-aligned, it's text, not a real date.
// "03/15/2026" (text) → real date
=DATEVALUE(A2)
// "March 15, 2026" (text) → real date
=DATEVALUE(A2)
// "2026-03-15" (ISO format text) → real date
=DATEVALUE(A2)
=TEXT(DATEVALUE(A2), "YYYY-MM-DD")
=IF(REGEXMATCH(A2, "^\d{4}-"),
DATEVALUE(A2),
IF(REGEXMATCH(A2, "^\d{1,2}/"),
DATEVALUE(A2),
"UNKNOWN FORMAT"
)
)
Clean Phone Numbers
=REGEXREPLACE(D2, "\D", "")
=IF(LEN(REGEXREPLACE(D2,"\D",""))=10,
"("&MID(REGEXREPLACE(D2,"\D",""),1,3)&") "
&MID(REGEXREPLACE(D2,"\D",""),4,3)&"-"
&MID(REGEXREPLACE(D2,"\D",""),7,4),
D2
)
Find and Replace with SUBSTITUTE
// Remove commas from numbers
=SUBSTITUTE(A2, ",", "")
// Replace "N/A" with empty string
=IF(A2="N/A", "", A2)
// Replace multiple values
=SUBSTITUTE(SUBSTITUTE(A2, "Inc.", ""), "LLC", "")
Validate Data
=REGEXMATCH(C2,
"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
)
=ISNUMBER(A2)
=IF(OR(A2="", B2="", C2=""),
"INCOMPLETE", "OK")
The Data Cleanup Workflow
- Duplicate your data. Work on a copy. Never clean the original.
- Remove duplicates. Data > Data cleanup > Remove duplicates.
- Trim whitespace. ARRAYFORMULA + TRIM on all text columns.
- Standardize case. LOWER for emails, PROPER for names.
- Fix dates. DATEVALUE + format as YYYY-MM-DD.
- Standardize formats. Phone numbers, addresses, codes.
- Validate. Flag incomplete rows, invalid emails, out-of-range numbers.
- Paste values. When done, Ctrl+Shift+V to paste formulas as values.
Clean data isn't glamorous, but it's the foundation of every formula, dashboard, and report that works. Garbage in, garbage out. Clean data in, insights out.