Find duplicate files

Apr 13, 2014 · 140 words · 1 minute read

Find all duplicate files in the current directory and its subdirectories with bash.

find -not -empty -type f -printf '%s\n' | sort -rn | uniq -d | xargs -I{} find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

Breakdown

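Find all non-empty files and print the size of each in bytes.
Sort the sizes numerically, largest first.
Keep only the sizes that occur more than once.
For each of those sizes, run find again and print the names of the files with exactly that size.
Compute the md5sum of each of those files.
Sort the output so that lines with identical checksums are adjacent.
Print every line whose checksum repeats, with groups separated by blank lines.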
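
The steps above can also be written as a commented, multi-line function, which is easier to read and to reuse on a directory other than the current one. This is only a sketch: the name find_dupes is made up for illustration, it assumes the GNU versions of find, xargs, md5sum and uniq, and it differs from the one-liner in two small ways (it leaves out -n1, which is redundant next to -I{}, and it passes -r to the second xargs so md5sum is not started when nothing was found).

```shell
#!/usr/bin/env bash
# find_dupes [DIR]: print groups of files under DIR (default: .) that share
# an MD5 checksum. Hypothetical helper; assumes GNU find, xargs, md5sum, uniq.
find_dupes() {
    local dir="${1:-.}"
    find "$dir" -not -empty -type f -printf '%s\n' |        # 1. size of every non-empty file
        sort -rn |                                          # 2. numeric sort, largest first
        uniq -d |                                           # 3. keep sizes seen more than once
        xargs -I{} find "$dir" -type f -size {}c -print0 |  # 4. files with exactly that size
        xargs -0 -r md5sum |                                # 5. checksum only those files
        sort |                                              # 6. put equal checksums side by side
        uniq -w32 --all-repeated=separate                   # 7. print repeated checksums in groups
}
```

Calling find_dupes ~/Downloads would then list the duplicate groups under that directory. Comparing sizes first keeps the expensive checksum step limited to files that could actually be duplicates.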

Alternatively

Or do it the easy way and install fdupes, a tool for finding duplicate files. It is much faster than the one-liner above.

apt-get install fdupes

Run recursively on the current directory, it does more or less the same thing as the one-liner.

fdupes -r .
