duplicate-image-finder
A duplicate image finder scans a set of images and groups exact and visually similar duplicates together. It uses byte-level hashing (which catches identical files copied to multiple folders) and perceptual hashing (which catches resized, recompressed, slightly cropped, or slightly rotated copies), so you can reclaim disk space, deduplicate a photo library, or audit a content folder before uploading it to a CMS or marketplace. The ZTools Duplicate Image Finder runs entirely in the browser, computes a fast perceptual hash (pHash / dHash) over a downsampled grid of each image, surfaces clusters of near-duplicates with a similarity score, and lets you preview them side by side before deciding which copies to keep.
Use cases
Cleaning a phone photo backup
Years of phone backups accumulate burst-shots and forwarded WhatsApp copies of the same picture. Find the clusters and keep one master per group.
Catalogue audit before marketplace upload
Sellers sometimes accidentally upload the same product photo twice or a slightly cropped variant. Run the duplicate finder before the catalogue goes live.
Reclaiming disk space
Designers keep multiple versions of the same hero image. Find the duplicates and delete the redundant copies; in heavily duplicated folders this can save 50% or more of the folder size.
Plagiarism / re-use audits
Compare a folder of submissions against a reference library to spot copied or lightly-edited reuse.
How it works
- Drop a folder of images: JPG, PNG, WebP, HEIC. The browser reads files locally; nothing is uploaded.
- Compute hashes: byte-level SHA-256 for exact-match detection, and pHash / dHash on a 32x32 greyscale sample for perceptual similarity.
- Cluster by similarity: the Hamming distance between perceptual hashes groups near-duplicates. A threshold slider sets how strict the match must be.
- Review clusters: each group shows thumbnails side by side with file size, dimensions, and similarity %.
- Mark and act: tick which copies to keep or delete. Export a CSV of decisions, or download a ZIP of the survivors.
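The hashing and clustering steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the tool's actual code: the function names are hypothetical, perceptual hashes are assumed to be already computed as integers, and single-link clustering with a union-find structure is an assumption, since this page does not say how the tool forms its clusters.

```python
import hashlib
from collections import defaultdict


def exact_groups(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose bytes are identical, using SHA-256 digests."""
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return sorted(sorted(names) for names in by_digest.values() if len(names) > 1)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def cluster(hashes: dict[str, int], threshold: int) -> list[list[str]]:
    """Single-link clustering (an assumption): two files share a cluster
    when a chain of pairs within `threshold` bits connects them."""
    parent = {name: name for name in hashes}

    def find(x: str) -> str:
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = sorted(hashes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hamming(hashes[a], hashes[b]) <= threshold:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for name in names:
        groups[find(name)].append(name)
    # Only groups with at least two members are duplicate clusters.
    return sorted(g for g in groups.values() if len(g) > 1)


# Hypothetical inputs for illustration only.
files = {"a.jpg": b"same bytes", "copy/a.jpg": b"same bytes", "b.jpg": b"other"}
print(exact_groups(files))  # [['a.jpg', 'copy/a.jpg']]

hashes = {"x.jpg": 0b11110000, "x_small.jpg": 0b11110001, "y.jpg": 0b00001111}
print(cluster(hashes, threshold=5))  # [['x.jpg', 'x_small.jpg']]
```

The pairwise loop compares every hash with every other, which is simple and adequate for a few thousand images; a larger library would likely need an index structure instead.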
Examples
Input: 500 phone photos including 80 burst-shots
Output: ~30 clusters; e.g. one cluster of 7 near-duplicate burst-shots of the same scene
Input: Marketplace catalogue, 200 product photos
Output: 5 clusters of accidentally re-uploaded products
Input: Original + resized + recompressed copy
Output: All three flagged as the same cluster at >95% similarity
Frequently asked questions
How does it detect resized or rotated duplicates?
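Perceptual hashing reduces each image to a small greyscale grid and hashes the brightness pattern. Resizing, recompression, and small rotations largely preserve that pattern, so the hashes stay close (a small Hamming distance). Larger rotations, such as 90°, change the pattern, so a plain perceptual hash will not match them.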
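To make the idea concrete, here is a minimal sketch of a difference hash (dHash) in Python. It assumes the image has already been decoded and shrunk to a grid of brightness values (8 rows of 9 values for a 64-bit hash); the grid contents below are synthetic, and the tool's real preprocessing and hash size may differ.

```python
def dhash(gray: list[list[int]]) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair.

    The bit is 1 when the left pixel is brighter than the right one.
    """
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Synthetic 8x9 brightness grid standing in for a downsampled photo.
original = [[(x * 37 + y * 53) % 256 for x in range(9)] for y in range(8)]

# A darker, lower-contrast copy: every value changes, the pattern does not.
darker = [[v // 2 + 10 for v in row] for row in original]

# The photographic negative reverses every brighter/darker relation.
negative = [[255 - v for v in row] for row in original]

print(hamming(dhash(original), dhash(darker)))    # 0
print(hamming(dhash(original), dhash(negative)))  # 64
```

Because only the relative order of neighbouring pixels is stored, edits that change absolute values but keep the pattern leave the hash unchanged or nearly so.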
What similarity threshold should I use?
Hamming distance ≤ 5 (about 95% similar) is safe for "almost certainly the same". 6 to 10 catches more aggressive crops; > 10 starts including merely "thematically similar" photos.
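One common way to turn a Hamming distance into a percentage is to take the share of matching bits. The sketch below assumes that formula and a 64-bit hash; neither is confirmed on this page, so the figures it prints may not match the similarity % shown in the tool. The function names are hypothetical, and the bands simply restate the thresholds above.

```python
def similarity_percent(distance: int, hash_bits: int = 64) -> float:
    """Share of matching bits, as a percentage (assumed formula)."""
    if not 0 <= distance <= hash_bits:
        raise ValueError("distance must be between 0 and hash_bits")
    return 100.0 * (1 - distance / hash_bits)


def band(distance: int) -> str:
    """Map a Hamming distance to the guidance given in the FAQ answer."""
    if distance <= 5:
        return "almost certainly the same"
    if distance <= 10:
        return "possible crop or re-edit"
    return "may be only thematically similar"


for d in (0, 5, 10, 16):
    print(d, similarity_percent(d), band(d))
```

Under these assumptions the percentage depends on the hash length: the same distance of 5 is a smaller fraction of a 128-bit hash than of a 64-bit one, so it is safer to reason in Hamming distance than in percent.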
Will it detect mirrored copies?
Optionally. A "check flips" toggle also hashes mirrored variants, which slows the scan.
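A flip check along these lines could look like the sketch below. It is an illustration under assumptions, not the tool's implementation: it uses a simple difference hash on a synthetic brightness grid, checks only horizontal mirroring, and all function names are hypothetical. Hashing each image a second time is the extra work that makes the scan slower.

```python
def dhash(gray: list[list[int]]) -> int:
    """Difference hash: 1 bit per adjacent pair, set when left is brighter."""
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def mirror(gray: list[list[int]]) -> list[list[int]]:
    """Flip the grid horizontally (left-right mirror)."""
    return [row[::-1] for row in gray]


def distance_with_flip(a: list[list[int]], b: list[list[int]]) -> int:
    """Smallest distance between a and either b or b's mirror image."""
    hash_a = dhash(a)
    return min(hamming(hash_a, dhash(b)), hamming(hash_a, dhash(mirror(b))))


# Synthetic 8x9 brightness grid and its mirrored copy.
photo = [[(x * 37 + y * 53) % 256 for x in range(9)] for y in range(8)]
flipped = mirror(photo)

print(hamming(dhash(photo), dhash(flipped)) > 0)  # True: a plain comparison misses it
print(distance_with_flip(photo, flipped))         # 0: the flip check finds it
```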
Does it actually delete files?
No. The tool only flags duplicates and lets you mark decisions. You delete the files yourself on your local file system, which is safer.
How big a folder can it handle?
Tested up to ~5,000 images on a modern laptop. Larger libraries should be batched per folder.
Are RAW files supported?
For RAW files, the embedded JPG preview is usually what gets compared. Decoding of pure RAW data (CR2, NEF, ARW) is limited; convert to JPG first for best results.
Tips
- Run on a copy of the folder until you trust the tool, then run on the master once.
- Keep the highest-resolution / most-recent copy; delete the rest.
- Tighten the threshold to ≤ 5 for safety; loosen only when you actively want to find re-edits.
- For catalogue audits, export the CSV and review before deleting; accidental deletes are painful.
- Re-run periodically; duplicates accumulate quietly over time.
Try it now
The full duplicate-image-finder runs in your browser at https://ztools.zaions.com/duplicate-image-finder with no signup and no upload; no data leaves your device.
Last updated: 2026-05-05 · Author: Ahsan Mahmood