I was recently tasked with finding duplicate files across multiple shares on a network. I looked at several free tools for the job, and in the end found that PowerShell can do it for me, also for free. You might want to grab a cup of coffee, or a pot, depending on how many directories and large files you have: the script hashes every file that shares its size with at least one other file.
Get-DuplicateFiles.ps1
# USAGE:
#   .\Get-DuplicateFiles.ps1

param ([string] $Path = (Get-Location))

Add-Type -AssemblyName System

function Get-MD5 (
    [System.IO.FileInfo] $file = $(throw 'Usage: Get-MD5 [System.IO.FileInfo]')
) {
    # This Get-MD5 function sourced from:
    # http://blogs.msdn.com/powershell/archive/2006/04/25/583225.aspx
    $stream = $null
    $cryptoProvider = [System.Security.Cryptography.MD5CryptoServiceProvider];
    $hashAlgorithm = New-Object $cryptoProvider
    $stream = $file.OpenRead()
    $hashByteArray = $hashAlgorithm.ComputeHash($stream)
    $stream.Close()

    ## We have to be sure that we close the file stream if any exceptions are thrown.
    trap {
        if ($stream -ne $null) { $stream.Close() }
        break
    }

    # The hash is returned as a string of space-separated decimal byte values.
    return [string]$hashByteArray
}

# Only files of identical size can be duplicates, so group by length first
# and discard empty files and sizes that occur only once.
$fileGroups = Get-ChildItem -Path $Path -Recurse |
    Where-Object { $_.Length -gt 0 } |
    Group-Object Length |
    Where-Object { $_.Count -gt 1 }

foreach ($fileGroup in $fileGroups) {
    # Attach an MD5 hash to each candidate file in this size group.
    foreach ($file in $fileGroup.Group) {
        Add-Member -MemberType NoteProperty -Name ContentHash -Value (Get-MD5 $file) -InputObject $file
    }

    # Files in the same size group that share a hash are reported as duplicates.
    $fileGroup.Group |
        Group-Object ContentHash |
        Where-Object { $_.Count -gt 1 }
}
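Get-MD5 returns the hash by casting the byte array to a string, so each hash shows up as a run of decimal numbers separated by spaces. If you would rather see the usual hexadecimal digest, something along these lines should work. ConvertTo-HexString is a hypothetical helper name of my own and is not part of the script above; treat it as a sketch rather than a required change.

```powershell
# Hypothetical helper, not part of the original script.
function ConvertTo-HexString ([byte[]] $Bytes) {
    # BitConverter yields "BB-31-A5-B2"; strip the dashes for a compact digest.
    return [System.BitConverter]::ToString($Bytes) -replace '-', ''
}

# The first four bytes of the first hash in the output example, for comparison.
[byte[]] $sample = 187, 49, 165, 178
[string] $sample               # → 187 49 165 178
ConvertTo-HexString $sample    # → BB31A5B2
```

To try it in the script, the return line of Get-MD5 could become: return (ConvertTo-HexString $hashByteArray). The output example that follows comes from the script as posted, so its hashes are still in the decimal form.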
Output Example
Count Name                      Group
----- ----                      -----
    2 187 49 165 178 166 151... {Ann-DS1.imr, Ann-DS1.imr}
    2 230 243 103 209 89 0 1... {MQ02 Mailing Labels Laser for Pat Birthday by Post Dates.imr, MQ02 Mailing Labels Laser for Pat Birthday by Post Dates.imr}
    2 129 199 76 16 247 255 ... {ANN-Folder-Subfolder Listing.imr, ANN-Folder-Subfolder Listing.imr}
    2 16 249 151 209 119 71 ... {Annotation Report.imr, Annotation Report.imr}
    2 114 146 127 255 123 23... {Annotation.mdb, Annotation.mdb}
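One thing the output above does not tell you is where each copy lives, since the Group column only shows file names. On newer versions of PowerShell (I am assuming 4.0 or later here, because the Get-FileHash cmdlet and the -File switch are not available on older ones), the same size-then-hash idea can be sketched so that it returns full paths. Get-DuplicateFilePaths is a made-up name for illustration, and this is a rough alternative rather than a drop-in replacement for the script.

```powershell
# Sketch only: assumes PowerShell 4.0+ (Get-FileHash, -File, [pscustomobject]).
# Get-DuplicateFilePaths is a hypothetical name, not part of the original script.
function Get-DuplicateFilePaths ([string] $Path = (Get-Location)) {
    Get-ChildItem -Path $Path -Recurse -File |
        Where-Object { $_.Length -gt 0 } |
        Group-Object Length |
        Where-Object { $_.Count -gt 1 } |
        # Hash only the files that share their size with another file.
        ForEach-Object { $_.Group } |
        ForEach-Object { Get-FileHash -LiteralPath $_.FullName -Algorithm MD5 } |
        Group-Object Hash |
        Where-Object { $_.Count -gt 1 } |
        # Emit one object per duplicate set: the hex hash and every matching path.
        ForEach-Object {
            [pscustomobject]@{ Hash = $_.Name; Paths = $_.Group.Path }
        }
}
```

Calling Get-DuplicateFilePaths -Path with a share or folder path, and piping the result to Format-List, should print each hash followed by the full path of every copy.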