I was recently tasked with finding duplicate files across multiple shares on a network. After looking at a few of the free tools out there, I found that PowerShell can do the job on its own. You might want to grab a cup of coffee, or a pot, depending on how many directories and large files you have: every file that shares its size with at least one other file is read in full to compute a hash.
Get-DuplicateFiles.ps1
# USAGE:
# .\Get-DuplicateFiles.ps1
# .\Get-DuplicateFiles.ps1 -Path <directory>
param ([string] $Path = (Get-Location))
Add-Type -AssemblyName System
function Get-MD5 (
    [System.IO.FileInfo]
    $file = $(throw 'Usage: Get-MD5 [System.IO.FileInfo]')
) {
    # This Get-MD5 function sourced from:
    # http://blogs.msdn.com/powershell/archive/2006/04/25/583225.aspx
    $stream = $null
    $cryptoProvider = [System.Security.Cryptography.MD5CryptoServiceProvider]
    $hashAlgorithm = New-Object $cryptoProvider
    $stream = $file.OpenRead()
    $hashByteArray = $hashAlgorithm.ComputeHash($stream)
    $stream.Close()
    ## We have to be sure that we close the file stream if any exceptions are thrown.
    ## A trap covers the whole function scope, regardless of where it appears in the body.
    trap {
        if ($null -ne $stream) { $stream.Close() }
        break
    }
    # Casting the byte array to [string] yields space-separated decimal byte values,
    # which is the form shown in the Name column of the output.
    return [string]$hashByteArray
}
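# Pass 1: only files that share a length with another file can be duplicates,
# so group by length first and keep the groups with more than one member.
$fileGroups = Get-ChildItem -Path $Path -Recurse |
    Where-Object { -not $_.PSIsContainer -and $_.Length -gt 0 } |
    Group-Object Length |
    Where-Object { $_.Count -gt 1 }
# Pass 2: hash the candidates in each length group and report matching hashes.
foreach ($fileGroup in $fileGroups) {
    foreach ($file in $fileGroup.Group) {
        Add-Member -MemberType NoteProperty -Name ContentHash -Value (Get-MD5 $file) -InputObject $file
    }
    $fileGroup.Group |
        Group-Object ContentHash |
        Where-Object { $_.Count -gt 1 }
}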
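As an aside, on PowerShell 4.0 or later the built-in Get-FileHash cmdlet can stand in for the hand-rolled Get-MD5 function. The following is only a sketch of the same two-pass idea (group by length, then hash the candidates), not a replacement for the script above. The function name Find-DuplicateFile is made up for illustration, and the hash shows up as a hex string rather than as decimal byte values.

```powershell
# Sketch only: Find-DuplicateFile is a hypothetical name, not part of the original script.
# Assumes PowerShell 4.0+, where the Get-FileHash cmdlet is available.
function Find-DuplicateFile {
    param ([string] $Path = (Get-Location))

    Get-ChildItem -Path $Path -Recurse |
        Where-Object { -not $_.PSIsContainer -and $_.Length -gt 0 } |
        Group-Object Length |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object {
            # Hash only the files that share a length with at least one other file.
            $_.Group |
                ForEach-Object { Get-FileHash -LiteralPath $_.FullName -Algorithm MD5 } |
                Group-Object Hash |
                Where-Object { $_.Count -gt 1 }
        }
}
```

Each group it returns has the hash in Name and the Get-FileHash results in Group, so the full path of every copy is available as $_.Group.Path. The output example that follows still comes from the original script.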
Output Example
Count Name                      Group
----- ----                      -----
    2 187 49 165 178 166 151... {Ann-DS1.imr, Ann-DS1.imr}
    2 230 243 103 209 89 0 1... {MQ02 Mailing Labels Laser for Pat Birthday by Post Dates.imr, MQ02 Mailing Labels Laser for Pat Birthday by Post Dates.imr}
    2 129 199 76 16 247 255 ... {ANN-Folder-Subfolder Listing.imr, ANN-Folder-Subfolder Listing.imr}
    2 16 249 151 209 119 71 ... {Annotation Report.imr, Annotation Report.imr}
    2 114 146 127 255 123 23... {Annotation.mdb, Annotation.mdb}
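The Group column above shows only file names, so two copies in different folders look identical. One way to tell them apart is to flatten each group into one row per file with its full path. This is a minimal sketch: Expand-DuplicateGroup is a hypothetical helper name, and it assumes each file object carries the ContentHash property that the script adds.

```powershell
# Sketch only: Expand-DuplicateGroup is a hypothetical helper, not part of the original script.
# It expects the group objects emitted by Get-DuplicateFiles.ps1, whose members are
# file objects that already carry the ContentHash note property.
function Expand-DuplicateGroup {
    param (
        [Parameter(ValueFromPipeline = $true)]
        $Group
    )
    process {
        # One row per file: ContentHash identifies the duplicate set,
        # FullName tells the individual copies apart.
        $Group.Group | Select-Object ContentHash, Length, FullName
    }
}
```

Piping the script into the helper, for example .\Get-DuplicateFiles.ps1 | Expand-DuplicateGroup | Format-Table -AutoSize, should then list every duplicate with its location.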