Find File Duplicates with PowerShell

I was recently tasked with finding duplicate files across multiple shares on a network. I looked at a few free tools that can do this, and in the end found that PowerShell can do it on its own. The script below groups files by size first and computes an MD5 hash only for files that share a size, so a file with a unique size is never hashed. You might still want to grab a cup of coffee, or a pot, depending on how many directories and large files you have.
Get-DuplicateFiles.ps1

# USAGE:
# .\Get-DuplicateFiles.ps1 [-Path <directory>]
param ([string] $Path = (Get-Location))
Add-Type -AssemblyName System
function Get-MD5 (
    [System.IO.FileInfo]
    $file = $(throw 'Usage: Get-MD5 [System.IO.FileInfo]')
) {
    # This Get-MD5 function sourced from:
    # http://blogs.msdn.com/powershell/archive/2006/04/25/583225.aspx
    $stream = $null
    $cryptoProvider = [System.Security.Cryptography.MD5CryptoServiceProvider]
    $hashAlgorithm = New-Object $cryptoProvider
    $stream = $file.OpenRead()
    $hashByteArray = $hashAlgorithm.ComputeHash($stream)
    $stream.Close()
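    # A trap covers the whole function scope, even though it is declared after the
    # statements it guards, so the stream is closed if ComputeHash throws.
    trap {
        if ($stream -ne $null) { $stream.Close() }
        break   # stop the function and pass the error up to the caller
    }
    # Casting the byte array to [string] joins the bytes as space-separated decimals.
    return [string]$hashByteArray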
}
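# Group non-empty files by size; only files that share a size can be duplicates.
$fileGroups = Get-ChildItem -Path $Path -Recurse |
    Where-Object { -not $_.PSIsContainer -and $_.Length -gt 0 } |
    Group-Object Length |
    Where-Object { $_.Count -gt 1 }
foreach ($fileGroup in $fileGroups) {
    # Hash only the files in a same-size group and attach the hash to each file object.
    foreach ($file in $fileGroup.Group) {
        Add-Member -MemberType NoteProperty -Name ContentHash -Value (Get-MD5 $file) -InputObject $file
    }
    # Files with the same size and the same hash are reported as duplicates.
    $fileGroup.Group |
        Group-Object ContentHash |
        Where-Object { $_.Count -gt 1 }
}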
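
On PowerShell 4.0 or later, the built-in Get-FileHash cmdlet can probably stand in for the hand-written Get-MD5 function. What follows is only a minimal sketch of the same size-then-hash idea under that assumption. The function name Find-DuplicateFile is hypothetical and not part of the script above, and the hash appears as a hexadecimal string rather than the decimal byte list the original produces.

```powershell
# Hypothetical helper (not part of the original script).
# Assumes PowerShell 4.0+ for Get-FileHash and the -File switch.
function Find-DuplicateFile {
    param ([string] $Path = (Get-Location))

    Get-ChildItem -Path $Path -Recurse -File |
        Where-Object { $_.Length -gt 0 } |
        Group-Object Length |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object {
            # Only files that share a size reach this point and get hashed.
            $_.Group |
                ForEach-Object { Get-FileHash -LiteralPath $_.FullName -Algorithm MD5 } |
                Group-Object Hash |
                Where-Object { $_.Count -gt 1 }
        }
}
```

Each group that comes back holds the Get-FileHash results, so something like `Find-DuplicateFile -Path 'D:\Data' | ForEach-Object { $_.Group | Select-Object Hash, Path }` should list every duplicate with its full path (the `D:\Data` folder is just a placeholder).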

Output Example

Count Name                      Group
----- ----                      -----
    2 187 49 165 178 166 151... {Ann-DS1.imr, Ann-DS1.imr}
    2 230 243 103 209 89 0 1... {MQ02 Mailing Labels Laser for Pat Birthday by Post Dates.imr, MQ02 Mailing Labels Laser for Pat Birthday by Post Dates.imr}
    2 129 199 76 16 247 255 ... {ANN-Folder-Subfolder Listing.imr, ANN-Folder-Subfolder Listing.imr}
    2 16 249 151 209 119 71 ... {Annotation Report.imr, Annotation Report.imr}
    2 114 146 127 255 123 23... {Annotation.mdb, Annotation.mdb}
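
The default table shows only file names, which makes it hard to tell where each copy lives. One way around that, sketched here with a hypothetical Expand-DuplicateGroup helper that is not part of the original script, is to flatten each group into one row per file with its full path. It assumes PowerShell 3.0 or later for the [pscustomobject] syntax.

```powershell
# Hypothetical helper (not part of the original script): turns each group
# emitted by Group-Object into one row per file, including the full path.
function Expand-DuplicateGroup {
    param (
        [Parameter(ValueFromPipeline = $true)]
        $DuplicateSet
    )
    process {
        foreach ($file in $DuplicateSet.Group) {
            [pscustomobject]@{
                ContentHash = $DuplicateSet.Name   # the value the files were grouped on
                Length      = $file.Length
                FullName    = $file.FullName
            }
        }
    }
}
```

Assuming the script is saved as Get-DuplicateFiles.ps1 in the current directory, a call such as `.\Get-DuplicateFiles.ps1 | Expand-DuplicateGroup | Format-Table -AutoSize` should print one line per duplicate file.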
