Deleting duplicate contacts in MDaemon address book

Somehow the contacts for an email account amassed almost 7k entries. A manual review showed at least three entries for most contacts. Here's a PowerShell script that writes a copy of the MDaemon AddrBook.mrk file with the duplicates removed. As written, the script concatenates the first name, last name, and full name into a single key and treats contacts that share a key as duplicates, keeping the first occurrence. In other words, all three fields must match for a contact to count as a duplicate. It isn't 100% accurate, but it cut the duplicates down to a number small enough to review manually.
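To make the key logic concrete, here's a minimal sketch on a few in-memory records. The sample names and the `First`/`Last`/`Full` property names are made up for illustration and are not MDaemon fields. It builds the same `first|last|full` key and keeps the first occurrence of each key. One detail worth knowing: PowerShell's `@{}` hash tables compare string keys case-insensitively, so "john doe" and "John Doe" count as the same contact.

```powershell
# Hypothetical sample records; only the three name fields feed the key
$sample = @(
    [pscustomobject]@{ First = 'John'; Last = 'Doe'; Full = 'John Doe' }
    [pscustomobject]@{ First = 'John'; Last = 'Doe'; Full = 'John Doe' }
    [pscustomobject]@{ First = 'john'; Last = 'doe'; Full = 'john doe' }
    [pscustomobject]@{ First = 'Jane'; Last = 'Doe'; Full = 'Jane Doe' }
    [pscustomobject]@{ First = 'John'; Last = 'Doe'; Full = 'Johnny Doe' }
)

# @{} uses case-insensitive string keys by default
$seen = @{}
$kept = foreach ($c in $sample) {
    $key = "$($c.First)|$($c.Last)|$($c.Full)"
    if (-not $seen.ContainsKey($key)) {
        $seen[$key] = $true
        $c    # first occurrence of this key is emitted and kept
    }
}

$kept | ForEach-Object { $_.Full }
```

Three records survive: John Doe, Jane Doe, and Johnny Doe. The last one is kept because its full name differs, which is exactly the kind of near-duplicate that still needs a manual look.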

Here's an abbreviated sample of the AddrBook.mrk file format:

<?xml version="1.0" encoding="UTF-8"?>
<addressBook version="9.5.5" encoding="utf-8" lastModified="2024-07-08T18:05:22.885Z" fid="{1d30e328-d4f6-4e98-bd22-c99fd9523ab3}" NextID="1">
  <contact>
    <guid><![CDATA[73e3293156c74f8195dac9d1625645e2]]></guid>
    <modified>2022-06-13 13:59:33</modified>
    <firstName><![CDATA[John]]></firstName>
    <lastName><![CDATA[Doe]]></lastName>
    <fullName><![CDATA[John Doe]]></fullName>
    <busCompany><![CDATA[Acme Widgets]]></busCompany>
    <homeMobile><![CDATA[+1 (555) 555-5555]]></homeMobile>
    <comment><![CDATA[
]]></comment>
  </contact>
  <contact>
    <guid><![CDATA[b17e84fa82562e63eef1623ed8475063]]></guid>
    <modified>2022-06-13 13:59:33</modified>
    <firstName><![CDATA[Jane]]></firstName>
    <lastName><![CDATA[Doe]]></lastName>
    <fullName><![CDATA[Jane Doe]]></fullName>
    <homeMobile><![CDATA[(555) 555-5555]]></homeMobile>
    <comment><![CDATA[
]]></comment>
  </contact>
</addressBook>
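For reference, here's how PowerShell exposes the CDATA-wrapped values in this format. When an element's only child is a CDATA section, PowerShell's XML adapter returns the element itself rather than a plain string, so the text is reached through the `'#cdata-section'` property (as the script below does) or through `InnerText`. The XML here is a trimmed-down copy of the sample above, loaded from a here-string instead of a file.

```powershell
# Trimmed copy of the sample AddrBook.mrk contents shown above
[xml]$doc = @'
<?xml version="1.0" encoding="UTF-8"?>
<addressBook version="9.5.5">
  <contact>
    <firstName><![CDATA[John]]></firstName>
    <lastName><![CDATA[Doe]]></lastName>
    <fullName><![CDATA[John Doe]]></fullName>
  </contact>
  <contact>
    <firstName><![CDATA[Jane]]></firstName>
    <lastName><![CDATA[Doe]]></lastName>
    <fullName><![CDATA[Jane Doe]]></fullName>
  </contact>
</addressBook>
'@

foreach ($contact in $doc.addressBook.contact) {
    # $contact.fullName is an XmlElement here, not a string, because its
    # content is a CDATA section; both lines below return the same text.
    $viaCdata = $contact.fullName.'#cdata-section'
    $viaInnerText = $contact.fullName.InnerText
    '{0} / {1}' -f $viaCdata, $viaInnerText
}
```

This prints `John Doe / John Doe` and `Jane Doe / Jane Doe`.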

Here's the PowerShell code to write the non-duplicate entries to a new file:

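# Load the XML file from MDaemon (\users\domain.tld\username\Contacts.IMAP\AddrBook.mrk).
# XmlDocument.Load honors the file's UTF-8 declaration, whereas Get-Content in
# Windows PowerShell 5.1 defaults to the ANSI code page and can garble non-ASCII names.
# .NET resolves relative paths against the process directory, not PowerShell's $PWD,
# so full paths are built with Join-Path.
$xml = New-Object System.Xml.XmlDocument
$xml.Load((Join-Path $PWD "addrbook.mrk"))

# Create a hash table to store unique contacts (PowerShell hash table keys are case-insensitive)
$uniqueContacts = @{}

# Function to generate a unique key for each contact
function Get-ContactKey {
    param (
        $firstName,
        $lastName,
        $fullName
    )
    return "$firstName|$lastName|$fullName"
}

# Select the contact nodes; SelectNodes returns a list whose Count is correct even for 0 or 1 contacts
$contacts = $xml.SelectNodes('/addressBook/contact')

# Get the total number of contacts before removing duplicates
$totalContactsBefore = $contacts.Count

# Create a list to store the contacts to remove
$contactsToRemove = New-Object System.Collections.Generic.List[System.Xml.XmlNode]

# Iterate through each contact and identify duplicates
foreach ($contact in $contacts) {
    $firstName = $contact.firstName.'#cdata-section'
    $lastName = $contact.lastName.'#cdata-section'
    $fullName = $contact.fullName.'#cdata-section'

    # Contacts with no name at all (e.g. email-only entries) would all share the
    # key "||" and be removed as duplicates of each other, so leave them alone
    if (-not ($firstName -or $lastName -or $fullName)) { continue }

    $key = Get-ContactKey -firstName $firstName -lastName $lastName -fullName $fullName

    if (-not $uniqueContacts.ContainsKey($key)) {
        # First occurrence of this key: keep it
        $uniqueContacts[$key] = $contact
    } else {
        # Duplicate found, add to list to remove later
        $contactsToRemove.Add($contact)
    }
}

# Remove duplicate contacts (done after the loop so the document isn't modified mid-iteration)
foreach ($contact in $contactsToRemove) {
    [void]$contact.ParentNode.RemoveChild($contact)
}

# Get the total number of contacts after removing duplicates
$totalContactsAfter = $xml.SelectNodes('/addressBook/contact').Count

# Save the updated XML file
$xml.Save((Join-Path $PWD "updated_contacts.xml"))

# Write the total numbers to the console
Write-Output "Total contacts before removing duplicates: $totalContactsBefore"
Write-Output "Total contacts after removing duplicates: $totalContactsAfter"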

Output from running it against this account's address book:

Total contacts before removing duplicates: 6871
Total contacts after removing duplicates: 2053
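For the manual review of what's left, one option is to group the remaining contacts by a looser key. Below is a minimal sketch that groups contacts by trimmed full name only. `Get-NameOnlyDuplicates` is a hypothetical helper name, and the three-contact sample (with made-up GUIDs) stands in for the deduplicated file. `Group-Object` compares case-insensitively by default, so this flags entries that differ only in letter case, trailing spaces, or empty first/last name fields.

```powershell
# Hypothetical helper: list contacts that share a full name (case-insensitive, trimmed)
function Get-NameOnlyDuplicates {
    param([xml]$Document)

    # Project each contact to its GUID and normalized full name, drop contacts
    # with no full name, group by name, and keep names that appear more than once
    $Document.SelectNodes('/addressBook/contact') |
        ForEach-Object {
            [pscustomobject]@{
                Guid     = [string]($_.SelectSingleNode('guid').InnerText)
                FullName = ([string]($_.SelectSingleNode('fullName').InnerText)).Trim()
            }
        } |
        Where-Object { $_.FullName } |
        Group-Object -Property FullName |
        Where-Object { $_.Count -gt 1 }
}

# Hypothetical sample with made-up GUIDs
[xml]$sample = @'
<addressBook>
  <contact>
    <guid><![CDATA[a1]]></guid>
    <firstName><![CDATA[John]]></firstName>
    <lastName><![CDATA[Doe]]></lastName>
    <fullName><![CDATA[John Doe]]></fullName>
  </contact>
  <contact>
    <guid><![CDATA[b2]]></guid>
    <fullName><![CDATA[john doe ]]></fullName>
  </contact>
  <contact>
    <guid><![CDATA[c3]]></guid>
    <fullName><![CDATA[Jane Doe]]></fullName>
  </contact>
</addressBook>
'@

Get-NameOnlyDuplicates -Document $sample | ForEach-Object {
    '{0}: {1}' -f $_.Name, ($_.Group.Guid -join ', ')
}
```

The strict three-field key keeps both John Doe entries because their first and last name fields differ, while this looser grouping reports them together as `John Doe: a1, b2`. To run it against the real output, load the file with `$doc = New-Object System.Xml.XmlDocument; $doc.Load((Join-Path $PWD 'updated_contacts.xml'))` and pass `$doc` instead of `$sample`.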

Published by

Rich

Just another IT guy.
