Somehow the contacts for an email account had amassed almost 7k entries. On manual review, there appeared to be at least 3 entries for each contact. Here’s a PowerShell script that creates a new MDaemon AddrBook.mrk file with the duplicates removed. As written, the script concatenates each contact’s first name, last name, and full name into a single key, so all three fields must match for a contact to be treated as a duplicate. It isn’t 100% accurate, but it cut the duplicates down to a number that was manageable to review manually.
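To make the matching rule concrete, here is a minimal sketch of the composite-key idea on its own, using made-up records rather than a real address book. The `$records` objects and their `First`, `Last`, and `Full` properties are hypothetical stand-ins for the three contact fields.

```powershell
# Hypothetical sample records; the values are made up for illustration only.
$records = @(
    [pscustomobject]@{ First = 'John'; Last = 'Doe'; Full = 'John Doe' }
    [pscustomobject]@{ First = 'John'; Last = 'Doe'; Full = 'John Doe' }   # exact duplicate
    [pscustomobject]@{ First = 'John'; Last = 'Doe'; Full = 'Doe, John' }  # same person, different full name
)

# Hash table of keys already seen
$seen = @{}

# Keep only the first record for each first|last|full key
$kept = @(foreach ($r in $records) {
    $key = "$($r.First)|$($r.Last)|$($r.Full)"
    if (-not $seen.ContainsKey($key)) {
        $seen[$key] = $true
        $r
    }
})

"Kept $($kept.Count) of $($records.Count) records"
```

The third record survives because its full name differs, which is the kind of near-duplicate that still needs a manual pass. Note that keys in a PowerShell `@{}` hash table are compared case-insensitively, so entries that differ only in letter case are treated as the same key.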
Here’s an abbreviated example of the AddrBook.mrk file format:
<?xml version="1.0" encoding="UTF-8"?>
<addressBook version="9.5.5" encoding="utf-8" lastModified="2024-07-08T18:05:22.885Z" fid="{1d30e328-d4f6-4e98-bd22-c99fd9523ab3}" NextID="1">
<contact>
<guid><![CDATA[73e3293156c74f8195dac9d1625645e2]]></guid>
<modified>2022-06-13 13:59:33</modified>
<firstName><![CDATA[John]]></firstName>
<lastName><![CDATA[Doe]]></lastName>
<fullName><![CDATA[John Doe]]></fullName>
<busCompany><![CDATA[Acme Widgets]]></busCompany>
<homeMobile><![CDATA[+1 (555) 555-5555]]></homeMobile>
<comment><![CDATA[
]]></comment>
</contact>
<contact>
<guid><![CDATA[b17e84fa82562e63eef1623ed8475063]]></guid>
<modified>2022-06-13 13:59:33</modified>
<firstName><![CDATA[Jane]]></firstName>
<lastName><![CDATA[Doe]]></lastName>
<fullName><![CDATA[Jane Doe]]></fullName>
<homeMobile><![CDATA[(555) 555-5555]]></homeMobile>
<comment><![CDATA[
]]></comment>
</contact>
</addressBook>
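Because the values are wrapped in CDATA, PowerShell exposes them through a `#cdata-section` property rather than as plain strings. The sketch below shows two ways to read a field from a cut-down, inline copy of the format above. The `$sample` variable is just for illustration, and the `InnerText` route is an alternative that is not used in the script.

```powershell
# Inline, cut-down copy of the address book format (illustration only)
[xml]$sample = @'
<addressBook>
<contact>
<firstName><![CDATA[John]]></firstName>
<lastName><![CDATA[Doe]]></lastName>
<fullName><![CDATA[John Doe]]></fullName>
</contact>
</addressBook>
'@

$c = $sample.addressBook.contact

# Option 1: PowerShell's XML adapter exposes the CDATA child as a property
$viaCdata = $c.firstName.'#cdata-section'

# Option 2: the .NET InnerText property returns the text content
# whether or not the value is wrapped in CDATA
$viaInnerText = $c.SelectSingleNode('firstName').InnerText

"cdata-section: $viaCdata"
"InnerText:     $viaInnerText"
```

Both reads should return the same value here. `InnerText` may be the safer choice if some entries in a file store a field as plain text instead of CDATA, though that is an assumption about other files, not something seen in the sample above.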
Here’s the PowerShell code to extract the non-duplicate entries into a new file:
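# Load the XML file from MDaemon (\users\domain.tld\username\Contacts.IMAP\AddrBook.mrk)
# -Raw reads the file as a single string; -Encoding UTF8 matches the encoding declared in the file
[xml]$xml = Get-Content -Path "addrbook.mrk" -Raw -Encoding UTF8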
# Create a hash table to store unique contacts
$uniqueContacts = @{}
# Function to generate a unique key for each contact
function Get-ContactKey {
    param (
        $firstName,
        $lastName,
        $fullName
    )
    return "$firstName|$lastName|$fullName"
}
# Access the contact nodes directly
$contacts = $xml.addressBook.contact
# Get the total number of contacts before removing duplicates
$totalContactsBefore = @($contacts).Count  # @() keeps the count correct even if there is only one contact
# Create a list to store the contacts to remove
$contactsToRemove = @()
# Iterate through each contact and identify duplicates
foreach ($contact in $contacts) {
    $firstName = $contact.firstName.'#cdata-section'
    $lastName = $contact.lastName.'#cdata-section'
    $fullName = $contact.fullName.'#cdata-section'
    $key = Get-ContactKey -firstName $firstName -lastName $lastName -fullName $fullName
    if (-not $uniqueContacts.ContainsKey($key)) {
        # Add contact to unique contacts
        $uniqueContacts[$key] = $contact
    } else {
        # Duplicate found, add to list to remove later
        $contactsToRemove += $contact
    }
}
# Remove duplicate contacts
foreach ($contact in $contactsToRemove) {
    $contact.ParentNode.RemoveChild($contact) > $null
}
# Get the total number of contacts after removing duplicates
$totalContactsAfter = @($xml.addressBook.contact).Count
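# Save the updated XML file (full path, because .NET's Save() does not follow PowerShell's current location)
$xml.Save((Join-Path $PWD.Path "updated_contacts.xml"))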
# Write the total numbers to the console
Write-Output "Total contacts before removing duplicates: $totalContactsBefore"
Write-Output "Total contacts after removing duplicates: $totalContactsAfter"
Here’s the output from the run against this account:
Total contacts before removing duplicates: 6871
Total contacts after removing duplicates: 2053
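For the manual review of whatever is left, one possible approach is to list contacts that share a looser key, such as first and last name only. This is a sketch and not part of the script above. The inline `$book` data is made up, and the first|last key is an assumption about what counts as a likely duplicate. It also assumes every contact has `firstName`, `lastName`, and `fullName` elements.

```powershell
# Made-up sample data in the same shape as AddrBook.mrk (illustration only)
[xml]$book = @'
<addressBook>
<contact><firstName><![CDATA[John]]></firstName><lastName><![CDATA[Doe]]></lastName><fullName><![CDATA[John Doe]]></fullName></contact>
<contact><firstName><![CDATA[John]]></firstName><lastName><![CDATA[Doe]]></lastName><fullName><![CDATA[Doe, John]]></fullName></contact>
<contact><firstName><![CDATA[Jane]]></firstName><lastName><![CDATA[Doe]]></lastName><fullName><![CDATA[Jane Doe]]></fullName></contact>
</addressBook>
'@

# Group on first|last only (looser than the first|last|full key used for removal)
# and keep just the groups that contain more than one contact
$groups = @($book.addressBook.contact |
    Group-Object {
        '{0}|{1}' -f $_.SelectSingleNode('firstName').InnerText.Trim(),
                     $_.SelectSingleNode('lastName').InnerText.Trim()
    } |
    Where-Object Count -gt 1)

# Print each suspect group with the full names it contains; nothing is removed
foreach ($g in $groups) {
    '{0} ({1} entries)' -f $g.Name, $g.Count
    $g.Group | ForEach-Object { '  ' + $_.SelectSingleNode('fullName').InnerText }
}
```

This only reports candidates, so it is safe to run before deciding whether any of them should be merged or deleted by hand.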