Somehow the contacts for an email account had amassed almost 7k entries. On manual review, it seemed there were at least 3 entries for each contact. Here’s a PowerShell script that creates a new MDaemon AddrBook.mrk file with the duplicates removed. As written, the script concatenates each contact’s first name, last name, and full name into a single key and treats any later contact with the same key as a duplicate. This means all three fields must match for a contact to count as a duplicate. It doesn’t catch everything (entries whose names are spelled or formatted differently are kept), but it reduced the duplicates to a manageable number for manual review.
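To make the matching rule concrete, here is a minimal sketch of how the composite key behaves. The sample names and the `$seen` table are illustrative only; the key function mirrors the one in the script further down.

```powershell
# Same composite key as the main script: first|last|full
function Get-ContactKey {
    param ($firstName, $lastName, $fullName)
    return "$firstName|$lastName|$fullName"
}

# Illustrative sample values, not taken from a real address book
$a = Get-ContactKey -firstName 'John' -lastName 'Doe' -fullName 'John Doe'
$b = Get-ContactKey -firstName 'John' -lastName 'Doe' -fullName 'John Doe'
$c = Get-ContactKey -firstName 'John' -lastName 'Doe' -fullName 'Doe, John'

# Record the first contact, then test the other two against it
$seen = @{}
$seen[$a] = $true

$seen.ContainsKey($b)   # True: all three fields match, so this one is a duplicate
$seen.ContainsKey($c)   # False: fullName differs, so this one is kept
```

The third contact shows why the result isn’t perfect: a differently formatted full name is enough to make the entry look unique.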
Here’s an abbreviated example of the AddrBook.mrk file:
<?xml version="1.0" encoding="UTF-8"?>
<addressBook version="9.5.5" encoding="utf-8" lastModified="2024-07-08T18:05:22.885Z" fid="{1d30e328-d4f6-4e98-bd22-c99fd9523ab3}" NextID="1">
  <contact>
    <guid><![CDATA[73e3293156c74f8195dac9d1625645e2]]></guid>
    <modified>2022-06-13 13:59:33</modified>
    <firstName><![CDATA[John]]></firstName>
    <lastName><![CDATA[Doe]]></lastName>
    <fullName><![CDATA[John Doe]]></fullName>
    <busCompany><![CDATA[Acme Widgets]]></busCompany>
    <homeMobile><![CDATA[+1 (555) 555-5555]]></homeMobile>
    <comment><![CDATA[ ]]></comment>
  </contact>
  <contact>
    <guid><![CDATA[b17e84fa82562e63eef1623ed8475063]]></guid>
    <modified>2022-06-13 13:59:33</modified>
    <firstName><![CDATA[Jane]]></firstName>
    <lastName><![CDATA[Doe]]></lastName>
    <fullName><![CDATA[Jane Doe]]></fullName>
    <homeMobile><![CDATA[(555) 555-5555]]></homeMobile>
    <comment><![CDATA[ ]]></comment>
  </contact>
</addressBook>
Here’s the PowerShell code to write the non-duplicate entries to a new file:
# Load the XML file from MDaemon (\users\domain.tld\username\Contacts.IMAP\AddrBook.mrk)
# -Raw reads the file as a single string; -Encoding UTF8 matches the file's declared encoding
[xml]$xml = Get-Content -Path "addrbook.mrk" -Raw -Encoding UTF8

# Create a hash table to store unique contacts
$uniqueContacts = @{}

# Function to generate a unique key for each contact
function Get-ContactKey {
    param (
        $firstName,
        $lastName,
        $fullName
    )
    return "$firstName|$lastName|$fullName"
}

# Access the contact nodes directly; @() guarantees an array even with a single contact
$contacts = @($xml.addressBook.contact)

# Get the total number of contacts before removing duplicates
$totalContactsBefore = $contacts.Count

# Create a list to store the contacts to remove
$contactsToRemove = @()

# Iterate through each contact and identify duplicates
foreach ($contact in $contacts) {
    $firstName = $contact.firstName.'#cdata-section'
    $lastName = $contact.lastName.'#cdata-section'
    $fullName = $contact.fullName.'#cdata-section'
    $key = Get-ContactKey -firstName $firstName -lastName $lastName -fullName $fullName

    if (-not $uniqueContacts.ContainsKey($key)) {
        # First time this key is seen: keep the contact
        $uniqueContacts[$key] = $contact
    } else {
        # Duplicate found, add to list to remove later
        $contactsToRemove += $contact
    }
}

# Remove duplicate contacts (done after the loop so the node list isn't modified while iterating)
foreach ($contact in $contactsToRemove) {
    $contact.ParentNode.RemoveChild($contact) > $null
}

# Get the total number of contacts after removing duplicates
$totalContactsAfter = @($xml.addressBook.contact).Count

# Save the updated XML file. XmlDocument.Save resolves relative paths against the
# process working directory, so build a full path from the current PowerShell location.
$xml.Save((Join-Path (Get-Location).Path "updated_contacts.xml"))

# Write the total numbers to the console
Write-Output "Total contacts before removing duplicates: $totalContactsBefore"
Write-Output "Total contacts after removing duplicates: $totalContactsAfter"
Total contacts before removing duplicates: 6871
Total contacts after removing duplicates: 2053
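For the manual review that follows, one possible aid is to list contacts that still share a full name after the exact-match pass. The sketch below is an assumption on my part rather than part of the original script: it uses a small inline sample in place of updated_contacts.xml, and `$suspects` is a hypothetical variable name.

```powershell
# Inline sample standing in for updated_contacts.xml (illustrative data only)
[xml]$xml = @'
<addressBook>
  <contact><firstName><![CDATA[John]]></firstName><lastName><![CDATA[Doe]]></lastName><fullName><![CDATA[John Doe]]></fullName></contact>
  <contact><firstName><![CDATA[Jon]]></firstName><lastName><![CDATA[Doe]]></lastName><fullName><![CDATA[John Doe]]></fullName></contact>
  <contact><firstName><![CDATA[Jane]]></firstName><lastName><![CDATA[Doe]]></lastName><fullName><![CDATA[Jane Doe]]></fullName></contact>
</addressBook>
'@

# Group on fullName alone and keep only the names that appear more than once
$suspects = @($xml.addressBook.contact) |
    Group-Object -Property { $_.fullName.'#cdata-section' } |
    Where-Object { $_.Count -gt 1 }

# Print each suspect name with the number of entries sharing it
$suspects | ForEach-Object { "{0} x{1}" -f $_.Name, $_.Count }
```

Grouping on a single field is deliberately looser than the three-field key, so it only flags candidates to check by hand and doesn’t remove anything.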