Comparing email addresses is Boolean – same or not – only at first glance. On closer comparison there are shades of grey.
The Optimaize Matcher distinguishes the following categories:
EQUAL, EQUIVALENT, MATCHING, SIMILAR, NOTSIMILAR, DIFFERENT.
Different use cases ask for different strictness
Common use cases for comparing email addresses are:
- login; the email address serves as username
- preventing duplicate accounts; the email address serves as a unique key
- search; finding a person record
Depending on the case, more or less strict equality checking should be applied.
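As a rough illustration of how a use case could pick its strictness, the sketch below models the six categories as an ordered enum and lets each use case name the weakest category it still accepts. The ordering (strongest first, as listed above), the names MatchCategory, WEAKEST_ACCEPTED and is_hit, and the threshold chosen for each use case are assumptions made for this example, not the Optimaize Matcher API.

```python
from enum import IntEnum


class MatchCategory(IntEnum):
    """The six categories, assumed to be ordered from strongest to weakest."""
    EQUAL = 0
    EQUIVALENT = 1
    MATCHING = 2
    SIMILAR = 3
    NOTSIMILAR = 4
    DIFFERENT = 5


# Hypothetical thresholds: the weakest category each use case still accepts.
WEAKEST_ACCEPTED = {
    "login": MatchCategory.EQUAL,
    "duplicate_check": MatchCategory.EQUIVALENT,
    "search": MatchCategory.SIMILAR,
}


def is_hit(use_case: str, category: MatchCategory) -> bool:
    """Return True if the comparison result is strong enough for the use case."""
    return category <= WEAKEST_ACCEPTED[use_case]
```

With these example thresholds, a MATCHING result counts as a hit for search but not for the duplicate check.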
Caseless
An email address consists of the local-part on the left and the domain on the right, separated by the @ sign.
Example: John.Doe@Example.com
local-part => John.Doe, domain => Example.com
The domain is caseless by definition, while the local-part is caseless in practice. That is the one simplification that almost every application applies before matching: converting to lower case.
Up to this point the comparison is still black and white. From here on, the grey tones begin.
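The split into local-part and domain and the lower-casing step can be sketched as follows. This is a minimal example, not the Matcher's implementation: the function names split_address and caseless_key are made up for illustration, and the address is split at the last @ sign without any further validation.

```python
def split_address(address: str) -> tuple[str, str]:
    """Split an address at the last @ sign into (local_part, domain)."""
    local_part, sep, domain = address.rpartition("@")
    if not sep or not local_part or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return local_part, domain


def caseless_key(address: str) -> str:
    """Lower-case both parts so that addresses can be compared caselessly."""
    local_part, domain = split_address(address.strip())
    return f"{local_part.lower()}@{domain.lower()}"
```

Two addresses are then treated as the same when their keys are identical, for example caseless_key("John.Doe@Example.com") == caseless_key("john.doe@example.com").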
Equivalent
Reasons why two email addresses may deliver to the same inbox:
- Ignored variation in local-part. For example, Gmail ignores dots: john.doe@gmail is the same as johndoe@gmail and j.o.h.n.d.o.e@gmail.
- Ignored part in local-part. For example, virtual folders: john.doe+foldername@provider.
- Alias domains. For example, @gmail and @googlemail.
These types of equality require knowledge of the individual email providers, and that knowledge has to be kept up to date as providers change their rules.
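The three reasons above can be sketched as a small rule-driven normalizer. The rule tables below hold only the Gmail examples from this text and are illustrative, not a complete or current provider list. The names ALIAS_DOMAINS, DOT_INSENSITIVE_DOMAINS, PLUS_TAG_DOMAINS, canonical_address and is_equivalent are hypothetical.

```python
# Illustrative provider rules; real rules change over time and need maintenance.
ALIAS_DOMAINS = {"googlemail.com": "gmail.com"}   # alias domain -> main domain
DOT_INSENSITIVE_DOMAINS = {"gmail.com"}           # dots in local-part are ignored
PLUS_TAG_DOMAINS = {"gmail.com"}                  # "+foldername" suffix is ignored


def canonical_address(address: str) -> str:
    """Reduce an address to the form that identifies the inbox."""
    local_part, _, domain = address.strip().lower().rpartition("@")
    domain = ALIAS_DOMAINS.get(domain, domain)
    if domain in PLUS_TAG_DOMAINS:
        # Drop everything from the first "+" onwards.
        local_part = local_part.split("+", 1)[0]
    if domain in DOT_INSENSITIVE_DOMAINS:
        local_part = local_part.replace(".", "")
    return f"{local_part}@{domain}"


def is_equivalent(a: str, b: str) -> bool:
    """True if both addresses reduce to the same canonical form."""
    return canonical_address(a) == canonical_address(b)
```

Keeping the provider knowledge in data tables rather than in code makes it easier to update when a provider changes its behaviour. Domains without a rule are only lower-cased, so john.doe@example.com and johndoe@example.com stay distinct.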
Matching
A full name and a partial name in the local-part on the same private or business domain have a high likelihood of being the same inbox: john.doe@example.com, john@example.com, jd@example.com
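One possible heuristic for this kind of match is sketched below. It is an assumption-laden example, not the Matcher's actual rule set: the function names, the separator characters, the small PUBLIC_PROVIDER_DOMAINS set and the two tests (shared name tokens, or initials of a multi-part name) are all chosen for illustration.

```python
import re

# Illustrative only: domains where a shared domain says nothing about the owner.
PUBLIC_PROVIDER_DOMAINS = {"gmail.com", "googlemail.com"}


def name_tokens(local_part: str) -> list[str]:
    """Split a local-part such as "john.doe" into its name tokens."""
    return [t for t in re.split(r"[._\-]+", local_part.lower()) if t]


def looks_like_same_inbox(a: str, b: str) -> bool:
    """Heuristic: full name vs. partial name on the same private domain."""
    local_a, _, domain_a = a.strip().lower().rpartition("@")
    local_b, _, domain_b = b.strip().lower().rpartition("@")
    if domain_a != domain_b or domain_a in PUBLIC_PROVIDER_DOMAINS:
        return False
    tokens_a, tokens_b = name_tokens(local_a), name_tokens(local_b)
    if not tokens_a or not tokens_b:
        return False
    # Treat the local-part with more tokens as the fuller name.
    full, partial = sorted((tokens_a, tokens_b), key=len, reverse=True)
    if all(token in full for token in partial):
        return True  # e.g. "john" is part of "john.doe"
    initials = "".join(token[0] for token in full)
    return len(full) > 1 and "".join(partial) == initials  # e.g. "jd"
```

In this sketch john.doe@example.com matches both john@example.com and jd@example.com, while john@example.com and jd@example.com are only linked through the full name, not directly.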
Similar
There is similarity in having the same private domain. There is similarity in having the same local-part on another domain, given certain factors.
This component is an integral part of our SearchCluster solution.