Email Matcher


At first glance, comparing email addresses is Boolean: same or not. On closer inspection, there are shades of grey.

The Optimaize Matcher distinguishes six categories:

EQUAL, EQUIVALENT, MATCHING, SIMILAR, NOTSIMILAR, DIFFERENT.

Different use cases ask for different strictness

Common use cases for comparing email addresses are:

  • login: the email address serves as the username
  • preventing duplicate accounts: the email address serves as a unique key
  • search: finding a person record

Depending on the use case, a stricter or a looser equality check is appropriate.
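One way to picture these strictness levels is to rank the six categories and let each use case set the loosest category it still accepts. The sketch below assumes the categories are ordered as listed, from EQUAL (closest) down to DIFFERENT. The `MatchCategory` enum, the `MIN_ACCEPTED` thresholds and the `accepts` function are hypothetical illustrations, not the Optimaize API.

```python
from enum import IntEnum


class MatchCategory(IntEnum):
    """Comparison outcomes, ranked so that a higher value means a closer match."""
    DIFFERENT = 0
    NOTSIMILAR = 1
    SIMILAR = 2
    MATCHING = 3
    EQUIVALENT = 4
    EQUAL = 5


# Hypothetical policy: the loosest category each use case still accepts.
MIN_ACCEPTED = {
    "login": MatchCategory.EQUAL,
    "duplicate_check": MatchCategory.EQUIVALENT,
    "search": MatchCategory.SIMILAR,
}


def accepts(use_case: str, category: MatchCategory) -> bool:
    """Return True if `category` is at least as close as the use case requires."""
    return category >= MIN_ACCEPTED[use_case]


print(accepts("search", MatchCategory.MATCHING))   # True
print(accepts("login", MatchCategory.EQUIVALENT))  # False
```

Using `IntEnum` turns the strictness check into a plain `>=` comparison; the threshold per use case is a policy decision, not a property of the addresses.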

Caseless

An email address consists of a local-part and a domain, separated by the @ sign: the local-part on the left, the domain on the right.

Example: John.Doe@Example.com
local-part => John.Doe, domain => Example.com

The domain is caseless by definition, while the local-part is caseless in practice. Converting to lower case is the one simplification that almost every application applies before matching.
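A minimal sketch of this case-only normalization. It splits at the last @ because a quoted local-part may itself contain an @ sign. The function names are illustrative, and `str.lower()` is used for simplicity; `str.casefold()` would be the more aggressive Unicode option.

```python
from __future__ import annotations


def split_address(address: str) -> tuple[str, str]:
    """Split an address at its last '@' into (local_part, domain)."""
    local_part, sep, domain = address.rpartition("@")
    if not sep or not local_part or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return local_part, domain


def normalize_case(address: str) -> str:
    """Lower-case both parts: the domain is caseless by definition,
    the local-part is treated as caseless in practice."""
    local_part, domain = split_address(address.strip())
    return f"{local_part.lower()}@{domain.lower()}"


def is_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison of two addresses."""
    return normalize_case(a) == normalize_case(b)


print(split_address("John.Doe@Example.com"))                     # ('John.Doe', 'Example.com')
print(is_equal("John.Doe@Example.com", "john.doe@example.com"))  # True
```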

Up to this point, the comparison is still black and white. This is where the grey tones begin.

Equivalent

Reasons why two different email addresses may deliver to the same inbox:

  • Ignored variation in the local-part.
    For example, Gmail ignores dots: john.doe@gmail.com is the same as johndoe@gmail.com and j.o.h.n.d.o.e@gmail.com.
  • Ignored part of the local-part.
    For example, virtual folders (sub-addressing): john.doe+foldername@provider delivers to the inbox of john.doe@provider.
  • Alias domains.
    For example, @gmail.com and @googlemail.com.

These types of equivalence require knowledge of the individual email providers' rules, and keeping that knowledge up to date as the rules change.
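The provider knowledge can be sketched as a small rule table applied before comparing. The tables below are assumptions for illustration: they encode only the Gmail rules mentioned above (dots ignored, +folder tags ignored, googlemail.com as an alias of gmail.com) and would need ongoing maintenance in practice. `canonicalize` and `is_equivalent` are hypothetical names.

```python
# Hypothetical, hand-maintained provider rules (real rules change over time).
DOMAIN_ALIASES = {"googlemail.com": "gmail.com"}  # alias -> canonical domain
DOT_INSENSITIVE = {"gmail.com"}                   # dots in local-part ignored
PLUS_TAG_DOMAINS = {"gmail.com"}                  # "+folder" suffix ignored


def canonicalize(address: str) -> str:
    """Map an address to the form its provider actually delivers to."""
    local_part, sep, domain = address.strip().lower().rpartition("@")
    if not sep or not local_part or not domain:
        raise ValueError(f"not an email address: {address!r}")
    domain = DOMAIN_ALIASES.get(domain, domain)
    if domain in PLUS_TAG_DOMAINS:
        local_part = local_part.split("+", 1)[0]  # drop the virtual-folder tag
    if domain in DOT_INSENSITIVE:
        local_part = local_part.replace(".", "")
    return f"{local_part}@{domain}"


def is_equivalent(a: str, b: str) -> bool:
    """True if both addresses reach the same inbox under the rules above."""
    return canonicalize(a) == canonicalize(b)


print(canonicalize("John.Doe+news@googlemail.com"))                   # johndoe@gmail.com
print(is_equivalent("j.o.h.n.d.o.e@gmail.com", "johndoe@gmail.com"))  # True
print(is_equivalent("john.doe@example.com", "johndoe@example.com"))   # False
```

Rules are keyed by domain on purpose: removing dots is only safe where the provider is known to ignore them.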

Matching

On a private or business domain, a full name and a partial name in the local-part are highly likely to denote the same inbox: john.doe@example.com, john@example.com, jd@example.com
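A rough heuristic for this kind of match, assuming local-parts are built from name tokens separated by dots, underscores or hyphens. It flags two addresses when they share a non-freemail domain and one local-part's tokens are a subset of the other's, or equal its initials. The `FREEMAIL_DOMAINS` list and the rules are hypothetical simplifications, not Optimaize's algorithm.

```python
import re

# Hypothetical list of public freemail domains; there, a shared domain says nothing.
FREEMAIL_DOMAINS = {"gmail.com", "googlemail.com", "yahoo.com", "outlook.com", "hotmail.com"}


def _split(address):
    local_part, _, domain = address.strip().lower().rpartition("@")
    return local_part, domain


def _name_tokens(local_part):
    """'john.doe' -> ['john', 'doe']; splits on dots, underscores and hyphens."""
    return [t for t in re.split(r"[._-]+", local_part) if t]


def likely_same_inbox(a: str, b: str) -> bool:
    local_a, domain_a = _split(a)
    local_b, domain_b = _split(b)
    if domain_a != domain_b or domain_a in FREEMAIL_DOMAINS:
        return False
    tokens_a, tokens_b = _name_tokens(local_a), _name_tokens(local_b)
    if not tokens_a or not tokens_b:
        return False
    if len(tokens_a) >= len(tokens_b):
        longer, shorter = tokens_a, tokens_b
    else:
        longer, shorter = tokens_b, tokens_a
    # Partial name: all tokens of the shorter local-part occur in the longer one.
    if set(shorter) <= set(longer):
        return True
    # Initials: 'jd' against 'john.doe'.
    return len(shorter) == 1 and shorter[0] == "".join(t[0] for t in longer)


print(likely_same_inbox("john.doe@example.com", "john@example.com"))  # True
print(likely_same_inbox("john.doe@example.com", "jd@example.com"))    # True
print(likely_same_inbox("john.doe@gmail.com", "john@gmail.com"))      # False
```

The freemail exclusion reflects the restriction to private or business domains: on a public provider, john@ and john.doe@ are usually different people.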

Similar

Two addresses are similar when they share the same private domain. They can also be similar when they share the same local-part on different domains, depending on certain factors.
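The two similarity signals can be sketched as independent checks. The factors that qualify a shared local-part on another domain are not specified here; as a stand-in, this sketch ignores generic role names such as info. Both `FREEMAIL_DOMAINS` and `GENERIC_LOCAL_PARTS` are hypothetical lists, and `similarity_signals` is an illustrative name.

```python
# Hypothetical lists; the actual factors are not specified in this document.
FREEMAIL_DOMAINS = {"gmail.com", "googlemail.com", "yahoo.com", "outlook.com", "hotmail.com"}
GENERIC_LOCAL_PARTS = {"info", "admin", "contact", "office", "sales", "support"}


def similarity_signals(a: str, b: str) -> list:
    """Return the names of the similarity signals that apply to two addresses."""
    local_a, _, domain_a = a.strip().lower().rpartition("@")
    local_b, _, domain_b = b.strip().lower().rpartition("@")
    signals = []
    # Same private (non-freemail) domain.
    if domain_a == domain_b and domain_a not in FREEMAIL_DOMAINS:
        signals.append("same_private_domain")
    # Same local-part on another domain, unless it is a generic role name.
    if (domain_a != domain_b and local_a and local_a == local_b
            and local_a not in GENERIC_LOCAL_PARTS):
        signals.append("same_local_part")
    return signals


print(similarity_signals("anna@example.com", "bob@example.com"))       # ['same_private_domain']
print(similarity_signals("john.doe@gmail.com", "john.doe@yahoo.com"))  # ['same_local_part']
```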

This component is an integral part of our SearchCluster solution.