Data Research


Classifying name data since 2007

World map with pins of native freelancer origins.

The Optimaize PII Platform processes a wide range of Personally Identifiable Information (PII), meaning any data that can be used to identify or distinguish an individual. This includes, but is not limited to, personal names, addresses, email addresses, titles, and professions.

Our data research team, a group of specialized researchers and native collaborators, collects and classifies all parts of personal and legal names from around the world, including given names, surnames, and their hypocorisms (short, diminutive, and augmentative forms, and nicknames). These are linked with additional information, including culture (language), gender, and sources. The resulting graph database provides comprehensive coverage of global naming conventions.

With almost two decades of dedication, we’ve built the largest dictionary of names, providing an invaluable resource for our business.


Data Research

Manual Data

Expert data research, dictionaries. High quality, verified. 3 million terms.

Statistical Data

Large-scale statistical data, billions of data points.

Engineering

Rule-based logic

Software engineers write code

AI

Machine learning, neural networks

PII Platform

Graph dictionaries, statistical DBs, neural networks, parsing rules and logic, matching rules and logic, transcription and phonetic DBs and logic.


A World of Names: Every Language, Every Culture

Our graph dictionary contains over 3 million names and related terms, manually classified with sources, from all the cultures around the world. Through ongoing research, we strive to cover all names from every corner of the globe.

From the elegant curves of Arabic script to the intricate strokes of Hanzi characters, Optimaize embraces the richness of global naming traditions. Whether it’s the Cyrillic alphabet used across Slavic nations or other writing systems such as Hebrew, Greek, Devanagari (used for Hindi and Sanskrit), Burmese, Hangul (Korean), and countless others, our research transcends language barriers, ensuring our APIs can process names in all writing systems.

But names are just the beginning. We capture all the details needed to understand personal identification. This includes titles (“Mr.” or “Ms.”), professional designations, nicknames both playful and formal, and relations between names, such as patronymics and matronymics. We also account for variations such as gender-opposite variants and transliterations used across languages. Finally, qualifiers like “Sr.” or “Jr.” and the building blocks of names themselves – prefixes, suffixes, and name elements – are all stored and managed within our graph.
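To make the idea of a name graph more concrete, here is a minimal sketch in Python of how terms and the relations between them could be represented. It is an illustration only, not the Optimaize schema or API: the class names (Term, NameGraph), the field names, the relation labels such as "hypocorism_of", and the source label are all assumptions chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """One dictionary entry. All field names are illustrative assumptions."""
    text: str
    kind: str                # e.g. "given_name", "patronymic", "title"
    culture: str             # language or culture code, e.g. "ru"
    gender: str = "unknown"
    sources: tuple = ()      # where the classification was verified


class NameGraph:
    """Terms are nodes; typed, directed relations are edges."""

    def __init__(self):
        # (source term, relation label) -> set of target terms
        self._edges = defaultdict(set)

    def relate(self, source, relation, target):
        """Record that `source` stands in `relation` to `target`."""
        self._edges[(source, relation)].add(target)

    def related(self, term, relation):
        """Return the terms that `term` points to via `relation`."""
        return set(self._edges.get((term, relation), set()))


# Hypothetical sample data for a handful of Russian names.
graph = NameGraph()
aleksandr = Term("Александр", "given_name", "ru", "male", ("hypothetical source A",))
aleksandra = Term("Александра", "given_name", "ru", "female")
sasha = Term("Саша", "given_name", "ru", "unisex")
aleksandr_latin = Term("Aleksandr", "given_name", "ru-Latn", "male")
aleksandrovich = Term("Александрович", "patronymic", "ru", "male")

graph.relate(sasha, "hypocorism_of", aleksandr)
graph.relate(sasha, "hypocorism_of", aleksandra)
graph.relate(aleksandra, "gender_variant_of", aleksandr)
graph.relate(aleksandr_latin, "transliteration_of", aleksandr)
graph.relate(aleksandrovich, "patronymic_of", aleksandr)

# Which full given names can the short form "Саша" stand for?
full_forms = sorted(t.text for t in graph.related(sasha, "hypocorism_of"))
```

Keeping relations as labeled edges rather than as fields on each term means one short form can point to several full names, and new relation types can be added without changing the shape of existing entries.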

An International Team

To guarantee cultural accuracy, we prioritize collaboration with native speakers for data validation and data enhancement.

At the core of our approach is a strong commitment to accuracy. While thorough research is essential, the key to our methodology is the native review process, and cross-checking with multiple sources. Over the years, we’ve worked with around 100 native speakers from 60 different nations. Working with these experts goes beyond professional interaction; it’s an enriching experience that gives us a chance to explore their native languages, understand subtle expressions, and appreciate the cultural intricacies shaping them.

These experts help review our data and discover new variant names and linguistic relationships, enhancing the authenticity and richness of our data. Their contributions are indispensable when it comes to less frequently used names and less widely documented cultures.