RiskDetector - Optimaize

The Risk Detector flags various types of possibly fake and dummy data in person records.

How it works

The methods used to classify input data are:

Positive validation

Each value is validated according to its entity and data type.

Person’s name: does it parse (parse tree validation), taking term and culture confidence into account.

Physical address: does it parse (parse tree validation), can we confirm the existence of its parts, and can we geocode it? For example, does the street address exist within the given postal code and place name?

Telephone number: does it parse correctly into a number for that place, does the area code exist.

Email address: syntax validation, does the domain name exist, are there MX records and do they respond (is mail set up for that domain). In addition, the address is compared against known formats per domain name, and the domain is classified into a risk score.
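As an illustration of the first email step only, the syntax check could be sketched as below. This is a minimal sketch, not the Risk Detector's implementation: the function names and the deliberately simplified pattern are assumptions, and the domain and MX lookups are left out because they need network access.

```python
import re
from typing import Optional

# Assumption: a deliberately simplified pattern for illustration.
# Real-world address syntax (RFC 5322) allows considerably more than this.
_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")


def email_syntax_ok(address: str) -> bool:
    """Return True if the address matches the simplified pattern."""
    return _EMAIL_RE.fullmatch(address.strip()) is not None


def email_domain(address: str) -> Optional[str]:
    """Return the lower-cased domain part, or None if the syntax check fails.

    The domain is what a later step would look up (existence, MX records).
    """
    if not email_syntax_ok(address):
        return None
    return address.strip().rsplit("@", 1)[1].lower()
```

Separating the syntax check from the domain extraction keeps the offline part testable on its own; the networked checks would consume the value returned by the hypothetical `email_domain`.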

There are more data-type-specific validators, for example for country-specific social security numbers, personal IDs, company registration numbers, IBANs, and many more.
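To make the idea of a data-type-specific validator concrete, here is a sketch of the standard IBAN mod-97 checksum. The function name is an assumption, and the sketch omits the per-country length and format rules that a full validator would also apply.

```python
def iban_checksum_ok(iban: str) -> bool:
    """Check the IBAN mod-97 checksum (country-specific lengths are not checked)."""
    compact = iban.replace(" ", "").upper()
    # Basic shape: ASCII letters and digits only, 15 to 34 characters long.
    if not (compact.isascii() and compact.isalnum() and 15 <= len(compact) <= 34):
        return False
    # Two-letter country code followed by two check digits.
    if not (compact[:2].isalpha() and compact[2:4].isdigit()):
        return False
    # Move the first four characters to the end, then map letters to
    # numbers (A=10 ... Z=35); digits map to themselves.
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    # A valid IBAN leaves a remainder of 1 when divided by 97.
    return int(digits) % 97 == 1
```

For example, the widely published sample IBAN “GB82 WEST 1234 5698 7654 32” passes this check, while the same string with the check digits changed to 83 does not.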

Negative checks

Each value is also checked for signs of bad data, again by entity and data type.

A neural network trained on data classified as good or bad.
Dictionary lookups for manually categorized terms.
Rule-based computations, for random typing input and other nonsense.

Types of invalid input

Random typing. Example: “asdf asdf”
Placeholder entities. Examples: “John Doe”, “Anytown”
Famous and fictional entities. Examples: “James Bond”, “Barack Obama”
Humorous, invalid, or vulgar input. Examples: “Sandy Beach”, “Timbuckthree”, “None of your business”, “firstname lastname”, “test1 test2”
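The dictionary lookups and rule-based checks for these input types could look roughly like the sketch below. Everything here is an assumption made for illustration: the tiny dictionary (seeded with examples from the list above), the category labels, and the keyboard-row rule for random typing are not the Risk Detector's actual data or rules.

```python
from typing import Optional

# Assumption: a tiny hand-made dictionary; a real one would be far larger.
KNOWN_TERMS = {
    "john doe": "placeholder",
    "james bond": "famous_or_fictional",
    "none of your business": "humorous_or_invalid",
    "firstname lastname": "humorous_or_invalid",
}

# Letter rows of a QWERTY keyboard, used by the random-typing rule.
KEYBOARD_ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")


def is_keyboard_run(token: str, min_len: int = 3) -> bool:
    """True if the token is a contiguous run along one keyboard row."""
    t = token.lower()
    return len(t) >= min_len and any(
        t in row or t in row[::-1] for row in KEYBOARD_ROWS
    )


def classify_name(value: str) -> Optional[str]:
    """Return a hypothetical category label, or None if nothing was detected."""
    normalized = " ".join(value.lower().split())
    if normalized in KNOWN_TERMS:
        return KNOWN_TERMS[normalized]
    tokens = normalized.split()
    if tokens and all(is_keyboard_run(t) for t in tokens):
        return "random_typing"
    return None
```

With this sketch, “asdf asdf” is labeled as random typing and “John Doe” as a placeholder, while “Peter Miller” yields no finding. A single rule like this misses most random input, which is why the text above combines several methods.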

Disguised input

Input is sometimes mangled deliberately to circumvent machine processing, such as a credit check.

Padding: adding content to the left or right of a value. Example: “XXXJohnXXX”
Stutter typing. Example: “Petttttttttterson”
Spaced typing. Example: “P e t e r M i l l e r”
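The three disguises listed above can each be undone with a simple rule. The following is a sketch under stated assumptions, not the product's logic: the function name, the set of filler characters, the minimum run lengths, and the finding labels are all hypothetical.

```python
import re
from typing import List, Tuple

# Assumption: filler characters typically used for padding, repeated 3+ times.
_PADDING_RE = re.compile(r"^([Xx*_#.-])\1{2,}|([Xx*_#.-])\2{2,}$")
# Any character repeated three or more times in a row.
_STUTTER_RE = re.compile(r"(.)\1{2,}")


def undisguise(value: str) -> Tuple[str, List[str]]:
    """Return the cleaned value and the list of disguises that were detected."""
    findings: List[str] = []
    cleaned = value.strip()

    # Spaced typing: three or more tokens, each a single character.
    tokens = cleaned.split()
    if len(tokens) >= 3 and all(len(t) == 1 for t in tokens):
        cleaned = "".join(tokens)  # note: the original word boundaries are lost
        findings.append("spaced")

    # Padding: strip filler runs at either end (before the stutter rule,
    # which would otherwise collapse "XXX" to "X" instead of removing it).
    stripped = _PADDING_RE.sub("", cleaned)
    if stripped != cleaned:
        cleaned = stripped
        findings.append("padding")

    # Stutter typing: collapse runs of three or more equal characters to one.
    collapsed = _STUTTER_RE.sub(r"\1", cleaned)
    if collapsed != cleaned:
        cleaned = collapsed
        findings.append("stutter")

    return cleaned, findings
```

The order of the rules matters, and collapsing a stutter to a single character is a guess: the intended spelling might contain a double letter, so a real system would likely test more than one candidate.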

Input/Output

The Risk Detector’s REST API accepts objects of the common Ontology as input. This allows the integrator to send anything from single fields to complete records.

Possible input values are the person’s name, telephone numbers, email addresses, and physical addresses. The API is flexible in how the data is fed: for example, the name can come in as a single “full name” string or separated into specific name fields.

The output contains an overall risk score in the range -1 to +1, plus detailed information about every detected risk.

A score above 0 means there is a risk. Zero is the neutral value: nothing bad was detected, but nothing good was confirmed either. A negative value means that the record looks genuine.
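On the integrator’s side, the score rules above translate into a small helper like the one sketched here. The function name and the returned labels are assumptions for illustration and are not part of the API.

```python
def interpret_score(score: float) -> str:
    """Map an overall risk score in the range -1 to +1 to a coarse label."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score!r}")
    if score > 0:
        return "risk"      # something suspicious was detected
    if score < 0:
        return "genuine"   # positive validation outweighs any findings
    return "neutral"       # nothing bad detected, nothing good confirmed
```

An integrator would typically combine this coarse label with the detailed risk information in the output before rejecting a record.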