The address parser extracts the semantic components of a physical address such as a domicile, delivery, or invoice address.
The challenge
Unstructured data: Real-world address data is semi-structured. It follows common patterns, but is not split into fully detailed attributes. For example, street name and house number commonly share one designated field. Address data may also appear as plain text in a single field. Customers tend to mix things up and squeeze additional information into the available space.
Internationality: Address formats differ from country to country.
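To make the combined-field problem concrete, here is a minimal sketch of a rule that splits a street line into street name and house number. It is an illustration only, not Optimaize's parser: the function name split_street_line and both patterns are assumptions, and they cover just two common layouts (number after the street, number before the street).

```python
import re

# Hypothetical patterns: a house number is digits plus an optional single letter.
# "Bahnhofstrasse 12a" (number last) and "12 Main Street" (number first).
_TRAILING = re.compile(r"^(?P<street>.+?)\s+(?P<number>\d+\s?[a-zA-Z]?)$")
_LEADING = re.compile(r"^(?P<number>\d+\s?[a-zA-Z]?)\s+(?P<street>.+)$")


def split_street_line(line):
    """Return (street_name, house_number); house_number is None if none is found."""
    text = " ".join(line.split())  # collapse repeated whitespace
    for pattern in (_TRAILING, _LEADING):
        match = pattern.match(text)
        if match:
            number = match.group("number").replace(" ", "")
            return match.group("street"), number
    return text, None


print(split_street_line("Bahnhofstrasse 12a"))
print(split_street_line("12 Main Street"))
```

A rule like this breaks down quickly on real data (ranges, unit numbers, delivery notes in the same field), which is why single rules are not enough on their own.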
The solution
Optimaize uses open-source software and open data for address parsing and geocoding, and has built functionality on top of them. The solution combines trained neural networks, address databases, address term dictionaries, and specialized parsers with syntactic parsing rules.
Semantic data types
Postal code, place name, country name, country code, county/state name, county/state code, street name, building identifier (house number), block/staircase/floor/apartment, delivery note, postbox.
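The semantic data types listed above could be held in a simple record such as the one below. This is a sketch under assumptions: the class name ParsedAddress, the field names, and the helper present_components are hypothetical and do not reflect the actual SearchCluster schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ParsedAddress:
    """Hypothetical container for parsed address components; every field is optional."""
    postal_code: Optional[str] = None
    place_name: Optional[str] = None
    country_name: Optional[str] = None
    country_code: Optional[str] = None
    state_name: Optional[str] = None    # county/state name
    state_code: Optional[str] = None    # county/state code
    street_name: Optional[str] = None
    building_identifier: Optional[str] = None  # house number
    unit: Optional[str] = None          # block/staircase/floor/apartment
    delivery_note: Optional[str] = None
    postbox: Optional[str] = None

    def present_components(self):
        """Return only the components that were actually found."""
        return {key: value for key, value in asdict(self).items() if value is not None}


address = ParsedAddress(street_name="Bahnhofstrasse", building_identifier="12a",
                        postal_code="8001", place_name="Zurich", country_code="CH")
print(address.present_components())
```

Keeping every field optional reflects that a given input rarely contains all components.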
Applications
Address parsing is used in SearchCluster to index and find records correctly and efficiently, and to perform detailed record matching.
Geocoder
Geocoding is the process of taking a textual address and finding its latitude and longitude coordinates. SearchCluster uses this to enhance records for geospatial search, and to provide additional context about a record.
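Once records carry coordinates, a geospatial search can filter them by distance from a point. The sketch below uses the haversine formula for great-circle distance. It is an illustration, not how SearchCluster implements geospatial search: the function names and the record layout (dictionaries with "lat" and "lon" keys) are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def records_within(records, lat, lon, radius_km):
    """Return the records whose coordinates lie within radius_km of (lat, lon)."""
    return [r for r in records
            if haversine_km(lat, lon, r["lat"], r["lon"]) <= radius_km]


# Hypothetical geocoded records.
records = [
    {"name": "Zurich office", "lat": 47.3769, "lon": 8.5417},
    {"name": "Bern office", "lat": 46.9480, "lon": 7.4474},
]
print(records_within(records, 47.3769, 8.5417, 50))
```

A linear scan like this only suits small record sets; a search index would normally use a spatial data structure instead.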