Schema


Your existing data works with SearchCluster: your records are mapped to a document-style schema with nested entities and semantic types.

Step 1/3: Your Original Data

Your data store or data feed looks something like one of these:

  • Tabular Data
  • Relational Data
  • Document Data

Tabular data is flat, with columns and rows. Some sources are short and simple; others have dozens of fields. It usually comes from an SQL database or a CSV feed and looks like the data in a spreadsheet.

Relational data is a collection of multiple sets of tabular data, usually linked to each other by keys and IDs.

Document data can be simple, can contain nested objects, or can be complex with deeply nested arrays and objects. This is how data looks naturally, when it has not been flattened to fit into a single database table or split and normalized into multiple relational database tables.
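
To make the difference between document data and tabular data concrete, here is a minimal Python sketch that flattens a nested document into a single row of prefixed columns. The function name flatten and the underscore separator are assumptions chosen for illustration; they are not part of SearchCluster.

```python
def flatten(doc, prefix="", sep="_"):
    """Flatten nested dicts and lists into one flat dict of prefixed keys."""
    flat = {}
    if isinstance(doc, dict):
        items = doc.items()
    else:
        # A list: use the position of each element as its key part.
        items = enumerate(doc)
    for key, value in items:
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, (dict, list)):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


document = {
    "id": "1234",
    "customer": {"firstname": "John", "lastname": "Doe"},
    "addresses": [
        {"type": "DELIVERY", "city": "Testville"},
        {"type": "INVOICE", "city": "Sample City"},
    ],
}

row = flatten(document)
print(row)
```

Each nested level ends up encoded in the column name (for example customer_firstname or addresses_1_city), which is the kind of squeezing that a document-style record avoids.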

Step 2/3: Data Record Feed

The record as you send it to SearchCluster for create, update, and search operations is the same record you receive back from search and read API calls. It can be in JSON, XML, or CSV format.

  • Simple CSV
  • Grouped CSV
  • Flat JSON
  • Nested structure

Simple CSV: this is how data looks when it comes from an SQL database table, much like a spreadsheet.

ID;FIRSTNAME;LASTNAME;GENDER;DATEOFBIRTH;EMAIL;TEL;ZIP;CITY;STREET
1234;John;Doe;M;12/31/2002;john.doe@example.com;(555) 123-4567;12345;Testville;456 Demo Avenue

Grouped CSV: prefixes in the column names emulate namespaces and group fields into entities.

FN;MN;LN;DELIVERY_ZIP;DELIVERY_CITY;DELIVERY_STREET;INVOICE_ZIP;INVOICE_CITY;INVOICE_STREET
John;P.;Doe;12345;Testville;456 Test Road;54321;Sampletown;789 Sample Street
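
As a rough sketch of how prefixed columns can be regrouped into entities, the following Python snippet parses the grouped CSV above and nests the DELIVERY_ and INVOICE_ columns. The helper group_row, the ENTITY_PREFIXES tuple, and the lowercase key names are assumptions for illustration, not SearchCluster behavior.

```python
import csv
import io

GROUPED_CSV = """\
FN;MN;LN;DELIVERY_ZIP;DELIVERY_CITY;DELIVERY_STREET;INVOICE_ZIP;INVOICE_CITY;INVOICE_STREET
John;P.;Doe;12345;Testville;456 Test Road;54321;Sampletown;789 Sample Street
"""

# Assumption: these prefixes mark entity groups; all other columns stay top-level.
ENTITY_PREFIXES = ("DELIVERY", "INVOICE")


def group_row(row):
    """Turn one flat row with prefixed columns into a nested dict."""
    record = {}
    for column, value in row.items():
        prefix, _, field = column.partition("_")
        if field and prefix in ENTITY_PREFIXES:
            record.setdefault(prefix.lower(), {})[field.lower()] = value
        else:
            record[column.lower()] = value
    return record


reader = csv.DictReader(io.StringIO(GROUPED_CSV), delimiter=";")
records = [group_row(row) for row in reader]
print(records[0])
```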

Flat JSON: a flat JSON structure is the equivalent of a CSV row.

{
    "id": "1234",
    "firstname": "John",
    "lastname": "Doe",
    "gender": "M",
    "dateofbirth": "12/31/2002",
    "email": "john.doe@example.com",
    "tel": "(555) 123-4567",
    "zip": "12345",
    "city": "Testville",
    "street": "456 Demo Avenue"
}
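
The equivalence between a CSV row and a flat JSON object can be sketched in a few lines of Python using only the standard library. Lowercasing the column names to get the JSON keys is an assumption that matches the example above; your own feed may use different naming.

```python
import csv
import io
import json

SIMPLE_CSV = """\
ID;FIRSTNAME;LASTNAME;GENDER;DATEOFBIRTH;EMAIL;TEL;ZIP;CITY;STREET
1234;John;Doe;M;12/31/2002;john.doe@example.com;(555) 123-4567;12345;Testville;456 Demo Avenue
"""

reader = csv.DictReader(io.StringIO(SIMPLE_CSV), delimiter=";")
# One flat JSON object per CSV row; column names become lowercase keys.
flat_records = [
    {name.lower(): value for name, value in row.items()} for row in reader
]

print(json.dumps(flat_records[0], indent=4))
```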

Nested structure: this is how the data looks when the data of multiple SQL tables is joined to form one combined index. In practice, the data can be nested more deeply and contain more fields.

{
    "id": "1234",
    "created": "2022-01-31 15:45:23",
    "category": "CLASS_B",
    "customer": {
        "firstname": "John",
        "lastname": "Doe",
        "gender": "M",
        "dateofbirth": "2002-12-31"
    },
    "contact": {
        "email": "john.doe@example.com",
        "tel": "(555) 123-4567"
    },
    "bank_account": {
        "iban": "DE89370400440532013000",
        "account_owner": "John Daniel Doe"
    },
    "addresses": [{
        "type": "DELIVERY",
        "historical": false,
        "zip": "AB1 2CD",
        "city": "Testville",
        "street": "456 Test Road",
        "country": "UK"
    }, {
        "type": "INVOICE",
        "historical": false,
        "zip": "AB2 3CD",
        "city": "Sample City",
        "street": "789 Sample Street",
        "country": "UK"
    }, {
        "type": "INVOICE",
        "historical": true,
        "zip": "AB3 4CD",
        "city": "Old Town",
        "street": "789 Ancient Avenue",
        "country": "UK"
    }],
    "payments": [{
        "ts": "2024-10-26 14:12:32",
        "iban": "GB29NWBK60161331926819",
        "from": "John Daniel Doe",
        "amount": 3333.30,
        "currency": "GBP"
    },{
        "ts": "2024-04-26 09:05:59",
        "iban": "GB24BKEN10000031510604",
        "from": "Jane Smith-Doe",
        "amount": 3333.30,
        "currency": "GBP"
    }]
}
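
One possible way to produce such a nested record from relational tables is sketched below with Python's built-in sqlite3 module. The table names, column names, and the build_document helper are hypothetical and heavily reduced; they only illustrate the idea of combining a parent row with its child rows.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE customer (id TEXT PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE address (customer_id TEXT, type TEXT, zip TEXT, city TEXT);
    INSERT INTO customer VALUES ('1234', 'John', 'Doe');
    INSERT INTO address VALUES ('1234', 'DELIVERY', 'AB1 2CD', 'Testville');
    INSERT INTO address VALUES ('1234', 'INVOICE', 'AB2 3CD', 'Sample City');
""")


def build_document(customer_id):
    """Combine one customer row and its address rows into a nested record."""
    customer = conn.execute(
        "SELECT id, firstname, lastname FROM customer WHERE id = ?",
        (customer_id,),
    ).fetchone()
    addresses = conn.execute(
        "SELECT type, zip, city FROM address WHERE customer_id = ? ORDER BY rowid",
        (customer_id,),
    ).fetchall()
    return {
        "id": customer["id"],
        "customer": {
            "firstname": customer["firstname"],
            "lastname": customer["lastname"],
        },
        # Each child row becomes one object in the nested array.
        "addresses": [dict(row) for row in addresses],
    }


document = build_document("1234")
print(json.dumps(document, indent=4))
```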

Step 3/3: Index Schema

SearchCluster uses an internal schema based on your data record format. Entities are arranged into structures, and fields are assigned semantic data types.

A simple example to illustrate:

{
    "id": {
        "type": "FIELD",
        "dataType": "RECORDID"
    },
    "customer": {
        "type": "ENTITY",
        "entityType": "PERSON",
        "items": [{
            "name": "firstname",
            "type": "FIELD",
            "dataType": "GIVENNAME"
        }, {
            "name": "lastname",
            "type": "FIELD",
            "dataType": "SURNAME"
        }, {
            "name": "dob",
            "type": "FIELD",
            "dataType": "DATEOFBIRTH"
        }]
    },
    "address": {
        "type": "ENTITY",
        "entityType": "ADDRESS",
        "items": [{
            "name": "street",
            "type": "FIELD",
            "dataType": "STREET"
        }, {
            "name": "zip",
            "type": "FIELD",
            "dataType": "POSTALCODE"
        }, {
            "name": "city",
            "type": "FIELD",
            "dataType": "PLACENAME"
        }, {
            "name": "country",
            "type": "FIELD",
            "dataType": "COUNTRY"
        }]
    }
}
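
To show how such a schema can be read, here is a small Python sketch that walks a shortened copy of the example and lists each field path with its semantic data type. It assumes only the structure visible in the example (type, dataType, entityType, items); the function field_types and the dotted path notation are hypothetical, and the real schema format may contain more than is handled here.

```python
import json

# Shortened copy of the example schema above.
SCHEMA = json.loads("""
{
    "id": {"type": "FIELD", "dataType": "RECORDID"},
    "customer": {
        "type": "ENTITY",
        "entityType": "PERSON",
        "items": [
            {"name": "firstname", "type": "FIELD", "dataType": "GIVENNAME"},
            {"name": "lastname", "type": "FIELD", "dataType": "SURNAME"}
        ]
    }
}
""")


def field_types(schema):
    """Map dotted field paths to their semantic data type."""
    result = {}
    for name, node in schema.items():
        if node["type"] == "FIELD":
            result[name] = node["dataType"]
        elif node["type"] == "ENTITY":
            # Fields of an entity are listed under "items".
            for item in node.get("items", []):
                if item["type"] == "FIELD":
                    result[f"{name}.{item['name']}"] = item["dataType"]
    return result


types = field_types(SCHEMA)
for path, data_type in types.items():
    print(f"{path}: {data_type}")
```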

Record versioning

Your indexed records (in the form of documents) may either contain only the latest versions of the records, or keep a history of edits. See the record versioning feature.

Multiple values per field

SearchCluster allows you to store multiple values per field or per structure.

Example for fields: an email address field accepts multiple values, either as a single delimited string, such as "john.doe@example.com;jane.johnson@example.org" (separated by semicolons here), or as a list in JSON format: ["john.doe@example.com", "jane.johnson@example.org"]

Example for objects: an address structure can hold multiple instances, supplied as a JSON array.
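
If your own feed mixes both notations for multi-value fields, a client-side helper along these lines could normalize them before sending. This is a minimal sketch: the function as_list and the semicolon default separator are assumptions for illustration, not a SearchCluster API.

```python
def as_list(value, separator=";"):
    """Return the values of a multi-value field as a list of strings."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return [str(item).strip() for item in value]
    # A single string: split on the separator and drop empty parts.
    return [part.strip() for part in str(value).split(separator) if part.strip()]


emails_from_string = as_list("john.doe@example.com;jane.johnson@example.org")
emails_from_list = as_list(["john.doe@example.com", "jane.johnson@example.org"])

# Both notations end up as the same list of two addresses.
print(emails_from_string == emails_from_list)
```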
