Overview
Syllabus
Intro
WHAT IS ELASTICSEARCH
"You know, for search."
WHY DO WE USE ELASTICSEARCH
• To store the full telemetry of all our MDM devices: position, current speed, battery metrics, alarms, temperature.
• To store business events so that operations staff know what happens in their service: events when a customer takes a car, akandamentos, the billing process.
SCHEMA-LESS?
I've said Elasticsearch is schema-less. Yes, the schema is not enforced by the database, but you can define a mapping: a definition of the type of each field (integer, string, date, datetime). Data is organized in indices, which are like databases in a classical RDBMS. Each index has one or more types, which can be thought of as tables. You can ask Elasticsearch for the mapping of a specific index, and manage that mapping.
DOCUMENT AND MAPPING
I strongly encourage you to prepare your index up front and to create and manage the mapping yourself.
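As a rough sketch of what managing the mapping yourself could look like for device telemetry, the snippet below builds the body of a create-index request. The field names and the `telemetry` type name are hypothetical, chosen only for illustration. The layout of the `mappings` section depends on the Elasticsearch version: older releases nest it under a type name, while recent ones are typeless, so both variants are shown.

```python
import json


def build_index_body(doc_type=None):
    """Build the body of a create-index request with an explicit mapping.

    The field names are hypothetical, picked to mirror device telemetry.
    """
    properties = {
        "timestamp": {"type": "date"},
        "position": {"type": "geo_point"},
        "speed": {"type": "float"},
        "battery_level": {"type": "integer"},
        "temperature": {"type": "float"},
    }
    if doc_type is None:
        # Typeless layout (assumption: a recent Elasticsearch release).
        return {"mappings": {"properties": properties}}
    # Older layout: the mapping is nested under a type name.
    return {"mappings": {doc_type: {"properties": properties}}}


body = build_index_body(doc_type="telemetry")
print(json.dumps(body, indent=2))
```

With the Python client, a body like this would typically be passed when the index is created, and the stored mapping can be read back later through the index's `_mapping` endpoint.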
Elasticsearch stores data in shards. Basically, each index is split into 5 shards by default. Under the hood, ES replicates shards to one or more other nodes. ES also balances shards across nodes to spread the search load.
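To make the shard arithmetic concrete, here is a small sketch using the index settings `number_of_shards` and `number_of_replicas`. The helper names are hypothetical; the numbers simply reuse the 5 shards mentioned above with one replica each.

```python
def index_settings(primary_shards=5, replicas=1):
    """Build the settings section of a create-index request."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


def total_shard_copies(settings_body):
    """Count every shard copy the cluster has to place on its nodes."""
    settings = settings_body["settings"]
    # Each primary shard exists once, plus one copy per replica.
    return settings["number_of_shards"] * (1 + settings["number_of_replicas"])


body = index_settings(primary_shards=5, replicas=1)
print(total_shard_copies(body))  # 10: 5 primaries + 5 replicas
```

These 10 copies are what ES spreads over the available nodes; a replica is never placed on the same node as its primary.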
INSERTING DATA
• Using a river. A river is a kind of link between Elasticsearch and a service, for instance RabbitMQ or Twitter.
• Using the index API. The index API is essentially a POST to an index to create a document, and a PUT to modify an existing document.
Note: there is also a bulk mechanism to index a lot of data at once.
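The bulk mechanism expects newline-delimited JSON: one action line followed by one document line, repeated for each document. The sketch below only builds such a payload; the index name, ids, and fields are hypothetical, and older Elasticsearch versions also expect a `_type` entry in the action line.

```python
import json


def build_bulk_body(index_name, documents):
    """Build a newline-delimited payload for the _bulk endpoint.

    `documents` is an iterable of (doc_id, source_dict) pairs.
    """
    lines = []
    for doc_id, source in documents:
        # Action line: what to do, and where.
        action = {"index": {"_index": index_name, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    if not lines:
        return ""
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


events = [
    ("1", {"device": 42, "speed": 12.5}),
    ("2", {"device": 42, "speed": 13.0}),
]
payload = build_bulk_body("telemetry", events)
print(payload)
```

Sending many documents in one request like this avoids one HTTP round trip per document, which is the point of the bulk API.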
Queries will often be more complicated than an exact match of a field against a value. Queries and filters are similar, but a query also computes how well each result matches and ranks the results by that score.
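One way to combine the two in a single request is a `bool` query, where `must` clauses contribute to the ranking score and `filter` clauses only include or exclude documents. This sketch assumes the `bool`/`filter` syntax of Elasticsearch 2.0 and later (earlier versions used a different wrapper), and the `message` and `speed` field names are hypothetical.

```python
def build_search_body(text, min_speed):
    """Build a search body mixing a scored query with an unscored filter."""
    return {
        "query": {
            "bool": {
                # Scored: how well does the text match? Drives the ranking.
                "must": [{"match": {"message": text}}],
                # Not scored: a plain yes/no test on each document.
                "filter": [{"range": {"speed": {"gte": min_speed}}}],
            }
        }
    }


body = build_search_body("battery alarm", min_speed=50)
print(body)
```

Because filters do not compute a score, they are cheaper to run and their results can be cached by the engine.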
Faceting can be thought of as a Group By with a count. It's used extensively in online stores, when you filter products by color, brand, size... The page displays a counter of available products next to each option. The query returns both the product results AND the facets (counts per brand, for example).
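The "Group By with a count" idea can be emulated in plain Python to show what comes back from such a request. This is not an Elasticsearch call, only an illustration of the result shape; the catalog and the function name are made up.

```python
from collections import Counter


def search_with_facets(products, facet_field, **criteria):
    """Return the matching products and a count per value of facet_field."""
    hits = [
        product for product in products
        if all(product.get(key) == value for key, value in criteria.items())
    ]
    # Group the hits by the facet field and count each group.
    facets = Counter(p[facet_field] for p in hits if facet_field in p)
    return hits, dict(facets)


catalog = [
    {"name": "T-shirt", "color": "red", "brand": "Acme"},
    {"name": "Hoodie", "color": "red", "brand": "Globex"},
    {"name": "Cap", "color": "blue", "brand": "Acme"},
    {"name": "Socks", "color": "red", "brand": "Acme"},
]
hits, brand_counts = search_with_facets(catalog, "brand", color="red")
print(len(hits), brand_counts)
```

Filtering on red products returns the three matching items together with the per-brand counts, which is what an online store needs to render both the result list and the sidebar counters from one response.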
Elasticsearch uses Lucene to extract information with tokenizers and to analyze the text so that it is stored correctly. Thus, ES is able to find documents that match other documents, and return results with a ranking score.
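A toy version of that pipeline helps to see why analysis matters. The sketch below lowercases and tokenizes text, stores the tokens in an inverted index, and ranks documents by how often the query terms occur. It is a deliberate simplification: Lucene's analyzers and scoring are far more elaborate, and all names here are hypothetical.

```python
import re
from collections import defaultdict


def analyze(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each token to {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Score each document by summed term frequency, best match first."""
    scores = defaultdict(int)
    for token in analyze(query):
        for doc_id, freq in index.get(token, {}).items():
            scores[doc_id] += freq
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = {
    1: "The car battery is low",
    2: "Battery alarm: battery temperature too high",
    3: "Customer takes a car",
}
index = build_inverted_index(docs)
print(search(index, "battery alarm"))  # [(2, 3), (1, 1)]
```

Document 2 ranks first because it contains "battery" twice and "alarm" once, while document 1 only contains "battery" once.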
ADMINISTRATIVE TASKS
If your mapping changes, the existing data won't be updated, and you need to reindex all your data. The Python client provides a helper to reindex your existing data into a new index: elasticsearch.helpers.reindex
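A migration along those lines could be sketched as follows: create a new index with the new mapping, then copy the documents with `elasticsearch.helpers.reindex`. The function name `migrate_index` is hypothetical, `es` is assumed to be an `elasticsearch.Elasticsearch` client, and the `body=` keyword of `indices.create` is an assumption that matches older client releases. The `reindex` parameter exists only so the sketch can be exercised without a running cluster.

```python
def migrate_index(es, old_index, new_index, new_body, reindex=None):
    """Create new_index with its new mapping, then copy all documents over.

    `es` is assumed to be an elasticsearch.Elasticsearch client.
    `new_body` is the create-index body holding the new mapping.
    """
    if reindex is None:
        # Imported lazily so the function can be defined and tested
        # without the elasticsearch package installed.
        from elasticsearch.helpers import reindex
    # The new mapping must exist before any document is copied.
    es.indices.create(index=new_index, body=new_body)
    # Reads every document from old_index and bulk-indexes it into new_index.
    return reindex(es, old_index, new_index)
```

A call would look like `migrate_index(es, "telemetry_v1", "telemetry_v2", new_body)`, with both index names made up for the example.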
Elasticsearch comes with lots of plugins you should use (HEAD). Rivers are also plugins (RabbitMQ River, Twitter River, Kafka River). There is a Python script plugin to run Python in your queries. Elastic.co provides Shield and Marvel to add security and monitoring to your cluster.
Taught by
EuroPython Conference