Python and Elasticsearch 101

Overview

Explore Python integration with Elasticsearch in this conference talk from EuroPython 2015. Learn how Elasticsearch evolved from a secondary database and search engine to a full-featured database with snapshot/restore capabilities. Discover various use cases, including storing business events and IoT device metrics. Gain insights into Elasticsearch's schema-less nature, document structure, and mapping concepts. Understand data organization in indices and types, as well as the importance of proper index preparation and mapping management. Explore data insertion methods, including rivers and the index API. Delve into query and filter mechanisms, faceting for aggregations, and Lucene's role in text analysis and search ranking. Learn about administrative tasks such as reindexing, available plugins, and security features. Master the integration of Elasticsearch into your Python stack for efficient data storage and retrieval.

Syllabus

Intro
WHAT IS ELASTICSEARCH: You know, for search
WHY DO WE USE ELASTICSEARCH:
• To store the full telemetry of all our MDM devices: position, current speed, battery metrics, alarms, temperature
• To store business events so that operations know what happens in their service: events when a customer takes a car, akandamentos, the invoicing process
SCHEMA-LESS? I've said Elasticsearch is schema-less. Yes, the schema is not enforced by the database, but you can define a mapping, which declares the type of each field: integer, string, date, datetime. Data is organized in indices, which are like databases in a classical RDBMS. Each index has one or more types, which can be thought of as tables. You can ask Elasticsearch for the mapping of a specific index.
DOCUMENT AND MAPPING: I strongly encourage you to prepare your index and to create and manage the mapping yourself.
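As a minimal sketch of preparing a mapping yourself, the snippet below builds a create-index body as a plain Python dict. The index name `devices`, the type name `telemetry`, and the field names are hypothetical, and the field types assume the Elasticsearch 1.x mapping syntax in use around the time of the talk.

```python
def build_index_body(doc_type, fields):
    """Return a create-index body that declares an explicit mapping for one type."""
    properties = {name: {"type": es_type} for name, es_type in fields.items()}
    return {"mappings": {doc_type: {"properties": properties}}}


# Hypothetical telemetry fields, for illustration only.
TELEMETRY_FIELDS = {
    "device_id": "string",
    "speed": "float",
    "battery_level": "integer",
    "recorded_at": "date",
}

index_body = build_index_body("telemetry", TELEMETRY_FIELDS)

# Assumption: with an elasticsearch-py client from that period, this body
# would typically be sent and read back along these lines:
#   es.indices.create(index="devices", body=index_body)
#   es.indices.get_mapping(index="devices")
```

Building the body as data keeps the mapping under version control next to the application code, which makes it easier to manage explicitly.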
Elasticsearch stores data in shards. By default, an index is split into 5 primary shards. Under the hood, ES replicates shards to one or more other nodes. It also balances shards across nodes to spread the search load.
INSERTING DATA:
• Using a river. A river is a kind of link between Elasticsearch and a service, for instance RabbitMQ or Twitter.
• Using the index API. The index API is essentially a POST to an index to add a document, and a PUT to modify an existing document.
Note: There is also a bulk mechanism for indexing a lot of data.
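The bulk mechanism can be sketched from Python as follows. This is an illustration under assumptions: the index and type names are hypothetical, and the action layout (`_index`, `_type`, `_source`) follows what `elasticsearch.helpers.bulk` accepted in client releases of that period.

```python
def to_bulk_actions(index, doc_type, documents):
    """Yield one bulk action per document, ready for a bulk helper."""
    for doc in documents:
        yield {"_index": index, "_type": doc_type, "_source": doc}


# Hypothetical telemetry readings, for illustration only.
readings = [
    {"device_id": "car-1", "speed": 42.0},
    {"device_id": "car-2", "speed": 0.0},
]
actions = list(to_bulk_actions("devices", "telemetry", readings))

# Assumption: with an elasticsearch-py client from that period, the calls
# would look roughly like this:
#   from elasticsearch import Elasticsearch, helpers
#   es = Elasticsearch()
#   es.index(index="devices", doc_type="telemetry", body=readings[0])  # one document
#   helpers.bulk(es, actions)                                          # many documents
```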
Queries will be more complicated than an exact match of a field against a value. Queries and filters both select documents, but a query also scores how well each result matches, which produces a ranking.
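To make the distinction concrete, here is a hedged sketch of a search body that combines a scored query with an unscored filter. It assumes the `filtered` query syntax of Elasticsearch 1.x, and the field names are hypothetical.

```python
def filtered_search(text_field, text, filter_field, filter_value):
    """Build a search body: the match query ranks results, the term filter only narrows them."""
    return {
        "query": {
            "filtered": {
                # Scored part: contributes to the ranking of each hit.
                "query": {"match": {text_field: text}},
                # Unscored part: a yes/no test on an exact value.
                "filter": {"term": {filter_field: filter_value}},
            }
        }
    }


search_body = filtered_search("description", "low battery", "device_type", "car")

# Assumption: with an elasticsearch-py client this body would be sent as
#   es.search(index="devices", body=search_body)
```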
Faceting can be thought of as a GROUP BY with a count. It is used extensively in online stores, where you filter products by color, brand, size, and so on, and the page displays a count of available products for each value. The query returns both the product results AND the facets (counts per brand, for example).
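The "GROUP BY with a count" idea can be illustrated in plain Python, without Elasticsearch. This is only an analogy of what a faceted response contains, using a hypothetical product list; it is not how Elasticsearch computes facets internally.

```python
from collections import Counter


def search_with_facets(products, color, facet_field):
    """Return matching products together with a count per value of facet_field."""
    hits = [p for p in products if p["color"] == color]
    facets = Counter(p[facet_field] for p in hits)
    return {"hits": hits, "facets": dict(facets)}


# Hypothetical catalogue, for illustration only.
products = [
    {"name": "T-shirt", "color": "red", "brand": "Acme"},
    {"name": "Hoodie", "color": "red", "brand": "Acme"},
    {"name": "Cap", "color": "red", "brand": "Bolt"},
    {"name": "Scarf", "color": "blue", "brand": "Bolt"},
]

result = search_with_facets(products, "red", "brand")
# result["hits"] holds the three red products;
# result["facets"] holds the per-brand counts for those hits.
```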
Elasticsearch uses Lucene to extract information with tokenizers and to analyze the text so that it is stored correctly. As a result, ES can determine whether a document matches other documents and return results with a ranking score.
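The toy example below gives a rough feel for tokenizing, indexing, and ranking. It is a deliberately simplified stand-in: the lowercase tokenizer and the term-count score are assumptions made for illustration and are not Lucene's actual analyzers or scoring formula.

```python
import re
from collections import Counter, defaultdict


def analyze(text):
    """Toy analyzer: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each token to a counter of {doc_id: occurrences of the token}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token][doc_id] += 1
    return index


def search(index, query):
    """Score documents by summed term counts and return them best first."""
    scores = Counter()
    for token in analyze(query):
        for doc_id, freq in index.get(token, {}).items():
            scores[doc_id] += freq
    return scores.most_common()


# Hypothetical documents, for illustration only.
docs = {1: "The car is fast, the car is red", 2: "A red bicycle"}
inverted = build_inverted_index(docs)
ranked = search(inverted, "red car")  # list of (doc_id, score), highest score first
```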
ADMINISTRATIVE TASKS: If your mapping changes, the data already indexed is not updated, so you need to reindex all your data. The Python client provides a helper to reindex your existing data into a new index: elasticsearch.helpers.reindex
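A reindexing step might be wrapped as in the sketch below. The function name `migrate_index`, the index names, and the `reindex` parameter (there only so the helper can be swapped out) are hypothetical. The sketch assumes `client.indices.create(index=..., body=...)` as found in elasticsearch-py releases of that period, and that `elasticsearch.helpers.reindex` takes the client, the source index, and the target index.

```python
def migrate_index(client, source_index, target_index, new_body, reindex=None):
    """Create target_index with the new mapping, then copy all documents into it."""
    if reindex is None:
        # Imported lazily so the sketch can be read without the library installed.
        from elasticsearch.helpers import reindex
    # The new index must exist with the new mapping before data is copied.
    client.indices.create(index=target_index, body=new_body)
    return reindex(client, source_index, target_index)


# Hypothetical usage, assuming a running cluster:
#   from elasticsearch import Elasticsearch
#   migrate_index(Elasticsearch(), "devices_v1", "devices_v2", new_index_body)
```

Creating the target index first matters: if documents are copied into an index that does not exist yet, the mapping is inferred from the data rather than taken from your definition.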
Elasticsearch comes with lots of plugins you should use (for example, Head). Rivers are also plugins (RabbitMQ River, Twitter River, Kafka River). There is a Python script plugin for running Python in your queries. Elastic.co provides Shield and Marvel to add security and monitoring to your cluster.

Taught by

EuroPython Conference
