
Jul 3, 2024

Maximizing Elasticsearch's performance and cost efficiency

by Antoni Olendzki

Elasticsearch is a powerful text data search and analytics platform, but managing performance and costs becomes critical as your data grows. Here’s how to get the most out of your Elasticsearch deployment.

1. Optimized indexing strategy

Use _bulk API when indexing your data 

Using the _bulk API when indexing your data is crucial for getting the best write performance. The _bulk API allows you to send multiple indexing or update operations in a single request, significantly reducing the overhead of issuing individual writes (one document per API call). This approach minimizes network and Elasticsearch cluster load, leading to faster data ingestion and lower latency.
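For illustration, here is a minimal sketch using the official Python client, whose helpers.bulk wrapper batches documents into _bulk requests; the index name, document fields, and chunk size are placeholders:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

# Build the operations lazily: one action per document, sent in batched
# _bulk requests instead of one HTTP request per document.
actions = (
    {"_index": "articles", "_source": {"title": f"Document {i}", "status": "published"}}
    for i in range(100_000)
)

success, errors = helpers.bulk(es, actions, chunk_size=1_000)
print(f"indexed {success} documents, {len(errors)} failures")
```

Tuning the chunk size (and, for very large loads, running several bulk workers in parallel) is usually the first lever to pull when ingestion is too slow.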

Use appropriate shard sizes

  • Why shard size matters: shards are the basic unit of storage and indexing in Elasticsearch. They allow you to distribute data and search load across multiple nodes in a cluster. However, improper shard sizing can lead to significant performance bottlenecks.

  • Optimal shard size: aim for shard sizes between 20 and 40 GB. Many small shards can lead to excessive overhead in cluster management, which can overwhelm the cluster state management and lead to performance degradation. Conversely, huge shards can cause slow search and indexing performance, increasing the I/O operations required for read and write tasks.

  • Monitoring and adjustment: regularly monitor shard state (even the simplest Elasticsearch _cat/shards API will give you enough information) to keep track of shard distribution and sizes. If shards are too small or too large, you can use the _shrink or _split API to resize them: _shrink consolidates small shards into fewer, larger ones, while _split breaks large shards into smaller, more manageable ones. There is a big gotcha, however: these operations do not modify the existing index in place - they produce a completely new index - so they may not always be applicable, because they can mean downtime for your services. A sketch of both steps follows this list.
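Here is a rough sketch of both steps with the Python client: listing shard sizes via _cat/shards, then shrinking an over-sharded index. The index, target, and node names are made up, and the _shrink prerequisites (a read-only source index with all shard copies on one node) are exactly what makes the downtime caveat real:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# _cat/shards with format="json" returns a list of dicts instead of the
# plain-text table; "store" is the on-disk size of each shard.
for shard in es.cat.shards(format="json", bytes="gb", h="index,shard,prirep,store,node"):
    print(shard["index"], shard["shard"], shard["prirep"], shard["store"], shard["node"])

# Consolidate an over-sharded index with _shrink. The source must first be
# made read-only and all shard copies moved onto a single node.
es.indices.put_settings(
    index="logs-2024.07",
    settings={
        "index.blocks.write": True,
        "index.routing.allocation.require._name": "node-1",  # hypothetical node name
    },
)
es.indices.shrink(
    index="logs-2024.07",
    target="logs-2024.07-shrunk",
    settings={"index.number_of_shards": 1},
)
```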

Manage mappings and settings

  • The importance of lean mappings: mappings in Elasticsearch define how documents and fields are indexed and stored. Overly complex mappings can introduce unnecessary overhead and slow down indexing operations. Keep mappings lean by defining only the fields you need and setting appropriate data types.

  • Use “text” type only when necessary: as much as enabling full-text search capabilities for all textual data is tempting, this isn’t free. Entries of type “text” need to be tokenized, which involves extra processing and storage, so you might end up paying extra for functionality that’s effectively not in use. Consider “match_only_text” and “keyword” types instead.

  • Using dynamic templates: dynamic templates allow you to define rules that Elasticsearch uses to dynamically apply mappings to fields based on their names or data types. This ensures consistency and reduces manual effort. For instance, you can set up a dynamic template to automatically index all fields that start with “log_” as `match_only_text` fields (see the sketch after this list). This helps you avoid inferring unwanted or incorrect data types on ingestion, which would require costly reindexing to correct.

  • Field data types and properties: choose the right data type for each field. For example, use keyword for structured data like tags or IDs, text for full-text search fields, date for timestamps, and boolean for true/false values. Using appropriate data types improves both indexing and search performance. Remember that you can also set field-specific properties to further improve indexing performance and storage size. You can watch the presentation Przemek, our founding engineer, gave at OpenSearchCon 2024 in Berlin about this [LINK].

  • Review and clean up: regularly review and clean up your mappings. Remove unused fields and mappings that no longer serve a purpose. This keeps your indices lightweight and reduces storage costs. However, remember that this operation requires reindexing your data and may result in downtime.
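As a sketch of the mapping ideas above, here is an index created with a dynamic template for `log_*` fields plus a few explicitly typed fields; the index and field names are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="app-logs",
    mappings={
        # Any new string field whose name starts with "log_" gets mapped
        # to match_only_text instead of the default text + keyword pair.
        "dynamic_templates": [
            {
                "log_fields": {
                    "match": "log_*",
                    "match_mapping_type": "string",
                    "mapping": {"type": "match_only_text"},
                }
            }
        ],
        # Explicit, lean mappings for the fields we know about up front.
        "properties": {
            "service": {"type": "keyword"},    # structured identifier, no tokenization
            "timestamp": {"type": "date"},     # timestamps as dates, not text
            "is_error": {"type": "boolean"},
        },
    },
)
```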

Implementing these data organization strategies can significantly improve the performance and cost-efficiency of your Elasticsearch cluster. Remember to continuously monitor and adjust your configurations as your data and use cases evolve. Let’s now look at another side of the equation—how to retrieve our now well-structured data properly.

2. Efficient querying techniques

Leverage filters

  • Why filters are faster: filters are used to narrow down search results based on specific criteria without scoring. Since they don’t involve relevance scoring, they are faster and can be cached. Once a filter is computed, Elasticsearch can reuse the result for subsequent searches, significantly speeding up query performance.

  • Use cases for filters: filters are ideal for operations where relevance scoring doesn’t matter. For example, filtering logs by date range or status code doesn’t require scoring; it simply needs to include or exclude documents based on the criteria.

  • Implementing filters: apply filters using the filter clause within the bool query. This allows the filter results to be cached and reused.
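For example, here is a sketch of a log query where both conditions live in the bool query’s filter context, so they skip relevance scoring and are eligible for caching (the index and field names are illustrative):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="app-logs",
    query={
        "bool": {
            # filter clauses are not scored and can be cached by Elasticsearch
            "filter": [
                {"term": {"status_code": 500}},
                {"range": {"timestamp": {"gte": "now-24h"}}},
            ]
        }
    },
)
print(resp["hits"]["total"])
```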

Keep in mind that Elasticsearch can leverage its internal query cache. You can structure your queries so that they can take advantage of this feature - there’s an excellent write-up on ES caching mechanics here.

3. Resource management

Monitor cluster health

  • Importance of monitoring: regular monitoring of your Elasticsearch cluster is crucial for maintaining performance and stability. Elasticsearch provides APIs to help you monitor cluster health, node performance, and overall resource usage.

Common issues to monitor:

  • Unassigned shards: these can indicate problems with node availability or resource allocation. Use the _cat/shards API to identify and reallocate unassigned shards.

  • High resource usage: monitor CPU, memory, and disk I/O to identify nodes under heavy load. Use the _nodes/stats API for detailed node statistics.

  • Node state: promptly address node failures to maintain cluster integrity and prevent data loss. Use the _cat/nodes API to check node status.
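A small sketch that pulls these three signals with the Python client; the chosen metrics and columns are just one reasonable selection:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Overall cluster status and the number of unassigned shards
health = es.cluster.health()
print(health["status"], "unassigned shards:", health["unassigned_shards"])

# Show only the shards that are not assigned to any node
for shard in es.cat.shards(format="json", h="index,shard,prirep,state,unassigned.reason"):
    if shard["state"] == "UNASSIGNED":
        print(shard)

# CPU, JVM heap, and free disk per node
stats = es.nodes.stats(metric=["os", "jvm", "fs"])
for node in stats["nodes"].values():
    print(
        node["name"],
        "cpu%:", node["os"]["cpu"]["percent"],
        "heap%:", node["jvm"]["mem"]["heap_used_percent"],
        "disk free bytes:", node["fs"]["total"]["available_in_bytes"],
    )
```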

Optimize node configurations

  • JVM heap size: set the JVM heap size to about 50% of your system’s RAM. You can read more about how to set it up here.

  • Disk allocation settings: ensure that your nodes have sufficient disk space and configure low and high watermarks to prevent nodes from running out of disk space (a sketch of these settings follows this list). Remember to factor in your shard sizes here, as a shard is the smallest unit of replication.

  • Regular maintenance: to keep your cluster running smoothly, perform regular maintenance tasks such as clearing old snapshots, pruning unnecessary indices, and optimizing shard allocation.
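As a sketch, the disk watermarks can be set through the cluster settings API; the percentages below are the documented defaults, shown for illustration rather than as a recommendation:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.cluster.put_settings(
    persistent={
        # stop allocating new shards to a node above the low watermark
        "cluster.routing.allocation.disk.watermark.low": "85%",
        # start moving shards away from a node above the high watermark
        "cluster.routing.allocation.disk.watermark.high": "90%",
        # block writes to indices with shards on a node above the flood stage
        "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
    }
)
```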

By actively monitoring your cluster health and optimizing node configurations, you can ensure your Elasticsearch deployment remains robust, responsive, and scalable.

4. Cost optimization

Use index lifecycle management (ILM)

Automate index lifecycle policies: Elasticsearch’s Index Lifecycle Management (ILM) helps you manage the lifecycle of your indices by automatically transitioning them through different phases based on age or usage patterns. This allows you to optimize both storage costs and performance.

ILM phases

  • Hot phase: for actively written and queried data, stored on the fastest (and most expensive) storage.

  • Warm phase: for read-only data that is less frequently accessed, moved to slower, less expensive storage.

  • Cold phase: for rarely accessed data, stored on the slowest and cheapest storage available.

  • Delete phase: data that is no longer needed is scheduled for deletion.
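Here is a sketch of a four-phase ILM policy plus a template that attaches it to new indices; the policy name, index pattern, ages, and rollover size are placeholders to adjust to your own retention needs:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.ilm.put_lifecycle(
    name="logs-policy",
    policy={
        "phases": {
            # actively written data: roll over to a new index by size or age
            "hot": {"actions": {"rollover": {"max_primary_shard_size": "40gb", "max_age": "7d"}}},
            # read-only, less frequently queried data: fewer shards and segments
            "warm": {
                "min_age": "7d",
                "actions": {"shrink": {"number_of_shards": 1}, "forcemerge": {"max_num_segments": 1}},
            },
            # rarely accessed data
            "cold": {"min_age": "30d", "actions": {"readonly": {}}},
            # data that is no longer needed
            "delete": {"min_age": "90d", "actions": {"delete": {}}},
        }
    },
)

# Apply the policy to every new index matching the pattern
es.indices.put_index_template(
    name="logs-template",
    index_patterns=["logs-*"],
    template={
        "settings": {
            "index.lifecycle.name": "logs-policy",
            "index.lifecycle.rollover_alias": "logs",  # hypothetical write alias
        }
    },
)
```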

Implementing these cost optimization strategies will ensure that your Elasticsearch cluster remains efficient and cost-effective, allowing you to scale your operations without incurring unnecessary expenses. You can also consider offloading part or all of your data to even cheaper and more effective storage, like columnar SQL databases, and Quesma can integrate this new engine with your Elastic stack. This approach is especially effective if your searches and dashboards are more aggregation/analytical than full-text search-oriented.

Benefits of ILM

  • Cost efficiency: transitioning data through different storage tiers can significantly reduce storage costs while maintaining performance.

  • Operational simplicity: automating the lifecycle management of indices reduces the need for manual intervention, allowing your team to focus on other critical tasks.

5. Scalability planning

Scale horizontally

  • Why horizontal scaling: as your data and query volume grow, adding more nodes to your Elasticsearch cluster distributes the load, ensuring that performance remains stable. Elasticsearch is designed to scale horizontally, so you can seamlessly add new nodes to the cluster without affecting the existing setup.

  • Regularly review cluster topology: review your cluster topology periodically to ensure it meets current demand. Monitor metrics such as node load, query latency, and index size to identify when scaling is necessary. Use the _cat/nodes and _cat/indices APIs to gather information about node status and index/shard distribution.
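As a sketch, a periodic topology review could pull per-node load and the largest indices like this (the column selections are illustrative):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Per-node roles, CPU, heap, and disk usage
for node in es.cat.nodes(format="json", h="name,node.role,cpu,heap.percent,disk.used_percent"):
    print(node)

# Largest indices first, with shard counts and document totals
indices = es.cat.indices(
    format="json", bytes="gb", s="store.size:desc",
    h="index,pri,rep,docs.count,store.size",
)
for idx in indices[:10]:
    print(idx)
```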

Best practices for adding nodes

  • Balanced shard allocation: ensure that shards are evenly distributed across all nodes to avoid bottlenecks.

  • Node types: use dedicated master, data, and coordinating nodes to optimize cluster performance and stability.

Implementing these best practices can significantly enhance the performance and cost efficiency of your Elasticsearch deployment. For more in-depth strategies and personalized support, discover how Quesma can help optimize your data management solutions. Contact us via this link to unlock the full potential of your Elasticsearch setup and ensure your operations run smoothly and effectively. 
