Elasticsearch is the go-to open-source search engine for logs, application search, analytics and more. This Elasticsearch tutorial walks you from the basic concepts — index, document, shard — to practical queries, aggregations, scaling tips and real-world patterns. If you’re wondering how to index data, run full-text queries, or tune a cluster for production, this guide gives clear examples and honest advice from experience. Expect short, useful snippets and a few opinions (I think mapping early saves headaches).
What is Elasticsearch and why use it?
At its core, Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene. It lets you store, search, and analyze large volumes of data quickly. Think logs, product catalogs, or any data where full-text search and aggregations matter.
For a concise background, see the project overview on Wikipedia. For official specs and API reference, the Elastic documentation is essential.
Core concepts (quick, practical)
- Cluster: One or more nodes working together (Elasticsearch cluster).
- Node: A single running instance of Elasticsearch.
- Index: A logical namespace that holds documents (roughly like a table in a relational database).
- Document: A JSON object stored in an index (like a row).
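- Shard: A slice of an index; each shard is a self-contained Lucene index. Sharding lets an index scale horizontally across nodes.
- Replica: A copy of a primary shard, placed on a different node, for high availability and extra read throughput.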
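How does a document end up on a particular shard? Conceptually, Elasticsearch hashes a routing value (by default the document _id) and takes it modulo the number of primary shards. The real implementation uses Murmur3 plus an extra routing factor that supports index splitting; this minimal Python sketch swaps in CRC32 purely as a stand-in hash, so its shard numbers will not match a real cluster. The helper names pick_shard and assign are hypothetical.

```python
import zlib
from typing import Dict, Iterable, List


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Map a routing value (default: the document _id) to a primary shard number.

    Stand-in only: CRC32 replaces Elasticsearch's Murmur3 hash, and the
    routing-factor detail used for index splitting is ignored.
    """
    if num_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def assign(doc_ids: Iterable[str], num_primary_shards: int) -> Dict[int, List[str]]:
    """Group document ids by the primary shard they would be routed to."""
    shards: Dict[int, List[str]] = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    ids = [f"product-{i}" for i in range(10)]
    print(assign(ids, 3))
    # Count how many ids would land on a different shard if the primary count changed.
    moved = [d for d in ids if pick_shard(d, 3) != pick_shard(d, 4)]
    print(f"{len(moved)} of {len(ids)} ids map to a different shard with 4 primaries")
```

The second print hints at why the primary shard count is fixed when an index is created: changing the divisor remaps most documents, so Elasticsearch makes you go through the split, shrink, or reindex APIs instead.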
Quick start: install and run
On macOS, use Homebrew; on Linux follow the official packages. Once installed, start a single-node cluster for development:
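```shell
# start Elasticsearch (example on macOS with Homebrew)
brew services start elastic/tap/elasticsearch-full
```
Verify with the REST API:
```shell
curl -sS http://localhost:9200/ | jq
```
This plain-HTTP call assumes security is disabled. Elasticsearch 8.x enables TLS and authentication by default, so against a default 8.x install use something like curl -sS --cacert http_ca.crt -u elastic https://localhost:9200/ instead.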
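If you'd rather script the check than pipe curl into jq, here is a minimal Python (standard library only) sketch that fetches GET / and summarizes the node name, cluster name and version fields the root endpoint returns. It assumes a development node on http://localhost:9200 with security disabled; the function names are hypothetical.

```python
import json
import urllib.request


def summarize_root(body: str) -> str:
    """Summarize the JSON returned by GET / on an Elasticsearch node."""
    info = json.loads(body)
    return f"{info['cluster_name']} (node {info['name']}) running {info['version']['number']}"


def fetch_root(base_url: str = "http://localhost:9200") -> str:
    """Fetch GET / and return the raw body. Assumes plain HTTP and no authentication."""
    with urllib.request.urlopen(base_url + "/", timeout=5) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    try:
        print(summarize_root(fetch_root()))
    except OSError as err:  # URLError and connection errors are OSError subclasses
        print(f"could not reach Elasticsearch: {err}")
```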
Indexing data: practical examples
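Mapping matters. If you don’t define field types, Elasticsearch infers them through dynamic mapping, and the guesses are often not what you want: every string becomes a text field with a keyword subfield, and a price that first arrives as the string "29.99" is mapped as text rather than a number. Field types can’t be changed in place once set; fixing them means reindexing. From what I’ve seen, it pays to define mappings for the fields you query often.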
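To see what "guessing" means in practice, here is a rough Python approximation of the default dynamic mapping rules: booleans map to boolean, JSON integers to long, floating-point numbers to float, date-like strings to date, and other strings to text with a keyword subfield (numeric detection is off by default). The date check is a simplified ISO-style regex standing in for Elasticsearch's dynamic_date_formats setting, and guess_mapping is a hypothetical helper, not part of any client library.

```python
import re
from typing import Any, Optional

# Simplified stand-in for date detection: ISO-style dates such as
# 2024-05-01 or 2024-05-01T10:00:00Z. Real detection uses dynamic_date_formats.
_DATE_LIKE = re.compile(r"^\d{4}-\d{2}-\d{2}(T\S+)?$")


def guess_mapping(value: Any) -> Optional[dict]:
    """Approximate the mapping Elasticsearch's default dynamic mapping would create."""
    if value is None:
        return None  # null values add no field mapping
    if isinstance(value, bool):  # check before int: bool is a subclass of int in Python
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        if _DATE_LIKE.match(value):
            return {"type": "date"}
        # Numeric detection is off by default, so "29.99" stays a string.
        return {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    if isinstance(value, dict):
        props = {}
        for key, item in value.items():
            mapping = guess_mapping(item)
            if mapping is not None:
                props[key] = mapping
        return {"properties": props}
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            mapping = guess_mapping(item)
            if mapping is not None:
                return mapping
        return None
    raise TypeError(f"not a JSON value: {value!r}")


if __name__ == "__main__":
    doc = {"name": "Wireless Mouse", "category": "electronics", "price": "29.99", "in_stock": True}
    for field, mapping in guess_mapping(doc)["properties"].items():
        print(field, mapping)
```

Run against the sample product, category comes out as text plus category.keyword, so exact-match filters and terms aggregations would have to target category.keyword, and the string price would not support range queries as a number. The explicit mapping below avoids both.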
Example: create an index with a mapping for a product catalog:
```
PUT /products
{
  "mappings": {
    "properties": {
      "name":         { "type": "text" },
      "category":     { "type": "keyword" },
      "price":        { "type": "float" },
      "in_stock":     { "type": "boolean" },
      "release_date": { "type": "date" }
    }
  }
}
```
Then index a document:
```
POST /products/_doc
{ "name": "Wireless Mouse", "category": "electronics", "price": 29.99, "in_stock": true }
```
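From application code, the same POST /products/_doc call is just an HTTP request with a JSON body. This minimal Python sketch builds it with the standard library's urllib.request; it assumes a local node at http://localhost:9200 with security disabled, and index_request and send are hypothetical helper names (the official Python client is another option).

```python
import json
import urllib.request
from typing import Any, Dict


def index_request(base_url: str, index: str, doc: Dict[str, Any]) -> urllib.request.Request:
    """Build POST /<index>/_doc; Elasticsearch generates the document _id."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_doc",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> Dict[str, Any]:
    """Send the request and decode the JSON response (includes _id and result)."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    doc = {"name": "Wireless Mouse", "category": "electronics", "price": 29.99, "in_stock": True}
    req = index_request("http://localhost:9200", "products", doc)
    print(req.get_method(), req.full_url)
```

Keep in mind that search is near-real-time: a newly indexed document becomes searchable after the next refresh (every second by default), so a test that indexes and immediately searches may need ?refresh=wait_for on the indexing call.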
Querying: match, term, and boolean logic
Simple full-text search:
```
POST /products/_search
{ "query": { "match": { "name": "wireless mouse" } } }
```
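A match query analyzes the query text with the field's analyzer and then looks for the resulting tokens. By default the tokens are combined with OR, so "wireless mouse" also matches a "Wireless Keyboard". The Python sketch below illustrates that behavior with a crude tokenizer (split on word characters, lowercase) standing in for the standard analyzer, which actually uses Unicode text segmentation; it ignores scoring entirely.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    """Rough stand-in for the standard analyzer: split into word tokens, lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def match_or(query: str, field_value: str) -> bool:
    """match query with the default operator "or": any shared token is a hit."""
    return bool(set(analyze(query)) & set(analyze(field_value)))


def match_and(query: str, field_value: str) -> bool:
    """match query with "operator": "and": every query token must be present."""
    return set(analyze(query)) <= set(analyze(field_value))


if __name__ == "__main__":
    for name in ["Wireless Mouse", "Wireless Keyboard", "USB Cable"]:
        print(name, match_or("wireless mouse", name), match_and("wireless mouse", name))
```

If OR matching is too loose for your catalog, the request form is { "match": { "name": { "query": "wireless mouse", "operator": "and" } } }.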
Exact value (keyword fields):
```
POST /products/_search
{ "query": { "term": { "category": "electronics" } } }
```
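The key difference from match is that a term query is not analyzed: the term must equal an indexed token exactly, including case. A keyword field stores the whole value as one token, while a text field stores analyzed tokens, which is why a term query against a text field so often returns nothing. This Python sketch models only that token comparison; the tokenizer is the same rough stand-in for the standard analyzer used above, and the helper names are hypothetical.

```python
import re
from typing import List


def keyword_tokens(value: str) -> List[str]:
    """A keyword field indexes the whole value as a single, unmodified token."""
    return [value]


def text_tokens(value: str) -> List[str]:
    """A text field is analyzed (rough stand-in for the standard analyzer)."""
    return [t.lower() for t in re.findall(r"\w+", value)]


def term_matches(term: str, indexed_tokens: List[str]) -> bool:
    """A term query is NOT analyzed: the exact term must equal an indexed token."""
    return term in indexed_tokens


if __name__ == "__main__":
    print(term_matches("electronics", keyword_tokens("electronics")))   # exact keyword hit
    print(term_matches("Electronics", keyword_tokens("electronics")))   # case differs
    print(term_matches("Wireless Mouse", text_tokens("Wireless Mouse")))  # term on a text field
```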
Combine with boolean queries for real-world filters:
```
POST /products/_search
{
  "query": {
    "bool": {
      "must":   [{ "match": { "name": "wireless" } }],
      "filter": [{ "term": { "in_stock": true } }, { "range": { "price": { "lte": 50 } } }]
    }
  }
}
```
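In application code you'll usually assemble this body from user input. A small Python builder like the hypothetical product_search below keeps the split explicit: the scored full-text clause goes in must, while yes/no conditions go in filter, where they don't affect relevance scores and can be cached. The parameter names and defaults are assumptions for illustration.

```python
import json
from typing import Any, Dict, List, Optional


def product_search(text: str, max_price: Optional[float] = None,
                   in_stock_only: bool = True) -> Dict[str, Any]:
    """Build a bool query: scored match in "must", unscored conditions in "filter"."""
    filters: List[Dict[str, Any]] = []
    if in_stock_only:
        filters.append({"term": {"in_stock": True}})
    if max_price is not None:
        filters.append({"range": {"price": {"lte": max_price}}})
    return {"query": {"bool": {"must": [{"match": {"name": text}}], "filter": filters}}}


if __name__ == "__main__":
    # Reproduces the request body shown above.
    print(json.dumps(product_search("wireless", max_price=50), indent=2))
```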
Aggregations: analytics at scale
Aggregations give you metrics, histograms, top-values — very handy for dashboards and analytics.
```
POST /products/_search
{
  "size": 0,
  "aggs": {
    "by_category": { "terms": { "field": "category" } },
    "price_stats": { "stats": { "field": "price" } }
  }
}
```
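To make the output of those two aggregations concrete, this Python sketch computes the same kind of result over an in-memory list: a terms aggregation returns buckets of key and doc_count ordered by count (ties broken by key here), and a stats aggregation returns count, min, max, avg and sum. The sample documents are made up, and the sketch ignores that real terms aggregations are computed per shard and merged, which can make counts approximate.

```python
from collections import Counter
from typing import Any, Dict, List


def terms_agg(docs: List[Dict[str, Any]], field: str, size: int = 10) -> List[Dict[str, Any]]:
    """Mimic a terms aggregation: doc count per distinct value, highest count first."""
    counts = Counter(d[field] for d in docs if field in d)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    return [{"key": k, "doc_count": c} for k, c in ordered]


def stats_agg(docs: List[Dict[str, Any]], field: str) -> Dict[str, Any]:
    """Mimic a stats aggregation over a numeric field."""
    values = [d[field] for d in docs if field in d]
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = sum(values)
    return {"count": len(values), "min": min(values), "max": max(values),
            "avg": total / len(values), "sum": total}


if __name__ == "__main__":
    sample = [  # hypothetical documents
        {"category": "electronics", "price": 29.99},
        {"category": "electronics", "price": 10.0},
        {"category": "books", "price": 5.0},
    ]
    print(terms_agg(sample, "category"))
    print(stats_agg(sample, "price"))
```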
Monitoring, health, and common commands
- Cluster health: GET /_cluster/health
- List indices: GET /_cat/indices?v
- Node stats: GET /_nodes/stats
Tip: Use Kibana for visual monitoring and a better developer UX for queries; Kibana integrates tightly with Elasticsearch.
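The status field of GET /_cluster/health is the first thing to read: green means all primary and replica shards are allocated, yellow means every primary is allocated but at least one replica is not, and red means at least one primary is unallocated. This Python sketch turns the response into a one-line summary; it assumes a local node on plain HTTP, and explain_health and fetch_health are hypothetical names.

```python
import json
import urllib.request

STATUS_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primaries are allocated, but at least one replica is not",
    "red": "at least one primary shard is unallocated; some data is unavailable",
}


def explain_health(body: str) -> str:
    """Summarize a GET /_cluster/health response."""
    h = json.loads(body)
    status = h["status"]
    meaning = STATUS_MEANING.get(status, "unknown status")
    return (f"{h['cluster_name']}: {status} ({meaning}), "
            f"{h['number_of_nodes']} node(s), {h['unassigned_shards']} unassigned shard(s)")


def fetch_health(base_url: str = "http://localhost:9200") -> str:
    """Fetch the raw health JSON. Assumes plain HTTP and no authentication."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=5) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    try:
        print(explain_health(fetch_health()))
    except OSError as err:
        print(f"could not reach Elasticsearch: {err}")
```

On a single-node development cluster, yellow is normal for indices with replicas: a replica is never placed on the same node as its primary, so it has nowhere to go.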
Scaling and architecture patterns
When moving to production, consider:
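- Shard sizing: many small shards waste heap and slow cluster-state updates; very large shards slow recovery and rebalancing. Elastic’s sizing guidance commonly cites roughly 10 GB to 50 GB per shard as a starting point.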
- Replica count: replicas provide fault tolerance and improve read throughput.
- Hot-warm architecture: keep recent data on fast storage, older data on cheaper nodes.
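A back-of-the-envelope shard plan is simple arithmetic: divide the expected primary data size by a target shard size, round up, then multiply by (1 + replicas) for the total shard count and storage. This Python sketch assumes a 30 GB target (inside the commonly cited range) and ignores growth, indexing overhead and disk watermarks; plan_shards is a hypothetical helper, so treat its output as a starting point to validate with real data.

```python
import math
from typing import Any, Dict


def plan_shards(primary_data_gb: float, target_shard_gb: float = 30.0,
                replicas: int = 1) -> Dict[str, Any]:
    """Estimate primary/total shard counts and storage for one index."""
    if primary_data_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    if replicas < 0:
        raise ValueError("replicas cannot be negative")
    primaries = max(1, math.ceil(primary_data_gb / target_shard_gb))
    return {
        "primaries": primaries,
        "total_shards": primaries * (1 + replicas),  # each replica copies every primary
        "approx_shard_gb": primary_data_gb / primaries,
        "total_storage_gb": primary_data_gb * (1 + replicas),
    }


if __name__ == "__main__":
    print(plan_shards(200))  # 200 GB of primary data, 30 GB target, 1 replica
```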
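Managed services (like AWS OpenSearch Service) can simplify operations, but note that Amazon OpenSearch Service runs OpenSearch, plus legacy Elasticsearch versions up to 7.10, rather than current Elasticsearch. See the AWS docs for details: Amazon OpenSearch Service.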
Comparison: Elasticsearch vs OpenSearch vs Solr
| Feature | Elasticsearch | OpenSearch | Solr |
|---|---|---|---|
| License | Elastic License 2.0, SSPL, or AGPLv3 (AGPL option since 2024) | Apache 2.0 (fork of Elasticsearch 7.10) | Apache 2.0 |
| Ecosystem | Rich: Kibana, Beats, Logstash | Compatible fork with OpenSearch Dashboards | Mature Apache project, also built on Lucene |
| Use cases | Search & analytics | Search & analytics | Search, enterprise search |
Security best practices
- Enable TLS for node-to-node and client communication.
- Use role-based access control and the built-in security features.
- Isolate management interfaces and restrict IP access.
Note: Production clusters should never expose port 9200 publicly without strong controls.
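On the client side, "TLS plus authentication" means verifying the server certificate (for a self-managed cluster, often against its own CA file) and sending credentials only over HTTPS. This Python standard-library sketch shows both for HTTP basic auth; the URL, username and password are placeholders, the helper names are hypothetical, and for applications an API key (sent as an Authorization: ApiKey header) is often preferable to a user password.

```python
import base64
import ssl
import urllib.request
from typing import Optional


def secure_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a request with HTTP basic auth, refusing plain-HTTP URLs."""
    if not url.startswith("https://"):
        raise ValueError("refusing to send credentials over plain HTTP")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Certificate- and hostname-verifying context, optionally trusting a cluster CA file."""
    return ssl.create_default_context(cafile=ca_file)


def fetch(req: urllib.request.Request, ctx: ssl.SSLContext) -> str:
    """Perform the request over verified TLS and return the body."""
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder credentials; a real call would also pass the cluster's CA file.
    req = secure_request("https://localhost:9200/_cluster/health", "elastic", "changeme")
    print(req.full_url, tls_context().verify_mode == ssl.CERT_REQUIRED)
```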
Common pitfalls and troubleshooting
- Wrong mappings: leads to poor relevance or slow queries.
- Too many shards: wastes memory and slows cluster state operations.
- Not monitoring JVM and GC: can cause node instability.
If something breaks, start with GET /_cluster/health and node logs; many problems show up there first.
Practical example: building a simple app search
Steps I usually follow:
- Define clear mappings for searchable fields.
- Index sample data and iterate relevance with match and multi_match.
- Add filters and aggregations for facets and analytics.
- Monitor query latency and tune analyzers (stopwords, synonyms).
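The steps above can be sketched as a single search body: a multi_match over the searchable fields (with a boost on name), filters for hard constraints, and aggregations that feed facets. This Python builder is a minimal sketch under assumptions: the description field is hypothetical and would need adding to the mapping, and the boost of 2 and the price ranges are arbitrary starting points to tune while you iterate on relevance.

```python
import json
from typing import Any, Dict, List, Optional


def app_search_body(text: str, category: Optional[str] = None, size: int = 10) -> Dict[str, Any]:
    """Search body for a simple product search page with facets."""
    filters: List[Dict[str, Any]] = [{"term": {"in_stock": True}}]
    if category is not None:
        filters.append({"term": {"category": category}})
    return {
        "size": size,
        "query": {
            "bool": {
                # name^2 boosts name matches; "description" is a hypothetical field.
                "must": [{"multi_match": {"query": text, "fields": ["name^2", "description"]}}],
                "filter": filters,
            }
        },
        "aggs": {
            "categories": {"terms": {"field": "category"}},
            "price_ranges": {"range": {"field": "price",
                                       "ranges": [{"to": 25}, {"from": 25, "to": 100}, {"from": 100}]}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(app_search_body("wireless mouse", category="electronics"), indent=2))
```

One design note: because the category filter here sits inside the query, the category facet only shows the selected category. If users should still see the other categories' counts, moving that filter into post_filter (applied after aggregations) is the usual pattern.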
Resources and docs
Authoritative docs and references you’ll want to bookmark:
- Official Elasticsearch Reference — API and configuration details.
- Elasticsearch on Wikipedia — history and overview.
- Amazon OpenSearch Service docs — managed alternatives and migration notes.
Next steps and learning path
If you’re just starting: index sample datasets, play with Kibana, and test queries. For intermediate users: explore ingest pipelines, custom analyzers, and scaling strategies. What I’ve noticed: small experiments reveal a lot about mapping and relevance.
Final thought: Elasticsearch is powerful but opinionated. Spend time on mappings and monitoring early — you’ll thank yourself later.
Frequently Asked Questions
Elasticsearch stores JSON documents in indices, splits each index into shards for distribution, and uses inverted indices to perform fast full-text searches across shards.
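An inverted index maps each token to the documents that contain it, so a search looks up a handful of tokens instead of scanning every document. This toy Python version shows the idea with a crude tokenizer; Lucene's real structure also stores term frequencies and positions and ranks hits with BM25, none of which is modeled here.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Crude analyzer: split into word tokens and lowercase them."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each token to the set of document ids containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return dict(index)


def search_or(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return ids of documents containing any query token (no scoring)."""
    hits: Set[str] = set()
    for token in tokenize(query):
        hits |= index.get(token, set())
    return sorted(hits)


if __name__ == "__main__":
    idx = build_inverted_index({"1": "Wireless Mouse", "2": "Wired Mouse", "3": "Wireless Keyboard"})
    print(idx)
    print(search_or(idx, "wireless"))
```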
An index is a logical namespace that holds documents with similar structure — think of it like a database table optimized for search and analytics.
Use Elasticsearch for full-text search, log and event analytics, and fast aggregations over large datasets. Use relational databases for transactional integrity and complex joins.
Scale by adding nodes, tuning shard counts, using replicas for read throughput, and implementing hot-warm node tiers for storage and performance optimization.
Since 8.0, security (TLS and authentication) is enabled by default, but you should still configure TLS, authentication, and role-based access control deliberately for production deployments.