Elasticsearch 5 - New features

Say hello to the new Elastic Stack (formerly known as the ELK stack). I’m going to focus on the new features and changes in Elasticsearch 5, since it is the product in the stack I use the most.

New features

Lucene 6.2

Elasticsearch 5.0.0 will use Lucene 6.2, a major version upgrade of Lucene.

This means big performance improvements!

Faster indexing (up to 2x faster than Elasticsearch 2.x)

In version 5.0, Elasticsearch gives shards that have heavy indexing a larger portion of the indexing buffer in the JVM heap.

New scripting language: Painless

Elasticsearch 5.0 includes “Painless”, a new scripting language designed to be fast and secure.

// Sum every value of the multi-valued field 'things'
int total = 0;
for (int i = 0; i < doc['things'].length; ++i) {
  total += doc['things'][i];
}
return total;
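A script like this can be used in a search request via script_fields. A minimal sketch, assuming a hypothetical my_index index with a numeric things field (in 5.x the script body goes under the inline key):

GET my_index/_search
{
  "script_fields": {
    "total_things": {
      "script": {
        "lang": "painless",
        "inline": "int total = 0; for (int i = 0; i < doc['things'].length; ++i) { total += doc['things'][i]; } return total;"
      }
    }
  }
}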

Pipeline / Ingest node

To pre-process documents before indexing, you define a pipeline that specifies a series of processors. Each processor transforms the document in some way. For example, you may have a pipeline that consists of one processor that removes a field from the document followed by another processor that renames a field.

Here’s an example of a simple pipeline that dynamically adds a field to every indexed document.

PUT _ingest/pipeline/my-pipeline-id
{
  "description" : "describe pipeline",
  "processors" : [
    {
      "set" : {
        "field" : "foo",
        "value" : "bar"
      }
    }
  ]
}
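To apply the pipeline, reference it with the pipeline query parameter when indexing. A sketch, using hypothetical index and type names:

PUT my-index/my-type/1?pipeline=my-pipeline-id
{
  "message": "hello"
}

The indexed document will then contain the foo field set to "bar" even though the request body never mentioned it.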

Index Shrinking

The shrink index API allows you to shrink an existing index into a new index with fewer primary shards.

POST my_source_index/_shrink/my_target_index
{
  "settings": {
    "index.number_of_replicas": 1,
    "index.number_of_shards": 1, 
    "index.codec": "best_compression" 
  },
  "aliases": {
    "my_search_indices": {}
  }
}
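Before shrinking, the source index must be made read-only and all of its shards must be relocated to a single node. A sketch of that preparation step, assuming a node named shrink_node_name:

PUT /my_source_index/_settings
{
  "settings": {
    "index.routing.allocation.require._name": "shrink_node_name",
    "index.blocks.write": true
  }
}

Also note that the target’s number_of_shards must be a factor of the source index’s shard count.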

Rollover index

The rollover index API rolls an alias over to a new index when the existing index is considered to be too large or too old.

PUT /logs-000001 
{
  "aliases": {
    "logs_write": {}
  }
}

# Add > 1000 documents to logs-000001

POST /logs_write/_rollover 
{
  "conditions": {
    "max_age":   "7d",
    "max_docs":  1000
  }
}

# Response
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "old_index": "logs-000001",
  "new_index": "logs-000002",
  "rolled_over": true, 
  "dry_run": false, 
  "conditions": { 
    "[max_age: 7d]": false,
    "[max_docs: 1000]": true
  }
}

Re-index from remote

The most basic form of _reindex simply copies documents from one index to another. The following copies documents from the twitter index into the new_twitter index:

POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}
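The “from remote” part is what’s new in 5.0: the source can point at an entirely different cluster. A sketch, assuming a remote cluster reachable at otherhost:9200 whose host has been whitelisted via the reindex.remote.whitelist setting in elasticsearch.yml:

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200"
    },
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}

This makes it possible to migrate data from an older cluster (e.g. Elasticsearch 1.x or 2.x) into a new 5.x cluster without any intermediate tooling.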

Reindex does not attempt to set up the destination index and does not copy the settings of the source index. You should set up the destination index before running a _reindex action, including mappings, shard counts, replicas, and so on.

A new completion suggester

The completion suggester has gained a lot of new features!

  • Near-real-time suggestions
  • Deleted documents are filtered out (previously they could still be suggested)
  • Multiple contexts for filtering
  • Typo tolerance and regular expressions via fuzzy and regex
  • Context boosting at query time
  • Returns the full document with each suggestion, so separate payloads are no longer needed

Query changes

  • New search_after parameter for efficient deep pagination
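Instead of the costly from/size deep pagination, search_after uses the sort values of the last hit of the previous page as a cursor. A sketch, assuming a hypothetical twitter index with date and title fields (the search_after array holds the sort values returned with the previous page’s last hit):

GET twitter/_search
{
  "size": 10,
  "query": {
    "match": { "title": "elasticsearch" }
  },
  "sort": [
    { "date": "asc" },
    { "_uid": "desc" }
  ],
  "search_after": [1463538857, "tweet#654323"]
}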

Mapping changes

Use either text or keyword instead of string for string fields in mappings:

  • text means the field is analyzed and used for full-text search
  • keyword means the field is not analyzed and can be used in aggregations, sorting, and exact matches.
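A sketch of a mapping using both new types, with hypothetical index, type, and field names:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title":  { "type": "text" },
        "status": { "type": "keyword" }
      }
    }
  }
}

Here title would be analyzed for full-text search, while status would be matched exactly and could be used in a terms aggregation.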
Published 3 Nov 2016