Remove a field from an Elasticsearch document

The previous answers didn't work for me.

I had to wrap the script in an object with the "inline" key:

POST /my_index/_update_by_query
{
  "script": {
    "inline": "ctx._source.remove(\"myfield\")"
  },
  "query": {
    "exists": { "field": "myfield" }
  }
}
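If you generate these requests from code, a small helper keeps the field name in the script and in the query in sync. The sketch below is a hypothetical Python helper (`remove_field_body` is not part of any Elasticsearch client) that only builds the JSON body. The `script_key` parameter is an assumption added for convenience: newer Elasticsearch versions expect `source` in place of `inline`, so pass whichever key your version accepts.

```python
import json


def remove_field_body(field: str, script_key: str = "inline") -> dict:
    """Build an _update_by_query body that drops one top-level field.

    Hypothetical helper: script_key is "inline" for the versions discussed
    here; newer Elasticsearch versions expect "source" instead.
    """
    return {
        "script": {
            # json.dumps wraps the name in double quotes, which become \" once serialized
            script_key: f"ctx._source.remove({json.dumps(field)})",
        },
        # Only touch documents that actually have the field
        "query": {"exists": {"field": field}},
    }


body = remove_field_body("myfield")
print(json.dumps(body, indent=2))
```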

Elasticsearch added update_by_query in 2.3. This experimental API lets you apply an update to every document that matches a query.

Internally, Elasticsearch uses scan/scroll to collect batches of documents and then updates them the way the bulk update interface does. This is faster than doing it manually with your own scan/scroll loop because it avoids the network and serialization overhead. Each record still has to be loaded into RAM, modified, and then written back.
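To make the batching concrete, here is a toy, in-memory model of that loop in Python. It is not how Elasticsearch is implemented; the document list layout, `batch_size`, and `update_by_query_model` are all hypothetical. It only illustrates the pattern described above: snapshot the matching documents, walk them in batches, load each `_source`, remove the field, and write the modified copy back.

```python
import copy
from typing import Dict, List


def update_by_query_model(index: List[Dict], field: str, batch_size: int = 2) -> int:
    """Toy model: remove `field` from every stored document that has it.

    `index` is a hypothetical list of {"_id": ..., "_source": {...}} dicts.
    Returns the number of documents rewritten.
    """
    # Collect matching positions up front, like a scroll snapshot
    matching = [i for i, doc in enumerate(index) if field in doc["_source"]]
    updated = 0
    for start in range(0, len(matching), batch_size):
        batch = matching[start:start + batch_size]
        for pos in batch:
            # Load the whole record, modify a copy, then write it back
            source = copy.deepcopy(index[pos]["_source"])
            source.pop(field, None)
            index[pos] = {**index[pos], "_source": source}
            updated += 1
    return updated


docs = [
    {"_id": "1", "_source": {"title": "a", "myfield": "x" * 1000}},
    {"_id": "2", "_source": {"title": "b"}},
    {"_id": "3", "_source": {"title": "c", "myfield": "y"}},
]
print(update_by_query_model(docs, "myfield"))
```

Even in this toy version, every matching record is copied in full before being rewritten, which is why the real job tends to be bound by CPU and memory work per document rather than by the query itself.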

Yesterday I removed a large field from my ES cluster. I saw sustained throughput of 10,000 records per second during the update_by_query, constrained by CPU rather than IO.

Consider setting conflicts=proceed if the cluster has other update traffic. Otherwise the whole job stops with a ConflictError as soon as one of the records is updated underneath one of the batches.
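A rough way to picture what `conflicts` controls: each document carries a version, the job remembers the version it read, and a write fails if the stored version has moved on in the meantime. The Python model below is a hypothetical illustration of that optimistic-concurrency check, not Elasticsearch code; `run_batch`, `VersionConflict`, and the data layout are made up for the example. With `conflicts="abort"` the first conflict stops the job, while `conflicts="proceed"` counts it and moves on.

```python
from typing import Dict, List, Tuple


class VersionConflict(Exception):
    """Raised when a document changed after the job read it (hypothetical)."""


def run_batch(store: Dict[str, Dict], snapshot: List[Tuple[str, int]],
              field: str, conflicts: str = "abort") -> Dict[str, int]:
    """Remove `field` from each (doc_id, version_read) pair in the snapshot.

    `store` maps doc_id -> {"version": int, "source": dict}.
    """
    stats = {"updated": 0, "version_conflicts": 0}
    for doc_id, version_read in snapshot:
        doc = store[doc_id]
        if doc["version"] != version_read:
            # Someone else updated this document underneath the batch
            if conflicts == "abort":
                raise VersionConflict(doc_id)
            stats["version_conflicts"] += 1
            continue
        doc["source"].pop(field, None)
        doc["version"] += 1  # a successful write bumps the version
        stats["updated"] += 1
    return stats


store = {
    "1": {"version": 1, "source": {"myfield": 1}},
    "2": {"version": 1, "source": {"myfield": 2}},
}
snapshot = [(doc_id, d["version"]) for doc_id, d in store.items()]
store["1"]["version"] = 2  # simulate a concurrent update to document 1
print(run_batch(store, snapshot, "myfield", conflicts="proceed"))
```

Note that with `proceed` the conflicting document keeps its field, so you may need to rerun the job once the other traffic settles.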

Similarly, setting wait_for_completion=false makes the update_by_query run as a background task that you can track through the tasks API. Otherwise the job terminates if the connection is closed.
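With `wait_for_completion=false`, the request returns right away with a task identifier, and in recent versions you can check progress with `GET _tasks/<task_id>`. The Python sketch below only builds the URLs and parses a sample response; the host, index, and type names are placeholders, the sample task ID is made up, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

BASE = "http://localhost:9200"  # placeholder host


def update_by_query_url(index: str, doc_type: str = "") -> str:
    """Build the _update_by_query URL with both flags set."""
    path = "/".join(p for p in (index, doc_type) if p)
    # dicts keep insertion order, so the parameters come out in this order
    params = urlencode({"wait_for_completion": "false", "conflicts": "proceed"})
    return f"{BASE}/{path}/_update_by_query?{params}"


def task_status_url(response_body: str) -> str:
    """Turn the immediate response into the URL used to poll the task."""
    task_id = json.loads(response_body)["task"]
    return f"{BASE}/_tasks/{task_id}"


print(update_by_query_url("INDEX", "TYPE"))
# Made-up response, shaped like {"task": "<node_id>:<task_number>"}
print(task_status_url('{"task": "node123:456"}'))
```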

url:

http://localhost:9200/INDEX/TYPE/_update_by_query?wait_for_completion=false&conflicts=proceed

POST body:

{
  "script": "ctx._source.remove('name_of_field')",
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "name_of_field"
          }
        }
      ]
    }
  }
}

As of Elasticsearch 1.4.3, inline Groovy scripting is disabled by default. For an inline script like this one to work, you need to enable it by adding script.inline: true to your config file.

Alternatively, save the Groovy code as a script file and use the "script": { "file": "scriptname", "lang": "groovy" } format.


What @backtrack said is true, but Elasticsearch offers a convenient way to do this that hides the internal complexity of the deletion. Use the update API:

curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "script" : "ctx._source.remove(\"name_of_field\")"
}'

You can find more details in the Elasticsearch update API documentation.

Note: As of Elasticsearch 6, you must include a Content-Type header:

-H 'Content-Type: application/json'
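The same single-document update can be prepared with Python's standard library. This sketch assumes a local node at `localhost:9200` and reuses the placeholder names from the curl example (`test`, `type1`, `1`, `name_of_field`); `build_remove_field_request` is a hypothetical helper. It builds a `urllib.request.Request` with the `Content-Type` header but does not send it; pass it to `urllib.request.urlopen` to actually run it. The `/test/type1/1/_update` path is the typed form used above; Elasticsearch 7 and later use `/test/_update/1` instead.

```python
import json
import urllib.request


def build_remove_field_request(index: str, doc_type: str, doc_id: str,
                               field: str) -> urllib.request.Request:
    """Prepare (but do not send) an update API call that drops one field."""
    url = f"http://localhost:9200/{index}/{doc_type}/{doc_id}/_update"
    body = {"script": f"ctx._source.remove({json.dumps(field)})"}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # Required as of Elasticsearch 6
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_remove_field_request("test", "type1", "1", "name_of_field")
# urllib normalizes header names, so the lookup key is "Content-type"
print(req.get_method(), req.full_url, req.get_header("Content-type"))
```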

You can use _update_by_query:

Example 1

index: my_index

field: user.email

POST my_index/_update_by_query?conflicts=proceed
{
    "script" : "ctx._source.user.remove('email')",
    "query" : {
        "exists": { "field": "user.email" }
    }
}

Example 2

index: my_index

field: total_items

POST my_index/_update_by_query?conflicts=proceed
{
    "script" : "ctx._source.remove('total_items')",
    "query" : {
        "exists": { "field": "total_items" }
    }
}
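Both examples follow the same pattern: the `exists` query uses the full dotted path, while the script navigates to the parent object and calls `remove` on the last segment. The hypothetical Python helpers below generate that pair from a dotted field name and apply the same removal to a plain dict so you can check the result locally. They assume every intermediate segment is an object (not an array) and that field names contain no quotes.

```python
from typing import Dict


def remove_field_request(path: str) -> Dict:
    """Build an _update_by_query body for a possibly nested field."""
    *parents, leaf = path.split(".")
    # "user.email" -> ctx._source.user ; "total_items" -> ctx._source
    target = ".".join(["ctx._source", *parents])
    return {
        "script": f"{target}.remove('{leaf}')",
        "query": {"exists": {"field": path}},
    }


def remove_locally(source: Dict, path: str) -> Dict:
    """Apply the same removal to a plain dict (mutates and returns it)."""
    *parents, leaf = path.split(".")
    node = source
    for key in parents:
        node = node[key]
    node.pop(leaf, None)
    return source


print(remove_field_request("user.email")["script"])
print(remove_field_request("total_items")["script"])
print(remove_locally({"user": {"email": "a@b.c", "name": "a"}}, "user.email"))
```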