Using asyncio with Elasticsearch

The elasticsearch package supports async/await with asyncio and aiohttp. You can either install aiohttp directly or use the [async] extra:

$ python -m pip install elasticsearch aiohttp

# - OR -

$ python -m pip install elasticsearch[async]

Getting Started with Async

After installation, all async API endpoints are available via AsyncElasticsearch and are used the same way as the sync APIs, just with an extra await:

import asyncio
from elasticsearch import AsyncElasticsearch

async def main():
    # Assumes a cluster reachable at localhost:9200
    client = AsyncElasticsearch("http://localhost:9200")
    try:
        resp = await client.search(
            index="documents",
            query={"match_all": {}},
            size=20,
        )
        print(resp)
    finally:
        # Close the underlying HTTP session before the client is discarded
        await client.close()

asyncio.run(main())

All APIs that are available under the sync client are also available under the async client.

ASGI Applications and Elastic APM

ASGI (Asynchronous Server Gateway Interface) is a way to serve Python web applications that uses async I/O to achieve better performance. Examples of ASGI frameworks include FastAPI, Django 3.0+, and Starlette. If you’re using one of these frameworks with Elasticsearch, use AsyncElasticsearch so that synchronous network calls don’t block the event loop.
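Because every AsyncElasticsearch call is awaitable, independent requests can run concurrently on one event loop instead of one after another. The following is a minimal sketch, assuming an already-constructed client and an index named `documents`; the helper name `fetch_documents` is hypothetical.

```python
import asyncio
from typing import Any, Iterable, List


async def fetch_documents(client: Any, doc_ids: Iterable[str]) -> List[Any]:
    """Fetch several documents concurrently (hypothetical helper).

    `client` is expected to be an AsyncElasticsearch instance. All get()
    calls are awaited together via asyncio.gather, so the event loop stays
    free to serve other work while the requests are in flight. Results are
    returned in the same order as doc_ids.
    """
    return await asyncio.gather(
        *(client.get(index="documents", id=doc_id) for doc_id in doc_ids)
    )
```

Inside an ASGI route handler this could be used as `docs = await fetch_documents(client, ["1", "2", "3"])`.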

Elastic APM also supports tracing of async Elasticsearch queries, just as it does for synchronous queries. For a pre-built example of configuring AsyncElasticsearch and APM tracing with FastAPI, a popular ASGI framework, see the examples/fastapi-apm directory.

Frequently Asked Questions

ValueError when initializing AsyncElasticsearch?

If you receive ValueError: You must have 'aiohttp' installed to use AiohttpHttpNode when initializing AsyncElasticsearch, make sure aiohttp is installed in your environment (check with $ python -m pip freeze | grep aiohttp). Without it, async support isn’t available.
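As an alternative to inspecting pip output, the check can be done from Python before constructing the client. This is a small sketch using only the standard library; the function name `has_aiohttp` is hypothetical.

```python
import importlib.util


def has_aiohttp() -> bool:
    """Return True if the aiohttp package can be imported in this environment."""
    return importlib.util.find_spec("aiohttp") is not None


if __name__ == "__main__":
    if not has_aiohttp():
        print("aiohttp is missing: run `python -m pip install elasticsearch[async]`")
```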

What about the elasticsearch-async package?

Previously, asyncio was supported separately via the elasticsearch-async package. Starting with v7.8, the elasticsearch-async package is deprecated in favor of AsyncElasticsearch, which is provided by the elasticsearch package itself.

Receiving ‘Unclosed client session / connector’ warning?

This warning is emitted by aiohttp when an open HTTP connection is garbage collected. You’ll typically run into it when your application shuts down. To resolve it, make sure close() is awaited before the AsyncElasticsearch instance is garbage collected.

For example, with FastAPI this might look like:

import os
from contextlib import asynccontextmanager

from fastapi import FastAPI
from elasticsearch import AsyncElasticsearch

ELASTICSEARCH_URL = os.environ["ELASTICSEARCH_URL"]
client = None

@asynccontextmanager
async def lifespan(app: FastAPI):
    global client
    client = AsyncElasticsearch(ELASTICSEARCH_URL)
    yield
    await client.close()

app = FastAPI(lifespan=lifespan)

@app.get("/")
async def main():
    return await client.info()

You can run this example by saving it to main.py and executing ELASTICSEARCH_URL=http://localhost:9200 uvicorn main:app.

Async Helpers

Async variants of all helpers are available in elasticsearch.helpers and are all prefixed with async_*. You’ll notice that these APIs are identical to the ones in the sync Helpers documentation.

All async helpers that accept an iterator or generator also accept async iterators and async generators.
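For instance, the actions passed to async_bulk() can come from an async generator that awaits its data source. The sketch below only builds the action dictionaries; the index name `mywords`, the use of each word as its document ID, and the names `generate_actions` and `collect` are assumptions for illustration.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, Iterable, List


async def generate_actions(words: Iterable[str]) -> AsyncIterator[Dict[str, Any]]:
    """Yield bulk actions from an async generator (illustrative sketch)."""
    for word in words:
        # Stand-in for awaiting a real async source such as a queue or DB cursor.
        await asyncio.sleep(0)
        yield {
            "_op_type": "index",  # one of: index, create, update, delete
            "_index": "mywords",
            "_id": word,
            "_source": {"word": word},
        }


async def collect(words: Iterable[str]) -> List[Dict[str, Any]]:
    # The async helpers consume the generator directly; collecting it into
    # a list here only shows what it produces.
    return [action async for action in generate_actions(words)]
```

With a real client, the generator would be passed straight through, for example `await async_bulk(client, generate_actions(["foo", "bar"]))`.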

Bulk and Streaming Bulk

async elasticsearch.helpers.async_bulk(client, actions, stats_only=False, ignore_status=(), *args, **kwargs)

Helper for the bulk() api that provides a more human-friendly interface: it consumes an iterator of actions and sends them to Elasticsearch in chunks. It returns a tuple with summary information: the number of successfully executed actions and either a list of errors or, if stats_only is set to True, the number of errors. Note that by default a BulkIndexError is raised when an error is encountered, so options like stats_only only apply when raise_on_error is set to False.

When errors are being collected, the original document data is included in each error dictionary, which can lead to high memory usage. If you need to process a lot of data and want to ignore or collect errors, consider using the async_streaming_bulk() helper, which yields errors as they occur instead of storing them in memory.

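Parameters:
  • client (AsyncElasticsearch) – instance of AsyncElasticsearch to use

  • actions (Iterable[bytes | str | Dict[str, Any]] | AsyncIterable[bytes | str | Dict[str, Any]]) – iterable or async iterable containing the actions to be executed

  • stats_only (bool) – if True, report only the number of failed operations instead of a list of error responses

  • ignore_status (int | Collection[int]) – list of HTTP status codes that you want to ignore

  • args (Any)

  • kwargs (Any)

Return type: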

Tuple[int, int | List[Any]]

Any additional keyword arguments will be passed to async_streaming_bulk() which is used to execute the operation, see async_streaming_bulk() for more accepted parameters.

import asyncio
from elasticsearch import AsyncElasticsearch
from elasticsearch.helpers import async_bulk

async def gendata():
    mywords = ['foo', 'bar', 'baz']
    for word in mywords:
        yield {
            "_index": "mywords",
            "_source": {"word": word},
        }

async def main():
    # Assumes a cluster reachable at localhost:9200
    client = AsyncElasticsearch("http://localhost:9200")
    try:
        # Returns (number of successful actions, list of errors)
        success, errors = await async_bulk(client, gendata())
        print(success, errors)
    finally:
        await client.close()

asyncio.run(main())

async elasticsearch.helpers.async_streaming_bulk(client, actions, chunk_size=500, max_chunk_bytes=104857600, raise_on_error=True, expand_action_callback=<function expand_action>, raise_on_exception=True, max_retries=0, initial_backoff=2, max_backoff=600, yield_ok=True, ignore_status=(), retry_on_status=(429, ), *args, **kwargs)

Streaming bulk consumes actions from the iterable passed in and yields results per action. For non-streaming use cases, use async_bulk(), a wrapper around streaming bulk that returns summary information about the bulk operation once the entire input is consumed and sent.

If you specify max_retries, documents rejected with a 429 status code will also be retried; use retry_on_status to configure which status codes trigger a retry. Before retrying, the helper waits (by awaiting asyncio.sleep, which pauses this coroutine without blocking the event loop) for initial_backoff seconds, then doubles the wait for every subsequent rejection of the same chunk, up to max_backoff seconds.
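The backoff rule above can be tabulated to see how long a repeatedly rejected chunk may wait. This is a sketch of the documented rule, not the helper's internal code; the function name `backoff_schedule` is hypothetical, and the defaults mirror the initial_backoff=2 and max_backoff=600 shown in the signature.

```python
from typing import List


def backoff_schedule(
    max_retries: int, initial_backoff: float = 2, max_backoff: float = 600
) -> List[float]:
    """Seconds waited before each retry of a rejected chunk.

    The first retry waits initial_backoff seconds; each later retry doubles
    the wait, never exceeding max_backoff.
    """
    return [
        min(max_backoff, initial_backoff * 2 ** attempt)
        for attempt in range(max_retries)
    ]


print(backoff_schedule(max_retries=10))
# [2, 4, 8, 16, 32, 64, 128, 256, 512, 600]
```

With the defaults, the cap is reached on the tenth retry, so a chunk rejected ten times in a row waits a little over 27 minutes in total.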

Parameters:
  • client (AsyncElasticsearch) – instance of AsyncElasticsearch to use

  • actions (Iterable[bytes | str | Dict[str, Any]] | AsyncIterable[bytes | str | Dict[str, Any]]) – iterable or async iterable containing the actions to be executed

  • chunk_size (int) – number of docs in one chunk sent to es (default: 500)

  • max_chunk_bytes (int) – the maximum size of the request in bytes (default: 100MB)

  • raise_on_error (bool) – raise BulkIndexError containing errors (as .errors) from the execution of the last chunk when some occur. By default we raise.

  • raise_on_exception (bool) – if False then don’t propagate exceptions from call to bulk and just report the items that failed as failed.

  • expand_action_callback (Callable[[bytes | str | Dict[str, Any]], Tuple[Dict[str, Any], None | bytes | Dict[str, Any]]]) – callback executed on each action passed in, should return a tuple containing the action line and the data line (None if data line should be omitted).

  • retry_on_status (int | Collection[int]) – HTTP status codes that will trigger a retry (if None is specified, only status 429 will retry).

  • max_retries (int) – maximum number of times a document will be retried when retry_on_status (defaulting to 429) is received, set to 0 (default) for no retries

  • initial_backoff (float) – number of seconds we should wait before the first retry. Subsequent retries wait initial_backoff * 2**retry_number seconds, capped at max_backoff

  • max_backoff (float) – maximum number of seconds a retry will wait

  • yield_ok (bool) – if set to False will skip successful documents in the output

  • ignore_status (int | Collection[int]) – list of HTTP status codes that you want to ignore

  • args (Any)

  • kwargs (Any)

Return type:

AsyncIterable[Tuple[bool, Dict[str, Any]]]

import asyncio
from elasticsearch import AsyncElasticsearch
from elasticsearch.helpers import async_streaming_bulk

async def gendata():
    mywords = ['foo', 'bar', 'baz']
    for word in mywords:
        yield {
            "_index": "mywords",
            "word": word,
        }

async def main():
    # Assumes a cluster reachable at localhost:9200
    client = AsyncElasticsearch("http://localhost:9200")
    try:
        # raise_on_error=False reports failed items instead of raising BulkIndexError
        async for ok, result in async_streaming_bulk(
            client, gendata(), raise_on_error=False
        ):
            # Each result looks like {"index": {...item response...}}
            action, result = result.popitem()
            if not ok:
                print("failed to %s document %s" % (action, result))
    finally:
        await client.close()

asyncio.run(main())

Scan

async elasticsearch.helpers.async_scan(client, query=None, scroll='5m', raise_on_error=True, preserve_order=False, size=1000, request_timeout=None, clear_scroll=True, scroll_kwargs=None, **kwargs)

Simple abstraction on top of the scroll() api: an async iterator that yields all hits as returned by the underlying scroll requests.

By default scan does not return results in any pre-determined order. To have a standard order in the returned documents (either by score or explicit sort definition) when scrolling, use preserve_order=True. This may be an expensive operation and will negate the performance benefits of using scan.

Parameters:
  • client (AsyncElasticsearch) – instance of AsyncElasticsearch to use

  • query (Any | None) – body for the search() api

  • scroll (str) – Specify how long a consistent view of the index should be maintained for scrolled search

  • raise_on_error (bool) – raises an exception (ScanError) if an error is encountered (some shards fail to execute). By default we raise.

  • preserve_order (bool) – don’t set the search_type to scan; this causes the scroll to paginate while preserving the order. Note that this can be an extremely expensive operation and can easily lead to unpredictable results, use with caution.

  • size (int) – size (per shard) of the batch sent at each iteration.

  • request_timeout (float | None) – explicit timeout for each call to scan

  • clear_scroll (bool) – explicitly calls delete on the scroll id via the clear scroll API at the end of the method on completion or error, defaults to true.

  • scroll_kwargs (MutableMapping[str, Any] | None) – additional kwargs to be passed to scroll()

  • kwargs (Any)

Return type:

AsyncIterable[Dict[str, Any]]

Any additional keyword arguments will be passed to the initial search() call:

async_scan(
    client,
    query={"query": {"match": {"title": "python"}}},
    index="orders-*"
)

import asyncio
from elasticsearch import AsyncElasticsearch
from elasticsearch.helpers import async_scan

async def main():
    # Assumes a cluster reachable at localhost:9200
    client = AsyncElasticsearch("http://localhost:9200")
    try:
        async for doc in async_scan(
            client=client,
            query={"query": {"match": {"title": "python"}}},
            index="orders-*",
        ):
            print(doc)
    finally:
        await client.close()

asyncio.run(main())

Reindex

async elasticsearch.helpers.async_reindex(client, source_index, target_index, query=None, target_client=None, chunk_size=500, scroll='5m', op_type=None, scan_kwargs={}, bulk_kwargs={})

Reindex all documents from one index that satisfy a given query to another, potentially (if target_client is specified) on a different cluster. If you don’t specify the query you will reindex all the documents.

Since 2.3 a reindex() api is available as part of elasticsearch itself. It is recommended to use the api instead of this helper wherever possible. The helper is here mostly for backwards compatibility and for situations where more flexibility is needed.
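When both indices live on the same cluster, the server-side Reindex API can be called directly through the client's reindex() method. A minimal sketch, assuming `client` is an AsyncElasticsearch instance; the helper name `reindex_on_server` is hypothetical.

```python
from typing import Any, Dict, Optional


async def reindex_on_server(
    client: Any,
    source_index: str,
    target_index: str,
    query: Optional[Dict[str, Any]] = None,
) -> Any:
    """Copy documents with the server-side Reindex API instead of the helper."""
    source: Dict[str, Any] = {"index": source_index}
    if query is not None:
        # Only documents matching this Query DSL clause are copied.
        source["query"] = query
    return await client.reindex(source=source, dest={"index": target_index})
```

Unlike async_reindex(), this runs entirely inside Elasticsearch, so no documents travel through the Python process.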

Note

This helper doesn’t transfer mappings, just the data.

Parameters:
  • client (AsyncElasticsearch) – instance of AsyncElasticsearch to use (for read if target_client is specified as well)

  • source_index (str | Collection[str]) – index (or list of indices) to read documents from

  • target_index (str) – name of the index in the target cluster to populate

  • query (Any) – body for the search() api

  • target_client (AsyncElasticsearch | None) – optional; if specified, it will be used for writing (thus enabling reindexing between clusters)

  • chunk_size (int) – number of docs in one chunk sent to es (default: 500)

  • scroll (str) – Specify how long a consistent view of the index should be maintained for scrolled search

  • op_type (str | None) – Explicit operation type. Defaults to ‘_index’. Must be set to ‘create’ when target_index is a data stream. If not specified, the helper auto-detects whether target_index is a data stream.

  • scan_kwargs (MutableMapping[str, Any]) – additional kwargs to be passed to async_scan()

  • bulk_kwargs (MutableMapping[str, Any]) – additional kwargs to be passed to async_bulk()

Return type:

Tuple[int, int | List[Any]]

API Reference

The API of AsyncElasticsearch is nearly identical to the API of Elasticsearch with the exception that every API call like search() is an async function and requires an await to properly return the response body.

AsyncElasticsearch

Note

To reference Elasticsearch APIs that are namespaced like .indices.create() refer to the sync API reference. These APIs are identical between sync and async.

class elasticsearch.AsyncElasticsearch

Elasticsearch low-level client. Provides a straightforward mapping from Python to Elasticsearch REST APIs.

The client instance has additional attributes that expose APIs in different namespaces such as async_search, indices, security, and more:

client = AsyncElasticsearch("http://localhost:9200")

# Get Document API
await client.get(index="*", id="1")

# Get Index API
await client.indices.get(index="*")

Transport options can be set on the client constructor or using the options() method:

# Set 'api_key' on the constructor
client = AsyncElasticsearch(
    "http://localhost:9200",
    api_key="api_key",
)
await client.search(...)

# Set 'api_key' per request
await client.options(api_key="api_key").search(...)

__init__(hosts=None, *, cloud_id=None, api_key=None, basic_auth=None, bearer_auth=None, opaque_id=None, headers=<DEFAULT>, connections_per_node=<DEFAULT>, http_compress=<DEFAULT>, verify_certs=<DEFAULT>, ca_certs=<DEFAULT>, client_cert=<DEFAULT>, client_key=<DEFAULT>, ssl_assert_hostname=<DEFAULT>, ssl_assert_fingerprint=<DEFAULT>, ssl_version=<DEFAULT>, ssl_context=<DEFAULT>, ssl_show_warn=<DEFAULT>, transport_class=<class 'elastic_transport.AsyncTransport'>, request_timeout=<DEFAULT>, node_class=<DEFAULT>, node_pool_class=<DEFAULT>, randomize_nodes_in_pool=<DEFAULT>, node_selector_class=<DEFAULT>, dead_node_backoff_factor=<DEFAULT>, max_dead_node_backoff=<DEFAULT>, serializer=None, serializers=<DEFAULT>, default_mimetype='application/json', max_retries=<DEFAULT>, retry_on_status=<DEFAULT>, retry_on_timeout=<DEFAULT>, sniff_on_start=<DEFAULT>, sniff_before_requests=<DEFAULT>, sniff_on_node_failure=<DEFAULT>, sniff_timeout=<DEFAULT>, min_delay_between_sniffing=<DEFAULT>, sniffed_node_callback=None, meta_header=<DEFAULT>, timeout=<DEFAULT>, randomize_hosts=<DEFAULT>, host_info_callback=None, sniffer_timeout=<DEFAULT>, sniff_on_connection_fail=<DEFAULT>, http_auth=<DEFAULT>, maxsize=<DEFAULT>, _transport=None)
Parameters:
Return type:

None

bulk(*, operations=None, body=None, index=None, error_trace=None, filter_path=None, human=None, list_executed_pipelines=None, pipeline=None, pretty=None, refresh=None, require_alias=None, require_data_stream=None, routing=None, source=None, source_excludes=None, source_includes=None, timeout=None, wait_for_active_shards=None)

Bulk index or delete documents. Perform multiple index, create, delete, and update actions in a single request. This reduces overhead and can greatly increase indexing speed.

If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:

  • To use the create action, you must have the create_doc, create, index, or write index privilege. Data streams support only the create action.
  • To use the index action, you must have the create, index, or write index privilege.
  • To use the delete action, you must have the delete or write index privilege.
  • To use the update action, you must have the index or write index privilege.
  • To automatically create a data stream or index with a bulk API request, you must have the auto_configure, create_index, or manage index privilege.
  • To make the result of a bulk operation visible to search using the refresh parameter, you must have the maintenance or manage index privilege.

Automatic data stream creation requires a matching index template with data stream enabled.

The actions are specified in the request body using a newline delimited JSON (NDJSON) structure:

action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n

The index and create actions expect a source on the next line and have the same semantics as the op_type parameter in the standard index API. A create action fails if a document with the same ID already exists in the target. An index action adds or replaces a document as necessary.

NOTE: Data streams support only the create action. To update or delete a document in a data stream, you must target the backing index containing the document.

An update action expects that the partial doc, upsert, and script and its options are specified on the next line.

A delete action does not expect a source on the next line and has the same semantics as the standard delete API.

NOTE: The final line of data must end with a newline character (\n). Each newline character may be preceded by a carriage return (\r). When sending NDJSON data to the _bulk endpoint, use a Content-Type header of application/json or application/x-ndjson. Because this format uses literal newline characters (\n) as delimiters, make sure that the JSON actions and sources are not pretty printed.
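Building the NDJSON body by hand is rarely necessary with the Python client, but the rules above are easy to check in a small sketch. It serializes (action, source) pairs with compact separators so no object spans multiple lines, omits the source line when it is None (as for delete), and ends every line, including the last, with \n. The function name `to_ndjson` is hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Optional, Tuple


def to_ndjson(
    pairs: Iterable[Tuple[Dict[str, Any], Optional[Dict[str, Any]]]]
) -> bytes:
    """Serialize (action, source) pairs into a _bulk request body."""
    lines = []
    for action, source in pairs:
        lines.append(json.dumps(action, separators=(",", ":")))
        if source is not None:
            lines.append(json.dumps(source, separators=(",", ":")))
    # Every line, including the final one, must end with a newline.
    return "".join(line + "\n" for line in lines).encode("utf-8")


body = to_ndjson([
    ({"index": {"_index": "test", "_id": "1"}}, {"field1": "value1"}),
    ({"delete": {"_index": "test", "_id": "2"}}, None),
])
print(body.decode(), end="")
```

The resulting bytes could be sent with a Content-Type of application/x-ndjson, as in the cURL example further down.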

If you provide a target in the request path, it is used for any actions that don't explicitly specify an _index argument.

A note on the format: the idea here is to make processing as fast as possible. As some of the actions are redirected to other shards on other nodes, only action_meta_data is parsed on the receiving node side.

Client libraries using this protocol should strive to do something similar on the client side and reduce buffering as much as possible.

There is no "correct" number of actions to perform in a single bulk request. Experiment with different settings to find the optimal size for your particular workload. Note that Elasticsearch limits the maximum size of an HTTP request to 100 MB by default, so clients must ensure that no request exceeds this size. It is not possible to index a single document that exceeds the size limit, so you must pre-process any such documents into smaller pieces before sending them to Elasticsearch. For instance, split documents into pages or chapters before indexing them, or store raw binary data in a system outside Elasticsearch and replace the raw data with a link to the external system in the documents that you send to Elasticsearch.
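One way to respect both a document count and a byte budget is to cut a new chunk whenever either limit would be exceeded, which is the idea behind the helpers' chunk_size and max_chunk_bytes parameters. This is a simplified sketch, not the helpers' actual implementation: it measures only each JSON-encoded source plus a newline and ignores action lines. The name `chunk_actions` is hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def chunk_actions(
    docs: Iterable[Dict[str, Any]],
    chunk_size: int = 500,
    max_chunk_bytes: int = 100 * 1024 * 1024,
) -> Iterator[List[Dict[str, Any]]]:
    """Group documents into chunks bounded by count and serialized size.

    A single document larger than max_chunk_bytes is still emitted on its
    own; Elasticsearch would reject it, so split such documents beforehand.
    """
    chunk: List[Dict[str, Any]] = []
    size = 0
    for doc in docs:
        doc_bytes = len(json.dumps(doc).encode("utf-8")) + 1  # +1 for "\n"
        if chunk and (len(chunk) >= chunk_size or size + doc_bytes > max_chunk_bytes):
            yield chunk
            chunk, size = [], 0
        chunk.append(doc)
        size += doc_bytes
    if chunk:
        yield chunk
```

Because it is a generator, only one chunk is held in memory at a time, in line with the advice to reduce buffering.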

Client support for bulk requests

Some of the officially supported clients provide helpers to assist with bulk requests and reindexing:

  • Go: Check out esutil.BulkIndexer
  • Perl: Check out Search::Elasticsearch::Client::5_0::Bulk and Search::Elasticsearch::Client::5_0::Scroll
  • Python: Check out elasticsearch.helpers.*
  • JavaScript: Check out client.helpers.*
  • .NET: Check out BulkAllObservable
  • PHP: Check out bulk indexing.

Submitting bulk requests with cURL

If you're providing text file input to curl, you must use the --data-binary flag instead of plain -d. The latter doesn't preserve newlines. For example:

$ cat requests
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
$ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@requests"; echo
{"took":7, "errors": false, "items":[{"index":{"_index":"test","_id":"1","_version":1,"result":"created","forced_refresh":false}}]}

Optimistic concurrency control

Each index and delete action within a bulk API call may include the if_seq_no and if_primary_term parameters in their respective action and meta data lines. The if_seq_no and if_primary_term parameters control how operations are run, based on the last modification to existing documents. See Optimistic concurrency control for more details.
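As an illustration, an operations list for bulk() can carry these parameters in the action line. The index name, document ID, and the values seq_no=5 and primary_term=1 below are hypothetical; in practice they come from the _seq_no and _primary_term returned by an earlier get or search. The function name `conditional_index_action` is also hypothetical.

```python
from typing import Any, Dict


def conditional_index_action(
    index: str, doc_id: str, seq_no: int, primary_term: int
) -> Dict[str, Any]:
    """Build an index action that succeeds only if the document is unchanged."""
    return {
        "index": {
            "_index": index,
            "_id": doc_id,
            "if_seq_no": seq_no,
            "if_primary_term": primary_term,
        }
    }


operations = [
    conditional_index_action("test", "1", seq_no=5, primary_term=1),
    {"field1": "value2"},  # source line for the action above
]
```

These operations could then be sent with `await client.bulk(operations=operations)`; an item whose document has changed since it was read fails with a version conflict, while the other items in the request are still processed.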

Versioning

Each bulk item can include the version value using the version field. It automatically follows the behavior of the index or delete operation based on the _version mapping. It also supports the version_type.

Routing

Each bulk item can include the routing value using the routing field. It automatically follows the behavior of the index or delete operation based on the _routing mapping.

NOTE: Data streams do not support custom routing unless they were created with the allow_custom_routing setting enabled in the template.

Wait for active shards

When making bulk calls, you can set the wait_for_active_shards parameter to require a minimum number of shard copies to be active before starting to process the bulk request.

Refresh

Control when the changes made by this request are visible to search.

NOTE: Only the shards that receive the bulk request will be affected by refresh. Imagine a _bulk?refresh=wait_for request with three documents in it that happen to be routed to different shards in an index with five shards. The request will only wait for those three shards to refresh. The other two shards that make up the index do not participate in the _bulk request at all.
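With the Python client, the action and source lines are passed as alternating dictionaries in operations, and refresh="wait_for" makes the call return only after the affected shards have refreshed. A minimal sketch, assuming `client` is an AsyncElasticsearch instance; the index name `test` and the function name `index_and_wait` are hypothetical.

```python
from typing import Any, Dict, List


async def index_and_wait(client: Any, docs: List[Dict[str, Any]]) -> Any:
    """Index documents in one bulk request and wait until they are searchable."""
    operations: List[Dict[str, Any]] = []
    for doc in docs:
        operations.append({"index": {"_index": "test"}})  # action line
        operations.append(doc)  # source line
    resp = await client.bulk(operations=operations, refresh="wait_for")
    if resp["errors"]:
        # The request as a whole succeeded, but at least one item failed.
        failed = [item for item in resp["items"] if "error" in next(iter(item.values()))]
        raise RuntimeError(f"{len(failed)} bulk item(s) failed")
    return resp
```

Checking resp["errors"] matters because a bulk request returns a successful response even when individual items fail.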

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docs-bulk.html

Parameters:
  • operations (Sequence[Mapping[str, Any]] | None)

  • index (str | None) – The name of the data stream, index, or index alias to perform bulk actions on.

  • list_executed_pipelines (bool | None) – If true, the response will include the ingest pipelines that were run for each index or create.

  • pipeline (str | None) – The pipeline identifier to use to preprocess incoming documents. If the index has a default ingest pipeline specified, setting the value to _none turns off the default ingest pipeline for this request. If a final pipeline is configured, it will always run regardless of the value of this parameter.

  • refresh (bool | str | Literal['false', 'true', 'wait_for'] | None) – If true, Elasticsearch refreshes the affected shards to make this operation visible to search. If wait_for, wait for a refresh to make this operation visible to search. If false, do nothing with refreshes. Valid values: true, false, wait_for.

  • require_alias (bool | None) – If true, the request’s actions must target an index alias.

  • require_data_stream (bool | None) – If true, the request’s actions must target a data stream (existing or to be created).

  • routing (str | None) – A custom value that is used to route operations to a specific shard.

  • source (bool | str | Sequence[str] | None) – Indicates whether to return the _source field (true or false) or contains a list of fields to return.

  • source_excludes (str | Sequence[str] | None) – A comma-separated list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in _source_includes query parameter. If the _source parameter is false, this parameter is ignored.

  • source_includes (str | Sequence[str] | None) – A comma-separated list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the _source_excludes query parameter. If the _source parameter is false, this parameter is ignored.

  • timeout (str | Literal[-1] | Literal[0] | None) – The period each action waits for the following operations: automatic index creation, dynamic mapping updates, and waiting for active shards. The default is 1m (one minute), which guarantees Elasticsearch waits for at least the timeout before failing. The actual wait time could be longer, particularly when multiple waits occur.

  • wait_for_active_shards (int | str | Literal['all', 'index-setting'] | None) – The number of shard copies that must be active before proceeding with the operation. Set to all or any positive integer up to the total number of shards in the index (number_of_replicas+1). The default is 1, which waits for each primary shard to be active.

  • body (Sequence[Mapping[str, Any]] | None)

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type:

ObjectApiResponse[Any]

clear_scroll(*, error_trace=None, filter_path=None, human=None, pretty=None, scroll_id=None, body=None)

Clear a scrolling search. Clear the search context and results for a scrolling search.

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/clear-scroll-api.html

Parameters:
Return type:

ObjectApiResponse[Any]

async close()

Closes the Transport and all internal connections

Return type:

None

close_point_in_time(*, id=None, error_trace=None, filter_path=None, human=None, pretty=None, body=None)

Close a point in time. A point in time must be opened explicitly before being used in search requests. The keep_alive parameter tells Elasticsearch how long it should persist. A point in time is automatically closed when the keep_alive period has elapsed. However, keeping points in time has a cost; close them as soon as they are no longer required for search requests.
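A common pattern is to open a point in time, search against it, and close it in a finally block so the search context is released even if the search fails. A minimal sketch, assuming `client` is an AsyncElasticsearch instance; keep_alive="1m" is an illustrative value and `search_with_pit` is a hypothetical name.

```python
from typing import Any, Dict


async def search_with_pit(client: Any, index: str, query: Dict[str, Any]) -> Any:
    """Open a point in time, search against it, and always close it."""
    pit = await client.open_point_in_time(index=index, keep_alive="1m")
    try:
        # A search that uses a point in time must not also specify an index.
        return await client.search(
            query=query,
            pit={"id": pit["id"], "keep_alive": "1m"},
        )
    finally:
        # Release the search context as soon as it is no longer needed.
        await client.close_point_in_time(id=pit["id"])
```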

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/point-in-time-api.html

Parameters:
Return type:

ObjectApiResponse[Any]

count(*, index=None, allow_no_indices=None, analyze_wildcard=None, analyzer=None, default_operator=None, df=None, error_trace=None, expand_wildcards=None, filter_path=None, human=None, ignore_throttled=None, ignore_unavailable=None, lenient=None, min_score=None, preference=None, pretty=None, q=None, query=None, routing=None, terminate_after=None, body=None)

Count search results. Get the number of documents matching a query.

The query can be provided either by using a simple query string as a parameter, or by defining Query DSL within the request body. The query is optional. When no query is provided, the API uses match_all to count all the documents.

The count API supports multi-target syntax. You can run a single count API search across multiple data streams and indices.

The operation is broadcast across all shards. For each shard ID group, a replica is chosen and the search is run against it. This means that replicas increase the scalability of the count.
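For example, counting documents with a Query DSL clause in the request body looks like the sketch below. It assumes `client` is an AsyncElasticsearch instance; the field name `title` and the function name `count_matches` are hypothetical.

```python
from typing import Any


async def count_matches(client: Any, index: str, title: str) -> int:
    """Count documents whose title matches `title` (hypothetical helper).

    Uses Query DSL via the query parameter. A similar query-string form
    would be client.count(index=index, q=f"title:{title}"), but q and
    query cannot be combined in one request.
    """
    resp = await client.count(index=index, query={"match": {"title": title}})
    return resp["count"]
```

Usage: `n = await count_matches(client, "orders-*", "python")`; omitting the query entirely would count all documents, as described above.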

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/search-count.html

Parameters:
  • index (str | Sequence[str] | None) – A comma-separated list of data streams, indices, and aliases to search. It supports wildcards (*). To search all data streams and indices, omit this parameter or use * or _all.

  • allow_no_indices (bool | None) – If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar.

  • analyze_wildcard (bool | None) – If true, wildcard and prefix queries are analyzed. This parameter can be used only when the q query string parameter is specified.

  • analyzer (str | None) – The analyzer to use for the query string. This parameter can be used only when the q query string parameter is specified.

  • default_operator (str | Literal['and', 'or'] | None) – The default operator for query string query: AND or OR. This parameter can be used only when the q query string parameter is specified.

  • df (str | None) – The field to use as a default when no field prefix is given in the query string. This parameter can be used only when the q query string parameter is specified.

  • expand_wildcards (Sequence[str | Literal['all', 'closed', 'hidden', 'none', 'open']] | str | Literal['all', 'closed', 'hidden', 'none', 'open'] | None) – The type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. It supports comma-separated values, such as open,hidden.

  • ignore_throttled (bool | None) – If true, concrete, expanded, or aliased indices are ignored when frozen.

  • ignore_unavailable (bool | None) – If false, the request returns an error if it targets a missing or closed index.

  • lenient (bool | None) – If true, format-based query failures (such as providing text to a numeric field) in the query string will be ignored. This parameter can be used only when the q query string parameter is specified.

  • min_score (float | None) – The minimum _score value that documents must have to be included in the result.

  • preference (str | None) – The node or shard the operation should be performed on. By default, it is random.

  • q (str | None) – The query in Lucene query string syntax. This parameter cannot be used with a request body.

  • query (Mapping[str, Any] | None) – Defines the search query using Query DSL. A request body query cannot be used with the q query string parameter.

  • routing (str | None) – A custom value used to route operations to a specific shard.

  • terminate_after (int | None) – The maximum number of documents to collect for each shard. If a query reaches this limit, Elasticsearch terminates the query early. Elasticsearch collects documents before sorting. IMPORTANT: Use with caution. Elasticsearch applies this parameter to each shard handling the request. When possible, let Elasticsearch perform early termination automatically. Avoid specifying this parameter for requests that target data streams with backing indices across multiple data tiers.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

create(*, index, id, document=None, body=None, error_trace=None, filter_path=None, human=None, pipeline=None, pretty=None, refresh=None, routing=None, timeout=None, version=None, version_type=None, wait_for_active_shards=None)

Create a new document in the index.

You can index a new JSON document with the /<target>/_doc/ or /<target>/_create/<_id> APIs. Using _create guarantees that the document is indexed only if it does not already exist. It returns a 409 response when a document with the same ID already exists in the index. To update an existing document, you must use the /<target>/_doc/ API.

If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:

  • To add a document using the PUT /<target>/_create/<_id> or POST /<target>/_create/<_id> request formats, you must have the create_doc, create, index, or write index privilege.
  • To automatically create a data stream or index with this API request, you must have the auto_configure, create_index, or manage index privilege.

Automatic data stream creation requires a matching index template with data stream enabled.

Automatically create data streams and indices

If the request's target doesn't exist and matches an index template with a data_stream definition, the index operation automatically creates the data stream.

If the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.

NOTE: Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, refer to index pattern documentation.

If no mapping exists, the index operation creates a dynamic mapping. By default, new fields and objects are automatically added to the mapping if needed.

Automatic index creation is controlled by the action.auto_create_index setting. If it is true, any index can be created automatically. You can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to false to turn off automatic index creation entirely. Specify a comma-separated list of patterns you want to allow or prefix each pattern with + or - to indicate whether it should be allowed or blocked. When a list is specified, the default behaviour is to disallow.

NOTE: The action.auto_create_index setting affects the automatic creation of indices only. It does not affect the creation of data streams.
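The action.auto_create_index setting can be changed through the cluster settings API. The sketch below is hypothetical in its helper names and pattern values; it assumes client is an AsyncElasticsearch instance and uses cluster.put_settings with a persistent setting so the rule survives a full cluster restart.

```python
from typing import Any, Sequence


def auto_create_index_value(patterns: Sequence[str]) -> str:
    """Join patterns into the comma-separated value of action.auto_create_index.

    Prefix a pattern with + to allow or - to block matching index names.
    """
    for pattern in patterns:
        if not pattern or "," in pattern:
            raise ValueError(f"invalid index pattern: {pattern!r}")
    return ",".join(patterns)


async def set_auto_create_index(client: Any, patterns: Sequence[str]) -> Any:
    """Persistently update the cluster-wide automatic index creation rule."""
    return await client.cluster.put_settings(
        persistent={"action.auto_create_index": auto_create_index_value(patterns)}
    )
```

Because a list disallows everything it does not explicitly allow, an illustrative call such as await set_auto_create_index(client, ["+logs-*"]) limits automatic creation to indices matching logs-*. Data stream creation is unaffected either way.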

Routing

By default, shard placement — or routing — is controlled by using a hash of the document's ID value. For more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing parameter.

When setting up explicit mapping, you can also use the _routing field to direct the index operation to extract the routing value from the document itself. This does come at the (very minimal) cost of an additional document parsing pass. If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.

NOTE: Data streams do not support custom routing unless they were created with the allow_custom_routing setting enabled in the template.

Distributed

The index operation is directed to the primary shard based on its route and performed on the actual node containing this shard. After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.

Active shards

To improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation. If the requisite number of active shard copies is not available, then the write operation must wait and retry until either the requisite shard copies have started or a timeout occurs. By default, write operations only wait for the primary shards to be active before proceeding (that is to say wait_for_active_shards is 1). This default can be overridden in the index settings dynamically by setting index.write.wait_for_active_shards. To alter this behavior per operation, use the wait_for_active_shards request parameter.

Valid values are all or any positive integer up to the total number of configured copies per shard in the index (which is number_of_replicas+1). Specifying a negative value or a number greater than the number of shard copies will throw an error.

For example, suppose you have a cluster of three nodes, A, B, and C, and you create an index named index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes). If you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding. This means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data. If wait_for_active_shards is set on the request to 3 (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding. This requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard. However, if you set wait_for_active_shards to all (or to 4, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index. The operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.
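The validity rule and the three-node example above can be checked with a small pure-Python model. This is an illustrative sketch only; the function names are hypothetical, it follows the rule as stated here (all, or a positive integer up to number_of_replicas+1), and it does not model other string values such as index-setting.

```python
from typing import Union


def required_active_copies(wait_for_active_shards: Union[int, str],
                           number_of_replicas: int) -> int:
    """Translate a wait_for_active_shards value into a number of shard copies."""
    total_copies = number_of_replicas + 1
    if wait_for_active_shards == "all":
        return total_copies
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(wait_for_active_shards, int) and not isinstance(wait_for_active_shards, bool):
        if 1 <= wait_for_active_shards <= total_copies:
            return wait_for_active_shards
    # Negative values, values above total_copies and other strings are
    # rejected by this sketch.
    raise ValueError(
        f"wait_for_active_shards must be 'all' or 1..{total_copies}, "
        f"got {wait_for_active_shards!r}"
    )


def write_can_proceed(wait_for_active_shards: Union[int, str],
                      number_of_replicas: int, active_copies: int) -> bool:
    """True if enough shard copies are active for the write to start."""
    return active_copies >= required_active_copies(wait_for_active_shards, number_of_replicas)
```

With number_of_replicas=3 and three active copies, write_can_proceed(3, 3, 3) is True while write_can_proceed("all", 3, 3) is False, matching the example.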

It is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts. After the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary. The _shards section of the API response reveals the number of shard copies on which replication succeeded and failed.
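Since replication can still fail on some copies after the write starts, it can help to inspect the _shards section of each response. A minimal sketch with a hypothetical helper name; it accepts either an ObjectApiResponse (via its body attribute) or a plain dict.

```python
from typing import Any


def replication_report(response: Any) -> str:
    """Summarise the _shards section of an index or create response."""
    # ObjectApiResponse exposes the parsed JSON as .body; dicts are used as-is.
    body = getattr(response, "body", response)
    shards = body["_shards"]
    total, successful, failed = shards["total"], shards["successful"], shards["failed"]
    # Flag the write when any shard copy reported a replication failure.
    status = "partial" if failed else "ok"
    return f"{status}: {successful}/{total} shard copies succeeded, {failed} failed"
```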

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docs-index_.html

Parameters:
  • index (str) – The name of the data stream or index to target. If the target doesn’t exist and matches the name or wildcard (*) pattern of an index template with a data_stream definition, this request creates the data stream. If the target doesn’t exist and doesn’t match a data stream template, this request creates the index.

  • id (str) – A unique identifier for the document. To automatically generate a document ID, use the POST /<target>/_doc/ request format.

  • document (Mapping[str, Any] | None)

  • pipeline (str | None) – The ID of the pipeline to use to preprocess incoming documents. If the index has a default ingest pipeline specified, setting the value to _none turns off the default ingest pipeline for this request. If a final pipeline is configured, it will always run regardless of the value of this parameter.

  • refresh (bool | str | Literal['false', 'true', 'wait_for'] | None) – If true, Elasticsearch refreshes the affected shards to make this operation visible to search. If wait_for, it waits for a refresh to make this operation visible to search. If false, it does nothing with refreshes.

  • routing (str | None) – A custom value that is used to route operations to a specific shard.

  • timeout (str | Literal[-1] | ~typing.Literal[0] | None) – The period the request waits for the following operations: automatic index creation, dynamic mapping updates, waiting for active shards. Elasticsearch waits for at least the specified timeout period before failing. The actual wait time could be longer, particularly when multiple waits occur. This parameter is useful for situations where the primary shard assigned to perform the operation might not be available when the operation runs. Some reasons for this might be that the primary shard is currently recovering from a gateway or undergoing relocation. By default, the operation will wait on the primary shard to become available for at least 1 minute before failing and responding with an error.

  • version (int | None) – The explicit version number for concurrency control. It must be a non-negative long number.

  • version_type (str | Literal['external', 'external_gte', 'force', 'internal'] | None) – The version type.

  • wait_for_active_shards (int | str | Literal['all', 'index-setting'] | None) – The number of shard copies that must be active before proceeding with the operation. You can set it to all or any positive integer up to the total number of copies of each shard in the index (number_of_replicas+1). The default value of 1 means it waits for each primary shard to be active.

  • body (Mapping[str, Any] | None)

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type:

ObjectApiResponse[Any]

delete(*, index, id, error_trace=None, filter_path=None, human=None, if_primary_term=None, if_seq_no=None, pretty=None, refresh=None, routing=None, timeout=None, version=None, version_type=None, wait_for_active_shards=None)

Delete a document.

Remove a JSON document from the specified index.

NOTE: You cannot send deletion requests directly to a data stream. To delete a document in a data stream, you must target the backing index containing the document.

Optimistic concurrency control

Delete operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the if_seq_no and if_primary_term parameters. If a mismatch is detected, the operation will result in a VersionConflictException and a status code of 409.
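A typical read-then-delete flow with the async client might look like the sketch below. It assumes client is an AsyncElasticsearch instance; delete_if_unchanged and the should_delete predicate are hypothetical. The _seq_no and _primary_term values come from the get response, so a concurrent change between the two calls makes Elasticsearch answer 409, which the client raises as an error.

```python
from typing import Any, Callable, Mapping, Optional


async def delete_if_unchanged(client: Any, index: str, doc_id: str,
                              should_delete: Callable[[Mapping[str, Any]], bool]) -> Optional[Any]:
    """Delete a document only if it still is the version we inspected.

    Returns the delete response, or None if should_delete rejected the document.
    """
    doc = await client.get(index=index, id=doc_id)
    if not should_delete(doc["_source"]):
        return None
    # _seq_no and _primary_term identify the last modification we saw.
    return await client.delete(
        index=index,
        id=doc_id,
        if_seq_no=doc["_seq_no"],
        if_primary_term=doc["_primary_term"],
    )
```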

Versioning

Each document indexed is versioned. When deleting a document, the version can be specified to make sure the relevant document you are trying to delete is actually being deleted and it has not changed in the meantime. Every write operation run on a document, deletes included, causes its version to be incremented. The version number of a deleted document remains available for a short time after deletion to allow for control of concurrent operations. The length of time for which a deleted document's version remains available is determined by the index.gc_deletes index setting.

Routing

If routing is used during indexing, the routing value also needs to be specified to delete a document.

If the _routing mapping is set to required and no routing value is specified, the delete API throws a RoutingMissingException and rejects the request.

For example:

DELETE /my-index-000001/_doc/1?routing=shard-1

This request deletes the document with ID 1, routing the request with the value shard-1. The document is not deleted if the correct routing is not specified.
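With the async client, the REST request above corresponds to passing routing as a keyword argument. A minimal sketch, assuming client is an AsyncElasticsearch instance; the wrapper name delete_routed is hypothetical.

```python
from typing import Any


async def delete_routed(client: Any, index: str, doc_id: str, routing: str) -> Any:
    """Delete a document that was indexed with a custom routing value.

    Mirrors DELETE /<index>/_doc/<doc_id>?routing=<routing>. The routing
    value must be the one used at index time, otherwise the document is
    not deleted.
    """
    if not routing:
        raise ValueError("routing must be the value used when the document was indexed")
    return await client.delete(index=index, id=doc_id, routing=routing)
```

For example, await delete_routed(client, "my-index-000001", "1", "shard-1") issues the same request as the example.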

Distributed

The delete operation gets hashed into a specific shard ID. It then gets redirected into the primary shard within that ID group and replicated (if needed) to shard replicas within that ID group.

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docs-delete.html

Parameters:
  • index (str) – The name of the target index.

  • id (str) – A unique identifier for the document.

  • if_primary_term (int | None) – Only perform the operation if the document has this primary term.

  • if_seq_no (int | None) – Only perform the operation if the document has this sequence number.

  • refresh (bool | str | Literal['false', 'true', 'wait_for'] | None) – If true, Elasticsearch refreshes the affected shards to make this operation visible to search. If wait_for, it waits for a refresh to make this operation visible to search. If false, it does nothing with refreshes.

  • routing (str | None) – A custom value used to route operations to a specific shard.

  • timeout (str | Literal[-1] | ~typing.Literal[0] | None) – The period to wait for active shards. This parameter is useful for situations where the primary shard assigned to perform the delete operation might not be available when the delete operation runs. Some reasons for this might be that the primary shard is currently recovering from a store or undergoing relocation. By default, the delete operation will wait on the primary shard to become available for up to 1 minute before failing and responding with an error.

  • version (int | None) – An explicit version number for concurrency control. It must match the current version of the document for the request to succeed.

  • version_type (str | Literal['external', 'external_gte', 'force', 'internal'] | None) – The version type.

  • wait_for_active_shards (int | str | Literal['all', 'index-setting'] | None) – The minimum number of shard copies that must be active before proceeding with the operation. You can set it to all or any positive integer up to the total number of copies of each shard in the index (number_of_replicas+1). The default value of 1 means it waits for each primary shard to be active.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type:

ObjectApiResponse[Any]

delete_by_query(*, index, allow_no_indices=None, analyze_wildcard=None, analyzer=None, conflicts=None, default_operator=None, df=None, error_trace=None, expand_wildcards=None, filter_path=None, from_=None, human=None, ignore_unavailable=None, lenient=None, max_docs=None, preference=None, pretty=None, q=None, query=None, refresh=None, request_cache=None, requests_per_second=None, routing=None, scroll=None, scroll_size=None, search_timeout=None, search_type=None, slice=None, slices=None, sort=None, stats=None, terminate_after=None, timeout=None, version=None, wait_for_active_shards=None, wait_for_completion=None, body=None)

Delete documents.

Deletes documents that match the specified query.

If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or alias:

  • read
  • delete or write

You can specify the query criteria in the request URI or the request body using the same syntax as the search API. When you submit a delete by query request, Elasticsearch gets a snapshot of the data stream or index when it begins processing the request and deletes matching documents using internal versioning. If a document changes between the time that the snapshot is taken and the delete operation is processed, it results in a version conflict and the delete operation fails.

NOTE: Documents with a version equal to 0 cannot be deleted using delete by query because internal versioning does not support 0 as a valid version number.

While processing a delete by query request, Elasticsearch performs multiple search requests sequentially to find all of the matching documents to delete. A bulk delete request is performed for each batch of matching documents. If a search or bulk request is rejected, the requests are retried up to 10 times, with exponential back off. If the maximum retry limit is reached, processing halts and all failed requests are returned in the response. Any delete requests that completed successfully still stick; they are not rolled back.

You can opt to count version conflicts instead of halting and returning by setting conflicts to proceed. Note that if you opt to count version conflicts the operation could attempt to delete more documents from the source than max_docs until it has successfully deleted max_docs documents, or it has gone through every document in the source query.
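A sketch of counting conflicts instead of halting, assuming client is an AsyncElasticsearch instance; the helper name delete_matching is hypothetical. It reads the deleted and version_conflicts counters from the response.

```python
from typing import Any, Mapping, Optional, Tuple


async def delete_matching(client: Any, index: str, query: Mapping[str, Any],
                          max_docs: Optional[int] = None) -> Tuple[int, int]:
    """Run delete by query, counting version conflicts instead of aborting.

    Returns (deleted, version_conflicts) from the response.
    """
    resp = await client.delete_by_query(
        index=index,
        query=query,
        conflicts="proceed",  # count conflicts rather than halting
        max_docs=max_docs,    # the client omits None, meaning no limit
    )
    return resp["deleted"], resp["version_conflicts"]
```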

Throttling delete requests

To control the rate at which delete by query issues batches of delete operations, you can set requests_per_second to any positive decimal number. This pads each batch with a wait time to throttle the rate. Set requests_per_second to -1 to disable throttling.

Throttling uses a wait time between batches so that the internal scroll requests can be given a timeout that takes the request padding into account. The padding time is the difference between the batch size divided by the requests_per_second and the time spent writing. By default the batch size is 1000, so if requests_per_second is set to 500:

target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
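The padding arithmetic above can be reproduced in a few lines. This is an illustrative model only, not client code; clamping the result at zero when writing took longer than the target time is an assumption of this sketch.

```python
def throttle_wait_time(batch_size: int, requests_per_second: float,
                       write_time: float) -> float:
    """Seconds of padding after a batch: batch_size / rps minus write time."""
    if requests_per_second == -1:
        return 0.0  # -1 disables throttling
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive or -1")
    target_time = batch_size / requests_per_second
    # Assumption: no negative wait when the write already took longer.
    return max(0.0, target_time - write_time)
```

throttle_wait_time(1000, 500, 0.5) gives 1.5 seconds, matching the worked example.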

Since the batch is issued as a single _bulk request, large batch sizes cause Elasticsearch to create many requests and wait before starting the next set. This is "bursty" instead of "smooth".

Slicing

Delete by query supports sliced scroll to parallelize the delete process. This can improve efficiency and provide a convenient way to break the request down into smaller parts.

Setting slices to auto lets Elasticsearch choose the number of slices to use. This setting will use one slice per shard, up to a certain limit. If there are multiple source data streams or indices, it will choose the number of slices based on the index or backing index with the smallest number of shards. Adding slices to the delete by query operation creates sub-requests which means it has some quirks:

  • You can see these requests in the tasks APIs. These sub-requests are "child" tasks of the task for the request with slices.
  • Fetching the status of the task for the request with slices only contains the status of completed slices.
  • These sub-requests are individually addressable for things like cancellation and rethrottling.
  • Rethrottling the request with slices will rethrottle the unfinished sub-request proportionally.
  • Canceling the request with slices will cancel each sub-request.
  • Due to the nature of slices each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.
  • Parameters like requests_per_second and max_docs on a request with slices are distributed proportionally to each sub-request. Combine that with the earlier point about distribution being uneven and you should conclude that using max_docs with slices might not result in exactly max_docs documents being deleted.
  • Each sub-request gets a slightly different snapshot of the source data stream or index though these are all taken at approximately the same time.
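Because sliced requests run as tasks, a common pattern is to start the operation in the background and poll the tasks API. A minimal sketch, assuming client is an AsyncElasticsearch instance; the helper names and the polling interval are hypothetical. With wait_for_completion=False the response carries a task ID, and tasks.get reports completed once the work is done.

```python
import asyncio
from typing import Any, Mapping


async def start_sliced_delete(client: Any, index: str, query: Mapping[str, Any]) -> str:
    """Launch delete by query in the background with automatic slicing."""
    resp = await client.delete_by_query(
        index=index, query=query, slices="auto", wait_for_completion=False
    )
    return resp["task"]


async def wait_for_task(client: Any, task_id: str, poll_interval: float = 5.0) -> Any:
    """Poll the tasks API until the task reports completed=True."""
    while True:
        status = await client.tasks.get(task_id=task_id)
        if status["completed"]:
            return status
        await asyncio.sleep(poll_interval)
```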

If you're slicing manually or otherwise tuning automatic slicing, keep in mind that:

  • Query performance is most efficient when the number of slices is equal to the number of shards in the index or backing index. If that number is large (for example, 500), choose a lower number as too many slices hurts performance. Setting slices higher than the number of shards generally does not improve efficiency and adds overhead.
  • Delete performance scales linearly across available resources with the number of slices.

Whether query or delete performance dominates the runtime depends on the documents being deleted and cluster resources.

Cancel a delete by query operation

Any delete by query can be canceled using the task cancel API. For example:

POST _tasks/r1A2WoRbTwKZ516z6NEs5A:36619/_cancel

The task ID can be found by using the get tasks API.

Cancellation should happen quickly but might take a few seconds. The get task status API will continue to list the delete by query task until this task checks that it has been cancelled and terminates itself.
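The same lookup and cancellation can be done from the async client. A sketch, assuming client is an AsyncElasticsearch instance and the default node grouping of the tasks list response; the helper names are hypothetical, and the action filter */delete/byquery selects delete by query tasks.

```python
from typing import Any, List


def task_ids(list_response: Any) -> List[str]:
    """Collect task IDs ("node:id") from a tasks list response grouped by node."""
    body = getattr(list_response, "body", list_response)
    ids: List[str] = []
    for node in body.get("nodes", {}).values():
        ids.extend(node.get("tasks", {}).keys())
    return ids


async def cancel_delete_by_query_tasks(client: Any) -> List[str]:
    """Find running delete by query tasks and cancel each one."""
    resp = await client.tasks.list(actions="*/delete/byquery", detailed=True)
    ids = task_ids(resp)
    for task_id in ids:
        await client.tasks.cancel(task_id=task_id)
    return ids
```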

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docs-delete-by-query.html

Parameters:
  • index (str | Sequence[str]) – A comma-separated list of data streams, indices, and aliases to search. It supports wildcards (*). To search all data streams or indices, omit this parameter or use * or _all.

  • allow_no_indices (bool | None) – If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar.

  • analyze_wildcard (bool | None) – If true, wildcard and prefix queries are analyzed. This parameter can be used only when the q query string parameter is specified.

  • analyzer (str | None) – Analyzer to use for the query string. This parameter can be used only when the q query string parameter is specified.

  • conflicts (str | Literal['abort', 'proceed'] | None) – What to do if delete by query hits version conflicts: abort or proceed.

  • default_operator (str | Literal['and', 'or'] | None) – The default operator for query string query: AND or OR. This parameter can be used only when the q query string parameter is specified.

  • df (str | None) – The field to use as default where no field prefix is given in the query string. This parameter can be used only when the q query string parameter is specified.

  • expand_wildcards (Sequence[str | Literal['all', 'closed', 'hidden', 'none', 'open']] | str | ~typing.Literal['all', 'closed', 'hidden', 'none', 'open'] | None) – The type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. It supports comma-separated values, such as open,hidden.

  • from – Starting offset (default: 0). In Python, pass it as from_ because from is a reserved word.

  • ignore_unavailable (bool | None) – If false, the request returns an error if it targets a missing or closed index.

  • lenient (bool | None) – If true, format-based query failures (such as providing text to a numeric field) in the query string will be ignored. This parameter can be used only when the q query string parameter is specified.

  • max_docs (int | None) – The maximum number of documents to delete.

  • preference (str | None) – The node or shard the operation should be performed on. It is random by default.

  • q (str | None) – A query in the Lucene query string syntax.

  • query (Mapping[str, Any] | None) – The documents to delete specified with Query DSL.

  • refresh (bool | None) – If true, Elasticsearch refreshes all shards involved in the delete by query after the request completes. This is different from the delete API’s refresh parameter, which causes just the shard that received the delete request to be refreshed. Unlike the delete API, it does not support wait_for.

  • request_cache (bool | None) – If true, the request cache is used for this request. Defaults to the index-level setting.

  • requests_per_second (float | None) – The throttle for this request in sub-requests per second.

  • routing (str | None) – A custom value used to route operations to a specific shard.

  • scroll (str | Literal[-1] | ~typing.Literal[0] | None) – The period to retain the search context for scrolling.

  • scroll_size (int | None) – The size of the scroll request that powers the operation.

  • search_timeout (str | Literal[-1] | ~typing.Literal[0] | None) – The explicit timeout for each search request. It defaults to no timeout.

  • search_type (str | Literal['dfs_query_then_fetch', 'query_then_fetch'] | None) – The type of the search operation. Available options include query_then_fetch and dfs_query_then_fetch.

  • slice (Mapping[str, Any] | None) – Slice the request manually using the provided slice ID and total number of slices.

  • slices (int | str | Literal['auto'] | None) – The number of slices this task should be divided into.

  • sort (Sequence[str] | None) – A comma-separated list of <field>:<direction> pairs.

  • stats (Sequence[str] | None) – The specific tag of the request for logging and statistical purposes.

  • terminate_after (int | None) – The maximum number of documents to collect for each shard. If a query reaches this limit, Elasticsearch terminates the query early. Elasticsearch collects documents before sorting. Use with caution. Elasticsearch applies this parameter to each shard handling the request. When possible, let Elasticsearch perform early termination automatically. Avoid specifying this parameter for requests that target data streams with backing indices across multiple data tiers.

  • timeout (str | Literal[-1] | ~typing.Literal[0] | None) – The period each deletion request waits for active shards.

  • version (bool | None) – If true, returns the document version as part of a hit.

  • wait_for_active_shards (int | str | Literal['all', 'index-setting'] | None) – The number of shard copies that must be active before proceeding with the operation. Set to all or any positive integer up to the total number of copies of each shard in the index (number_of_replicas+1). The timeout value controls how long each write request waits for unavailable shards to become available.

  • wait_for_completion (bool | None) – If true, the request blocks until the operation is complete. If false, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task. Elasticsearch creates a record of this task as a document at .tasks/task/${taskId}. When you are done with a task, you should delete the task document so Elasticsearch can reclaim the space.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • from_ (int | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

delete_by_query_rethrottle(*, task_id, error_trace=None, filter_path=None, human=None, pretty=None, requests_per_second=None)

Throttle a delete by query operation.

Change the number of requests per second for a particular delete by query operation. Rethrottling that speeds up the query takes effect immediately, but rethrottling that slows down the query takes effect after completing the current batch to prevent scroll timeouts.
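The timing rule above can be expressed as a small predicate next to the rethrottle call. A sketch with hypothetical helper names, assuming client is an AsyncElasticsearch instance; treating -1 (no throttling) as an infinite rate is this sketch's modelling choice.

```python
import math
from typing import Any


def takes_effect_immediately(current_rps: float, new_rps: float) -> bool:
    """Speeding up applies at once; slowing down waits for the current batch."""
    def rate(rps: float) -> float:
        # -1 disables throttling, i.e. an unbounded rate.
        return math.inf if rps == -1 else rps
    return rate(new_rps) > rate(current_rps)


async def rethrottle_delete_by_query(client: Any, task_id: str, new_rps: float) -> Any:
    """Change the throttle of a running delete by query task."""
    return await client.delete_by_query_rethrottle(
        task_id=task_id, requests_per_second=new_rps
    )
```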

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docs-delete-by-query.html#docs-delete-by-query-rethrottle

Parameters:
  • task_id (int | str) – The ID for the task.

  • requests_per_second (float | None) – The throttle for this request in sub-requests per second. To disable throttling, set it to -1.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type:

ObjectApiResponse[Any]

delete_script(*, id, error_trace=None, filter_path=None, human=None, master_timeout=None, pretty=None, timeout=None)

Delete a script or search template. Deletes a stored script or search template.

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/delete-stored-script-api.html

Parameters:
  • id (str) – The identifier for the stored script or search template.

  • master_timeout (str | Literal[-1] | ~typing.Literal[0] | None) – The period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error. It can also be set to -1 to indicate that the request should never timeout.

  • timeout (str | Literal[-1] | ~typing.Literal[0] | None) – The period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error. It can also be set to -1 to indicate that the request should never timeout.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type:

ObjectApiResponse[Any]

exists(*, index, id, error_trace=None, filter_path=None, human=None, preference=None, pretty=None, realtime=None, refresh=None, routing=None, source=None, source_excludes=None, source_includes=None, stored_fields=None, version=None, version_type=None)

Check a document.

Verify that a document exists. For example, check to see if a document with the _id 0 exists:

HEAD my-index-000001/_doc/0

If the document exists, the API returns a status code of 200 - OK. If the document doesn’t exist, the API returns 404 - Not Found.

Versioning support

You can use the version parameter to check the document only if its current version is equal to the specified one.

When a document is updated, Elasticsearch internally marks the old document as deleted and adds an entirely new document. The old version of the document doesn't disappear immediately, although you won't be able to access it. Elasticsearch cleans up deleted documents in the background as you continue to index more data.

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docs-get.html

Parameters:
  • index (str) – A comma-separated list of data streams, indices, and aliases. It supports wildcards (*).

  • id (str) – A unique document identifier.

  • preference (str | None) – The node or shard the operation should be performed on. By default, the operation is randomized between the shard replicas. If it is set to _local, the operation will prefer to be run on a local allocated shard when possible. If it is set to a custom value, the value is used to guarantee that the same shards will be used for the same custom value. This can help with “jumping values” when hitting different shards in different refresh states. A sample value can be something like the web session ID or the user name.

  • realtime (bool | None) – If true, the request is real-time as opposed to near-real-time.

  • refresh (bool | None) – If true, the request refreshes the relevant shards before retrieving the document. Setting it to true should be done after careful thought and verification that this does not cause a heavy load on the system (and slow down indexing).

  • routing (str | None) – A custom value used to route operations to a specific shard.

  • source (bool | str | Sequence[str] | None) – Indicates whether to return the _source field (true or false) or lists the fields to return.

  • source_excludes (str | Sequence[str] | None) – A comma-separated list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in _source_includes query parameter. If the _source parameter is false, this parameter is ignored.

  • source_includes (str | Sequence[str] | None) – A comma-separated list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the _source_excludes query parameter. If the _source parameter is false, this parameter is ignored.

  • stored_fields (str | Sequence[str] | None) – A comma-separated list of stored fields to return as part of a hit. If no fields are specified, no stored fields are included in the response. If this field is specified, the _source parameter defaults to false.

  • version (int | None) – Explicit version number for concurrency control. The specified version must match the current version of the document for the request to succeed.

  • version_type (str | Literal['external', 'external_gte', 'force', 'internal'] | None) – The version type.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type:

HeadApiResponse

exists_source(*, index, id, error_trace=None, filter_path=None, human=None, preference=None, pretty=None, realtime=None, refresh=None, routing=None, source=None, source_excludes=None, source_includes=None, version=None, version_type=None)

Check for a document source.

Check whether a document source exists in an index. For example:

HEAD my-index-000001/_source/1

A document's source is not available if it is disabled in the mapping.
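exists and exists_source can be combined to tell a missing document apart from one whose source is unavailable. A minimal sketch, assuming client is an AsyncElasticsearch instance; the helper name and the returned labels are hypothetical. The HeadApiResponse objects returned by both methods are truthy for 200 OK and falsy for 404 Not Found.

```python
from typing import Any


async def source_status(client: Any, index: str, doc_id: str) -> str:
    """Classify a document as "missing", "no-source" or "has-source"."""
    if not await client.exists(index=index, id=doc_id):
        return "missing"
    if not await client.exists_source(index=index, id=doc_id):
        # The document exists but its source is unavailable,
        # for example because _source is disabled in the mapping.
        return "no-source"
    return "has-source"
```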

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docs-get.html

Parameters:
  • index (str) – A comma-separated list of data streams, indices, and aliases. It supports wildcards (*).

  • id (str) – A unique identifier for the document.

  • preference (str | None) – The node or shard the operation should be performed on. By default, the operation is randomized between the shard replicas.

  • realtime (bool | None) – If true, the request is real-time as opposed to near-real-time.

  • refresh (bool | None) – If true, the request refreshes the relevant shards before retrieving the document. Setting it to true should be done after careful thought and verification that this does not cause a heavy load on the system (and slow down indexing).

  • routing (str | None) – A custom value used to route operations to a specific shard.

  • source (bool | str | Sequence[str] | None) – Indicates whether to return the _source field (true or false) or lists the fields to return.

  • source_excludes (str | Sequence[str] | None) – A comma-separated list of source fields to exclude in the response.

  • source_includes (str | Sequence[str] | None) – A comma-separated list of source fields to include in the response.

  • version (int | None) – The version number for concurrency control. It must match the current version of the document for the request to succeed.

  • version_type (str | Literal['external', 'external_gte', 'force', 'internal'] | None) – The version type.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type:

HeadApiResponse

explain(*, index, id, analyze_wildcard=None, analyzer=None, default_operator=None, df=None, error_trace=None, filter_path=None, human=None, lenient=None, preference=None, pretty=None, q=None, query=None, routing=None, source=None, source_excludes=None, source_includes=None, stored_fields=None, body=None)

Explain a document match result. Get information about why a specific document matches, or doesn't match, a query. It computes a score explanation for a query and a specific document.

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/search-explain.html

Parameters:
  • index (str) – Index names that are used to limit the request. Only a single index name can be provided to this parameter.

  • id (str) – The document identifier.

  • analyze_wildcard (bool | None) – If true, wildcard and prefix queries are analyzed. This parameter can be used only when the q query string parameter is specified.

  • analyzer (str | None) – The analyzer to use for the query string. This parameter can be used only when the q query string parameter is specified.

  • default_operator (str | Literal['and', 'or'] | None) – The default operator for query string query: AND or OR. This parameter can be used only when the q query string parameter is specified.

  • df (str | None) – The field to use as default where no field prefix is given in the query string. This parameter can be used only when the q query string parameter is specified.

  • lenient (bool | None) – If true, format-based query failures (such as providing text to a numeric field) in the query string will be ignored. This parameter can be used only when the q query string parameter is specified.

  • preference (str | None) – The node or shard the operation should be performed on. It is random by default.

  • q (str | None) – The query in the Lucene query string syntax.

  • query (Mapping[str, Any] | None) – Defines the search definition using the Query DSL.

  • routing (str | None) – A custom value used to route operations to a specific shard.

  • source (bool | str | Sequence[str] | None) – Indicates whether to return the _source field (true or false) or lists the fields to return.

  • source_excludes (str | Sequence[str] | None) – A comma-separated list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in _source_includes query parameter. If the _source parameter is false, this parameter is ignored.

  • source_includes (str | Sequence[str] | None) – A comma-separated list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the _source_excludes query parameter. If the _source parameter is false, this parameter is ignored.

  • stored_fields (str | Sequence[str] | None) – A comma-separated list of stored fields to return in the response.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
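The explanation in the response is a tree of nodes, each with a value, a description and nested details. A minimal sketch for printing that tree, assuming an AsyncElasticsearch client created elsewhere; the helper names flatten_explanation and explain_match are hypothetical and not part of elasticsearch-py:

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Mapping

if TYPE_CHECKING:  # type hints only; no runtime dependency on the client
    from elasticsearch import AsyncElasticsearch


def flatten_explanation(node: Mapping[str, Any], depth: int = 0) -> list[str]:
    """Render an explanation tree as indented 'value: description' lines."""
    lines = [f"{'  ' * depth}{node['value']}: {node['description']}"]
    for child in node.get("details", []):
        lines.extend(flatten_explanation(child, depth + 1))
    return lines


async def explain_match(
    client: AsyncElasticsearch, index: str, doc_id: str, query: Mapping[str, Any]
) -> tuple[bool, list[str]]:
    # "matched" says whether the document matches the query;
    # "explanation" describes how the score was (or was not) computed.
    resp = await client.explain(index=index, id=doc_id, query=query)
    return resp["matched"], flatten_explanation(resp["explanation"])
```

For example, matched, lines = await explain_match(client, "my-index-000001", "0", {"match": {"message": "elasticsearch"}}) returns the match flag plus one line per explanation node.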

field_caps(*, index=None, allow_no_indices=None, error_trace=None, expand_wildcards=None, fields=None, filter_path=None, filters=None, human=None, ignore_unavailable=None, include_empty_fields=None, include_unmapped=None, index_filter=None, pretty=None, runtime_mappings=None, types=None, body=None)

Get the field capabilities.

Get information about the capabilities of fields among multiple indices.

For data streams, the API returns field capabilities among the stream’s backing indices. It returns runtime fields like any other field. For example, a runtime field with a type of keyword is returned the same as any other field that belongs to the keyword family.

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/search-field-caps.html

Parameters:
  • index (str | Sequence[str] | None) – A comma-separated list of data streams, indices, and aliases used to limit the request. Supports wildcards (*). To target all data streams and indices, omit this parameter or use * or _all.

  • allow_no_indices (bool | None) – If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar.

  • expand_wildcards (Sequence[str | Literal['all', 'closed', 'hidden', 'none', 'open']] | str | Literal['all', 'closed', 'hidden', 'none', 'open'] | None) – The type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values, such as open,hidden.

  • fields (str | Sequence[str] | None) – A list of fields to retrieve capabilities for. Wildcard (*) expressions are supported.

  • filters (str | None) – A comma-separated list of filters to apply to the response.

  • ignore_unavailable (bool | None) – If true, missing or closed indices are not included in the response.

  • include_empty_fields (bool | None) – If false, empty fields are not included in the response.

  • include_unmapped (bool | None) – If true, unmapped fields are included in the response.

  • index_filter (Mapping[str, Any] | None) – Filter indices if the provided query rewrites to match_none on every shard. IMPORTANT: The filtering is done on a best-effort basis: it uses index statistics and mappings to rewrite queries to match_none instead of fully running the request. For instance, a range query over a date field can rewrite to match_none if all documents within a shard (including deleted documents) are outside of the provided range. However, not all queries can rewrite to match_none, so this API may return an index even if the provided filter matches no document.

  • runtime_mappings (Mapping[str, Mapping[str, Any]] | None) – Define ad-hoc runtime fields in the request similar to the way it is done in search requests. These fields exist only as part of the query and take precedence over fields defined with the same name in the index mappings.

  • types (Sequence[str] | None) – A comma-separated list of field types to include. Any fields that do not match one of these types will be excluded from the results. It defaults to empty, meaning that all field types are returned.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
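Because the response groups each field by type, a field that is mapped differently across the targeted indices shows up with more than one type entry. A minimal sketch that lists such fields, assuming an AsyncElasticsearch client created elsewhere; conflicting_fields and find_type_conflicts are hypothetical helper names:

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Mapping

if TYPE_CHECKING:  # type hints only; no runtime dependency on the client
    from elasticsearch import AsyncElasticsearch


def conflicting_fields(caps: Mapping[str, Any]) -> list[str]:
    """Return the names of fields mapped to more than one type."""
    # Response shape: {"indices": [...], "fields": {name: {type: {...}, ...}}}
    return sorted(name for name, by_type in caps["fields"].items() if len(by_type) > 1)


async def find_type_conflicts(client: AsyncElasticsearch, index: str) -> list[str]:
    # fields="*" asks for the capabilities of every field in the target indices.
    resp = await client.field_caps(index=index, fields="*")
    return conflicting_fields(resp.body)
```

With include_unmapped=True, a field missing from some indices also gets an extra unmapped entry, which this helper would count as a conflict.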

get(*, index, id, error_trace=None, filter_path=None, force_synthetic_source=None, human=None, preference=None, pretty=None, realtime=None, refresh=None, routing=None, source=None, source_excludes=None, source_includes=None, stored_fields=None, version=None, version_type=None)

Get a document by its ID.

Get a document and its source or stored fields from an index.

By default, this API is realtime and is not affected by the refresh rate of the index (when data will become visible for search). In the case where stored fields are requested with the stored_fields parameter and the document has been updated but is not yet refreshed, the API will have to parse and analyze the source to extract the stored fields. To turn off realtime behavior, set the realtime parameter to false.

Source filtering

By default, the API returns the contents of the _source field unless you have used the stored_fields parameter or the _source field is turned off. You can turn off _source retrieval by using the _source parameter:

GET my-index-000001/_doc/0?_source=false

If you only need one or two fields from the _source, use the _source_includes or _source_excludes parameters to include or filter out particular fields. This can be helpful with large documents where partial retrieval can save on network overhead. Both parameters take a comma-separated list of fields or wildcard expressions. For example:

GET my-index-000001/_doc/0?_source_includes=*.id&_source_excludes=entities

If you only want to specify includes, you can use a shorter notation:

GET my-index-000001/_doc/0?_source=*.id
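In this client the same query parameters are passed as keyword arguments without the leading underscore. A minimal sketch of the second request above, assuming an AsyncElasticsearch client created elsewhere; get_id_fields is a hypothetical helper name:

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:  # type hints only; no runtime dependency on the client
    from elasticsearch import AsyncElasticsearch


async def get_id_fields(client: AsyncElasticsearch, index: str, doc_id: str) -> dict[str, Any]:
    # Equivalent of:
    #   GET my-index-000001/_doc/0?_source_includes=*.id&_source_excludes=entities
    resp = await client.get(
        index=index,
        id=doc_id,
        source_includes="*.id",
        source_excludes="entities",
    )
    # get() wraps the filtered document in "_source", next to metadata such as "_id".
    return resp["_source"]
```

Passing source="*.id" instead corresponds to the shorter _source=*.id notation.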

Routing

If routing is used during indexing, the routing value also needs to be specified to retrieve a document. For example:

GET my-index-000001/_doc/2?routing=user1

This request gets the document with ID 2, but it is routed based on the user. The document is not fetched if the correct routing is not specified.
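The same routing value has to travel with both the write and the read. A minimal sketch, assuming an AsyncElasticsearch client created elsewhere; index_and_get_routed is a hypothetical helper name:

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Mapping

if TYPE_CHECKING:  # type hints only; no runtime dependency on the client
    from elasticsearch import AsyncElasticsearch


async def index_and_get_routed(
    client: AsyncElasticsearch,
    index: str,
    doc_id: str,
    document: Mapping[str, Any],
    routing: str,
) -> dict[str, Any]:
    # Index with a custom routing value: the shard is chosen by hashing
    # the routing value (for example "user1") instead of the document ID.
    await client.index(index=index, id=doc_id, document=document, routing=routing)
    # Reads must pass the same routing value; otherwise the request can be
    # sent to a shard that does not hold the document.
    resp = await client.get(index=index, id=doc_id, routing=routing)
    return resp["_source"]
```

Since get is real-time by default, the document can be read back immediately without waiting for a refresh.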

Distributed

The GET operation is hashed into a specific shard ID. It is then redirected to one of the replicas within that shard ID and returns the result. The replicas are the primary shard and its replicas within that shard ID group. This means that the more replicas you have, the better your GET scaling will be.

Versioning support

You can use the version parameter to retrieve the document only if its current version is equal to the specified one.

When a document is updated, Elasticsearch internally marks the old document as deleted and adds an entirely new document. The old version of the document doesn't disappear immediately, although you won't be able to access it. Elasticsearch cleans up deleted documents in the background as you continue to index more data.

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docs-get.html

Parameters:
  • index (str) – The name of the index that contains the document.

  • id (str) – A unique document identifier.

  • force_synthetic_source (bool | None) – Indicates whether the request forces synthetic _source. Use this parameter to test if the mapping supports synthetic _source and to get a sense of the worst case performance. Fetches with this parameter enabled will be slower than enabling synthetic source natively in the index.

  • preference (str | None) – The node or shard the operation should be performed on. By default, the operation is randomized between the shard replicas. If it is set to _local, the operation will prefer to be run on a local allocated shard when possible. If it is set to a custom value, the value is used to guarantee that the same shards will be used for the same custom value. This can help with “jumping values” when hitting different shards in different refresh states. A sample value can be something like the web session ID or the user name.

  • realtime (bool | None) – If true, the request is real-time as opposed to near-real-time.

  • refresh (bool | None) – If true, the request refreshes the relevant shards before retrieving the document. Setting it to true should be done after careful thought and verification that this does not cause a heavy load on the system (and slow down indexing).

  • routing (str | None) – A custom value used to route operations to a specific shard.

  • source (bool | str | Sequence[str] | None) – Indicates whether to return the _source field (true or false) or lists the fields to return.

  • source_excludes (str | Sequence[str] | None) – A comma-separated list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in _source_includes query parameter. If the _source parameter is false, this parameter is ignored.

  • source_includes (str | Sequence[str] | None) – A comma-separated list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the _source_excludes query parameter. If the _source parameter is false, this parameter is ignored.

  • stored_fields (str | Sequence[str] | None) – A comma-separated list of stored fields to return as part of a hit. If no fields are specified, no stored fields are included in the response. If this field is specified, the _source parameter defaults to false. Only leaf fields can be retrieved with the stored_fields option. Object fields can’t be returned; if specified, the request fails.

  • version (int | None) – The version number for concurrency control. It must match the current version of the document for the request to succeed.

  • version_type (str | Literal['external', 'external_gte', 'force', 'internal'] | None) – The version type.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type:

ObjectApiResponse[Any]

get_script(*, id, error_trace=None, filter_path=None, human=None, master_timeout=None, pretty=None)

Get a script or search template. Retrieves a stored script or search template.

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/modules-scripting.html

Parameters:
  • id (str) – The identifier for the stored script or search template.

  • master_timeout (str | Literal[-1] | Literal[0] | None) – The period to wait for the master node. If the master node is not available before the timeout expires, the request fails and returns an error. It can also be set to -1 to indicate that the request should never time out.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type:

ObjectApiResponse[Any]

get_script_context(*, error_trace=None, filter_path=None, human=None, pretty=None)

Get script contexts.

Get a list of supported script contexts and their methods.

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/get-script-contexts-api.html

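Parameters:
  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type: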

ObjectApiResponse[Any]

get_script_languages(*, error_trace=None, filter_path=None, human=None, pretty=None)

Get script languages.

Get a list of available script types, languages, and contexts.

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/get-script-languages-api.html

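Parameters:
  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type: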

ObjectApiResponse[Any]

get_source(*, index, id, error_trace=None, filter_path=None, human=None, preference=None, pretty=None, realtime=None, refresh=None, routing=None, source=None, source_excludes=None, source_includes=None, stored_fields=None, version=None, version_type=None)

Get a document's source.

Get the source of a document. For example:

GET my-index-000001/_source/1

You can use the source filtering parameters to control which parts of the _source are returned:

GET my-index-000001/_source/1/?_source_includes=*.id&_source_excludes=entities
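Unlike get, this API returns the source document itself as the response body, without the _source wrapper or metadata. A minimal sketch of the request above, assuming an AsyncElasticsearch client created elsewhere; get_source_filtered is a hypothetical helper name:

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:  # type hints only; no runtime dependency on the client
    from elasticsearch import AsyncElasticsearch


async def get_source_filtered(client: AsyncElasticsearch, index: str, doc_id: str) -> Any:
    # Equivalent of:
    #   GET my-index-000001/_source/1/?_source_includes=*.id&_source_excludes=entities
    resp = await client.get_source(
        index=index,
        id=doc_id,
        source_includes="*.id",
        source_excludes="entities",
    )
    # The body is the (filtered) document: no "_source" key, no "_id" or "_version".
    return resp.body
```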

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docs-get.html

Parameters:
  • index (str) – The name of the index that contains the document.

  • id (str) – A unique document identifier.

  • preference (str | None) – The node or shard the operation should be performed on. By default, the operation is randomized between the shard replicas.

  • realtime (bool | None) – If true, the request is real-time as opposed to near-real-time.

  • refresh (bool | None) – If true, the request refreshes the relevant shards before retrieving the document. Setting it to true should be done after careful thought and verification that this does not cause a heavy load on the system (and slow down indexing).

  • routing (str | None) – A custom value used to route operations to a specific shard.

  • source (bool | str | Sequence[str] | None) – Indicates whether to return the _source field (true or false) or lists the fields to return.

  • source_excludes (str | Sequence[str] | None) – A comma-separated list of source fields to exclude from the response.

  • source_includes (str | Sequence[str] | None) – A comma-separated list of source fields to include in the response.

  • stored_fields (str | Sequence[str] | None) – A comma-separated list of stored fields to return as part of a hit.

  • version (int | None) – The version number for concurrency control. It must match the current version of the document for the request to succeed.

  • version_type (str | Literal['external', 'external_gte', 'force', 'internal'] | None) – The version type.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type:

ObjectApiResponse[Any]

health_report(*, feature=None, error_trace=None, filter_path=None, human=None, pretty=None, size=None, timeout=None, verbose=None)

Get the cluster health. Get a report with the health status of an Elasticsearch cluster. The report contains a list of indicators that compose Elasticsearch functionality.

Each indicator has a health status of: green, unknown, yellow or red. The indicator will provide an explanation and metadata describing the reason for its current health status.

The cluster’s status is controlled by the worst indicator status.

In the event that an indicator’s status is non-green, a list of impacts may be present in the indicator result which detail the functionalities that are negatively affected by the health issue. Each impact carries with it a severity level, an area of the system that is affected, and a simple description of the impact on the system.

Some health indicators can determine the root cause of a health problem and prescribe a set of steps that can be performed in order to improve the health of the system. The root cause and remediation steps are encapsulated in a diagnosis. A diagnosis contains a cause detailing a root cause analysis, an action containing a brief description of the steps to take to fix the problem, the list of affected resources (if applicable), and a detailed step-by-step troubleshooting guide to fix the diagnosed problem.

NOTE: The health indicators perform root cause analysis of non-green health statuses. This can be computationally expensive when called frequently. When setting up automated polling of the API for health status, set verbose to false to disable the more expensive analysis logic.
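For automated polling, the note above suggests turning verbose off and only looking at the status fields. A minimal sketch, assuming an AsyncElasticsearch client created elsewhere; non_green_indicators and poll_health are hypothetical helper names:

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Mapping

if TYPE_CHECKING:  # type hints only; no runtime dependency on the client
    from elasticsearch import AsyncElasticsearch


def non_green_indicators(report: Mapping[str, Any]) -> dict[str, str]:
    """Map each indicator whose status is not green to that status."""
    return {
        name: indicator["status"]
        for name, indicator in report.get("indicators", {}).items()
        if indicator["status"] != "green"
    }


async def poll_health(client: AsyncElasticsearch) -> tuple[str, dict[str, str]]:
    # verbose=False skips the expensive root cause analysis.
    resp = await client.health_report(verbose=False)
    # The top-level status is the worst indicator status.
    return resp["status"], non_green_indicators(resp.body)
```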

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/health-api.html

Parameters:
  • feature (str | Sequence[str] | None) – A feature of the cluster, as returned by the top-level health report API.

  • size (int | None) – Limit the number of affected resources the health report API returns.

  • timeout (str | Literal[-1] | Literal[0] | None) – Explicit operation timeout.

  • verbose (bool | None) – If true, the response includes more detailed information about the health of the system, including the more expensive root cause analysis.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type:

ObjectApiResponse[Any]

index(*, index, document=None, body=None, id=None, error_trace=None, filter_path=None, human=None, if_primary_term=None, if_seq_no=None, op_type=None, pipeline=None, pretty=None, refresh=None, require_alias=None, routing=None, timeout=None, version=None, version_type=None, wait_for_active_shards=None)

Create or update a document in an index.

Add a JSON document to the specified data stream or index and make it searchable. If the target is an index and the document already exists, the request updates the document and increments its version.

NOTE: You cannot use this API to send update requests for existing documents in a data stream.

If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:

  • To add or overwrite a document using the PUT /<target>/_doc/<_id> request format, you must have the create, index, or write index privilege.
  • To add a document using the POST /<target>/_doc/ request format, you must have the create_doc, create, index, or write index privilege.
  • To automatically create a data stream or index with this API request, you must have the auto_configure, create_index, or manage index privilege.

Automatic data stream creation requires a matching index template with data stream enabled.

NOTE: Replica shards might not all be started when an indexing operation returns successfully. By default, only the primary is required. Set wait_for_active_shards to change this default behavior.

Automatically create data streams and indices

If the request's target doesn't exist and matches an index template with a data_stream definition, the index operation automatically creates the data stream.

If the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.

NOTE: Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, refer to index pattern documentation.

If no mapping exists, the index operation creates a dynamic mapping. By default, new fields and objects are automatically added to the mapping if needed.

Automatic index creation is controlled by the action.auto_create_index setting. If it is true, any index can be created automatically. You can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to false to turn off automatic index creation entirely. Specify a comma-separated list of patterns you want to allow or prefix each pattern with + or - to indicate whether it should be allowed or blocked. When a list is specified, the default behavior is to disallow.

NOTE: The action.auto_create_index setting affects the automatic creation of indices only. It does not affect the creation of data streams.
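A sketch of the pattern-list behavior: a simplified model in which the first matching pattern decides and a list rejects any name it does not match, plus a call that applies such a list as a persistent cluster setting. auto_create_allowed and restrict_auto_create are hypothetical helper names, and the model is an assumption for illustration; it uses fnmatch, which also understands ? and [...], whereas these index patterns are meant to use only *.

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only; no runtime dependency on the client
    from elasticsearch import AsyncElasticsearch


def auto_create_allowed(setting: str, index_name: str) -> bool:
    """Simplified model of action.auto_create_index for one index name."""
    value = setting.strip()
    if value in ("true", "false"):
        return value == "true"
    for raw in value.split(","):
        pattern = raw.strip()
        allow = not pattern.startswith("-")
        if pattern.startswith(("+", "-")):
            pattern = pattern[1:]  # drop the allow/block prefix
        if fnmatchcase(index_name, pattern):
            return allow  # assumed: the first matching pattern wins
    return False  # a list disallows anything it does not match


async def restrict_auto_create(client: AsyncElasticsearch) -> None:
    # Allow my-index-000001 and index10, block index1*, allow other ind* names.
    await client.cluster.put_settings(
        persistent={"action.auto_create_index": "my-index-000001,index10,-index1*,+ind*"}
    )
```

In this model, index10 is allowed because it is listed before -index1*, while index11 is blocked by that pattern.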

Optimistic concurrency control

Index operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the if_seq_no and if_primary_term parameters. If a mismatch is detected, the operation will result in a VersionConflictException and a status code of 409.
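A read-modify-write built on these parameters reads _seq_no and _primary_term from a get response and passes them back on the write. A minimal sketch, assuming an AsyncElasticsearch client created elsewhere; update_if_unchanged and the mutate callback are hypothetical. In elasticsearch-py a 409 response is raised as elasticsearch.ConflictError, which a caller can catch and retry from the read.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Callable

if TYPE_CHECKING:  # type hints only; no runtime dependency on the client
    from elasticsearch import AsyncElasticsearch


async def update_if_unchanged(
    client: AsyncElasticsearch,
    index: str,
    doc_id: str,
    mutate: Callable[[dict[str, Any]], dict[str, Any]],
) -> str:
    current = await client.get(index=index, id=doc_id)
    new_doc = mutate(dict(current["_source"]))
    # Conditional write: succeeds only if nobody modified the document since
    # it was read; otherwise Elasticsearch answers 409 and the client raises.
    resp = await client.index(
        index=index,
        id=doc_id,
        document=new_doc,
        if_seq_no=current["_seq_no"],
        if_primary_term=current["_primary_term"],
    )
    return resp["result"]
```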

Routing

By default, shard placement — or routing — is controlled by using a hash of the document's ID value. For more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing parameter.

When setting up explicit mapping, you can also use the _routing field to direct the index operation to extract the routing value from the document itself. This does come at the (very minimal) cost of an additional document parsing pass. If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.

NOTE: Data streams do not support custom routing unless they were created with the allow_custom_routing setting enabled in the template.

Distributed

The index operation is directed to the primary shard based on its route and performed on the actual node containing this shard. After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.

Active shards

To improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation. If the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs. By default, write operations only wait for the primary shards to be active before proceeding (that is to say wait_for_active_shards is 1). This default can be overridden in the index settings dynamically by setting index.write.wait_for_active_shards. To alter this behavior per operation, use the wait_for_active_shards request parameter.

Valid values are all or any positive integer up to the total number of configured copies per shard in the index (which is number_of_replicas+1). Specifying a negative value or a number greater than the number of shard copies will throw an error.

For example, suppose you have a cluster of three nodes, A, B, and C, and you create an index named index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes). If you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding. This means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data. If wait_for_active_shards is set on the request to 3 (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding. This requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard. However, if you set wait_for_active_shards to all (or to 4, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index. The operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.

It is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts. After the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary. The _shards section of the API response reveals the number of shard copies on which replication succeeded and failed.
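The valid range follows from the arithmetic above: with number_of_replicas replicas there are number_of_replicas + 1 copies of each shard. A minimal sketch that checks a value against the rule as stated in this section (all, or a positive integer up to the number of copies), plus an indexing call that waits for 3 copies. required_active_copies and index_with_three_copies are hypothetical helper names, and other accepted values such as index-setting are deliberately not modeled.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Mapping

if TYPE_CHECKING:  # type hints only; no runtime dependency on the client
    from elasticsearch import AsyncElasticsearch


def required_active_copies(wait_for_active_shards: int | str, number_of_replicas: int) -> int:
    """Number of copies of each shard that must be active before the write starts."""
    total_copies = number_of_replicas + 1  # the primary plus its replicas
    if wait_for_active_shards == "all":
        return total_copies
    if isinstance(wait_for_active_shards, bool) or not isinstance(wait_for_active_shards, int):
        raise ValueError(f"unsupported value: {wait_for_active_shards!r}")
    if not 1 <= wait_for_active_shards <= total_copies:
        raise ValueError(f"must be between 1 and {total_copies}, got {wait_for_active_shards}")
    return wait_for_active_shards


async def index_with_three_copies(
    client: AsyncElasticsearch, index: str, document: Mapping[str, Any]
) -> Any:
    # Wait for 3 active copies (the primary and 2 replicas) for up to 30 seconds.
    return await client.index(
        index=index, document=document, wait_for_active_shards=3, timeout="30s"
    )
```

For the three-node example with 3 replicas, required_active_copies("all", 3) is 4, which the cluster cannot satisfy.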

No operation (noop) updates

When updating a document by using this API, a new version of the document is always created even if the document hasn't changed. If this isn't acceptable, use the _update API with detect_noop set to true. The detect_noop option isn't available on this API because it doesn’t fetch the old source and isn't able to compare it against the new source.

There isn't a definitive rule for when noop updates aren't acceptable. It's a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.
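A minimal sketch of the _update alternative, assuming an AsyncElasticsearch client created elsewhere; apply_partial_update is a hypothetical helper name. detect_noop is passed explicitly here for clarity, although the update API enables it by default.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Mapping

if TYPE_CHECKING:  # type hints only; no runtime dependency on the client
    from elasticsearch import AsyncElasticsearch


async def apply_partial_update(
    client: AsyncElasticsearch, index: str, doc_id: str, fields: Mapping[str, Any]
) -> bool:
    """Return True if the update changed the stored document."""
    resp = await client.update(index=index, id=doc_id, doc=fields, detect_noop=True)
    # An update that leaves the source unchanged is reported with
    # "result": "noop" instead of writing a new version.
    return resp["result"] != "noop"
```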

Versioning

Each indexed document is given a version number. By default, internal versioning is used that starts at 1 and increments with each update, deletes included. Optionally, the version number can be set to an external value (for example, if maintained in a database). To enable this functionality, version_type should be set to external. The value provided must be a numeric, long value greater than or equal to 0, and less than around 9.2e+18.

NOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations. If no version is provided, the operation runs without any version checks.

When using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document. If true, the document will be indexed and the new version number used. If the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. For example:

PUT my-index-000001/_doc/1?version=2&version_type=external
{
  "user": {
    "id": "elkbee"
  }
}

In this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.
If the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).

A nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.
Even the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.
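The external versioning rule can be written as a small predicate: a write wins only if its version is strictly greater than the stored one. A sketch that models the check described above, plus the client equivalent of the PUT request; external_version_wins and index_with_external_version are hypothetical names, and the upper bound of 2**63 - 1 (the largest Java long, about 9.2e+18) is an assumption standing in for "less than around 9.2e+18".

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Mapping

if TYPE_CHECKING:  # type hints only; no runtime dependency on the client
    from elasticsearch import AsyncElasticsearch

MAX_EXTERNAL_VERSION = 2**63 - 1  # assumed bound: largest Java long


def external_version_wins(new_version: int, stored_version: int | None) -> bool:
    """Model of the version_type=external check."""
    if not 0 <= new_version <= MAX_EXTERNAL_VERSION:
        raise ValueError(f"version out of range: {new_version}")
    # A missing document accepts any valid version; otherwise the new
    # version must be strictly greater than the stored one.
    return stored_version is None or new_version > stored_version


async def index_with_external_version(
    client: AsyncElasticsearch, index: str, doc_id: str, document: Mapping[str, Any], version: int
) -> Any:
    # Equivalent of PUT my-index-000001/_doc/1?version=2&version_type=external
    return await client.index(
        index=index, id=doc_id, document=document, version=version, version_type="external"
    )
```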

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docs-index_.html

Parameters:
  • index (str) – The name of the data stream or index to target. If the target doesn’t exist and matches the name or wildcard (*) pattern of an index template with a data_stream definition, this request creates the data stream. If the target doesn’t exist and doesn’t match a data stream template, this request creates the index. You can check for existing targets with the resolve index API.

  • document (Mapping[str, Any] | None)

  • id (str | None) – A unique identifier for the document. To automatically generate a document ID, use the POST /<target>/_doc/ request format and omit this parameter.

  • if_primary_term (int | None) – Only perform the operation if the document has this primary term.

  • if_seq_no (int | None) – Only perform the operation if the document has this sequence number.

  • op_type (str | Literal['create', 'index'] | None) – Set to create to only index the document if it does not already exist (put if absent). If a document with the specified _id already exists, the indexing operation will fail. The behavior is the same as using the <index>/_create endpoint. If a document ID is specified, this parameter defaults to index. Otherwise, it defaults to create. If the request targets a data stream, an op_type of create is required.

  • pipeline (str | None) – The ID of the pipeline to use to preprocess incoming documents. If the index has a default ingest pipeline specified, then setting the value to _none disables the default ingest pipeline for this request. If a final pipeline is configured it will always run, regardless of the value of this parameter.

  • refresh (bool | str | Literal['false', 'true', 'wait_for'] | None) – If true, Elasticsearch refreshes the affected shards to make this operation visible to search. If wait_for, it waits for a refresh to make this operation visible to search. If false, it does nothing with refreshes.

  • require_alias (bool | None) – If true, the destination must be an index alias.

  • routing (str | None) – A custom value that is used to route operations to a specific shard.

  • timeout (str | Literal[-1] | Literal[0] | None) – The period the request waits for the following operations: automatic index creation, dynamic mapping updates, waiting for active shards. This parameter is useful for situations where the primary shard assigned to perform the operation might not be available when the operation runs. Some reasons for this might be that the primary shard is currently recovering from a gateway or undergoing relocation. By default, the operation will wait on the primary shard to become available for at least 1 minute before failing and responding with an error. The actual wait time could be longer, particularly when multiple waits occur.

  • version (int | None) – An explicit version number for concurrency control. It must be a non-negative long number.

  • version_type (str | Literal['external', 'external_gte', 'force', 'internal'] | None) – The version type.

  • wait_for_active_shards (int | str | Literal['all', 'index-setting'] | None) – The number of shard copies that must be active before proceeding with the operation. You can set it to all or any positive integer up to the total number of copies of each shard in the index (number_of_replicas+1). The default value of 1 means it waits for each primary shard to be active.

  • body (Mapping[str, Any] | None)

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type:

ObjectApiResponse[Any]

info(*, error_trace=None, filter_path=None, human=None, pretty=None)

Get cluster info. Get basic build, version, and cluster information.

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/rest-api-root.html

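Parameters:
  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type: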

ObjectApiResponse[Any]
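The version string sits under version.number in the response, which makes info a convenient way to gate features on the server version. A minimal sketch, assuming an AsyncElasticsearch client created elsewhere; parse_version and server_version are hypothetical helper names:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only; no runtime dependency on the client
    from elasticsearch import AsyncElasticsearch


def parse_version(number: str) -> tuple[int, int, int]:
    """Turn "8.17.0" or "8.17.0-SNAPSHOT" into (8, 17, 0)."""
    major, minor, patch = number.split("-", 1)[0].split(".")[:3]
    return int(major), int(minor), int(patch)


async def server_version(client: AsyncElasticsearch) -> tuple[int, int, int]:
    resp = await client.info()
    return parse_version(resp["version"]["number"])
```

Tuples compare element by element, so a check such as await server_version(client) >= (8, 0, 0) works as expected.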

Run a knn search.

NOTE: The kNN search API has been replaced by the knn option in the search API.

Perform a k-nearest neighbor (kNN) search on a dense_vector field and return the matching documents. Given a query vector, the API finds the k closest vectors and returns those documents as search hits.

Elasticsearch uses the HNSW algorithm to support efficient kNN search. Like most kNN algorithms, HNSW is an approximate method that sacrifices result accuracy for improved search speed. This means the results returned are not always the true k closest neighbors.

The kNN search API supports restricting the search using a filter. The search will return the top k documents that also match the filter query.

A kNN search response has the exact same structure as a search API response. However, certain sections have a meaning specific to kNN search:

  • The document _score is determined by the similarity between the query and document vector.
  • The hits.total object contains the total number of nearest neighbor candidates considered, which is num_candidates * num_shards. The hits.total.relation will always be eq, indicating an exact value.

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/knn-search-api.html

Parameters:
  • index (str | Sequence[str]) – A comma-separated list of index names to search; use _all or an empty string to perform the operation on all indices.

  • knn (Mapping[str, Any] | None) – The kNN query to run.

  • docvalue_fields (Sequence[Mapping[str, Any]] | None) – The request returns doc values for field names matching these patterns in the hits.fields property of the response. It accepts wildcard (*) patterns.

  • fields (str | Sequence[str] | None) – The request returns values for field names matching these patterns in the hits.fields property of the response. It accepts wildcard (*) patterns.

  • filter (Mapping[str, Any] | Sequence[Mapping[str, Any]] | None) – A query to filter the documents that can match. The kNN search will return the top k documents that also match this filter. The value can be a single query or a list of queries. If filter isn’t provided, all documents are allowed to match.

  • routing (str | None) – A comma-separated list of specific routing values.

  • source (bool | Mapping[str, Any] | None) – Indicates which source fields are returned for matching documents. These fields are returned in the hits._source property of the search response.

  • stored_fields (str | Sequence[str] | None) – A list of stored fields to return as part of a hit. If no fields are specified, no stored fields are included in the response. If this field is specified, the _source parameter defaults to false. You can pass _source: true to return both source fields and stored fields in the search response.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
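Because the kNN search API has been replaced by the knn option of the search API, here is a sketch of a filtered kNN query sent through search() instead. It assumes client is an Elasticsearch instance and uses hypothetical fields: a dense_vector field named embedding, a keyword field named category, and a title field. Adjust these to your own mapping.

```python
from typing import Any, Mapping, Optional, Sequence


def knn_query(
    field: str,
    query_vector: Sequence[float],
    k: int = 10,
    num_candidates: int = 100,
    filter_query: Optional[Mapping[str, Any]] = None,
) -> dict:
    """Build the value of the `knn` search option."""
    if num_candidates < k:
        raise ValueError("num_candidates must be >= k")
    knn = {
        "field": field,                    # a dense_vector field
        "query_vector": list(query_vector),
        "k": k,                            # number of nearest neighbors to return
        "num_candidates": num_candidates,  # candidates considered per shard
    }
    if filter_query is not None:
        # Only documents matching this filter can be returned as neighbors.
        knn["filter"] = dict(filter_query)
    return knn


def filtered_knn_search(client, index, vector, category, k=10):
    """Run a filtered kNN search via the `knn` option of search()."""
    return client.search(
        index=index,
        knn=knn_query(
            "embedding",  # hypothetical dense_vector field
            vector,
            k=k,
            num_candidates=10 * k,
            filter_query={"term": {"category": category}},  # hypothetical field
        ),
        source=["title"],  # hypothetical field to return in hits._source
    )
```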

mget(*, index=None, docs=None, error_trace=None, filter_path=None, force_synthetic_source=None, human=None, ids=None, preference=None, pretty=None, realtime=None, refresh=None, routing=None, source=None, source_excludes=None, source_includes=None, stored_fields=None, body=None)

Get multiple documents.

Get multiple JSON documents by ID from one or more indices. If you specify an index in the request URI, you only need to specify the document IDs in the request body. To ensure fast responses, this multi get (mget) API responds with partial results if one or more shards fail.

Filter source fields

By default, the _source field is returned for every document (if stored). Use the _source, _source_includes, or _source_excludes attributes to filter what fields are returned for a particular document. You can include the _source, _source_includes, and _source_excludes query parameters in the request URI to specify the defaults to use when there are no per-document instructions.

Get stored fields

Use the stored_fields attribute to specify the set of stored fields you want to retrieve. Any requested fields that are not stored are ignored. You can include the stored_fields query parameter in the request URI to specify the defaults to use when there are no per-document instructions.

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docs-multi-get.html

Parameters:
  • index (str | None) – Name of the index to retrieve documents from when ids are specified, or when a document in the docs array does not specify an index.

  • docs (Sequence[Mapping[str, Any]] | None) – The documents you want to retrieve. Required if no index is specified in the request URI.

  • force_synthetic_source (bool | None) – Should this request force synthetic _source? Use this to test if the mapping supports synthetic _source and to get a sense of the worst case performance. Fetches with this enabled will be slower than enabling synthetic source natively in the index.

  • ids (str | Sequence[str] | None) – The IDs of the documents you want to retrieve. Allowed when the index is specified in the request URI.

  • preference (str | None) – Specifies the node or shard the operation should be performed on. Random by default.

  • realtime (bool | None) – If true, the request is real-time as opposed to near-real-time.

  • refresh (bool | None) – If true, the request refreshes relevant shards before retrieving documents.

  • routing (str | None) – Custom value used to route operations to a specific shard.

  • source (bool | str | Sequence[str] | None) – True or false to return the _source field or not, or a list of fields to return.

  • source_excludes (str | Sequence[str] | None) – A comma-separated list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in _source_includes query parameter.

  • source_includes (str | Sequence[str] | None) – A comma-separated list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the _source_excludes query parameter. If the _source parameter is false, this parameter is ignored.

  • stored_fields (str | Sequence[str] | None) – The stored fields to retrieve instead of the document _source. Any requested fields that are not stored are ignored.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
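The two request shapes described above can be sketched as follows. The first uses ids with a single index. The second uses a docs array with per-document _source and stored_fields instructions, plus a request-level default. The index and field names are hypothetical, and client is assumed to be an Elasticsearch instance.

```python
from typing import Any, Dict, List


def fetch_by_ids(client, index: str, ids: List[str]) -> List[Dict[str, Any]]:
    """Fetch several documents from one index; return only those that were found."""
    resp = client.mget(index=index, ids=ids)
    # Missing IDs come back with "found": false; failed entries carry "error" instead.
    return [doc for doc in resp["docs"] if doc.get("found")]


def fetch_mixed(client) -> Any:
    """Fetch documents from different indices with per-document instructions."""
    return client.mget(
        docs=[
            # Return only the (hypothetical) "user" field of this document.
            {"_index": "my-index-000001", "_id": "1", "_source": ["user"]},
            # Return stored fields; they must be marked as stored in the mapping.
            {"_index": "my-index-000002", "_id": "2", "stored_fields": ["tags"]},
        ],
        # Request-level default for documents without their own _source instruction.
        source=False,
    )
```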

msearch(*, searches=None, body=None, index=None, allow_no_indices=None, ccs_minimize_roundtrips=None, error_trace=None, expand_wildcards=None, filter_path=None, human=None, ignore_throttled=None, ignore_unavailable=None, include_named_queries_score=None, max_concurrent_searches=None, max_concurrent_shard_requests=None, pre_filter_shard_size=None, pretty=None, rest_total_hits_as_int=None, routing=None, search_type=None, typed_keys=None)

Run multiple searches.

The format of the request is similar to the bulk API format and makes use of the newline delimited JSON (NDJSON) format. The structure is as follows:

header\n
body\n
header\n
body\n

This structure is specifically optimized to reduce parsing if a specific search ends up redirected to another node.

IMPORTANT: The final line of data must end with a newline character \n. Each newline character may be preceded by a carriage return \r. When sending requests to this endpoint the Content-Type header should be set to application/x-ndjson.
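The Python client accepts searches as a flat list of alternating header and body dicts and serializes them to NDJSON itself, so you normally never build the payload by hand. Purely to illustrate the format above, a minimal serializer might look like this. The index name and queries are hypothetical.

```python
import json
from typing import Any, Iterable, Mapping, Tuple


def to_msearch_ndjson(pairs: Iterable[Tuple[Mapping[str, Any], Mapping[str, Any]]]) -> str:
    """Serialize (header, body) pairs into an msearch NDJSON payload."""
    lines = []
    for header, body in pairs:
        # Each JSON object must fit on a single line, so no indentation.
        lines.append(json.dumps(header, separators=(",", ":")))
        lines.append(json.dumps(body, separators=(",", ":")))
    # Every line, including the last one, ends with "\n".
    return "".join(line + "\n" for line in lines)


searches = [
    ({"index": "my-index"}, {"query": {"match": {"message": "this is a test"}}}),
    ({}, {"query": {"match_all": {}}}),  # empty header: use the request's default index
]
payload = to_msearch_ndjson(searches)
```

With the client, you pass the same header and body dicts as one flat list to client.msearch(searches=...). Each entry of the responses array in the result corresponds to one header/body pair, in order.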

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/search-multi-search.html

Parameters:
  • searches (Sequence[Mapping[str, Any]] | None)

  • index (str | Sequence[str] | None) – Comma-separated list of data streams, indices, and index aliases to search.

  • allow_no_indices (bool | None) – If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar.

  • ccs_minimize_roundtrips (bool | None) – If true, network roundtrips between the coordinating node and remote clusters are minimized for cross-cluster search requests.

  • expand_wildcards (Sequence[str | Literal['all', 'closed', 'hidden', 'none', 'open']] | str | ~typing.Literal['all', 'closed', 'hidden', 'none', 'open'] | None) – Type of index that wildcard expressions can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams.

  • ignore_throttled (bool | None) – If true, concrete, expanded or aliased indices are ignored when frozen.

  • ignore_unavailable (bool | None) – If true, missing or closed indices are not included in the response.

  • include_named_queries_score (bool | None) – Indicates whether hit.matched_queries should be rendered as a map that includes the name of the matched query associated with its score (true) or as an array containing the name of the matched queries (false). This functionality reruns each named query on every hit in a search response. Typically, this adds a small overhead to a request. However, using computationally expensive named queries on a large number of hits may add significant overhead.

  • max_concurrent_searches (int | None) – Maximum number of concurrent searches the multi search API can execute.

  • max_concurrent_shard_requests (int | None) – Maximum number of concurrent shard requests that each sub-search request executes per node.

  • pre_filter_shard_size (int | None) – Defines a threshold that enforces a pre-filter roundtrip to prefilter search shards based on query rewriting if the number of shards the search request expands to exceeds the threshold. This pre-filter roundtrip can significantly reduce the number of shards searched if, for example, a shard cannot match any documents based on its rewrite method (for instance, if date filters are mandatory to match but the shard bounds and the query are disjoint).

  • rest_total_hits_as_int (bool | None) – If true, hits.total are returned as an integer in the response. Defaults to false, which returns an object.

  • routing (str | None) – Custom routing value used to route search operations to a specific shard.

  • search_type (str | Literal['dfs_query_then_fetch', 'query_then_fetch'] | None) – Indicates whether global term and document frequencies should be used when scoring returned documents.

  • typed_keys (bool | None) – Specifies whether aggregation and suggester names should be prefixed by their respective types in the response.

  • body (Sequence[Mapping[str, Any]] | None)

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type:

ObjectApiResponse[Any]

msearch_template(*, search_templates=None, body=None, index=None, ccs_minimize_roundtrips=None, error_trace=None, filter_path=None, human=None, max_concurrent_searches=None, pretty=None, rest_total_hits_as_int=None, search_type=None, typed_keys=None)

Run multiple templated searches.

Run multiple templated searches with a single request. If you are providing a text file or text input to curl, use the --data-binary flag instead of -d to preserve newlines. For example:

$ cat requests
{ "index": "my-index" }
{ "id": "my-search-template", "params": { "query_string": "hello world", "from": 0, "size": 10 }}
{ "index": "my-other-index" }
{ "id": "my-other-search-template", "params": { "query_type": "match_all" }}

$ curl -H "Content-Type: application/x-ndjson" -XGET localhost:9200/_msearch/template --data-binary "@requests"; echo
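The curl example above can be expressed with the Python client by passing the header/body pairs as one flat list to search_templates. This sketch assumes client is an Elasticsearch instance and that both stored templates already exist.

```python
from typing import Any, Dict, List


def build_template_searches() -> List[Dict[str, Any]]:
    """Header/body pairs equivalent to the `requests` file in the curl example."""
    return [
        {"index": "my-index"},
        {"id": "my-search-template", "params": {"query_string": "hello world", "from": 0, "size": 10}},
        {"index": "my-other-index"},
        {"id": "my-other-search-template", "params": {"query_type": "match_all"}},
    ]


def run_template_searches(client) -> List[Dict[str, Any]]:
    resp = client.msearch_template(search_templates=build_template_searches())
    # One entry per search, in request order; a failed search carries an "error" key.
    return resp["responses"]
```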

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/multi-search-template.html

Parameters:
  • search_templates (Sequence[Mapping[str, Any]] | None)

  • index (str | Sequence[str] | None) – A comma-separated list of data streams, indices, and aliases to search. It supports wildcards (*). To search all data streams and indices, omit this parameter or use *.

  • ccs_minimize_roundtrips (bool | None) – If true, network round-trips are minimized for cross-cluster search requests.

  • max_concurrent_searches (int | None) – The maximum number of concurrent searches the API can run.

  • rest_total_hits_as_int (bool | None) – If true, the response returns hits.total as an integer. If false, it returns hits.total as an object.

  • search_type (str | Literal['dfs_query_then_fetch', 'query_then_fetch'] | None) – The type of the search operation.

  • typed_keys (bool | None) – If true, the response prefixes aggregation and suggester names with their respective types.

  • body (Sequence[Mapping[str, Any]] | None)

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type:

ObjectApiResponse[Any]

mtermvectors(*, index=None, docs=None, error_trace=None, field_statistics=None, fields=None, filter_path=None, human=None, ids=None, offsets=None, payloads=None, positions=None, preference=None, pretty=None, realtime=None, routing=None, term_statistics=None, version=None, version_type=None, body=None)

Get multiple term vectors.

Get multiple term vectors with a single request. You can specify existing documents by index and ID or provide artificial documents in the body of the request. You can specify the index in the request body or request URI. The response contains a docs array with all the fetched term vectors. Each element has the structure provided by the termvectors API.

Artificial documents

You can also use mtermvectors to generate term vectors for artificial documents provided in the body of the request. The mapping used is determined by the specified _index.
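Here is a sketch that mixes an existing document and an artificial one in a single mtermvectors call. The message field and the index name are hypothetical, and client is assumed to be an Elasticsearch instance.

```python
from typing import Any, Dict, List


def term_vectors_for(client, index: str, doc_id: str, text: str) -> List[Dict[str, Any]]:
    """Get term vectors for one stored document and one artificial document."""
    resp = client.mtermvectors(
        index=index,
        docs=[
            # An existing document, addressed by ID in `index`.
            {"_id": doc_id, "fields": ["message"]},
            # An artificial document: analyzed with the mapping of `index`, never stored.
            {"doc": {"message": text}},
        ],
        term_statistics=True,  # default applied to every entry in docs
    )
    return resp["docs"]
```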

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docs-multi-termvectors.html

Parameters:
  • index (str | None) – The name of the index that contains the documents.

  • docs (Sequence[Mapping[str, Any]] | None) – An array of existing or artificial documents.

  • field_statistics (bool | None) – If true, the response includes the document count, sum of document frequencies, and sum of total term frequencies.

  • fields (str | Sequence[str] | None) – A comma-separated list of fields, or wildcard expressions, to include in the statistics. It is used as the default list unless a specific field list is provided in the completion_fields or fielddata_fields parameters.

  • ids (Sequence[str] | None) – A simplified syntax to specify documents by their ID if they’re in the same index.

  • offsets (bool | None) – If true, the response includes term offsets.

  • payloads (bool | None) – If true, the response includes term payloads.

  • positions (bool | None) – If true, the response includes term positions.

  • preference (str | None) – The node or shard the operation should be performed on. It is random by default.

  • realtime (bool | None) – If true, the request is real-time as opposed to near-real-time.

  • routing (str | None) – A custom value used to route operations to a specific shard.

  • term_statistics (bool | None) – If true, the response includes term frequency and document frequency.

  • version (int | None) – An explicit version number for concurrency control.

  • version_type (str | Literal['external', 'external_gte', 'force', 'internal'] | None) – The version type.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

open_point_in_time(*, index, keep_alive, allow_partial_search_results=None, error_trace=None, expand_wildcards=None, filter_path=None, human=None, ignore_unavailable=None, index_filter=None, preference=None, pretty=None, routing=None, body=None)

Open a point in time.

A search request by default runs against the most recent visible data of the target indices, which is called point in time. An Elasticsearch point in time (PIT) is a lightweight view into the state of the data as it existed when initiated. In some cases, it’s preferred to perform multiple search requests using the same point in time. For example, if refreshes happen between search_after requests, then the results of those requests might not be consistent as changes happening between searches are only visible to the more recent point in time.

A point in time must be opened explicitly before being used in search requests.

A subsequent search request with the pit parameter must not specify index, routing, or preference values as these parameters are copied from the point in time.

Just like regular searches, you can use from and size to page through point in time search results, up to the first 10,000 hits. If you want to retrieve more hits, use PIT with search_after.

IMPORTANT: The open point in time request and each subsequent search request can return different identifiers; always use the most recently received ID for the next search request.

When a PIT that contains shard failures is used in a search request, the missing shards are always reported in the search response as a NoShardAvailableActionException. To get rid of these exceptions, create a new PIT so that shards missing from the previous PIT can be handled, assuming they have become available in the meantime.

Keeping point in time alive

The keep_alive parameter, which is passed to an open point in time request and to search requests, extends the time to live of the corresponding point in time. The value does not need to be long enough to process all data; it just needs to be long enough for the next request.

Normally, the background merge process optimizes the index by merging together smaller segments to create new, bigger segments. Once the smaller segments are no longer needed they are deleted. However, open point-in-times prevent the old segments from being deleted since they are still in use.

TIP: Keeping older segments alive means that more disk space and file handles are needed. Ensure that you have configured your nodes to have ample free file handles.

Additionally, if a segment contains deleted or updated documents then the point in time must keep track of whether each document in the segment was live at the time of the initial search request. Ensure that your nodes have sufficient heap space if you have many open point-in-times on an index that is subject to ongoing deletes or updates. Note that a point-in-time doesn't prevent its associated indices from being deleted. You can check how many point-in-times (that is, search contexts) are open with the nodes stats API.
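The rules above can be combined into a search_after pagination loop: open the PIT explicitly, omit index from the later searches, always reuse the newest ID, and extend keep_alive on every request. This is a sketch, not an official helper. It assumes client is an Elasticsearch instance, sorts on _shard_doc as a tiebreaker, and releases the PIT with close_point_in_time when done.

```python
from typing import Any, Dict, Iterator, Mapping


def scan_with_pit(
    client,
    index: str,
    query: Mapping[str, Any],
    page_size: int = 1000,
    keep_alive: str = "1m",
) -> Iterator[Dict[str, Any]]:
    """Yield every hit matching `query`, paging with a point in time and search_after."""
    pit_id = client.open_point_in_time(index=index, keep_alive=keep_alive)["id"]
    search_after = None
    try:
        while True:
            kwargs: Dict[str, Any] = {
                # No index, routing, or preference: they are copied from the PIT.
                "pit": {"id": pit_id, "keep_alive": keep_alive},  # extends the PIT each time
                "query": dict(query),
                "size": page_size,
                "sort": [{"_shard_doc": "asc"}],  # tiebreaker sort for PIT pagination
            }
            if search_after is not None:
                kwargs["search_after"] = search_after
            resp = client.search(**kwargs)
            pit_id = resp["pit_id"]  # always use the most recently returned ID
            hits = resp["hits"]["hits"]
            if not hits:
                break
            yield from hits
            search_after = hits[-1]["sort"]
    finally:
        # Release the PIT so the segments it holds can be merged away and deleted.
        client.close_point_in_time(id=pit_id)
```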

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/point-in-time-api.html

Parameters:
  • index (str | Sequence[str]) – A comma-separated list of index names to open a point in time on; use _all or an empty string to perform the operation on all indices.

  • keep_alive (str | Literal[-1] | ~typing.Literal[0]) – Extend the length of time that the point in time persists.

  • allow_partial_search_results (bool | None) – Indicates whether the point in time tolerates unavailable shards or shard failures when initially creating the PIT. If false, creating a point in time request when a shard is missing or unavailable will throw an exception. If true, the point in time will contain all the shards that are available at the time of the request.

  • expand_wildcards (Sequence[str | Literal['all', 'closed', 'hidden', 'none', 'open']] | str | ~typing.Literal['all', 'closed', 'hidden', 'none', 'open'] | None) – The type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. It supports comma-separated values, such as open,hidden. Valid values are: all, open, closed, hidden, none.

  • ignore_unavailable (bool | None) – If false, the request returns an error if it targets a missing or closed index.

  • index_filter (Mapping[str, Any] | None) – Filter indices if the provided query rewrites to match_none on every shard.

  • preference (str | None) – The node or shard the operation should be performed on. By default, it is random.

  • routing (str | None) – A custom value that is used to route operations to a specific shard.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

ping(*, error_trace=None, filter_path=None, human=None, pretty=None)

Returns True if the info() API returns a successful response, otherwise returns False. This API call can fail either at the transport layer (due to connection errors or timeouts) or with a non-2XX HTTP response (due to authentication or authorization issues).

If you want to discover why the request failed you should use the info() API.

https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

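Parameters:
  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type: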

bool
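Because ping() only reports success or failure, a small helper can fall back to info() to find out why the cluster is unreachable. This is a sketch: diagnose is a hypothetical name, and client is assumed to be an Elasticsearch instance.

```python
from typing import Optional


def diagnose(client) -> Optional[str]:
    """Return None if the cluster answers, otherwise a short description of the failure."""
    if client.ping():
        return None
    # ping() swallows the error; info() raises it (e.g. ConnectionError, AuthenticationException).
    try:
        client.info()
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return "ping failed, but info() succeeded on retry"
```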

put_script(*, id, script=None, context=None, error_trace=None, filter_path=None, human=None, master_timeout=None, pretty=None, timeout=None, body=None)

Create or update a script or search template. Creates or updates a stored script or search template.

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/create-stored-script-api.html

Parameters:
  • id (str) – The identifier for the stored script or search template. It must be unique within the cluster.

  • script (Mapping[str, Any] | None) – The script or search template, its parameters, and its language.

  • context (str | None) – The context in which the script or search template should run. To prevent errors, the API immediately compiles the script or template in this context.

  • master_timeout (str | Literal[-1] | ~typing.Literal[0] | None) – The period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error. It can also be set to -1 to indicate that the request should never timeout.

  • timeout (str | Literal[-1] | ~typing.Literal[0] | None) – The period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error. It can also be set to -1 to indicate that the request should never timeout.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
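Here is a sketch that stores a Mustache search template through put_script. The template ID and its query_string, from, and size parameters match those used in the msearch_template curl example. The message field is hypothetical, and client is assumed to be an Elasticsearch instance.

```python
def store_search_template(client, template_id: str = "my-search-template"):
    """Store a Mustache search template that takes query_string, from, and size params."""
    return client.put_script(
        id=template_id,
        script={
            "lang": "mustache",
            "source": {
                "query": {"match": {"message": "{{query_string}}"}},  # hypothetical field
                "from": "{{from}}",
                "size": "{{size}}",
            },
        },
    )
```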

rank_eval(*, requests=None, index=None, allow_no_indices=None, error_trace=None, expand_wildcards=None, filter_path=None, human=None, ignore_unavailable=None, metric=None, pretty=None, search_type=None, body=None)

Evaluate ranked search results.

Evaluate the quality of ranked search results over a set of typical search queries.

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/search-rank-eval.html

Parameters:
  • requests (Sequence[Mapping[str, Any]] | None) – A set of typical search requests, together with their provided ratings.

  • index (str | Sequence[str] | None) – A comma-separated list of data streams, indices, and index aliases used to limit the request. Wildcard (*) expressions are supported. To target all data streams and indices in a cluster, omit this parameter or use _all or *.

  • allow_no_indices (bool | None) – If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar.

  • expand_wildcards (Sequence[str | Literal['all', 'closed', 'hidden', 'none', 'open']] | str | ~typing.Literal['all', 'closed', 'hidden', 'none', 'open'] | None) – Whether to expand wildcard expression to concrete indices that are open, closed or both.

  • ignore_unavailable (bool | None) – If true, missing or closed indices are not included in the response.

  • metric (Mapping[str, Any] | None) – Definition of the evaluation metric to calculate.

  • search_type (str | None) – Search operation type.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
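Here is a sketch of a precision@k evaluation with a single rated query. The query, the text field, the document IDs, and the ratings are made up for illustration, and client is assumed to be an Elasticsearch instance.

```python
def precision_at_10(client, index: str):
    """Evaluate precision@10 for one query against hand-rated documents."""
    resp = client.rank_eval(
        index=index,
        requests=[
            {
                "id": "amsterdam_query",
                "request": {"query": {"match": {"text": "amsterdam"}}},
                # Relevance judgments; with the threshold below, rating >= 1 is relevant.
                "ratings": [
                    {"_index": index, "_id": "doc1", "rating": 0},
                    {"_index": index, "_id": "doc2", "rating": 1},
                    {"_index": index, "_id": "doc3", "rating": 1},
                ],
            }
        ],
        metric={"precision": {"k": 10, "relevant_rating_threshold": 1}},
    )
    # "metric_score" is the overall score; "details" holds per-query results.
    return resp["metric_score"]
```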

reindex(*, dest=None, source=None, conflicts=None, error_trace=None, filter_path=None, human=None, max_docs=None, pretty=None, refresh=None, requests_per_second=None, require_alias=None, script=None, scroll=None, size=None, slices=None, timeout=None, wait_for_active_shards=None, wait_for_completion=None, body=None)

Reindex documents.

Copy documents from a source to a destination. You can copy all documents to the destination index or reindex a subset of the documents. The source can be any existing index, alias, or data stream. The destination must differ from the source. For example, you cannot reindex a data stream into itself.

IMPORTANT: Reindex requires _source to be enabled for all documents in the source. The destination should be configured as wanted before calling the reindex API. Reindex does not copy the settings from the source or its associated template. Mappings, shard counts, and replicas, for example, must be configured ahead of time.

If the Elasticsearch security features are enabled, you must have the following security privileges:

  • The read index privilege for the source data stream, index, or alias.
  • The write index privilege for the destination data stream, index, or index alias.
  • To automatically create a data stream or index with a reindex API request, you must have the auto_configure, create_index, or manage index privilege for the destination data stream, index, or alias.
  • If reindexing from a remote cluster, the source.remote.user must have the monitor cluster privilege and the read index privilege for the source data stream, index, or alias.

If reindexing from a remote cluster, you must explicitly allow the remote host in the reindex.remote.whitelist setting. Automatic data stream creation requires a matching index template with data stream enabled.

The dest element can be configured like the index API to control optimistic concurrency. Omitting version_type or setting it to internal causes Elasticsearch to blindly dump documents into the destination, overwriting any that happen to have the same ID.

Setting version_type to external causes Elasticsearch to preserve the version from the source, create any documents that are missing, and update any documents that have an older version in the destination than they do in the source.

Setting op_type to create causes the reindex API to create only missing documents in the destination. All existing documents will cause a version conflict.

IMPORTANT: Because data streams are append-only, any reindex request to a destination data stream must have an op_type of create. A reindex can only add new documents to a destination data stream. It cannot update existing documents in a destination data stream.

By default, version conflicts abort the reindex process. To continue reindexing if there are conflicts, set the conflicts request body property to proceed. In this case, the response includes a count of the version conflicts that were encountered. Note that the handling of other error types is unaffected by the conflicts property. Additionally, if you opt to count version conflicts, the operation could attempt to reindex more documents from the source than max_docs until it has successfully indexed max_docs documents into the target or it has gone through every document in the source query.

NOTE: The reindex API makes no effort to handle ID collisions. The last document written will "win" but the order isn't usually predictable so it is not a good idea to rely on this behavior. Instead, make sure that IDs are unique by using a script.
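The dest and conflicts options described above can be sketched as two calls. The first appends to a data stream with op_type set to create and counts conflicts instead of aborting. The second uses external versioning. Index and data stream names are parameters, and client is assumed to be an Elasticsearch instance.

```python
def reindex_into_data_stream(client, source_index: str, dest_stream: str):
    """Copy documents into a data stream, counting IDs that already exist there."""
    return client.reindex(
        source={"index": source_index},
        # Data streams are append-only, so op_type must be "create".
        dest={"index": dest_stream, "op_type": "create"},
        # Count version conflicts in the response instead of aborting.
        conflicts="proceed",
    )


def sync_newer_versions(client, source_index: str, dest_index: str):
    """Create missing documents and overwrite only those older in the destination."""
    return client.reindex(
        source={"index": source_index},
        dest={"index": dest_index, "version_type": "external"},
    )
```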

Running reindex asynchronously

If the request contains wait_for_completion=false, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task. Elasticsearch creates a record of this task as a document at _tasks/<task_id>.
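Here is a sketch that starts a reindex with wait_for_completion=False and then polls the tasks API (client.tasks.get) until the task completes. The polling interval is an arbitrary choice, and client is assumed to be an Elasticsearch instance.

```python
import time
from typing import Any, Dict


def reindex_in_background(client, source_index: str, dest_index: str, poll_seconds: float = 5.0) -> Dict[str, Any]:
    """Start a reindex as a task and poll the tasks API until it completes."""
    resp = client.reindex(
        source={"index": source_index},
        dest={"index": dest_index},
        wait_for_completion=False,  # return a task ID immediately
    )
    task_id = resp["task"]  # of the form "<node_id>:<task_number>"
    while True:
        status = client.tasks.get(task_id=task_id)
        if status["completed"]:
            # "response" holds the usual reindex result; "error" is set if it failed.
            return status
        time.sleep(poll_seconds)
```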

Reindex from multiple sources

If you have many sources to reindex it is generally better to reindex them one at a time rather than using a glob pattern to pick up multiple sources. That way you can resume the process if there are any errors by removing the partially completed source and starting over. It also makes parallelizing the process fairly simple: split the list of sources to reindex and run each list in parallel.

For example, you can use a bash script like this:

for index in i1 i2 i3 i4 i5; do
  curl -HContent-Type:application/json -XPOST localhost:9200/_reindex?pretty -d'{
    "source": {
      "index": "'$index'"
    },
    "dest": {
      "index": "'$index'-reindexed"
    }
  }'
done
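The same one-index-at-a-time loop can be written in Python while recording which sources failed, so their partially written destinations can be removed and retried. This is a sketch: the -reindexed suffix mirrors the bash example, and client is assumed to be an Elasticsearch instance.

```python
from typing import Dict, Iterable, List


def reindex_each(client, indices: Iterable[str], suffix: str = "-reindexed") -> Dict[str, List[str]]:
    """Reindex sources one at a time, recording which ones failed."""
    done: List[str] = []
    failed: List[str] = []
    for index in indices:
        try:
            resp = client.reindex(source={"index": index}, dest={"index": index + suffix})
        except Exception:
            failed.append(index)
            continue
        # A completed request can still report per-document failures.
        if resp["failures"]:
            failed.append(index)
        else:
            done.append(index)
    return {"done": done, "failed": failed}
```

To parallelize, split the list of sources and run reindex_each on each part, for example from a concurrent.futures.ThreadPoolExecutor.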

Throttling

Set requests_per_second to any positive decimal number (1.4, 6, 1000, for example) to throttle the rate at which reindex issues batches of index operations. Requests are throttled by padding each batch with a wait time. To turn off throttling, set requests_per_second to -1.

The throttling is done by waiting between batches so that the scroll that reindex uses internally can be given a timeout that takes into account the padding. The padding time is the difference between the batch size divided by the requests_per_second and the time spent writing. By default the batch size is 1000, so if requests_per_second is set to 500:

target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds

Since the batch is issued as a single bulk request, large batch sizes cause Elasticsearch to create many requests and then wait for a while before starting the next set. This is "bursty" instead of "smooth".
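The padding arithmetic above can be written as a small function. Clamping the wait at zero when a write takes longer than the target time is an assumption of this sketch and is not stated above.

```python
def throttle_wait_seconds(batch_size: int, requests_per_second: float, write_seconds: float) -> float:
    """Padding wait added after one batch, following target_time - write_time."""
    if requests_per_second == -1:
        return 0.0  # -1 turns throttling off
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive or -1")
    target_seconds = batch_size / requests_per_second
    # Assumption: if writing took longer than the target, no padding is added.
    return max(0.0, target_seconds - write_seconds)
```

For the example above, throttle_wait_seconds(1000, 500, 0.5) gives 1.5 seconds.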

Slicing

Reindex supports sliced scroll to parallelize the reindexing process. This parallelization can improve efficiency and provide a convenient way to break the request down into smaller parts.

NOTE: Reindexing from remote clusters does not support manual or automatic slicing.

You can slice a reindex request manually by providing a slice ID and total number of slices to each request. You can also let reindex automatically parallelize by using sliced scroll to slice on _id. The slices parameter specifies the number of slices to use.

Adding slices to the reindex request just automates the manual process by creating sub-requests, which means it has some quirks:

  • You can see these requests in the tasks API. These sub-requests are "child" tasks of the task for the request with slices.
  • Fetching the status of the task for the request with slices only contains the status of completed slices.
  • These sub-requests are individually addressable for things like cancellation and rethrottling.
  • Rethrottling the request with slices will rethrottle the unfinished sub-request proportionally.
  • Canceling the request with slices will cancel each sub-request.
  • Due to the nature of slices, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.
  • Parameters like requests_per_second and max_docs on a request with slices are distributed proportionally to each sub-request. Combine that with the previous point about distribution being uneven and you should conclude that using max_docs with slices might not result in exactly max_docs documents being reindexed.
  • Each sub-request gets a slightly different snapshot of the source, though these are all taken at approximately the same time.

If slicing automatically, setting slices to auto will choose a reasonable number for most indices. If slicing manually or otherwise tuning automatic slicing, use the following guidelines.

Query performance is most efficient when the number of slices is equal to the number of shards in the index. If that number is large (for example, 500), choose a lower number as too many slices will hurt performance. Setting slices higher than the number of shards generally does not improve efficiency and adds overhead.

Indexing performance scales linearly across available resources with the number of slices.

Whether query or indexing performance dominates the runtime depends on the documents being reindexed and cluster resources.

Modify documents during reindexing

Like _update_by_query, reindex operations support a script that modifies the document. Unlike _update_by_query, the script is allowed to modify the document's metadata.

Just as in _update_by_query, you can set ctx.op to change the operation that is run on the destination. For example, set ctx.op to noop if your script decides that the document doesn’t have to be indexed in the destination. This "no operation" will be reported in the noop counter in the response body. Set ctx.op to delete if your script decides that the document must be deleted from the destination. The deletion will be reported in the deleted counter in the response body. Setting ctx.op to anything else will return an error, as will setting any other field in ctx.
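Here is a sketch of a reindex script that sets `ctx.op`. The `status` and `purge` fields and the helper names are hypothetical. The script skips archived documents with `noop` and deletes flagged documents from the destination. It assumes `client` is an `AsyncElasticsearch` instance.

```python
from typing import Any, Dict

# Painless script: skip archived documents (noop) and delete documents
# flagged for removal from the destination. Field names are hypothetical.
SKIP_OR_DELETE = """
if (ctx._source.status == 'archived') {
  ctx.op = 'noop';
} else if (ctx._source.purge == true) {
  ctx.op = 'delete';
}
"""


def reindex_kwargs(src: str, dst: str) -> Dict[str, Any]:
    return {
        "source": {"index": src},
        "dest": {"index": dst},
        "script": {"source": SKIP_OR_DELETE, "lang": "painless"},
    }


async def reindex_filtered(client, src: str, dst: str):
    # The response body counts skipped and deleted documents separately.
    return await client.reindex(**reindex_kwargs(src, dst))
```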

Think of the possibilities! Just be careful; you are able to change:

  • _id
  • _index
  • _version
  • _routing

Setting _version to null or clearing it from the ctx map is just like not sending the version in an indexing request. It will cause the document to be overwritten in the destination regardless of the version on the target or the version type you use in the reindex API.
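The next sketch is a script that rewrites metadata. Each document is sent to a per-team index, and `_version` is cleared so that the write overwrites any existing document. The `team` field, the `logs-` prefix and the helper name are hypothetical. A `dest.index` is still supplied, and the script overrides it for each document.

```python
from typing import Any, Dict

# Route each document to a per-team index and drop its version so the
# destination copy is overwritten regardless of version (names hypothetical).
ROUTE_BY_TEAM = "ctx._index = 'logs-' + ctx._source.team; ctx._version = null;"


def route_by_team_kwargs(src: str) -> Dict[str, Any]:
    return {
        "source": {"index": src},
        # Overridden per document by the script's ctx._index assignment.
        "dest": {"index": "logs-default"},
        "script": {"source": ROUTE_BY_TEAM, "lang": "painless"},
    }
```

With the async client you would call it as `await client.reindex(**route_by_team_kwargs("logs-all"))`.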

Reindex from remote

Reindex supports reindexing from a remote Elasticsearch cluster. The host parameter must contain a scheme, host, port, and optional path. The username and password parameters are optional; when they are present, the reindex operation connects to the remote Elasticsearch node using basic authentication. Be sure to use HTTPS when using basic authentication or the password will be sent in plain text. There is a range of settings available to configure the behavior of the HTTPS connection.
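This minimal sketch builds the `source` argument for a remote reindex with basic authentication. It refuses non-HTTPS hosts and hosts without a port, which follows the rules above. The helper name is hypothetical.

```python
from typing import Any, Dict
from urllib.parse import urlsplit


def remote_source(host: str, index: str, username: str, password: str) -> Dict[str, Any]:
    """Build the `source` argument for reindexing from a remote cluster."""
    parts = urlsplit(host)
    # Basic auth over plain HTTP would send the password in clear text.
    if parts.scheme != "https":
        raise ValueError("use an https:// host when sending credentials")
    # The remote host must spell out its port, e.g. https://otherhost:9200.
    if parts.port is None:
        raise ValueError("the remote host must include a port")
    return {
        "remote": {"host": host, "username": username, "password": password},
        "index": index,
    }
```

Pass the result as `source=` to `client.reindex`, together with a `dest`.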

When using Elastic Cloud, it is also possible to authenticate against the remote cluster through the use of a valid API key.

Remote hosts must be explicitly allowed with the reindex.remote.whitelist setting. It can be set to a comma-delimited list of allowed remote host and port combinations. The scheme is ignored; only the host and port are used. For example:

reindex.remote.whitelist: [otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*]

The list of allowed hosts must be configured on any nodes that will coordinate the reindex. This feature should work with remote clusters of any version of Elasticsearch. This should enable you to upgrade from any version of Elasticsearch to the current version by reindexing from a cluster of the old version.

WARNING: Elasticsearch does not support forward compatibility across major versions. For example, you cannot reindex from a 7.x cluster into a 6.x cluster.

To enable queries sent to older versions of Elasticsearch, the query parameter is sent directly to the remote host without validation or modification.

NOTE: Reindexing from remote clusters does not support manual or automatic slicing.

Reindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb. If the remote index includes very large documents you'll need to use a smaller batch size. It is also possible to set the socket read timeout on the remote connection with the socket_timeout field and the connection timeout with the connect_timeout field. Both default to 30 seconds.
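Here is a sketch of a remote `source` that shrinks the batch size and sets both timeouts. The values `"1m"`, `"10s"` and 100 documents are illustrative, not recommendations, and the helper name is hypothetical.

```python
from typing import Any, Dict


def remote_batch_source(
    host: str,
    index: str,
    batch_size: int = 100,
    socket_timeout: str = "1m",
    connect_timeout: str = "10s",
) -> Dict[str, Any]:
    return {
        "remote": {
            "host": host,
            "socket_timeout": socket_timeout,    # read timeout (defaults to 30s)
            "connect_timeout": connect_timeout,  # connection timeout (defaults to 30s)
        },
        "index": index,
        # Documents per batch; smaller batches keep large documents
        # within the on-heap buffer (100mb by default).
        "size": batch_size,
    }
```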

Configuring SSL parameters

Reindex from remote supports configurable SSL settings. These must be specified in the elasticsearch.yml file, with the exception of the secure settings, which you add in the Elasticsearch keystore. It is not possible to configure SSL in the body of the reindex request.

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docs-reindex.html

Parameters:
  • dest (Mapping[str, Any] | None) – The destination you are copying to.

  • source (Mapping[str, Any] | None) – The source you are copying from.

  • conflicts (str | Literal['abort', 'proceed'] | None) – Indicates whether to continue reindexing even when there are conflicts.

  • max_docs (int | None) – The maximum number of documents to reindex. By default, all documents are reindexed. If it is a value less than or equal to scroll_size, a scroll will not be used to retrieve the results for the operation. If conflicts is set to proceed, the reindex operation could attempt to reindex more documents from the source than max_docs until it has successfully indexed max_docs documents into the target or it has gone through every document in the source query.

  • refresh (bool | None) – If true, the request refreshes affected shards to make this operation visible to search.

  • requests_per_second (float | None) – The throttle for this request in sub-requests per second. By default, there is no throttle.

  • require_alias (bool | None) – If true, the destination must be an index alias.

  • script (Mapping[str, Any] | None) – The script to run to update the document source or metadata when reindexing.

  • scroll (str | Literal[-1] | Literal[0] | None) – The period of time that a consistent view of the index should be maintained for scrolled search.

  • size (int | None)

  • slices (int | str | Literal['auto'] | None) – The number of slices this task should be divided into. It defaults to one slice, which means the task isn’t sliced into subtasks. Reindex supports sliced scroll to parallelize the reindexing process. This parallelization can improve efficiency and provide a convenient way to break the request down into smaller parts. NOTE: Reindexing from remote clusters does not support manual or automatic slicing. If set to auto, Elasticsearch chooses the number of slices to use. This setting will use one slice per shard, up to a certain limit. If there are multiple sources, it will choose the number of slices based on the index or backing index with the smallest number of shards.

  • timeout (str | Literal[-1] | Literal[0] | None) – The period each indexing operation waits for automatic index creation, dynamic mapping updates, and waiting for active shards. By default, Elasticsearch waits for at least one minute before failing. The actual wait time could be longer, particularly when multiple waits occur.

  • wait_for_active_shards (int | str | Literal['all', 'index-setting'] | None) – The number of shard copies that must be active before proceeding with the operation. Set it to all or any positive integer up to the total number of shards in the index (number_of_replicas+1). The default value is one, which means it waits for each primary shard to be active.

  • wait_for_completion (bool | None) – If true, the request blocks until the operation is complete.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
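The following sketch runs a reindex in the background with the async client and polls the tasks API until it finishes. It assumes `client` is an `AsyncElasticsearch` instance. The helper name and the 5-second poll interval are hypothetical. With `wait_for_completion=False`, the response carries a task ID instead of the final counters.

```python
import asyncio
from typing import Any, Dict


async def reindex_in_background(client, src: str, dst: str, poll_seconds: float = 5.0) -> Dict[str, Any]:
    # Start the reindex as a task instead of blocking on the request.
    started = await client.reindex(
        source={"index": src},
        dest={"index": dst},
        conflicts="proceed",  # count version conflicts instead of aborting
        slices="auto",
        wait_for_completion=False,
    )
    task_id = started["task"]
    # Poll the tasks API until the task reports completion.
    while True:
        status = await client.tasks.get(task_id=task_id)
        if status["completed"]:
            return status
        await asyncio.sleep(poll_seconds)
```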

reindex_rethrottle(*, task_id, error_trace=None, filter_path=None, human=None, pretty=None, requests_per_second=None)

Throttle a reindex operation.

Change the number of requests per second for a particular reindex operation. For example:

POST _reindex/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_second=-1

Rethrottling that speeds up the query takes effect immediately. Rethrottling that slows down the query will take effect after completing the current batch. This behavior prevents scroll timeouts.
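The same rethrottle call with the async client is shown below as a small wrapper. The wrapper name is hypothetical, and passing `None` maps to `-1`, which turns throttling off. For the example above, you would call `await set_reindex_throttle(client, "r1A2WoRbTwKZ516z6NEs5A:36619", None)`.

```python
from typing import Optional


async def set_reindex_throttle(client, task_id: str, rps: Optional[float]):
    """Change a running reindex task's throttle; None removes throttling (-1)."""
    return await client.reindex_rethrottle(
        task_id=task_id,
        requests_per_second=-1 if rps is None else rps,
    )
```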

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docs-reindex.html

Parameters:
  • task_id (str) – The task identifier, which can be found by using the tasks API.

  • requests_per_second (float | None) – The throttle for this request in sub-requests per second. It can be either -1 to turn off throttling or any decimal number like 1.7 or 12 to throttle to that level.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type:

ObjectApiResponse[Any]

render_search_template(*, id=None, error_trace=None, file=None, filter_path=None, human=None, params=None, pretty=None, source=None, body=None)

Render a search template.

Render a search template as a search request body.

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/render-search-template-api.html

Parameters:
  • id (str | None) – The ID of the search template to render. If no source is specified, this or the id request body parameter is required.

  • file (str | None)

  • params (Mapping[str, Any] | None) – Key-value pairs used to replace Mustache variables in the template. The key is the variable name. The value is the variable value.

  • source (str | None) – An inline search template. It supports the same parameters as the search API’s request body. These parameters also support Mustache variables. If no id or <templated-id> is specified, this parameter is required.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
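This sketch renders an inline template, passed as a JSON string to match the `source (str | None)` type above, with Mustache parameters. The template, its variable names and the helper name are hypothetical. It assumes `client` is an `AsyncElasticsearch` instance and reads the rendered body from `template_output`.

```python
import json
from typing import Any, Dict

# A hypothetical inline template with three Mustache variables.
TEMPLATE = json.dumps({
    "query": {"match": {"{{field}}": "{{value}}"}},
    "size": "{{size}}",
})


async def render(client, field: str, value: str, size: int) -> Dict[str, Any]:
    resp = await client.render_search_template(
        source=TEMPLATE,
        params={"field": field, "value": value, "size": size},
    )
    # The rendered search request body is returned under template_output.
    return resp["template_output"]
```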

scripts_painless_execute(*, context=None, context_setup=None, error_trace=None, filter_path=None, human=None, pretty=None, script=None, body=None)

Run a script.

Runs a script and returns a result.

https://www.elastic.co/guide/en/elasticsearch/painless/8.17/painless-execute-api.html

Parameters:
  • context (str | None) – The context that the script should run in.

  • context_setup (Mapping[str, Any] | None) – Additional parameters for the context.

  • script (Mapping[str, Any] | None) – The Painless script to execute.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
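Here is a sketch that runs a small Painless script with parameters. The helper names are hypothetical, and `client` is assumed to be an `AsyncElasticsearch` instance. No `context` is passed, so the default `painless_test` context applies, and that context returns the result as a string. The sketch uses floats because Painless integer division truncates, as in Java.

```python
from typing import Any, Dict


def ratio_script(count: float, total: float) -> Dict[str, Any]:
    return {
        "source": "params.count / params.total",
        "params": {"count": count, "total": total},
    }


async def run_ratio(client, count: float, total: float) -> Any:
    resp = await client.scripts_painless_execute(script=ratio_script(count, total))
    # In the painless_test context the result is converted to a string.
    return resp["result"]
```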

scroll(*, scroll_id=None, error_trace=None, filter_path=None, human=None, pretty=None, rest_total_hits_as_int=None, scroll=None, body=None)

Run a scrolling search.

IMPORTANT: The scroll API is no longer recommended for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the search_after parameter with a point in time (PIT).
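The recommended alternative can be sketched as an async generator that pages with `search_after` over a PIT. It assumes `client` is an `AsyncElasticsearch` instance. The generator name, the `"1m"` keep-alive and the page size are illustrative. The search names no index, because a PIT search cannot specify one.

```python
from typing import Any, AsyncIterator, Dict


async def iter_all_hits(client, index: str, page_size: int = 1000) -> AsyncIterator[Dict[str, Any]]:
    """Yield every hit, paging with search_after over a point in time."""
    pit = await client.open_point_in_time(index=index, keep_alive="1m")
    pit_id = pit["id"]
    search_after = None
    try:
        while True:
            resp = await client.search(
                pit={"id": pit_id, "keep_alive": "1m"},
                query={"match_all": {}},
                # _shard_doc is a cheap tiebreaker that is unique within a PIT.
                sort=[{"_shard_doc": "asc"}],
                size=page_size,
                search_after=search_after,  # None on the first page is omitted
            )
            hits = resp["hits"]["hits"]
            if not hits:
                break
            for hit in hits:
                yield hit
            # Resume after the last hit, using the PIT id returned by the search.
            search_after = hits[-1]["sort"]
            pit_id = resp["pit_id"]
    finally:
        await client.close_point_in_time(id=pit_id)
```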

The scroll API gets large sets of results from a single scrolling search request. To get the necessary scroll ID, submit a search API request that includes an argument for the scroll query parameter. The scroll parameter indicates how long Elasticsearch should retain the search context for the request. The search response returns a scroll ID in the _scroll_id response body parameter. You can then use the scroll ID with the scroll API to retrieve the next batch of results for the request. If the Elasticsearch security features are enabled, the access to the results of a specific scroll ID is restricted to the user or API key that submitted the search.

You can also use the scroll API to specify a new scroll parameter that extends or shortens the retention period for the search context.

IMPORTANT: Results from a scrolling search reflect the state of the index at the time of the initial search request. Subsequent indexing or document changes only affect later search and scroll requests.
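For existing code that still scrolls, a minimal loop looks like the sketch below. The generator name and the `"2m"` keep-alive are illustrative, and `client` is assumed to be an `AsyncElasticsearch` instance. The loop clears the scroll context when it finishes instead of waiting for it to expire. The client's `elasticsearch.helpers.async_scan` helper wraps a similar loop.

```python
from typing import Any, AsyncIterator, Dict


async def iter_scroll(client, index: str, keep_alive: str = "2m") -> AsyncIterator[Dict[str, Any]]:
    """Legacy scroll loop; prefer search_after with a PIT for new code."""
    resp = await client.search(index=index, query={"match_all": {}}, scroll=keep_alive, size=1000)
    scroll_id = resp["_scroll_id"]
    try:
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit
            # Each call returns the next batch and may return a new scroll ID.
            resp = await client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp["_scroll_id"]
    finally:
        # Free the search context on the cluster.
        await client.clear_scroll(scroll_id=scroll_id)
```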

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/scroll-api.html

Parameters:
  • scroll_id (str | None) – The scroll ID of the search.

  • rest_total_hits_as_int (bool | None) – If true, the API response’s hits.total property is returned as an integer. If false, the API response’s hits.total property is returned as an object.

  • scroll (str | Literal[-1] | Literal[0] | None) – The period to retain the search context for scrolling.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

search(*, index=None, aggregations=None, aggs=None, allow_no_indices=None, allow_partial_search_results=None, analyze_wildcard=None, analyzer=None, batched_reduce_size=None, ccs_minimize_roundtrips=None, collapse=None, default_operator=None, df=None, docvalue_fields=None, error_trace=None, expand_wildcards=None, explain=None, ext=None, fields=None, filter_path=None, force_synthetic_source=None, from_=None, highlight=None, human=None, ignore_throttled=None, ignore_unavailable=None, include_named_queries_score=None, indices_boost=None, knn=None, lenient=None, max_concurrent_shard_requests=None, min_compatible_shard_node=None, min_score=None, pit=None, post_filter=None, pre_filter_shard_size=None, preference=None, pretty=None, profile=None, q=None, query=None, rank=None, request_cache=None, rescore=None, rest_total_hits_as_int=None, retriever=None, routing=None, runtime_mappings=None, script_fields=None, scroll=None, search_after=None, search_type=None, seq_no_primary_term=None, size=None, slice=None, sort=None, source=None, source_excludes=None, source_includes=None, stats=None, stored_fields=None, suggest=None, suggest_field=None, suggest_mode=None, suggest_size=None, suggest_text=None, terminate_after=None, timeout=None, track_scores=None, track_total_hits=None, typed_keys=None, version=None, body=None)

Run a search.

Get search hits that match the query defined in the request. You can provide search queries using the q query string parameter or the request body. If both are specified, only the query parameter is used.

If the Elasticsearch security features are enabled, you must have the read index privilege for the target data stream, index, or alias. For cross-cluster search, refer to the documentation about configuring CCS privileges. To search a point in time (PIT) for an alias, you must have the read index privilege for the alias's data streams or indices.

Search slicing

When paging through a large number of documents, it can be helpful to split the search into multiple slices to consume them independently with the slice and pit properties. By default the splitting is done first on the shards, then locally on each shard. The local splitting partitions the shard into contiguous ranges based on Lucene document IDs.

For instance, if the number of shards is 2 and you request 4 slices, slices 0 and 2 are assigned to the first shard and slices 1 and 3 are assigned to the second shard.
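The assignment in this example can be reproduced by a simple round-robin model, `slice_id % num_shards`. This is a simplified illustration that matches the example, not the server's implementation. It only covers the case where there are at least as many slices as shards. The function name is hypothetical.

```python
from typing import Dict, List


def slices_per_shard(num_slices: int, num_shards: int) -> Dict[int, List[int]]:
    """Group slice IDs by shard using a round-robin (modulo) model."""
    if num_slices < num_shards:
        raise ValueError("this model only covers num_slices >= num_shards")
    groups: Dict[int, List[int]] = {shard: [] for shard in range(num_shards)}
    for slice_id in range(num_slices):
        groups[slice_id % num_shards].append(slice_id)
    return groups
```

`slices_per_shard(4, 2)` gives `{0: [0, 2], 1: [1, 3]}`, which matches the example.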

IMPORTANT: The same point-in-time ID should be used for all slices. If different PIT IDs are used, slices can overlap and miss documents. This situation can occur because the splitting criterion is based on Lucene document IDs, which are not stable across changes to the index.
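This sketch consumes slices concurrently with the async client. Every slice shares one PIT ID, as the note above requires. It assumes `client` is an `AsyncElasticsearch` instance. The helper names, the slice count of 4 and the page size of 100 are illustrative. Only the first page of each slice is fetched; a real consumer would continue each slice with `search_after`.

```python
import asyncio
from typing import Any, Dict, List


async def search_slice(client, pit_id: str, slice_id: int, max_slices: int) -> List[Dict[str, Any]]:
    resp = await client.search(
        # Every slice uses the same PIT ID so slices neither overlap nor miss documents.
        pit={"id": pit_id, "keep_alive": "1m"},
        slice={"id": slice_id, "max": max_slices},
        query={"match_all": {}},
        size=100,
    )
    return resp["hits"]["hits"]


async def search_all_slices(client, index: str, max_slices: int = 4) -> List[List[Dict[str, Any]]]:
    pit = await client.open_point_in_time(index=index, keep_alive="1m")
    try:
        # Query all slices concurrently; results come back in slice order.
        return await asyncio.gather(
            *(search_slice(client, pit["id"], i, max_slices) for i in range(max_slices))
        )
    finally:
        await client.close_point_in_time(id=pit["id"])
```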

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/search-search.html

Parameters:
  • index (str | Sequence[str] | None) – A comma-separated list of data streams, indices, and aliases to search. It supports wildcards (*). To search all data streams and indices, omit this parameter or use * or _all.

  • aggregations (Mapping[str, Mapping[str, Any]] | None) – Defines the aggregations that are run as part of the search request.

  • aggs (Mapping[str, Mapping[str, Any]] | None) – Defines the aggregations that are run as part of the search request.

  • allow_no_indices (bool | None) – If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar.

  • allow_partial_search_results (bool | None) – If true and there are shard request timeouts or shard failures, the request returns partial results. If false, it returns an error with no partial results. To override the default behavior, you can set the search.default_allow_partial_results cluster setting to false.

  • analyze_wildcard (bool | None) – If true, wildcard and prefix queries are analyzed. This parameter can be used only when the q query string parameter is specified.

  • analyzer (str | None) – The analyzer to use for the query string. This parameter can be used only when the q query string parameter is specified.

  • batched_reduce_size (int | None) – The number of shard results that should be reduced at once on the coordinating node. If the potential number of shards in the request can be large, this value should be used as a protection mechanism to reduce the memory overhead per search request.

  • ccs_minimize_roundtrips (bool | None) – If true, network round-trips between the coordinating node and the remote clusters are minimized when running cross-cluster search (CCS) requests.

  • collapse (Mapping[str, Any] | None) – Collapses search results by the values of the specified field.

  • default_operator (str | Literal['and', 'or'] | None) – The default operator for the query string query: AND or OR. This parameter can be used only when the q query string parameter is specified.

  • df (str | None) – The field to use as a default when no field prefix is given in the query string. This parameter can be used only when the q query string parameter is specified.

  • docvalue_fields (Sequence[Mapping[str, Any]] | None) – An array of wildcard (*) field patterns. The request returns doc values for field names matching these patterns in the hits.fields property of the response.

  • expand_wildcards (Sequence[str | Literal['all', 'closed', 'hidden', 'none', 'open']] | str | Literal['all', 'closed', 'hidden', 'none', 'open'] | None) – The type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. It supports comma-separated values such as open,hidden.

  • explain (bool | None) – If true, the request returns detailed information about score computation as part of a hit.

  • ext (Mapping[str, Any] | None) – Configuration of search extensions defined by Elasticsearch plugins.

  • fields (Sequence[Mapping[str, Any]] | None) – An array of wildcard (*) field patterns. The request returns values for field names matching these patterns in the hits.fields property of the response.

  • force_synthetic_source (bool | None) – Should this request force synthetic _source? Use this to test if the mapping supports synthetic _source and to get a sense of the worst case performance. Fetches with this enabled will be slower than enabling synthetic source natively in the index.

  • from – The starting document offset, which must be non-negative. In the Python client, pass it as from_, because from is a reserved keyword. By default, you cannot page through more than 10,000 hits using the from and size parameters. To page through more hits, use the search_after parameter.

  • highlight (Mapping[str, Any] | None) – Specifies the highlighter to use for retrieving highlighted snippets from one or more fields in your search results.

  • ignore_throttled (bool | None) – If true, concrete, expanded or aliased indices will be ignored when frozen.

  • ignore_unavailable (bool | None) – If false, the request returns an error if it targets a missing or closed index.

  • include_named_queries_score (bool | None) – If true, the response includes the score contribution from any named queries. This functionality reruns each named query on every hit in a search response. Typically, this adds a small overhead to a request. However, using computationally expensive named queries on a large number of hits may add significant overhead.

  • indices_boost (Sequence[Mapping[str, float]] | None) – Boost the _score of documents from specified indices. The boost value is the factor by which scores are multiplied. A boost value greater than 1.0 increases the score. A boost value between 0 and 1.0 decreases the score.

  • knn (Mapping[str, Any] | Sequence[Mapping[str, Any]] | None) – The approximate kNN search to run.

  • lenient (bool | None) – If true, format-based query failures (such as providing text to a numeric field) in the query string will be ignored. This parameter can be used only when the q query string parameter is specified.

  • max_concurrent_shard_requests (int | None) – The number of shard requests per node that the search runs concurrently. Use this value to limit the impact of the search on the cluster.

  • min_compatible_shard_node (str | None) – The minimum version of the node that can handle the request. Any handling node with a lower version will fail the request.

  • min_score (float | None) – The minimum _score for matching documents. Documents with a lower _score are not included in the search results.

  • pit (Mapping[str, Any] | None) – Limit the search to a point in time (PIT). If you provide a PIT, you cannot specify an <index> in the request path.

  • post_filter (Mapping[str, Any] | None) – Use the post_filter parameter to filter search results. The search hits are filtered after the aggregations are calculated. A post filter has no impact on the aggregation results.

  • pre_filter_shard_size (int | None) – A threshold that enforces a pre-filter roundtrip to prefilter search shards based on query rewriting if the number of shards the search request expands to exceeds the threshold. This filter roundtrip can limit the number of shards significantly if, for instance, a shard cannot match any documents based on its rewrite method (for example, if date filters are mandatory to match but the shard bounds and the query are disjoint). When unspecified, the pre-filter phase is executed if any of these conditions is met:
    • The request targets more than 128 shards.
    • The request targets one or more read-only indices.
    • The primary sort of the query targets an indexed field.

  • preference (str | None) – The nodes and shards used for the search. By default, Elasticsearch selects from eligible nodes and shards using adaptive replica selection, accounting for allocation awareness. Valid values are:
    • _only_local to run the search only on shards on the local node.
    • _local to, if possible, run the search on shards on the local node, or if not, select shards using the default method.
    • _only_nodes:<node-id>,<node-id> to run the search on only the specified node IDs. If suitable shards exist on more than one selected node, use shards on those nodes using the default method. If none of the specified nodes are available, select shards from any available node using the default method.
    • _prefer_nodes:<node-id>,<node-id> to, if possible, run the search on the specified node IDs, or if not, select shards using the default method.
    • _shards:<shard>,<shard> to run the search only on the specified shards.
    • <custom-string> (any string that does not start with _) to route searches with the same <custom-string> to the same shards in the same order.

  • profile (bool | None) – Set to true to return detailed timing information about the execution of individual components in a search request. NOTE: This is a debugging tool and adds significant overhead to search execution.

  • q (str | None) – A query in the Lucene query string syntax. Query parameter searches do not support the full Elasticsearch Query DSL but are handy for testing. IMPORTANT: This parameter overrides the query parameter in the request body. If both parameters are specified, documents matching the query request body parameter are not returned.

  • query (Mapping[str, Any] | None) – The search definition using the Query DSL.

  • rank (Mapping[str, Any] | None) – The Reciprocal Rank Fusion (RRF) to use.

  • request_cache (bool | None) – If true, the caching of search results is enabled for requests where size is 0. It defaults to index level settings.

  • rescore (Mapping[str, Any] | Sequence[Mapping[str, Any]] | None) – Can be used to improve precision by reordering just the top (for example 100 - 500) documents returned by the query and post_filter phases.

  • rest_total_hits_as_int (bool | None) – Indicates whether hits.total should be rendered as an integer or an object in the rest search response.

  • retriever (Mapping[str, Any] | None) – A retriever is a specification to describe top documents returned from a search. A retriever replaces other elements of the search API that also return top documents such as query and knn.

  • routing (str | None) – A custom value that is used to route operations to a specific shard.

  • runtime_mappings (Mapping[str, Mapping[str, Any]] | None) – One or more runtime fields in the search request. These fields take precedence over mapped fields with the same name.

  • script_fields (Mapping[str, Mapping[str, Any]] | None) – Retrieve a script evaluation (based on different fields) for each hit.

  • scroll (str | Literal[-1] | Literal[0] | None) – The period to retain the search context for scrolling. By default, this value cannot exceed 1d (24 hours). You can change this limit by using the search.max_keep_alive cluster-level setting.

  • search_after (Sequence[None | bool | float | int | str | Any] | None) – Used to retrieve the next page of hits using a set of sort values from the previous page.

  • search_type (str | Literal['dfs_query_then_fetch', 'query_then_fetch'] | None) – Indicates how distributed term frequencies are calculated for relevance scoring.

  • seq_no_primary_term (bool | None) – If true, the request returns sequence number and primary term of the last modification of each hit.

  • size (int | None) – The number of hits to return, which must not be negative. By default, you cannot page through more than 10,000 hits using the from and size parameters. To page through more hits, use the search_after property.

  • slice (Mapping[str, Any] | None) – Split a scrolled search into multiple slices that can be consumed independently.

  • sort (Sequence[str | Mapping[str, Any]] | str | Mapping[str, Any] | None) – A comma-separated list of <field>:<direction> pairs.

  • source (bool | Mapping[str, Any] | None) – The source fields that are returned for matching documents. These fields are returned in the hits._source property of the search response. If the stored_fields property is specified, the _source property defaults to false. Otherwise, it defaults to true.

  • source_excludes (str | Sequence[str] | None) – A comma-separated list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in _source_includes query parameter. If the _source parameter is false, this parameter is ignored.

  • source_includes (str | Sequence[str] | None) – A comma-separated list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the _source_excludes query parameter. If the _source parameter is false, this parameter is ignored.

  • stats (Sequence[str] | None) – The stats groups to associate with the search. Each group maintains a statistics aggregation for its associated searches. You can retrieve these stats using the indices stats API.

  • stored_fields (str | Sequence[str] | None) – A comma-separated list of stored fields to return as part of a hit. If no fields are specified, no stored fields are included in the response. If this field is specified, the _source property defaults to false. You can pass _source: true to return both source fields and stored fields in the search response.

  • suggest (Mapping[str, Any] | None) – Defines a suggester that provides similar looking terms based on a provided text.

  • suggest_field (str | None) – The field to use for suggestions.

  • suggest_mode (str | Literal['always', 'missing', 'popular'] | None) – The suggest mode. This parameter can be used only when the suggest_field and suggest_text query string parameters are specified.

  • suggest_size (int | None) – The number of suggestions to return. This parameter can be used only when the suggest_field and suggest_text query string parameters are specified.

  • suggest_text (str | None) – The source text for which the suggestions should be returned. This parameter can be used only when the suggest_field query string parameter is specified.

  • terminate_after (int | None) – The maximum number of documents to collect for each shard. If a query reaches this limit, Elasticsearch terminates the query early. Elasticsearch collects documents before sorting. IMPORTANT: Use with caution. Elasticsearch applies this property to each shard handling the request. When possible, let Elasticsearch perform early termination automatically. Avoid specifying this property for requests that target data streams with backing indices across multiple data tiers. If set to 0 (default), the query does not terminate early.

  • timeout (str | None) – The period of time to wait for a response from each shard. If no response is received before the timeout expires, the request fails and returns an error. Defaults to no timeout.

  • track_scores (bool | None) – If true, calculate and return document scores, even if the scores are not used for sorting.

  • track_total_hits (bool | int | None) – Number of hits matching the query to count accurately. If true, the exact number of hits is returned at the cost of some performance. If false, the response does not include the total number of hits matching the query.

  • typed_keys (bool | None) – If true, aggregation and suggester names are prefixed by their respective types in the response.

  • version (bool | None) – If true, the request returns the document version as part of a hit.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • from_ (int | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
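This small helper builds `from_`/`size` keyword arguments for a 0-based page number. It refuses pages that would exceed the default 10,000-hit window, which is the `index.max_result_window` index setting. The helper name is hypothetical. A call would look like `await client.search(index="my-index", q="title:python", **page_kwargs(2, 20))`.

```python
from typing import Any, Dict

# Default limit on from + size (the index.max_result_window setting).
MAX_RESULT_WINDOW = 10_000


def page_kwargs(page: int, per_page: int) -> Dict[str, Any]:
    """Keyword arguments for page N (0-based) using from_/size."""
    offset = page * per_page
    if offset + per_page > MAX_RESULT_WINDOW:
        raise ValueError("page is beyond 10,000 hits; use search_after instead")
    # The client spells `from` as `from_` because `from` is a Python keyword.
    return {"from_": offset, "size": per_page}
```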

search_mvt(*, index, field, zoom, x, y, aggs=None, buffer=None, error_trace=None, exact_bounds=None, extent=None, fields=None, filter_path=None, grid_agg=None, grid_precision=None, grid_type=None, human=None, pretty=None, query=None, runtime_mappings=None, size=None, sort=None, track_total_hits=None, with_labels=None, body=None)

Search a vector tile.

Search a vector tile for geospatial values. Before using this API, you should be familiar with the Mapbox vector tile specification. The API returns results as a binary Mapbox vector tile.

Internally, Elasticsearch translates a vector tile search API request into a search containing:

  • A geo_bounding_box query on the <field>. The query uses the <zoom>/<x>/<y> tile as a bounding box.
  • A geotile_grid or geohex_grid aggregation on the <field>. The grid_agg parameter determines the aggregation type. The aggregation uses the <zoom>/<x>/<y> tile as a bounding box.
  • Optionally, a geo_bounds aggregation on the <field>. The search only includes this aggregation if the exact_bounds parameter is true.
  • If the optional parameter with_labels is true, the internal search will include a dynamic runtime field that calls the getLabelPosition function of the geometry doc value. This enables the generation of new point features containing suggested geometry labels, so that, for example, multi-polygons will have only one label.
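This sketch requests a tile with the options described in the list. The index and field names are hypothetical, and `client` is assumed to be an `AsyncElasticsearch` instance. The response body is a binary protobuf tile rather than JSON, so the helper returns the raw bytes from the response's `body`.

```python
async def fetch_tile(client, index: str, field: str, zoom: int, x: int, y: int) -> bytes:
    resp = await client.search_mvt(
        index=index,
        field=field,
        zoom=zoom,
        x=x,
        y=y,
        grid_agg="geotile",  # or "geohex"
        exact_bounds=True,   # adds the geo_bounds aggregation
        with_labels=True,    # adds suggested label positions for geometries
    )
    # Binary Mapbox vector tile (protobuf), not JSON.
    return resp.body
```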

For example, Elasticsearch may translate a vector tile search API request with a grid_agg argument of geotile and an exact_bounds argument of true into the following search:

GET my-index/_search
{
  "size": 10000,
  "query": {
    "geo_bounding_box": {
      "my-geo-field": {
        "top_left": {
          "lat": -40.979898069620134,
          "lon": -45
        },
        "bottom_right": {
          "lat": -66.51326044311186,
          "lon": 0
        }
      }
    }
  },
  "aggregations": {
    "grid": {
      "geotile_grid": {
        "field": "my-geo-field",
        "precision": 11,
        "size": 65536,
        "bounds": {
          "top_left": {
            "lat": -40.979898069620134,
            "lon": -45
          },
          "bottom_right": {
            "lat": -66.51326044311186,
            "lon": 0
          }
        }
      }
    },
    "bounds": {
      "geo_bounds": {
        "field": "my-geo-field",
        "wrap_longitude": false
      }
    }
  }
}

The API returns results as a binary Mapbox vector tile. Mapbox vector tiles are encoded as Google Protobufs (PBF). By default, the tile contains three layers:

  • A hits layer containing a feature for each <field> value matching the geo_bounding_box query.
  • An aggs layer containing a feature for each cell of the geotile_grid or geohex_grid. The layer only contains features for cells with matching data.
  • A meta layer containing:
    • A feature containing a bounding box. By default, this is the bounding box of the tile.
    • Value ranges for any sub-aggregations on the geotile_grid or geohex_grid.
    • Metadata for the search.

The API only returns features that can display at its zoom level. For example, if a polygon feature has no area at its zoom level, the API omits it. The API returns errors as UTF-8 encoded JSON.
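As a rough Python sketch, the helper below requests one tile with the options discussed above and writes the raw PBF bytes to a file. It assumes client is an elasticsearch.Elasticsearch instance (any object with a compatible search_mvt() method works) and reads the bytes from the body attribute of the returned BinaryApiResponse; the helper name and option values are illustrative.

```python
from pathlib import Path
from typing import Any


def save_vector_tile(client: Any, index: str, field: str,
                     zoom: int, x: int, y: int, path: str) -> int:
    """Fetch the <zoom>/<x>/<y> tile and write its raw PBF bytes to path.

    client is assumed to be an elasticsearch.Elasticsearch instance.
    Returns the number of bytes written.
    """
    resp = client.search_mvt(
        index=index,
        field=field,
        zoom=zoom,
        x=x,
        y=y,
        grid_agg="geotile",  # build the aggs layer from a geotile_grid aggregation
        grid_precision=8,    # 8 extra zoom levels: a 256 x 256 cell grid per tile
        exact_bounds=True,   # meta layer bounding box comes from geo_bounds
        size=10000,          # maximum number of features in the hits layer
    )
    data = resp.body  # BinaryApiResponse.body holds the encoded tile as bytes
    Path(path).write_bytes(data)
    return len(data)
```

For example, save_vector_tile(Elasticsearch("http://localhost:9200"), "my-index", "my-geo-field", 7, 64, 42, "tile.pbf") would store the 7/64/42 tile, ready to hand to a vector tile renderer or decoder.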

IMPORTANT: You can specify several options for this API as either a query parameter or request body parameter. If you specify both parameters, the query parameter takes precedence.

Grid precision for geotile

For a grid_agg of geotile, you can use cells in the aggs layer as tiles for lower zoom levels. grid_precision represents the additional zoom levels available through these cells. The final precision is computed as follows: <zoom> + grid_precision. For example, if <zoom> is 7 and grid_precision is 8, then the geotile_grid aggregation will use a precision of 15. The maximum final precision is 29. The grid_precision also determines the number of cells for the grid as follows: (2^grid_precision) x (2^grid_precision). For example, a value of 8 divides the tile into a grid of 256 x 256 cells. The aggs layer only contains features for cells with matching data.
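The same arithmetic as a small Python helper, a sketch with a hypothetical name. Rejecting a final precision above 29 with ValueError is an assumption of this sketch (the text only states the maximum), and with grid_precision set to 0 the aggs layer is omitted, so the 1 x 1 grid it returns is only nominal.

```python
from typing import Tuple

MAX_FINAL_PRECISION = 29  # documented upper bound for <zoom> + grid_precision


def geotile_grid_layout(zoom: int, grid_precision: int) -> Tuple[int, int, int]:
    """Return (final_precision, cells_per_side, total_cells) for grid_agg=geotile."""
    if not 0 <= grid_precision <= 8:
        raise ValueError("grid_precision accepts 0-8")
    final_precision = zoom + grid_precision
    if final_precision > MAX_FINAL_PRECISION:
        # Assumption: this sketch rejects the combination instead of clamping it
        raise ValueError(
            f"final precision {final_precision} exceeds {MAX_FINAL_PRECISION}"
        )
    cells_per_side = 2 ** grid_precision  # the grid is (2^gp) x (2^gp) cells
    return final_precision, cells_per_side, cells_per_side * cells_per_side
```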

Grid precision for geohex

For a grid_agg of geohex, Elasticsearch uses <zoom> and grid_precision to calculate a final precision as follows: <zoom> + grid_precision.

This precision determines the H3 resolution of the hexagonal cells produced by the geohex aggregation. The following table maps the H3 resolution for each precision. For example, if <zoom> is 3 and grid_precision is 3, the precision is 6. At a precision of 6, hexagonal cells have an H3 resolution of 2. If <zoom> is 3 and grid_precision is 4, the precision is 7. At a precision of 7, hexagonal cells have an H3 resolution of 3.

Precision  Unique tile bins    H3 resolution  Unique hex bins  Ratio (hex bins / tile bins)
---------  ------------------  -------------  ---------------  ----------------------------
1          4                   0              122              30.5
2          16                  0              122              7.625
3          64                  1              842              13.15625
4          256                 1              842              3.2890625
5          1024                2              5882             5.744140625
6          4096                2              5882             1.436035156
7          16384               3              41162            2.512329102
8          65536               3              41162            0.6280822754
9          262144              4              288122           1.099098206
10         1048576             4              288122           0.2747745514
11         4194304             5              2016842          0.4808526039
12         16777216            6              14117882         0.8414913416
13         67108864            6              14117882         0.2103728354
14         268435456           7              98825162         0.3681524172
15         1073741824          8              691776122        0.644266719
16         4294967296          8              691776122        0.1610666797
17         17179869184         9              4842432842       0.2818666889
18         68719476736         10             33897029882      0.4932667053
19         274877906944        11             237279209162     0.8632167343
20         1099511627776       11             237279209162     0.2158041836
21         4398046511104       12             1660954464122    0.3776573213
22         17592186044416      13             11626681248842   0.6609003122
23         70368744177664      13             11626681248842   0.165225078
24         281474976710656     14             81386768741882   0.2891438866
25         1125899906842624    15             569707381193162  0.5060018015
26         4503599627370496    15             569707381193162  0.1265004504
27         18014398509481984   15             569707381193162  0.03162511259
28         72057594037927936   15             569707381193162  0.007906278149
29         288230376151711744  15             569707381193162  0.001976569537

Hexagonal cells don't align perfectly on a vector tile. Some cells may intersect more than one vector tile. To compute the H3 resolution for each precision, Elasticsearch compares the average density of hexagonal bins at each resolution with the average density of tile bins at each zoom level. Elasticsearch uses the H3 resolution that is closest to the corresponding geotile density.
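A sketch for reasoning about geohex tiles offline: it transcribes the precision-to-resolution mapping from the table above into a dictionary rather than re-deriving the density comparison, and reproduces the two bin-count columns. The hex_bins() formula, 2 + 120 * 7^r, is H3's cell count per resolution; the function names are hypothetical.

```python
from typing import Dict

# Final precision -> H3 resolution, transcribed from the table above.
H3_RESOLUTION_BY_PRECISION: Dict[int, int] = {
    1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2, 7: 3, 8: 3, 9: 4, 10: 4,
    11: 5, 12: 6, 13: 6, 14: 7, 15: 8, 16: 8, 17: 9, 18: 10, 19: 11,
    20: 11, 21: 12, 22: 13, 23: 13, 24: 14, 25: 15, 26: 15, 27: 15,
    28: 15, 29: 15,
}


def geohex_resolution(zoom: int, grid_precision: int) -> int:
    """H3 resolution used for grid_agg=geohex at <zoom> + grid_precision."""
    precision = zoom + grid_precision
    if precision not in H3_RESOLUTION_BY_PRECISION:
        raise ValueError(f"no H3 resolution listed for precision {precision}")
    return H3_RESOLUTION_BY_PRECISION[precision]


def tile_bins(precision: int) -> int:
    """Unique geotile bins at a precision: a 2^p x 2^p grid."""
    return 4 ** precision


def hex_bins(resolution: int) -> int:
    """Number of H3 cells at a resolution (2 + 120 * 7^r)."""
    return 2 + 120 * 7 ** resolution
```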

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/search-vector-tile-api.html

Parameters:
  • index (str | Sequence[str]) – Comma-separated list of data streams, indices, or aliases to search

  • field (str) – Field containing geospatial data to return

  • zoom (int) – Zoom level for the vector tile to search

  • x (int) – X coordinate for the vector tile to search

  • y (int) – Y coordinate for the vector tile to search

  • aggs (Mapping[str, Mapping[str, Any]] | None) – Sub-aggregations for the geotile_grid. It supports the following aggregation types: avg, boxplot, cardinality, extended stats, max, median absolute deviation, min, percentile, percentile-rank, stats, sum, and value count. The aggregation names can’t start with _mvt_. The _mvt_ prefix is reserved for internal aggregations.

  • buffer (int | None) – The size, in pixels, of a clipping buffer outside the tile. This allows renderers to avoid outline artifacts from geometries that extend past the extent of the tile.

  • exact_bounds (bool | None) – If false, the meta layer’s feature is the bounding box of the tile. If true, the meta layer’s feature is a bounding box resulting from a geo_bounds aggregation. The aggregation runs on <field> values that intersect the <zoom>/<x>/<y> tile with wrap_longitude set to false. The resulting bounding box may be larger than the vector tile.

  • extent (int | None) – The size, in pixels, of a side of the tile. Vector tiles are square with equal sides.

  • fields (str | Sequence[str] | None) – The fields to return in the hits layer. It supports wildcards (*). This parameter does not support fields with array values. Fields with array values may return inconsistent results.

  • grid_agg (str | Literal['geohex', 'geotile'] | None) – The aggregation used to create a grid for the field.

  • grid_precision (int | None) – Additional zoom levels available through the aggs layer. For example, if <zoom> is 7 and grid_precision is 8, you can zoom in up to level 15. Accepts 0-8. If 0, results don’t include the aggs layer.

  • grid_type (str | Literal['centroid', 'grid', 'point'] | None) – Determines the geometry type for features in the aggs layer. In the aggs layer, each feature represents a geotile_grid cell. If grid, each feature is a polygon of the cell’s bounding box. If point, each feature is a Point that is the centroid of the cell. If centroid, each feature is a Point that is the centroid of the data within the cell.

  • query (Mapping[str, Any] | None) – The query DSL used to filter documents for the search.

  • runtime_mappings (Mapping[str, Mapping[str, Any]] | None) – Defines one or more runtime fields in the search request. These fields take precedence over mapped fields with the same name.

  • size (int | None) – The maximum number of features to return in the hits layer. Accepts 0-10000. If 0, results don’t include the hits layer.

  • sort (Sequence[str | Mapping[str, Any]] | str | Mapping[str, Any] | None) – Sort the features in the hits layer. By default, the API calculates a bounding box for each feature. It sorts features based on this box’s diagonal length, from longest to shortest.

  • track_total_hits (bool | int | None) – The number of hits matching the query to count accurately. If true, the exact number of hits is returned at the cost of some performance. If false, the response does not include the total number of hits matching the query.

  • with_labels (bool | None) – If true, the hits and aggs layers will contain additional point features representing suggested label positions for the original features:
    • Point and MultiPoint features will have one of the points selected.
    • Polygon and MultiPolygon features will have a single point generated, either the centroid, if it is within the polygon, or another point within the polygon selected from the sorted triangle-tree.
    • LineString features will likewise provide a roughly central point selected from the triangle-tree.
    • The aggregation results will provide one central point for each aggregation bucket.
    All attributes from the original features will also be copied to the new label features. In addition, the new features will be distinguishable using the tag _mvt_label_position.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

BinaryApiResponse

search_shards(*, index=None, allow_no_indices=None, error_trace=None, expand_wildcards=None, filter_path=None, human=None, ignore_unavailable=None, local=None, preference=None, pretty=None, routing=None)

Get the search shards.

Get the indices and shards that a search request would be run against. This information can be useful for working out issues or planning optimizations with routing and shard preferences. When filtered aliases are used, the filter is returned as part of the indices section.

If the Elasticsearch security features are enabled, you must have the view_index_metadata or manage index privilege for the target data stream, index, or alias.
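A minimal sketch of using this API to see which primary shards, and which nodes, a routed search would hit. It assumes client is an Elasticsearch instance and the usual response layout: a nodes map keyed by node ID, and a shards list with one inner list of copies per shard, each copy carrying index, shard, node, and primary. The helper name is hypothetical.

```python
from typing import Any, List, Optional, Tuple


def primary_shard_targets(
    client: Any, index: str, routing: Optional[str] = None
) -> List[Tuple[str, int, Optional[str]]]:
    """Return (index, shard number, node name) for each primary shard a search would use."""
    resp = client.search_shards(index=index, routing=routing)
    nodes = resp["nodes"]  # node ID -> node details
    targets = []
    for shard_copies in resp["shards"]:  # one inner list per shard
        for shard_copy in shard_copies:  # primary and replica copies
            if shard_copy["primary"]:
                node_id = shard_copy["node"]
                # Fall back to the raw node ID if the node is not described
                name = nodes.get(node_id, {}).get("name", node_id)
                targets.append((shard_copy["index"], shard_copy["shard"], name))
    return targets
```

Comparing the result with and without a routing value is a quick way to confirm that routing narrows a search to the expected shard.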

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/search-shards.html

Parameters:
  • index (str | Sequence[str] | None) – A comma-separated list of data streams, indices, and aliases to search. It supports wildcards (*). To search all data streams and indices, omit this parameter or use * or _all.

  • allow_no_indices (bool | None) – If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar.

  • expand_wildcards (Sequence[str | Literal['all', 'closed', 'hidden', 'none', 'open']] | str | ~typing.Literal['all', 'closed', 'hidden', 'none', 'open'] | None) – Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values, such as open,hidden. Valid values are: all, open, closed, hidden, none.

  • ignore_unavailable (bool | None) – If false, the request returns an error if it targets a missing or closed index.

  • local (bool | None) – If true, the request retrieves information from the local node only.

  • preference (str | None) – The node or shard the operation should be performed on. It is random by default.

  • routing (str | None) – A custom value used to route operations to a specific shard.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type:

ObjectApiResponse[Any]

search_template(*, index=None, allow_no_indices=None, ccs_minimize_roundtrips=None, error_trace=None, expand_wildcards=None, explain=None, filter_path=None, human=None, id=None, ignore_throttled=None, ignore_unavailable=None, params=None, preference=None, pretty=None, profile=None, rest_total_hits_as_int=None, routing=None, scroll=None, search_type=None, source=None, typed_keys=None, body=None)

Run a search with a search template.
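A hedged Python sketch of both ways to call this API: with id for a stored template, or with source for an inline template whose Mustache variables are filled from params. The template, its message field, and the helper name are hypothetical.

```python
import json
from typing import Any, Mapping, Optional

# Hypothetical inline template; Mustache variables are filled from params.
MATCH_TEMPLATE = json.dumps({
    "query": {"match": {"message": "{{query_string}}"}},
    "from": "{{from}}",
    "size": "{{size}}",
})


def run_search_template(client: Any, index: str, params: Mapping[str, Any],
                        template_id: Optional[str] = None,
                        source: Optional[str] = None) -> Any:
    """Run a stored template (template_id) or an inline one (source)."""
    if template_id is None and source is None:
        # Per the parameter docs, id is required unless source is given
        raise ValueError("pass template_id or source")
    return client.search_template(index=index, id=template_id, source=source,
                                  params=dict(params))
```

For example, run_search_template(client, "my-index", {"query_string": "hello world", "from": 0, "size": 10}, source=MATCH_TEMPLATE) renders and runs the inline template in one request.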

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/search-template-api.html

Parameters:
  • index (str | Sequence[str] | None) – A comma-separated list of data streams, indices, and aliases to search. It supports wildcards (*).

  • allow_no_indices (bool | None) – If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar.

  • ccs_minimize_roundtrips (bool | None) – If true, network round-trips are minimized for cross-cluster search requests.

  • expand_wildcards (Sequence[str | Literal['all', 'closed', 'hidden', 'none', 'open']] | str | ~typing.Literal['all', 'closed', 'hidden', 'none', 'open'] | None) – The type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values, such as open,hidden. Valid values are: all, open, closed, hidden, none.

  • explain (bool | None) – If true, returns detailed information about score calculation as part of each hit. If you specify both this and the explain query parameter, the API uses only the query parameter.

  • id (str | None) – The ID of the search template to use. If no source is specified, this parameter is required.

  • ignore_throttled (bool | None) – If true, specified concrete, expanded, or aliased indices are not included in the response when throttled.

  • ignore_unavailable (bool | None) – If false, the request returns an error if it targets a missing or closed index.

  • params (Mapping[str, Any] | None) – Key-value pairs used to replace Mustache variables in the template. The key is the variable name. The value is the variable value.

  • preference (str | None) – The node or shard the operation should be performed on. It is random by default.

  • profile (bool | None) – If true, the query execution is profiled.

  • rest_total_hits_as_int (bool | None) – If true, hits.total is rendered as an integer in the response. If false, it is rendered as an object.

  • routing (str | None) – A custom value used to route operations to a specific shard.

  • scroll (str | Literal[-1] | ~typing.Literal[0] | None) – Specifies how long a consistent view of the index should be maintained for scrolled search.

  • search_type (str | Literal['dfs_query_then_fetch', 'query_then_fetch'] | None) – The type of the search operation.

  • source (str | None) – An inline search template. Supports the same parameters as the search API’s request body. It also supports Mustache variables. If no id is specified, this parameter is required.

  • typed_keys (bool | None) – If true, the response prefixes aggregation and suggester names with their respective types.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

terms_enum(*, index, field=None, case_insensitive=None, error_trace=None, filter_path=None, human=None, index_filter=None, pretty=None, search_after=None, size=None, string=None, timeout=None, body=None)

Get terms in an index.

Discover terms that match a partial string in an index. This API is designed for low-latency look-ups used in auto-complete scenarios.

NOTE: The terms enum API may return terms from deleted documents. Deleted documents are initially only marked as deleted; they are not actually removed until their segments are merged. Until that happens, the terms enum API will return terms from these documents.
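A sketch of an auto-complete lookup that pages through matching terms with search_after, passing the last term of each page into the next request. It assumes client is an Elasticsearch instance; the helper name, the page-size default, and the max_pages guard are illustrative. A page shorter than page_size is treated as the end of the results.

```python
from typing import Any, Iterator, Optional


def iter_terms(client: Any, index: str, field: str, prefix: str,
               page_size: int = 10, max_pages: int = 100) -> Iterator[str]:
    """Yield indexed terms of field that start with prefix, page by page."""
    search_after: Optional[str] = None
    for _ in range(max_pages):
        resp = client.terms_enum(
            index=index,
            field=field,
            string=prefix,
            size=page_size,
            search_after=search_after,  # None on the first page
        )
        terms = resp["terms"]
        yield from terms
        if len(terms) < page_size:
            break  # a short page means there is nothing left to fetch
        search_after = terms[-1]  # continue after the last term returned
```

If a response reports complete as false because the timeout was exceeded, a short page may not mean the end, so a caller that needs every term could check that flag as well.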

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/search-terms-enum.html

Parameters:
  • index (str) – A comma-separated list of data streams, indices, and index aliases to search. Wildcard (*) expressions are supported. To search all data streams or indices, use * or _all.

  • field (str | None) – The field to search for terms that match the string parameter.

  • case_insensitive (bool | None) – When true, the provided search string is matched against index terms without case sensitivity.

  • index_filter (Mapping[str, Any] | None) – A query used to filter out index shards: a shard is skipped if the provided query rewrites to match_none on it.

  • search_after (str | None) – The string after which terms in the index should be returned. It allows for a form of pagination if the last result from one request is passed as the search_after parameter for a subsequent request.

  • size (int | None) – The number of matching terms to return.

  • string (str | None) – The string to match at the start of indexed terms. If it is not provided, all terms in the field are considered. NOTE: The prefix string cannot be larger than the largest possible keyword value, which is Lucene’s term byte-length limit of 32766.

  • timeout (str | Literal[-1] | ~typing.Literal[0] | None) – The maximum length of time to spend collecting results. If the timeout is exceeded, the complete flag is set to false in the response and the results may be partial or empty.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

termvectors(*, index, id=None, doc=None, error_trace=None, field_statistics=None, fields=None, filter=None, filter_path=None, human=None, offsets=None, payloads=None, per_field_analyzer=None, positions=None, preference=None, pretty=None, realtime=None, routing=None, term_statistics=None, version=None, version_type=None, body=None)

Get term vector information.

Get information and statistics about terms in the fields of a particular document.

You can retrieve term vectors for documents stored in the index or for artificial documents passed in the body of the request. You can specify the fields you are interested in through the fields parameter or by adding the fields to the request body. For example:

GET /my-index-000001/_termvectors/1?fields=message
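The same request through the Python client might look like the sketch below, which also reduces the response to a term-to-frequency map. It assumes client is an Elasticsearch instance and the usual response shape (term_vectors, then the field name, then terms, each with a term_freq); the helper name is hypothetical.

```python
from typing import Any, Dict


def term_frequencies(client: Any, index: str, doc_id: str, field: str) -> Dict[str, int]:
    """Map each term in one field of a stored document to its frequency in that field."""
    resp = client.termvectors(
        index=index,
        id=doc_id,
        fields=field,     # same as ?fields=message in the REST request
        positions=False,  # skip data this helper does not use
        offsets=False,
    )
    body = resp.body  # the deserialized JSON response as a dict
    # term_vectors is absent when the document is not found
    field_tv = body.get("term_vectors", {}).get(field, {})
    return {term: info["term_freq"] for term, info in field_tv.get("terms", {}).items()}
```

For the request above, this would be called as term_frequencies(client, "my-index-000001", "1", "message").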

Fields can be specified using wildcards, similar to the multi match query.

Term vectors are real-time by default, not near real-time. This can be changed by setting the realtime parameter to false.

You can request three types of values: term information, term statistics, and field statistics. By default, all term information and field statistics are returned for all fields but term statistics are excluded.

Term information

  • term frequency in the field (always returned)
  • term positions (positions: true)
  • start and end offsets (offsets: true)
  • term payloads (payloads: true), as base64 encoded bytes

If the requested information wasn't stored in the index, it will be computed on the fly if possible. Term vectors can also be computed for documents that do not exist in the index but are instead provided by the user.

WARNING: Start and end offsets assume UTF-16 encoding is being used. If you want to use these offsets to get the original text that produced this token, make sure that the string you are taking a substring of is also encoded using UTF-16.
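In Python, where str indexes by code point, one way to honor this is to slice the UTF-16 encoding instead of the string. A minimal sketch with a hypothetical helper name:

```python
def slice_by_utf16_offsets(text: str, start: int, end: int) -> str:
    """Return the substring between start and end, counted in UTF-16 code units."""
    encoded = text.encode("utf-16-le")  # exactly 2 bytes per code unit, no BOM
    # Raises UnicodeDecodeError if an offset splits a surrogate pair
    return encoded[start * 2:end * 2].decode("utf-16-le")
```

Characters outside the Basic Multilingual Plane, such as emoji, take two UTF-16 code units, which is exactly where plain str slicing and the returned offsets diverge: in "héllo 😀 world", the token world spans UTF-16 offsets 9 to 14, but text[9:14] does not return it.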

Behaviour

The term and field statistics are not accurate. Deleted documents are not taken into account. The information is only retrieved for the shard the requested document resides in. The term and field statistics are therefore only useful as relative measures; the absolute numbers have no meaning in this context. By default, when requesting term vectors of artificial documents, a shard to get the statistics from is randomly selected. Use routing to target a particular shard.

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docs-termvectors.html

Parameters:
  • index (str) – The name of the index that contains the document.

  • id (str | None) – A unique identifier for the document.

  • doc (Mapping[str, Any] | None) – An artificial document (a document not present in the index) for which you want to retrieve term vectors.

  • field_statistics (bool | None) – If true, the response includes:
    • The document count (how many documents contain this field).
    • The sum of document frequencies (the sum of document frequencies for all terms in this field).
    • The sum of total term frequencies (the sum of total term frequencies of each term in this field).

  • fields (str | Sequence[str] | None) – A comma-separated list or wildcard expressions of fields to include in the statistics. It is used as the default list unless a specific field list is provided in the completion_fields or fielddata_fields parameters.

  • filter (Mapping[str, Any] | None) – Filter terms based on their tf-idf scores. This can be useful to find a good characteristic vector of a document. This feature works in a similar manner to the second phase of the More Like This Query.

  • offsets (bool | None) – If true, the response includes term offsets.

  • payloads (bool | None) – If true, the response includes term payloads.

  • per_field_analyzer (Mapping[str, str] | None) – Override the default per-field analyzer. This is useful in order to generate term vectors in any fashion, especially when using artificial documents. When providing an analyzer for a field that already stores term vectors, the term vectors will be regenerated.

  • positions (bool | None) – If true, the response includes term positions.

  • preference (str | None) – The node or shard the operation should be performed on. It is random by default.

  • realtime (bool | None) – If true, the request is real-time as opposed to near-real-time.

  • routing (str | None) – A custom value that is used to route operations to a specific shard.

  • term_statistics (bool | None) – If true, the response includes:
    • The total term frequency (how often a term occurs in all documents).
    • The document frequency (the number of documents containing the current term).
    By default these values are not returned since term statistics can have a serious performance impact.

  • version (int | None) – An explicit version number for concurrency control.

  • version_type (str | Literal['external', 'external_gte', 'force', 'internal'] | None) – The version type.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

update(*, index, id, detect_noop=None, doc=None, doc_as_upsert=None, error_trace=None, filter_path=None, human=None, if_primary_term=None, if_seq_no=None, lang=None, pretty=None, refresh=None, require_alias=None, retry_on_conflict=None, routing=None, script=None, scripted_upsert=None, source=None, source_excludes=None, source_includes=None, timeout=None, upsert=None, wait_for_active_shards=None, body=None)

Update a document.

Update a document by running a script or passing a partial document.

If the Elasticsearch security features are enabled, you must have the index or write index privilege for the target index or index alias.

The script can update, delete, or skip modifying the document. The API also supports passing a partial document, which is merged into the existing document. To fully replace an existing document, use the index API. This operation:

  • Gets the document (collocated with the shard) from the index.
  • Runs the specified script.
  • Indexes the result.

The document must still be reindexed, but using this API removes some network roundtrips and reduces chances of version conflicts between the GET and the index operation.

The _source field must be enabled to use this API. In addition to _source, you can access the following variables through the ctx map: _index, _type, _id, _version, _routing, and _now (the current timestamp).
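Two hedged sketches of calling this API from Python, one per update style. They assume client is an Elasticsearch instance; the helper names and the counter field are hypothetical. The first merges a partial document and creates the document when it is missing; the second runs a Painless script that reads its increment from params, and falls back to upsert when no document exists yet.

```python
from typing import Any, Mapping


def set_fields(client: Any, index: str, doc_id: str, fields: Mapping[str, Any]) -> Any:
    """Merge fields into the document, creating it from fields if it is missing."""
    return client.update(
        index=index,
        id=doc_id,
        doc=dict(fields),    # partial document merged into the existing one
        doc_as_upsert=True,  # index doc as a new document if none exists
        detect_noop=True,    # result is "noop" when nothing changes
    )


def increment_counter(client: Any, index: str, doc_id: str, count: int = 1) -> Any:
    """Add count to a hypothetical counter field with a Painless script."""
    return client.update(
        index=index,
        id=doc_id,
        script={
            "lang": "painless",
            "source": "ctx._source.counter += params.count",
            "params": {"count": count},
        },
        upsert={"counter": count},  # indexed as-is if the document does not exist
        retry_on_conflict=3,        # retry on version conflicts
    )
```

Passing the increment through params rather than formatting it into the script source keeps the script text constant across calls.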

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docs-update.html

Parameters:
  • index (str) – The name of the target index. By default, the index is created automatically if it doesn’t exist.

  • id (str) – A unique identifier for the document to be updated.

  • detect_noop (bool | None) – If true, the result in the response is set to noop (no operation) when there are no changes to the document.

  • doc (Mapping[str, Any] | None) – A partial update to an existing document. If both doc and script are specified, doc is ignored.

  • doc_as_upsert (bool | None) – If true, use the contents of ‘doc’ as the value of ‘upsert’. NOTE: Using ingest pipelines with doc_as_upsert is not supported.

  • if_primary_term (int | None) – Only perform the operation if the document has this primary term.

  • if_seq_no (int | None) – Only perform the operation if the document has this sequence number.

  • lang (str | None) – The script language.

  • refresh (bool | str | Literal['false', 'true', 'wait_for'] | None) – If ‘true’, Elasticsearch refreshes the affected shards to make this operation visible to search. If ‘wait_for’, it waits for a refresh to make this operation visible to search. If ‘false’, it does nothing with refreshes.

  • require_alias (bool | None) – If true, the destination must be an index alias.

  • retry_on_conflict (int | None) – The number of times the operation should be retried when a conflict occurs.

  • routing (str | None) – A custom value used to route operations to a specific shard.

  • script (Mapping[str, Any] | None) – The script to run to update the document.

  • scripted_upsert (bool | None) – If true, run the script whether or not the document exists.

  • source (bool | Mapping[str, Any] | None) – If false, turn off source retrieval. You can also specify a comma-separated list of the fields you want to retrieve.

  • source_excludes (str | Sequence[str] | None) – The source fields you want to exclude.

  • source_includes (str | Sequence[str] | None) – The source fields you want to retrieve.

  • timeout (str | Literal[-1] | ~typing.Literal[0] | None) – The period to wait for the following operations: dynamic mapping updates and waiting for active shards. Elasticsearch waits for at least the timeout period before failing. The actual wait time could be longer, particularly when multiple waits occur.

  • upsert (Mapping[str, Any] | None) – If the document does not already exist, the contents of ‘upsert’ are inserted as a new document. If the document exists, the ‘script’ is run.

  • wait_for_active_shards (int | str | Literal['all', 'index-setting'] | None) – The number of copies of each shard that must be active before proceeding with the operation. Set to ‘all’ or any positive integer up to the total number of shards in the index (number_of_replicas+1). The default value of 1 means it waits for each primary shard to be active.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

update_by_query(*, index, allow_no_indices=None, analyze_wildcard=None, analyzer=None, conflicts=None, default_operator=None, df=None, error_trace=None, expand_wildcards=None, filter_path=None, from_=None, human=None, ignore_unavailable=None, lenient=None, max_docs=None, pipeline=None, preference=None, pretty=None, q=None, query=None, refresh=None, request_cache=None, requests_per_second=None, routing=None, script=None, scroll=None, scroll_size=None, search_timeout=None, search_type=None, slice=None, slices=None, sort=None, stats=None, terminate_after=None, timeout=None, version=None, version_type=None, wait_for_active_shards=None, wait_for_completion=None, body=None)

Update documents.

Updates documents that match the specified query. If no query is specified, performs an update on every document in the data stream or index without modifying the source, which is useful for picking up mapping changes.

If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or alias:

  • read
  • index or write

You can specify the query criteria in the request URI or the request body using the same syntax as the search API.

When you submit an update by query request, Elasticsearch gets a snapshot of the data stream or index when it begins processing the request and updates matching documents using internal versioning. When the versions match, the document is updated and the version number is incremented. If a document changes between the time that the snapshot is taken and the update operation is processed, it results in a version conflict and the operation fails. You can opt to count version conflicts instead of halting and returning by setting conflicts to proceed. Note that if you opt to count version conflicts, the operation could attempt to update more documents from the source than max_docs until it has successfully updated max_docs documents or it has gone through every document in the source query.

NOTE: Documents with a version equal to 0 cannot be updated using update by query because internal versioning does not support 0 as a valid version number.

While processing an update by query request, Elasticsearch performs multiple search requests sequentially to find all of the matching documents. A bulk update request is performed for each batch of matching documents. Any query or update failures cause the update by query request to fail and the failures are shown in the response. Any update requests that completed successfully still stick; they are not rolled back.

Throttling update requests

To control the rate at which update by query issues batches of update operations, you can set requests_per_second to any positive decimal number. This pads each batch with a wait time to throttle the rate. Set requests_per_second to -1 to turn off throttling.

Throttling uses a wait time between batches so that the internal scroll requests can be given a timeout that takes the request padding into account. The padding time is the difference between the batch size divided by the requests_per_second and the time spent writing. By default the batch size is 1000, so if requests_per_second is set to 500:

target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
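The padding formula as a small Python function. It models the arithmetic above and is not Elasticsearch's code: the -1 case follows the text, while clamping the wait to zero when writing takes longer than the target time is an assumption of this sketch.

```python
def throttle_wait_seconds(batch_size: int, requests_per_second: float,
                          write_time: float) -> float:
    """Padding added after a batch: batch_size / requests_per_second - write_time."""
    if requests_per_second == -1:
        return 0.0  # -1 turns throttling off
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive or -1")
    target_time = batch_size / requests_per_second
    # Assumption: a batch that took longer than target_time gets no padding
    return max(0.0, target_time - write_time)
```

With the defaults from the text, throttle_wait_seconds(1000, 500, 0.5) gives the 1.5 seconds computed above.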

Since the batch is issued as a single _bulk request, large batch sizes cause Elasticsearch to create many requests and wait before starting the next set. This is "bursty" instead of "smooth".

Slicing

Update by query supports sliced scroll to parallelize the update process. This can improve efficiency and provide a convenient way to break the request down into smaller parts.

Setting slices to auto chooses a reasonable number for most data streams and indices. This setting will use one slice per shard, up to a certain limit. If there are multiple source data streams or indices, it will choose the number of slices based on the index or backing index with the smallest number of shards.

Adding slices to _update_by_query just automates the manual process of creating sub-requests, which means it has some quirks:

  • You can see these requests in the tasks APIs. These sub-requests are "child" tasks of the task for the request with slices.
  • Fetching the status of the task for the request with slices only contains the status of completed slices.
  • These sub-requests are individually addressable for things like cancellation and rethrottling.
  • Rethrottling the request with slices will rethrottle the unfinished sub-requests proportionally.
  • Canceling the request with slices will cancel each sub-request.
  • Due to the nature of slices each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.
  • Parameters like requests_per_second and max_docs on a request with slices are distributed proportionally to each sub-request. Combine that with the point above about distribution being uneven and you should conclude that using max_docs with slices might not result in exactly max_docs documents being updated.
  • Each sub-request gets a slightly different snapshot of the source data stream or index though these are all taken at approximately the same time.

If you're slicing manually or otherwise tuning automatic slicing, keep in mind that:

  • Query performance is most efficient when the number of slices is equal to the number of shards in the index or backing index. If that number is large (for example, 500), choose a lower number as too many slices hurts performance. Setting slices higher than the number of shards generally does not improve efficiency and adds overhead.
  • Update performance scales linearly across available resources with the number of slices.

Whether query or update performance dominates the runtime depends on the documents being updated and on cluster resources.
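Manual slicing can be sketched by passing the slice parameter ({"id": i, "max": total}) to one update_by_query call per slice and running those calls in parallel. This assumes a single sync client instance can be shared across threads; the index name, script, and the helpers slice_specs and update_in_manual_slices are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:  # type hints only
    from elasticsearch import Elasticsearch


def slice_specs(total: int) -> list[dict[str, int]]:
    """Build one slice object per sub-request: {"id": i, "max": total}."""
    return [{"id": i, "max": total} for i in range(total)]


def update_in_manual_slices(
    client: "Elasticsearch",
    index: str,
    total: int,
    script: dict[str, Any],
) -> int:
    """Run one update by query per slice in parallel and sum the updated counts."""

    def run_slice(spec: dict[str, int]) -> int:
        resp = client.update_by_query(
            index=index,
            slice=spec,          # this request handles only one slice
            script=script,
            conflicts="proceed",
        )
        return resp["updated"]

    # Assumption: the same client object is safe to use from several threads.
    with ThreadPoolExecutor(max_workers=total) as pool:
        return sum(pool.map(run_slice, slice_specs(total)))
```

Following the guidance above, total would usually match the number of shards in the target index rather than exceed it.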

Update the document source

Update by query supports scripts to update the document source. As with the update API, you can set ctx.op to change the operation that is performed.
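A minimal sketch of a scripted source update, assuming documents with hypothetical likes and user.id fields: a query restricts which documents are touched, and the Painless script reads its increment from params rather than hard-coding it in the source string. The helper name add_likes is illustrative.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only
    from elasticsearch import Elasticsearch


def add_likes(client: "Elasticsearch", index: str, user_id: str, amount: int) -> int:
    """Add `amount` to the likes counter of every document owned by `user_id`."""
    resp = client.update_by_query(
        index=index,
        query={"term": {"user.id": user_id}},  # only matching documents are updated
        script={
            "source": "ctx._source.likes += params.amount",
            "lang": "painless",
            "params": {"amount": amount},  # params avoid recompiling the script per value
        },
        conflicts="proceed",
    )
    return resp["updated"]
```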

Set ctx.op = "noop" if your script decides that it doesn't have to make any changes. The update by query operation skips updating the document and increments the noop counter.

Set ctx.op = "delete" if your script decides that the document should be deleted. The update by query operation deletes the document and increments the deleted counter.

Update by query supports only index, noop, and delete. Setting ctx.op to anything else, or setting any other field in ctx, is an error. This API only lets you modify the source of matching documents; you cannot move them.
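The three ctx.op outcomes can be sketched with one Painless script, assuming a hypothetical status field: archived documents are deleted, already active documents are skipped with noop, and everything else is updated. Each outcome is reported under its own counter (updated, deleted, noops) in the response. The names STATUS_SCRIPT and normalize_status are illustrative.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only
    from elasticsearch import Elasticsearch

# Painless: delete archived docs, skip docs that are already active,
# and mark everything else as active.
STATUS_SCRIPT = """
if (ctx._source.status == 'archived') {
  ctx.op = 'delete';
} else if (ctx._source.status == 'active') {
  ctx.op = 'noop';
} else {
  ctx._source.status = 'active';
}
"""


def normalize_status(client: "Elasticsearch", index: str) -> dict[str, int]:
    """Apply STATUS_SCRIPT and return the per-outcome counters from the response."""
    resp = client.update_by_query(
        index=index,
        script={"source": STATUS_SCRIPT, "lang": "painless"},
        conflicts="proceed",
    )
    return {key: resp[key] for key in ("updated", "deleted", "noops")}
```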

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docs-update-by-query.html

Parameters:
  • index (str | Sequence[str]) – A comma-separated list of data streams, indices, and aliases to search. It supports wildcards (*). To search all data streams or indices, omit this parameter or use * or _all.

  • allow_no_indices (bool | None) – If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar.

  • analyze_wildcard (bool | None) – If true, wildcard and prefix queries are analyzed. This parameter can be used only when the q query string parameter is specified.

  • analyzer (str | None) – The analyzer to use for the query string. This parameter can be used only when the q query string parameter is specified.

  • conflicts (str | Literal['abort', 'proceed'] | None) – The preferred behavior when update by query hits version conflicts: abort or proceed.

  • default_operator (str | Literal['and', 'or'] | None) – The default operator for query string query: AND or OR. This parameter can be used only when the q query string parameter is specified.

  • df (str | None) – The field to use as default where no field prefix is given in the query string. This parameter can be used only when the q query string parameter is specified.

  • expand_wildcards (Sequence[str | Literal['all', 'closed', 'hidden', 'none', 'open']] | str | Literal['all', 'closed', 'hidden', 'none', 'open'] | None) – The type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. It supports comma-separated values, such as open,hidden. Valid values are: all, open, closed, hidden, none.

  • from – Starting offset (default: 0). Because from is a reserved word in Python, pass it as from_.

  • ignore_unavailable (bool | None) – If false, the request returns an error if it targets a missing or closed index.

  • lenient (bool | None) – If true, format-based query failures (such as providing text to a numeric field) in the query string will be ignored. This parameter can be used only when the q query string parameter is specified.

  • max_docs (int | None) – The maximum number of documents to update.

  • pipeline (str | None) – The ID of the pipeline to use to preprocess incoming documents. If the index has a default ingest pipeline specified, then setting the value to _none disables the default ingest pipeline for this request. If a final pipeline is configured it will always run, regardless of the value of this parameter.

  • preference (str | None) – The node or shard the operation should be performed on. It is random by default.

  • q (str | None) – A query in the Lucene query string syntax.

  • query (Mapping[str, Any] | None) – The documents to update using the Query DSL.

  • refresh (bool | None) – If true, Elasticsearch refreshes affected shards to make the operation visible to search after the request completes. This is different from the update API’s refresh parameter, which refreshes only the shard that received the request.

  • request_cache (bool | None) – If true, the request cache is used for this request. It defaults to the index-level setting.

  • requests_per_second (float | None) – The throttle for this request in sub-requests per second.

  • routing (str | None) – A custom value used to route operations to a specific shard.

  • script (Mapping[str, Any] | None) – The script to run to update the document source or metadata when updating.

  • scroll (str | Literal[-1] | Literal[0] | None) – The period to retain the search context for scrolling.

  • scroll_size (int | None) – The size of the scroll request that powers the operation.

  • search_timeout (str | Literal[-1] | Literal[0] | None) – An explicit timeout for each search request. By default, there is no timeout.

  • search_type (str | Literal['dfs_query_then_fetch', 'query_then_fetch'] | None) – The type of the search operation. Available options include query_then_fetch and dfs_query_then_fetch.

  • slice (Mapping[str, Any] | None) – Slice the request manually using the provided slice ID and total number of slices.

  • slices (int | str | Literal['auto'] | None) – The number of slices this task should be divided into.

  • sort (Sequence[str] | None) – A comma-separated list of <field>:<direction> pairs.

  • stats (Sequence[str] | None) – The specific tag of the request for logging and statistical purposes.

  • terminate_after (int | None) – The maximum number of documents to collect for each shard. If a query reaches this limit, Elasticsearch terminates the query early. Elasticsearch collects documents before sorting. IMPORTANT: Use with caution. Elasticsearch applies this parameter to each shard handling the request. When possible, let Elasticsearch perform early termination automatically. Avoid specifying this parameter for requests that target data streams with backing indices across multiple data tiers.

  • timeout (str | Literal[-1] | Literal[0] | None) – The period each update request waits for the following operations: dynamic mapping updates, waiting for active shards. By default, it is one minute. This guarantees Elasticsearch waits for at least the timeout before failing. The actual wait time could be longer, particularly when multiple waits occur.

  • version (bool | None) – If true, returns the document version as part of a hit.

  • version_type (bool | None) – Whether the document's version number should be incremented on a hit (internal) or not (reindex).

  • wait_for_active_shards (int | str | Literal['all', 'index-setting'] | None) – The number of shard copies that must be active before proceeding with the operation. Set to all or any positive integer up to the total number of shards in the index (number_of_replicas+1). The timeout parameter controls how long each write request waits for unavailable shards to become available. Both work exactly the way they work in the bulk API.

  • wait_for_completion (bool | None) – If true, the request blocks until the operation is complete. If false, Elasticsearch performs some preflight checks, launches the request, and returns a task ID that you can use to cancel or get the status of the task. Elasticsearch creates a record of this task as a document at .tasks/task/${taskId}.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • from_ (int | None)

  • human (bool | None)

  • pretty (bool | None)

  • body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
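Several of the parameters above can be combined as in the sketch below, which assumes documents with a hypothetical tags array field. It selects documents with a q query string (so default_operator applies), caps the run with max_docs, keeps going on version conflicts, and refreshes afterwards. The returned counters come from the response body; the helper name tag_matches is illustrative.

```python
from typing import TYPE_CHECKING, Any, Optional

if TYPE_CHECKING:  # type hints only
    from elasticsearch import Elasticsearch


def tag_matches(
    client: "Elasticsearch",
    index: str,
    q: str,
    tag: str,
    max_docs: Optional[int] = None,
) -> dict[str, Any]:
    """Append `tag` to the tags array of documents matching a query string."""
    resp = client.update_by_query(
        index=index,
        q=q,                     # Lucene query string syntax
        default_operator="and",  # only valid together with q
        script={
            "source": "ctx._source.tags.add(params.tag)",  # assumes tags is an array
            "params": {"tag": tag},
        },
        conflicts="proceed",     # count version conflicts rather than abort
        max_docs=max_docs,       # None means the parameter is not sent
        refresh=True,            # make the changes visible to search afterwards
    )
    return {
        "updated": resp["updated"],
        "version_conflicts": resp["version_conflicts"],
        "failures": resp["failures"],
    }
```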

update_by_query_rethrottle(*, task_id, error_trace=None, filter_path=None, human=None, pretty=None, requests_per_second=None)

Throttle an update by query operation.

Change the number of requests per second for a particular update by query operation. Rethrottling that speeds up the query takes effect immediately, but rethrottling that slows down the query takes effect after the current batch completes, to prevent scroll timeouts.

https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docs-update-by-query.html#docs-update-by-query-rethrottle

Parameters:
  • task_id (str) – The ID for the task.

  • requests_per_second (float | None) – The throttle for this request in sub-requests per second. To turn off throttling, set it to -1.

  • error_trace (bool | None)

  • filter_path (str | Sequence[str] | None)

  • human (bool | None)

  • pretty (bool | None)

Return type:

ObjectApiResponse[Any]
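A minimal sketch of rethrottling, assuming a sync client and a hypothetical index name: the update by query starts in the background with an initial requests_per_second, and update_by_query_rethrottle later turns throttling off by setting it to -1. Because this speeds the operation up, it takes effect immediately. The helper names are illustrative.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only
    from elasticsearch import Elasticsearch


def start_throttled_update(client: "Elasticsearch", index: str, rps: float) -> str:
    """Start a throttled update by query in the background and return its task ID."""
    resp = client.update_by_query(
        index=index,
        requests_per_second=rps,    # initial throttle in sub-requests per second
        conflicts="proceed",
        wait_for_completion=False,  # needed to get a task ID to rethrottle
    )
    return resp["task"]


def remove_throttle(client: "Elasticsearch", task_id: str) -> None:
    """Turn throttling off for a running update by query task."""
    client.update_by_query_rethrottle(task_id=task_id, requests_per_second=-1)
```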