Using Asyncio with Elasticsearch¶

Starting in elasticsearch-py v7.8.0 for Python 3.6+ the elasticsearch package supports async/await with Asyncio and Aiohttp. You can either install aiohttp directly or use the [async] extra:

$ python -m pip install "elasticsearch>=7.8.0" aiohttp

# - OR -

$ python -m pip install "elasticsearch[async]>=7.8.0"
Note

Async functionality is a new feature of this library in v7.8.0+, so please open an issue if you find a bug or have a question about async support.

Getting Started with Async¶

After installation all async API endpoints are available via AsyncElasticsearch and are used in the same way as other APIs, just with an extra await:

import asyncio
from elasticsearch import AsyncElasticsearch

es = AsyncElasticsearch()

async def main():
    resp = await es.search(
        index="documents",
        query={"match_all": {}},
        size=20,
    )
    print(resp)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

All APIs that are available under the sync client are also available under the async client.

ASGI Applications and Elastic APM¶

ASGI (Asynchronous Server Gateway Interface) is a new way to serve Python web applications making use of async I/O to achieve better performance. Some examples of ASGI frameworks include FastAPI, Django 3.0+, and Starlette. If you’re using one of these frameworks along with Elasticsearch then you should use AsyncElasticsearch so that synchronous network calls do not block the event loop.
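To illustrate why awaiting I/O matters, here is a minimal sketch that uses only the standard library. The `fake_search` coroutine is a hypothetical stand-in for an awaited client call such as `await es.search(...)`; it is not part of the elasticsearch package. While one coroutine is waiting, the event loop starts the other. Note that `asyncio.run` requires Python 3.7+.

```python
import asyncio


async def fake_search(name, events):
    # Hypothetical stand-in for `await es.search(...)`.
    events.append(f"start {name}")
    # Awaiting hands control back to the event loop, so other tasks can run.
    await asyncio.sleep(0.01)
    events.append(f"end {name}")
    return name


async def handle_two_requests():
    events = []
    # Both coroutines are scheduled together; neither blocks the other.
    results = await asyncio.gather(
        fake_search("a", events),
        fake_search("b", events),
    )
    return results, events


results, events = asyncio.run(handle_two_requests())
print(events)
```

Both "start" events are recorded before either "end" event, which is the interleaving a blocking synchronous call would prevent.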

Elastic APM also supports tracing of async Elasticsearch queries just the same as synchronous queries. For an example of how to configure AsyncElasticsearch with FastAPI, a popular ASGI framework, and APM tracing, see the pre-built example in the examples/fastapi-apm directory.

Frequently Asked Questions¶

NameError / ImportError when importing `AsyncElasticsearch`?¶

If you receive a NameError or ImportError when trying to use AsyncElasticsearch, ensure that you’re running Python 3.6+ (check with $ python --version) and that you have aiohttp installed in your environment (check with $ python -m pip freeze | grep aiohttp). If either of these conditions is not met then async support won’t be available.

What about the `elasticsearch-async` package?¶

Previously asyncio was supported separately via the elasticsearch-async package. The elasticsearch-async package has been deprecated in favor of AsyncElasticsearch provided by the elasticsearch package in v7.8 and onwards.

Receiving ‘Unclosed client session / connector’ warning?¶

This warning is created by aiohttp when an open HTTP connection is garbage collected. You’ll typically run into this when closing your application. To resolve the issue ensure that close() is called before the AsyncElasticsearch instance is garbage collected.

For example if using FastAPI that might look like this:

from fastapi import FastAPI
from elasticsearch import AsyncElasticsearch

app = FastAPI()
es = AsyncElasticsearch()

# This gets called once the app is shutting down.
@app.on_event("shutdown")
async def app_shutdown():
    await es.close()
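Outside a web framework, the same guarantee can be sketched with try/finally. The `StandInClient` class below is hypothetical and exists only so the sketch runs without a cluster; it mimics nothing but the async `close()` method. With the real client you would pass an `AsyncElasticsearch` instance instead.

```python
import asyncio


class StandInClient:
    """Hypothetical stand-in that only mimics an async close() method."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


async def run_job(client):
    try:
        # Real work, such as `await client.search(...)`, would go here.
        await asyncio.sleep(0)
    finally:
        # Runs on success and on error, so the session is always released.
        await client.close()


client = StandInClient()
asyncio.run(run_job(client))
```

Placing `close()` in the `finally` clause means the connection is released even if a request raises.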

Async Helpers¶

Async variants of all helpers are available in elasticsearch.helpers and are all prefixed with async_*. You’ll notice that these APIs are identical to the ones in the sync Helpers documentation.

All async helpers that accept an iterator or generator also accept async iterators and async generators.
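One way a helper can accept both kinds of input is to normalize them into a single async iterator. The sketch below is an illustration using only the standard library, not the library's actual implementation; `as_async_iter` is a hypothetical name.

```python
import asyncio


async def as_async_iter(items):
    """Yield from either a regular iterable or an async iterable."""
    if hasattr(items, "__aiter__"):
        async for item in items:
            yield item
    else:
        for item in items:
            yield item


def sync_actions():
    for word in ["foo", "bar"]:
        yield {"_index": "mywords", "word": word}


async def async_actions():
    for word in ["baz"]:
        yield {"_index": "mywords", "word": word}


async def collect():
    seen = []
    # The same consuming loop works for both generator kinds.
    for source in (sync_actions(), async_actions()):
        async for action in as_async_iter(source):
            seen.append(action["word"])
    return seen


words = asyncio.run(collect())
```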

Bulk and Streaming Bulk¶

elasticsearch.helpers.async_bulk(client, actions, stats_only=False, ignore_status=(), *args, **kwargs)¶

Helper for the bulk() api that provides a more human friendly interface - it consumes an iterator of actions and sends them to elasticsearch in chunks. It returns a tuple with summary information - number of successfully executed actions and either list of errors or number of errors if stats_only is set to True. Note that by default we raise a BulkIndexError when we encounter an error so options like stats_only only apply when raise_on_error is set to False.

When errors are being collected, the original document data is included in the error dictionary, which can lead to very high memory usage. If you need to process a lot of data and want to ignore/collect errors please consider using the async_streaming_bulk() helper which will just return the errors and not store them in memory.
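The summary tuple can be pictured as a reduction over per-action results. The sketch below is an illustration only: `fake_results` yields made-up `(ok, item)` pairs, and `summarize` is a hypothetical function, not the helper's source code.

```python
import asyncio


async def fake_results():
    # Made-up (ok, item) pairs standing in for per-action bulk results.
    yield True, {"index": {"_id": "1", "status": 201}}
    yield False, {"index": {"_id": "2", "status": 400}}
    yield True, {"index": {"_id": "3", "status": 201}}


async def summarize(results, stats_only=False):
    success, failed, errors = 0, 0, []
    async for ok, item in results:
        if ok:
            success += 1
        elif stats_only:
            # Only count failures; nothing is kept in memory.
            failed += 1
        else:
            # Keep the full error item, which is what costs memory.
            errors.append(item)
    return (success, failed) if stats_only else (success, errors)


counts = asyncio.run(summarize(fake_results(), stats_only=True))
detailed = asyncio.run(summarize(fake_results()))
```

With `stats_only=True` only two integers are kept, whereas collecting errors retains every failed item.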

Parameters:

client – instance of AsyncElasticsearch to use

actions – iterator containing the actions

stats_only – if True only report number of successful/failed operations instead of just number of successful and a list of error responses

ignore_status – list of HTTP status codes that you want to ignore

Any additional keyword arguments will be passed to async_streaming_bulk() which is used to execute the operation, see async_streaming_bulk() for more accepted parameters.
import asyncio
from elasticsearch import AsyncElasticsearch
from elasticsearch.helpers import async_bulk

es = AsyncElasticsearch()

async def gendata():
    mywords = ['foo', 'bar', 'baz']
    for word in mywords:
        yield {
            "_index": "mywords",
            "doc": {"word": word},
        }

async def main():
    await async_bulk(es, gendata())

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
elasticsearch.helpers.async_streaming_bulk(client, actions, chunk_size=500, max_chunk_bytes=104857600, raise_on_error=True, expand_action_callback=<function expand_action>, raise_on_exception=True, max_retries=0, initial_backoff=2, max_backoff=600, yield_ok=True, ignore_status=(), *args, **kwargs)¶

Streaming bulk consumes actions from the iterable passed in and yields results per action. For non-streaming usecases use async_bulk() which is a wrapper around streaming bulk that returns summary information about the bulk operation once the entire input is consumed and sent.

If you specify max_retries it will also retry any documents that were rejected with a 429 status code. To do this it will wait (by calling asyncio.sleep) for initial_backoff seconds before the first retry and then, on every subsequent rejection for the same chunk, double the wait time, up to max_backoff seconds.
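The resulting wait schedule can be sketched as follows. `backoff_delays` is a hypothetical function written for illustration, and it assumes retries are counted from zero, so the first retry waits exactly initial_backoff seconds.

```python
def backoff_delays(max_retries, initial_backoff=2, max_backoff=600):
    """Seconds to wait before each retry: doubling, capped at max_backoff."""
    return [
        min(max_backoff, initial_backoff * 2 ** retry_number)
        for retry_number in range(max_retries)
    ]


print(backoff_delays(4))  # [2, 4, 8, 16]
```

With the default values, the cap of 600 seconds is reached on the tenth retry.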

Parameters:

client – instance of AsyncElasticsearch to use

actions – iterable or async iterable containing the actions to be executed

chunk_size – number of docs in one chunk sent to es (default: 500)

max_chunk_bytes – the maximum size of the request in bytes (default: 100MB)

raise_on_error – raise BulkIndexError containing errors (as .errors) from the execution of the last chunk when some occur. By default we raise.

raise_on_exception – if False then don’t propagate exceptions from call to bulk and just report the items that failed as failed.

expand_action_callback – callback executed on each action passed in, should return a tuple containing the action line and the data line (None if data line should be omitted).

max_retries – maximum number of times a document will be retried when 429 is received, set to 0 (default) for no retries on 429

initial_backoff – number of seconds we should wait before the first retry. Each subsequent retry waits initial_backoff * 2**retry_number seconds

max_backoff – maximum number of seconds a retry will wait

yield_ok – if set to False will skip successful documents in the output

ignore_status – list of HTTP status codes that you want to ignore
import asyncio
from elasticsearch import AsyncElasticsearch
from elasticsearch.helpers import async_streaming_bulk

es = AsyncElasticsearch()

async def gendata():
    mywords = ['foo', 'bar', 'baz']
    for word in mywords:
        yield {
            "_index": "mywords",
            "word": word,
        }

async def main():
    async for ok, result in async_streaming_bulk(es, gendata()):
        action, result = result.popitem()
        if not ok:
            print("failed to %s document %s" % (action, result.get("_id")))

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

Scan¶

elasticsearch.helpers.async_scan(client, query=None, scroll='5m', raise_on_error=True, preserve_order=False, size=1000, request_timeout=None, clear_scroll=True, scroll_kwargs=None, **kwargs)¶
Simple abstraction on top of the scroll() api - a simple iterator that yields all hits as returned by underlying scroll requests.

By default scan does not return results in any pre-determined order. To have a standard order in the returned documents (either by score or explicit sort definition) when scrolling, use preserve_order=True. This may be an expensive operation and will negate the performance benefits of using scan.

Parameters:

client – instance of AsyncElasticsearch to use

query – body for the search() api

scroll – Specify how long a consistent view of the index should be maintained for scrolled search

raise_on_error – raises an exception (ScanError) if an error is encountered (some shards fail to execute). By default we raise.

preserve_order – don’t set the search_type to scan - this will cause the scroll to paginate while preserving the order. Note that this can be an extremely expensive operation and can easily lead to unpredictable results, use with caution.

size – size (per shard) of the batch sent at each iteration.

request_timeout – explicit timeout for each call to scan

clear_scroll – explicitly calls delete on the scroll id via the clear scroll API at the end of the method on completion or error, defaults to true.

scroll_kwargs – additional kwargs to be passed to scroll()

Any additional keyword arguments will be passed to the initial search() call:
async_scan(es,
    query={"query": {"match": {"title": "python"}}},
    index="orders-*",
    doc_type="books"
)

import asyncio
from elasticsearch import AsyncElasticsearch
from elasticsearch.helpers import async_scan

es = AsyncElasticsearch()

async def main():
    async for doc in async_scan(
        client=es,
        query={"query": {"match": {"title": "python"}}},
        index="orders-*"
    ):
        print(doc)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
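The idea of an iterator that hides pagination can be sketched without a cluster. Everything below is hypothetical: `PAGES` and `fetch_page` stand in for search and scroll round trips, and `scan_all` only illustrates the pattern of requesting pages until one comes back empty. It is not the helper's implementation.

```python
import asyncio

# Made-up pages keyed by scroll id: (hits, next scroll id).
PAGES = {
    None: (["doc1", "doc2"], "s1"),
    "s1": (["doc3"], "s2"),
    "s2": ([], None),
}


async def fetch_page(scroll_id):
    # Hypothetical stand-in for one search/scroll request.
    await asyncio.sleep(0)
    return PAGES[scroll_id]


async def scan_all():
    scroll_id = None
    while True:
        hits, scroll_id = await fetch_page(scroll_id)
        if not hits:
            # An empty page means the scroll is exhausted.
            break
        for hit in hits:
            yield hit


async def collect():
    return [hit async for hit in scan_all()]


docs = asyncio.run(collect())
```

The caller sees one flat stream of hits and never handles scroll ids directly.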

Reindex¶

elasticsearch.helpers.async_reindex(client, source_index, target_index, query=None, target_client=None, chunk_size=500, scroll='5m', op_type=None, scan_kwargs={}, bulk_kwargs={})¶

Reindex all documents from one index that satisfy a given query to another, potentially (if target_client is specified) on a different cluster. If you don’t specify the query you will reindex all the documents.

Since 2.3 a reindex() api is available as part of elasticsearch itself. It is recommended to use the api instead of this helper wherever possible. The helper is here mostly for backwards compatibility and for situations where more flexibility is needed.

Note

This helper doesn’t transfer mappings, just the data.

Parameters:

client – instance of AsyncElasticsearch to use (for read if target_client is specified as well)

source_index – index (or list of indices) to read documents from

target_index – name of the index in the target cluster to populate

query – body for the search() api

target_client – optional, if specified will be used for writing (thus enabling reindex between clusters)

chunk_size – number of docs in one chunk sent to es (default: 500)

scroll – Specify how long a consistent view of the index should be maintained for scrolled search

op_type – Explicit operation type. Defaults to ‘_index’. Data streams must be set to ‘create’. If not specified, will auto-detect if target_index is a data stream.

scan_kwargs – additional kwargs to be passed to async_scan()

bulk_kwargs – additional kwargs to be passed to async_bulk()
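Conceptually, reindexing with this helper reads hits from the source and turns each one into a bulk action aimed at the target index. The sketch below illustrates that transformation only, with made-up hits. `retarget` is a hypothetical function, and the exact fields the helper rewrites are an assumption here.

```python
import asyncio


async def fake_hits():
    # Made-up hits in the general shape a search returns.
    yield {"_index": "old", "_id": "1", "_source": {"word": "foo"}}
    yield {"_index": "old", "_id": "2", "_source": {"word": "bar"}}


async def retarget(hits, target_index, op_type=None):
    async for hit in hits:
        # Copy the hit and point it at the target index.
        action = dict(hit, _index=target_index)
        if op_type is not None:
            # For example "create", which data streams require.
            action["_op_type"] = op_type
        yield action


async def collect():
    return [a async for a in retarget(fake_hits(), "new", op_type="create")]


actions = asyncio.run(collect())
```

Only documents are copied this way, which is why mappings have to be created on the target separately.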

API Reference¶

The API of AsyncElasticsearch is nearly identical to the API of Elasticsearch with the exception that every API call like search() is an async function and requires an await to properly return the response body.

AsyncElasticsearch¶

Note

To reference Elasticsearch APIs that are namespaced like .indices.create() refer to the sync API reference. These APIs are identical between sync and async.
class elasticsearch.AsyncElasticsearch(hosts=None, transport_class=<class 'elasticsearch._async.transport.AsyncTransport'>, **kwargs)¶
Elasticsearch low-level client. Provides a straightforward mapping from Python to ES REST endpoints.

The instance has attributes cat, cluster, indices, ingest, nodes, snapshot and tasks that provide access to instances of CatClient, ClusterClient, IndicesClient, IngestClient, NodesClient, SnapshotClient and TasksClient respectively. This is the preferred (and only supported) way to get access to those classes and their methods.

You can specify your own connection class which should be used by providing the connection_class parameter:
# create connection to localhost using the ThriftConnection
es = Elasticsearch(connection_class=ThriftConnection)
If you want to turn on Sniffing you have several options (described in Transport):
# create connection that will automatically inspect the cluster to get
# the list of active nodes. Start with nodes running on 'esnode1' and
# 'esnode2'
es = Elasticsearch(
    ['esnode1', 'esnode2'],
    # sniff before doing anything
    sniff_on_start=True,
    # refresh nodes after a node fails to respond
    sniff_on_connection_fail=True,
    # and also every 60 seconds
    sniffer_timeout=60
)
Different hosts can have different parameters, use a dictionary per node to specify those:
# connect to localhost directly and another node using SSL on port 443
# and an url_prefix. Note that ``port`` needs to be an int.
es = Elasticsearch([
    {'host': 'localhost'},
    {'host': 'othernode', 'port': 443, 'url_prefix': 'es', 'use_ssl': True},
])
If using SSL, there are several parameters that control how we deal with certificates (see Urllib3HttpConnection for detailed description of the options):
es = Elasticsearch(
    ['localhost:443', 'other_host:443'],
    # turn on SSL
    use_ssl=True,
    # make sure we verify SSL certificates
    verify_certs=True,
    # provide a path to CA certs on disk
    ca_certs='/path/to/CA_certs'
)
If you use SSL but don’t verify the certificates, a warning message is shown; you can optionally suppress it (see Urllib3HttpConnection for detailed description of the options):
es = Elasticsearch(
    ['localhost:443', 'other_host:443'],
    # turn on SSL
    use_ssl=True,
    # do not verify SSL certificates
    verify_certs=False,
    # don't show warnings about ssl certs verification
    ssl_show_warn=False
)
SSL client authentication is supported (see Urllib3HttpConnection for detailed description of the options):
es = Elasticsearch(
    ['localhost:443', 'other_host:443'],
    # turn on SSL
    use_ssl=True,
    # make sure we verify SSL certificates
    verify_certs=True,
    # provide a path to CA certs on disk
    ca_certs='/path/to/CA_certs',
    # PEM formatted SSL client certificate
    client_cert='/path/to/clientcert.pem',
    # PEM formatted SSL client key
    client_key='/path/to/clientkey.pem'
)
Alternatively you can use RFC-1738 formatted URLs, as long as they are not in conflict with other options:
es = Elasticsearch(
    [
        'http://user:secret@localhost:9200/',
        'https://user:secret@other_host:443/production'
    ],
    verify_certs=True
)
By default, JSONSerializer is used to encode all outgoing requests. However, you can implement your own custom serializer:
from elasticsearch.serializer import JSONSerializer

class SetEncoder(JSONSerializer):
    def default(self, obj):
        if isinstance(obj, set):
            return list(obj)
        if isinstance(obj, Something):
            return 'CustomSomethingRepresentation'
        return JSONSerializer.default(self, obj)

es = Elasticsearch(serializer=SetEncoder())
Parameters:

hosts – list of nodes, or a single node, we should connect to. Node should be a dictionary ({“host”: “localhost”, “port”: 9200}), the entire dictionary will be passed to the Connection class as kwargs, or a string in the format of host[:port] which will be translated to a dictionary automatically. If no value is given the Connection class defaults will be used.

transport_class – Transport subclass to use.

kwargs – any additional arguments will be passed on to the Transport class and, subsequently, to the Connection instances.
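The translation of a host[:port] string into a dictionary can be pictured with the following sketch. `normalize_host` is a hypothetical function written for illustration; it is not the client's own parsing code, and it deliberately ignores URLs and IPv6 addresses.

```python
def normalize_host(host):
    """Turn 'host[:port]' into a dict; pass dicts through unchanged."""
    if isinstance(host, dict):
        return host
    name, sep, port = host.partition(":")
    node = {"host": name}
    if sep:
        # The port must be an int, matching the note in the examples above.
        node["port"] = int(port)
    return node


print(normalize_host("localhost:9200"))
```

A string without a port yields a dictionary with only the host key, leaving the port to the Connection class default.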

bulk(body, index=None, doc_type=None, params=None, headers=None)¶

Performs multiple index/update/delete operations in a single request.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-bulk.html

Parameters:

body – The operation definition and data (action-data pairs), separated by newlines

index – Default index for items which don’t provide one

doc_type – Default document type for items which don’t provide one

_source – True or false to return the _source field or not, or default list of fields to return, can be overridden on each sub-request

_source_excludes – Default list of fields to exclude from the returned _source field, can be overridden on each sub-request

_source_includes – Default list of fields to extract and return from the _source field, can be overridden on each sub-request

pipeline – The pipeline id to preprocess incoming documents with

refresh – If true then refresh the affected shards to make this operation visible to search, if wait_for then wait for a refresh to make this operation visible to search, if false (the default) then do nothing with refreshes. Valid choices: true, false, wait_for

require_alias – Sets require_alias for all incoming documents. Defaults to unset (false)

routing – Specific routing value

timeout – Explicit operation timeout

wait_for_active_shards – Sets the number of shard copies that must be active before proceeding with the bulk operation. Defaults to 1, meaning the primary shard only. Set to all for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1)
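As an illustration of the newline-separated action-data pairs that body expects, the sketch below builds such a string with the standard json module. `build_bulk_body` is a hypothetical function, and the index name and documents are made up.

```python
import json


def build_bulk_body(pairs):
    """Serialize (action, data) pairs into newline-delimited JSON."""
    lines = []
    for action, data in pairs:
        lines.append(json.dumps(action))
        # Some actions, such as delete, have no data line.
        if data is not None:
            lines.append(json.dumps(data))
    # The bulk format requires a trailing newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ({"index": {"_index": "mywords", "_id": "1"}}, {"word": "foo"}),
    ({"delete": {"_index": "mywords", "_id": "2"}}, None),
])
print(body)
```

In practice the helpers in elasticsearch.helpers build this structure for you, so writing it by hand is rarely needed.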

clear_scroll(body=None, scroll_id=None, params=None, headers=None)¶

Explicitly clears the search context for a scroll.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/clear-scroll-api.html

Parameters:

body – A comma-separated list of scroll IDs to clear if none was specified via the scroll_id parameter

scroll_id – A comma-separated list of scroll IDs to clear

close()¶

Closes the Transport and all internal connections

close_point_in_time(body=None, params=None, headers=None)¶

Close a point in time

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/point-in-time-api.html

Parameters: body – a point-in-time id to close

count(body=None, index=None, doc_type=None, params=None, headers=None)¶

Returns number of documents matching a query.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-count.html

Parameters:

body – A query to restrict the results specified with the Query DSL (optional)

index – A comma-separated list of indices to restrict the results

doc_type – A comma-separated list of types to restrict the results

allow_no_indices – Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes _all string or when no indices have been specified)

analyze_wildcard – Specify whether wildcard and prefix queries should be analyzed (default: false)

analyzer – The analyzer to use for the query string

default_operator – The default operator for query string query (AND or OR) Valid choices: AND, OR Default: OR

df – The field to use as default where no field prefix is given in the query string

expand_wildcards – Whether to expand wildcard expression to concrete indices that are open, closed or both. Valid choices: open, closed, hidden, none, all Default: open

ignore_throttled – Whether specified concrete, expanded or aliased indices should be ignored when throttled

ignore_unavailable – Whether specified concrete indices should be ignored when unavailable (missing or closed)

lenient – Specify whether format-based query failures (such as providing text to a numeric field) should be ignored

min_score – Include only documents with a specific _score value in the result

preference – Specify the node or shard the operation should be performed on (default: random)

q – Query in the Lucene query string syntax

routing – A comma-separated list of specific routing values

terminate_after – The maximum count for each shard, upon reaching which the query execution will terminate early

create(index, id, body, doc_type=None, params=None, headers=None)¶

Creates a new document in the index. Returns a 409 response when a document with the same ID already exists in the index.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-index_.html

Parameters:

index – The name of the index

id – Document ID

document – The document

doc_type – The type of the document

pipeline – The pipeline id to preprocess incoming documents with

refresh – If true then refresh the affected shards to make this operation visible to search, if wait_for then wait for a refresh to make this operation visible to search, if false (the default) then do nothing with refreshes. Valid choices: true, false, wait_for

routing – Specific routing value

timeout – Explicit operation timeout

version – Explicit version number for concurrency control

version_type – Specific version type Valid choices: internal, external, external_gte

wait_for_active_shards – Sets the number of shard copies that must be active before proceeding with the index operation. Defaults to 1, meaning the primary shard only. Set to all for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1)

delete(index, id, doc_type=None, params=None, headers=None)¶

Removes a document from the index.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-delete.html

Parameters:

index – The name of the index

id – The document ID

doc_type – The type of the document

if_primary_term – only perform the delete operation if the last operation that has changed the document has the specified primary term

if_seq_no – only perform the delete operation if the last operation that has changed the document has the specified sequence number

refresh – If true then refresh the affected shards to make this operation visible to search, if wait_for then wait for a refresh to make this operation visible to search, if false (the default) then do nothing with refreshes. Valid choices: true, false, wait_for

routing – Specific routing value

timeout – Explicit operation timeout

version – Explicit version number for concurrency control

version_type – Specific version type Valid choices: internal, external, external_gte, force

wait_for_active_shards – Sets the number of shard copies that must be active before proceeding with the delete operation. Defaults to 1, meaning the primary shard only. Set to all for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1)

delete_by_query(index, body, doc_type=None, params=None, headers=None)¶

Deletes documents matching the provided query.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-delete-by-query.html

Parameters:

index – A comma-separated list of index names to search; use _all or empty string to perform the operation on all indices

body – The search definition using the Query DSL

doc_type – A comma-separated list of document types to search; leave empty to perform the operation on all types

allow_no_indices – Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes _all string or when no indices have been specified)

analyze_wildcard – Specify whether wildcard and prefix queries should be analyzed (default: false)

analyzer – The analyzer to use for the query string

conflicts – What to do when the delete by query hits version conflicts? Valid choices: abort, proceed Default: abort

default_operator – The default operator for query string query (AND or OR) Valid choices: AND, OR Default: OR

df – The field to use as default where no field prefix is given in the query string

expand_wildcards – Whether to expand wildcard expression to concrete indices that are open, closed or both. Valid choices: open, closed, hidden, none, all Default: open

from – Starting offset (default: 0)

ignore_unavailable – Whether specified concrete indices should be ignored when unavailable (missing or closed)

lenient – Specify whether format-based query failures (such as providing text to a numeric field) should be ignored

max_docs – Maximum number of documents to process (default: all documents)

preference – Specify the node or shard the operation should be performed on (default: random)

q – Query in the Lucene query string syntax

refresh – Should the affected indexes be refreshed?

request_cache – Specify if request cache should be used for this request or not, defaults to index level setting

requests_per_second – The throttle for this request in sub-requests per second. -1 means no throttle.

routing – A comma-separated list of specific routing values

scroll – Specify how long a consistent view of the index should be maintained for scrolled search

scroll_size – Size on the scroll request powering the delete by query Default: 100

search_timeout – Explicit timeout for each search request. Defaults to no timeout.

search_type – Search operation type Valid choices: query_then_fetch, dfs_query_then_fetch

size – Deprecated, please use max_docs instead

slices – The number of slices this task should be divided into. Defaults to 1, meaning the task isn’t sliced into subtasks. Can be set to auto. Default: 1

sort – A comma-separated list of <field>:<direction> pairs

stats – Specific ‘tag’ of the request for logging and statistical purposes

terminate_after – The maximum number of documents to collect for each shard, upon reaching which the query execution will terminate early.

timeout – Time each individual bulk request should wait for shards that are unavailable. Default: 1m

version – Specify whether to return document version as part of a hit

wait_for_active_shards – Sets the number of shard copies that must be active before proceeding with the delete by query operation. Defaults to 1, meaning the primary shard only. Set to all for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1)

wait_for_completion – Whether the request should block until the delete by query is complete. Default: True

delete_by_query_rethrottle(task_id, params=None, headers=None)¶

Changes the number of requests per second for a particular Delete By Query operation.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-delete-by-query.html

Parameters:

task_id – The task id to rethrottle

requests_per_second – The throttle to set on this request in floating sub-requests per second. -1 means set no throttle.

delete_script(id, params=None, headers=None)¶

Deletes a script.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/modules-scripting.html

Parameters:

id – Script ID

master_timeout – Specify timeout for connection to master

timeout – Explicit operation timeout

exists(index, id, doc_type=None, params=None, headers=None)¶

Returns information about whether a document exists in an index.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-get.html

Parameters:

index – The name of the index

id – The document ID

doc_type – The type of the document (use _all to fetch the first document matching the ID across all types)

_source – True or false to return the _source field or not, or a list of fields to return

_source_excludes – A list of fields to exclude from the returned _source field

_source_includes – A list of fields to extract and return from the _source field

preference – Specify the node or shard the operation should be performed on (default: random)

realtime – Specify whether to perform the operation in realtime or search mode

refresh – Refresh the shard containing the document before performing the operation

routing – Specific routing value

stored_fields – A comma-separated list of stored fields to return in the response

version – Explicit version number for concurrency control

version_type – Specific version type Valid choices: internal, external, external_gte, force

exists_source(index, id, doc_type=None, params=None, headers=None)¶

Returns information about whether a document source exists in an index.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-get.html

Parameters:

index – The name of the index

id – The document ID

doc_type – The type of the document; deprecated and optional starting with 7.0

_source – True or false to return the _source field or not, or a list of fields to return

_source_excludes – A list of fields to exclude from the returned _source field

_source_includes – A list of fields to extract and return from the _source field

preference – Specify the node or shard the operation should be performed on (default: random)

realtime – Specify whether to perform the operation in realtime or search mode

refresh – Refresh the shard containing the document before performing the operation

routing – Specific routing value

version – Explicit version number for concurrency control

version_type – Specific version type Valid choices: internal, external, external_gte, force

explain(index, id, body=None, doc_type=None, params=None, headers=None)¶

Returns information about why a specific document matches (or doesn’t match) a query.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-explain.html

Parameters:

index – The name of the index

id – The document ID

body – The query definition using the Query DSL

doc_type – The type of the document

_source – True or false to return the _source field or not, or a list of fields to return

_source_excludes – A list of fields to exclude from the returned _source field

_source_includes – A list of fields to extract and return from the _source field

analyze_wildcard – Specify whether wildcards and prefix queries in the query string query should be analyzed (default: false)

analyzer – The analyzer for the query string query

default_operator – The default operator for query string query (AND or OR) Valid choices: AND, OR Default: OR

df – The default field for query string query (default: _all)

lenient – Specify whether format-based query failures (such as providing text to a numeric field) should be ignored

preference – Specify the node or shard the operation should be performed on (default: random)

q – Query in the Lucene query string syntax

routing – Specific routing value

stored_fields – A comma-separated list of stored fields to return in the response

field_caps(body=None, index=None, params=None, headers=None)¶

Returns the information about the capabilities of fields among multiple indices.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-field-caps.html

Parameters:

body – An index filter specified with the Query DSL

index – A comma-separated list of index names; use _all or empty string to perform the operation on all indices

allow_no_indices – Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes _all string or when no indices have been specified)

expand_wildcards – Whether to expand wildcard expression to concrete indices that are open, closed or both. Valid choices: open, closed, hidden, none, all Default: open

fields – A comma-separated list of field names

ignore_unavailable – Whether specified concrete indices should be ignored when unavailable (missing or closed)

include_unmapped – Indicates whether unmapped fields should be included in the response.

get(index, id, doc_type=None, params=None, headers=None)¶

Returns a document.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-get.html

Parameters:

index – Name of the index that contains the document.

id – Unique identifier of the document.

doc_type – The type of the document (use _all to fetch the first document matching the ID across all types)

_source – True or false to return the _source field or not, or a list of fields to return.

_source_excludes – A comma-separated list of source fields to exclude in the response.

_source_includes – A comma-separated list of source fields to include in the response.

preference – Specifies the node or shard the operation should be performed on. Random by default.

realtime – If true, the request is real-time as opposed to near-real-time. Default: True

refresh – If true, Elasticsearch refreshes the affected shards to make this operation visible to search. If false, do nothing with refreshes.

routing – Target the specified primary shard.

stored_fields – A comma-separated list of stored fields to return in the response

version – Explicit version number for concurrency control. The specified version must match the current version of the document for the request to succeed.

version_type – Specific version type: internal, external, external_gte. Valid choices: internal, external, external_gte, force
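
To show how the source-filtering parameters fit into a call, here is a minimal sketch that fetches a single field of one document. The helper name `fetch_field` is hypothetical, and `es` is assumed to be an `AsyncElasticsearch` instance.

```python
async def fetch_field(es, index, doc_id, field):
    """Return one field of a document's _source, or None if it is absent.

    `es` is assumed to be an AsyncElasticsearch instance; `fetch_field`
    is an illustrative helper, not part of the client.
    """
    resp = await es.get(
        index=index,
        id=doc_id,
        _source_includes=[field],  # ask the server for this field only
    )
    return resp["_source"].get(field)
```

If the document does not exist, the client raises `elasticsearch.NotFoundError` instead of returning a response, so callers that expect missing IDs should catch it.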

get_script(id, params=None, headers=None)¶

Returns a script.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/modules-scripting.html

Parameters:

id – Script ID

master_timeout – Specify timeout for connection to master

get_script_context(params=None, headers=None)¶

Returns all script contexts.

https://www.elastic.co/guide/en/elasticsearch/painless/master/painless-contexts.html

get_script_languages(params=None, headers=None)¶

Returns available script types, languages and contexts

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/modules-scripting.html

get_source(index, id, doc_type=None, params=None, headers=None)¶

Returns the source of a document.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-get.html

Parameters:

index – The name of the index

id – The document ID

doc_type – The type of the document; deprecated and optional starting with 7.0

_source – True or false to return the _source field or not, or a list of fields to return

_source_excludes – A list of fields to exclude from the returned _source field

_source_includes – A list of fields to extract and return from the _source field

preference – Specify the node or shard the operation should be performed on (default: random)

realtime – Specify whether to perform the operation in realtime or search mode

refresh – Refresh the shard containing the document before performing the operation

routing – Specific routing value

version – Explicit version number for concurrency control

version_type – Specific version type Valid choices: internal, external, external_gte, force

index(index, body, doc_type=None, id=None, params=None, headers=None)¶

Creates or updates a document in an index.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-index_.html

Parameters:

index – The name of the index

document – The document

doc_type – The type of the document

id – Document ID

if_primary_term – only perform the index operation if the last operation that has changed the document has the specified primary term

if_seq_no – only perform the index operation if the last operation that has changed the document has the specified sequence number

op_type – Explicit operation type. Defaults to index for requests with an explicit document ID, and to create for requests without an explicit document ID. Valid choices: index, create

pipeline – The pipeline id to preprocess incoming documents with

refresh – If true then refresh the affected shards to make this operation visible to search, if wait_for then wait for a refresh to make this operation visible to search, if false (the default) then do nothing with refreshes. Valid choices: true, false, wait_for

require_alias – When true, requires destination to be an alias. Default is false

routing – Specific routing value

timeout – Explicit operation timeout

version – Explicit version number for concurrency control

version_type – Specific version type Valid choices: internal, external, external_gte

wait_for_active_shards – Sets the number of shard copies that must be active before proceeding with the index operation. Defaults to 1, meaning the primary shard only. Set to all for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1)
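
The following sketch indexes one document and waits until it is visible to search, using the `refresh` parameter described above. The helper name `index_and_wait` is hypothetical, and `es` is assumed to be an `AsyncElasticsearch` instance.

```python
async def index_and_wait(es, index, doc_id, document):
    """Index `document` under `doc_id` and wait for it to become searchable.

    `es` is assumed to be an AsyncElasticsearch instance; `index_and_wait`
    is an illustrative helper, not part of the client.
    """
    resp = await es.index(
        index=index,
        id=doc_id,
        body=document,
        refresh="wait_for",  # block until a refresh makes the write visible
    )
    # "created" for a new ID, "updated" when an existing document was replaced.
    return resp["result"]
```

`refresh="wait_for"` trades latency for read-your-write behavior; for bulk loads it is usually better to leave refreshes at their default.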

info(params=None, headers=None)¶

Returns basic information about the cluster.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/index.html

mget(body, index=None, doc_type=None, params=None, headers=None)¶

Retrieves multiple documents in one request.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-multi-get.html

Parameters:

body – Document identifiers; can be either docs (containing full document information) or ids (when index and type are provided in the URL).

index – The name of the index

doc_type – The type of the document

_source – True or false to return the _source field or not, or a list of fields to return

_source_excludes – A list of fields to exclude from the returned _source field

_source_includes – A list of fields to extract and return from the _source field

preference – Specify the node or shard the operation should be performed on (default: random)

realtime – Specify whether to perform the operation in realtime or search mode

refresh – Refresh the shard containing the document before performing the operation

routing – Specific routing value

stored_fields – A comma-separated list of stored fields to return in the response
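
A minimal sketch of the `ids` form of the body, with the index supplied in the URL. The helper name `fetch_many` is hypothetical, and `es` is assumed to be an `AsyncElasticsearch` instance.

```python
async def fetch_many(es, index, ids):
    """Return {document ID: _source} for every requested ID that exists.

    `es` is assumed to be an AsyncElasticsearch instance; `fetch_many`
    is an illustrative helper, not part of the client.
    """
    resp = await es.mget(index=index, body={"ids": list(ids)})
    # Missing documents come back with "found": false and no _source.
    return {doc["_id"]: doc["_source"] for doc in resp["docs"] if doc.get("found")}
```

Unlike `get`, a missing ID does not raise an error here; it is reported per document in the response, which is why the sketch filters on `found`.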

msearch(body, index=None, doc_type=None, params=None, headers=None)¶

Executes several search operations in one request.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-multi-search.html

Parameters:

body – The request definitions (metadata-search request definition pairs), separated by newlines

index – A comma-separated list of index names to use as default

doc_type – A comma-separated list of document types to use as default

ccs_minimize_roundtrips – Indicates whether network round-trips should be minimized as part of cross-cluster search requests execution Default: true

max_concurrent_searches – Controls the maximum number of concurrent searches the multi search api will execute

max_concurrent_shard_requests – The number of concurrent shard requests each sub search executes concurrently per node. This value should be used to limit the impact of the search on the cluster in order to limit the number of concurrent shard requests Default: 5

pre_filter_shard_size – A threshold that enforces a pre-filter roundtrip to prefilter search shards based on query rewriting if the number of shards the search request expands to exceeds the threshold. This filter roundtrip can limit the number of shards significantly if, for instance, a shard cannot match any documents based on its rewrite method, i.e. if date filters are mandatory to match but the shard bounds and the query are disjoint.

rest_total_hits_as_int – Indicates whether hits.total should be rendered as an integer or an object in the rest search response

search_type – Search operation type Valid choices: query_then_fetch, dfs_query_then_fetch

typed_keys – Specify whether aggregation and suggester names should be prefixed by their respective types in the response
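
The body is a sequence of metadata/search pairs. The sketch below builds that sequence as a list of dicts, on the understanding that the 7.x client serializes such a list into newline-delimited JSON. The helper names `build_msearch_body` and `run_queries` are hypothetical, and `es` is assumed to be an `AsyncElasticsearch` instance.

```python
def build_msearch_body(index, queries):
    """Interleave a metadata line and a search line for each query."""
    body = []
    for query in queries:
        body.append({"index": index})  # metadata: where to run the search
        body.append({"query": query})  # the search request itself
    return body


async def run_queries(es, index, queries):
    """Run all queries in one round-trip and return the hits of each.

    `es` is assumed to be an AsyncElasticsearch instance. A sub-search
    that failed carries an "error" key instead of "hits" and yields [].
    """
    resp = await es.msearch(body=build_msearch_body(index, queries))
    return [r.get("hits", {}).get("hits", []) for r in resp["responses"]]
```

Responses come back in the same order as the requests, so the result list can be zipped with `queries`.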

msearch_template(body, index=None, doc_type=None, params=None, headers=None)¶

Executes several search template operations in one request.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-multi-search.html

Parameters:

body – The request definitions (metadata-search request definition pairs), separated by newlines

index – A comma-separated list of index names to use as default

doc_type – A comma-separated list of document types to use as default

ccs_minimize_roundtrips – Indicates whether network round-trips should be minimized as part of cross-cluster search requests execution Default: true

max_concurrent_searches – Controls the maximum number of concurrent searches the multi search api will execute

rest_total_hits_as_int – Indicates whether hits.total should be rendered as an integer or an object in the rest search response

search_type – Search operation type Valid choices: query_then_fetch, dfs_query_then_fetch

typed_keys – Specify whether aggregation and suggester names should be prefixed by their respective types in the response

mtermvectors(body=None, index=None, doc_type=None, params=None, headers=None)¶

Returns multiple termvectors in one request.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-multi-termvectors.html

Parameters:

body – Define ids, documents, parameters or a list of parameters per document here. You must at least provide a list of document ids. See documentation.

index – The index in which the document resides.

doc_type – The type of the document.

field_statistics – Specifies if document count, sum of document frequencies and sum of total term frequencies should be returned. Applies to all returned documents unless otherwise specified in body “params” or “docs”. Default: True

fields – A comma-separated list of fields to return. Applies to all returned documents unless otherwise specified in body “params” or “docs”.

ids – A comma-separated list of documents ids. You must define ids as parameter or set “ids” or “docs” in the request body

offsets – Specifies if term offsets should be returned. Applies to all returned documents unless otherwise specified in body “params” or “docs”. Default: True

payloads – Specifies if term payloads should be returned. Applies to all returned documents unless otherwise specified in body “params” or “docs”. Default: True

positions – Specifies if term positions should be returned. Applies to all returned documents unless otherwise specified in body “params” or “docs”. Default: True

preference – Specify the node or shard the operation should be performed on (default: random). Applies to all returned documents unless otherwise specified in body “params” or “docs”.

realtime – Specifies if requests are real-time as opposed to near-real-time (default: true).

routing – Specific routing value. Applies to all returned documents unless otherwise specified in body “params” or “docs”.

term_statistics – Specifies if total term frequency and document frequency should be returned. Applies to all returned documents unless otherwise specified in body “params” or “docs”.

version – Explicit version number for concurrency control

version_type – Specific version type Valid choices: internal, external, external_gte, force

open_point_in_time(index, params=None, headers=None)¶

Open a point in time that can be used in subsequent searches

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/point-in-time-api.html

Parameters:

index – A comma-separated list of index names to open point in time; use _all or empty string to perform the operation on all indices

expand_wildcards – Whether to expand wildcard expression to concrete indices that are open, closed or both. Valid choices: open, closed, hidden, none, all Default: open

ignore_unavailable – Whether specified concrete indices should be ignored when unavailable (missing or closed)

keep_alive – Specify the time to live for the point in time

preference – Specify the node or shard the operation should be performed on (default: random)

routing – Specific routing value
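
As a sketch of the full lifecycle, the helper below opens a point in time, runs one search against it, and closes it again. The helper name `search_with_pit` is hypothetical, and `es` is assumed to be an `AsyncElasticsearch` instance.

```python
async def search_with_pit(es, index, query, keep_alive="1m"):
    """Run `query` against a point-in-time view of `index` and return the hits.

    `es` is assumed to be an AsyncElasticsearch instance; `search_with_pit`
    is an illustrative helper, not part of the client.
    """
    pit = await es.open_point_in_time(index=index, keep_alive=keep_alive)
    pit_id = pit["id"]
    try:
        # No index argument here: the PIT already pins the target indices.
        resp = await es.search(
            body={"query": query, "pit": {"id": pit_id, "keep_alive": keep_alive}}
        )
        return resp["hits"]["hits"]
    finally:
        # Release server-side resources even if the search fails.
        await es.close_point_in_time(body={"id": pit_id})
```

Closing the point in time explicitly is optional, since it expires after `keep_alive`, but doing so frees the search context sooner.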

ping(params=None, headers=None)¶

Returns whether the cluster is running.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/index.html
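
Because `ping` returns a boolean, it lends itself to a simple startup check. The sketch below polls until the cluster answers or the attempts run out. The helper name `wait_for_cluster` and its retry settings are hypothetical, and `es` is assumed to be an `AsyncElasticsearch` instance.

```python
import asyncio


async def wait_for_cluster(es, attempts=5, delay=1.0):
    """Return True as soon as the cluster answers a ping, False after `attempts` tries.

    `es` is assumed to be an AsyncElasticsearch instance; the retry
    policy here is an illustrative choice, not a library default.
    """
    for _ in range(attempts):
        if await es.ping():
            return True
        # Yield to the event loop between attempts instead of blocking it.
        await asyncio.sleep(delay)
    return False
```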

put_script(id, body, context=None, params=None, headers=None)¶

Creates or updates a script.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/modules-scripting.html

Parameters:

id – Script ID

body – The document

context – Script context

master_timeout – Specify timeout for connection to master

timeout – Explicit operation timeout

rank_eval(body, index=None, params=None, headers=None)¶

Evaluates the quality of ranked search results over a set of typical search queries.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-rank-eval.html

Parameters:

body – The ranking evaluation search definition, including search requests, document ratings and ranking metric definition.

index – A comma-separated list of index names to search; use _all or empty string to perform the operation on all indices

allow_no_indices – Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes _all string or when no indices have been specified)

expand_wildcards – Whether to expand wildcard expression to concrete indices that are open, closed or both. Valid choices: open, closed, hidden, none, all Default: open

ignore_unavailable – Whether specified concrete indices should be ignored when unavailable (missing or closed)

search_type – Search operation type Valid choices: query_then_fetch, dfs_query_then_fetch

reindex(body, params=None, headers=None)¶

Copies documents from one index to another, optionally filtering the source documents by a query, changing the destination index settings, or fetching the documents from a remote cluster.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-reindex.html

Parameters:

body – The search definition using the Query DSL and the prototype for the index request.

max_docs – Maximum number of documents to process (default: all documents)

refresh – Should the affected indexes be refreshed?

requests_per_second – The throttle to set on this request in sub-requests per second. -1 means no throttle.

scroll – Control how long to keep the search context alive Default: 5m

slices – The number of slices this task should be divided into. Defaults to 1, meaning the task isn’t sliced into subtasks. Can be set to auto. Default: 1

timeout – Time each individual bulk request should wait for shards that are unavailable. Default: 1m

wait_for_active_shards – Sets the number of shard copies that must be active before proceeding with the reindex operation. Defaults to 1, meaning the primary shard only. Set to all for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1)

wait_for_completion – Whether the request should block until the reindex is complete. Default: True
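
For large indices it is common to start the reindex in the background rather than hold the request open. The sketch below does that and returns the task ID. The helper name `start_reindex` is hypothetical, and `es` is assumed to be an `AsyncElasticsearch` instance.

```python
async def start_reindex(es, source_index, dest_index):
    """Start a background reindex and return the ID of the task running it.

    `es` is assumed to be an AsyncElasticsearch instance; `start_reindex`
    is an illustrative helper, not part of the client.
    """
    resp = await es.reindex(
        body={"source": {"index": source_index}, "dest": {"index": dest_index}},
        wait_for_completion=False,  # return immediately with a task handle
    )
    return resp["task"]
```

The returned task ID can then be polled through the tasks API, or passed to `reindex_rethrottle` to adjust the throttle while the operation is running.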

reindex_rethrottle(task_id, params=None, headers=None)¶

Changes the number of requests per second for a particular Reindex operation.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-reindex.html

Parameters:

task_id – The task id to rethrottle

requests_per_second – The throttle to set on this request in floating sub-requests per second. -1 means set no throttle.

render_search_template(body=None, id=None, params=None, headers=None)¶

Pre-renders a search definition using the Mustache language.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/render-search-template-api.html

Parameters:

body – The search definition template and its params

id – The id of the stored search template

scripts_painless_execute(body=None, params=None, headers=None)¶

Allows an arbitrary script to be executed and a result to be returned

https://www.elastic.co/guide/en/elasticsearch/painless/master/painless-execute-api.html

Warning

This API is experimental so may include breaking changes or be removed in a future version

Parameters:

body – The script to execute

scroll(body=None, scroll_id=None, params=None, headers=None)¶

Retrieves a large number of results from a single search request.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-request-body.html#request-body-search-scroll

Parameters:

body – The scroll ID if not passed by URL or query parameter.

scroll_id – The scroll ID

rest_total_hits_as_int – If true, the API response’s hits.total property is returned as an integer. If false, the API response’s hits.total property is returned as an object.

scroll – Period to retain the search context for scrolling.
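
A scroll is normally driven in a loop: an initial `search` with a `scroll` period, repeated `scroll` calls until a page comes back empty, then `clear_scroll`. The async generator below is a sketch of that loop. The name `scroll_all` and its defaults are hypothetical, and `es` is assumed to be an `AsyncElasticsearch` instance.

```python
async def scroll_all(es, index, query, page_size=500, keep_alive="2m"):
    """Yield every hit matching `query`, one page at a time.

    `es` is assumed to be an AsyncElasticsearch instance; `scroll_all`
    is an illustrative helper, not part of the client.
    """
    resp = await es.search(
        index=index, body={"query": query}, scroll=keep_alive, size=page_size
    )
    scroll_id = resp.get("_scroll_id")
    try:
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit
            # Each call returns the next page and renews the search context.
            resp = await es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        if scroll_id:
            # Free the server-side search context once we are done.
            await es.clear_scroll(scroll_id=scroll_id)
```

It is consumed with `async for hit in scroll_all(es, "documents", {"match_all": {}})`. The library also ships `elasticsearch.helpers.async_scan`, which covers the same use case and is usually the better choice in application code.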

search(body=None, index=None, doc_type=None, params=None, headers=None)¶

Returns results matching a query.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-search.html

Parameters:

body – The search definition using the Query DSL

index – A comma-separated list of index names to search; use _all or empty string to perform the operation on all indices

doc_type – A comma-separated list of document types to search; leave empty to perform the operation on all types

_source – Indicates which source fields are returned for matching documents. These fields are returned in the hits._source property of the search response.

_source_excludes – A list of fields to exclude from the returned _source field

_source_includes – A list of fields to extract and return from the _source field

aggregations –

aggs –

allow_no_indices – Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes _all string or when no indices have been specified)

allow_partial_search_results – Indicate if an error should be returned if there is a partial search failure or timeout Default: True

analyze_wildcard – Specify whether wildcard and prefix queries should be analyzed (default: false)

analyzer – The analyzer to use for the query string

batched_reduce_size – The number of shard results that should be reduced at once on the coordinating node. This value should be used as a protection mechanism to reduce the memory overhead per search request if the potential number of shards in the request can be large. Default: 512

ccs_minimize_roundtrips – Indicates whether network round-trips should be minimized as part of cross-cluster search requests execution Default: true

collapse –

default_operator – The default operator for query string query (AND or OR) Valid choices: AND, OR Default: OR

df – The field to use as default where no field prefix is given in the query string

docvalue_fields – Array of wildcard (*) patterns. The request returns doc values for field names matching these patterns in the hits.fields property of the response.

expand_wildcards – Whether to expand wildcard expression to concrete indices that are open, closed or both. Valid choices: open, closed, hidden, none, all Default: open

explain – If true, returns detailed information about score computation as part of a hit.

fields – Array of wildcard (*) patterns. The request returns values for field names matching these patterns in the hits.fields property of the response.

from – Starting document offset. By default, you cannot page through more than 10,000 hits using the from and size parameters. To page through more hits, use the search_after parameter.

highlight –

ignore_throttled – Whether specified concrete, expanded or aliased indices should be ignored when throttled

ignore_unavailable – Whether specified concrete indices should be ignored when unavailable (missing or closed)

indices_boost – Boosts the _score of documents from specified indices.

lenient – Specify whether format-based query failures (such as providing text to a numeric field) should be ignored

max_concurrent_shard_requests – The number of concurrent shard requests per node this search executes concurrently. This value should be used to limit the impact of the search on the cluster in order to limit the number of concurrent shard requests Default: 5

min_compatible_shard_node – The minimum compatible version that all shards involved in search should have for this request to be successful

min_score – Minimum _score for matching documents. Documents with a lower _score are not included in the search results.

pit – Limits the search to a point in time (PIT). If you provide a PIT, you cannot specify an <index> in the request path.

post_filter –

pre_filter_shard_size – A threshold that enforces a pre-filter roundtrip to prefilter search shards based on query rewriting if the number of shards the search request expands to exceeds the threshold. This filter roundtrip can limit the number of shards significantly if, for instance, a shard cannot match any documents based on its rewrite method, i.e. if date filters are mandatory to match but the shard bounds and the query are disjoint.

preference – Specify the node or shard the operation should be performed on (default: random)

profile –

q – Query in the Lucene query string syntax

query – Defines the search definition using the Query DSL.

request_cache – Specify if request cache should be used for this request or not, defaults to index level setting

rescore –

rest_total_hits_as_int – Indicates whether hits.total should be rendered as an integer or an object in the rest search response

routing – A comma-separated list of specific routing values

runtime_mappings – Defines one or more runtime fields in the search request. These fields take precedence over mapped fields with the same name.

script_fields – Retrieve a script evaluation (based on different fields) for each hit.

scroll – Specify how long a consistent view of the index should be maintained for scrolled search

search_after –

search_type – Search operation type Valid choices: query_then_fetch, dfs_query_then_fetch

seq_no_primary_term – If true, returns sequence number and primary term of the last modification of each hit. See Optimistic concurrency control.

size – The number of hits to return. By default, you cannot page through more than 10,000 hits using the from and size parameters. To page through more hits, use the search_after parameter.

slice –

sort –

stats – Stats groups to associate with the search. Each group maintains a statistics aggregation for its associated searches. You can retrieve these stats using the indices stats API.

stored_fields – List of stored fields to return as part of a hit. If no fields are specified, no stored fields are included in the response. If this field is specified, the _source parameter defaults to false. You can pass _source: true to return both source fields and stored fields in the search response.

suggest –

suggest_field – Specifies which field to use for suggestions.

suggest_mode – Specify suggest mode Valid choices: missing, popular, always Default: missing

suggest_size – How many suggestions to return in response

suggest_text – The source text for which the suggestions should be returned.

terminate_after – Maximum number of documents to collect for each shard. If a query reaches this limit, Elasticsearch terminates the query early. Elasticsearch collects documents before sorting. Defaults to 0, which does not terminate query execution early.

timeout – Specifies the period of time to wait for a response from each shard. If no response is received before the timeout expires, the request fails and returns an error. Defaults to no timeout.

track_scores – If true, calculate and return document scores, even if the scores are not used for sorting.

track_total_hits – Number of hits matching the query to count accurately. If true, the exact number of hits is returned at the cost of some performance. If false, the response does not include the total number of hits matching the query. Defaults to 10,000 hits.

typed_keys – Specify whether aggregation and suggester names should be prefixed by their respective types in the response

version – If true, returns document version as part of a hit.
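
The sketch below shows simple page-based access with the `from` and `size` parameters. Because `from` is a reserved word in Python, the client accepts it as `from_`. The helper name `search_page` is hypothetical, and `es` is assumed to be an `AsyncElasticsearch` instance.

```python
async def search_page(es, index, query, page=0, page_size=20):
    """Return (total hit count, list of _source dicts) for one page of results.

    `es` is assumed to be an AsyncElasticsearch instance; `search_page`
    is an illustrative helper, not part of the client.
    """
    resp = await es.search(
        index=index,
        body={"query": query},
        from_=page * page_size,  # `from` is a Python keyword, hence `from_`
        size=page_size,
        track_total_hits=True,  # count all matches, not just the first 10,000
    )
    total = resp["hits"]["total"]["value"]
    return total, [hit["_source"] for hit in resp["hits"]["hits"]]
```

This approach only works within the 10,000-hit window described for `from` and `size`; deeper pagination needs `search_after` or a scroll.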

search_mvt(index, field, zoom, x, y, body=None, params=None, headers=None)¶

Searches a vector tile for geospatial values. Returns results as a binary Mapbox vector tile.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-vector-tile-api.html

Warning

This API is experimental so may include breaking changes or be removed in a future version

Parameters:

index – Comma-separated list of data streams, indices, or aliases to search

field – Field containing geospatial data to return

zoom – Zoom level for the vector tile to search

x – X coordinate for the vector tile to search

y – Y coordinate for the vector tile to search

body – Search request body.

aggs – Sub-aggregations for the geotile_grid. Supports the following aggregation types: avg, cardinality, max, min, sum.

exact_bounds – If false, the meta layer’s feature is the bounding box of the tile. If true, the meta layer’s feature is a bounding box resulting from a geo_bounds aggregation. The aggregation runs on <field> values that intersect the <zoom>/<x>/<y> tile with wrap_longitude set to false. The resulting bounding box may be larger than the vector tile.

extent – Size, in pixels, of a side of the tile. Vector tiles are square with equal sides.

fields – Fields to return in the hits layer. Supports wildcards (*). This parameter does not support fields with array values. Fields with array values may return inconsistent results.

grid_precision – Additional zoom levels available through the aggs layer. For example, if <zoom> is 7 and grid_precision is 8, you can zoom in up to level 15. Accepts 0-8. If 0, results don’t include the aggs layer.

grid_type – Determines the geometry type for features in the aggs layer. In the aggs layer, each feature represents a geotile_grid cell. If ‘grid’, each feature is a Polygon of the cell’s bounding box. If ‘point’, each feature is a Point that is the centroid of the cell.

query – Query DSL used to filter documents for the search.

runtime_mappings – Defines one or more runtime fields in the search request. These fields take precedence over mapped fields with the same name.

size – Maximum number of features to return in the hits layer. Accepts 0-10000. If 0, results don’t include the hits layer.

sort – Sorts features in the hits layer. By default, the API calculates a bounding box for each feature. It sorts features based on this box’s diagonal length, from longest to shortest.

search_shards(index=None, params=None, headers=None)¶

Returns information about the indices and shards that a search request would be executed against.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-shards.html

Parameters:

index – A comma-separated list of index names to search; use _all or empty string to perform the operation on all indices

allow_no_indices – Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes _all string or when no indices have been specified)

expand_wildcards – Whether to expand wildcard expression to concrete indices that are open, closed or both. Valid choices: open, closed, hidden, none, all Default: open

ignore_unavailable – Whether specified concrete indices should be ignored when unavailable (missing or closed)

local – Return local information, do not retrieve the state from master node (default: false)

preference – Specify the node or shard the operation should be performed on (default: random)

routing – Specific routing value

search_template(body, index=None, doc_type=None, params=None, headers=None)¶

Pre-renders a search definition using the Mustache language.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-template.html

Parameters:

body – The search definition template and its params

index – A comma-separated list of index names to search; use _all or empty string to perform the operation on all indices

doc_type – A comma-separated list of document types to search; leave empty to perform the operation on all types

allow_no_indices – Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes _all string or when no indices have been specified)

ccs_minimize_roundtrips – Indicates whether network round-trips should be minimized as part of cross-cluster search requests execution Default: true

expand_wildcards – Whether to expand wildcard expression to concrete indices that are open, closed or both. Valid choices: open, closed, hidden, none, all Default: open

explain – Specify whether to return detailed information about score computation as part of a hit

ignore_throttled – Whether specified concrete, expanded or aliased indices should be ignored when throttled

ignore_unavailable – Whether specified concrete indices should be ignored when unavailable (missing or closed)

preference – Specify the node or shard the operation should be performed on (default: random)

profile – Specify whether to profile the query execution

rest_total_hits_as_int – Indicates whether hits.total should be rendered as an integer or an object in the rest search response

routing – A comma-separated list of specific routing values

scroll – Specify how long a consistent view of the index should be maintained for scrolled search

search_type – Search operation type Valid choices: query_then_fetch, dfs_query_then_fetch

typed_keys – Specify whether aggregation and suggester names should be prefixed by their respective types in the response

terms_enum(index, body=None, params=None, headers=None)¶

The terms enum API can be used to discover terms in the index that begin with the provided string. It is designed for low-latency look-ups used in auto-complete scenarios.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-terms-enum.html

Parameters:

index – A comma-separated list of index names to search; use _all or empty string to perform the operation on all indices

body – field name, string which is the prefix expected in matching terms, timeout and size for max number of results
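
A minimal auto-complete sketch built on this endpoint, using the `field`, `string` and `size` keys of the request body. The helper name `suggest_terms` is hypothetical, and `es` is assumed to be an `AsyncElasticsearch` instance.

```python
async def suggest_terms(es, index, field, prefix, size=10):
    """Return up to `size` indexed terms of `field` that start with `prefix`.

    `es` is assumed to be an AsyncElasticsearch instance; `suggest_terms`
    is an illustrative helper, not part of the client.
    """
    resp = await es.terms_enum(
        index=index,
        body={"field": field, "string": prefix, "size": size},
    )
    return resp["terms"]
```

The response also carries a `complete` flag; when it is false the list may be partial, for example because the request timed out on some shards.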

termvectors(index, body=None, doc_type=None, id=None, params=None, headers=None)¶

Returns information and statistics about terms in the fields of a particular document.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-termvectors.html

Parameters:

index – The index in which the document resides.

body – Define parameters and/or supply a document to get termvectors for. See documentation.

doc_type – The type of the document.

id – The id of the document; when not specified, a doc param should be supplied.

field_statistics – Specifies if document count, sum of document frequencies and sum of total term frequencies should be returned. Default: True

fields – A comma-separated list of fields to return.

offsets – Specifies if term offsets should be returned. Default: True

payloads – Specifies if term payloads should be returned. Default: True

positions – Specifies if term positions should be returned. Default: True

preference – Specify the node or shard the operation should be performed on (default: random).

realtime – Specifies if request is real-time as opposed to near-real-time (default: true).

routing – Specific routing value.

term_statistics – Specifies if total term frequency and document frequency should be returned.

version – Explicit version number for concurrency control

version_type – Specific version type Valid choices: internal, external, external_gte, force

update(index, id, body, doc_type=None, params=None, headers=None)¶

Updates a document with a script or partial document.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-update.html

Parameters:

index – The name of the index

id – Document ID

body – The request definition requires either script or partial doc

doc_type – The type of the document

_source – Set to false to disable source retrieval. You can also specify a comma-separated list of the fields you want to retrieve.

_source_excludes – Specify the source fields you want to exclude.

_source_includes – Specify the source fields you want to retrieve.

detect_noop – Set to false to disable setting ‘result’ in the response to ‘noop’ if no change to the document occurred.

doc – A partial update to an existing document.

doc_as_upsert – Set to true to use the contents of ‘doc’ as the value of ‘upsert’

if_primary_term – Only perform the operation if the document has this primary term.

if_seq_no – Only perform the operation if the document has this sequence number.

lang – The script language. Default: painless

refresh – If ‘true’, Elasticsearch refreshes the affected shards to make this operation visible to search; if ‘wait_for’, it waits for a refresh to make this operation visible to search; if ‘false’, it does nothing with refreshes. Valid choices: true, false, wait_for. Default: false

require_alias – If true, the destination must be an index alias.

retry_on_conflict – Specify how many times the operation should be retried when a conflict occurs.

routing – Custom value used to route operations to a specific shard.

script – Script to execute to update the document.

scripted_upsert – Set to true to execute the script whether or not the document exists.

timeout – Period to wait for dynamic mapping updates and active shards. This guarantees Elasticsearch waits for at least the timeout before failing. The actual wait time could be longer, particularly when multiple waits occur. Default: 1m

upsert – If the document does not already exist, the contents of ‘upsert’ are inserted as a new document. If the document exists, the ‘script’ is executed.

wait_for_active_shards – The number of shard copies that must be active before proceeding with the operations. Set to ‘all’ or any positive integer up to the total number of shards in the index (number_of_replicas+1). Defaults to 1 meaning the primary shard. Default: 1
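As a rough illustration of a scripted update with an upsert, the sketch below increments a counter and creates the document when it is missing. It assumes `es` is an AsyncElasticsearch instance; the function name `bump_view_count` and the `views` field are hypothetical.

```python
async def bump_view_count(es, index, doc_id):
    """Increment a counter with a script, creating the document if needed."""
    body = {
        "script": {
            "lang": "painless",
            "source": "ctx._source.views += params.step",
            "params": {"step": 1},
        },
        # Inserted as a new document when none exists yet.
        "upsert": {"views": 1},
    }
    resp = await es.update(
        index=index,
        id=doc_id,
        body=body,
        retry_on_conflict=3,  # retry if another writer changed the document
        refresh="wait_for",   # return once the change is visible to search
    )
    return resp["result"]  # for example "created", "updated" or "noop"
```

The `script` and `upsert` keys go in the request body, while `retry_on_conflict` and `refresh` are passed as keyword arguments and sent as query parameters.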

update_by_query(index, body=None, doc_type=None, params=None, headers=None)¶

Performs an update on every document in the index without changing the source, for example to pick up a mapping change.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-update-by-query.html

Parameters:

index – A comma-separated list of index names to search; use _all or empty string to perform the operation on all indices

body – The search definition using the Query DSL

doc_type – A comma-separated list of document types to search; leave empty to perform the operation on all types

allow_no_indices – Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes _all string or when no indices have been specified)

analyze_wildcard – Specify whether wildcard and prefix queries should be analyzed (default: false)

analyzer – The analyzer to use for the query string

conflicts – What to do when the update by query hits version conflicts. Valid choices: abort, proceed. Default: abort

default_operator – The default operator for query string query (AND or OR). Valid choices: AND, OR. Default: OR

df – The field to use as default where no field prefix is given in the query string

expand_wildcards – Whether to expand wildcard expression to concrete indices that are open, closed or both. Valid choices: open, closed, hidden, none, all. Default: open

from – Starting offset (default: 0)

ignore_unavailable – Whether specified concrete indices should be ignored when unavailable (missing or closed)

lenient – Specify whether format-based query failures (such as providing text to a numeric field) should be ignored

max_docs – Maximum number of documents to process (default: all documents)

pipeline – Ingest pipeline to set on index requests made by this action. (default: none)

preference – Specify the node or shard the operation should be performed on (default: random)

q – Query in the Lucene query string syntax

refresh – Should the affected indexes be refreshed?

request_cache – Specify if request cache should be used for this request or not, defaults to index level setting

requests_per_second – The throttle to set on this request in sub-requests per second. -1 means no throttle.

routing – A comma-separated list of specific routing values

scroll – Specify how long a consistent view of the index should be maintained for scrolled search

scroll_size – Size on the scroll request powering the update by query. Default: 100

search_timeout – Explicit timeout for each search request. Defaults to no timeout.

search_type – Search operation type. Valid choices: query_then_fetch, dfs_query_then_fetch

size – Deprecated, please use max_docs instead

slices – The number of slices this task should be divided into. Defaults to 1, meaning the task isn’t sliced into subtasks. Can be set to auto. Default: 1

sort – A comma-separated list of <field>:<direction> pairs

stats – Specific ‘tag’ of the request for logging and statistical purposes

terminate_after – The maximum number of documents to collect for each shard, upon reaching which the query execution will terminate early.

timeout – Time each individual bulk request should wait for shards that are unavailable. Default: 1m

version – Specify whether to return document version as part of a hit

version_type – Should the document increment the version number (internal) on hit or not (reindex)

wait_for_active_shards – Sets the number of shard copies that must be active before proceeding with the update by query operation. Defaults to 1, meaning the primary shard only. Set to all for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1)

wait_for_completion – Whether the request should block until the update by query operation is complete. Default: True
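One possible way to start a throttled update by query as a background task is sketched below. It assumes `es` is an AsyncElasticsearch instance; the function name `start_bulk_retag` and the `tag` field are hypothetical.

```python
async def start_bulk_retag(es, index, old_tag, new_tag):
    """Start an update by query in the background and return its task id."""
    body = {
        # Only documents matching this query are updated.
        "query": {"term": {"tag": old_tag}},
        "script": {
            "lang": "painless",
            "source": "ctx._source.tag = params.new_tag",
            "params": {"new_tag": new_tag},
        },
    }
    resp = await es.update_by_query(
        index=index,
        body=body,
        conflicts="proceed",        # keep going on version conflicts
        requests_per_second=100,    # throttle the operation
        slices="auto",              # let Elasticsearch choose the slice count
        wait_for_completion=False,  # respond immediately with a task id
    )
    return resp["task"]
```

With `wait_for_completion=False` the response carries a task identifier instead of the final result, which can then be passed to update_by_query_rethrottle.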

update_by_query_rethrottle(task_id, params=None, headers=None)¶

Changes the number of requests per second for a particular Update By Query operation.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-update-by-query.html

Parameters:

task_id – The task id to rethrottle

requests_per_second – The throttle to set on this request in floating sub-requests per second. -1 means set no throttle.
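A short sketch of changing the throttle of a running task follows. It assumes `es` is an AsyncElasticsearch instance and that `task_id` came from an update by query started with `wait_for_completion=False`; the helper name `set_throttle` is hypothetical.

```python
async def set_throttle(es, task_id, requests_per_second):
    """Change the throttle of a running update by query task."""
    return await es.update_by_query_rethrottle(
        task_id=task_id,
        requests_per_second=requests_per_second,  # -1 removes the throttle
    )
```

For example, `await set_throttle(es, task_id, -1)` would lift the throttle entirely.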

AsyncTransport¶

class elasticsearch.AsyncTransport(hosts, connection_class=None, connection_pool_class=<class 'elasticsearch.connection_pool.ConnectionPool'>, host_info_callback=<function get_host_info>, sniff_on_start=False, sniffer_timeout=None, sniff_timeout=0.1, sniff_on_connection_fail=False, serializer=<elasticsearch.serializer.JSONSerializer object>, serializers=None, default_mimetype='application/json', max_retries=3, retry_on_status=(502, 503, 504), retry_on_timeout=False, send_get_body_as='GET', meta_header=True, **kwargs)¶

Encapsulation of transport-related logic. Handles instantiation of the individual connections as well as creating a connection pool to hold them.

Main interface is the perform_request method.

Parameters:

hosts – list of dictionaries, each containing keyword arguments to create a connection_class instance

connection_class – subclass of Connection to use

connection_pool_class – subclass of ConnectionPool to use

host_info_callback – callback responsible for taking the node information from /_cluster/nodes, along with already extracted information, and producing a list of arguments (same as hosts parameter)

sniff_on_start – flag indicating whether to obtain a list of nodes from the cluster at startup time

sniffer_timeout – number of seconds between automatic sniffs

sniff_on_connection_fail – flag controlling if connection failure triggers a sniff

sniff_timeout – timeout used for the sniff request. Sniffing should be a fast API call and may contact several nodes, so it should fail quickly. Not used during initial sniffing (if sniff_on_start is on), when the connection is not yet initialized.

serializer – serializer instance

serializers – optional dict of serializer instances that will be used for deserializing data coming from the server. (key is the mimetype)

default_mimetype – when no mimetype is specified by the server response assume this mimetype, defaults to ‘application/json’

max_retries – maximum number of retries before an exception is propagated

retry_on_status – set of HTTP status codes on which we should retry on a different node. defaults to (502, 503, 504)

retry_on_timeout – whether a timeout should trigger a retry on a different node (default: False)

send_get_body_as – for GET requests with a body, this option allows you to specify an alternate way of execution for environments that don’t support passing bodies with GET requests. If you set this to ‘POST’, the POST method is used instead; if you set it to ‘source’, the body is serialized and passed as the query parameter source.

meta_header – If True will send the ‘X-Elastic-Client-Meta’ HTTP header containing simple client metadata. Setting to False will disable the header. Defaults to True.

Any extra keyword arguments will be passed to the connection_class when creating an instance unless overridden by that connection’s options provided as part of the hosts parameter.
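In practice these options are usually given to AsyncElasticsearch, which forwards keyword arguments to its transport. The sketch below shows one possible configuration; the helper names `transport_options` and `make_client` are hypothetical and the values are illustrative, not recommendations.

```python
def transport_options():
    """Transport settings for a multi-node cluster (illustrative values)."""
    return {
        "sniff_on_start": True,            # discover nodes before the first request
        "sniff_on_connection_fail": True,  # sniff again when a node stops responding
        "sniffer_timeout": 60,             # seconds between automatic sniffs
        "sniff_timeout": 0.5,              # fail fast on slow sniff requests
        "max_retries": 5,
        "retry_on_status": (502, 503, 504),
        "retry_on_timeout": True,
    }


def make_client(hosts):
    """Build a client; `hosts` is a list of dicts such as {"host": ..., "port": ...}."""
    # Imported here so the options helper can be used without the async extra installed.
    from elasticsearch import AsyncElasticsearch

    return AsyncElasticsearch(hosts=hosts, **transport_options())
```

Keeping the options in a plain dictionary makes it easy to share one configuration between several clients.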

DEFAULT_CONNECTION_CLASS¶

alias of elasticsearch._async.http_aiohttp.AIOHttpConnection

close()¶

Explicitly closes connections

create_sniff_task(initial=False)¶

Initiate a sniffing task. Make sure we only have one sniff request running at any given time. If a finished sniffing request is around, collect its result (which can raise its exception).

get_connection()¶

Retrieve a Connection instance from the ConnectionPool instance.

mark_dead(connection)¶

Mark a connection as dead (failed) in the connection pool. If sniffing on failure is enabled this will initiate the sniffing process.

Parameters: connection – instance of Connection that failed

perform_request(method, url, headers=None, params=None, body=None)¶

Perform the actual request. Retrieve a connection from the connection pool, pass all the information to its perform_request method and return the data.

If an exception was raised, mark the connection as failed and retry (up to max_retries times).

If the operation was successful and the connection used was previously marked as dead, mark it as live, resetting its failure count.

Parameters:

method – HTTP method to use

url – absolute url (without host) to target

headers – dictionary of headers, will be handed over to the underlying Connection class

params – dictionary of query parameters, will be handed over to the underlying Connection class for serialization

body – body of the request, will be serialized using serializer and passed to the connection
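The retry behaviour described above can be modelled with a simplified, self-contained sketch. This is not the library's implementation: `FlakyConnection` and `request_with_retries` are hypothetical stand-ins, the round-robin selection is an assumption, and the built-in ConnectionError is used in place of the library's exception classes.

```python
import asyncio


class FlakyConnection:
    """Stand-in for a connection that fails a fixed number of times, then succeeds."""

    def __init__(self, name, failures):
        self.name = name
        self.failures = failures

    async def perform_request(self, method, url):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError(f"{self.name} is unreachable")
        return {"served_by": self.name, "method": method, "url": url}


async def request_with_retries(connections, method, url, max_retries=3):
    """Make up to max_retries + 1 attempts, rotating through the connections."""
    dead = set()
    for attempt in range(max_retries + 1):
        conn = connections[attempt % len(connections)]
        try:
            data = await conn.perform_request(method, url)
        except ConnectionError:
            dead.add(conn.name)         # mark the connection as failed
            if attempt == max_retries:  # retries exhausted: propagate the error
                raise
        else:
            dead.discard(conn.name)     # a success marks the connection live again
            return data, dead


def demo():
    nodes = [FlakyConnection("node-a", failures=5), FlakyConnection("node-b", failures=0)]
    return asyncio.run(request_with_retries(nodes, "GET", "/_cluster/health"))
```

Calling `demo()` fails once on `node-a` and then succeeds on `node-b`, leaving `node-a` in the set of dead connections.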

sniff_hosts(initial=False)¶

Either spawns a sniffing_task which does regular sniffing over time or does a single sniffing session and awaits the results.

AsyncConnection¶

class elasticsearch.AsyncConnection(host='localhost', port=None, use_ssl=False, url_prefix='', timeout=10, headers=None, http_compress=None, cloud_id=None, api_key=None, opaque_id=None, meta_header=True, **kwargs)¶

Base class for Async HTTP connection implementations

AIOHttpConnection¶

class elasticsearch.AIOHttpConnection(host='localhost', port=None, url_prefix='', timeout=10, http_auth=None, use_ssl=False, verify_certs=<object object>, ssl_show_warn=<object object>, ca_certs=None, client_cert=None, client_key=None, ssl_version=None, ssl_assert_fingerprint=None, maxsize=10, headers=None, ssl_context=None, http_compress=None, cloud_id=None, api_key=None, opaque_id=None, loop=None, **kwargs)¶

Default connection class for AsyncElasticsearch using the aiohttp library and the http protocol.

Parameters:

host – hostname of the node (default: localhost)

port – port to use (integer, default: 9200)

url_prefix – optional url prefix for elasticsearch

timeout – default timeout in seconds (float, default: 10)

http_auth – optional http auth information as either ‘:’ separated string or a tuple

use_ssl – use ssl for the connection if True

verify_certs – whether to verify SSL certificates

ssl_show_warn – show warning when verify certs is disabled

ca_certs – optional path to CA bundle. See https://urllib3.readthedocs.io/en/latest/security.html#using-certifi-with-urllib3 for instructions on how to get the default set

client_cert – path to the file containing the private key and the certificate, or cert only if using client_key

client_key – path to the file containing the private key if using separate cert and key files (client_cert will contain only the cert)

ssl_version – version of the SSL protocol to use. Choices are: SSLv23 (default), SSLv2, SSLv3, TLSv1 (see PROTOCOL_* constants in the ssl module for exact options for your environment).

ssl_assert_hostname – use hostname verification if not False

ssl_assert_fingerprint – verify the supplied certificate fingerprint if not None

maxsize – the number of connections which will be kept open to this host. See https://urllib3.readthedocs.io/en/1.4/pools.html#api for more information.

headers – any custom http headers to be added to requests

http_compress – Use gzip compression

cloud_id – The Cloud ID from ElasticCloud. Convenient way to connect to cloud instances. Other host connection params will be ignored.

api_key – optional API Key authentication as either base64 encoded string or a tuple.

opaque_id – Send this value in the ‘X-Opaque-Id’ HTTP header for tracing all requests made by this transport.

loop – asyncio Event Loop to use with aiohttp. This is set by default to the currently running loop.

close()¶

Explicitly closes connection
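Connection parameters are normally supplied through AsyncElasticsearch, which passes them on to each AIOHttpConnection. The following sketch shows one possible TLS setup with basic authentication and an explicit close; the helper names `connection_options` and `cluster_is_reachable` are hypothetical and the values are illustrative.

```python
def connection_options(ca_bundle, user, password):
    """Connection settings for a TLS-secured cluster (illustrative values)."""
    return {
        "use_ssl": True,
        "verify_certs": True,
        "ca_certs": ca_bundle,          # path to the CA bundle file
        "http_auth": (user, password),  # tuple form of basic auth
        "http_compress": True,          # gzip request bodies
        "timeout": 30,                  # seconds
        "maxsize": 25,                  # connections kept open per host
    }


async def cluster_is_reachable(hosts, **options):
    """Ping the cluster once and always release the underlying connections."""
    # Imported here so the options helper can be used without the async extra installed.
    from elasticsearch import AsyncElasticsearch

    es = AsyncElasticsearch(hosts=hosts, **options)
    try:
        return await es.ping()
    finally:
        await es.close()  # closes the transport and its connections
```

Closing the client in a `finally` block avoids leaving open aiohttp sessions behind when the request fails.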