Using Asyncio with Elasticsearch

Starting with elasticsearch-py v7.8.0, the elasticsearch package supports async/await with asyncio and aiohttp on Python 3.6+. You can either install aiohttp directly or use the [async] extra:

$ python -m pip install "elasticsearch>=7.8.0" aiohttp

# - OR -

$ python -m pip install "elasticsearch[async]>=7.8.0"

Note

Async functionality is a new feature of this library in v7.8.0+, so please open an issue if you find a bug or have a question about async support.

Getting Started with Async

After installation all async API endpoints are available via AsyncElasticsearch and are used in the same way as other APIs, just with an extra await:

import asyncio
from elasticsearch import AsyncElasticsearch

es = AsyncElasticsearch()

async def main():
    resp = await es.search(
        index="documents",
        query={"match_all": {}},
        size=20,
    )
    print(resp)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

All APIs that are available under the sync client are also available under the async client.
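Because every call is awaitable, independent requests can be started together with asyncio.gather instead of being awaited one by one. The following is a minimal sketch, not library code: StubClient is a hypothetical stand-in for AsyncElasticsearch so that the example runs without a cluster, and asyncio.run requires Python 3.7+.

```python
import asyncio


class StubClient:
    """Hypothetical stand-in for AsyncElasticsearch so the sketch runs offline."""

    async def search(self, index, query, size=10):
        await asyncio.sleep(0.01)  # simulate network latency
        return {"index": index, "hits": {"hits": []}}


async def search_many(client, indices):
    # Create one coroutine per index, then wait for all of them together.
    calls = [client.search(index=name, query={"match_all": {}}) for name in indices]
    return await asyncio.gather(*calls)


responses = asyncio.run(search_many(StubClient(), ["documents", "orders"]))
print([resp["index"] for resp in responses])
```

asyncio.gather returns the results in the order the calls were passed in, regardless of which request finishes first.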

ASGI Applications and Elastic APM

ASGI (Asynchronous Server Gateway Interface) is a new way to serve Python web applications making use of async I/O to achieve better performance. Some examples of ASGI frameworks include FastAPI, Django 3.0+, and Starlette. If you’re using one of these frameworks along with Elasticsearch, use AsyncElasticsearch so that synchronous network calls don’t block the event loop.
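The effect of a blocking call on the event loop can be illustrated without any web framework. In this sketch, time.sleep stands in for a synchronous network call and asyncio.sleep for an awaited one; the function names are made up for the illustration and the delays are arbitrary.

```python
import asyncio
import time


async def ticker(log):
    # Represents other requests the server is handling at the same time.
    for i in range(3):
        log.append(f"tick {i}")
        await asyncio.sleep(0.01)


async def blocking_call(log):
    time.sleep(0.2)  # stands in for a synchronous network call
    log.append("call done")


async def awaiting_call(log):
    await asyncio.sleep(0.2)  # stands in for an awaited network call
    log.append("call done")


async def run(handler):
    log = []
    await asyncio.gather(ticker(log), handler(log))
    return log


blocked = asyncio.run(run(blocking_call))
cooperative = asyncio.run(run(awaiting_call))
print(blocked)
print(cooperative)
```

With the blocking call, the ticker cannot make progress until the call returns; with the awaited call, the ticker finishes while the call is still pending.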

Elastic APM also supports tracing of async Elasticsearch queries just the same as synchronous queries. For an example of how to configure AsyncElasticsearch with FastAPI, a popular ASGI framework, and APM tracing, see the pre-built example in the examples/fastapi-apm directory.

Frequently Asked Questions

NameError / ImportError when importing AsyncElasticsearch?

If you receive a NameError or ImportError when trying to use AsyncElasticsearch, ensure that you’re running Python 3.6+ (check with $ python --version) and that you have aiohttp installed in your environment (check with $ python -m pip freeze | grep aiohttp). If either of the above conditions is not met then async support won’t be available.

What about the elasticsearch-async package?

Previously asyncio was supported separately via the elasticsearch-async package. The elasticsearch-async package has been deprecated in favor of AsyncElasticsearch provided by the elasticsearch package in v7.8 and onwards.

Receiving ‘Unclosed client session / connector’ warning?

This warning is created by aiohttp when an open HTTP connection is garbage collected. You’ll typically run into this when closing your application. To resolve the issue ensure that close() is called before the AsyncElasticsearch instance is garbage collected.

For example if using FastAPI that might look like this:

from fastapi import FastAPI
from elasticsearch import AsyncElasticsearch

app = FastAPI()
es = AsyncElasticsearch()

# This gets called once the app is shutting down.
@app.on_event("shutdown")
async def app_shutdown():
    await es.close()
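In a standalone script there is no shutdown hook, but a try/finally block gives the same guarantee. This is a sketch under stated assumptions: StubClient is a hypothetical stand-in that only records whether close() was awaited, so the example runs without a cluster; with the real client you would pass an AsyncElasticsearch instance instead.

```python
import asyncio


class StubClient:
    """Hypothetical stand-in for AsyncElasticsearch; only tracks close()."""

    def __init__(self):
        self.closed = False

    async def info(self):
        return {"tagline": "You Know, for Search"}

    async def close(self):
        self.closed = True


async def main(client):
    try:
        return await client.info()
    finally:
        # Runs on success and on error, so the connections are always released.
        await client.close()


client = StubClient()
result = asyncio.run(main(client))
print(result, client.closed)
```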

Async Helpers

Async variants of all helpers are available in elasticsearch.helpers and are all prefixed with async_*. You’ll notice that these APIs are identical to the ones in the sync Helpers documentation.

All async helpers that accept an iterator or generator also accept async iterators and async generators.
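One way a helper can accept both kinds of input is to normalize them into a single async iterator before consuming them. The function below is an illustrative sketch of that idea, not the library's internal implementation; as_async_iter is a name chosen for this example.

```python
import asyncio


async def as_async_iter(items):
    """Yield from either a regular iterable or an async iterable."""
    if hasattr(items, "__aiter__"):
        async for item in items:
            yield item
    else:
        for item in items:
            yield item


async def agen():
    for word in ("foo", "bar"):
        yield word


async def collect(items):
    return [item async for item in as_async_iter(items)]


from_list = asyncio.run(collect(["foo", "bar"]))
from_agen = asyncio.run(collect(agen()))
print(from_list, from_agen)
```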

Bulk and Streaming Bulk

elasticsearch.helpers.async_bulk(client, actions, stats_only=False, ignore_status=(), *args, **kwargs)

Helper for the bulk() api that provides a more human friendly interface - it consumes an iterator of actions and sends them to elasticsearch in chunks. It returns a tuple with summary information - number of successfully executed actions and either a list of errors or the number of errors if stats_only is set to True. Note that by default we raise a BulkIndexError when we encounter an error, so options like stats_only only apply when raise_on_error is set to False.

When errors are being collected, the original document data is included in the error dictionary, which can lead to very high memory usage. If you need to process a lot of data and want to ignore/collect errors, please consider using the async_streaming_bulk() helper, which will just return the errors and not store them in memory.

Parameters:
  • client – instance of AsyncElasticsearch to use
  • actions – iterator containing the actions
  • stats_only – if True only report number of successful/failed operations instead of just number of successful and a list of error responses
  • ignore_status – list of HTTP status codes that you want to ignore

Any additional keyword arguments will be passed to async_streaming_bulk() which is used to execute the operation, see async_streaming_bulk() for more accepted parameters.

import asyncio
from elasticsearch import AsyncElasticsearch
from elasticsearch.helpers import async_bulk

es = AsyncElasticsearch()

async def gendata():
    mywords = ['foo', 'bar', 'baz']
    for word in mywords:
        yield {
            "_index": "mywords",
            "doc": {"word": word},
        }

async def main():
    await async_bulk(es, gendata())

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

elasticsearch.helpers.async_streaming_bulk(client, actions, chunk_size=500, max_chunk_bytes=104857600, raise_on_error=True, expand_action_callback=<function expand_action>, raise_on_exception=True, max_retries=0, initial_backoff=2, max_backoff=600, yield_ok=True, ignore_status=(), *args, **kwargs)

Streaming bulk consumes actions from the iterable passed in and yields results per action. For non-streaming use cases use async_bulk() which is a wrapper around streaming bulk that returns summary information about the bulk operation once the entire input is consumed and sent.

If you specify max_retries, documents that were rejected with a 429 status code are retried as well. Before the first retry the helper waits (by calling asyncio.sleep) for initial_backoff seconds; for every subsequent rejection of the same chunk the wait doubles, up to max_backoff seconds.
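The resulting wait times follow a capped exponential schedule. The helper below is a small sketch that only computes the delays described above; backoff_delays is a name chosen for this example and is not part of the library.

```python
def backoff_delays(max_retries, initial_backoff=2, max_backoff=600):
    """Seconds to wait before each retry: doubles every time, capped at max_backoff."""
    return [min(max_backoff, initial_backoff * 2 ** n) for n in range(max_retries)]


print(backoff_delays(5))            # [2, 4, 8, 16, 32]
print(backoff_delays(4, 200, 600))  # [200, 400, 600, 600]
```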

Parameters:
  • client – instance of AsyncElasticsearch to use
  • actions – iterable or async iterable containing the actions to be executed
  • chunk_size – number of docs in one chunk sent to es (default: 500)
  • max_chunk_bytes – the maximum size of the request in bytes (default: 100MB)
  • raise_on_error – raise BulkIndexError containing errors (as .errors) from the execution of the last chunk when some occur. By default we raise.
  • raise_on_exception – if False then don’t propagate exceptions from call to bulk and just report the items that failed as failed.
  • expand_action_callback – callback executed on each action passed in, should return a tuple containing the action line and the data line (None if data line should be omitted).
  • max_retries – maximum number of times a document will be retried when 429 is received, set to 0 (default) for no retries on 429
  • initial_backoff – number of seconds we should wait before the first retry. Each subsequent retry waits initial_backoff * 2**retry_number seconds
  • max_backoff – maximum number of seconds a retry will wait
  • yield_ok – if set to False will skip successful documents in the output
  • ignore_status – list of HTTP status codes that you want to ignore

import asyncio
from elasticsearch import AsyncElasticsearch
from elasticsearch.helpers import async_streaming_bulk

es = AsyncElasticsearch()

async def gendata():
    mywords = ['foo', 'bar', 'baz']
    for word in mywords:
        yield {
            "_index": "mywords",
            "word": word,
        }

async def main():
    async for ok, result in async_streaming_bulk(es, gendata()):
        action, result = result.popitem()
        if not ok:
            print("failed to %s document %s" % (action, result.get("_id")))

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
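The chunk_size and max_chunk_bytes limits can be pictured as a grouping step that closes a chunk as soon as either limit would be exceeded. The following is a simplified sketch of that idea over already serialized lines; chunk_actions is a hypothetical name, and the library's own chunking logic may account for bytes differently.

```python
import json


def chunk_actions(lines, chunk_size=500, max_chunk_bytes=100 * 1024 * 1024):
    """Group serialized lines into chunks limited by count and total bytes."""
    chunk, size = [], 0
    for line in lines:
        line_bytes = len(line.encode("utf-8")) + 1  # +1 for the newline
        # Start a new chunk if adding this line would break either limit.
        if chunk and (len(chunk) == chunk_size or size + line_bytes > max_chunk_bytes):
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += line_bytes
    if chunk:
        yield chunk


docs = [json.dumps({"word": w}) for w in ("foo", "bar", "baz")]
by_count = list(chunk_actions(docs, chunk_size=2))
by_bytes = list(chunk_actions(docs, max_chunk_bytes=30))
print([len(c) for c in by_count], [len(c) for c in by_bytes])
```

Each serialized document here is 16 bytes including the newline, so a 30-byte limit leaves room for only one document per chunk.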

Scan

elasticsearch.helpers.async_scan(client, query=None, scroll='5m', raise_on_error=True, preserve_order=False, size=1000, request_timeout=None, clear_scroll=True, scroll_kwargs=None, **kwargs)

Simple abstraction on top of the scroll() api - a simple iterator that yields all hits as returned by underlying scroll requests.

By default scan does not return results in any pre-determined order. To have a standard order in the returned documents (either by score or explicit sort definition) when scrolling, use preserve_order=True. This may be an expensive operation and will negate the performance benefits of using scan.

Parameters:
  • client – instance of AsyncElasticsearch to use
  • query – body for the search() api
  • scroll – Specify how long a consistent view of the index should be maintained for scrolled search
  • raise_on_error – raises an exception (ScanError) if an error is encountered (some shards fail to execute). By default we raise.
  • preserve_order – don’t set the search_type to scan - this will cause the scroll to paginate with preserving the order. Note that this can be an extremely expensive operation and can easily lead to unpredictable results, use with caution.
  • size – size (per shard) of the batch sent at each iteration.
  • request_timeout – explicit timeout for each call to scan
  • clear_scroll – explicitly calls delete on the scroll id via the clear scroll API at the end of the method on completion or error, defaults to true.
  • scroll_kwargs – additional kwargs to be passed to scroll()

Any additional keyword arguments will be passed to the initial search() call:

async_scan(es,
    query={"query": {"match": {"title": "python"}}},
    index="orders-*",
    doc_type="books"
)

import asyncio
from elasticsearch import AsyncElasticsearch
from elasticsearch.helpers import async_scan

es = AsyncElasticsearch()

async def main():
    async for doc in async_scan(
        client=es,
        query={"query": {"match": {"title": "python"}}},
        index="orders-*"
    ):
        print(doc)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
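Conceptually, scanning is a loop: run an initial search, keep requesting the next scroll page until a page comes back empty, then clear the scroll. The sketch below shows that loop against a hypothetical in-memory StubClient whose search, scroll and clear_scroll methods only imitate the shape of the real responses. It is a simplification for illustration and leaves out the error handling and options of async_scan().

```python
import asyncio


class StubClient:
    """Hypothetical in-memory stand-in for the scroll-related client calls."""

    def __init__(self, docs, page_size):
        self.docs, self.page_size = docs, page_size
        self.offset = 0
        self.cleared = False

    def _page(self):
        hits = self.docs[self.offset:self.offset + self.page_size]
        self.offset += len(hits)
        return {"_scroll_id": "scroll-1", "hits": {"hits": hits}}

    async def search(self, **kwargs):
        return self._page()

    async def scroll(self, scroll_id, scroll):
        return self._page()

    async def clear_scroll(self, scroll_id):
        self.cleared = True


async def scan_sketch(client, scroll="5m", **search_kwargs):
    resp = await client.search(scroll=scroll, **search_kwargs)
    scroll_id = resp.get("_scroll_id")
    try:
        # Keep paging until a page comes back without hits.
        while scroll_id and resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit
            resp = await client.scroll(scroll_id=scroll_id, scroll=scroll)
            scroll_id = resp.get("_scroll_id")
    finally:
        # Release the search context on completion or error.
        if scroll_id:
            await client.clear_scroll(scroll_id=scroll_id)


async def main():
    client = StubClient([{"_id": str(i)} for i in range(5)], page_size=2)
    ids = [hit["_id"] async for hit in scan_sketch(client, index="orders-*")]
    return ids, client.cleared


ids, cleared = asyncio.run(main())
print(ids, cleared)
```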

Reindex

elasticsearch.helpers.async_reindex(client, source_index, target_index, query=None, target_client=None, chunk_size=500, scroll='5m', op_type=None, scan_kwargs={}, bulk_kwargs={})

Reindex all documents from one index that satisfy a given query to another, potentially (if target_client is specified) on a different cluster. If you don’t specify the query you will reindex all the documents.

Since 2.3 a reindex() api is available as part of elasticsearch itself. It is recommended to use the api instead of this helper wherever possible. The helper is here mostly for backwards compatibility and for situations where more flexibility is needed.

Note

This helper doesn’t transfer mappings, just the data.

Parameters:
  • client – instance of AsyncElasticsearch to use (for read if target_client is specified as well)
  • source_index – index (or list of indices) to read documents from
  • target_index – name of the index in the target cluster to populate
  • query – body for the search() api
  • target_client – optional, if specified will be used for writing (thus enabling reindex between clusters)
  • chunk_size – number of docs in one chunk sent to es (default: 500)
  • scroll – Specify how long a consistent view of the index should be maintained for scrolled search
  • op_type – Explicit operation type. Defaults to ‘_index’. Data streams must be set to ‘create’. If not specified, will auto-detect if target_index is a data stream.
  • scan_kwargs – additional kwargs to be passed to async_scan()
  • bulk_kwargs – additional kwargs to be passed to async_bulk()
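At its core, reindexing with helpers means reading hits from one index and turning each hit into a bulk action aimed at another. The sketch below shows only that rewriting step; retarget and fake_scan are hypothetical names, and the hits are made-up data shaped like scan results, so the example runs without a cluster.

```python
import asyncio


async def retarget(hits, target_index, op_type=None):
    """Turn scanned hits into bulk actions aimed at target_index."""
    async for hit in hits:
        action = dict(hit, _index=target_index)  # copy the hit, swap the index
        if op_type is not None:
            action["_op_type"] = op_type
        yield action


async def fake_scan():
    # Hypothetical hits, shaped like the documents a scan yields.
    yield {"_index": "old", "_id": "1", "_source": {"word": "foo"}}
    yield {"_index": "old", "_id": "2", "_source": {"word": "bar"}}


async def main():
    return [action async for action in retarget(fake_scan(), "new", op_type="create")]


actions = asyncio.run(main())
print(actions)
```

With the real helpers, the output of such a generator could be passed to async_bulk() on the target client, since the async helpers accept async generators.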

API Reference

The API of AsyncElasticsearch is nearly identical to the API of Elasticsearch with the exception that every API call like search() is an async function and requires an await to properly return the response body.

AsyncElasticsearch

Note

To reference Elasticsearch APIs that are namespaced like .indices.create() refer to the sync API reference. These APIs are identical between sync and async.

class elasticsearch.AsyncElasticsearch(hosts=None, transport_class=<class 'elasticsearch._async.transport.AsyncTransport'>, **kwargs)

Elasticsearch low-level client. Provides a straightforward mapping from Python to ES REST endpoints.

The instance has attributes cat, cluster, indices, ingest, nodes, snapshot and tasks that provide access to instances of CatClient, ClusterClient, IndicesClient, IngestClient, NodesClient, SnapshotClient and TasksClient respectively. This is the preferred (and only supported) way to get access to those classes and their methods.

You can specify your own connection class which should be used by providing the connection_class parameter:

# create connection to localhost using the ThriftConnection
es = Elasticsearch(connection_class=ThriftConnection)

If you want to turn on Sniffing you have several options (described in Transport):

# create connection that will automatically inspect the cluster to get
# the list of active nodes. Start with nodes running on 'esnode1' and
# 'esnode2'
es = Elasticsearch(
    ['esnode1', 'esnode2'],
    # sniff before doing anything
    sniff_on_start=True,
    # refresh nodes after a node fails to respond
    sniff_on_connection_fail=True,
    # and also every 60 seconds
    sniffer_timeout=60
)

Different hosts can have different parameters, use a dictionary per node to specify those:

# connect to localhost directly and another node using SSL on port 443
# and an url_prefix. Note that ``port`` needs to be an int.
es = Elasticsearch([
    {'host': 'localhost'},
    {'host': 'othernode', 'port': 443, 'url_prefix': 'es', 'use_ssl': True},
])
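Plain host[:port] strings end up as dictionaries of the same shape as the ones above. The function below is a rough sketch of that translation for illustration only; normalize_host is a hypothetical name, and the sketch does not handle full URLs or IPv6 addresses the way the client itself would.

```python
def normalize_host(host):
    """Turn 'host[:port]' into a dict; pass dictionaries through unchanged."""
    if isinstance(host, dict):
        return host
    name, sep, port = host.partition(":")
    node = {"host": name}
    if sep:
        node["port"] = int(port)  # the port needs to be an int
    return node


print(normalize_host("localhost:443"))
print(normalize_host("esnode1"))
```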

If using SSL, there are several parameters that control how we deal with certificates (see Urllib3HttpConnection for detailed description of the options):

es = Elasticsearch(
    ['localhost:443', 'other_host:443'],
    # turn on SSL
    use_ssl=True,
    # make sure we verify SSL certificates
    verify_certs=True,
    # provide a path to CA certs on disk
    ca_certs='/path/to/CA_certs'
)

If you use SSL but don’t verify the certs, a warning message is shown; it can optionally be suppressed (see Urllib3HttpConnection for detailed description of the options):

es = Elasticsearch(
    ['localhost:443', 'other_host:443'],
    # turn on SSL
    use_ssl=True,
    # do not verify SSL certificates
    verify_certs=False,
    # don't show warnings about ssl certs verification
    ssl_show_warn=False
)

SSL client authentication is supported (see Urllib3HttpConnection for detailed description of the options):

es = Elasticsearch(
    ['localhost:443', 'other_host:443'],
    # turn on SSL
    use_ssl=True,
    # make sure we verify SSL certificates
    verify_certs=True,
    # provide a path to CA certs on disk
    ca_certs='/path/to/CA_certs',
    # PEM formatted SSL client certificate
    client_cert='/path/to/clientcert.pem',
    # PEM formatted SSL client key
    client_key='/path/to/clientkey.pem'
)

Alternatively you can use RFC-1738 formatted URLs, as long as they are not in conflict with other options:

es = Elasticsearch(
    [
        'http://user:secret@localhost:9200/',
        'https://user:secret@other_host:443/production'
    ],
    verify_certs=True
)

By default, JSONSerializer is used to encode all outgoing requests. However, you can implement your own custom serializer:

from elasticsearch.serializer import JSONSerializer

class SetEncoder(JSONSerializer):
    def default(self, obj):
        if isinstance(obj, set):
            return list(obj)
        if isinstance(obj, Something):
            return 'CustomSomethingRepresentation'
        return JSONSerializer.default(self, obj)

es = Elasticsearch(serializer=SetEncoder())

Parameters:
  • hosts – list of nodes, or a single node, we should connect to. Node should be a dictionary ({“host”: “localhost”, “port”: 9200}), the entire dictionary will be passed to the Connection class as kwargs, or a string in the format of host[:port] which will be translated to a dictionary automatically. If no value is given the Connection class defaults will be used.
  • transport_class – Transport subclass to use.
  • kwargs – any additional arguments will be passed on to the Transport class and, subsequently, to the Connection instances.

bulk(body, index=None, doc_type=None, params=None, headers=None)

Allows to perform multiple index/update/delete operations in a single request.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-bulk.html
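The request body is newline-delimited JSON: one action line, optionally followed by one data line, with a trailing newline at the end. The sketch below builds such a body with the standard json module; build_bulk_body is a hypothetical helper written for this illustration, and in practice the helpers in elasticsearch.helpers take care of this step.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, document) pairs into newline-delimited JSON."""
    lines = []
    for action, doc in operations:
        lines.append(json.dumps(action))
        if doc is not None:  # delete actions carry no data line
            lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ({"index": {"_index": "mywords", "_id": "1"}}, {"word": "foo"}),
    ({"delete": {"_index": "mywords", "_id": "2"}}, None),
])
print(body)
```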

Parameters:
  • body – The operation definition and data (action-data pairs), separated by newlines
  • index – Default index for items which don’t provide one
  • doc_type – Default document type for items which don’t provide one
  • _source – True or false to return the _source field or not, or default list of fields to return, can be overridden on each sub-request
  • _source_excludes – Default list of fields to exclude from the returned _source field, can be overridden on each sub-request
  • _source_includes – Default list of fields to extract and return from the _source field, can be overridden on each sub-request
  • pipeline – The pipeline id to preprocess incoming documents with
  • refresh – If true then refresh the affected shards to make this operation visible to search, if wait_for then wait for a refresh to make this operation visible to search, if false (the default) then do nothing with refreshes. Valid choices: true, false, wait_for
  • require_alias – Sets require_alias for all incoming documents. Defaults to unset (false)
  • routing – Specific routing value
  • timeout – Explicit operation timeout
  • wait_for_active_shards – Sets the number of shard copies that must be active before proceeding with the bulk operation. Defaults to 1, meaning the primary shard only. Set to all for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1)

clear_scroll(body=None, scroll_id=None, params=None, headers=None)

Explicitly clears the search context for a scroll.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/clear-scroll-api.html

Parameters:
  • body – A comma-separated list of scroll IDs to clear if none was specified via the scroll_id parameter
  • scroll_id – A comma-separated list of scroll IDs to clear

close()

Closes the Transport and all internal connections

close_point_in_time(body=None, params=None, headers=None)

Close a point in time

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/point-in-time-api.html

Parameters:
  • body – a point-in-time id to close

count(body=None, index=None, doc_type=None, params=None, headers=None)

Returns number of documents matching a query.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-count.html

Parameters:
  • body – A query to restrict the results specified with the Query DSL (optional)
  • index – A comma-separated list of indices to restrict the results
  • doc_type – A comma-separated list of types to restrict the results
  • allow_no_indices – Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes _all string or when no indices have been specified)
  • analyze_wildcard – Specify whether wildcard and prefix queries should be analyzed (default: false)
  • analyzer – The analyzer to use for the query string
  • default_operator – The default operator for query string query (AND or OR) Valid choices: AND, OR Default: OR
  • df – The field to use as default where no field prefix is given in the query string
  • expand_wildcards – Whether to expand wildcard expression to concrete indices that are open, closed or both. Valid choices: open, closed, hidden, none, all Default: open
  • ignore_throttled – Whether specified concrete, expanded or aliased indices should be ignored when throttled
  • ignore_unavailable – Whether specified concrete indices should be ignored when unavailable (missing or closed)
  • lenient – Specify whether format-based query failures (such as providing text to a numeric field) should be ignored
  • min_score – Include only documents with a specific _score value in the result
  • preference – Specify the node or shard the operation should be performed on (default: random)
  • q – Query in the Lucene query string syntax
  • routing – A comma-separated list of specific routing values
  • terminate_after – The maximum count for each shard, upon reaching which the query execution will terminate early

create(index, id, body, doc_type=None, params=None, headers=None)

Creates a new document in the index. Returns a 409 response when a document with a same ID already exists in the index.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-index_.html

Parameters:
  • index – The name of the index
  • id – Document ID
  • document – The document
  • doc_type – The type of the document
  • pipeline – The pipeline id to preprocess incoming documents with
  • refresh – If true then refresh the affected shards to make this operation visible to search, if wait_for then wait for a refresh to make this operation visible to search, if false (the default) then do nothing with refreshes. Valid choices: true, false, wait_for
  • routing – Specific routing value
  • timeout – Explicit operation timeout
  • version – Explicit version number for concurrency control
  • version_type – Specific version type Valid choices: internal, external, external_gte
  • wait_for_active_shards – Sets the number of shard copies that must be active before proceeding with the index operation. Defaults to 1, meaning the primary shard only. Set to all for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1)

delete(index, id, doc_type=None, params=None, headers=None)

Removes a document from the index.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-delete.html

Parameters:
  • index – The name of the index
  • id – The document ID
  • doc_type – The type of the document
  • if_primary_term – only perform the delete operation if the last operation that has changed the document has the specified primary term
  • if_seq_no – only perform the delete operation if the last operation that has changed the document has the specified sequence number
  • refresh – If true then refresh the affected shards to make this operation visible to search, if wait_for then wait for a refresh to make this operation visible to search, if false (the default) then do nothing with refreshes. Valid choices: true, false, wait_for
  • routing – Specific routing value
  • timeout – Explicit operation timeout
  • version – Explicit version number for concurrency control
  • version_type – Specific version type Valid choices: internal, external, external_gte, force
  • wait_for_active_shards – Sets the number of shard copies that must be active before proceeding with the delete operation. Defaults to 1, meaning the primary shard only. Set to all for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1)

delete_by_query(index, body, doc_type=None, params=None, headers=None)

Deletes documents matching the provided query.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-delete-by-query.html

Parameters:
  • index – A comma-separated list of index names to search; use _all or empty string to perform the operation on all indices
  • body – The search definition using the Query DSL
  • doc_type – A comma-separated list of document types to search; leave empty to perform the operation on all types
  • allow_no_indices – Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes _all string or when no indices have been specified)
  • analyze_wildcard – Specify whether wildcard and prefix queries should be analyzed (default: false)
  • analyzer – The analyzer to use for the query string
  • conflicts – What to do when the delete by query hits version conflicts? Valid choices: abort, proceed Default: abort
  • default_operator – The default operator for query string query (AND or OR) Valid choices: AND, OR Default: OR
  • df – The field to use as default where no field prefix is given in the query string
  • expand_wildcards – Whether to expand wildcard expression to concrete indices that are open, closed or both. Valid choices: open, closed, hidden, none, all Default: open
  • from – Starting offset (default: 0)
  • ignore_unavailable – Whether specified concrete indices should be ignored when unavailable (missing or closed)
  • lenient – Specify whether format-based query failures (such as providing text to a numeric field) should be ignored
  • max_docs – Maximum number of documents to process (default: all documents)
  • preference – Specify the node or shard the operation should be performed on (default: random)
  • q – Query in the Lucene query string syntax
  • refresh – Should the affected indexes be refreshed?
  • request_cache – Specify if request cache should be used for this request or not, defaults to index level setting
  • requests_per_second – The throttle for this request in sub-requests per second. -1 means no throttle.
  • routing – A comma-separated list of specific routing values
  • scroll – Specify how long a consistent view of the index should be maintained for scrolled search
  • scroll_size – Size on the scroll request powering the delete by query Default: 100
  • search_timeout – Explicit timeout for each search request. Defaults to no timeout.
  • search_type – Search operation type Valid choices: query_then_fetch, dfs_query_then_fetch
  • size – Deprecated, please use max_docs instead
  • slices – The number of slices this task should be divided into. Defaults to 1, meaning the task isn’t sliced into subtasks. Can be set to auto. Default: 1
  • sort – A comma-separated list of <field>:<direction> pairs
  • stats – Specific ‘tag’ of the request for logging and statistical purposes
  • terminate_after – The maximum number of documents to collect for each shard, upon reaching which the query execution will terminate early.
  • timeout – Time each individual bulk request should wait for shards that are unavailable. Default: 1m
  • version – Specify whether to return document version as part of a hit
  • wait_for_active_shards – Sets the number of shard copies that must be active before proceeding with the delete by query operation. Defaults to 1, meaning the primary shard only. Set to all for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1)
  • wait_for_completion – Whether the request should block until the delete by query is complete. Default: True

delete_by_query_rethrottle(task_id, params=None, headers=None)

Changes the number of requests per second for a particular Delete By Query operation.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-delete-by-query.html

Parameters:
  • task_id – The task id to rethrottle
  • requests_per_second – The throttle to set on this request in floating sub-requests per second. -1 means set no throttle.

delete_script(id, params=None, headers=None)

Deletes a script.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/modules-scripting.html

Parameters:
  • id – Script ID
  • master_timeout – Specify timeout for connection to master
  • timeout – Explicit operation timeout

exists(index, id, doc_type=None, params=None, headers=None)

Returns information about whether a document exists in an index.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-get.html

Parameters:
  • index – The name of the index
  • id – The document ID
  • doc_type – The type of the document (use _all to fetch the first document matching the ID across all types)
  • _source – True or false to return the _source field or not, or a list of fields to return
  • _source_excludes – A list of fields to exclude from the returned _source field
  • _source_includes – A list of fields to extract and return from the _source field
  • preference – Specify the node or shard the operation should be performed on (default: random)
  • realtime – Specify whether to perform the operation in realtime or search mode
  • refresh – Refresh the shard containing the document before performing the operation
  • routing – Specific routing value
  • stored_fields – A comma-separated list of stored fields to return in the response
  • version – Explicit version number for concurrency control
  • version_type – Specific version type Valid choices: internal, external, external_gte, force

exists_source(index, id, doc_type=None, params=None, headers=None)

Returns information about whether a document source exists in an index.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-get.html

Parameters:
  • index – The name of the index
  • id – The document ID
  • doc_type – The type of the document; deprecated and optional starting with 7.0
  • _source – True or false to return the _source field or not, or a list of fields to return
  • _source_excludes – A list of fields to exclude from the returned _source field
  • _source_includes – A list of fields to extract and return from the _source field
  • preference – Specify the node or shard the operation should be performed on (default: random)
  • realtime – Specify whether to perform the operation in realtime or search mode
  • refresh – Refresh the shard containing the document before performing the operation
  • routing – Specific routing value
  • version – Explicit version number for concurrency control
  • version_type – Specific version type Valid choices: internal, external, external_gte, force

explain(index, id, body=None, doc_type=None, params=None, headers=None)

Returns information about why a specific document matches (or doesn’t match) a query.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-explain.html

Parameters:
  • index – The name of the index
  • id – The document ID
  • body – The query definition using the Query DSL
  • doc_type – The type of the document
  • _source – True or false to return the _source field or not, or a list of fields to return
  • _source_excludes – A list of fields to exclude from the returned _source field
  • _source_includes – A list of fields to extract and return from the _source field
  • analyze_wildcard – Specify whether wildcards and prefix queries in the query string query should be analyzed (default: false)
  • analyzer – The analyzer for the query string query
  • default_operator – The default operator for query string query (AND or OR) Valid choices: AND, OR Default: OR
  • df – The default field for query string query (default: _all)
  • lenient – Specify whether format-based query failures (such as providing text to a numeric field) should be ignored
  • preference – Specify the node or shard the operation should be performed on (default: random)
  • q – Query in the Lucene query string syntax
  • routing – Specific routing value
  • stored_fields – A comma-separated list of stored fields to return in the response

field_caps(body=None, index=None, params=None, headers=None)

Returns the information about the capabilities of fields among multiple indices.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-field-caps.html

Parameters:
  • body – An index filter specified with the Query DSL
  • index – A comma-separated list of index names; use _all or empty string to perform the operation on all indices
  • allow_no_indices – Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes _all string or when no indices have been specified)
  • expand_wildcards – Whether to expand wildcard expression to concrete indices that are open, closed or both. Valid choices: open, closed, hidden, none, all Default: open
  • fields – A comma-separated list of field names
  • ignore_unavailable – Whether specified concrete indices should be ignored when unavailable (missing or closed)
  • include_unmapped – Indicates whether unmapped fields should be included in the response.
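One common use of the field capabilities response is spotting fields that are mapped to different types across indices. The sketch below assumes the response follows the documented shape (a top-level "fields" object keyed by field name, then by type); conflicting_fields is a hypothetical helper, not part of the client.

```python
def conflicting_fields(field_caps_response):
    """Return {field_name: sorted list of types} for fields with more than one type.

    Assumes the response looks like {"fields": {name: {type: {...}, ...}}}.
    """
    conflicts = {}
    for name, by_type in field_caps_response.get("fields", {}).items():
        # A field that appears under several type keys is mapped differently
        # in different indices.
        if len(by_type) > 1:
            conflicts[name] = sorted(by_type)
    return conflicts
```

It could be applied to the result of something like `await es.field_caps(index="logs-*", fields="*")`, where the index pattern is only an illustration.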
get(index, id, doc_type=None, params=None, headers=None)

Returns a document.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-get.html

Parameters:
  • index – Name of the index that contains the document.
  • id – Unique identifier of the document.
  • doc_type – The type of the document (use _all to fetch the first document matching the ID across all types)
  • _source – True or false to return the _source field or not, or a list of fields to return.
  • _source_excludes – A comma-separated list of source fields to exclude in the response.
  • _source_includes – A comma-separated list of source fields to include in the response.
  • preference – Specifies the node or shard the operation should be performed on. Random by default.
  • realtime – If true, the request is real-time as opposed to near-real-time. Default: True
  • refresh – If true, Elasticsearch refreshes the affected shards to make this operation visible to search. If false, do nothing with refreshes.
  • routing – Target the specified primary shard.
  • stored_fields – A comma-separated list of stored fields to return in the response
  • version – Explicit version number for concurrency control. The specified version must match the current version of the document for the request to succeed.
  • version_type – Specific version type: internal, external, external_gte. Valid choices: internal, external, external_gte, force
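As a rough illustration of get() with source filtering, the sketch below wraps the call in a hypothetical helper named fetch_source. It assumes `client` behaves like AsyncElasticsearch and relies on the 7.x client's `ignore` keyword so that a missing document comes back as a response with "found": false instead of raising.

```python
async def fetch_source(client, index, doc_id, fields=None):
    """Return the _source of one document, or None when it does not exist.

    `client` is assumed to behave like AsyncElasticsearch (7.x).
    """
    kwargs = {"index": index, "id": doc_id, "ignore": 404}
    if fields:
        # _source_includes takes a comma-separated list of source fields.
        kwargs["_source_includes"] = ",".join(fields)
    resp = await client.get(**kwargs)
    if not resp.get("found"):
        return None
    return resp.get("_source")
```

Inside a coroutine this might be used as `doc = await fetch_source(es, "documents", "42", fields=["title"])`; the index name and ID are placeholders.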
get_script(id, params=None, headers=None)

Returns a script.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/modules-scripting.html

Parameters:
  • id – Script ID
  • master_timeout – Specify timeout for connection to master
get_script_context(params=None, headers=None)

Returns all script contexts.

https://www.elastic.co/guide/en/elasticsearch/painless/master/painless-contexts.html

get_script_languages(params=None, headers=None)

Returns available script types, languages and contexts

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/modules-scripting.html

get_source(index, id, doc_type=None, params=None, headers=None)

Returns the source of a document.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-get.html

Parameters:
  • index – The name of the index
  • id – The document ID
  • doc_type – The type of the document; deprecated and optional starting with 7.0
  • _source – True or false to return the _source field or not, or a list of fields to return
  • _source_excludes – A list of fields to exclude from the returned _source field
  • _source_includes – A list of fields to extract and return from the _source field
  • preference – Specify the node or shard the operation should be performed on (default: random)
  • realtime – Specify whether to perform the operation in realtime or search mode
  • refresh – Refresh the shard containing the document before performing the operation
  • routing – Specific routing value
  • version – Explicit version number for concurrency control
  • version_type – Specific version type Valid choices: internal, external, external_gte, force
index(index, body, doc_type=None, id=None, params=None, headers=None)

Creates or updates a document in an index.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-index_.html

Parameters:
  • index – The name of the index
  • document – The document
  • doc_type – The type of the document
  • id – Document ID
  • if_primary_term – only perform the index operation if the last operation that has changed the document has the specified primary term
  • if_seq_no – only perform the index operation if the last operation that has changed the document has the specified sequence number
  • op_type – Explicit operation type. Defaults to index for requests with an explicit document ID, and to create for requests without an explicit document ID. Valid choices: index, create
  • pipeline – The pipeline id to preprocess incoming documents with
  • refresh – If true then refresh the affected shards to make this operation visible to search, if wait_for then wait for a refresh to make this operation visible to search, if false (the default) then do nothing with refreshes. Valid choices: true, false, wait_for
  • require_alias – When true, requires destination to be an alias. Default is false
  • routing – Specific routing value
  • timeout – Explicit operation timeout
  • version – Explicit version number for concurrency control
  • version_type – Specific version type Valid choices: internal, external, external_gte
  • wait_for_active_shards – Sets the number of shard copies that must be active before proceeding with the index operation. Defaults to 1, meaning the primary shard only. Set to all for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1)
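The if_seq_no and if_primary_term parameters are typically used together for optimistic concurrency control. The following sketch assumes you already hold a prior get() response for the document (which carries _seq_no and _primary_term); conditional_index_kwargs is a hypothetical helper that only assembles keyword arguments.

```python
def conditional_index_kwargs(index, doc_id, document, current):
    """Build index() keyword arguments that succeed only if the document is unchanged.

    `current` is assumed to be an earlier get() response for the same document.
    """
    return {
        "index": index,
        "id": doc_id,
        "body": document,
        # The write is rejected with a version conflict if another writer
        # changed the document after `current` was read.
        "if_seq_no": current["_seq_no"],
        "if_primary_term": current["_primary_term"],
    }
```

The result would then be passed on as `await es.index(**kwargs)`.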
info(params=None, headers=None)

Returns basic information about the cluster.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/index.html

mget(body, index=None, doc_type=None, params=None, headers=None)

Retrieves multiple documents in one request.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-multi-get.html

Parameters:
  • body – Document identifiers; can be either docs (containing full document information) or ids (when index and type are provided in the URL).
  • index – The name of the index
  • doc_type – The type of the document
  • _source – True or false to return the _source field or not, or a list of fields to return
  • _source_excludes – A list of fields to exclude from the returned _source field
  • _source_includes – A list of fields to extract and return from the _source field
  • preference – Specify the node or shard the operation should be performed on (default: random)
  • realtime – Specify whether to perform the operation in realtime or search mode
  • refresh – Refresh the shard containing the document before performing the operation
  • routing – Specific routing value
  • stored_fields – A comma-separated list of stored fields to return in the response
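The two body shapes described above (ids versus docs) can be sketched as follows. build_mget_body is a hypothetical helper; the "ids" form assumes the index is passed separately via the index parameter.

```python
def build_mget_body(ids=None, docs=None):
    """Build an mget body from either plain IDs or (index, id) pairs.

    Exactly one of `ids` or `docs` must be given.
    """
    if (ids is None) == (docs is None):
        raise ValueError("pass exactly one of ids or docs")
    if ids is not None:
        # Short form: only valid when the index is given in the request path.
        return {"ids": list(ids)}
    # Long form: each entry names its own index.
    return {"docs": [{"_index": index, "_id": doc_id} for index, doc_id in docs]}
```

For example, `await es.mget(index="documents", body=build_mget_body(ids=["1", "2"]))`, where the index name is a placeholder.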
msearch(body, index=None, doc_type=None, params=None, headers=None)

Executes several search operations in one request.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-multi-search.html

Parameters:
  • body – The request definitions (metadata-search request definition pairs), separated by newlines
  • index – A comma-separated list of index names to use as default
  • doc_type – A comma-separated list of document types to use as default
  • ccs_minimize_roundtrips – Indicates whether network round-trips should be minimized as part of cross-cluster search requests execution Default: true
  • max_concurrent_searches – Controls the maximum number of concurrent searches the multi search api will execute
  • max_concurrent_shard_requests – The number of concurrent shard requests each sub search executes concurrently per node. This value should be used to limit the impact of the search on the cluster in order to limit the number of concurrent shard requests Default: 5
  • pre_filter_shard_size – A threshold that enforces a pre-filter roundtrip to prefilter search shards based on query rewriting if the number of shards the search request expands to exceeds the threshold. This filter roundtrip can limit the number of shards significantly if, for instance, a shard cannot match any documents based on its rewrite method, i.e. if date filters are mandatory to match but the shard bounds and the query are disjoint.
  • rest_total_hits_as_int – Indicates whether hits.total should be rendered as an integer or an object in the rest search response
  • search_type – Search operation type Valid choices: query_then_fetch, dfs_query_then_fetch
  • typed_keys – Specify whether aggregation and suggester names should be prefixed by their respective types in the response
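The msearch body is a sequence of header/body pairs, one JSON object per line. A minimal sketch of building it is shown below; build_msearch_lines and to_ndjson are hypothetical helper names, and each search here is reduced to an index name plus a query.

```python
import json


def build_msearch_lines(searches):
    """Turn (index, query) pairs into alternating header and body objects."""
    lines = []
    for index, query in searches:
        # Header line: an empty object falls back to the default index.
        lines.append({"index": index} if index else {})
        # Body line: the search request itself.
        lines.append({"query": query})
    return lines


def to_ndjson(lines):
    """Serialize objects as newline-delimited JSON, ending with a newline."""
    return "".join(json.dumps(line) + "\n" for line in lines)
```

The resulting string (or, as far as we know, the list of objects itself) can be passed as `body` to `await es.msearch(...)`; responses come back in the same order under "responses".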
msearch_template(body, index=None, doc_type=None, params=None, headers=None)

Executes several search template operations in one request.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-multi-search.html

Parameters:
  • body – The request definitions (metadata-search request definition pairs), separated by newlines
  • index – A comma-separated list of index names to use as default
  • doc_type – A comma-separated list of document types to use as default
  • ccs_minimize_roundtrips – Indicates whether network round-trips should be minimized as part of cross-cluster search requests execution Default: true
  • max_concurrent_searches – Controls the maximum number of concurrent searches the multi search api will execute
  • rest_total_hits_as_int – Indicates whether hits.total should be rendered as an integer or an object in the rest search response
  • search_type – Search operation type Valid choices: query_then_fetch, dfs_query_then_fetch
  • typed_keys – Specify whether aggregation and suggester names should be prefixed by their respective types in the response
mtermvectors(body=None, index=None, doc_type=None, params=None, headers=None)

Returns multiple termvectors in one request.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-multi-termvectors.html

Parameters:
  • body – Define ids, documents, parameters or a list of parameters per document here. You must at least provide a list of document ids. See documentation.
  • index – The index in which the document resides.
  • doc_type – The type of the document.
  • field_statistics – Specifies if document count, sum of document frequencies and sum of total term frequencies should be returned. Applies to all returned documents unless otherwise specified in body “params” or “docs”. Default: True
  • fields – A comma-separated list of fields to return. Applies to all returned documents unless otherwise specified in body “params” or “docs”.
  • ids – A comma-separated list of document IDs. You must define ids as a parameter or set “ids” or “docs” in the request body
  • offsets – Specifies if term offsets should be returned. Applies to all returned documents unless otherwise specified in body “params” or “docs”. Default: True
  • payloads – Specifies if term payloads should be returned. Applies to all returned documents unless otherwise specified in body “params” or “docs”. Default: True
  • positions – Specifies if term positions should be returned. Applies to all returned documents unless otherwise specified in body “params” or “docs”. Default: True
  • preference – Specify the node or shard the operation should be performed on (default: random). Applies to all returned documents unless otherwise specified in body “params” or “docs”.
  • realtime – Specifies if requests are real-time as opposed to near-real-time (default: true).
  • routing – Specific routing value. Applies to all returned documents unless otherwise specified in body “params” or “docs”.
  • term_statistics – Specifies if total term frequency and document frequency should be returned. Applies to all returned documents unless otherwise specified in body “params” or “docs”.
  • version – Explicit version number for concurrency control
  • version_type – Specific version type Valid choices: internal, external, external_gte, force
open_point_in_time(index, params=None, headers=None)

Open a point in time that can be used in subsequent searches

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/point-in-time-api.html

Parameters:
  • index – A comma-separated list of index names to open point in time; use _all or empty string to perform the operation on all indices
  • expand_wildcards – Whether to expand wildcard expression to concrete indices that are open, closed or both. Valid choices: open, closed, hidden, none, all Default: open
  • ignore_unavailable – Whether specified concrete indices should be ignored when unavailable (missing or closed)
  • keep_alive – Specify the time to live for the point in time
  • preference – Specify the node or shard the operation should be performed on (default: random)
  • routing – Specific routing value
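A point in time is usually combined with search_after to page through a stable view of the data. The sketch below is one possible shape for that loop, assuming `client` behaves like AsyncElasticsearch on 7.12+ (for the _shard_doc sort) and that the open call returns the PIT under "id". iter_hits_with_pit is a hypothetical helper name.

```python
async def iter_hits_with_pit(client, index, query, page_size=100, keep_alive="1m"):
    """Yield every hit for `query`, paging with a point in time and search_after."""
    pit = await client.open_point_in_time(index=index, keep_alive=keep_alive)
    pit_id = pit["id"]
    search_after = None
    try:
        while True:
            body = {
                "size": page_size,
                "query": query,
                # No index is passed to search(): the PIT already fixes it.
                "pit": {"id": pit_id, "keep_alive": keep_alive},
                "sort": [{"_shard_doc": "asc"}],
            }
            if search_after is not None:
                body["search_after"] = search_after
            resp = await client.search(body=body)
            hits = resp["hits"]["hits"]
            if not hits:
                break
            for hit in hits:
                yield hit
            # Continue after the sort values of the last hit on this page.
            search_after = hits[-1]["sort"]
            # The server may hand back an updated PIT ID.
            pit_id = resp.get("pit_id", pit_id)
    finally:
        await client.close_point_in_time(body={"id": pit_id})
```

A caller would consume it with `async for hit in iter_hits_with_pit(es, "documents", {"match_all": {}}): ...`.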
ping(params=None, headers=None)

Returns whether the cluster is running.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/index.html

put_script(id, body, context=None, params=None, headers=None)

Creates or updates a script.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/modules-scripting.html

Parameters:
  • id – Script ID
  • body – The document
  • context – Script context
  • master_timeout – Specify timeout for connection to master
  • timeout – Explicit operation timeout
rank_eval(body, index=None, params=None, headers=None)

Evaluates the quality of ranked search results over a set of typical search queries.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-rank-eval.html

Parameters:
  • body – The ranking evaluation search definition, including search requests, document ratings and ranking metric definition.
  • index – A comma-separated list of index names to search; use _all or empty string to perform the operation on all indices
  • allow_no_indices – Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes _all string or when no indices have been specified)
  • expand_wildcards – Whether to expand wildcard expression to concrete indices that are open, closed or both. Valid choices: open, closed, hidden, none, all Default: open
  • ignore_unavailable – Whether specified concrete indices should be ignored when unavailable (missing or closed)
  • search_type – Search operation type Valid choices: query_then_fetch, dfs_query_then_fetch
reindex(body, params=None, headers=None)

Copies documents from one index to another, optionally filtering the source documents by a query, changing the destination index settings, or fetching the documents from a remote cluster.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-reindex.html

Parameters:
  • body – The search definition using the Query DSL and the prototype for the index request.
  • max_docs – Maximum number of documents to process (default: all documents)
  • refresh – Should the affected indexes be refreshed?
  • requests_per_second – The throttle to set on this request in sub-requests per second. -1 means no throttle.
  • scroll – Control how long to keep the search context alive Default: 5m
  • slices – The number of slices this task should be divided into. Defaults to 1, meaning the task isn’t sliced into subtasks. Can be set to auto. Default: 1
  • timeout – Time each individual bulk request should wait for shards that are unavailable. Default: 1m
  • wait_for_active_shards – Sets the number of shard copies that must be active before proceeding with the reindex operation. Defaults to 1, meaning the primary shard only. Set to all for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1)
  • wait_for_completion – Whether the request should block until the reindex is complete. Default: True
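The reindex body names a source and a destination, with an optional query to filter what is copied. A minimal sketch of assembling it, using a hypothetical helper build_reindex_body:

```python
def build_reindex_body(source_index, dest_index, query=None):
    """Build a reindex body copying from source_index into dest_index."""
    source = {"index": source_index}
    if query is not None:
        # Only documents matching the query are copied.
        source["query"] = query
    return {"source": source, "dest": {"index": dest_index}}
```

For long-running copies it may be preferable to call `await es.reindex(body=..., wait_for_completion=False)` so the request returns immediately instead of blocking until the copy finishes.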
reindex_rethrottle(task_id, params=None, headers=None)

Changes the number of requests per second for a particular Reindex operation.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-reindex.html

Parameters:
  • task_id – The task id to rethrottle
  • requests_per_second – The throttle to set on this request in floating sub-requests per second. -1 means set no throttle.
render_search_template(body=None, id=None, params=None, headers=None)

Allows you to use the Mustache language to pre-render a search definition.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/render-search-template-api.html

Parameters:
  • body – The search definition template and its params
  • id – The id of the stored search template
scripts_painless_execute(body=None, params=None, headers=None)

Allows an arbitrary script to be executed and a result to be returned

https://www.elastic.co/guide/en/elasticsearch/painless/master/painless-execute-api.html

Warning

This API is experimental so may include breaking changes or be removed in a future version

Parameters:
  • body – The script to execute
scroll(body=None, scroll_id=None, params=None, headers=None)

Retrieves a large number of results from a single search request.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-request-body.html#request-body-search-scroll

Parameters:
  • body – The scroll ID if not passed by URL or query parameter.
  • scroll_id – The scroll ID
  • rest_total_hits_as_int – If true, the API response’s hits.total property is returned as an integer. If false, the API response’s hits.total property is returned as an object.
  • scroll – Period to retain the search context for scrolling.
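A scroll is started by a search() call that passes scroll, then continued with scroll() until a page comes back empty. The loop below is a sketch under the assumption that `client` behaves like AsyncElasticsearch; iter_scroll is a hypothetical helper name, and the context is released with clear_scroll once iteration ends.

```python
async def iter_scroll(client, index, query, scroll="2m", page_size=500):
    """Yield every hit for `query` by following the scroll cursor."""
    resp = await client.search(
        index=index,
        body={"query": query, "size": page_size},
        scroll=scroll,
    )
    scroll_id = resp.get("_scroll_id")
    try:
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit
            # Each call renews the search context for another `scroll` period.
            resp = await client.scroll(scroll_id=scroll_id, scroll=scroll)
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        if scroll_id:
            # Free the server-side search context as soon as we are done.
            await client.clear_scroll(scroll_id=scroll_id)
```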
search(body=None, index=None, doc_type=None, params=None, headers=None)

Returns results matching a query.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-search.html

Parameters:
  • body – The search definition using the Query DSL
  • index – A comma-separated list of index names to search; use _all or empty string to perform the operation on all indices
  • doc_type – A comma-separated list of document types to search; leave empty to perform the operation on all types
  • _source – Indicates which source fields are returned for matching documents. These fields are returned in the hits._source property of the search response.
  • _source_excludes – A list of fields to exclude from the returned _source field
  • _source_includes – A list of fields to extract and return from the _source field
  • aggregations
  • aggs
  • allow_no_indices – Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes _all string or when no indices have been specified)
  • allow_partial_search_results – Indicate if an error should be returned if there is a partial search failure or timeout Default: True
  • analyze_wildcard – Specify whether wildcard and prefix queries should be analyzed (default: false)
  • analyzer – The analyzer to use for the query string
  • batched_reduce_size – The number of shard results that should be reduced at once on the coordinating node. This value should be used as a protection mechanism to reduce the memory overhead per search request if the potential number of shards in the request can be large. Default: 512
  • ccs_minimize_roundtrips – Indicates whether network round-trips should be minimized as part of cross-cluster search requests execution Default: true
  • collapse
  • default_operator – The default operator for query string query (AND or OR) Valid choices: AND, OR Default: OR
  • df – The field to use as default where no field prefix is given in the query string
  • docvalue_fields – Array of wildcard (*) patterns. The request returns doc values for field names matching these patterns in the hits.fields property of the response.
  • expand_wildcards – Whether to expand wildcard expression to concrete indices that are open, closed or both. Valid choices: open, closed, hidden, none, all Default: open
  • explain – If true, returns detailed information about score computation as part of a hit.
  • fields – Array of wildcard (*) patterns. The request returns values for field names matching these patterns in the hits.fields property of the response.
  • from – Starting document offset. By default, you cannot page through more than 10,000 hits using the from and size parameters. To page through more hits, use the search_after parameter.
  • highlight
  • ignore_throttled – Whether specified concrete, expanded or aliased indices should be ignored when throttled
  • ignore_unavailable – Whether specified concrete indices should be ignored when unavailable (missing or closed)
  • indices_boost – Boosts the _score of documents from specified indices.
  • lenient – Specify whether format-based query failures (such as providing text to a numeric field) should be ignored
  • max_concurrent_shard_requests – The number of concurrent shard requests per node this search executes concurrently. This value should be used to limit the impact of the search on the cluster in order to limit the number of concurrent shard requests Default: 5
  • min_compatible_shard_node – The minimum compatible version that all shards involved in search should have for this request to be successful
  • min_score – Minimum _score for matching documents. Documents with a lower _score are not included in the search results.
  • pit – Limits the search to a point in time (PIT). If you provide a PIT, you cannot specify an <index> in the request path.
  • post_filter
  • pre_filter_shard_size – A threshold that enforces a pre-filter roundtrip to prefilter search shards based on query rewriting if the number of shards the search request expands to exceeds the threshold. This filter roundtrip can limit the number of shards significantly if, for instance, a shard cannot match any documents based on its rewrite method, i.e. if date filters are mandatory to match but the shard bounds and the query are disjoint.
  • preference – Specify the node or shard the operation should be performed on (default: random)
  • profile
  • q – Query in the Lucene query string syntax
  • query – Defines the search definition using the Query DSL.
  • request_cache – Specify if request cache should be used for this request or not, defaults to index level setting
  • rescore
  • rest_total_hits_as_int – Indicates whether hits.total should be rendered as an integer or an object in the rest search response
  • routing – A comma-separated list of specific routing values
  • runtime_mappings – Defines one or more runtime fields in the search request. These fields take precedence over mapped fields with the same name.
  • script_fields – Retrieve a script evaluation (based on different fields) for each hit.
  • scroll – Specify how long a consistent view of the index should be maintained for scrolled search
  • search_after
  • search_type – Search operation type Valid choices: query_then_fetch, dfs_query_then_fetch
  • seq_no_primary_term – If true, returns sequence number and primary term of the last modification of each hit. See Optimistic concurrency control.
  • size – The number of hits to return. By default, you cannot page through more than 10,000 hits using the from and size parameters. To page through more hits, use the search_after parameter.
  • slice
  • sort
  • stats – Stats groups to associate with the search. Each group maintains a statistics aggregation for its associated searches. You can retrieve these stats using the indices stats API.
  • stored_fields – List of stored fields to return as part of a hit. If no fields are specified, no stored fields are included in the response. If this field is specified, the _source parameter defaults to false. You can pass _source: true to return both source fields and stored fields in the search response.
  • suggest
  • suggest_field – Specifies which field to use for suggestions.
  • suggest_mode – Specify suggest mode Valid choices: missing, popular, always Default: missing
  • suggest_size – How many suggestions to return in response
  • suggest_text – The source text for which the suggestions should be returned.
  • terminate_after – Maximum number of documents to collect for each shard. If a query reaches this limit, Elasticsearch terminates the query early. Elasticsearch collects documents before sorting. Defaults to 0, which does not terminate query execution early.
  • timeout – Specifies the period of time to wait for a response from each shard. If no response is received before the timeout expires, the request fails and returns an error. Defaults to no timeout.
  • track_scores – If true, calculate and return document scores, even if the scores are not used for sorting.
  • track_total_hits – Number of hits matching the query to count accurately. If true, the exact number of hits is returned at the cost of some performance. If false, the response does not include the total number of hits matching the query. Defaults to 10,000 hits.
  • typed_keys – Specify whether aggregation and suggester names should be prefixed by their respective types in the response
  • version – If true, returns document version as part of a hit.
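Because from and size together cannot reach past 10,000 hits by default, page-number style pagination needs a guard. The sketch below computes the offset for a 1-based page and refuses windows beyond that limit. page_params and MAX_RESULT_WINDOW are hypothetical names, and the keyword is spelled from_ because from is a reserved word in Python.

```python
MAX_RESULT_WINDOW = 10000  # default limit described for from + size


def page_params(page, page_size):
    """Return from_/size keyword arguments for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    if start + page_size > MAX_RESULT_WINDOW:
        # Deeper pages should switch to search_after instead.
        raise ValueError("window exceeds 10,000 hits; use search_after")
    return {"from_": start, "size": page_size}
```

A call might then look like `await es.search(index="documents", query={"match_all": {}}, **page_params(3, 20))`.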
search_mvt(index, field, zoom, x, y, body=None, params=None, headers=None)

Searches a vector tile for geospatial values. Returns results as a binary Mapbox vector tile.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-vector-tile-api.html

Warning

This API is experimental so may include breaking changes or be removed in a future version

Parameters:
  • index – Comma-separated list of data streams, indices, or aliases to search
  • field – Field containing geospatial data to return
  • zoom – Zoom level for the vector tile to search
  • x – X coordinate for the vector tile to search
  • y – Y coordinate for the vector tile to search
  • body – Search request body.
  • aggs – Sub-aggregations for the geotile_grid. Supports the following aggregation types: avg, cardinality, max, min, sum
  • exact_bounds – If false, the meta layer’s feature is the bounding box of the tile. If true, the meta layer’s feature is a bounding box resulting from a geo_bounds aggregation. The aggregation runs on <field> values that intersect the <zoom>/<x>/<y> tile with wrap_longitude set to false. The resulting bounding box may be larger than the vector tile.
  • extent – Size, in pixels, of a side of the tile. Vector tiles are square with equal sides.
  • fields – Fields to return in the hits layer. Supports wildcards (*). This parameter does not support fields with array values. Fields with array values may return inconsistent results.
  • grid_precision – Additional zoom levels available through the aggs layer. For example, if <zoom> is 7 and grid_precision is 8, you can zoom in up to level 15. Accepts 0-8. If 0, results don’t include the aggs layer.
  • grid_type – Determines the geometry type for features in the aggs layer. In the aggs layer, each feature represents a geotile_grid cell. If ‘grid’, each feature is a Polygon of the cell’s bounding box. If ‘point’, each feature is a Point that is the centroid of the cell.
  • query – Query DSL used to filter documents for the search.
  • runtime_mappings – Defines one or more runtime fields in the search request. These fields take precedence over mapped fields with the same name.
  • size – Maximum number of features to return in the hits layer. Accepts 0-10000. If 0, results don’t include the hits layer.
  • sort – Sorts features in the hits layer. By default, the API calculates a bounding box for each feature. It sorts features based on this box’s diagonal length, from longest to shortest.
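The zoom, x and y arguments identify a tile rather than a coordinate. Assuming the standard Web Mercator XYZ tiling scheme, the tile containing a given longitude and latitude can be computed as sketched below; lonlat_to_tile is a hypothetical helper and not part of the client.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) tile containing the point at the given zoom level."""
    n = 2 ** zoom  # number of tiles along each axis
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon=180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

The resulting pair can then be passed as x and y to search_mvt together with the same zoom.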
search_shards(index=None, params=None, headers=None)

Returns information about the indices and shards that a search request would be executed against.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-shards.html

Parameters:
  • index – A comma-separated list of index names to search; use _all or empty string to perform the operation on all indices
  • allow_no_indices – Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes _all string or when no indices have been specified)
  • expand_wildcards – Whether to expand wildcard expression to concrete indices that are open, closed or both. Valid choices: open, closed, hidden, none, all Default: open
  • ignore_unavailable – Whether specified concrete indices should be ignored when unavailable (missing or closed)
  • local – Return local information, do not retrieve the state from master node (default: false)
  • preference – Specify the node or shard the operation should be performed on (default: random)
  • routing – Specific routing value
search_template(body, index=None, doc_type=None, params=None, headers=None)

Allows you to use the Mustache language to pre-render a search definition.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-template.html

Parameters:
  • body – The search definition template and its params
  • index – A comma-separated list of index names to search; use _all or empty string to perform the operation on all indices
  • doc_type – A comma-separated list of document types to search; leave empty to perform the operation on all types
  • allow_no_indices – Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes _all string or when no indices have been specified)
  • ccs_minimize_roundtrips – Indicates whether network round-trips should be minimized as part of cross-cluster search requests execution Default: true
  • expand_wildcards – Whether to expand wildcard expression to concrete indices that are open, closed or both. Valid choices: open, closed, hidden, none, all Default: open
  • explain – Specify whether to return detailed information about score computation as part of a hit
  • ignore_throttled – Whether specified concrete, expanded or aliased indices should be ignored when throttled
  • ignore_unavailable – Whether specified concrete indices should be ignored when unavailable (missing or closed)
  • preference – Specify the node or shard the operation should be performed on (default: random)
  • profile – Specify whether to profile the query execution
  • rest_total_hits_as_int – Indicates whether hits.total should be rendered as an integer or an object in the rest search response
  • routing – A comma-separated list of specific routing values
  • scroll – Specify how long a consistent view of the index should be maintained for scrolled search
  • search_type – Search operation type Valid choices: query_then_fetch, dfs_query_then_fetch
  • typed_keys – Specify whether aggregation and suggester names should be prefixed by their respective types in the response
terms_enum(index, body=None, params=None, headers=None)

The terms enum API can be used to discover terms in the index that begin with the provided string. It is designed for low-latency look-ups used in auto-complete scenarios.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-terms-enum.html

Parameters:
  • index – A comma-separated list of index names to search; use _all or empty string to perform the operation on all indices
  • body – The field name, the string prefix expected in matching terms, a timeout, and the size (maximum number of results)
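For an auto-complete lookup, the body carries the field to inspect and the prefix typed so far. A minimal sketch, with build_terms_enum_body as a hypothetical helper and the key names taken from the Elasticsearch terms enum API:

```python
def build_terms_enum_body(field, prefix, size=10):
    """Build a terms_enum body asking for terms in `field` starting with `prefix`."""
    return {
        "field": field,    # field whose indexed terms are enumerated
        "string": prefix,  # prefix that matching terms must start with
        "size": size,      # maximum number of terms to return
    }
```

The matching terms are expected under "terms" in the response, for example after `resp = await es.terms_enum(index="documents", body=build_terms_enum_body("tags", "ela"))`.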
termvectors(index, body=None, doc_type=None, id=None, params=None, headers=None)

Returns information and statistics about terms in the fields of a particular document.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-termvectors.html

Parameters:
  • index – The index in which the document resides.
  • body – Define parameters and/or supply a document to get termvectors for. See documentation.
  • doc_type – The type of the document.
  • id – The id of the document. When not specified, a doc param should be supplied.
  • field_statistics – Specifies if document count, sum of document frequencies and sum of total term frequencies should be returned. Default: True
  • fields – A comma-separated list of fields to return.
  • offsets – Specifies if term offsets should be returned. Default: True
  • payloads – Specifies if term payloads should be returned. Default: True
  • positions – Specifies if term positions should be returned. Default: True
  • preference – Specify the node or shard the operation should be performed on (default: random).
  • realtime – Specifies if request is real-time as opposed to near-real-time (default: true).
  • routing – Specific routing value.
  • term_statistics – Specifies if total term frequency and document frequency should be returned.
  • version – Explicit version number for concurrency control
  • version_type – Specific version type Valid choices: internal, external, external_gte, force
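
The sketch below shows one way to request term vectors for a single field and reduce the response to per-term frequencies. The helper name is hypothetical, `es` is assumed to be an AsyncElasticsearch instance, and the response layout ("term_vectors" → field → "terms" → "term_freq") is assumed to follow the linked reference.

```python
async def term_frequencies(es, index, doc_id, field):
    """Return {term: term_freq} for one field of one stored document.

    `es` is assumed to be an AsyncElasticsearch instance.
    """
    resp = await es.termvectors(
        index=index,
        id=doc_id,
        fields=field,           # comma-separated list; a single field here
        term_statistics=True,   # also ask for total term/document frequency
        positions=False,        # skip data this helper does not use
        offsets=False,
        payloads=False,
    )
    terms = resp.get("term_vectors", {}).get(field, {}).get("terms", {})
    return {term: info["term_freq"] for term, info in terms.items()}
```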
update(index, id, body, doc_type=None, params=None, headers=None)

Updates a document with a script or partial document.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-update.html

Parameters:
  • index – The name of the index
  • id – Document ID
  • body – The request definition requires either script or partial doc
  • doc_type – The type of the document
  • _source – Set to false to disable source retrieval. You can also specify a comma-separated list of the fields you want to retrieve.
  • _source_excludes – Specify the source fields you want to exclude.
  • _source_includes – Specify the source fields you want to retrieve.
  • detect_noop – Set to false to disable setting ‘result’ in the response to ‘noop’ if no change to the document occurred.
  • doc – A partial update to an existing document.
  • doc_as_upsert – Set to true to use the contents of ‘doc’ as the value of ‘upsert’
  • if_primary_term – Only perform the operation if the document has this primary term.
  • if_seq_no – Only perform the operation if the document has this sequence number.
  • lang – The script language. Default: painless
  • refresh – If ‘true’, Elasticsearch refreshes the affected shards to make this operation visible to search, if ‘wait_for’ then wait for a refresh to make this operation visible to search, if ‘false’ do nothing with refreshes. Valid choices: true, false, wait_for Default: false
  • require_alias – If true, the destination must be an index alias.
  • retry_on_conflict – Specify how many times should the operation be retried when a conflict occurs.
  • routing – Custom value used to route operations to a specific shard.
  • script – Script to execute to update the document.
  • scripted_upsert – Set to true to execute the script whether or not the document exists.
  • timeout – Period to wait for dynamic mapping updates and active shards. This guarantees Elasticsearch waits for at least the timeout before failing. The actual wait time could be longer, particularly when multiple waits occur. Default: 1m
  • upsert – If the document does not already exist, the contents of ‘upsert’ are inserted as a new document. If the document exists, the ‘script’ is executed.
  • wait_for_active_shards – The number of shard copies that must be active before proceeding with the operations. Set to ‘all’ or any positive integer up to the total number of shards in the index (number_of_replicas+1). Defaults to 1 meaning the primary shard. Default: 1
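
A partial update that also creates the document when it is missing might look like the sketch below. The helper name is hypothetical and `es` is assumed to be an AsyncElasticsearch instance; the body uses the doc and doc_as_upsert options listed above.

```python
async def upsert_fields(es, index, doc_id, fields, retries=3):
    """Merge `fields` into a document, creating it if it does not exist.

    `es` is assumed to be an AsyncElasticsearch instance.
    """
    body = {
        "doc": fields,          # partial document merged into the source
        "doc_as_upsert": True,  # use `doc` as the upsert value if missing
    }
    resp = await es.update(
        index=index,
        id=doc_id,
        body=body,
        retry_on_conflict=retries,  # retry when a version conflict occurs
    )
    # "result" is expected to be e.g. "created", "updated" or "noop".
    return resp.get("result")
```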
update_by_query(index, body=None, doc_type=None, params=None, headers=None)

Performs an update on every document in the index without changing the source, for example to pick up a mapping change.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-update-by-query.html

Parameters:
  • index – A comma-separated list of index names to search; use _all or empty string to perform the operation on all indices
  • body – The search definition using the Query DSL
  • doc_type – A comma-separated list of document types to search; leave empty to perform the operation on all types
  • allow_no_indices – Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes _all string or when no indices have been specified)
  • analyze_wildcard – Specify whether wildcard and prefix queries should be analyzed (default: false)
  • analyzer – The analyzer to use for the query string
  • conflicts – What to do when the update by query hits version conflicts? Valid choices: abort, proceed Default: abort
  • default_operator – The default operator for query string query (AND or OR) Valid choices: AND, OR Default: OR
  • df – The field to use as default where no field prefix is given in the query string
  • expand_wildcards – Whether to expand wildcard expression to concrete indices that are open, closed or both. Valid choices: open, closed, hidden, none, all Default: open
  • from – Starting offset (default: 0)
  • ignore_unavailable – Whether specified concrete indices should be ignored when unavailable (missing or closed)
  • lenient – Specify whether format-based query failures (such as providing text to a numeric field) should be ignored
  • max_docs – Maximum number of documents to process (default: all documents)
  • pipeline – Ingest pipeline to set on index requests made by this action. (default: none)
  • preference – Specify the node or shard the operation should be performed on (default: random)
  • q – Query in the Lucene query string syntax
  • refresh – Should the affected indexes be refreshed?
  • request_cache – Specify if request cache should be used for this request or not, defaults to index level setting
  • requests_per_second – The throttle to set on this request in sub-requests per second. -1 means no throttle.
  • routing – A comma-separated list of specific routing values
  • scroll – Specify how long a consistent view of the index should be maintained for scrolled search
  • scroll_size – Size on the scroll request powering the update by query Default: 100
  • search_timeout – Explicit timeout for each search request. Defaults to no timeout.
  • search_type – Search operation type Valid choices: query_then_fetch, dfs_query_then_fetch
  • size – Deprecated, please use max_docs instead
  • slices – The number of slices this task should be divided into. Defaults to 1, meaning the task isn’t sliced into subtasks. Can be set to auto. Default: 1
  • sort – A comma-separated list of <field>:<direction> pairs
  • stats – Specific ‘tag’ of the request for logging and statistical purposes
  • terminate_after – The maximum number of documents to collect for each shard, upon reaching which the query execution will terminate early.
  • timeout – Time each individual bulk request should wait for shards that are unavailable. Default: 1m
  • version – Specify whether to return document version as part of a hit
  • version_type – Should the document increment the version number (internal) on hit or not (reindex)
  • wait_for_active_shards – Sets the number of shard copies that must be active before proceeding with the update by query operation. Defaults to 1, meaning the primary shard only. Set to all for all shard copies, otherwise set to any non-negative value less than or equal to the total number of copies for the shard (number of replicas + 1)
  • wait_for_completion – Whether the request should block until the update by query operation is complete. Default: True
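
For long-running updates it can be useful to start the operation in the background and keep the task id. The sketch below assumes `es` is an AsyncElasticsearch instance, that matching documents have a numeric `counter` field (a placeholder), and that a request made with wait_for_completion=False returns the task id under a "task" key.

```python
async def start_counter_bump(es, index, query, count=1, throttle=100):
    """Start a throttled, non-blocking update by query and return its task id.

    `es` is assumed to be an AsyncElasticsearch instance; the `counter`
    field in the script is a placeholder for illustration.
    """
    body = {
        "query": query,
        "script": {
            "lang": "painless",
            "source": "ctx._source.counter += params.count",
            "params": {"count": count},
        },
    }
    resp = await es.update_by_query(
        index=index,
        body=body,
        conflicts="proceed",           # skip version conflicts, do not abort
        requests_per_second=throttle,  # -1 would disable throttling
        wait_for_completion=False,     # return immediately with a task id
    )
    return resp.get("task")
```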
update_by_query_rethrottle(task_id, params=None, headers=None)

Changes the number of requests per second for a particular Update By Query operation.

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-update-by-query.html

Parameters:
  • task_id – The task id to rethrottle
  • requests_per_second – The throttle to set on this request in floating sub-requests per second. -1 means set no throttle.
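
A running update by query can then be sped up or slowed down by task id. This is a small sketch with a hypothetical helper name, assuming `es` is an AsyncElasticsearch instance; the client-side validation is an added convenience, not something the library is known to perform.

```python
async def rethrottle(es, task_id, requests_per_second=-1):
    """Change the throttle of a running update by query task.

    `es` is assumed to be an AsyncElasticsearch instance. A value of -1
    removes the throttle; other values must be positive.
    """
    if requests_per_second != -1 and requests_per_second <= 0:
        raise ValueError("requests_per_second must be -1 or a positive number")
    return await es.update_by_query_rethrottle(
        task_id=task_id,
        requests_per_second=requests_per_second,
    )
```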

AsyncTransport

class elasticsearch.AsyncTransport(hosts, connection_class=None, connection_pool_class=<class 'elasticsearch.connection_pool.ConnectionPool'>, host_info_callback=<function get_host_info>, sniff_on_start=False, sniffer_timeout=None, sniff_timeout=0.1, sniff_on_connection_fail=False, serializer=<elasticsearch.serializer.JSONSerializer object>, serializers=None, default_mimetype='application/json', max_retries=3, retry_on_status=(502, 503, 504), retry_on_timeout=False, send_get_body_as='GET', meta_header=True, **kwargs)

Encapsulation of transport-related logic. Handles instantiation of the individual connections as well as creating a connection pool to hold them.

Main interface is the perform_request method.

Parameters:
  • hosts – list of dictionaries, each containing keyword arguments to create a connection_class instance
  • connection_class – subclass of Connection to use
  • connection_pool_class – subclass of ConnectionPool to use
  • host_info_callback – callback responsible for taking the node information from /_cluster/nodes, along with already extracted information, and producing a list of arguments (same as hosts parameter)
  • sniff_on_start – flag indicating whether to obtain a list of nodes from the cluster at startup time
  • sniffer_timeout – number of seconds between automatic sniffs
  • sniff_on_connection_fail – flag controlling if connection failure triggers a sniff
  • sniff_timeout – timeout used for the sniff request. Sniffing should be a fast API call that may contact several nodes, so it should fail quickly. Not used during initial sniffing (if sniff_on_start is on), when the connection is not yet initialized.
  • serializer – serializer instance
  • serializers – optional dict of serializer instances that will be used for deserializing data coming from the server. (key is the mimetype)
  • default_mimetype – when no mimetype is specified by the server response assume this mimetype, defaults to ‘application/json’
  • max_retries – maximum number of retries before an exception is propagated
  • retry_on_status – set of HTTP status codes on which we should retry on a different node. Defaults to (502, 503, 504)
  • retry_on_timeout – should timeout trigger a retry on different node? (default False)
  • send_get_body_as – for GET requests with a body, this option allows you to specify an alternate way of execution for environments that don’t support passing bodies with GET requests. If you set this to ‘POST’, a POST method will be used instead; if you set it to ‘source’, the body will be serialized and passed as the query parameter source.
  • meta_header – If True will send the ‘X-Elastic-Client-Meta’ HTTP header containing simple client metadata. Setting to False will disable the header. Defaults to True.

Any extra keyword arguments will be passed to the connection_class when creating an instance unless overridden by that connection’s options provided as part of the hosts parameter.
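
The transport options above are usually supplied as keyword arguments when constructing the client. The sketch below only assembles such a set of options; the helper name, node addresses and chosen values are illustrative assumptions.

```python
def transport_settings(nodes, sniff=False):
    """Assemble keyword arguments for the transport.

    `nodes` is an iterable of (host, port) pairs. The retry and sniffing
    values below are illustrative choices, not recommended defaults.
    """
    settings = {
        # One dictionary per node, used to create connection instances.
        "hosts": [{"host": host, "port": port} for host, port in nodes],
        "max_retries": 5,
        "retry_on_timeout": True,
        "retry_on_status": (502, 503, 504),
    }
    if sniff:
        settings.update(
            sniff_on_start=True,            # fetch the node list at startup
            sniff_on_connection_fail=True,  # re-sniff when a node fails
            sniffer_timeout=60,             # seconds between automatic sniffs
        )
    return settings
```

Assuming the client forwards extra keyword arguments to its transport, the result could be used as `AsyncElasticsearch(**transport_settings([("localhost", 9200)]))`.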

DEFAULT_CONNECTION_CLASS

alias of elasticsearch._async.http_aiohttp.AIOHttpConnection

close()

Explicitly closes connections

create_sniff_task(initial=False)

Initiate a sniffing task. Make sure we only have one sniff request running at any given time. If a finished sniffing request is around, collect its result (which can raise its exception).

get_connection()

Retrieve a Connection instance from the ConnectionPool instance.

mark_dead(connection)

Mark a connection as dead (failed) in the connection pool. If sniffing on failure is enabled this will initiate the sniffing process.

Parameters:
  • connection – instance of Connection that failed
perform_request(method, url, headers=None, params=None, body=None)

Perform the actual request. Retrieve a connection from the connection pool, pass all the information to its perform_request method and return the data.

If an exception was raised, mark the connection as failed and retry (up to max_retries times).

If the operation was successful and the connection used was previously marked as dead, mark it as live, resetting its failure count.

Parameters:
  • method – HTTP method to use
  • url – absolute url (without host) to target
  • headers – dictionary of headers, will be handed over to the underlying Connection class
  • params – dictionary of query parameters, will be handed over to the underlying Connection class for serialization
  • body – body of the request, will be serialized using serializer and passed to the connection
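
The retry behaviour described for perform_request can be illustrated with a simplified, self-contained sketch. It is not the library's implementation: `FlakyConnection` and `SketchTransport` are hypothetical stand-ins, and the round-robin selection is an assumption made to keep the example small.

```python
class FlakyConnection:
    """Hypothetical connection that fails a fixed number of times."""

    def __init__(self, name, failures):
        self.name = name
        self.failures = failures

    async def perform_request(self, method, url):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError(f"{self.name} unavailable")
        return {"node": self.name, "method": method, "url": url}


class SketchTransport:
    """Simplified retry loop: pick a connection, mark it dead on failure."""

    def __init__(self, connections, max_retries=3):
        self.connections = list(connections)
        self.dead = set()
        self.max_retries = max_retries
        self._next = 0

    def get_connection(self):
        # Prefer live connections; fall back to all of them if none is live.
        live = [c for c in self.connections if c.name not in self.dead]
        candidates = live or self.connections
        conn = candidates[self._next % len(candidates)]
        self._next += 1
        return conn

    async def perform_request(self, method, url):
        for attempt in range(self.max_retries + 1):
            conn = self.get_connection()
            try:
                data = await conn.perform_request(method, url)
            except ConnectionError:
                self.dead.add(conn.name)      # mark the connection as failed
                if attempt == self.max_retries:
                    raise                     # retries exhausted: propagate
            else:
                self.dead.discard(conn.name)  # success: mark it live again
                return data
```

With one failing and one healthy connection, the first attempt fails, the failing connection is marked dead, and the retry is served by the healthy one.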
sniff_hosts(initial=False)

Either spawns a sniffing_task which does regular sniffing over time or does a single sniffing session and awaits the results.

AsyncConnection

class elasticsearch.AsyncConnection(host='localhost', port=None, use_ssl=False, url_prefix='', timeout=10, headers=None, http_compress=None, cloud_id=None, api_key=None, opaque_id=None, meta_header=True, **kwargs)

Base class for Async HTTP connection implementations

AIOHttpConnection

class elasticsearch.AIOHttpConnection(host='localhost', port=None, url_prefix='', timeout=10, http_auth=None, use_ssl=False, verify_certs=<object object>, ssl_show_warn=<object object>, ca_certs=None, client_cert=None, client_key=None, ssl_version=None, ssl_assert_fingerprint=None, maxsize=10, headers=None, ssl_context=None, http_compress=None, cloud_id=None, api_key=None, opaque_id=None, loop=None, **kwargs)

Default connection class for AsyncElasticsearch using the aiohttp library and the http protocol.

Parameters:
  • host – hostname of the node (default: localhost)
  • port – port to use (integer, default: 9200)
  • url_prefix – optional url prefix for elasticsearch
  • timeout – default timeout in seconds (float, default: 10)
  • http_auth – optional http auth information as either ‘:’ separated string or a tuple
  • use_ssl – use ssl for the connection if True
  • verify_certs – whether to verify SSL certificates
  • ssl_show_warn – show warning when verify certs is disabled
  • ca_certs – optional path to CA bundle. See https://urllib3.readthedocs.io/en/latest/security.html#using-certifi-with-urllib3 for instructions how to get default set
  • client_cert – path to the file containing the private key and the certificate, or cert only if using client_key
  • client_key – path to the file containing the private key if using separate cert and key files (client_cert will contain only the cert)
  • ssl_version – version of the SSL protocol to use. Choices are: SSLv23 (default), SSLv2, SSLv3, TLSv1 (see PROTOCOL_* constants in the ssl module for exact options for your environment).
  • ssl_assert_hostname – use hostname verification if not False
  • ssl_assert_fingerprint – verify the supplied certificate fingerprint if not None
  • maxsize – the number of connections which will be kept open to this host. See https://urllib3.readthedocs.io/en/1.4/pools.html#api for more information.
  • headers – any custom http headers to be added to requests
  • http_compress – Use gzip compression
  • cloud_id – The Cloud ID from ElasticCloud. Convenient way to connect to cloud instances. Other host connection params will be ignored.
  • api_key – optional API Key authentication as either base64 encoded string or a tuple.
  • opaque_id – Send this value in the ‘X-Opaque-Id’ HTTP header for tracing all requests made by this transport.
  • loop – asyncio Event Loop to use with aiohttp. This is set by default to the currently running loop.
close()

Explicitly closes connection
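
Because the underlying aiohttp connections hold open sockets, it is good practice to close the client when the work is done, even if an error occurs. The sketch below uses a hypothetical helper and assumes `es` is an AsyncElasticsearch instance whose `close()` is awaitable.

```python
async def run_and_close(es, work):
    """Run `work(es)` and always close the client afterwards.

    `es` is assumed to be an AsyncElasticsearch instance; `work` is any
    coroutine function that takes the client as its only argument.
    """
    try:
        return await work(es)
    finally:
        # Runs on success and on error, releasing the open connections.
        await es.close()
```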