Helpers

Collection of simple helper functions that abstract some specifics of the raw API.

elasticsearch.helpers.bulk_index(client, docs, chunk_size=500, stats_only=False, raise_on_error=False, **kwargs)

Helper for the bulk() API that provides a friendlier interface: it consumes an iterator of documents and sends them to Elasticsearch in chunks.

This function expects each doc to be in the format returned by search(), for example:

{
    '_index': 'index-name',
    '_type': 'document',
    '_id': 42,
    '_parent': 5,
    '_ttl': '1d',
    '_source': {
        ...
    }
}

Alternatively, if _source is not present, the function pops all metadata fields from the doc and uses the rest as the document data.
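The two accepted document shapes can be illustrated with a minimal sketch. This is not the library's code: `split_doc` and `METADATA_FIELDS` are hypothetical names, and the set of metadata fields is assumed from the example above.

```python
# Hypothetical illustration of the two accepted document shapes.
# The field list is an assumption based on the example document above.
METADATA_FIELDS = ('_index', '_type', '_id', '_parent', '_ttl')


def split_doc(doc):
    """Return (metadata, source) for a single document dict."""
    doc = dict(doc)  # shallow copy so the caller's dict is not modified
    metadata = {}
    for field in METADATA_FIELDS:
        if field in doc:
            metadata[field] = doc.pop(field)
    # Use '_source' when present; otherwise the remaining keys are the data.
    source = doc.pop('_source', doc)
    return metadata, source


# Shape 1: document body nested under '_source'.
nested = {'_index': 'index-name', '_id': 42, '_source': {'title': 'a'}}
# Shape 2: document body stored next to the metadata fields.
flat = {'_index': 'index-name', '_id': 42, 'title': 'a'}

# Both shapes yield the same metadata and the same document data.
same = split_doc(nested) == split_doc(flat)
```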

Parameters:

client – instance of Elasticsearch to use
docs – iterator containing the docs
chunk_size – number of docs in one chunk sent to Elasticsearch (default: 500)
stats_only – if True, report only the number of successful and failed operations instead of the number of successful operations and a list of error responses
raise_on_error – raise BulkIndexError if some documents fail to index (and stop sending chunks to the server)

Any additional keyword arguments will be passed to the bulk API itself.
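To make the effect of chunk_size concrete, here is a rough sketch of how an iterator can be consumed in fixed-size chunks. It only illustrates the idea and is not the helper's actual implementation; `iter_chunks` is a hypothetical name.

```python
from itertools import islice


def iter_chunks(docs, chunk_size=500):
    """Yield lists of at most chunk_size docs from any iterable."""
    iterator = iter(docs)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# A generator works as input, so the full data set never has to sit in memory.
docs = (
    {'_index': 'index-name', '_type': 'document', '_id': i, 'title': 'doc %d' % i}
    for i in range(1200)
)
sizes = [len(chunk) for chunk in iter_chunks(docs, chunk_size=500)]
# sizes == [500, 500, 200]
```

With the default chunk_size of 500, 1200 documents would therefore result in three requests under this scheme.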

elasticsearch.helpers.scan(client, query=None, scroll='5m', **kwargs)

Simple abstraction on top of the scroll() API: an iterator that yields all hits as returned by the underlying scroll requests.

Parameters:

client – instance of Elasticsearch to use
query – body for the search() API
scroll – specify how long a consistent view of the index should be maintained for scrolled search

Any additional keyword arguments will be passed to the initial search() call.
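The scrolling pattern behind such an iterator can be sketched as follows. This is an assumption-laden illustration, not the helper's source: `iter_hits` is a hypothetical name, and `InMemoryClient` is a hypothetical stand-in that mimics only the response keys the loop needs (`_scroll_id` and `hits.hits`), so the example runs without a cluster.

```python
def iter_hits(client, query=None, scroll='5m', **kwargs):
    """Yield every hit, following scroll ids until a page comes back empty."""
    resp = client.search(body=query, scroll=scroll, **kwargs)
    for hit in resp['hits']['hits']:
        yield hit
    scroll_id = resp.get('_scroll_id')
    while scroll_id is not None:
        resp = client.scroll(scroll_id=scroll_id, scroll=scroll)
        hits = resp['hits']['hits']
        if not hits:
            break  # an empty page means the scroll is exhausted
        for hit in hits:
            yield hit
        scroll_id = resp.get('_scroll_id')


class InMemoryClient(object):
    """Hypothetical stand-in that pages through a list of hits."""

    def __init__(self, hits, page_size=2):
        self._hits = hits
        self._page_size = page_size

    def _page(self, start):
        end = start + self._page_size
        # The scroll id simply encodes the offset of the next page.
        return {'_scroll_id': str(end), 'hits': {'hits': self._hits[start:end]}}

    def search(self, body=None, scroll=None, **kwargs):
        return self._page(0)

    def scroll(self, scroll_id, scroll=None):
        return self._page(int(scroll_id))


client = InMemoryClient([{'_id': i, '_source': {'n': i}} for i in range(5)])
ids = [hit['_id'] for hit in iter_hits(client, query={'query': {'match_all': {}}})]
# ids == [0, 1, 2, 3, 4]
```

Because the function is a generator, callers can stop iterating early without fetching the remaining pages.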

elasticsearch.helpers.reindex(client, source_index, target_index, target_client=None, chunk_size=500, scroll='5m')

Reindex all documents from one index to another, potentially (if target_client is specified) on a different cluster.

Note

This helper doesn’t transfer mappings, just the data.

Parameters:

client – instance of Elasticsearch to use (for read if target_client is specified as well)
source_index – index (or list of indices) to read documents from
target_index – name of the index in the target cluster to populate
target_client – optional; if specified, it will be used for writing (thus enabling reindex between clusters)
chunk_size – number of docs in one chunk sent to Elasticsearch (default: 500)
scroll – Specify how long a consistent view of the index should be maintained for scrolled search
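
Conceptually, reindexing reads every hit from the source index, points it at the target index, and writes it with the appropriate client. The sketch below shows only those two decisions, assuming hits are dicts in the search() format; `retarget` and `pick_writer` are hypothetical names and do not come from the library.

```python
def retarget(hits, target_index):
    """Yield copies of the hits with '_index' rewritten to target_index."""
    for hit in hits:
        doc = dict(hit)  # shallow copy; the original hit is left untouched
        doc['_index'] = target_index
        yield doc


def pick_writer(client, target_client=None):
    """Write with target_client when given, otherwise reuse the read client."""
    return client if target_client is None else target_client


hits = [{'_index': 'old', '_type': 'document', '_id': 1, '_source': {'title': 'a'}}]
moved = list(retarget(hits, 'new'))
# moved[0]['_index'] == 'new', while hits[0]['_index'] is still 'old'
```

Since mappings are not transferred, the target index would normally need its mappings created before any documents are written.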