Helpers

Collection of simple helper functions that abstract some specifics of the raw API.

elasticsearch.helpers.streaming_bulk(client, actions, chunk_size=500, raise_on_error=False, expand_action_callback=expand_action, **kwargs)

Streaming bulk consumes actions from the iterable passed in and yields results per action. For non-streaming use cases use bulk(), which is a wrapper around streaming_bulk() that returns summary information about the bulk operation once the entire input is consumed and sent.

This function expects each action to be in the format returned by search(), for example:

{
    '_index': 'index-name',
    '_type': 'document',
    '_id': 42,
    '_parent': 5,
    '_ttl': '1d',
    '_source': {
        ...
    }
}

Alternatively, if _source is not present, it will pop all metadata fields from the doc and use the rest as the document data.
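The two accepted shapes of an index action can be illustrated with a minimal sketch. It is not the library's implementation: split_action and METADATA_KEYS are hypothetical names, and the key list contains only the metadata fields named in this section.

```python
# Hypothetical sketch, not the library's code. METADATA_KEYS lists only the
# metadata fields named in this section; the real helper may know others.
METADATA_KEYS = ('_index', '_type', '_id', '_parent', '_ttl')


def split_action(action):
    """Return (metadata, document) for an index action."""
    doc = dict(action)  # shallow copy so the caller's dict is untouched
    metadata = {key: doc.pop(key) for key in METADATA_KEYS if key in doc}
    # If '_source' is present it is the document; otherwise whatever is
    # left after removing the metadata fields is the document.
    document = doc.pop('_source', doc)
    return metadata, document


with_source = {'_index': 'index-name', '_type': 'document', '_id': 42,
               '_source': {'title': 'Hello'}}
flat = {'_index': 'index-name', '_type': 'document', '_id': 42,
        'title': 'Hello'}

print(split_action(with_source))
print(split_action(flat))
```

Both calls produce the same metadata and the same document body, which is why either shape can be passed in.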

Alternative actions (_op_type field defaults to index) can be sent as well:

{
    '_op_type': 'delete',
    '_index': 'index-name',
    '_type': 'document',
    '_id': 42,
}

{
    '_op_type': 'update',
    '_index': 'index-name',
    '_type': 'document',
    '_id': 42,
    'doc': {'question': 'The life, universe and everything.'}
}

Parameters:
  • client – instance of Elasticsearch to use
  • actions – iterable containing the actions to be executed
  • chunk_size – number of docs in one chunk sent to Elasticsearch (default: 500)
  • raise_on_error – raise BulkIndexError containing the errors (as .errors) from the execution of the last chunk (default: False)
  • expand_action_callback – callback executed on each action passed in, should return a tuple containing the action line and the data line (None if data line should be omitted).
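
A typical way to feed streaming_bulk() is a generator, so that actions are built lazily as chunks are sent. The sketch below is illustrative only: generate_actions and index_stream are hypothetical names, and it assumes each yielded result is an (ok, item) pair, which should be checked against the installed version.

```python
def generate_actions(docs, index, doc_type):
    """Yield one index action per (doc_id, body) pair, lazily."""
    for doc_id, body in docs:
        yield {
            '_index': index,
            '_type': doc_type,
            '_id': doc_id,
            '_source': body,
        }


def index_stream(client, docs, index='index-name', doc_type='document'):
    """Send docs through streaming_bulk and collect the failed results."""
    # Imported here so the generator above can be used without the library.
    from elasticsearch.helpers import streaming_bulk

    failed = []
    actions = generate_actions(docs, index, doc_type)
    # Assumption: each result is an (ok, item) pair, one per action.
    for ok, item in streaming_bulk(client, actions, chunk_size=500):
        if not ok:
            failed.append(item)
    return failed


sample = [(1, {'title': 'first'}), (2, {'title': 'second'})]
actions = list(generate_actions(sample, 'index-name', 'document'))
print(actions[0])
```

Only generate_actions runs in this example; index_stream needs a live Elasticsearch client and is shown for the calling pattern.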

elasticsearch.helpers.bulk(client, actions, stats_only=False, **kwargs)

Helper for the bulk() API that provides a more human-friendly interface: it consumes an iterator of actions and sends them to Elasticsearch in chunks. It returns a tuple with summary information: the number of successfully executed actions and either a list of errors or, if stats_only is set to True, the number of errors.

See streaming_bulk() for more information and accepted formats.

Parameters:
  • client – instance of Elasticsearch to use
  • actions – iterator containing the actions
  • stats_only – if True, report only the number of successful and failed operations instead of the number of successful operations and a list of error responses

Any additional keyword arguments will be passed to streaming_bulk() which is used to execute the operation.
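
The shape of the returned tuple can be sketched as a reduction over per-action results. This is an illustration under assumptions, not the helper's code: summarize is a hypothetical name, the per-action results are assumed to be (ok, item) pairs, and the sample items are made up for the example.

```python
def summarize(results, stats_only=False):
    """Reduce per-action (ok, item) results to a summary tuple."""
    success, failed, errors = 0, 0, []
    for ok, item in results:
        if ok:
            success += 1
        elif stats_only:
            failed += 1          # only count the failure
        else:
            errors.append(item)  # keep the full error response
    return (success, failed) if stats_only else (success, errors)


# Illustrative results; real items come from the bulk response.
results = [
    (True, {'index': {'_id': 1, 'status': 201}}),
    (False, {'index': {'_id': 2, 'status': 400}}),
    (True, {'index': {'_id': 3, 'status': 201}}),
]

print(summarize(results))
print(summarize(results, stats_only=True))
```

With stats_only=True the second element is a count, which avoids holding every error response in memory for large inputs.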

elasticsearch.helpers.scan(client, query=None, scroll='5m', **kwargs)

Simple abstraction on top of the scroll() API: an iterator that yields all hits returned by the underlying scroll requests.

Parameters:
  • client – instance of Elasticsearch to use
  • query – body for the search() api
  • scroll – Specify how long a consistent view of the index should be maintained for scrolled search

Any additional keyword arguments will be passed to the initial search() call.
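
The scroll loop that such an iterator hides might look like the sketch below. It is not the helper's implementation: scan_sketch and StubClient are hypothetical, the stub serves canned pages instead of talking to a cluster, and the loop assumes the first search response already contains hits, which may not hold for every search type.

```python
class StubClient:
    """Hypothetical stand-in that serves canned pages instead of a cluster."""

    def __init__(self, pages):
        self._pages = list(pages)

    def _next_page(self):
        hits = self._pages.pop(0) if self._pages else []
        return {'_scroll_id': 'scroll-token', 'hits': {'hits': hits}}

    def search(self, body=None, scroll=None, **kwargs):
        return self._next_page()

    def scroll(self, scroll_id=None, scroll=None):
        return self._next_page()


def scan_sketch(client, query=None, scroll='5m', **kwargs):
    """Yield every hit, following scroll ids until a page comes back empty."""
    resp = client.search(body=query, scroll=scroll, **kwargs)
    while resp['hits']['hits']:
        for hit in resp['hits']['hits']:
            yield hit
        # Each response carries the id to use for the next page.
        resp = client.scroll(scroll_id=resp['_scroll_id'], scroll=scroll)


client = StubClient([[{'_id': 1}, {'_id': 2}], [{'_id': 3}]])
query = {'query': {'match_all': {}}}
ids = [hit['_id'] for hit in scan_sketch(client, query=query)]
print(ids)
```

Because the function is a generator, pages are fetched only as the caller consumes hits.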

elasticsearch.helpers.reindex(client, source_index, target_index, target_client=None, chunk_size=500, scroll='5m')

Reindex all documents from one index to another, potentially (if target_client is specified) on a different cluster.

Note

This helper doesn’t transfer mappings, just the data.

Parameters:
  • client – instance of Elasticsearch to use (used only for reading if target_client is also specified)
  • source_index – index (or list of indices) to read documents from
  • target_index – name of the index in the target cluster to populate
  • target_client – optional; if specified, it will be used for writing (thus enabling reindexing between clusters)
  • chunk_size – number of docs in one chunk sent to Elasticsearch (default: 500)
  • scroll – Specify how long a consistent view of the index should be maintained for scrolled search
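
Conceptually, reindexing can be composed from the two helpers above: read hits with scan(), point each at the new index, and write them with bulk(). The sketch below shows one way this could be wired and is an assumption about the approach, not the helper's actual code; retarget and reindex_sketch are hypothetical names.

```python
def retarget(hits, target_index):
    """Yield each hit with '_index' pointing at the target index."""
    for hit in hits:
        action = dict(hit)  # copy so the original hit is left untouched
        action['_index'] = target_index
        yield action


def reindex_sketch(client, source_index, target_index,
                   target_client=None, chunk_size=500, scroll='5m'):
    """Read with scan(), rewrite the index name, write with bulk()."""
    from elasticsearch.helpers import bulk, scan  # lazy: needs the library

    # Write to the same cluster unless a separate target client is given.
    writer = client if target_client is None else target_client
    hits = scan(client, index=source_index, scroll=scroll)
    return bulk(writer, retarget(hits, target_index), chunk_size=chunk_size)


hits = [{'_index': 'old', '_type': 'document', '_id': 1,
         '_source': {'title': 'first'}}]
moved = list(retarget(hits, 'new'))
print(moved)
```

Only retarget runs here; reindex_sketch needs live clients. Since mappings are not transferred, the target index would need to be created with the desired mappings beforehand.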