Helpers
Collection of simple helper functions that abstract some specifics of the raw API.
elasticsearch.helpers.streaming_bulk(client, actions, chunk_size=500, raise_on_error=True, expand_action_callback=<function expand_action>, raise_on_exception=True, **kwargs)

Streaming bulk consumes actions from the iterable passed in and yields results per action. For non-streaming use cases, use bulk(), which is a wrapper around streaming bulk that returns summary information about the bulk operation once the entire input is consumed and sent.

This function expects the action to be in the format returned by search(), for example:

    {
        '_index': 'index-name',
        '_type': 'document',
        '_id': 42,
        '_parent': 5,
        '_ttl': '1d',
        '_source': {
            ...
        }
    }
Alternatively, if _source is not present, it will pop all metadata fields from the doc and use the rest as the document data.
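Because streaming_bulk() and bulk() consume any iterable, actions can be produced lazily by a generator instead of being collected in a list first. The sketch below assumes the input arrives as (id, body) pairs; the generator name gen_index_actions, the index name and the default 'document' type are illustrative only.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def gen_index_actions(
    docs: Iterable[Tuple[Any, Dict[str, Any]]],
    index: str,
    doc_type: str = "document",
) -> Iterator[Dict[str, Any]]:
    """Yield one index action per (doc_id, body) pair (hypothetical helper).

    Nothing is materialized up front, so a large input can be consumed
    chunk by chunk by the bulk helpers.
    """
    for doc_id, body in docs:
        yield {
            "_index": index,
            "_type": doc_type,
            "_id": doc_id,
            # The document itself goes under '_source'; without that key the
            # non-metadata fields of the action would be used as the document.
            "_source": body,
        }
```

With a client instance es, the generator can be handed over directly, for example bulk(es, gen_index_actions(rows, 'index-name')).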
The bulk() API accepts index, create, delete, and update actions. Use the _op_type field to specify an action (_op_type defaults to index):

    {
        '_op_type': 'delete',
        '_index': 'index-name',
        '_type': 'document',
        '_id': 42,
    }
    {
        '_op_type': 'update',
        '_index': 'index-name',
        '_type': 'document',
        '_id': 42,
        'doc': {'question': 'The life, universe and everything.'}
    }
Parameters:
- client – instance of Elasticsearch to use
- actions – iterable containing the actions to be executed
- chunk_size – number of docs in one chunk sent to Elasticsearch (default: 500)
- raise_on_error – raise BulkIndexError containing errors (as .errors) from the execution of the last chunk when some occur. By default we raise.
- raise_on_exception – if False, don’t propagate exceptions from the call to bulk; just report the items that failed as failed.
- expand_action_callback – callback executed on each action passed in; it should return a tuple containing the action line and the data line (None if the data line should be omitted).
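The expand_action_callback contract can be illustrated with a simplified callback. This is a minimal sketch, not the library's own expand_action: the METADATA_FIELDS tuple is an assumed, deliberately incomplete subset of metadata keys. It moves metadata into the action line, uses _source as the data line when present (otherwise the remaining fields), and returns None as the data line for deletes, matching the action formats described above.

```python
from typing import Any, Dict, Optional, Tuple

# Assumed subset of metadata keys moved into the action line; the real
# helper may recognize more (or different) fields.
METADATA_FIELDS = ("_index", "_type", "_id", "_parent", "_routing", "_ttl", "_version")


def simple_expand_action(
    data: Dict[str, Any],
) -> Tuple[Dict[str, Dict[str, Any]], Optional[Dict[str, Any]]]:
    """Return (action_line, data_line) for one action dict (hypothetical sketch)."""
    data = dict(data)  # shallow copy so the caller's dict is left untouched
    op_type = data.pop("_op_type", "index")  # '_op_type' defaults to index
    meta = {key: data.pop(key) for key in METADATA_FIELDS if key in data}
    action_line = {op_type: meta}
    if op_type == "delete":
        # A delete carries no document body, so the data line is omitted.
        return action_line, None
    # Prefer an explicit '_source'; otherwise the leftover fields are the document.
    return action_line, data.get("_source", data)
```

It could then be supplied as streaming_bulk(es, actions, expand_action_callback=simple_expand_action).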
elasticsearch.helpers.bulk(client, actions, stats_only=False, **kwargs)

Helper for the bulk() API that provides a more human-friendly interface: it consumes an iterator of actions and sends them to Elasticsearch in chunks. It returns a tuple with summary information: the number of successfully executed actions and either a list of errors or, if stats_only is set to True, the number of errors. See streaming_bulk() for more information and accepted formats.

Parameters:
- client – instance of Elasticsearch to use
- actions – iterator containing the actions
- stats_only – if True, only report the number of successful/failed operations instead of the number of successful operations and a list of error responses

Any additional keyword arguments will be passed to streaming_bulk(), which is used to execute the operation.
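The summary returned by bulk() can be sketched as a fold over the per-action results of streaming_bulk(). The sketch assumes, as in elasticsearch-py, that each result is an (ok, item) tuple; summarize is a hypothetical name, and the code only illustrates how stats_only changes the second element of the returned tuple.

```python
from typing import Any, Dict, Iterable, List, Tuple, Union


def summarize(
    results: Iterable[Tuple[bool, Dict[str, Any]]],
    stats_only: bool = False,
) -> Tuple[int, Union[int, List[Dict[str, Any]]]]:
    """Count successes and either count or collect failures, bulk()-style."""
    success, failed = 0, 0
    errors: List[Dict[str, Any]] = []
    for ok, item in results:
        if ok:
            success += 1
        elif stats_only:
            failed += 1  # only the number of failures is kept
        else:
            errors.append(item)  # keep the full error response
    return (success, failed) if stats_only else (success, errors)
```

Returning only counts when stats_only is set avoids holding every error response in memory for large imports.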
elasticsearch.helpers.scan(client, query=None, scroll='5m', raise_on_error=True, preserve_order=False, **kwargs)

Simple abstraction on top of the scroll() API: a simple iterator that yields all hits as returned by the underlying scroll requests.

By default, scan does not return results in any predetermined order. To have a standard order in the returned documents (either by score or by an explicit sort definition) when scrolling, use preserve_order=True. This may be an expensive operation and will negate the performance benefits of using scan.

Parameters:
- client – instance of Elasticsearch to use
- query – body for the search() API
- scroll – specify how long a consistent view of the index should be maintained for scrolled search
- raise_on_error – raises an exception (ScanError) if an error is encountered (some shards fail to execute). By default we raise.
- preserve_order – don’t set the search_type to scan; this will cause the scroll to paginate while preserving the order. Note that this can be an extremely expensive operation and can easily lead to unpredictable results; use with caution.

Any additional keyword arguments will be passed to the initial search() call:

    scan(es,
        query={"query": {"match": {"title": "python"}}},
        index="orders-*",
        doc_type="books"
    )
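scan() hides the scroll bookkeeping behind a plain iterator. The sketch below shows that pattern in isolation: fetch_page is a hypothetical stand-in for the initial search() call and the follow-up scroll() calls, returning a scroll id and one page of hits, and iteration stops at the first empty page. It does not talk to Elasticsearch and omits error handling such as the ScanError check; make_fake_pager is an in-memory fake for demonstration only.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

# One page of results: (scroll_id, hits). A scroll_id of None asks for the first page.
Page = Tuple[Optional[str], List[Dict[str, Any]]]


def iter_hits(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Dict[str, Any]]:
    """Yield every hit from successive scroll pages until an empty page arrives."""
    scroll_id, hits = fetch_page(None)  # initial request
    while hits:
        yield from hits
        scroll_id, hits = fetch_page(scroll_id)  # next page of the same scroll


def make_fake_pager(pages: List[List[Dict[str, Any]]]) -> Callable[[Optional[str]], Page]:
    """Build an in-memory pager; scroll ids are just page indices as strings."""
    def fetch_page(scroll_id: Optional[str]) -> Page:
        n = 0 if scroll_id is None else int(scroll_id) + 1
        return str(n), (pages[n] if n < len(pages) else [])
    return fetch_page
```

Because iter_hits is a generator, callers can stop early without fetching the remaining pages, which is the same property that makes scan() suitable for very large result sets.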
elasticsearch.helpers.reindex(client, source_index, target_index, query=None, target_client=None, chunk_size=500, scroll='5m', scan_kwargs={}, bulk_kwargs={})

Reindex all documents from one index that satisfy a given query to another, potentially (if target_client is specified) on a different cluster. If you don’t specify the query, you will reindex all the documents.
Note
This helper doesn’t transfer mappings, just the data.
Parameters:
- client – instance of Elasticsearch to use (for reading, if target_client is specified as well)
- source_index – index (or list of indices) to read documents from
- target_index – name of the index in the target cluster to populate
- query – body for the search() API
- target_client – optional; if specified, it will be used for writing (thus enabling reindex between clusters)
- chunk_size – number of docs in one chunk sent to Elasticsearch (default: 500)
- scroll – specify how long a consistent view of the index should be maintained for scrolled search
- scan_kwargs – additional kwargs to be passed to scan()
- bulk_kwargs – additional kwargs to be passed to bulk()
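Conceptually, a reindex of this kind is a scan() over the source feeding a bulk() into the target, with each hit's _index rewritten on the way. The generator below sketches only that rewriting step; retarget is a hypothetical name, and it copies each hit rather than mutating it. Hits are assumed to have the shape of search() results, which is already the action format streaming_bulk() expects.

```python
from typing import Any, Dict, Iterable, Iterator


def retarget(hits: Iterable[Dict[str, Any]], target_index: str) -> Iterator[Dict[str, Any]]:
    """Yield each scanned hit as an index action aimed at target_index."""
    for hit in hits:
        action = dict(hit)  # shallow copy; the original hit is left unchanged
        action["_index"] = target_index
        yield action
```

Wired together, this reads roughly as bulk(target_client or client, retarget(scan(client, query=query, index=source_index, scroll=scroll), target_index), chunk_size=chunk_size); treat that as an approximation of the helper's behaviour, not its actual source.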