Text Structure
- class elasticsearch.client.TextStructureClient(client)
- Parameters:
client (BaseClient)
- find_structure(*, text_files=None, body=None, charset=None, column_names=None, delimiter=None, ecs_compatibility=None, explain=None, format=None, grok_pattern=None, has_header_row=None, line_merge_size_limit=None, lines_to_sample=None, quote=None, should_trim_fields=None, timeout=None, timestamp_field=None, timestamp_format=None)
Finds the structure of a text file. The text file must contain data that is suitable to be ingested into Elasticsearch.
https://www.elastic.co/guide/en/elasticsearch/reference/8.16/find-structure.html
- Parameters:
charset (str | None) – The text’s character set. It must be a character set that is supported by the JVM that Elasticsearch uses. For example, UTF-8, UTF-16LE, windows-1252, or EUC-JP. If this parameter is not specified, the structure finder chooses an appropriate character set.
column_names (str | None) – If you have set format to delimited, you can specify the column names in a comma-separated list. If this parameter is not specified, the structure finder uses the column names from the header row of the text. If the text does not have a header row, columns are named “column1”, “column2”, “column3”, etc.
delimiter (str | None) – If you have set format to delimited, you can specify the character used to delimit the values in each row. Only a single character is supported; the delimiter cannot have multiple characters. By default, the API considers the following possibilities: comma, tab, semi-colon, and pipe (|). In this default scenario, all rows must have the same number of fields for the delimited format to be detected. If you specify a delimiter, up to 10% of the rows can have a different number of columns than the first row.
ecs_compatibility (str | None) – The mode of compatibility with ECS-compliant Grok patterns (disabled or v1, default: disabled).
explain (bool | None) – If this parameter is set to true, the response includes a field named explanation, which is an array of strings that indicate how the structure finder produced its result.
format (str | None) – The high-level structure of the text. Valid values are ndjson, xml, delimited, and semi_structured_text. By default, the API chooses the format. In this default scenario, all rows must have the same number of fields for a delimited format to be detected. If the format is set to delimited and the delimiter is not set, however, the API tolerates up to 5% of rows that have a different number of columns than the first row.
grok_pattern (str | None) – If you have set format to semi_structured_text, you can specify a Grok pattern that is used to extract fields from every message in the text. The name of the timestamp field in the Grok pattern must match what is specified in the timestamp_field parameter. If that parameter is not specified, the name of the timestamp field in the Grok pattern must match “timestamp”. If grok_pattern is not specified, the structure finder creates a Grok pattern.
has_header_row (bool | None) – If you have set format to delimited, you can use this parameter to indicate whether the column names are in the first row of the text. If this parameter is not specified, the structure finder guesses based on the similarity of the first row of the text to other rows.
line_merge_size_limit (int | None) – The maximum number of characters in a message when lines are merged to form messages while analyzing semi-structured text. If you have extremely long messages you may need to increase this, but be aware that this may lead to very long processing times if the way to group lines into messages is misdetected.
lines_to_sample (int | None) – The number of lines to include in the structural analysis, starting from the beginning of the text. The minimum is 2. If the value of this parameter is greater than the number of lines in the text, the analysis proceeds (as long as there are at least two lines in the text) for all of the lines.
quote (str | None) – If you have set format to delimited, you can specify the character used to quote the values in each row if they contain newlines or the delimiter character. Only a single character is supported. If this parameter is not specified, the default value is a double quote ("). If your delimited text format does not use quoting, a workaround is to set this argument to a character that does not appear anywhere in the sample.
should_trim_fields (bool | None) – If you have set format to delimited, you can specify whether values between delimiters should have whitespace trimmed from them. If this parameter is not specified and the delimiter is pipe (|), the default value is true. Otherwise, the default value is false.
timeout (str | Literal[-1] | Literal[0] | None) – Sets the maximum amount of time that the structure analysis may take. If the analysis is still running when the timeout expires, it will be aborted.
timestamp_field (str | None) – An optional parameter that specifies the timestamp field in the text.
timestamp_format (str | None) – The Java time format of the timestamp field in the text.
- Return type:
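The parameter rules above (a single-character delimiter, a minimum of 2 for lines_to_sample, and defaults that apply only when a parameter is omitted) can be sketched as a small helper that assembles the keyword arguments before the call. This is a minimal sketch, not part of the client: delimited_structure_kwargs is a hypothetical name, the sample data is made up, and passing a list of strings as text_files assumes the client serializes each string as one line of the request body.

```python
def delimited_structure_kwargs(lines, *, delimiter=",", has_header_row=None,
                               lines_to_sample=None, explain=False):
    """Assemble keyword arguments for find_structure on delimited text.

    Hypothetical helper for illustration; not part of the elasticsearch package.
    """
    # The delimiter parameter supports a single character only.
    if len(delimiter) != 1:
        raise ValueError("delimiter must be exactly one character")
    # The documented minimum for lines_to_sample is 2.
    if lines_to_sample is not None and lines_to_sample < 2:
        raise ValueError("lines_to_sample must be at least 2")

    kwargs = {
        # Assumption: each string is sent as one line of the sample text.
        "text_files": list(lines),
        "format": "delimited",
        "delimiter": delimiter,
        "explain": explain,
    }
    # Omit unset optional parameters so the structure finder applies its defaults.
    if has_header_row is not None:
        kwargs["has_header_row"] = has_header_row
    if lines_to_sample is not None:
        kwargs["lines_to_sample"] = lines_to_sample
    return kwargs


# Made-up sample: a header row followed by two data rows.
sample = ["name;age", "alice;31", "bob;27"]
kwargs = delimited_structure_kwargs(sample, delimiter=";", has_header_row=True)

# With a configured client (needs a reachable cluster, so left commented out):
# from elasticsearch import Elasticsearch
# client = Elasticsearch("http://localhost:9200")  # assumed URL
# response = client.text_structure.find_structure(**kwargs)
```

Keeping the argument assembly separate from the call makes the constraints checkable without a running cluster.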
- test_grok_pattern(*, grok_pattern=None, text=None, ecs_compatibility=None, error_trace=None, filter_path=None, human=None, pretty=None, body=None)
Tests a Grok pattern on some text.
https://www.elastic.co/guide/en/elasticsearch/reference/8.16/test-grok-pattern.html
- Parameters:
grok_pattern (str | None) – Grok pattern to run on the text.
text (Sequence[str] | None) – Lines of text to run the Grok pattern on.
ecs_compatibility (str | None) – The mode of compatibility with ECS-compliant Grok patterns (disabled or v1, default: disabled).
error_trace (bool | None)
human (bool | None)
pretty (bool | None)
- Return type: