From b354c42ace25f12589b2284078226e8f0cde75d1 Mon Sep 17 00:00:00 2001
From: Andrew Kroh
Date: Fri, 17 Jan 2025 10:14:33 -0500
Subject: [PATCH] Fix merge conflicts

---
 .../docs/inputs/input-aws-s3.asciidoc         | 76 -------------------
 1 file changed, 76 deletions(-)

diff --git a/x-pack/filebeat/docs/inputs/input-aws-s3.asciidoc b/x-pack/filebeat/docs/inputs/input-aws-s3.asciidoc
index 8c49e0733c0a..4610c6358315 100644
--- a/x-pack/filebeat/docs/inputs/input-aws-s3.asciidoc
+++ b/x-pack/filebeat/docs/inputs/input-aws-s3.asciidoc
@@ -82,81 +82,6 @@ Please see <> for alternate AWS
     expand_event_list_from_field: Records
 ----
 
-<<<<<<< HEAD
-The `aws-s3` input supports the following configuration options plus the
-<<{beatname_lc}-input-{type}-common-options>> described later.
-
-=======
-[float]
-=== Document ID Generation
-
-This aws-s3 input feature prevents the duplication of events in Elasticsearch by
-generating a custom document `_id` for each event, rather than relying on
-Elasticsearch to automatically generate one. Each document in an Elasticsearch
-index must have a unique `_id`, and {beatname_uc} uses this property to avoid
-ingesting duplicate events.
-
-The custom `_id` is based on several pieces of information from the S3 object:
-the Last-Modified timestamp, the bucket ARN, the object key, and the byte
-offset of the data in the event.
-
-Duplicate prevention is particularly useful in scenarios where {beatname_uc}
-needs to retry an operation. {beatname_uc} guarantees at-least-once delivery,
-meaning it will retry any failed or incomplete operations. These retries may be
-triggered by issues with the host, `{beatname_uc}`, network connectivity, or
-services such as Elasticsearch, SQS, or S3.
-
-[float]
-==== Limitations of `_id`-Based Deduplication
-
-There are some limitations to consider when using `_id`-based deduplication in
-Elasticsearch:
-
-* Deduplication works only within a single index. The same `_id` can exist in
-  different indices, which is important if you're using data streams or index
-  aliases. When the backing index rolls over, a duplicate may be ingested.
-
-* Indexing operations in Elasticsearch may take longer when an `_id` is
-  specified. Elasticsearch needs to check if the ID already exists before
-  writing, which can increase the time required for indexing.
-
-[float]
-==== Disabling Duplicate Prevention
-
-If you want to disable the `_id`-based deduplication, you can remove the
-document `_id` using the <> processor in
-{beatname_uc}.
-
-["source","yaml",subs="attributes"]
------
-{beatname_lc}.inputs:
-  - type: aws-s3
-    queue_url: https://queue.amazonaws.com/80398EXAMPLE/MyQueue
-    processors:
-      - drop_fields:
-          fields:
-            - '@metadata._id'
-          ignore_missing: true
------
-
-Alternatively, you can remove the `_id` field using an Elasticsearch Ingest
-Node pipeline.
-
-["source","json",subs="attributes"]
------
-{
-  "processors": [
-    {
-      "remove": {
-        "if": "ctx.input?.type == \"aws-s3\"",
-        "field": "_id",
-        "ignore_missing": true
-      }
-    }
-  ]
-}
------
-
 [float]
 === Handling Compressed Objects
 
@@ -174,7 +99,6 @@ The `aws-s3` input supports the following configuration options plus the
 
 NOTE: For time durations, valid time units are - "ns", "us" (or "µs"), "ms", "s", "m", "h". For example, "2h"
 
->>>>>>> 7fd2d46de (x-pack/filebeat/docs/ - document gzip S3 object handling (#42306))
 [float]
 ==== `api_timeout`
 