This prepper forwards ExportTraceServiceRequests
via gRPC to other Data Prepper instances.
The primary usecase of this prepper is to ensure that groups of traces are aggregated by trace ID and processed by the same Prepper instance.
Presently peer discovery is provided by either a static list (configured in yaml) or by a DNS record lookup.
Static list example:
prepper:
- peer_forwarder:
time_out: 300
span_agg_count: 48
discovery_mode: "static"
static_endpoints:
- "data-prepper1"
- "data-prepper2"
DNS lookup example:
prepper:
- peer_forwarder:
time_out: 300
span_agg_count: 48
discovery_mode: "dns"
domain_name: "data-prepper-cluster.my-domain.net"
For DNS cluster setup, see Operating a Cluster with DNS Discovery. To setup for AWS Cloud Map, see Using AWS CloudMap.
DNS discovery is recommended when scaling out a Data Prepper cluster. The core concept is to configure a DNS provider to return a list of Data Prepper hosts when given a single domain name.
Kubernetes is the recommended approach to managing a Data Prepper cluster. In addition to handling tasks like failure detection and host replacement, it also maintains an internal DNS service for pod discovery. Headless services allow for a single service address to map to all Data Prepper pods. This DNS entry is automatically kept up-to-date as pods are created/replaced/destroyed.
See the /examples/dev/k8s directory for a working example using minikube.
A DNS server (like dnsmasq) can be configured to maintain a list of Data Prepper hosts via config files. Data Prepper hosts must be configured to use the custom DNS server as their DNS provider. The list of hosts must be manually updated whenever a new Data Prepper host is created. See the /examples/dev/dns directory for a set of sample dnsmasq configuration files.
Private hosted zones enable Amazon Route 53 to "respond to DNS queries for a domain and its subdomains within one or more VPCs that you create with the Amazon VPC service." Similar to the custom DNS server approach, except that Route 53 maintains the list of Data Prepper hosts. Suffers from the same drawback in that the list must be manually kept up-to-date.
An alternative to DNS discovery is AWS CloudMap. CloudMap provides API-based service discovery as well as DNS-based service discovery.
Peer forwarder can use the API-based service discovery. To support this you must have an existing namespace configured for API instance discovery. You can create a new one following the instructions provided by the CloudMap documentation.
Your pipeline configuration needs to include:
awsCloudMapNamespaceName
- Set to your CloudMap Namespace nameawsCloudMapServiceName
- Set to the service name within your specified NamespaceawsRegion
- The AWS region where your namespace exists.discovery_mode
- Set toaws_cloud_map
Example configuration:
prepper:
- peer_forwarder:
discovery_mode: aws_cloud_map
awsCloudMapNamespaceName: my-namespace
awsCloudMapServiceName: data-prepper-cluster
awsRegion: us-east-1
The DataPrepper must also be running with the necessary permissions. The following IAM policy shows the necessary permissions.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "CloudMapPeerForwarder",
"Effect": "Allow",
"Action": "servicediscovery:DiscoverInstances",
"Resource": "*"
}
]
}
time_out
: timeout in seconds for sendingExportTraceServiceRequest
. Defaults to 3 seconds.span_agg_count
: batch size for number of spans perExportTraceServiceRequest
. Defaults to 48.target_port
: the destination port to forward requests to. Defaults to21890
discovery_mode
: peer discovery mode to be used. Allowable values arestatic
,dns
, andaws_cloud_map
. Defaults tostatic
static_endpoints
: list containing endpoints of all Data Prepper instances.domain_name
: single domain name to query DNS against. Typically used by creating multiple DNS A Records for the same domain.awsCloudMapNamespaceName
- specifies the CloudMap namespace when using AWS CloudMap service discoveryawsCloudMapServiceName
- specifies the CloudMap service when using AWS CloudMap service discovery
The SSL configuration for setting up trust manager for peer forwarding client to connect to other Data Prepper instances. The SSL configuration should be same as the one used for OTel Trace Source.
ssl(Optional)
=> A boolean enables TLS/SSL. Default istrue
.sslKeyCertChainFile(Optional)
=> AString
represents the SSL certificate chain file path or AWS S3 path. S3 path examples3://<bucketName>/<path>
. Required ifssl
is set totrue
.useAcmCertForSSL(Optional)
=> A boolean enables TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default isfalse
.acmCertificateArn(Optional)
=> AString
represents the ACM certificate ARN. ACM certificate take preference over S3 or local file system certificate. Required ifuseAcmCertForSSL
is set totrue
.awsRegion(Optional)
=> AString
represents the AWS region to use ACM or S3. Required ifuseAcmCertForSSL
is set totrue
orsslKeyCertChainFile
andsslKeyFile
isAWS S3 path
.
Besides common metrics in AbstractPrepper, peer-forwarder introduces the following custom metrics.
latency
: measures latency of forwarded requests.
requests
: measures total number of forwarded requests.errors
: measures number of failed requests.
peerEndpoints
: measures number of dynamically discovered peer data-prepper endpoints. Forstatic
mode, the size is fixed.
This plugin is compatible with Java 14. See