From 7da7b0ec6ccc40ddbbfaca6d3ca5b979dfd4577b Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Wed, 20 Nov 2024 14:13:28 +0100 Subject: [PATCH 01/17] wip --- .../dynamic-sampling/fidelity-and-biases.mdx | 86 +++++++++---------- .../dynamic-sampling/the-big-picture.mdx | 56 ++++++------ 2 files changed, 75 insertions(+), 67 deletions(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx index 703795d0f905c8..2ddcff7caa3377 100644 --- a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx @@ -13,11 +13,9 @@ A sample rate is a number in the interval `[0.0, 1.0]` that will determine the l ## The Concept of Fidelity -At the core of Dynamic Sampling there is the concept of **fidelity**, which translates to an overall **target sample rate** that should be applied across all transactions of an organization. +At the core of Dynamic Sampling there is the concept of **fidelity**, which translates to an overall **target sample rate** that should be applied across all events of an organization. -The **determination** of the target sample rate is done dynamically by analyzing the volume of data received by Sentry in a specific time window (configurable [here](https://github.com/getsentry/sentry/blob/f3a2220ccd3a2118a1255a4c96a9ec2010dab0d8/src/sentry/options/defaults.py#L690)) and then calling the `get_sampling_tier_for_volume` function (defined [here](https://github.com/getsentry/sentry/blob/f3a2220ccd3a2118a1255a4c96a9ec2010dab0d8/src/sentry/quotas/base.py#L481)) which takes as input the volume in the time window and returns a sampling tier in the form of (`volume`, `sample_rate`). 
- -_The `get_sampling_tier_for_volume`, like the `get_blend_sample_rate` function (defined [here](https://github.com/getsentry/sentry/blob/f3a2220ccd3a2118a1255a4c96a9ec2010dab0d8/src/sentry/quotas/base.py#L466)), is a function that must be overridden by the user to customize the behavior of Dynamic Sampling._ +In automatic mode, the target sample rate is computed for each project based on the volume of events in a time window of 24 hours. In manual mode, the user can set a constant sample rate for each project that will not be automatically adjusted. Within this target sample rate, Dynamic Sampling can create a **bias toward more meaningful data**. This is achieved by constantly updating and communicating special rules to Relay, via a project configuration, which then applies targeted sampling to every event. @@ -25,7 +23,7 @@ Within this target sample rate, Dynamic Sampling can create a **bias toward more ### Approximate Fidelity -It is important to note that fidelity only determines an **approximate target sample rate**, so there is flexibility in creating exact sample rates. The ingestion pipeline, composed on [Relay](https://docs.sentry.io/product/relay/) and other components, does not have the infrastructure to track volume, so it cannot create an actual weighted distribution within the target sample rate. +It is important to note that fidelity only determines an **approximate target sample rate**, so there is flexibility in creating exact sample rates. The ingestion pipeline, composed of [Relay](https://docs.sentry.io/product/relay/) and other components, does not have the infrastructure to track volume, so it cannot create an actual weighted distribution within the target sample rate. Instead, the Sentry backend **computes a set of rules** whose goal is to cooperatively achieve the target sample rate. Determining when and how to set these rules is part of the Dynamic Sampling infrastructure. 
@@ -41,9 +39,10 @@ Sentry supports **two fundamentally different types of sampling**. While this is ### Trace Sampling -A trace is a **collection of transactions that are related to each other**. For example a trace could contain transactions started from your frontend that are then generating transactions in your backend. +A trace is a **collection of events that are related to each other**. For example a trace could contain events started from your frontend that are then generating events in your backend. -Trace sampling ensures that **either all transactions of a trace are sampled, or none**. That is, these rules **always yield the same sampling decision** for every transaction in the same trace. This requires the cooperation of SDKs and thus allows sampling only by `project`, `release`, `environment`, and `transaction` name. +TODO: have the fields usable for sampling changed? +Trace sampling ensures that **either all events of a trace are sampled, or none**. That is, these rules **always yield the same sampling decision** for every event in the same trace. This requires the cooperation of SDKs and thus allows sampling only by `project`, `release`, `environment`, and `transaction` name. To achieve trace sampling, SDKs pass all fields that can be sampled by [Dynamic Sampling Context (DSC)](/sdk/performance/dynamic-sampling-context/) (defined [here](https://getsentry.github.io/relay/relay_sampling/dsc/struct.DynamicSamplingContext.html)) as they propagate traces. _This ensures that every transaction from the same trace comes with the same DSC._ @@ -57,7 +56,13 @@ In order to achieve full trace sampling, the random number generator used by Rel ### Transaction Sampling -Transaction Sampling **does not guarantee complete traces** and instead **applies to individual transactions** by looking at the incoming transaction's body. 
It can be used to remove unwanted transactions from traces, or to individually boost transactions at the expense of incomplete contextual traces. +Transaction Sampling **does not guarantee complete traces** and instead **applies to individual transactions** by looking at the incoming transaction's body. It can be used to remove unwanted transactions from traces, or to individually boost transactions at the expense of incomplete contextual traces + +## Sample Rate Adujustment: Automatic Mode and Manual Mode +There are two modes of operation for Dynamic Sampling: Automatic Mode and Manual Mode. +Automatic mode manages the sample rate for each project based on the target sample rate for the organization. +Manual mode allows the user to set sample rates on a per-project basis. + ## Biases for Sampling @@ -71,30 +76,19 @@ An example of how the UI looks is shown in the following screenshot (the content ![Biases in the UI](./images/biasesUI.png) -### Deprioritize Health Checks -This bias is used to de-prioritize transactions that are classified as health checks. The goal is to reduce the amount of data retained for health checks, since they are not very useful for debugging. -In order to mark a transaction as a health check, we leverage a list of known health check endpoints, which is maintained by Sentry and updated regularly. +### Prioritize New Releases -```python -HEALTH_CHECK_GLOBS = [ - "*healthcheck*", - "*healthy*", - "*live*", - "*ready*", - "*heartbeat*", - "*/health", - "*/healthz", - # ... -] -``` +This bias is used to prioritize traces that are coming from a new release. The goal is to increase the sample rate in the time window that occurs between the creation of a release and its adoption by users. 
_The identification of a new release is done in the `event_manager` defined [here](https://github.com/getsentry/sentry/blob/master/src/sentry/event_manager.py#L937-L937)._ -The list of health check endpoints is available [here](https://github.com/getsentry/sentry/blob/4cb0d863de1ef8e3440153cb440eaca8025dee0d/src/sentry/dynamic_sampling/rules/biases/ignore_health_checks_bias.py#L14). +Since the adoption of a release is not constant, we created a system of _decaying_ rules which can interpolate between two sample rates in a given time window with a given function (e.g. `linear`). The idea being that we want to reduce the sample rate since the amount of samples will increase as the release gets adopted by users. -For deprioritizing health checks, we compute a new sample rate by dividing the base sample rate of the project by a factor, which is defined [here](https://github.com/getsentry/sentry/blob/master/src/sentry/dynamic_sampling/rules/utils.py#L13-L13). +![Sample Rate and Adoption](./images/sampleRateAndAdoption.png) + +The latest release bias uses a decaying rule to interpolate between a starting sample rate and an ending sample rate over a time window that is statically defined for each platform (the list of time to adoptions is define [here](https://github.com/getsentry/sentry/blob/master/src/sentry/dynamic_sampling/rules/helpers/time_to_adoptions.py#L26-L26). For example, Android has a bigger time window than Javascript because on average Android apps take more time to get adopted by users. -### Boost Dev Environments +### Prioritize Dev Environments This bias is used to prioritize traces coming from a development environment in order to increase the amount of data retained for such environments, since they are more likely to be useful for debugging. 
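The decaying rule described for the new-release bias can be pictured as a linear interpolation between a starting and an ending sample rate over the adoption window. The following is a minimal sketch under stated assumptions, not Sentry's implementation: `decayed_sample_rate`, its parameters, and the example rates and window length are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def decayed_sample_rate(
    start_rate: float,
    end_rate: float,
    window_start: datetime,
    window: timedelta,
    now: datetime,
) -> float:
    """Linearly interpolate from start_rate to end_rate across the window.

    Hypothetical helper for illustration; not taken from Sentry's code base.
    """
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    # Fraction of the window that has elapsed, clamped to [0.0, 1.0].
    progress = (now - window_start) / window
    progress = min(max(progress, 0.0), 1.0)
    return start_rate + (end_rate - start_rate) * progress


# Assumed example values: the real time windows are defined per platform.
release_created = datetime(2024, 11, 20, 12, 0, tzinfo=timezone.utc)
adoption_window = timedelta(hours=2)

halfway = decayed_sample_rate(
    start_rate=1.0,
    end_rate=0.2,
    window_start=release_created,
    window=adoption_window,
    now=release_created + timedelta(hours=1),
)
after_window = decayed_sample_rate(
    start_rate=1.0,
    end_rate=0.2,
    window_start=release_created,
    window=adoption_window,
    now=release_created + timedelta(hours=5),
)
```

Clamping the progress keeps the rate at the starting value before the window opens and at the ending value once it has closed, which is one plausible reading of "decaying" here; other interpolation functions could be swapped in for the linear one.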
@@ -115,34 +109,40 @@ The list of development environments is available [here](https://github.com/gets For prioritizing dev environments, we use a sample rate of `1.0` (100%), which results in all traces being sampled. -### Boost New Releases - -This bias is used to prioritize traces that are coming from a new release. The goal is to increase the sample rate in the time window that occurs between the creation of a release and its adoption by users. _The identification of a new release is done in the `event_manager` defined [here](https://github.com/getsentry/sentry/blob/master/src/sentry/event_manager.py#L937-L937)._ - -Since the adoption of a release is not constant, we created a system of _decaying_ rules which can interpolate between two sample rates in a given time window with a given function (e.g. `linear`). The idea being that we want to reduce the sample rate since the amount of samples will increase as the release gets adopted by users. -![Sample Rate and Adoption](./images/sampleRateAndAdoption.png) +### Prioritize Low Volume Transactions +This bias is used to prioritize low-volume transactions that can be drowned out by high-volume transactions. The goal is to rebalance sample rates of the individual transactions so that low-volume transactions are more likely to have representative samples. The bias is of type trace, which means that the transaction considered for rebalancing will be the root transaction of the trace. -The latest release bias uses a decaying rule to interpolate between a starting sample rate and an ending sample rate over a time window that is statically defined for each platform (the list of time to adoptions is define [here](https://github.com/getsentry/sentry/blob/master/src/sentry/dynamic_sampling/rules/helpers/time_to_adoptions.py#L26-L26). For example, Android has a bigger time window than Javascript because on average Android apps take more time to get adopted by users. 
+In order to rebalance transactions, the system computes the counts of the transactions for each project and runs an algorithm that, given the sample rate of the organization and the counts of each transaction, computes a new sample rate for each transaction assuming an ideal distribution of the counts. -### Boost Low Volume Transactions + -This bias is used to prioritize low-volume transactions that can be drowned out by high-volume transactions. The goal is to rebalance sample rates of the individual transactions so that low-volume transactions are more likely to have representative samples. The bias is of type trace, which means that the transaction considered for rebalancing will be the root transaction of the trace. +The algorithms for boosting low volume events are run periodically (with cron jobs) with a sliding window to account for changes in the incoming volume. -In order to rebalance transactions, the system computes the counts of the transactions for each project and runs an algorithm that, given the sample rate of the organization and the counts of each transaction, computes a new sample rate for each transaction assuming an ideal distribution of the counts. + -### Boost Low Volume Projects +### Deprioritize Health Checks -This bias is the simplest one that can be defined. It applies to any incoming trace and is defined on a per-project basis. +This bias is used to de-prioritize transactions that are classified as health checks. The goal is to reduce the amount of data retained for health checks, since they are not very useful for debugging. -_The sample rate of the boost low volume projects bias is computed using an algorithm that leverages a dynamic sample rate obtained by measuring the incoming volume of transactions in a sliding time window, known as the target fidelity rate. 
This rate is obtained by calling, at fixed intervals, the `get_sampling_tier_for_volume` function (defined [here](https://github.com/getsentry/sentry/blob/f3a2220ccd3a2118a1255a4c96a9ec2010dab0d8/src/sentry/quotas/base.py#L481)), which given the volume in a time window, will determine the appropriate target fidelity rate for the entire organization._ +In order to mark a transaction as a health check, we leverage a list of known health check endpoints, which is maintained by Sentry and updated regularly. -The algorithm used in this bias, computes a new sample rate with the goal of prioritizing low-volume projects, which can be drowned out by high-volume projects. The mechanism used for prioritizing is similar to the low-volume transactions bias in which given the sample rate of the organization and the counts of each project, it computes a new sample rate for each project, assuming an ideal distribution of the counts. +```python +HEALTH_CHECK_GLOBS = [ + "*healthcheck*", + "*healthy*", + "*live*", + "*ready*", + "*heartbeat*", + "*/health", + "*/healthz", + # ... +] +``` - +The list of health check endpoints is available [here](https://github.com/getsentry/sentry/blob/4cb0d863de1ef8e3440153cb440eaca8025dee0d/src/sentry/dynamic_sampling/rules/biases/ignore_health_checks_bias.py#L14). -The algorithms for boosting low volume transactions and projects are run periodically (with cron jobs) with a sliding window to account for changes in the incoming volume. +For deprioritizing health checks, we compute a new sample rate by dividing the base sample rate of the project by a factor, which is defined [here](https://github.com/getsentry/sentry/blob/master/src/sentry/dynamic_sampling/rules/utils.py#L13-L13). - If you want to learn more about the architecture behind Dynamic Sampling, continue to the [next page](/dynamic-sampling/architecture/). 
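The health check bias described above combines two ideas: matching a transaction name against a list of globs, and dividing the project's base sample rate by a factor. A rough sketch of both is shown here. It makes several assumptions: Python's `fnmatch` stands in for whatever glob matching is actually performed, the glob list is only the excerpt quoted in the documentation, and `ASSUMED_HEALTH_CHECK_FACTOR`, `is_health_check`, and `health_check_sample_rate` are hypothetical names with a made-up factor value.

```python
from fnmatch import fnmatchcase

# Excerpt of the globs quoted in the documentation; the real list is longer.
HEALTH_CHECK_GLOBS = [
    "*healthcheck*",
    "*healthy*",
    "*live*",
    "*ready*",
    "*heartbeat*",
    "*/health",
    "*/healthz",
]

# Assumed value for illustration only; the real factor lives in Sentry's rule utilities.
ASSUMED_HEALTH_CHECK_FACTOR = 5.0


def is_health_check(transaction_name: str) -> bool:
    """Return True if the name matches any of the health check globs."""
    return any(fnmatchcase(transaction_name, glob) for glob in HEALTH_CHECK_GLOBS)


def health_check_sample_rate(
    base_sample_rate: float, factor: float = ASSUMED_HEALTH_CHECK_FACTOR
) -> float:
    """Divide the project's base sample rate by the de-prioritization factor."""
    return base_sample_rate / factor


matches_healthz = is_health_check("GET /api/healthz")
matches_checkout = is_health_check("POST /api/checkout")
reduced_rate = health_check_sample_rate(0.5)
```

Broad globs such as `*live*` also match unrelated names (for example a path containing `deliveries`), so a list like this trades precision for coverage.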
diff --git a/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx b/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx index 6ddfaca1a38fd7..aa79bc774f1949 100644 --- a/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx @@ -6,29 +6,37 @@ sidebar_order: 1 ![Sequencing](./images/sequencing.png) + + + +Dynamic Sampling currently operates on spans or transactions, based on the feature flag `dynamic-sampling-spans`. The logic between the two event types is similar, so most of this documentation is kept at a generic level and important differences are pointed out using these info-bubbles. + + + + ## Sequencing -Dynamic Sampling occurs at the edge of our ingestion pipeline, precisely in [Relay](https://github.com/getsentry/relay). +Dynamic Sampling occurs at the edge of our ingestion pipeline, precisely in [Relay](https://github.com/getsentry/relay). If the feature flag `dynamic-sampling-spans` is activated, the logic applies to spans, otherwise it applies to transactions. As we go on, everything will be moving to spans and the transactions model will be phased out. -When transaction events arrive, in a simplified model, they go through the following steps (some of which won't apply if you self-host Sentry): +When events arrive, in a simplified model, they go through the following steps (some of which won't apply if you self-host Sentry): -1. **Inbound data filters**: every transaction runs through inbound data filters as configured in project settings, such as legacy browsers or denied releases. Transactions dropped here do not count for quota and are not included in “total transactions” data. -2. **Quota enforcement**: Sentry charges for all further transactions sent in, before events are passed on to dynamic sampling. -3. 
**Metrics extraction**: after passing quotas, Sentry extracts metrics from the total incoming transactions. These metrics provide granular numbers for the performance and frequency of every application transaction. -4. **Dynamic Sampling**: based on an internal set of rules, Relay determines a sample rate for every incoming transaction event. A random number generator finally decides whether this payload should be kept or dropped. -5. **Rate limiting**: transactions that are sampled by Dynamic Sampling will be stored and indexed. To protect the infrastructure, internal rate limits apply at this point. Under normal operation, this **rate limit is never reached** since dynamic sampling already reduces the volume of stored events. +1. **Inbound data filters**: every event runs through inbound data filters as configured in project settings, such as legacy browsers or denied releases. Events dropped here are not counted towards quota and are not included in "total events" data. +2. **Quota enforcement**: Sentry charges for all further events sent in, before they are passed on to dynamic sampling. +3. **Metrics extraction**: after passing quotas, Sentry extracts metrics from the total incoming events. These metrics provide granular numbers for the performance and frequency of every application event. +4. **Dynamic Sampling**: based on an internal set of rules, Relay determines a sample rate for every incoming event. A random number generator finally decides whether this payload should be kept or dropped. +5. **Rate limiting**: events that are sampled by Dynamic Sampling will be stored and indexed. To protect the infrastructure, internal rate limits apply at this point. Under normal operation, this **rate limit is never reached** since dynamic sampling already reduces the volume of stored events. -A client is sending 1000 transactions per second to Sentry: -1. 100 transactions per second are from old browsers and get dropped through an inbound data filter. -2. 
The remaining 900 transactions per second show up as total transactions in Sentry. -3. Their current overall sample rate is at 20%, which statistically samples 180 transactions per second. -4. Since this is above the 100/s limit, about 80 transactions per second are randomly dropped, and the rest is stored. +A client is sending 1000 events per second to Sentry: +1. 100 events per second are from old browsers and get dropped through an inbound data filter. +2. The remaining 900 events per second show up as total events in Sentry. +3. Their current overall sample rate is at 20%, which statistically samples 180 events per second. +4. Since this is above the 100/s limit, about 80 events per second are randomly dropped, and the rest is stored. -## Rate Limiting and Total Transactions +## Rate Limiting and Total Events The ingestion pipeline has two kinds of rate limits that behave differently compared to organizations without dynamic sampling: @@ -37,35 +45,35 @@ The ingestion pipeline has two kinds of rate limits that behave differently com -There is a dedicated rate limit for stored transactions after inbound filters and dynamic sampling. However, it does not affect total transactions since the fidelity decreases with higher total transaction volumes and this rate limit is not expected to trigger since Dynamic Sampling already reduces the stored transaction throughput. +There is a dedicated rate limit for stored events after inbound filters and dynamic sampling. However, it does not affect total events since the fidelity decreases with higher total event volumes and this rate limit is not expected to trigger since Dynamic Sampling already reduces the stored event throughput. ## Rate Limiting and Trace Completeness -Dynamic sampling ensures complete traces by retaining all transactions associated with a trace if the head transaction is preserved. +Dynamic sampling ensures complete traces by retaining all events associated with a trace if the head event is preserved. 
-Despite dynamic sampling providing trace completeness, transactions or other items (errors, replays, ...) may still be missing from a trace when rate limiting drops one or more transactions. Rate limiting drops items without regard for the trace, making each decision independently and potentially resulting in broken traces. +Despite dynamic sampling providing trace completeness, events or other items (errors, replays, ...) may still be missing from a trace when rate limiting drops one or more events. Rate limiting drops items without regard for the trace, making each decision independently and potentially resulting in broken traces. -For example, if there is a trace from `Project A` to `Project B` and `Project B` is subject to rate limiting or quota enforcement, transactions of `Project B` from the trace initiated by `Project A` are lost. +For example, if there is a trace from `Project A` to `Project B` and `Project B` is subject to rate limiting or quota enforcement, events of `Project B` from the trace initiated by `Project A` are lost. ## Client Side Sampling and Dynamic Sampling -Clients have their own [traces sample rate](https://docs.sentry.io/platforms/javascript/performance/#configure-the-sample-rate). The client sample rate is a number in the range `[0.0, 1.0]` (from 0% to 100%) that controls **how many transactions arrive at Sentry**. While documentation will generally suggest a sample rate of `1.0`, for some use cases it might be better to reduce it. +Clients have their own [traces sample rate](https://docs.sentry.io/platforms/javascript/tracing/#configure). The client sample rate is a number in the range `[0.0, 1.0]` (from 0% to 100%) that controls **how many events arrive at Sentry**. While documentation will generally suggest a sample rate of `1.0`, for some use cases it might be better to reduce it. -Dynamic Sampling further reduces how many transactions get stored internally. 
**While many-to-most graphs and numbers in Sentry are based on total transactions**, accessing spans and tags requires stored transactions. The sample rates apply on top of each other. +Dynamic Sampling further reduces how many events get stored internally. **While many-to-most graphs and numbers in Sentry are based on total events**, accessing spans and tags requires stored events. The sample rates apply on top of each other. -An example of client side sampling and Dynamic Sampling starting from 100k transactions which results in 15k stored transactions is shown below: +An example of client side sampling and Dynamic Sampling starting from 100k events which results in 15k stored events is shown below: ![Client and Dynamic Sampling](./images/clientAndDynamicSampling.png) ## Total Transactions -To collect unsampled information for “total” transactions in Performance, Alerts, and Dashboards, Relay extracts [metrics](https://getsentry.github.io/relay/relay_metrics/index.html) from transactions. In short, these metrics comprise: +To collect unsampled information for “total” transactions in Performance, Alerts, and Dashboards, Relay extracts [metrics](https://getsentry.github.io/relay/relay_metrics/index.html) from spans and transactions. In short, these metrics comprise: - Counts and durations for all transactions. - A distribution (histogram) for all measurements, most notably the web vitals. @@ -73,13 +81,13 @@ To collect unsampled information for “total” transactions in Performance, Al Each of these metrics can be filtered and grouped by a number of predefined tags, [implemented in Relay](https://github.com/getsentry/relay/blob/master/relay-server/src/metrics_extraction/transactions/types.rs#L142-L157). -For more granular queries, **stored transaction events are needed**. _The purpose of dynamic sampling here is to ensure that enough representatives are always available._ +For more granular queries, **stored events are needed**. 
_The purpose of dynamic sampling here is to ensure that enough representatives are always available._ -If Sentry applies a 1% dynamic sample rate, you can still receive accurate TPM (transactions per minute) and web vital quantiles through total transaction data backed by metrics. There is also a listing of each of these numbers by the transaction. +If Sentry applies a 1% dynamic sample rate, you can still receive accurate events per minute (SPM or TPM, depending on event type) and web vital quantiles through total event data backed by metrics. There is also a listing of each of these numbers by the transaction. -When you go into transaction summary or Discover, you might want to now split the data by a custom tag you’ve added to your transactions. This granularity is not offered by metrics, so **these queries need to use stored transactions**. +When you go into the trace explorer or Discover, you might want to now split the data by a custom tag you’ve added to your events. This granularity is not offered by metrics, so **these queries need to use stored events**. From db1290b3902ffef68326b443cf39672fdc0c8432 Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Wed, 20 Nov 2024 14:16:22 +0100 Subject: [PATCH 02/17] typo --- .../dynamic-sampling/fidelity-and-biases.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx index 2ddcff7caa3377..15bc575a950f3a 100644 --- a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx @@ -58,7 +58,7 @@ In order to achieve full trace sampling, the random number generator used by Rel Transaction Sampling **does not guarantee complete traces** and instead **applies to individual transactions** by looking at the incoming transaction's body. 
It can be used to remove unwanted transactions from traces, or to individually boost transactions at the expense of incomplete contextual traces -## Sample Rate Adujustment: Automatic Mode and Manual Mode +## Sample Rate Adjustment: Automatic Mode and Manual Mode There are two modes of operation for Dynamic Sampling: Automatic Mode and Manual Mode. Automatic mode manages the sample rate for each project based on the target sample rate for the organization. Manual mode allows the user to set sample rates on a per-project basis. From 29c674ec72b054e77ba9acc293e8199849ff903f Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Wed, 20 Nov 2024 14:29:26 +0100 Subject: [PATCH 03/17] wip --- .../dynamic-sampling/the-big-picture.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx b/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx index aa79bc774f1949..9eb5cff0cf706d 100644 --- a/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx @@ -9,7 +9,7 @@ sidebar_order: 1 -Dynamic Sampling currently operates on spans or transactions, based on the feature flag `dynamic-sampling-spans`. The logic between the two event types is similar, so most of this documentation is kept at a generic level and important differences are pointed out using these info-bubbles. +Dynamic Sampling currently operates on spans or transactions, based on the feature flag `dynamic-sampling-spans`. The logic between the two event types is similar, so most of this documentation is kept at a generic level and important differences are pointed explicitly. 
From 1a1f39bf02e2bd71015beb8b4a5702421447a319 Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Fri, 22 Nov 2024 09:25:51 +0100 Subject: [PATCH 04/17] proof read big picture --- .../dynamic-sampling/the-big-picture.mdx | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx b/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx index 9eb5cff0cf706d..e7a40782f3cdbb 100644 --- a/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx @@ -7,24 +7,24 @@ sidebar_order: 1 ![Sequencing](./images/sequencing.png) - + -Dynamic Sampling currently operates on spans or transactions, based on the feature flag `dynamic-sampling-spans`. The logic between the two event types is similar, so most of this documentation is kept at a generic level and important differences are pointed explicitly. +Dynamic Sampling currently operates on either spans or transactions, based on the feature flag `dynamic-sampling-spans`. The logic between the two event types is similar, so most of this documentation is kept at a generic level and important differences are pointed out explicitly. ## Sequencing -Dynamic Sampling occurs at the edge of our ingestion pipeline, precisely in [Relay](https://github.com/getsentry/relay). If the feature flag `dynamic-sampling-spans` is activated, the logic applies to spans, otherwise it applies to transactions. As we go on, everything will be moving to spans and the transactions model will be phased out. +Dynamic Sampling occurs at the edge of our ingestion pipeline, precisely in [Relay](https://github.com/getsentry/relay). When events arrive, in a simplified model, they go through the following steps (some of which won't apply if you self-host Sentry): 1. 
**Inbound data filters**: every event runs through inbound data filters as configured in project settings, such as legacy browsers or denied releases. Events dropped here are not counted towards quota and are not included in "total events" data. 2. **Quota enforcement**: Sentry charges for all further events sent in, before they are passed on to dynamic sampling. -3. **Metrics extraction**: after passing quotas, Sentry extracts metrics from the total incoming events. These metrics provide granular numbers for the performance and frequency of every application event. -4. **Dynamic Sampling**: based on an internal set of rules, Relay determines a sample rate for every incoming event. A random number generator finally decides whether this payload should be kept or dropped. -5. **Rate limiting**: events that are sampled by Dynamic Sampling will be stored and indexed. To protect the infrastructure, internal rate limits apply at this point. Under normal operation, this **rate limit is never reached** since dynamic sampling already reduces the volume of stored events. +3. **Metrics extraction**: after passing quotas, Sentry extracts metrics from the total incoming events. These metrics provide granular numbers for the performance and frequency of every event. +4. **Dynamic Sampling**: based on an internal set of rules, Relay determines a sample rate for every incoming event. A random number generator finally decides whether a payload should be kept or dropped. +5. **Rate limiting**: events that are sampled by Dynamic Sampling will be stored and indexed. To protect the infrastructure, internal rate limits apply at this point. Under normal operation, this **rate limit is never reached** since dynamic sampling already reduces the volume of events stored. @@ -53,7 +53,7 @@ There is a dedicated rate limit for stored events after inbound filters and dyna Dynamic sampling ensures complete traces by retaining all events associated with a trace if the head event is preserved. 
-Despite dynamic sampling providing trace completeness, events or other items (errors, replays, ...) may still be missing from a trace when rate limiting drops one or more events. Rate limiting drops items without regard for the trace, making each decision independently and potentially resulting in broken traces. +Despite dynamic sampling providing trace completeness, events or other items (errors, replays, ...) may still be missing from a trace when rate limiting drops one or more of them. Rate limiting drops items without regard for the trace, making each decision independently and potentially resulting in broken traces. @@ -65,7 +65,7 @@ For example, if there is a trace from `Project A` to `Project B` and `Project B` Clients have their own [traces sample rate](https://docs.sentry.io/platforms/javascript/tracing/#configure). The client sample rate is a number in the range `[0.0, 1.0]` (from 0% to 100%) that controls **how many events arrive at Sentry**. While documentation will generally suggest a sample rate of `1.0`, for some use cases it might be better to reduce it. -Dynamic Sampling further reduces how many events get stored internally. **While many-to-most graphs and numbers in Sentry are based on total events**, accessing spans and tags requires stored events. The sample rates apply on top of each other. +Dynamic Sampling further reduces how many events get stored internally. **While most graphs and numbers in Sentry are based on total events**, accessing spans and tags requires stored events. The sample rates apply on top of each other. 
An example of client side sampling and Dynamic Sampling starting from 100k events which results in 15k stored events is shown below: @@ -75,13 +75,13 @@ An example of client side sampling and Dynamic Sampling starting from 100k event To collect unsampled information for “total” transactions in Performance, Alerts, and Dashboards, Relay extracts [metrics](https://getsentry.github.io/relay/relay_metrics/index.html) from spans and transactions. In short, these metrics comprise: -- Counts and durations for all transactions. +- Counts and durations for all events. - A distribution (histogram) for all measurements, most notably the web vitals. - The number of unique users (set). Each of these metrics can be filtered and grouped by a number of predefined tags, [implemented in Relay](https://github.com/getsentry/relay/blob/master/relay-server/src/metrics_extraction/transactions/types.rs#L142-L157). -For more granular queries, **stored events are needed**. _The purpose of dynamic sampling here is to ensure that enough representatives are always available._ +For more granular queries, **stored events are needed**. 
_The purpose of dynamic sampling here is to ensure that there are always sufficient representative sample events._ From d7ec3526e988bd272c3c91cdbdc7959505127a6a Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Fri, 22 Nov 2024 09:36:44 +0100 Subject: [PATCH 05/17] proof read fidelity and biases --- .../dynamic-sampling/fidelity-and-biases.mdx | 22 ++++++++----------- 1 file changed, 9 insertions(+), 13 deletions(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx index 15bc575a950f3a..70f9805efc8383 100644 --- a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx @@ -3,7 +3,7 @@ title: Fidelity and Biases sidebar_order: 2 --- -Dynamic Sampling is a feature that allows Sentry to automatically adjust the amount of data retained based on the value of the data. This is technically achieved by applying a **sample rate** to every event, which is determined by a **set of rules** that are evaluated for each event. +Dynamic Sampling allows Sentry to automatically adjust the amount of data retained based on how valuable the data is to the user. This is technically achieved by applying a **sample rate** to every event, which is determined by a **set of rules** that are evaluated for each event. @@ -15,7 +15,10 @@ A sample rate is a number in the interval `[0.0, 1.0]` that will determine the l At the core of Dynamic Sampling there is the concept of **fidelity**, which translates to an overall **target sample rate** that should be applied across all events of an organization. -In automatic mode, the target sample rate is computed for each project based on the volume of events in a time window of 24 hours. In manual mode, the user can set a constant sample rate for each project that will not be automatically adjusted. 
+### Target Sample Rate Adjustment: Automatic Mode and Manual Mode +There are two available modes to govern the target sample rates for Dynamic Sampling: Automatic Mode and Manual Mode. +- **Automatic mode** dynamically manages the target sample rate for each project based on the target sample rate for the organization, prioritizing lower volume projects to increase visibility. +- **Manual mode** allows the user to set static target sample rates on a per-project basis that serve as the baseline sample rate before applying the dynamic biases outlined below. Target sample rates are not adjusted by the system. Within this target sample rate, Dynamic Sampling can create a **bias toward more meaningful data**. This is achieved by constantly updating and communicating special rules to Relay, via a project configuration, which then applies targeted sampling to every event. @@ -39,12 +42,11 @@ Sentry supports **two fundamentally different types of sampling**. While this is ### Trace Sampling -A trace is a **collection of events that are related to each other**. For example a trace could contain events started from your frontend that are then generating events in your backend. +A trace is a **collection of events that are related to each other**. For example, a trace could contain events started from your frontend that are then generating events in your backend. -TODO: have the fields usable for sampling changed? Trace sampling ensures that **either all events of a trace are sampled, or none**. That is, these rules **always yield the same sampling decision** for every event in the same trace. This requires the cooperation of SDKs and thus allows sampling only by `project`, `release`, `environment`, and `transaction` name. 
-To achieve trace sampling, SDKs pass all fields that can be sampled by [Dynamic Sampling Context (DSC)](/sdk/performance/dynamic-sampling-context/) (defined [here](https://getsentry.github.io/relay/relay_sampling/dsc/struct.DynamicSamplingContext.html)) as they propagate traces. _This ensures that every transaction from the same trace comes with the same DSC._ +To achieve trace sampling, SDKs pass all fields that can be sampled by [Dynamic Sampling Context (DSC)](/sdk/performance/dynamic-sampling-context/) (defined [here](https://getsentry.github.io/relay/relay_sampling/dsc/struct.DynamicSamplingContext.html)) as they propagate traces. _This ensures that every event from the same trace comes with the same DSC._ ![Trace Sampling](./images/traceSampling.png) @@ -56,17 +58,11 @@ In order to achieve full trace sampling, the random number generator used by Rel ### Transaction Sampling -Transaction Sampling **does not guarantee complete traces** and instead **applies to individual transactions** by looking at the incoming transaction's body. It can be used to remove unwanted transactions from traces, or to individually boost transactions at the expense of incomplete contextual traces - -## Sample Rate Adjustment: Automatic Mode and Manual Mode -There are two modes of operation for Dynamic Sampling: Automatic Mode and Manual Mode. -Automatic mode manages the sample rate for each project based on the target sample rate for the organization. -Manual mode allows the user to set sample rates on a per-project basis. - +Transaction Sampling **does not guarantee complete traces** and instead **applies to individual transactions** by looking at the incoming transaction's body. It can be used to remove unwanted transactions from traces, or to individually boost transactions at the expense of incomplete contextual traces. ## Biases for Sampling -A bias is a set of one or more rules that are evaluated for each event. 
More specifically, when we define a bias, we want to achieve a specific objective, which **can be expressed as a set of rules**. To learn more about rules, check out the architecture page [here](/dynamic-sampling/architecture/). +A bias is a set of one or more rules that are evaluated for each event. More specifically, when we define a bias, we want to achieve a specific objective, which **can be expressed as a set of rules**. You can learn more about rules on the architecture page [here](/dynamic-sampling/architecture/). Sentry has already defined a set of biases that are available to all customers. These biases have different goals, but they can be combined to express more complex semantics. From 2706eeff1a08bbe527732dfd2bc5323b659bf38f Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Mon, 25 Nov 2024 09:56:57 +0100 Subject: [PATCH 06/17] work in review commeents --- .../dynamic-sampling/fidelity-and-biases.mdx | 19 +++++++++++++++---- .../dynamic-sampling/the-big-picture.mdx | 4 ++-- 2 files changed, 17 insertions(+), 6 deletions(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx index 70f9805efc8383..8134bba62a1da8 100644 --- a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx @@ -15,12 +15,19 @@ A sample rate is a number in the interval `[0.0, 1.0]` that will determine the l At the core of Dynamic Sampling there is the concept of **fidelity**, which translates to an overall **target sample rate** that should be applied across all events of an organization. -### Target Sample Rate Adjustment: Automatic Mode and Manual Mode +### Dynamic Sampling Modes There are two available modes to govern the target sample rates for Dynamic Sampling: Automatic Mode and Manual Mode. 
- **Automatic mode** dynamically manages the target sample rate for each project based on the target sample rate for the organization, prioritizing lower volume projects to increase visibility. - **Manual mode** allows the user to set static target sample rates on a per-project basis that serve as the baseline sample rate before applying the dynamic biases outlined below. Target sample rates are not adjusted by the system. -Within this target sample rate, Dynamic Sampling can create a **bias toward more meaningful data**. This is achieved by constantly updating and communicating special rules to Relay, via a project configuration, which then applies targeted sampling to every event. +Internally, Automatic Mode is called Organization Mode, while Manual Mode is called Project Mode. The settings around the mode and the sample rates are implemented using organization and project options. The [DynamicSamplingMode object](https://github.com/getsentry/sentry/blob/9b98be6b97323a78809a829e06dcbef26a16365c/src/sentry/dynamic_sampling/types.py#L7-L12) defines the available modes and their string representations to be set in the options. The dynamic sampling mode is set using the organization option `sentry:sampling_mode`. + +If `sentry:sampling_mode` == `organization`, the **organization** option `sentry:target_sample_rate` defines the organization target sample rate. +If `sentry:sampling_mode` == `project`, the **project** option `sentry:target_sample_rate` defines the project target sample rate for each project. + +On switching between modes, the current target sample rates are preserved unless changed by the user explicitly. For example, if the user switches from Automatic Mode to Manual Mode, the current target sample rate for the organization is preserved by setting the project options `project:target_sample_rate` to the project target sample rates calculated during automatic mode. 
Conversely, if the user switches from Manual Mode to Automatic Mode, the project target sample rates are recalculated based on the overall organization target sample rate. + +The [target sample rates are periodically recalibrated](https://github.com/getsentry/sentry/blob/9b98be6b97323a78809a829e06dcbef26a16365c/src/sentry/dynamic_sampling/rules/biases/recalibration_bias.py#L11-L44) to ensure that the overall target sample rate is met. This recalibration is done on a project level or organization level, depending on the dynamic sampling mode. Within the target sample rate, Dynamic Sampling **biases towards more meaningful data**. This is achieved by constantly updating and communicating special rules to Relay, via a project configuration, which then applies targeted sampling to every event. ![Concept of Fidelity](./images/fidelityAndPriorities.png) @@ -82,7 +89,7 @@ Since the adoption of a release is not constant, we created a system of _decayin ![Sample Rate and Adoption](./images/sampleRateAndAdoption.png) -The latest release bias uses a decaying rule to interpolate between a starting sample rate and an ending sample rate over a time window that is statically defined for each platform (the list of time to adoptions is define [here](https://github.com/getsentry/sentry/blob/master/src/sentry/dynamic_sampling/rules/helpers/time_to_adoptions.py#L26-L26). For example, Android has a bigger time window than Javascript because on average Android apps take more time to get adopted by users. +The latest release bias uses a decaying rule to interpolate between a starting sample rate and an ending sample rate over a time window that is statically defined for each platform (the list of time to adoptions is defined [here](https://github.com/getsentry/sentry/blob/9b98be6b97323a78809a829e06dcbef26a16365c/src/sentry/dynamic_sampling/rules/helpers/time_to_adoptions.py#L25)). 
For example, Android has a bigger time window than Javascript because on average Android apps take more time to get adopted by users. ### Prioritize Dev Environments @@ -109,7 +116,11 @@ For prioritizing dev environments, we use a sample rate of `1.0` (100%), which r ### Prioritize Low Volume Transactions This bias is used to prioritize low-volume transactions that can be drowned out by high-volume transactions. The goal is to rebalance sample rates of the individual transactions so that low-volume transactions are more likely to have representative samples. The bias is of type trace, which means that the transaction considered for rebalancing will be the root transaction of the trace. -In order to rebalance transactions, the system computes the counts of the transactions for each project and runs an algorithm that, given the sample rate of the organization and the counts of each transaction, computes a new sample rate for each transaction assuming an ideal distribution of the counts. +Prioritization of low volume projects works slightly differently depending on the dynamic sampling mode: +- In **Automatic Mode** (`sentry:sampling_mode` == `organization`), the organization target sample rate is used as the base sample rate for the balancing algorithm. +- In **Manual Mode** (`sentry:sampling_mode` == `project`), the project target sample rate is used as the base sample rate for the balancing algorithm. + +In order to rebalance transactions, the system retrieves the counts of the transactions for each project and calculates a new sample rate for each transaction. 
diff --git a/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx b/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx index e7a40782f3cdbb..d0c0fc5fad1a09 100644 --- a/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx @@ -18,7 +18,7 @@ Dynamic Sampling currently operates on either spans or transactions, based on th Dynamic Sampling occurs at the edge of our ingestion pipeline, precisely in [Relay](https://github.com/getsentry/relay). -When events arrive, in a simplified model, they go through the following steps (some of which won't apply if you self-host Sentry): +When events arrive, in a simplified model, they go through the following steps: 1. **Inbound data filters**: every event runs through inbound data filters as configured in project settings, such as legacy browsers or denied releases. Events dropped here are not counted towards quota and are not included in "total events" data. 2. **Quota enforcement**: Sentry charges for all further events sent in, before they are passed on to dynamic sampling. @@ -65,7 +65,7 @@ For example, if there is a trace from `Project A` to `Project B` and `Project B` Clients have their own [traces sample rate](https://docs.sentry.io/platforms/javascript/tracing/#configure). The client sample rate is a number in the range `[0.0, 1.0]` (from 0% to 100%) that controls **how many events arrive at Sentry**. While documentation will generally suggest a sample rate of `1.0`, for some use cases it might be better to reduce it. -Dynamic Sampling further reduces how many events get stored internally. **While most graphs and numbers in Sentry are based on total events**, accessing spans and tags requires stored events. The sample rates apply on top of each other. +Dynamic Sampling further reduces how many events get stored internally. 
**While most graphs and numbers in Sentry are based on metrics**, accessing spans and tags requires stored events. The sample rates apply on top of each other. An example of client side sampling and Dynamic Sampling starting from 100k events which results in 15k stored events is shown below: From 3355008839cda789e443f7ccb0a93bd48255d543 Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Mon, 25 Nov 2024 11:32:03 +0100 Subject: [PATCH 07/17] editing --- .../dynamic-sampling/fidelity-and-biases.mdx | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx index 8134bba62a1da8..0ace935a026fe6 100644 --- a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx @@ -16,18 +16,18 @@ A sample rate is a number in the interval `[0.0, 1.0]` that will determine the l At the core of Dynamic Sampling there is the concept of **fidelity**, which translates to an overall **target sample rate** that should be applied across all events of an organization. ### Dynamic Sampling Modes -There are two available modes to govern the target sample rates for Dynamic Sampling: Automatic Mode and Manual Mode. +There are two available modes to govern the target sample rates for Dynamic Sampling. The settings around the mode and the sample rates are implemented using organization and project options. 
The [DynamicSamplingMode object](https://github.com/getsentry/sentry/blob/9b98be6b97323a78809a829e06dcbef26a16365c/src/sentry/dynamic_sampling/types.py#L7-L12) defines the available modes and their string representations to be set in the options: + - **Automatic mode** dynamically manages the target sample rate for each project based on the target sample rate for the organization, prioritizing lower volume projects to increase visibility. + - Automatic mode is called Organization mode internally. If activated, i.e. `sentry:sampling_mode` == `organization`, the **organization** option `sentry:target_sample_rate` defines the organization target sample rate. - **Manual mode** allows the user to set static target sample rates on a per-project basis that serve as the baseline sample rate before applying the dynamic biases outlined below. Target sample rates are not adjusted by the system. + - Manual mode is called Project mode internally. If activated, i.e. `sentry:sampling_mode` == `project`, the **project** option `sentry:target_sample_rate` defines the project target sample rate for each project. -Internally, Automatic Mode is called Organization Mode, while Manual Mode is called Project Mode. The settings around the mode and the sample rates are implemented using organization and project options. The [DynamicSamplingMode object](https://github.com/getsentry/sentry/blob/9b98be6b97323a78809a829e06dcbef26a16365c/src/sentry/dynamic_sampling/types.py#L7-L12) defines the available modes and their string representations to be set in the options. The dynamic sampling mode is set using the organization option `sentry:sampling_mode`. - -If `sentry:sampling_mode` == `organization`, the **organization** option `sentry:target_sample_rate` defines the organization target sample rate. -If `sentry:sampling_mode` == `project`, the **project** option `sentry:target_sample_rate` defines the project target sample rate for each project. 
+The dynamic sampling mode is set using the organization option `sentry:sampling_mode`, and all functionality defaults to Automatic Mode if the option is not set. On switching between modes, the current target sample rates are preserved unless changed by the user explicitly. For example, if the user switches from Automatic Mode to Manual Mode, the current target sample rate for the organization is preserved by setting the project options `project:target_sample_rate` to the project target sample rates calculated during automatic mode. Conversely, if the user switches from Manual Mode to Automatic Mode, the project target sample rates are recalculated based on the overall organization target sample rate. -The [target sample rates are periodically recalibrated](https://github.com/getsentry/sentry/blob/9b98be6b97323a78809a829e06dcbef26a16365c/src/sentry/dynamic_sampling/rules/biases/recalibration_bias.py#L11-L44) to ensure that the overall target sample rate is met. This recalibration is done on a project level or organization level, depending on the dynamic sampling mode. Within the target sample rate, Dynamic Sampling **biases towards more meaningful data**. This is achieved by constantly updating and communicating special rules to Relay, via a project configuration, which then applies targeted sampling to every event. +The [sample rates are periodically recalibrated](https://github.com/getsentry/sentry/blob/9b98be6b97323a78809a829e06dcbef26a16365c/src/sentry/dynamic_sampling/rules/biases/recalibration_bias.py#L11-L44) to ensure that the overall target sample rate is met. This recalibration is done on a project level or organization level, depending on the dynamic sampling mode. Within the target sample rate, Dynamic Sampling **biases towards more meaningful data**. This is achieved by constantly updating and communicating special rules to Relay, via a project configuration, which then applies targeted sampling to every event. 
![Concept of Fidelity](./images/fidelityAndPriorities.png) From f3e3ace727000209c46b0d385d9b1de4d4ae9a93 Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Mon, 25 Nov 2024 12:51:04 +0100 Subject: [PATCH 08/17] editing --- .../dynamic-sampling/fidelity-and-biases.mdx | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx index 0ace935a026fe6..2c1821364d9d23 100644 --- a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx @@ -16,14 +16,10 @@ A sample rate is a number in the interval `[0.0, 1.0]` that will determine the l At the core of Dynamic Sampling there is the concept of **fidelity**, which translates to an overall **target sample rate** that should be applied across all events of an organization. ### Dynamic Sampling Modes -There are two available modes to govern the target sample rates for Dynamic Sampling. The settings around the mode and the sample rates are implemented using organization and project options. The [DynamicSamplingMode object](https://github.com/getsentry/sentry/blob/9b98be6b97323a78809a829e06dcbef26a16365c/src/sentry/dynamic_sampling/types.py#L7-L12) defines the available modes and their string representations to be set in the options: +There are two available modes to govern the target sample rates for Dynamic Sampling. The settings around the mode and the sample rates are implemented using organization and project options. The dynamic sampling mode is set using the organization option `sentry:sampling_mode`: -- **Automatic mode** dynamically manages the target sample rate for each project based on the target sample rate for the organization, prioritizing lower volume projects to increase visibility. 
- - Automatic mode is called Organization mode internally. If activated, i.e. `sentry:sampling_mode` == `organization`, the **organization** option `sentry:target_sample_rate` defines the organization target sample rate. -- **Manual mode** allows the user to set static target sample rates on a per-project basis that serve as the baseline sample rate before applying the dynamic biases outlined below. Target sample rates are not adjusted by the system. - - Manual mode is called Project mode internally. If activated, i.e. `sentry:sampling_mode` == `project`, the **project** option `sentry:target_sample_rate` defines the project target sample rate for each project. - -The dynamic sampling mode is set using the organization option `sentry:sampling_mode`, and all functionality defaults to Automatic Mode if the option is not set. +- **Automatic mode** dynamically manages the target sample rate for each project based on the target sample rate for the organization, prioritizing lower volume projects to increase visibility. Internally, it is active if the organization option `sentry:sampling_mode` is set to `organization`. The target sample rate for the organization is stored in the **organization** option `sentry:target_sample_rate`. +- **Manual mode** allows the user to set static target sample rates on a per-project basis that serve as the baseline sample rate before applying the dynamic biases outlined below. Target sample rates are not adjusted by the system. Internally, it is active if the organization option `sentry:sampling_mode` is set to `project`. The target sample rates for projects are stored in the **project** option `sentry:target_sample_rate`. All functionality defaults to Automatic Mode if the option is not set. On switching between modes, the current target sample rates are preserved unless changed by the user explicitly. 
For example, if the user switches from Automatic Mode to Manual Mode, the current target sample rate for the organization is preserved by setting the project options `project:target_sample_rate` to the project target sample rates calculated during automatic mode. Conversely, if the user switches from Manual Mode to Automatic Mode, the project target sample rates are recalculated based on the overall organization target sample rate. From f44a7071381dcc99b087be7521fe8d04820643fd Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Mon, 25 Nov 2024 14:08:59 +0100 Subject: [PATCH 09/17] editing --- .../dynamic-sampling/fidelity-and-biases.mdx | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx index 2c1821364d9d23..463362e7c9f8ae 100644 --- a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx @@ -16,12 +16,13 @@ A sample rate is a number in the interval `[0.0, 1.0]` that will determine the l At the core of Dynamic Sampling there is the concept of **fidelity**, which translates to an overall **target sample rate** that should be applied across all events of an organization. ### Dynamic Sampling Modes -There are two available modes to govern the target sample rates for Dynamic Sampling. The settings around the mode and the sample rates are implemented using organization and project options. The dynamic sampling mode is set using the organization option `sentry:sampling_mode`: +There are two available modes to govern the target sample rates for Dynamic Sampling. The settings around the mode and the sample rates are implemented using organization and project options. 
 - **Automatic mode** dynamically manages the target sample rate for each project based on the target sample rate for the organization, prioritizing lower volume projects to increase visibility. Internally, it is active if the organization option `sentry:sampling_mode` is set to `organization`. The target sample rate for the organization is stored in the **organization** option `sentry:target_sample_rate`. -- **Manual mode** allows the user to set static target sample rates on a per-project basis that serve as the baseline sample rate before applying the dynamic biases outlined below. Target sample rates are not adjusted by the system. Internally, it is active if the organization option `sentry:sampling_mode` is set to `project`. The target sample rates for projects are stored in the **project** option `sentry:target_sample_rate`. All functionality defaults to Automatic Mode if the option is not set. +- **Manual mode** allows the user to set static target sample rates on a per-project basis that serve as the baseline sample rate before applying the dynamic biases outlined below. Target sample rates are not adjusted by the system. Internally, it is active if the organization option `sentry:sampling_mode` is set to `project`. The target sample rates for projects are stored in the **project** option `sentry:target_sample_rate`. +All functionality defaults to Automatic Mode if the option is not set. -On switching between modes, the current target sample rates are preserved unless changed by the user explicitly. For example, if the user switches from Automatic Mode to Manual Mode, the current target sample rate for the organization is preserved by setting the project options `project:target_sample_rate` to the project target sample rates calculated during automatic mode. Conversely, if the user switches from Manual Mode to Automatic Mode, the project target sample rates are recalculated based on the overall organization target sample rate. 
+When the user switches between modes, sample rates are preserved unless changed explicitly. For example, if the user switches from Automatic Mode to Manual Mode, the current target sample rate for the organization is preserved by setting the project options `project:target_sample_rate` to the project target sample rates calculated during automatic mode. Conversely, if the user switches from Manual Mode to Automatic Mode, the project target sample rates are recalculated based on the overall organization target sample rate. The [sample rates are periodically recalibrated](https://github.com/getsentry/sentry/blob/9b98be6b97323a78809a829e06dcbef26a16365c/src/sentry/dynamic_sampling/rules/biases/recalibration_bias.py#L11-L44) to ensure that the overall target sample rate is met. This recalibration is done on a project level or organization level, depending on the dynamic sampling mode. Within the target sample rate, Dynamic Sampling **biases towards more meaningful data**. This is achieved by constantly updating and communicating special rules to Relay, via a project configuration, which then applies targeted sampling to every event. 
From dc5bacfb873bc790a31b40c570d5106270bb4a88 Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Mon, 25 Nov 2024 14:51:23 +0100 Subject: [PATCH 10/17] editing --- .../dynamic-sampling/fidelity-and-biases.mdx | 1 + 1 file changed, 1 insertion(+) diff --git a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx index 463362e7c9f8ae..013550f9d95562 100644 --- a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx @@ -20,6 +20,7 @@ There are two available modes to govern the target sample rates for Dynamic Samp - **Automatic mode** dynamically manages the target sample rate for each project based on the target sample rate for the organization, prioritizing lower volume projects to increase visibility. Internally, it is active if the organization option `sentry:sampling_mode` is set to `organization`. The target sample rate for the organization is stored in the **organization** option `sentry:target_sample_rate`. - **Manual mode** allows the user to set static target sample rates on a per-project basis that serve as the baseline sample rate before applying the dynamic biases outlined below. Target sample rates are not adjusted by the system. Internally, it is active if the organization option `sentry:sampling_mode` is set to `project`. The target sample rates for projects are stored in the **project** option `sentry:target_sample_rate`. + All functionality defaults to Automatic Mode if the option is not set. When the user switches between modes, sample rates are preserved unless changed explicitly. 
For example, if the user switches from Automatic Mode to Manual Mode, the current target sample rate for the organization is preserved by setting the project options `project:target_sample_rate` to the project target sample rates calculated during automatic mode. Conversely, if the user switches from Manual Mode to Automatic Mode, the project target sample rates are recalculated based on the overall organization target sample rate. From e442ac353dc3bf906992837fd8cc02dc245849a5 Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Mon, 25 Nov 2024 15:20:35 +0100 Subject: [PATCH 11/17] typo --- .../dynamic-sampling/fidelity-and-biases.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx index 013550f9d95562..0caf65c3c2e6ab 100644 --- a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx @@ -114,7 +114,7 @@ For prioritizing dev environments, we use a sample rate of `1.0` (100%), which r ### Prioritize Low Volume Transactions This bias is used to prioritize low-volume transactions that can be drowned out by high-volume transactions. The goal is to rebalance sample rates of the individual transactions so that low-volume transactions are more likely to have representative samples. The bias is of type trace, which means that the transaction considered for rebalancing will be the root transaction of the trace. -Prioritization of low volume projects works slightly differently depending on the dynamic sampling mode: +Prioritization of low volume transactions works slightly differently depending on the dynamic sampling mode: - In **Automatic Mode** (`sentry:sampling_mode` == `organization`), the organization target sample rate is used as the base sample rate for the balancing algorithm. 
- In **Manual Mode** (`sentry:sampling_mode` == `project`), the project target sample rate is used as the base sample rate for the balancing algorithm. From dd30b39532175d0ceef1c8fbb2ee47e372a27243 Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Mon, 25 Nov 2024 16:33:00 +0100 Subject: [PATCH 12/17] editing --- .../dynamic-sampling/fidelity-and-biases.mdx | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx index 0caf65c3c2e6ab..08cb22bc8a69ac 100644 --- a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx @@ -13,12 +13,12 @@ A sample rate is a number in the interval `[0.0, 1.0]` that will determine the l ## The Concept of Fidelity -At the core of Dynamic Sampling there is the concept of **fidelity**, which translates to an overall **target sample rate** that should be applied across all events of an organization. +At the core of Dynamic Sampling there is the concept of **fidelity**, which translates to an overall **target sample rate** that should be applied across all spans and transactions of an organization. ### Dynamic Sampling Modes -There are two available modes to govern the target sample rates for Dynamic Sampling. The settings around the mode and the sample rates are implemented using organization and project options. +There are two available modes to govern the target sample rates for Dynamic Sampling. The definition of both the mode and the target sample rates are implemented using the organization options `sentry:sampling_mode` and `sentry:target_sample_rate` as well as the project options `sentry:target_sample_rate`. 
-- **Automatic mode** dynamically manages the target sample rate for each project based on the target sample rate for the organization, prioritizing lower volume projects to increase visibility. Internally, it is active if the organization option `sentry:sampling_mode` is set to `organization`. The target sample rate for the organization is stored in the **organization** option `sentry:target_sample_rate`. +- **Automatic mode** dynamically manages the target sample rate for each project based on the target sample rate for the organization, prioritizing lower volume projects to increase visibility. Internally, it is active if the organization option `sentry:sampling_mode` is set to `organization`. The target sample rate for the organization is stored in the **organization** option `sentry:target_sample_rate`, and project target sample rates are calculated based on the organization target sample rate. - **Manual mode** allows the user to set static target sample rates on a per-project basis that serve as the baseline sample rate before applying the dynamic biases outlined below. Target sample rates are not adjusted by the system. Interally, it is active if the organization option `sentry:sampling_mode` is set to `project`. The target sample rates for projects are stored in the **project** option `sentry:target_sample_rate`. All functionality defaults to Automatic Mode if the option is not set. @@ -81,7 +81,7 @@ An example of how the UI looks is shown in the following screenshot (the content ### Prioritize New Releases -This bias is used to prioritize traces that are coming from a new release. The goal is to increase the sample rate in the time window that occurs between the creation of a release and its adoption by users. _The identification of a new release is done in the `event_manager` defined [here](https://github.com/getsentry/sentry/blob/master/src/sentry/event_manager.py#L937-L937)._ +This bias is used to prioritize traces that are coming from a new release. 
The goal is to increase the sample rate in the time window that occurs between the creation of a release and its adoption by users. _The identification of a new release is done in the `event_manager` defined [here](https://github.com/getsentry/sentry/blob/43d7c41aee2b22ca9f51916afac40f6cbdcd2b15/src/sentry/event_manager.py#L739-L773)._ Since the adoption of a release is not constant, we created a system of _decaying_ rules which can interpolate between two sample rates in a given time window with a given function (e.g. `linear`). The idea being that we want to reduce the sample rate since the amount of samples will increase as the release gets adopted by users. From 1e33a5071d8ddf3434d38dd887d142c7c22f89cb Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Mon, 25 Nov 2024 16:47:18 +0100 Subject: [PATCH 13/17] editing --- .../dynamic-sampling/fidelity-and-biases.mdx | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx index 08cb22bc8a69ac..c7f55c93a8fe4c 100644 --- a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx @@ -16,14 +16,14 @@ A sample rate is a number in the interval `[0.0, 1.0]` that will determine the l At the core of Dynamic Sampling there is the concept of **fidelity**, which translates to an overall **target sample rate** that should be applied across all spans and transactions of an organization. ### Dynamic Sampling Modes -There are two available modes to govern the target sample rates for Dynamic Sampling. The definition of both the mode and the target sample rates are implemented using the organization options `sentry:sampling_mode` and `sentry:target_sample_rate` as well as the project options `sentry:target_sample_rate`. 
+There are two available modes to govern the target sample rates for Dynamic Sampling. The definition of both the mode and the target sample rates are implemented using the organization options `sentry:sampling_mode` and `sentry:target_sample_rate` as well as the project option `sentry:target_sample_rate`. -- **Automatic mode** dynamically manages the target sample rate for each project based on the target sample rate for the organization, prioritizing lower volume projects to increase visibility. Internally, it is active if the organization option `sentry:sampling_mode` is set to `organization`. The target sample rate for the organization is stored in the **organization** option `sentry:target_sample_rate`, and project target sample rates are calculated based on the organization target sample rate. -- **Manual mode** allows the user to set static target sample rates on a per-project basis that serve as the baseline sample rate before applying the dynamic biases outlined below. Target sample rates are not adjusted by the system. Interally, it is active if the organization option `sentry:sampling_mode` is set to `project`. The target sample rates for projects are stored in the **project** option `sentry:target_sample_rate`. +- **Automatic mode** dynamically manages the target sample rate for each project based on the target sample rate for the organization, prioritizing lower volume projects to increase visibility. Automatic mode is active if the organization option `sentry:sampling_mode` is set to `organization`. The target sample rate for the organization is stored in the **organization** option `sentry:target_sample_rate`, and project target sample rates are calculated based on the organization target sample rate. +- **Manual mode** allows the user to set static target sample rates on a per-project basis that serve as the baseline sample rate before applying the dynamic biases outlined below. Target sample rates are not adjusted by the system. 
Manual mode is active if the organization option `sentry:sampling_mode` is set to `project`. The target sample rates for projects are stored in the **project** option `sentry:target_sample_rate`. -All functionality defaults to Automatic Mode if the option is not set. +All functionality defaults to Automatic Mode if the option `sentry:sampling_mode` is not set, and all target sample rates default to 1 if the option `sentry:target_sample_rate` is not set. -When the user switches between modes, sample rates are preserved unless changed explicitly. For example, if the user switches from Automatic Mode to Manual Mode, the current target sample rate for the organization is preserved by setting the project options `project:target_sample_rate` to the project target sample rates calculated during automatic mode. Conversely, if the user switches from Manual Mode to Automatic Mode, the project target sample rates are recalculated based on the overall organization target sample rate. +When the user switches between modes, target sample rates are preserved unless changed explicitly. For example, if the user switches from Automatic Mode to Manual Mode, the current target sample rate for the organization is preserved by setting the project options `project:target_sample_rate` to the project target sample rates calculated during automatic mode. Conversely, if the user switches from Manual Mode to Automatic Mode, the project target sample rates are recalculated based on the overall organization target sample rate. The [sample rates are periodically recalibrated](https://github.com/getsentry/sentry/blob/9b98be6b97323a78809a829e06dcbef26a16365c/src/sentry/dynamic_sampling/rules/biases/recalibration_bias.py#L11-L44) to ensure that the overall target sample rate is met. This recalibration is done on a project level or organization level, depending on the dynamic sampling mode. Within the target sample rate, Dynamic Sampling **biases towards more meaningful data**. 
This is achieved by constantly updating and communicating special rules to Relay, via a project configuration, which then applies targeted sampling to every event. From 503f12bf09411cf02ba90e562c89d20d0818e4a7 Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Tue, 26 Nov 2024 15:31:02 +0100 Subject: [PATCH 14/17] callout big picture --- .../dynamic-sampling/the-big-picture.mdx | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx b/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx index d0c0fc5fad1a09..00ba3b12035460 100644 --- a/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/the-big-picture.mdx @@ -9,7 +9,8 @@ sidebar_order: 1 -Dynamic Sampling currently operates on either spans or transactions, based on the feature flag `dynamic-sampling-spans`. The logic between the two event types is similar, so most of this documentation is kept at a generic level and important differences are pointed out explicitly. +Dynamic Sampling currently operates on either spans or transactions to measure data throughput. This is controlled by the feature flag `organizations:dynamic-sampling-spans` and usually set to what the organization's subscription is metered by. In development, this currently defaults to transactions. +The logic between the two data categories is identical, so most of this documentation is kept at a generic level and important differences are pointed out explicitly. 
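As a reading aid for the option handling described in the patches above, here is a minimal sketch of how the mode and the base target sample rate could be resolved. It assumes plain dictionaries stand in for Sentry's organization and project option stores. The helper names `resolve_sampling_mode` and `base_target_sample_rate` are hypothetical and are not part of Sentry's codebase. Only the option keys, the mode values, and the defaults come from the documentation text.

```python
from typing import Any, Mapping

# Option keys and values as named in the documentation text.
SAMPLING_MODE_KEY = "sentry:sampling_mode"
TARGET_SAMPLE_RATE_KEY = "sentry:target_sample_rate"
AUTOMATIC_MODE = "organization"
MANUAL_MODE = "project"

# Defaults as stated in the docs: Automatic Mode and a target sample rate of 1.
DEFAULT_MODE = AUTOMATIC_MODE
DEFAULT_TARGET_SAMPLE_RATE = 1.0


def resolve_sampling_mode(org_options: Mapping[str, Any]) -> str:
    """Return the active mode, falling back to Automatic Mode if unset."""
    return org_options.get(SAMPLING_MODE_KEY, DEFAULT_MODE)


def base_target_sample_rate(
    org_options: Mapping[str, Any],
    project_options: Mapping[str, Any],
) -> float:
    """Pick the option that holds the target sample rate for the active mode.

    Hypothetical helper: in Automatic Mode the real system derives
    per-project rates from the organization rate, which this sketch
    does not attempt to model. It only returns the organization rate.
    """
    if resolve_sampling_mode(org_options) == MANUAL_MODE:
        return float(
            project_options.get(TARGET_SAMPLE_RATE_KEY, DEFAULT_TARGET_SAMPLE_RATE)
        )
    return float(org_options.get(TARGET_SAMPLE_RATE_KEY, DEFAULT_TARGET_SAMPLE_RATE))
```

With empty option dictionaries, the sketch resolves to Automatic Mode with a rate of `1.0`, matching the documented defaults.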
From 73d6f8a57aaca36d949f7847b0fb9dbc7873bdde Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Tue, 26 Nov 2024 15:54:44 +0100 Subject: [PATCH 15/17] editing --- .../dynamic-sampling/fidelity-and-biases.mdx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx index c7f55c93a8fe4c..473df2e0cb2c37 100644 --- a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx @@ -18,12 +18,12 @@ At the core of Dynamic Sampling there is the concept of **fidelity**, which tran ### Dynamic Sampling Modes There are two available modes to govern the target sample rates for Dynamic Sampling. The definition of both the mode and the target sample rates are implemented using the organization options `sentry:sampling_mode` and `sentry:target_sample_rate` as well as the project option `sentry:target_sample_rate`. -- **Automatic mode** dynamically manages the target sample rate for each project based on the target sample rate for the organization, prioritizing lower volume projects to increase visibility. Automatic mode is active if the organization option `sentry:sampling_mode` is set to `organization`. The target sample rate for the organization is stored in the **organization** option `sentry:target_sample_rate`, and project target sample rates are calculated based on the organization target sample rate. -- **Manual mode** allows the user to set static target sample rates on a per-project basis that serve as the baseline sample rate before applying the dynamic biases outlined below. Target sample rates are not adjusted by the system. Manual mode is active if the organization option `sentry:sampling_mode` is set to `project`. 
The target sample rates for projects are stored in the **project** option `sentry:target_sample_rate`. +- **Automatic Mode** dynamically manages the target sample rate for each project based on the target sample rate for the organization, prioritizing lower volume projects to increase visibility. Automatic Mode is active if the organization option `sentry:sampling_mode` is set to `organization`. The target sample rate for the organization is stored in the **organization** option `sentry:target_sample_rate`, and project target sample rates are calculated based on the organization target sample rate. +- **Manual Mode** allows the user to set static target sample rates on a per-project basis that serve as the baseline sample rate before applying the dynamic biases outlined below. Target sample rates are not adjusted by the system. Manual Mode is active if the organization option `sentry:sampling_mode` is set to `project`. The target sample rates for projects are stored in the **project** option `sentry:target_sample_rate`. All functionality defaults to Automatic Mode if the option `sentry:sampling_mode` is not set, and all target sample rates default to 1 if the option `sentry:target_sample_rate` is not set. -When the user switches between modes, target sample rates are preserved unless changed explicitly. For example, if the user switches from Automatic Mode to Manual Mode, the current target sample rate for the organization is preserved by setting the project options `project:target_sample_rate` to the project target sample rates calculated during automatic mode. Conversely, if the user switches from Manual Mode to Automatic Mode, the project target sample rates are recalculated based on the overall organization target sample rate. +When the user switches between modes, target sample rates are transferred unless changed explicitly. 
For example, if the user switches from Automatic Mode to Manual Mode, the sample rates calculated during Automatic Mode are persisted in the project option `sentry:target_sample_rate`. Conversely, if the user switches from Manual Mode to Automatic Mode, the project target sample rates are recalculated based on the overall organization target sample rate. The [sample rates are periodically recalibrated](https://github.com/getsentry/sentry/blob/9b98be6b97323a78809a829e06dcbef26a16365c/src/sentry/dynamic_sampling/rules/biases/recalibration_bias.py#L11-L44) to ensure that the overall target sample rate is met. This recalibration is done on a project level or organization level, depending on the dynamic sampling mode. Within the target sample rate, Dynamic Sampling **biases towards more meaningful data**. This is achieved by constantly updating and communicating special rules to Relay, via a project configuration, which then applies targeted sampling to every event. From 1f3e0c846d9bfea4a3b787fb266d3c0496ca3b6b Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Wed, 27 Nov 2024 10:47:04 +0100 Subject: [PATCH 16/17] add low volume projects bias & callout for am2 sliding window --- .../dynamic-sampling/fidelity-and-biases.mdx | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx index 473df2e0cb2c37..562f1fd493b65b 100644 --- a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx @@ -29,6 +29,10 @@ The [sample rates are periodically recalibrated](https://github.com/getsentry/se ![Concept of Fidelity](./images/fidelityAndPriorities.png) + +For orgs under AM2, Dynamic Sampling uses a [sliding window 
function](https://github.com/getsentry/sentry/blob/cc8cc38c8a108719d068e5622b24a8d0c744e84c/src/sentry/dynamic_sampling/tasks/sliding_window_org.py#L37-L61) over the incoming volume to calculate the target sample rate. + + ### Approximate Fidelity It is important to note that fidelity only determines an **approximate target sample rate**, so there is flexibility in creating exact sample rates. The ingestion pipeline, composed of [Relay](https://docs.sentry.io/product/relay/) and other components, does not have the infrastructure to track volume, so it cannot create an actual weighted distribution within the target sample rate. @@ -110,12 +114,20 @@ The list of development environments is available [here](https://github.com/gets For prioritizing dev environments, we use a sample rate of `1.0` (100%), which results in all traces being sampled. +### Prioritize Low Volume Projects + +This bias is only active in Automatic Mode (and not in Manual Mode). It applies to any incoming trace and is defined on a per-project basis. + + +Some projects have more data, some have less - this bias ensures that, in Automatic Mode, projects with lower volume are sampled with a higher sample rate. The sample rate of the boost low volume projects bias is computed using an algorithm that leverages a dynamic sample rate obtained by measuring the incoming volume of transactions in a sliding time window, known as the target fidelity rate. This rate is obtained by calling, at fixed intervals, the `get_sampling_tier_for_volume` function (defined [here](https://github.com/getsentry/sentry/blob/f3a2220ccd3a2118a1255a4c96a9ec2010dab0d8/src/sentry/quotas/base.py#L481)). + +The algorithm used in this bias computes a new sample rate with the goal of prioritizing low-volume projects, which can be drowned out by high-volume projects. 
The mechanism used for prioritizing is similar to the low-volume transactions bias in which given the sample rate of the organization and the counts of each project, it computes a new sample rate for each project, assuming an ideal distribution of the counts. ### Prioritize Low Volume Transactions This bias is used to prioritize low-volume transactions that can be drowned out by high-volume transactions. The goal is to rebalance sample rates of the individual transactions so that low-volume transactions are more likely to have representative samples. The bias is of type trace, which means that the transaction considered for rebalancing will be the root transaction of the trace. Prioritization of low volume transactions works slightly differently depending on the dynamic sampling mode: -- In **Automatic Mode** (`sentry:sampling_mode` == `organization`), the organization target sample rate is used as the base sample rate for the balancing algorithm. +- In **Automatic Mode** (`sentry:sampling_mode` == `organization`), the output of the [boost_low_volume_projects](https://github.com/getsentry/sentry/blob/dee539472e999bf590cfc4e99b9b12981963defb/src/sentry/dynamic_sampling/tasks/boost_low_volume_transactions.py#L183) task is used as the base sample rate for the balancing algorithm. - In **Manual Mode** (`sentry:sampling_mode` == `project`), the project target sample rate is used as the base sample rate for the balancing algorithm. In order to rebalance transactions, the system retrieves the counts of the transactions for each project and calculates a new sample rate for each transaction. 
From b082a47bf04fce85bb10bd464964632c0822cf50 Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Wed, 27 Nov 2024 10:48:34 +0100 Subject: [PATCH 17/17] edit prioritize low volume projects --- .../dynamic-sampling/fidelity-and-biases.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx index 562f1fd493b65b..f2d450eebda67a 100644 --- a/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/fidelity-and-biases.mdx @@ -119,9 +119,9 @@ For prioritizing dev environments, we use a sample rate of `1.0` (100%), which r This bias is only active in Automatic Mode (and not in Manual Mode). It applies to any incoming trace and is defined on a per-project basis. -Some projects have more data, some have less - this bias ensures that, in Automatic Mode, projects with lower volume are sampled with a higher sample rate. The sample rate of the boost low volume projects bias is computed using an algorithm that leverages a dynamic sample rate obtained by measuring the incoming volume of transactions in a sliding time window, known as the target fidelity rate. This rate is obtained by calling, at fixed intervals, the `get_sampling_tier_for_volume` function (defined [here](https://github.com/getsentry/sentry/blob/f3a2220ccd3a2118a1255a4c96a9ec2010dab0d8/src/sentry/quotas/base.py#L481)). +The algorithm used in this bias computes a new sample rate with the goal of prioritizing low-volume projects, which can be drowned out by high-volume projects. The mechanism used for prioritizing is similar to that of the low-volume transactions bias: given the sample rate of the organization and the counts of each project, it computes a new sample rate for each project, assuming an ideal distribution of the counts. 
The sample rate of the boost low volume projects bias is computed using an algorithm that leverages a dynamic sample rate obtained by measuring the incoming volume of transactions in a sliding time window, known as the target fidelity rate. This rate is obtained by calling, at fixed intervals, the `get_sampling_tier_for_volume` function (defined [here](https://github.com/getsentry/sentry/blob/f3a2220ccd3a2118a1255a4c96a9ec2010dab0d8/src/sentry/quotas/base.py#L481)). + -The algorithm used in this bias computes a new sample rate with the goal of prioritizing low-volume projects, which can be drowned out by high-volume projects. The mechanism used for prioritizing is similar to the low-volume transactions bias in which given the sample rate of the organization and the counts of each project, it computes a new sample rate for each project, assuming an ideal distribution of the counts. ### Prioritize Low Volume Transactions This bias is used to prioritize low-volume transactions that can be drowned out by high-volume transactions. The goal is to rebalance sample rates of the individual transactions so that low-volume transactions are more likely to have representative samples. The bias is of type trace, which means that the transaction considered for rebalancing will be the root transaction of the trace.
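To make the idea of rebalancing "given the sample rate of the organization and the counts of each project" more concrete, the sketch below shows one possible way to do it. This is an assumption for illustration only and not Sentry's actual algorithm, which the text above does not spell out. The function name `rebalance_sample_rates` is hypothetical. The sketch treats an "ideal distribution" as an even split of the total sampling budget across projects, and passes any budget that a small project cannot use on to the larger ones.

```python
def rebalance_sample_rates(counts: dict[str, int], base_rate: float) -> dict[str, float]:
    """Illustrative rebalancing, NOT the algorithm used by Sentry.

    counts: number of events seen per project in the measurement window.
    base_rate: overall target sample rate in [0.0, 1.0].
    Returns a per-project sample rate that keeps the overall number of
    sampled events close to base_rate * total while favoring small projects.
    """
    if not 0.0 <= base_rate <= 1.0:
        raise ValueError("base_rate must be within [0.0, 1.0]")

    # Total number of events we are allowed to keep across all projects.
    budget = base_rate * sum(counts.values())
    rates: dict[str, float] = {}

    # Visit projects from smallest to largest so that budget a small
    # project cannot use is redistributed among the remaining ones.
    ordered = sorted(counts.items(), key=lambda item: item[1])
    for index, (project, count) in enumerate(ordered):
        projects_left = len(ordered) - index
        fair_share = budget / projects_left
        if count == 0:
            # No traffic to sample; keep everything if events show up.
            rates[project] = 1.0
            continue
        kept = min(float(count), fair_share)
        rates[project] = kept / count
        budget -= kept
    return rates
```

Under these assumptions, two projects with 100 and 900 events and a base rate of `0.1` end up with rates of `0.5` and roughly `0.056`. About 100 events are still kept overall, but the low-volume project is no longer drowned out.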