From 14bd54ed43c34a928d0e9222d5c6d77e0f771ec5 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Fri, 23 Jul 2021 14:25:53 -0700
Subject: [PATCH 01/42] Specify how to propagate head sampling probability

---
 text/trace/0000-sampling-propagation.md | 126 ++++++++++++++++++++++++
 1 file changed, 126 insertions(+)
 create mode 100644 text/trace/0000-sampling-propagation.md
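
Reviewer note (illustrative only): a minimal Go sketch, assuming the
five-field `traceparent` layout proposed in this patch, of how a
receiver could read `log-count` and derive the adjusted count as
2^log-count.  The helpers `parseLogCount` and `adjustedCount` are
hypothetical names chosen for this sketch; they are not OpenTelemetry
or W3C APIs.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseLogCount is a hypothetical helper. It splits the proposed
// five-field traceparent and returns the log-count in the last field.
func parseLogCount(traceparent string) (uint8, error) {
	fields := strings.Split(traceparent, "-")
	if len(fields) != 5 {
		return 0, fmt.Errorf("expected 5 fields, got %d", len(fields))
	}
	// log-count is base16 and must fit in one byte (0..255).
	v, err := strconv.ParseUint(fields[4], 16, 8)
	if err != nil {
		return 0, fmt.Errorf("invalid log-count %q: %w", fields[4], err)
	}
	return uint8(v), nil
}

// adjustedCount returns 2^logCount, i.e. the inverse of the head
// sampling probability. A float64 is used because 2^255 overflows
// every Go integer type.
func adjustedCount(logCount uint8) float64 {
	return math.Ldexp(1, int(logCount))
}

func main() {
	// First example value from this patch (log-count = 05).
	lc, err := parseLogCount("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01-05")
	if err != nil {
		panic(err)
	}
	fmt.Println(lc, adjustedCount(lc)) // prints "5 32"
}
```

The sketch checks only the field count and the range of `log-count`;
a real propagator would also need to validate the other fields.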

diff --git a/text/trace/0000-sampling-propagation.md b/text/trace/0000-sampling-propagation.md
new file mode 100644
index 000000000..1dbdd8e63
--- /dev/null
+++ b/text/trace/0000-sampling-propagation.md
@@ -0,0 +1,126 @@
+# Propagate head trace sampling probability
+
+Propose extending the W3C trace context `traceparent` to convey head trace sampling probability.
+
+## Motivation
+
+The head trace probability is useful in child contexts to be able to
+record the effective sampling probability in child spans.  This is
+documented in [OTEP 148](TODO: after merging), which establishes
+semantic conventions for conveying the adjusted count of a span via
+attributes recorded with the span.  When a sampling decision is based
+on the parent's context, the effective sampling probability, which
+determines the child's adjusted count, cannot be recorded without
+propagating it through the context.
+
+We propose to propagate the trace sampling probability that is in
+effect whenever the [W3C
+sampled](https://www.w3.org/TR/trace-context/#sampled-flag) flag is
+set by extending the `traceparent`.
+
+## Explanation
+
+To limit the cost of this extension, to ensure that it is widely
+supported, and for statistical reasons documented below, we propose to
+limit head tracing probability to powers of two.  This limits the
+available head sampling probabilities to 1/2, 1/4, 1/8, and so on, and
+we can compactly encode these probabilities as small integers using
+the negative base-2 logarithm of the effective probability.
+
+For example, the value 2 corresponds with 1-in-4 sampling, the value
+10 corresponds with 1-in-1024 sampling.
+
+Whereas the [version-0 W3C trace context `traceparent`
+header](https://www.w3.org/TR/trace-context/#examples-of-http-traceparent-headers)
+is a concatenation of four fields, this proposal would upgrade
+`traceparent` to version 1:
+
+```
+traceparent: (version)-(trace_id)-(span_id)-(flags)
+```
+
+The version 1 `traceparent` header will use a new field named `log-count`, i.e.,:
+
+```
+traceparent: (version)-(trace_id)-(span_id)-(flags)-(log-count)
+```
+
+where `log-count` is the encoded negative base-2 logarithm of
+sampling probability, which is the base-2 logarithm of the adjusted
+count of a child span created in this context (i.e., the logarithm of
+the effective count, thus "log-count").  To compute the adjusted count
+of a child span created in this context, use `2^log-count`.  A
+log-count of `0` corresponds with `(2^0)=1`, thus 0 conveys a context
+with probability 1.
+
+The sampling probability of a context is independent from whether it
+is sampled.  We consider it [useful to convey sampling probability
+even when unsampled]() as it can be used to estimate the potential
+overhead of starting new sampled traces.
+
+### Examples
+
+These are extended [from the W3C
+examples](https://www.w3.org/TR/trace-context/#examples-of-http-traceparent-headers):
+
+```
+Value = 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01-05
+base16(version) = 00
+base16(trace-id) = 4bf92f3577b34da6a3ce929d0e0e4736
+base16(parent-id) = 00f067aa0ba902b7
+base16(trace-flags) = 01  // sampled
+base16(log-count) = 05  // head probability is 2^-5.
+```
+
+```
+Value = 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-00-00
+base16(version) = 00
+base16(trace-id) = 4bf92f3577b34da6a3ce929d0e0e4736
+base16(parent-id) = 00f067aa0ba902b7
+base16(trace-flags) = 00  // not sampled
+base16(log-count) = 10 // head probability is 2^-16
+```
+
+We are able to express sampling probabilities as small as 2^-255 using
+just 3 bytes per `traceparent`.
+
+## Internal details
+
+A use known as "inflationary sampling" from Google's Dapper system is
+documented in [OTEP 148](TODO: inflationary sampling section).  This
+is used to justify propagating the head sampling probability even
+when unsampled.
+
+[An algorithm for making statistical inferance from partially-sampled
+traces has been published](https://arxiv.org/pdf/2107.07703.pdf) that
+explains how to work with power-of-2 sampling rates.  The reasoning
+behind restricting the set of sampling rates is:
+
+- Lowers the cost of propagating head sampling probability
+- Makes math involving partial traces tractable
+
+## Trade-offs and mitigations
+
+Restricting head sampling rates to powers of two does not limit tail
+Samplers from using arbitrary probabilities.
+
+Restricting head sampling rates to powers of two does not limit
+Samplers from using arbitrary effective probabilities over a period of
+time.  For example, choosing 1/2 sampling half of the time and 1/4
+sampling half of the time leads to an effective sampling rate of 3/8.
+
+## Prior art and alternatives
+
+Google's Dapper system propagated a field in its trace context called
+"inverse_probability", which is equivalent to adjusted count.  This
+proposal uses the base-2 logarithm of adjusted count to save space
+
+## Open questions
+
+This OTEP suggests how to modify the W3C trace context to accommodate
+sampling in OpenTelemetry.  [OTEP 148](TODO) suggests semantic
+conventions for encoding adjusted count in a Span, but neither text
+specifies how to modify the built-in Samplers to produce the proposed
+new `traceparent` field so that the `ParentBased` Sampler can
+correctly set the proposed `sampler.adjusted_count` attribute.  This
+will be future work.

From 1d5d60a30b98088be53ab31f2a9115e4298f8e65 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Fri, 23 Jul 2021 14:29:14 -0700
Subject: [PATCH 02/42] edit

---
 text/trace/0000-sampling-propagation.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/text/trace/0000-sampling-propagation.md b/text/trace/0000-sampling-propagation.md
index 1dbdd8e63..e649ae22f 100644
--- a/text/trace/0000-sampling-propagation.md
+++ b/text/trace/0000-sampling-propagation.md
@@ -14,9 +14,9 @@ determines the child's adjusted count, cannot be recorded without
 propagating it through the context.
 
 We propose to propagate the trace sampling probability that is in
-effect whenever the [W3C
-sampled](https://www.w3.org/TR/trace-context/#sampled-flag) flag is
-set by extending the `traceparent`.
+effect alongside the [W3C
+sampled](https://www.w3.org/TR/trace-context/#sampled-flag) flag by
+extending the `traceparent`.
 
 ## Explanation
 
@@ -32,14 +32,14 @@ For example, the value 2 corresponds with 1-in-4 sampling, the value
 
 Whereas the [version-0 W3C trace context `traceparent`
 header](https://www.w3.org/TR/trace-context/#examples-of-http-traceparent-headers)
-is a concatenation of four fields, this proposal would upgrade
-`traceparent` to version 1:
+is a concatenation of four fields,
 
 ```
 traceparent: (version)-(trace_id)-(span_id)-(flags)
 ```
 
-The version 1 `traceparent` header will use a new field named `log-count`, i.e.,:
+This proposal would upgrade `traceparent` to version 1 with a new
+field named `log-count`,
 
 ```
 traceparent: (version)-(trace_id)-(span_id)-(flags)-(log-count)

From c741f7e5993933a641285e1b086987a6c40f3f43 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Fri, 23 Jul 2021 14:30:40 -0700
Subject: [PATCH 03/42] version

---
 text/trace/0000-sampling-propagation.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/text/trace/0000-sampling-propagation.md b/text/trace/0000-sampling-propagation.md
index e649ae22f..9707122db 100644
--- a/text/trace/0000-sampling-propagation.md
+++ b/text/trace/0000-sampling-propagation.md
@@ -64,8 +64,8 @@ These are extended [from the W3C
 examples](https://www.w3.org/TR/trace-context/#examples-of-http-traceparent-headers):
 
 ```
-Value = 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01-05
-base16(version) = 00
+Value = 01-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01-05
+base16(version) = 01
 base16(trace-id) = 4bf92f3577b34da6a3ce929d0e0e4736
 base16(parent-id) = 00f067aa0ba902b7
 base16(trace-flags) = 01  // sampled
@@ -73,12 +73,12 @@ base16(log-count) = 05  // head probability is 2^-5.
 ```
 
 ```
-Value = 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-00-00
-base16(version) = 00
+Value = 01-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-00-00
+base16(version) = 01
 base16(trace-id) = 4bf92f3577b34da6a3ce929d0e0e4736
 base16(parent-id) = 00f067aa0ba902b7
 base16(trace-flags) = 00  // not sampled
-base16(log-count) = 10 // head probability is 2^-16
+base16(log-count) = 11 // head probability is 2^-17
 ```
 
 We are able to express sampling probabilities as small as 2^-255 using

From 6adbd1a0e44286cc2df93da5356db5f8998076c2 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Fri, 23 Jul 2021 14:33:09 -0700
Subject: [PATCH 04/42] links to OTEP 148 are TODOs

---
 text/trace/0000-sampling-propagation.md | 28 ++++++++++++-------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/text/trace/0000-sampling-propagation.md b/text/trace/0000-sampling-propagation.md
index 9707122db..119226e06 100644
--- a/text/trace/0000-sampling-propagation.md
+++ b/text/trace/0000-sampling-propagation.md
@@ -6,12 +6,12 @@ Propose extending the W3C trace context `traceparent` to convey head trace sampl
 
 The head trace probability is useful in child contexts to be able to
 record the effective sampling probability in child spans.  This is
-documented in [OTEP 148](TODO: after merging), which establishes
-semantic conventions for conveying the adjusted count of a span via
-attributes recorded with the span.  When a sampling decision is based
-on the parent's context, the effective sampling probability, which
-determines the child's adjusted count, cannot be recorded without
-propagating it through the context.
+documented in [OTEP 148](TODO), which establishes semantic conventions
+for conveying the adjusted count of a span via attributes recorded
+with the span.  When a sampling decision is based on the parent's
+context, the effective sampling probability, which determines the
+child's adjusted count, cannot be recorded without propagating it
+through the context.
 
 We propose to propagate the trace sampling probability that is in
 effect alongside the [W3C
@@ -86,18 +86,18 @@ just 3 bytes per `traceparent`.
 
 ## Internal details
 
+The reasoning behind restricting the set of sampling rates is that it:
+
+- Lowers the cost of propagating head sampling probability
+- Makes math involving partial traces tractable.
+
 A use known as "inflationary sampling" from Google's Dapper system is
-documented in [OTEP 148](TODO: inflationary sampling section).  This
-is used to justify propagating the head sampling probability even
-when unsampled.
+documented in [OTEP 148](TODO).  This is used to justify
+propagating the head sampling probability even when unsampled.
 
 [An algorithm for making statistical inferance from partially-sampled
 traces has been published](https://arxiv.org/pdf/2107.07703.pdf) that
-explains how to work with power-of-2 sampling rates.  The reasoning
-behind restricting the set of sampling rates is:
-
-- Lowers the cost of propagating head sampling probability
-- Makes math involving partial traces tractable
+explains how to work with power-of-2 sampling rates.
 
 ## Trade-offs and mitigations
 

From 11206d7e6f0a8ae1eb9083ad044de7539af1dc45 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Tue, 27 Jul 2021 13:45:10 -0700
Subject: [PATCH 05/42] rename

---
 ...sampling-propagation.md => 0168-sampling-propagation.md} | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
 rename text/trace/{0000-sampling-propagation.md => 0168-sampling-propagation.md} (96%)

diff --git a/text/trace/0000-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
similarity index 96%
rename from text/trace/0000-sampling-propagation.md
rename to text/trace/0168-sampling-propagation.md
index 119226e06..f706772b0 100644
--- a/text/trace/0000-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -6,7 +6,7 @@ Propose extending the W3C trace context `traceparent` to convey head trace sampl
 
 The head trace probability is useful in child contexts to be able to
 record the effective sampling probability in child spans.  This is
-documented in [OTEP 148](TODO), which establishes semantic conventions
+documented in [OTEP 170](TODO), which establishes semantic conventions
 for conveying the adjusted count of a span via attributes recorded
 with the span.  When a sampling decision is based on the parent's
 context, the effective sampling probability, which determines the
@@ -92,7 +92,7 @@ The reasoning behind restricting the set of sampling rates is that it:
 - Makes math involving partial traces tractable.
 
 A use known as "inflationary sampling" from Google's Dapper system is
-documented in [OTEP 148](TODO).  This is used to justify
+documented in [OTEP 170](TODO).  This is used to justify
 propagating the head sampling probability even when unsampled.
 
 [An algorithm for making statistical inferance from partially-sampled
@@ -118,7 +118,7 @@ proposal uses the base-2 logarithm of adjusted count to save space
 ## Open questions
 
 This OTEP suggests how to modify the W3C trace context to accommodate
-sampling in OpenTelemetry.  [OTEP 148](TODO) suggests semantic
+sampling in OpenTelemetry.  [OTEP 170](TODO) suggests semantic
 conventions for encoding adjusted count in a Span, but neither text
 specifies how to modify the built-in Samplers to produce the proposed
 new `traceparent` field so that the `ParentBased` Sampler can

From 408597267bba85dba54d4a2f63e938fa65414594 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Tue, 27 Jul 2021 15:28:21 -0700
Subject: [PATCH 06/42] Add a tracestate variation

---
 text/trace/0168-sampling-propagation.md | 122 +++++++++++++++---------
 1 file changed, 77 insertions(+), 45 deletions(-)
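
Reviewer note (illustrative only): a minimal Go sketch of reading the
`otel=headprob:<k>` member proposed in the `tracestate` variation of
this patch.  The function name `headProbFromTracestate` is a
hypothetical choice for this sketch, and the parsing is deliberately
simplified (it does not enforce the full W3C `tracestate` grammar).
`strings.Cut` requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// headProbFromTracestate is a hypothetical helper. It looks for the
// proposed `otel=headprob:<k>` list member and returns k, where k
// represents 1-in-(2^k) head sampling.
func headProbFromTracestate(tracestate string) (uint8, bool) {
	// tracestate is a comma-separated list of key=value members.
	for _, member := range strings.Split(tracestate, ",") {
		key, value, ok := strings.Cut(strings.TrimSpace(member), "=")
		if !ok || key != "otel" {
			continue
		}
		sub, k, ok := strings.Cut(value, ":")
		if !ok || sub != "headprob" {
			return 0, false
		}
		// The value is decimal in this variation and must fit in a byte.
		n, err := strconv.ParseUint(k, 10, 8)
		if err != nil {
			return 0, false
		}
		return uint8(n), true
	}
	return 0, false
}

func main() {
	// "vendor1=abc" is an invented neighbouring member for the demo.
	if k, ok := headProbFromTracestate("vendor1=abc,otel=headprob:10"); ok {
		// The shift is safe here because k is small in this demo.
		fmt.Printf("1-in-%d head sampling\n", uint64(1)<<k)
	}
}
```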

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index f706772b0..278bfa908 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -7,29 +7,37 @@ Propose extending the W3C trace context `traceparent` to convey head trace sampl
 The head trace probability is useful in child contexts to be able to
 record the effective sampling probability in child spans.  This is
 documented in [OTEP 170](TODO), which establishes semantic conventions
-for conveying the adjusted count of a span via attributes recorded
-with the span.  When a sampling decision is based on the parent's
-context, the effective sampling probability, which determines the
-child's adjusted count, cannot be recorded without propagating it
-through the context.
+for conveying the adjusted count of a span via span attributes.  When
+a sampling decision is based on the parent's context, the effective
+sampling probability, which determines the child's adjusted count,
+cannot be recorded without propagating it through the context.
 
 We propose to propagate the trace sampling probability that is in
 effect alongside the [W3C
-sampled](https://www.w3.org/TR/trace-context/#sampled-flag) flag by
-extending the `traceparent`.
+sampled](https://www.w3.org/TR/trace-context/#sampled-flag) flag
+either by extending the `traceparent` or through the use of
+`tracestate` with an `otel` vendor tag.
 
 ## Explanation
 
-To limit the cost of this extension, to ensure that it is widely
-supported, and for statistical reasons documented below, we propose to
-limit head tracing probability to powers of two.  This limits the
-available head sampling probabilities to 1/2, 1/4, 1/8, and so on, and
-we can compactly encode these probabilities as small integers using
-the negative base-2 logarithm of the effective probability.
+Two variations of this proposal are presented.  The first, based on
+`traceparent`, is the more-ideal choice because it ensures broad
+support and reduces the number of bytes per request.  The second,
+based on a `tracestate` key=value, is less appealing as `tracestate`
+has the appearance of a vendor-specific field, when it is not.
+
+In both cases, to limit the cost of this extension and for statistical
+reasons documented below, we propose to limit head tracing probability
+to powers of two.  This limits the available head sampling
+probabilities to 1/2, 1/4, 1/8, and so on.  We can compactly encode
+these probabilities as small integers using the base-2 logarithm of
+the adjusted count.
 
 For example, the value 2 corresponds with 1-in-4 sampling, the value
 10 corresponds with 1-in-1024 sampling.
 
+### Proposal using `traceparent`
+
 Whereas the [version-0 W3C trace context `traceparent`
 header](https://www.w3.org/TR/trace-context/#examples-of-http-traceparent-headers)
 is a concatenation of four fields,
@@ -39,51 +47,81 @@ traceparent: (version)-(trace_id)-(span_id)-(flags)
 ```
 
 This proposal would upgrade `traceparent` to version 1 with a new
-field named `log-count`,
+field named `log_count`,
 
 ```
-traceparent: (version)-(trace_id)-(span_id)-(flags)-(log-count)
+traceparent: (version)-(trace_id)-(span_id)-(flags)-(log_count)
 ```
 
-where `log-count` is the encoded negative base-2 logarithm of
-sampling probability, which is the base-2 logarithm of the adjusted
-count of a child span created in this context (i.e., the logarithm of
-the effective count, thus "log-count").  To compute the adjusted count
-of a child span created in this context, use `2^log-count`.  A
-log-count of `0` corresponds with `(2^0)=1`, thus 0 conveys a context
-with probability 1.
+where `log_count` is the base-2 logarithm of the adjusted count of a
+child span created in this context (i.e., the logarithm of the
+effective count, thus "log_count").  To compute the adjusted count of
+a child span created in this context, use `2^log_count`.  A log_count
+of `0` corresponds with `(2^0)=1`, thus 0 conveys a context with
+probability 1.
 
 The sampling probability of a context is independent from whether it
-is sampled.  We consider it [useful to convey sampling probability
-even when unsampled]() as it can be used to estimate the potential
-overhead of starting new sampled traces.
+is sampled.  We consider it useful to convey sampling probability even
+when unsampled, as shown by Dapper's "inflationary" sampler.  Note,
+however, that an unsampled trace with probability 1-in-1 is illogical.
+To prevent illogical interpretation and to avoid errors introduced by
+downgrading `traceparent` to the version 0 format, a new
+`probabilistic` flag is introduced to indicate when the `log_count`
+field is meaningful.
 
-### Examples
+This flag would use the 2nd available bit in the W3C trace flags
+field (i.e., 0x2).
+
+#### Examples using `traceparent`
 
 These are extended [from the W3C
 examples](https://www.w3.org/TR/trace-context/#examples-of-http-traceparent-headers):
 
 ```
-Value = 01-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01-05
+Traceparent = 01-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-03-05
 base16(version) = 01
-base16(trace-id) = 4bf92f3577b34da6a3ce929d0e0e4736
-base16(parent-id) = 00f067aa0ba902b7
-base16(trace-flags) = 01  // sampled
-base16(log-count) = 05  // head probability is 2^-5.
+base16(trace_id) = 4bf92f3577b34da6a3ce929d0e0e4736
+base16(parent_id) = 00f067aa0ba902b7
+base16(trace_flags) = 03  // sampled, probabilistic
+base16(log_count) = 05  // head probability is 2^-5.
 ```
 
 ```
-Value = 01-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-00-00
+Traceparent = 01-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-02-11
 base16(version) = 01
-base16(trace-id) = 4bf92f3577b34da6a3ce929d0e0e4736
-base16(parent-id) = 00f067aa0ba902b7
-base16(trace-flags) = 00  // not sampled
-base16(log-count) = 11 // head probability is 2^-17
+base16(trace_id) = 4bf92f3577b34da6a3ce929d0e0e4736
+base16(parent_id) = 00f067aa0ba902b7
+base16(trace_flags) = 02  // not sampled, probabilistic
+base16(log_count) = 11 // head probability is 2^-17
 ```
 
 We are able to express sampling probabilities as small as 2^-255 using
 just 3 bytes per `traceparent`.
 
+### Proposal using `tracestate`
+
+The `otel` vendor tag will be used to convey information using the
+`headprob` sub-key with value set to the decimal value of the
+`log_count` field documented above, where `k` represents `1-in-(2^k)`
+head sampling.
+
+#### Examples using `tracestate` 
+
+To convey 1-in-1024 head sampling:
+
+```
+tracestate: otel=headprob:10
+```
+
+To convey 1-in-8 head sampling:
+
+```
+tracestate: otel=headprob:3
+```
+
+This uses around 10x as many bytes per request as the `traceparent`
+proposal (e.g., 29 bytes vs. 3 bytes).
+
 ## Internal details
 
 The reasoning behind restricting the set of sampling rates is that it:
@@ -95,9 +133,9 @@ A use known as "inflationary sampling" from Google's Dapper system is
 documented in [OTEP 170](TODO).  This is used to justify
 propagating the head sampling probability even when unsampled.
 
-[An algorithm for making statistical inferance from partially-sampled
+[An algorithm for making statistical inference from partially-sampled
 traces has been published](https://arxiv.org/pdf/2107.07703.pdf) that
-explains how to work with power-of-2 sampling rates.
+explains how to work with a limited number of power-of-2 sampling rates.
 
 ## Trade-offs and mitigations
 
@@ -117,10 +155,4 @@ proposal uses the base-2 logarithm of adjusted count to save space
 
 ## Open questions
 
-This OTEP suggests how to modify the W3C trace context to accommodate
-sampling in OpenTelemetry.  [OTEP 170](TODO) suggests semantic
-conventions for encoding adjusted count in a Span, but neither text
-specifies how to modify the built-in Samplers to produce the proposed
-new `traceparent` field so that the `ParentBased` Sampler can
-correctly set the proposed `sampler.adjusted_count` attribute.  This
-will be future work.
+Which of these two proposals is better and/or more likely to succeed?

From 5cd3b9ac0d9f5daa7f8504f5ac4a7a55560f2a5d Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Wed, 28 Jul 2021 14:39:10 -0700
Subject: [PATCH 07/42] redraft using tracestate and two values

---
 text/trace/0168-sampling-propagation.md | 162 +++++++++++-------------
 1 file changed, 76 insertions(+), 86 deletions(-)
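
Reviewer note (illustrative only): a minimal Go sketch of the random
value and the `otelprob=PPRR` encoding described in this patch.  It
assumes the `PPRR` field order stated in the syntax section and the
rule that a context is sampled when the probability value is less
than or equal to the random value.  The names `randomValue`,
`sampled`, and `encode` are hypothetical, and the random value is
capped at 64 here because it is derived from a single 64-bit word.

```go
package main

import (
	"fmt"
	"math/bits"
	"math/rand"
)

// randomValue counts the leading zeros of 64 random bits. This is the
// same quantity the bit-at-a-time loop in the patch computes, except
// that it cannot exceed 64 in this sketch.
func randomValue(random uint64) uint8 {
	return uint8(bits.LeadingZeros64(random))
}

// sampled applies the rule from the patch: a context is sampled when
// the probability value p is less than or equal to the random value r.
func sampled(p, r uint8) bool {
	return p <= r
}

// encode renders the proposed tracestate member, two base16 digits
// for the probability value followed by two for the random value.
func encode(p, r uint8) string {
	return fmt.Sprintf("otelprob=%02x%02x", p, r)
}

func main() {
	r := randomValue(rand.Uint64())
	p := uint8(3) // 1-in-8 head sampling
	fmt.Println(encode(p, r), sampled(p, r))
}
```

With `p = 3` the decision is true whenever the random word has at
least three leading zeros, which happens with probability 1/8.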

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 278bfa908..2ae760ee0 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -1,126 +1,116 @@
 # Propagate head trace sampling probability
 
-Propose extending the W3C trace context `traceparent` to convey head trace sampling probability.
+Use the W3C trace context to convey consistent head trace sampling probability.
 
 ## Motivation
 
-The head trace probability is useful in child contexts to be able to
-record the effective sampling probability in child spans.  This is
-documented in [OTEP 170](TODO), which establishes semantic conventions
-for conveying the adjusted count of a span via span attributes.  When
-a sampling decision is based on the parent's context, the effective
-sampling probability, which determines the child's adjusted count,
-cannot be recorded without propagating it through the context.
+The head trace sampling probability is the probability factor
+associated with the start of a tracing context that determines whether
+child contexts are sampled or not.  It is useful to know the head
+trace sampling probability associated with a context in order to build
+span-to-metrics pipelines when the built-in `ParentBased` Sampler is
+used.
+
+A consistent trace sampling decision is one that can be carried out at
+any node in a trace, which supports collecting partial traces.
+OpenTelemetry specifies a built-in `TraceIDRatioBased` Sampler that
+aims to accomplish this goal but was left incomplete (see
+[TODOs](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#traceidratiobased))in the specification.
 
 We propose to propagate the trace sampling probability that is in
-effect alongside the [W3C
-sampled](https://www.w3.org/TR/trace-context/#sampled-flag) flag
-either by extending the `traceparent` or through the use of
-`tracestate` with an `otel` vendor tag.
+effect alongside the [W3C sampled flag](https://www.w3.org/TR/trace-context/#sampled-flag) 
+using `tracestate` with an `otelprob` vendor tag.
 
 ## Explanation
 
-Two variations of this proposal are presented.  The first, based on
-`traceparent`, is the more-ideal choice because it ensures broad
-support and reduces the number of bytes per request.  The second,
-based on a `tracestate` key=value, is less appealing as `tracestate`
-has the appearance of a vendor-specific field, when it is not.
+Two pieces of information are needed to convey consistent head trace
+sampling probability:
 
-In both cases, to limit the cost of this extension and for statistical
-reasons documented below, we propose to limit head tracing probability
-to powers of two.  This limits the available head sampling
-probabilities to 1/2, 1/4, 1/8, and so on.  We can compactly encode
-these probabilities as small integers using the base-2 logarithm of
-the adjusted count.
+1. The head trace sampling probability
+2. Source of consistent sampling decisions.
 
-For example, the value 2 corresponds with 1-in-4 sampling, the value
-10 corresponds with 1-in-1024 sampling.
+This proposal uses one byte of information for each of these.
 
-### Proposal using `traceparent`
+### Probability value
 
-Whereas the [version-0 W3C trace context `traceparent`
-header](https://www.w3.org/TR/trace-context/#examples-of-http-traceparent-headers)
-is a concatenation of four fields,
+To limit the cost of this extension and for statistical reasons
+documented below, we propose to limit head trace sampling probability
+to powers of two.  This limits the available head trace sampling
+probabilities to 1/2, 1/4, 1/8, and so on.  We can compactly encode
+these probabilities as small integer values using the base-2 logarithm
+of the adjusted count (i.e., inverse probability).
 
-```
-traceparent: (version)-(trace_id)-(span_id)-(flags)
-```
+For example, the probability value 2 corresponds with 1-in-4 sampling,
+the probability value 10 corresponds with 1-in-1024 sampling.  Using
+one byte of information we can convey sampling rates as small as 2^-255.
 
-This proposal would upgrade `traceparent` to version 1 with a new
-field named `log_count`,
+### Random value
 
-```
-traceparent: (version)-(trace_id)-(span_id)-(flags)-(log_count)
-```
+With head trace sampling probabilities limited to powers of two, the
+amount of randomness needed per trace context is limited.  A
+consistent sampling decision is accomplished by propagating a
+geometrically distributed random variable with shape parameter `1/2`,
+requiring only two bits of randomness on average per trace.  See
+[Estimation from Partially Sampled Distributed
+Traces](https://arxiv.org/pdf/2107.07703.pdf) section 2.8 for a
+detailed explanation.
 
-where `log_count` is the base-2 logarithm of the adjusted count of a
-child span created in this context (i.e., the logarithm of the
-effective count, thus "log_count").  To compute the adjusted count of
-a child span created in this context, use `2^log_count`.  A log_count
-of `0` corresponds with `(2^0)=1`, thus 0 conveys a context with
-probability 1.
+Such a random variable `r` can be generated using the following
+pseudocode:
 
-The sampling probability of a context is independent from whether it
-is sampled.  We consider it useful to convey sampling probability even
-when unsampled, as shown by Dapper's "inflationary" sampler.  Note,
-however, that an unsampled trace with probability 1-in-1 is illogical.
-To prevent illogical interpretation and to avoid errors introduced by
-downgrading `traceparent` to the version 0 format, a new
-`probabilistic` flag is introduced to indicate when the `log_count`
-field is meaningful.
+```
+r := 0
+for {
+  if nextRandomBit() {
+    break // Uses 2 random bits on average; the expected value of r is 1
+  }
+  r++
+}
+```
 
-This flag would use the 2nd available bit in the W3C trace flags
-field (i.e., 0x2).
+This can be computed from a stream of random bits as the number of
+leading zeros using efficient instructions on modern computer
+architectures.
 
-#### Examples using `traceparent`
+For example, the value 3 means there were three leading zeros and
+corresponds with being sampled at probabilities 1-in-1 through 1-in-8
+but not at probabilities 1-in-16 and smaller.  Using one byte of
+information we can convey a consistent sampling decision for sampling
+rates as small as 2^-255.
 
-These are extended [from the W3C
-examples](https://www.w3.org/TR/trace-context/#examples-of-http-traceparent-headers):
+### Proposed `tracestate` syntax
 
-```
-Traceparent = 01-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-03-05
-base16(version) = 01
-base16(trace_id) = 4bf92f3577b34da6a3ce929d0e0e4736
-base16(parent_id) = 00f067aa0ba902b7
-base16(trace_flags) = 03  // sampled, probabilistic
-base16(log_count) = 05  // head probability is 2^-5.
-```
+The consistent sampling decision and head trace sampling probability
+will be propagated using four bytes of base16 content, as follows:
 
 ```
-Traceparent = 01-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-02-11
-base16(version) = 01
-base16(trace_id) = 4bf92f3577b34da6a3ce929d0e0e4736
-base16(parent_id) = 00f067aa0ba902b7
-base16(trace_flags) = 02  // not sampled, probabilistic
-base16(log_count) = 11 // head probability is 2^-17
+tracestate: otelprob=PPRR
 ```
 
-We are able to express sampling probabilities as small as 2^-255 using
-just 3 bytes per `traceparent`.
-
-### Proposal using `tracestate`
-
-The `otel` vendor tag will be used to convey information using the
-`headprob` sub-key with value set to the decimal value of the
-`log_count` field documented above, where `k` represents `1-in-(2^k)`
-head sampling.
+where `PP` are two bytes of base16 probability value and `RR` are two
+bytes of base16 random value.
 
-#### Examples using `tracestate` 
+### Examples
 
-To convey 1-in-1024 head sampling:
+The following `tracestate` value:
 
 ```
-tracestate: otel=headprob:10
+tracestate: otelprob=030a
 ```
 
-To convey 1-in-8 head sampling:
+translates to
 
 ```
-tracestate: otel=headprob:3
+base16(probability) = 03 // 1-in-8 head probability
+base16(randomness) = 0a // qualifies for 1-in-1024 sampling or greater
 ```
 
-This uses around 10x as many bytes per request as the `traceparent`
-proposal (e.g., 29 bytes vs. 3 bytes).
+Any `TraceIDRatioBased` Sampler configured with probability 2^-10 or
+greater will enable sampling this trace, whereas any
+`TraceIDRatioBased` Sampler configured with probability 2^-11 or less
+will stop sampling this trace.  The W3C `sampled` flag is set to true
+when the probability value is less than or equal to the randomness
+value.
 
 ## Internal details
 

From 5aedc9c14119d1aebcb4ef869c6194ea377c93c9 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Wed, 28 Jul 2021 14:46:21 -0700
Subject: [PATCH 08/42] edits

---
 text/trace/0168-sampling-propagation.md | 35 +++++++++++++------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 2ae760ee0..53b4714b3 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -4,22 +4,22 @@ Use the W3C trace context to convey consistent head trace sampling probability.
 
 ## Motivation
 
-The head trace sampling probability is the probability factor
-associated with the start of a tracing context that determines whether
-child contexts are sampled or not.  It is useful to know the head
-trace sampling probability associated with a context in order to build
-span-to-metrics pipelines when the built-in `ParentBased` Sampler is
-used.
+The head trace sampling probability is the probability associated with
+the start of a trace context that determines whether child contexts
+are sampled or not when using the `ParentBased` Sampler.  It is useful
+to know the head trace sampling probability associated with a context
+in order to build span-to-metrics pipelines when the built-in
+`ParentBased` Sampler is used.
 
 A consistent trace sampling decision is one that can be carried out at
 any node in a trace, which supports collecting partial traces.
 OpenTelemetry specifies a built-in `TraceIDRatioBased` Sampler that
 aims to accomplish this goal but was left incomplete (see
-[TODOs](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#traceidratiobased))in the specification.
+[TODOs](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#traceidratiobased) in the specification).
 
-We propose to propagate the trace sampling probability that is in
-effect alongside the [W3C sampled flag](https://www.w3.org/TR/trace-context/#sampled-flag) 
-using `tracestate` with an `otelprob` vendor tag.
+We propose to propagate the necessary information alongside the [W3C
+sampled flag](https://www.w3.org/TR/trace-context/#sampled-flag) using
+`tracestate` with an `otelprob` vendor tag.
 
 ## Explanation
 
@@ -117,10 +117,11 @@ value.
 The reasoning behind restricting the set of sampling rates is that it:
 
 - Lowers the cost of propagating head sampling probability
+- Limits the number of random bits required
 - Makes math involving partial traces tractable.
 
 A use known as "inflationary sampling" from Google's Dapper system is
-documented in [OTEP 170](TODO).  This is is used to justify
+documented in [OTEP 170](https://github.com/open-telemetry/oteps/pull/170).  This is is used to justify
 propagating the head sampling probability even when unsampled.
 
 [An algorithm for making statistical inference from partially-sampled
@@ -130,7 +131,10 @@ explains how to work with a limited number of power-of-2 sampling rates.
 ## Trade-offs and mitigations
 
 Restricting head sampling rates to powers of two does not limit tail
-Samplers from using arbitrary probabilities.
+Samplers from using arbitrary probabilities.  The
+`sampler.adjusted_count` attribute specified in [OTEP
+170](https://github.com/open-telemetry/oteps/pull/170) is not limited
+to power-of-two values.
 
 Restricting head sampling rates to powers of two does not limit
 Samplers from using arbitrary effective probabilities over a period of
@@ -141,8 +145,5 @@ sampling half of the time leads to an effective sampling rate of 3/8.
 
 Google's Dapper system propagated a field in its trace context called
 "inverse_probability", which is equivalent to adjusted count.  This
-proposal uses the base-2 logarithm of adjusted count to save space
-
-## Open questions
-
-Which of these two proposals is better and/or more likely to succeed?
+proposal uses the base-2 logarithm of adjusted count to save space and
+limit required randomness.

From 32544ead6e02ebdc836f65f4696aaab11ee8ad69 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Wed, 28 Jul 2021 15:01:30 -0700
Subject: [PATCH 09/42] Drop mention of inflationary

---
 text/trace/0168-sampling-propagation.md | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 53b4714b3..f7ca61e7f 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -120,10 +120,6 @@ The reasoning behind restricting the set of sampling rates is that it:
 - Limits the number of random bits required
 - Makes math involving partial traces tractable.
 
-A use known as "inflationary sampling" from Google's Dapper system is
-documented in [OTEP 170](https://github.com/open-telemetry/oteps/pull/170).  This is is used to justify
-propagating the head sampling probability even when unsampled.
-
 [An algorithm for making statistical inference from partially-sampled
 traces has been published](https://arxiv.org/pdf/2107.07703.pdf) that
 explains how to work with a limited number of power-of-2 sampling rates.

From aa226098da02793373896b7224f3113ddd43809b Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Wed, 28 Jul 2021 15:30:41 -0700
Subject: [PATCH 10/42] detail about samplers

---
 text/trace/0168-sampling-propagation.md | 36 +++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index f7ca61e7f..5b643d744 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -124,6 +124,42 @@ The reasoning behind restricting the set of sampling rates is that it:
 traces has been published](https://arxiv.org/pdf/2107.07703.pdf) that
 explains how to work with a limited number of power-of-2 sampling rates.
 
+### Behavior of the `TraceIDRatioBased` Sampler
+
+The Sampler must be configured with a power-of-two probability
+`P=2^-S`.
+
+Using one byte to represent both the probability and randomness values
+means each value is limited to 255.  As a special case, the head
+probability `P=0` is represented using the probability value `S=255`,
+meaning `P=0` is indistinguishable from `P=2^-255`.  We propose to
+limit valid head sampling probabilities to `P=2^-254` or greater to
+address this ambiguity.
+
+If the context is a new root, the initial `tracestate` must be created
+using geometrically-distributed random value `R` (as described above,
+with maximum value 254) and the initial head probability `S`.
+
+If the context is not a new root, output a new `tracestate` with the
+same `R` value as the parent context, using the Sampler's own value of
+`S` for the head probability.
+
+In both cases, set the `sampled` bit if `S<=R`.
+
+### Behavior of the `ParentBased` sampler
+
+The `ParentBased` sampler is unmodified by this proposal.  It honors
+the W3C `sampled` flag and copies the incoming `tracestate` keys to
+the child context.
+
+### Behavior of the `AlwaysOn` Sampler
+
+The `AlwaysOn` Sampler behaves the same as `TraceIDRatioBased` with `P=1` (i.e., `S=0`)
+
+### Behavior of the `AlwaysOff` Sampler
+
+The `AlwaysOff` Sampler behaves the same as `TraceIDRatioBased` with `P=0` (i.e., `S=255`).
+
 ## Trade-offs and mitigations
 
 Restricting head sampling rates to powers of two does not limit tail
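Review note on the Sampler behavior added in this patch: the rule "set the `sampled` bit if `S<=R`" can be sketched in Go as below. This is only an illustration under the one-byte encoding described in this revision (`S=255` for zero probability, `R` at most 254). The names `sampleState`, `decide`, and `zeroProbabilityS` are hypothetical and do not come from the OTel-Go SDK.

```go
package main

// zeroProbabilityS is a hypothetical name for the probability value that
// this revision reserves for head probability zero.
const zeroProbabilityS = 255

// sampleState is a hypothetical holder for the values written to the
// outgoing context.
type sampleState struct {
	S       uint8 // probability value: head probability is 2^-S
	R       uint8 // randomness value, chosen once at the root of the trace
	Sampled bool  // value for the W3C sampled flag
}

// decide applies the rule "set the sampled bit if S<=R". For a new root the
// caller passes a freshly generated R (at most 254); for a child context it
// passes the parent's R unchanged. Because R never exceeds 254, S=255 is
// never sampled.
func decide(samplerS, r uint8) sampleState {
	return sampleState{S: samplerS, R: r, Sampled: samplerS <= r}
}
```

With the values from the earlier example (`S=3`, `R=10`), `decide(3, 10)` reports the context as sampled, while a Sampler configured with `S=11` does not sample the same trace.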

From 73f3b6f5b87f11125248ce36920c533aabb8ef68 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Thu, 29 Jul 2021 09:20:15 -0700
Subject: [PATCH 11/42] edit

---
 text/trace/0168-sampling-propagation.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 5b643d744..b8242999b 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -138,7 +138,7 @@ address this ambiguity.
 
 If the context is a new root, the initial `tracestate` must be created
 using geometrically-distributed random value `R` (as described above,
-with maximum value 254) and the initial head probability `S`.
+with maximum value 254) and the initial head probability value `S`.
 
 If the context is not a new root, output a new `tracestate` with the
 same `R` value as the parent context, using the Sampler's own value of

From 2fbcb30eede824bd3ca23280830d54044dec2a08 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Tue, 10 Aug 2021 16:18:24 -0700
Subject: [PATCH 12/42] change the format to otel=k1:v;k2:v; explain geometric
 distribution

---
 text/trace/0168-sampling-propagation.md | 45 ++++++++++++++++++-------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index b8242999b..e304fd01d 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -14,12 +14,15 @@ in order to build span-to-metrics pipelines when the built-in
 A consistent trace sampling decision is one that can be carried out at
 any node in a trace, which supports collecting partial traces.
 OpenTelemetry specifies a built-in `TraceIDRatioBased` Sampler that
-aims to accomplish this goal but was left incomplete (see
-[TODOs](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#traceidratiobased) in the specification).
+aims to accomplish this goal but was left incomplete (see a
+[TODO](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#traceidratiobased) 
+in the specification).
 
 We propose to propagate the necessary information alongside the [W3C
 sampled flag](https://www.w3.org/TR/trace-context/#sampled-flag) using
-`tracestate` with an `otelprob` vendor tag.
+`tracestate` with an `otel` vendor tag, which will require
+(separately) [specifying how the OpenTelemetry project uses
+`tracestate` itself](https://github.com/open-telemetry/opentelemetry-specification/pull/1852).
 
 ## Explanation
 
@@ -29,7 +32,9 @@ sampling probability:
 1. The head trace sampling probability
 2. Source of consistent sampling decisions.
 
-This proposal uses one byte of information for each of these.
+This proposal uses 6 bits of information for each of these and does
+not depend on built-in TraceID randomness, which is not sufficiently
+specified for probability sampling at this time.
 
 ### Probability value
 
@@ -42,16 +47,29 @@ of the adjusted count (i.e., inverse probability).
 
 For example, the probability value 2 corresponds with 1-in-4 sampling,
 the probability value 10 corresponds with 1-in-1024 sampling.  Using
-one byte of information we can convey sampling rates as small as 2^-255.
+six bits of information we can convey sampling rates as small as
+2^-62.  The value 63 is reserved to mean sampling with probability 0,
+which conveys an adjusted count of 0 for the associated context.
 
 ### Random value
 
 With head trace sampling probabilities limited to powers of two, the
 amount of randomness needed per trace context is limited.  A
-consistent sampling decision is accomplished by propagating a
-geometrically distributed random variable with shape parameter `1/2`,
-requiring only two bits of randomness on average per trace.  See
-[Estimation from Partially Sampled Distributed
+consistent sampling decision is accomplished by propagating a specific
+random variable.  The random variable is a described by a discrete
+geometric distribution having shape parameter `1/2`, listed below:
+
+| Value | Probability |
+| ----- | ----------- |
+| 0 | 1/2 |
+| 1 | 1/4 |
+| 2 | 1/8 |
+| 3 | 1/16 |
+| 4 | 1/32 |
+| ... | ... |
+| N | 1/(2^(N+1)) |
+
+See [Estimation from Partially Sampled Distributed
 Traces](https://arxiv.org/pdf/2107.07703.pdf) section 2.8 for a
 detailed explanation.
 
@@ -74,9 +92,10 @@ architectures.
 
 For example, the value 3 means there were three leading zeros and
 corresponds with being sampled at probabilities 1-in-1 through 1-in-8
-but not at probabilities 1-in-16 and smaller.  Using one byte of
+but not at probabilities 1-in-16 and smaller.  Using one six bits of
 information we can convey a consistent sampling decision for sampling
-rates as small as 2^-255.
+rates as small as 2^-62.  The value 63 is reserved to mean 0
+probability.
 
 ### Proposed `tracestate` syntax
 
@@ -84,7 +103,7 @@ The consistent sampling decision and head trace sampling probability
 will be propagated using four bytes of base16 content, as follows:
 
 ```
-tracestate: otelprob=PPRR
+tracestate: otel=p:PP;r:RR
 ```
 
 where `PP` are two bytes of base16 probability value and `RR` are two
@@ -95,7 +114,7 @@ bytes of base16 random value.
 The following `tracestate` value:
 
 ```
-tracestate: otelprob=0a03
+tracestate: otel=r:0a;p:03
 ```
 
 translates to
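Review note on the `otel=p:PP;r:RR` syntax introduced in this patch: a minimal Go sketch of reading the two fields follows. It assumes base16 values separated by `;` with `:` between key and value, as in the example `otel=r:0a;p:03`. The function name `parseOtelValue` and its result shape are hypothetical, and the final syntax depends on the separate `tracestate` specification referenced in the patch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseOtelValue parses the value of the `otel` tracestate member, for
// example "r:0a;p:03", into the probability value p and the randomness
// value r. A key that does not appear is reported as absent through
// hasP/hasR. Keys other than "p" and "r" are ignored in this sketch.
func parseOtelValue(value string) (p, r uint8, hasP, hasR bool, err error) {
	for _, field := range strings.Split(value, ";") {
		if field == "" {
			continue
		}
		key, val, ok := strings.Cut(field, ":")
		if !ok {
			return 0, 0, false, false, fmt.Errorf("malformed field %q", field)
		}
		switch key {
		case "p", "r":
			// Values are base16 and must fit in one byte.
			n, perr := strconv.ParseUint(val, 16, 8)
			if perr != nil {
				return 0, 0, false, false, fmt.Errorf("field %q: %w", field, perr)
			}
			if key == "p" {
				p, hasP = uint8(n), true
			} else {
				r, hasR = uint8(n), true
			}
		}
	}
	return p, r, hasP, hasR, nil
}
```

For the example in this patch, `parseOtelValue("r:0a;p:03")` yields `p=3` (1-in-8 head probability) and `r=10`, independent of the order in which the two keys appear.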

From 695025c98ee58b0b72e80aa35fbc87fd78be9f19 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Fri, 20 Aug 2021 15:59:47 -0700
Subject: [PATCH 13/42] followup from feedback and this week's SIG

---
 text/trace/0168-sampling-propagation.md | 148 +++++++++++++++++-------
 1 file changed, 105 insertions(+), 43 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index e304fd01d..ad87529c4 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -5,8 +5,9 @@ Use the W3C trace context to convey consistent head trace sampling probability.
 ## Motivation
 
 The head trace sampling probability is the probability associated with
-the start of a trace context that determines whether child contexts
-are sampled or not when using the `ParentBased` Sampler.  It is useful
+the start of a trace context that was used to determine whether the
+W3C `sampled` flag is set, which determines whether child contexts
+will be  sampled by a `ParentBased` Sampler.  It is useful
 to know the head trace sampling probability associated with a context
 in order to build span-to-metrics pipelines when the built-in
 `ParentBased` Sampler is used.
@@ -34,7 +35,8 @@ sampling probability:
 
 This proposal uses 6 bits of information for each of these and does
 not depend on built-in TraceID randomness, which is not sufficiently
-specified for probability sampling at this time.
+specified for probability sampling at this time.  This proposal closely 
+follows [research by Otmar Ertl](https://arxiv.org/pdf/2107.07703.pdf).
 
 ### Probability value
 
@@ -51,38 +53,64 @@ six bits of information we can convey sampling rates as small as
 2^-62.  The value 63 is reserved to mean sampling with probability 0,
 which conveys an adjusted count of 0 for the associated context.
 
-### Random value
+When propagated the probability value will be interpreted as shown in
+the following table:
+
+| Probability Value | Head Probability |
+| ----- | ----------- |
+| 0 | 1 |
+| 1 | 1/2 |
+| 2 | 1/4 |
+| 3 | 1/8 |
+| ... | ... |
+| N | 1/(2^N) |
+| ... | ... |
+| 63 | 0 |
+
+### Randomness value
 
 With head trace sampling probabilities limited to powers of two, the
 amount of randomness needed per trace context is limited.  A
 consistent sampling decision is accomplished by propagating a specific
-random variable.  The random variable is a described by a discrete
-geometric distribution having shape parameter `1/2`, listed below:
+random variable denoted `R`.  The random variable is described by a
+discrete geometric distribution having shape parameter `1/2`, listed
+below:
 
-| Value | Probability |
-| ----- | ----------- |
+| `R` Value | Selection Probability |
+| ---------------- | --------------------- |
 | 0 | 1/2 |
 | 1 | 1/4 |
 | 2 | 1/8 |
 | 3 | 1/16 |
 | 4 | 1/32 |
 | ... | ... |
-| N | 1/(2^(N+1)) |
-
-See [Estimation from Partially Sampled Distributed
-Traces](https://arxiv.org/pdf/2107.07703.pdf) section 2.8 for a
-detailed explanation.
-
-Such a random variable `r` can be generated using the following
-pseudocode:
-
-```
-r := 0
-for {
-  if nextRandomBit() {
-    break // The expected value of r is 2
+| 0 <= `R` <= 62 | 1/(2^(`R`+1)) |
+| ... | ... |
+| 62 | 2^-63 |
+| `R` >= 63 | Reject |
+
+Such a random variable `R` can be generated using the following
+pseudocode.  Note there is a tiny probability that the code has to
+reject the calculated result and start over, since the value 63 is
+defined to have adjusted count 0, not 2^63.
+
+```golang
+func nextRandomness() int {
+  // Repeat until a valid result is produced.
+  for {
+    R := 0
+    for {
+      if nextRandomBit() {
+        break
+      }
+      R++
+    }
+    // On average the loop consumes 2 random bits; the expected value of R is 1.
+    if R < 63 {
+      return R
+    }
+    // Reject, try again.
   }
-  r++
 }
 ```
 
@@ -94,8 +122,7 @@ For example, the value 3 means there were three leading zeros and
 corresponds with being sampled at probabilities 1-in-1 through 1-in-8
 but not at probabilities 1-in-16 and smaller.  Using one six bits of
 information we can convey a consistent sampling decision for sampling
-rates as small as 2^-62.  The value 63 is reserved to mean 0
-probability.
+rates as small as 2^-62.
 
 ### Proposed `tracestate` syntax
 
@@ -107,7 +134,14 @@ tracestate: otel=p:PP;r:RR
 ```
 
 where `PP` are two bytes of base16 probability value and `RR` are two
-bytes of base16 random value.
+bytes of base16 random value.  These values are omitted when they are
+unknown.
+
+This proposal should be taken as a recommendation and will be modified
+to [match whatever format OpenTelemetry specifies for its
+`tracestate`](https://github.com/open-telemetry/opentelemetry-specification/pull/1852).
+The choice of base16 encoding is therefore just a recommendation,
+chosen because `traceparent` uses base16 encoding.
 
 ### Examples
 
@@ -137,6 +171,7 @@ The reasoning behind restricting the set of sampling rates is that it:
 
 - Lowers the cost of propagating head sampling probability
 - Limits the number of random bits required
+- Avoids floating-point to integer rounding errors
 - Makes math involving partial traces tractable.
 
 [An algorithm for making statistical inference from partially-sampled
@@ -146,24 +181,20 @@ explains how to work with a limited number of power-of-2 sampling rates.
 ### Behavior of the `TraceIDRatioBased` Sampler
 
 The Sampler must be configured with a power-of-two probability
-`P=2^-S`.
-
-Using one byte to represent both the probability and randomness values
-means each value is limited to 255.  As a special case, the head
-probability `P=0` is represented using the probability value `S=255`,
-meaning `P=0` is indistinguishable from `P=2^-255`.  We propose to
-limit valid head sampling probabilities to `P=2^-254` or greater to
-address this ambiguity.
+`P=2^-S` except for the special case of `P=0`, which is handled
+specially.
 
 If the context is a new root, the initial `tracestate` must be created
 using geometrically-distributed random value `R` (as described above,
-with maximum value 254) and the initial head probability value `S`.
+with maximum value 62) and the initial head probability value `S`.  If
+the `P=0` use `S=63`, the specified value for zero.
 
 If the context is not a new root, output a new `tracestate` with the
-same `R` value as the parent context, using the Sampler's own value of
-`S` for the head probability.
+same `R` value as the parent context, and this Sampler's value of `S`
+for the outgoing context's probability value (i.e., as the value for
+`P`).
 
-In both cases, set the `sampled` bit if `S<=R`.
+In both cases, set the `sampled` bit if `S<=R` and `S<63`.
 
 ### Behavior of the `ParentBased` sampler
 
@@ -177,15 +208,46 @@ The `AlwaysOn` Sampler behaves the same as `TraceIDRatioBased` with `P=1` (i.e.,
 
 ### Behavior of the `AlwaysOff` Sampler
 
-The `AlwaysOff` Sampler behaves the same as `TraceIDRatioBased` with `P=0` (i.e., `S=255`).
+The `AlwaysOff` Sampler behaves the same as `TraceIDRatioBased` with `P=0` (i.e., `S=63`).
+
+## Prototype
+
+[This proposal has been prototyped in the OTel-Go
+SDK.](https://github.com/open-telemetry/opentelemetry-go/pull/2177) No
+changes in the OTel-Go Tracing SDK's `Sampler` or `tracestate` APIs
+were needed.
 
 ## Trade-offs and mitigations
 
+### Not using TraceID randomness
+
+It would be possible, if TraceID were specified to have at least 62
+uniform random bits, to compute the randomness value described above
+as the number of leading zeros among those 62 random bits.
+
+This proposal requires modifying the W3C traceparent specification,
+therefore we do not propose to use bits of the TraceID.
+
+### Not using TraceID hashing
+
+It would be possible to make a consistent sampling decision by hashing
+the TraceID, but we feel such an approach is not sufficient for making
+unbiased sampling decisions.  It is seen as a relatively difficult
+task to define and specify a good enough hashing function, much less
+to have it implemented in multiple languages.
+
+Hashing is also computationally expensive. This proposal uses extra
+data to avoid the computational cost of hashing TraceIDs.
+
+### Restriction to power-of-two 
+
 Restricting head sampling rates to powers of two does not limit tail
-Samplers from using arbitrary probabilities.  The
-`sampler.adjusted_count` attribute specified in [OTEP
-170](https://github.com/open-telemetry/oteps/pull/170) is not limited
-to power-of-two values.
+Samplers from using arbitrary probabilities.  The companion [OTEP
+170](https://github.com/open-telemetry/oteps/pull/170) has discussed
+the use of a `sampler.adjusted_count` attribute that would not be
+limited to power-of-two values.  Discussion about how to represent the
+effective adjusted count for tail-sampled Spans belongs in [OTEP
+170](https://github.com/open-telemetry/oteps/pull/170), not this OTEP.
 
 Restricting head sampling rates to powers of two does not limit
 Samplers from using arbitrary effective probabilities over a period of

From fb75d9cd4c441f1dd2903c6489a4dbb52bd3b8ef Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Fri, 20 Aug 2021 16:07:57 -0700
Subject: [PATCH 14/42] edits

---
 text/trace/0168-sampling-propagation.md | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index ad87529c4..9071780ef 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -7,10 +7,12 @@ Use the W3C trace context to convey consistent head trace sampling probability.
 The head trace sampling probability is the probability associated with
 the start of a trace context that was used to determine whether the
 W3C `sampled` flag is set, which determines whether child contexts
-will be  sampled by a `ParentBased` Sampler.  It is useful
-to know the head trace sampling probability associated with a context
-in order to build span-to-metrics pipelines when the built-in
-`ParentBased` Sampler is used.
+will be sampled by a `ParentBased` Sampler.  It is useful to know the
+head trace sampling probability associated with a context in order to
+build span-to-metrics pipelines when the built-in `ParentBased`
+Sampler is used.  Further motivation for supporting span-to-metrics
+pipelines is presented in [OTEP
+170](https://github.com/open-telemetry/oteps/pull/170).
 
 A consistent trace sampling decision is one that can be carried out at
 any node in a trace, which supports collecting partial traces.
@@ -120,7 +122,7 @@ architectures.
 
 For example, the value 3 means there were three leading zeros and
 corresponds with being sampled at probabilities 1-in-1 through 1-in-8
-but not at probabilities 1-in-16 and smaller.  Using one six bits of
+but not at probabilities 1-in-16 and smaller.  Using six bits of
 information we can convey a consistent sampling decision for sampling
 rates as small as 2^-62.
 
@@ -228,6 +230,8 @@ as the number of leading zeros among those 62 random bits.
 This proposal requires modifying the W3C traceparent specification,
 therefore we do not propose to use bits of the TraceID.
 
+[This issue has been filed with the W3C trace context group.](https://github.com/w3c/trace-context/issues/463)
+
 ### Not using TraceID hashing
 
 It would be possible to make a consistent sampling decision by hashing

From 8f7ad734cd7f30459977bb33102920404748eb9d Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Mon, 23 Aug 2021 13:21:52 -0700
Subject: [PATCH 15/42] Let 2^61 be the min probability; leaves one unused
 value to represent unknown in the span data

---
 text/trace/0168-sampling-propagation.md | 32 +++++++++++++++----------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 9071780ef..c4e90f6c8 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -52,7 +52,7 @@ of the adjusted count (i.e., inverse probability).
 For example, the probability value 2 corresponds with 1-in-4 sampling,
 the probability value 10 corresponds with 1-in-1024 sampling.  Using
 six bits of information we can convey sampling rates as small as
-2^-62.  The value 63 is reserved to mean sampling with probability 0,
+2^-61.  The value 62 is reserved to mean sampling with probability 0,
 which conveys an adjusted count of 0 for the associated context.
 
 When propagated the probability value will be interpreted as shown in
@@ -65,9 +65,18 @@ the following table:
 | 2 | 1/4 |
 | 3 | 1/8 |
 | ... | ... |
-| N | 1/(2^N) |
+| N | 2^-N |
 | ... | ... |
-| 63 | 0 |
+| 61 | 2^-61 |
+| 62 | 0 |
+| 63 | _Reserved_ |
+
+The value 63 is reserved for use in encoding adjusted count in Span
+data.  [Described in OTEP
+170](https://github.com/open-telemetry/oteps/pull/170), Span data
+would encode the probability value described here offset by +1, when
+the adjusted count is known, and would encode 0 when the adjusted
+count is unknown.
 
 ### Randomness value
 
@@ -93,8 +102,8 @@ below:
 
 Such a random variable `R` can be generated using the following
 pseudocode.  Note there is a tiny probability that the code has to
-reject the calculated result and start over, since the value 63 is
-defined to have adjusted count 0, not 2^63.
+reject the calculated result and start over, since the value 62 is
+defined to have adjusted count 0, not 2^62.
 
 ```golang
 func nextRandomness() int {
@@ -122,9 +131,7 @@ architectures.
 
 For example, the value 3 means there were three leading zeros and
 corresponds with being sampled at probabilities 1-in-1 through 1-in-8
-but not at probabilities 1-in-16 and smaller.  Using six bits of
-information we can convey a consistent sampling decision for sampling
-rates as small as 2^-62.
+but not at probabilities 1-in-16 and smaller.
 
 ### Proposed `tracestate` syntax
 
@@ -188,15 +195,16 @@ specially.
 
 If the context is a new root, the initial `tracestate` must be created
 using geometrically-distributed random value `R` (as described above,
-with maximum value 62) and the initial head probability value `S`.  If
-the `P=0` use `S=63`, the specified value for zero.
+with maximum value 61) and the initial head probability value `S`.  If
+the head probability is zero (i.e., `P=0`) use `S=62`, the specified
+value for zero probability.
 
 If the context is not a new root, output a new `tracestate` with the
 same `R` value as the parent context, and this Sampler's value of `S`
 for the outgoing context's probability value (i.e., as the value for
 `P`).
 
-In both cases, set the `sampled` bit if `S<=R` and `S<63`.
+In both cases, set the `sampled` bit if `S<=R` and `S<62`.
 
 ### Behavior of the `ParentBased` sampler
 
@@ -210,7 +218,7 @@ The `AlwaysOn` Sampler behaves the same as `TraceIDRatioBased` with `P=1` (i.e.,
 
 ### Behavior of the `AlwaysOff` Sampler
 
-The `AlwaysOff` Sampler behaves the same as `TraceIDRatioBased` with `P=0` (i.e., `S=63`).
+The `AlwaysOff` Sampler behaves the same as `TraceIDRatioBased` with `P=0` (i.e., `S=62`).
 
 ## Prototype
 

From 765bd120e36ba987b98abbd2bf509d9cb0c1f877 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Thu, 2 Sep 2021 17:22:28 -0700
Subject: [PATCH 16/42] worked example (draft)

---
 text/trace/0168-sampling-propagation.md | 89 ++++++++++++++++++-------
 1 file changed, 66 insertions(+), 23 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index c4e90f6c8..b55eb276d 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -52,27 +52,26 @@ of the adjusted count (i.e., inverse probability).
 For example, the probability value 2 corresponds with 1-in-4 sampling,
 the probability value 10 corresponds with 1-in-1024 sampling.  Using
 six bits of information we can convey sampling rates as small as
-2^-61.  The value 62 is reserved to mean sampling with probability 0,
+2**-61.  The value 62 is reserved to mean sampling with probability 0,
 which conveys an adjusted count of 0 for the associated context.
 
 When propagated the probability value will be interpreted as shown in
-the following table:
+the following table, which uses an offset of +1:
 
 | Probability Value | Head Probability |
 | ----- | ----------- |
-| 0 | 1 |
-| 1 | 1/2 |
-| 2 | 1/4 |
-| 3 | 1/8 |
+| 0 | Unknown |
+| 1 | 1 |
+| 2 | 1/2 |
+| 3 | 1/4 |
 | ... | ... |
-| N | 2^-N |
+| N | 2**(-N+1) |
 | ... | ... |
-| 61 | 2^-61 |
-| 62 | 0 |
-| 63 | _Reserved_ |
+| 61 | 2**-60 |
+| 62 | 2**-61 |
+| 63 | 0 |
 
-The value 63 is reserved for use in encoding adjusted count in Span
-data.  [Described in OTEP
+[Described in OTEP
 170](https://github.com/open-telemetry/oteps/pull/170), Span data
 would encode the probability value described here offset by +1, when
 the adjusted count is known, and would encode 0 when the adjusted
@@ -93,17 +92,18 @@ below:
 | 1 | 1/4 |
 | 2 | 1/8 |
 | 3 | 1/16 |
-| 4 | 1/32 |
 | ... | ... |
-| 0 <= `R` <= 62 | 1/(2^(`R`+1)) |
+| 0 <= `R` <= 61 | 1/(2**(`R`+1)) |
 | ... | ... |
-| 62 | 2^-63 |
-| `R` >= 63 | Reject |
+| 60 | 2**-61 |
+| 61 | 2**-62 |
+| 62 | 2**-62 |
+| 63 | 0 |
 
 Such a random variable `R` can be generated using the following
 pseudocode.  Note there is a tiny probability that the code has to
 reject the calculated result and start over, since the value 62 is
-defined to have adjusted count 0, not 2^62.
+defined to have adjusted count 0, not 2**62.
 
 ```golang
 func nextRandomness() int {
@@ -167,9 +167,9 @@ base16(probability) = 03 // 1-in-8 head probability
 base16(randomness) = 0a // qualifies for 1-in-1024 sampling or greater
 ```
 
-Any `TraceIDRatioBased` Sampler configured with probability 2^-10 or
+Any `TraceIDRatioBased` Sampler configured with probability 2**-10 or
 greater will enable sampling this trace, whereas any
-`TraceIDRatioBased` Sampler configured with probability 2^-11 or less
+`TraceIDRatioBased` Sampler configured with probability 2**-11 or less
 will stop sampling this trace.  The W3C `sampled` flag is set to true
 when the probability value is less than or equal to the randomness
 value.
@@ -190,21 +190,21 @@ explains how to work with a limited number of power-of-2 sampling rates.
 ### Behavior of the `TraceIDRatioBased` Sampler
 
 The Sampler must be configured with a power-of-two probability
-`P=2^-S` except for the special case of `P=0`, which is handled
+`2**-S` except for the special case of zero probability, which is handled
 specially.
 
 If the context is a new root, the initial `tracestate` must be created
 using geometrically-distributed random value `R` (as described above,
 with maximum value 61) and the initial head probability value `S`.  If
-the head probability is zero (i.e., `P=0`) use `S=62`, the specified
-value for zero probability.
+the head probability is zero, use `S=63`, the specified value for zero
+probability.
 
 If the context is not a new root, output a new `tracestate` with the
 same `R` value as the parent context, and this Sampler's value of `S`
 for the outgoing context's probability value (i.e., as the value for
 `P`).
 
-In both cases, set the `sampled` bit if `S<=R` and `S<62`.
+In both cases, set the `sampled` bit if `S<=R` and `S<63`.
 
 ### Behavior of the `ParentBased` sampler
 
@@ -220,6 +220,49 @@ The `AlwaysOn` Sampler behaves the same as `TraceIDRatioBased` with `P=1` (i.e.,
 
 The `AlwaysOff` Sampler behaves the same as `TraceIDRatioBased` with `P=0` (i.e., `S=62`).
 
+## Worked example
+
+The behavior of these tables can be verified by hand using a smaller
+example.  The following table shows how these equations work where
+`R`, `P`, and `S` are limited to 3 bits.
+
+Values of `P`, which have the same encoded value and interpretation as
+for the proposed `log_head_adjusted_count` field of OTEP 170, would be
+interpreted as follows:
+
+| `P` value | Adjusted count |
+| -----     | -----          |
+| 0         | Unknown        |
+| 1         | 1              |
+| 2         | 2              |
+| 3         | 4              |
+| 4         | 8              |
+| 5         | 16             |
+| 6         | 32             |
+| 7         | 0              |
+
+Note there are only 6 non-zero, non-unknown values for the adjusted
+count. Thus there are six defined values of `R` and `S`.  The
+following table shows `R` and the corresponding selection probability,
+along with the calculated adjusted count for each `S`:
+
+| `R` value | `R` selection probability | `S=0` | `S=1` | `S=2` | `S=4` | `S=5` | `S=6` |
+| --        | --                        | --    | --    | --    | --    | --    | --    |
+| 0         | 1/2                       | 1     | 0     | 0     | 0     | 0     | 0     |
+| 1         | 1/4                       | 1     | 2     | 0     | 0     | 0     | 0     |
+| 2         | 1/8                       | 1     | 2     | 4     | 0     | 0     | 0     |
+| 3         | 1/16                      | 1     | 2     | 4     | 8     | 0     | 0     |
+| 4         | 1/32                      | 1     | 2     | 4     | 8     | 16    | 0     |
+| 5         | 1/32                      | 1     | 2     | 4     | 8     | 16    | 32    |
+
+Notice that the sum of `R` selection probability times adjusted count
+in each of the `S=*` columns equals 1.  For example, in the `S=5`
+column we have `0*1/2 + 0*1/4 + 0*1/8 + 0*1/16 + 16*1/32 + 16*1/32 =
+16/32 + 16/32 = 1`.  In the `S=2` column we have `0*1/2 + 0*1/4 +
+4*1/8 + 4*1/16 + 4*1/32 + 4*1/32 = 4/8 + 4/16 + 4/32 + 4/32 = 1/2 +
+1/4 + 1/8 + 1/8 = 1`.  We conclude that when `R` is chosen with the
+given probabilities, any choice of `S` produces one expected span.
+
 ## Prototype
 
 [This proposal has been prototyped in the OTel-Go

From 56910bdf9437885b860e5273c1848492694da9a4 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Tue, 7 Sep 2021 23:57:34 -0700
Subject: [PATCH 17/42] corner cases

---
 text/trace/0168-sampling-propagation.md | 217 +++++++++++++++---------
 1 file changed, 136 insertions(+), 81 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index b55eb276d..aa5fc6e28 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -19,7 +19,7 @@ any node in a trace, which supports collecting partial traces.
 OpenTelemetry specifies a built-in `TraceIDRatioBased` Sampler that
 aims to accomplish this goal but was left incomplete (see a
 [TODO](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#traceidratiobased) 
-in the specification).
+in the v1.0 Trace specification).
 
 We propose to propagate the necessary information alongside the [W3C
 sampled flag](https://www.w3.org/TR/trace-context/#sampled-flag) using
@@ -47,86 +47,78 @@ documented below, we propose to limit head trace sampling probability
 to powers of two.  This limits the available head trace sampling
 probabilities to 1/2, 1/4, 1/8, and so on.  We can compactly encode
 these probabilities as small integer values using the base-2 logarithm
-of the adjusted count (i.e., inverse probability).
+of the adjusted count.
 
 For example, the probability value 2 corresponds with 1-in-4 sampling,
-the probability value 10 corresponds with 1-in-1024 sampling.  Using
+and the probability value 10 corresponds with 1-in-1024 sampling.  Using
 six bits of information we can convey sampling rates as small as
-2**-61.  The value 62 is reserved to mean sampling with probability 0,
+2**-61.  The value 63 is reserved to mean sampling with probability 0,
 which conveys an adjusted count of 0 for the associated context.
 
-When propagated the probability value will be interpreted as shown in
-the following table, which uses an offset of +1:
-
-| Probability Value | Head Probability |
-| ----- | ----------- |
-| 0 | Unknown |
-| 1 | 1 |
-| 2 | 1/2 |
-| 3 | 1/4 |
-| ... | ... |
-| N | 2**(-N+1) |
-| ... | ... |
-| 61 | 2**-60 |
-| 62 | 2**-61 |
-| 63 | 0 |
+When propagated, the probability value will be interpreted as shown in
+the folowing table, which uses an offset of +1:
+
+| Probability Value | Head Probability | Note                   |
+| -----             | -----------      | ----                   |
+| 0                 | Unknown          | Reserved for span data |
+| 1                 | 1                |                        |
+| 2                 | 1/2              |                        |
+| 3                 | 1/4              |                        |
+| ...               | ...              |                        |
+| N                 | 2**(-N+1)        | 1 in 2**(N-1)          |
+| ...               | ...              |                        |
+| 61                | 2**-60           |                        |
+| 62                | 2**-61           |                        |
+| 63                | 0                | Maximum encoded value  |
 
 [Described in OTEP
 170](https://github.com/open-telemetry/oteps/pull/170), Span data
-would encode the probability value described here offset by +1, when
-the adjusted count is known, and would encode 0 when the adjusted
-count is unknown.
+sampled by the `ParentBased` sampler will encode the value that was
+propagated by the parent span as its "probability value" `p`.
+
+The value `p=0` MAY be propagated using `tracestate` explicitly, although
+equivalent interpretation can be obtained by omitting `p`, since that
+is the default.
 
 ### Randomness value
 
 With head trace sampling probabilities limited to powers of two, the
 amount of randomness needed per trace context is limited.  A
 consistent sampling decision is accomplished by propagating a specific
-random variable denoted `R`.  The random variable is a described by a
+random variable denoted `r`.  The random variable is a described by a
 discrete geometric distribution having shape parameter `1/2`, listed
 below:
 
-| `R` Value | Selection Probability |
+| `r` Value | Selection Probability |
 | ---------------- | --------------------- |
 | 0 | 1/2 |
 | 1 | 1/4 |
 | 2 | 1/8 |
 | 3 | 1/16 |
 | ... | ... |
-| 0 <= `R` <= 61 | 1/(2**(-`R`+1)) |
+| 0 <= `r` <= 61 | 1/(2**(-`r`+1)) |
 | ... | ... |
 | 60 | 2**-61 |
-| 61 | 2**-62 |
-| 62 | 2**-62 |
-| 63 | 0 |
+| 61 | 2**-61 |
 
-Such a random variable `R` can be generated using the following
-pseudocode.  Note there is a tiny probability that the code has to
-reject the calculated result and start over, since the value 62 is
-defined to have adjusted count 0, not 2**62.
+Such a random variable `r` can be generated using the following
+pseudocode.
 
 ```golang
 func nextRandomness() int {
   // Repeat until a valid result is produced.
   for {
-    R := 0
-    for {
-      if nextRandomBit() {
-        break
-      }
-      R++
-    }
-    // The expected value of R is 2.
-	if R < 63 {
-	  return R
+    r := 0
+    for r < 61 && nextRandomBit() == false {
+      r++
     }
-	// Reject, try again.
+    return R
   }
 }
 ```
 
 This can be computed from a stream of random bits as the number of
-leading zeros using efficient instructions on modern computer
+leadieng zeros using efficient instructions on modern computer
 architectures.
 
 For example, the value 3 means there were three leading zeros and
@@ -135,8 +127,9 @@ but not at probabilities 1-in-16 and smaller.
 
 ### Proposed `tracestate` syntax
 
-The consistent sampling decision and head trace sampling probability
-will be propagated using four bytes of base16 content, as follows:
+The consistent sampling randomness valuw (`r`) and and head sampling
+probability value (`p`) will be propagated using two bytes of base16 content
+for each of the two fields, as follows,
 
 ```
 tracestate: otel=p:PP;r:RR
@@ -170,9 +163,7 @@ base16(randomness) = 0a // qualifies for 1-in-1024 sampling or greater
 Any `TraceIDRatioBased` Sampler configured with probability 2**-10 or
 greater will enable sampling this trace, whereas any
 `TraceIDRatioBased` Sampler configured with probability 2**-11 or less
-will stop sampling this trace.  The W3C `sampled` flag is set to true
-when the probability value is less than or equal to the randomness
-value.
+will stop sampling this trace.
 
 ## Internal details
 
@@ -189,48 +180,64 @@ explains how to work with a limited number of power-of-2 sampling rates.
 
 ### Behavior of the `TraceIDRatioBased` Sampler
 
-The Sampler must be configured with a power-of-two probability
-`2**-S` except for the special case of zero probability, which is handled
-specially.
+The Sampler MUST be configured with a power-of-two probability
+expressed as `2**-s` except for the special case of zero probability,
+which is handled specially.
 
 If the context is a new root, the initial `tracestate` must be created
-using geometrically-distributed random value `R` (as described above,
-with maximum value 61) and the initial head probability value `S`.  If
-the head probability is zero use `S=63`, the specified value for zero
-probability.
+with randomness value `r` (as described above, in the range [0, 61]),
+and the initial head probability value `p` set to the initial value of
+`s` plus 1 (in the range [1, 63].  If the head probability is zero use
+`p=63`, the specified value for zero probability.
 
 If the context is not a new root, output a new `tracestate` with the
-same `R` value as the parent context, and this Sampler's value of `S`
-for the outgoing context's probability value (i.e., as the value for
-`P`).
+same `r` value as the parent context, and 1 plus this Sampler's value
+of `s` for the outgoing context's `p` value.  The incoming context's
+`p` is not used.
 
-In both cases, set the `sampled` bit if `S<=R` and `S<63`.
+If the context is not a new root and the incoming context's `r` value
+is not set, the implementation SHOULD notify the user of an error
+condition and follow the incoming context's `sampled` flag.
+
+The span's `log_head_adjusted_count` field is set to the outgoing `p`.
+
+In both cases, set the `sampled` bit if the outgoing `p` minus 1 is
+less than the outgoing `r` minus 1 and `p` is less than 63, i.e.,
+sampled implies `p-1 < r+1` and `p < 63`.
 
 ### Behavior of the `ParentBased` sampler
 
-The `ParentBased` sampler is unmodified by this proposal.  It honors
+The `ParentBased` sampler is modified by this proposal.  It honors
 the W3C `sampled` flag and copies the incoming `tracestate` keys to
 the child context.
 
+The span's `log_head_adjusted_count` field is set to the incoming
+value of `p` when both `p` and `r` is defined.  When `r` is not
+defined the span's `log_head_adjusted_count` MUST be set to 0
+indicating unknown probability, because the decision cannot be made
+consistently across the trace.
+
 ### Behavior of the `AlwaysOn` Sampler
 
-The `AlwaysOn` Sampler behaves the same as `TraceIDRatioBased` with `P=1` (i.e., `S=0`)
+The `AlwaysOn` Sampler behaves the same as `TraceIDRatioBased` with
+100% sampling probability (i.e., `s=0` yielding `p=1`).
 
 ### Behavior of the `AlwaysOff` Sampler
 
-The `AlwaysOff` Sampler behaves the same as `TraceIDRatioBased` with `P=0` (i.e., `S=62`).
+The `AlwaysOff` Sampler behaves the same as `TraceIDRatioBased` with
+zero probability (i.e., `p=63`, `s` undefined).
 
-## Worked example
+## Worked 3-bit example
 
 The behavior of these tables can be verified by hand using a smaller
 example.  The following table shows how these equations work where
-`R`, `P`, and `S` are limited to 3 bits.
+`r`, `p`, and `s` are limited to 3 bits.
 
-Values of `P`, which have the same encoded value and interpretation as
+Values of `p`, which have the same encoded value and interpretation as
 for the proposed `log_head_adjusted_count` field of OTEP 170, would be
 interpreted as follows:
 
-| `P` value | Adjusted count |
+| `p` value | Adjusted count |
 | -----     | -----          |
 | 0         | Unknown        |
 | 1         | 1              |
@@ -242,11 +249,11 @@ interpreted as follows:
 | 7         | 0              |
 
 Note there are only 6 non-zero, non-unknown values for the adjusted
-count. Thus there are six defined values of `R` and `S`.  The
-following table shows `R` and the corresponding selection probability,
-along with the calculated adjusted count for each `S`:
+count. Thus there are six defined values of `r` and `s`.  The
+following table shows `r` and the corresponding selection probability,
+along with the calculated adjusted count for each `s`:
 
-| `R` value | `R` selection probability | `S=0` | `S=1` | `S=2` | `S=4` | `S=5` | `S=6` |
+| `r` value | `r` selection probability | `s=0` | `s=1` | `s=2` | `s=3` | `s=4` | `s=5` |
 | --        | --                        | --    | --    | --    | --    | --    | --    |
 | 0         | 1/2                       | 1     | 0     | 0     | 0     | 0     | 0     |
 | 1         | 1/4                       | 1     | 2     | 0     | 0     | 0     | 0     |
@@ -255,13 +262,45 @@ along with the calculated adjusted count for each `S`:
 | 4         | 1/32                      | 1     | 2     | 4     | 8     | 16    | 0     |
 | 5         | 1/32                      | 1     | 2     | 4     | 8     | 16    | 32    |
 
-Notice that the sum of `R` selection probability times adjusted count
-in each of the `S=*` columns equals 1.  For example, in the `S=5`
+Notice that the sum of `r` selection probability times adjusted count
+in each of the `s=*` columns equals 1.  For example, in the `s=4`
 column we have `0*1/2 + 0*1/4 + 0*1/8 + 0*1/16 + 16*1/32 + 16*1/32 =
-16/32 + 16/32 = 1`.  In the `S=2` column we have `0*1/2 + 0*1/4 +
+16/32 + 16/32 = 1`.  In the `s=2` column we have `0*1/2 + 0*1/4 +
 4*1/8 + 4*1/16 + 4*1/32 + 4*1/32 = 4/8 + 4/16 + 4/32 + 4/32 = 1/2 +
-1/4 + 1/8 + 1/8 = 1`.  We conclude that when `R` is chosen with the
-given probabilities, any choice of `S` produces one expected span.
+1/4 + 1/8 + 1/8 = 1`.  We conclude that when `r` is chosen with the
+given probabilities, any choice of `s` produces one expected span.
+
+## Summary
+
+The following table summarizes how the three Sampler cases behave with
+respect to the incoming and outgoing values for `p`, `r`, and
+`sampled`:
+
+| Sampler                | Incoming `r` | Incoming `p` | Incoming `sampled` | Outgoing `r`    | Outgoing `p`   | Outgoing `sampled` |
+| --                     | --           | --           | --                 | --              | --             | --                 |
+| Parent                 | unused       | expected     | respected          | passed through  | passed through | passed through     |
+| TraceIDRatio(Non-Root) | used         | unused       | ignored            | pass through    | set to `s+1`   | set to `p-1<r+1`   |
+| TraceIDRatio(Root)     | n/a          | n/a          | n/a                | random variable | set to `s+1`   | set to `p-1<r+1`   |
+
+There are cases where the resulting span's `log_head_adjusted_count` is unknown:
+
+| Sampler                | Unknown condition |
+| --                     | --                |
+| Parent                 | no incoming `p`   |
+| TraceIDRatio(Root)     | no incoming `r`   |
+| TraceIDRatio(Non-Root) | none              |
+|                        |                   |
+
+There are cases where the combination of `p` and `r` and `sampled`
+that cannot be generated by the built-in samplers.  The case where
+sampled is true with `p=63` indicating 0% probability may be used when
+recording spans that were selected by a different sampler while a
+probability sampler is also in use.  These cases are known as "zero
+adjusted count" contexts which are sampled with 0% probability.
+
+The case where sampled is false with `p=1` indicating 100% probability
+is an illogical condition.  See [Propagating `p` when
+unsampled](#propagating-p-when-unsampled) below.
 
 ## Prototype
 
@@ -309,9 +348,25 @@ Samplers from using arbitrary effective probabilities over a period of
 time.  For example, choosing 1/2 sampling half of the time and 1/4
 sampling half of the time leads to an effective sampling rate of 3/8.
 
-## Prior art and alternatives
+### Propagating `p` when unsampled
+
+Consistent trace sampling requires the `r` value to be propagated even
+when the span itself is not sampled.  It is not necessary, however, to
+propagate the `p` value when the context is not sampled, since
+`ParentBased` samplers will not change the decision.  Although one
+use-case was documented in Google's early Dapper system (known as
+"inflationary sampling", see
+https://github.com/open-telemetry/oteps/pull/170), the same effect can
+be achieved using a consistent sampling decision in this framework.
+
+### Default behavior
+
+In order for consistent trace sampling decisions to be made, the `r`
+value MUST be set at the root of the trace.  This behavior could be
+opt-in or opt-out.  If opt-in, users would have to enable the setting
+of `r` and the setting and propagating of `p` in the tracestate.  If
+opt-out, users would have to disable these features to turn them off.
+The cost and convenience of Sampling features depend on this choice.
 
-Google's Dapper system propagated a field in its trace context called
-"inverse_probability", which is equivalent to adjusted count.  This
-proposal uses the base-2 logarithm of adjusted count to save space and
-limit required randomness.
+This author's recommendation is that these behaviors be opt-out, i.e.,
+on-by-default.  This decision should not block this OTEP.

From e06a7cfc643c1ec47ff45c3977f1b9c0027242dc Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Wed, 8 Sep 2021 00:12:25 -0700
Subject: [PATCH 18/42] corner case edits

---
 text/trace/0168-sampling-propagation.md | 62 ++++++++++++-------------
 1 file changed, 29 insertions(+), 33 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index aa5fc6e28..5a6556a1d 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -49,14 +49,14 @@ probabilities to 1/2, 1/4, 1/8, and so on.  We can compactly encode
 these probabilities as small integer values using the base-2 logarithm
 of the adjusted count.
 
-For example, the probability value 2 corresponds with 1-in-4 sampling,
-and the probability value 10 corresponds with 1-in-1024 sampling.  Using
-six bits of information we can convey sampling rates as small as
-2**-61.  The value 63 is reserved to mean sampling with probability 0,
-which conveys an adjusted count of 0 for the associated context.
+Using six bits of information we can convey known and unknown sampling
+rates as small as 2**-61.  The value 63 is reserved to mean sampling
+with probability 0, which conveys an adjusted count of 0 for the
+associated context.
 
 When propagated, the probability value will be interpreted as shown in
-the folowing table, which uses an offset of +1:
+the following table, which uses an offset of +1 in order to place the
+Unknown value at 0:
 
 | Probability Value | Head Probability | Note                   |
 | -----             | -----------      | ----                   |
@@ -76,9 +76,9 @@ the folowing table, which uses an offset of +1:
 sampled by the `ParentBased` sampler will encode the value that was
 propagated by the parent span as its "probability value" `p`.
 
-The value `p=0` MAY be propagated using `tracestate` explicitly, although
-equivalent interpretation can be obtained by omitting `p`, since that
-is the default.
+The value `p=0` SHOULD NOT be propagated using `tracestate`
+explicitly, because the equivalent interpretation can be obtained by
+omitting `p`.
 
 ### Randomness value
 
@@ -86,8 +86,7 @@ With head trace sampling probabilities limited to powers of two, the
 amount of randomness needed per trace context is limited.  A
 consistent sampling decision is accomplished by propagating a specific
 random variable denoted `r`.  The random variable is a described by a
-discrete geometric distribution having shape parameter `1/2`, listed
-below:
+geometric distribution having shape parameter `1/2`, listed below:
 
 | `r` Value | Selection Probability |
 | ---------------- | --------------------- |
@@ -106,14 +105,11 @@ pseudocode.
 
 ```golang
 func nextRandomness() int {
-  // Repeat until a valid result is produced.
-  for {
-    r := 0
-    for r < 61 && nextRandomBit() == false {
-      r++
-    }
-    return R
+  r := 0
+  for r < 61 && nextRandomBit() == false {
+    r++
   }
+  return r
 }
 ```
 
@@ -181,30 +177,30 @@ explains how to work with a limited number of power-of-2 sampling rates.
 ### Behavior of the `TraceIDRatioBased` Sampler
 
 The Sampler MUST be configured with a power-of-two probability
-expressed as `2**-s` except for the special case of zero probability,
-which is handled specially.
+expressed as `2**-s` except for the special case of zero probability.
 
 If the context is a new root, the initial `tracestate` must be created
-with randomness value `r` (as described above, in the range [0, 61]),
-and the initial head probability value `p` set to the initial value of
-`s` plus 1 (in the range [1, 63].  If the head probability is zero use
-`p=63`, the specified value for zero probability.
-
+with randomness value `r`, as described above, in the range [0, 61].
 If the context is not a new root, output a new `tracestate` with the
-same `r` value as the parent context, and 1 plus this Sampler's value
-of `s` for the outgoing context's `p` value.  The incoming context's
-`p` is not used.
-
-If the context is not a new root and the incoming context's `r` value
-is not set, the implementation SHOULD notify the user of an error
-condition and follow the incoming context's `sampled` flag.
+same `r` value as the parent context.
 
-The span's `log_head_adjusted_count` field is set to the outgoing `p`.
+When sampled, in both cases, the context's probability value `p` is
+set to the value of `s+1` in the range [1, 63].  If the sampling
+probability is zero (the special case where `s` is undefined), use
+`p=63` the specified value for zero probability.
 
 In both cases, set the `sampled` bit if the outgoing `p` minus 1 is
 less than the outgoing `r` minus 1 and `p` is less than 63, i.e.,
 sampled implies `p-1 < r+1` and `p < 63`.
 
+If the context is not a new root and the incoming context's `r` value
+is not set, the implementation SHOULD notify the user of an error
+condition and follow the incoming context's `sampled` flag.
+
+The span's `log_head_adjusted_count` field is set to the outgoing `p`
+unless `r` is unknown, in which case it MUST be set to zero (unknown
+probability).
+
 ### Behavior of the `ParentBased` sampler
 
 The `ParentBased` sampler is modified by this proposal.  It honors

From 08046499fe6d197831cfb0cbd1601caa0ed9de9f Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Wed, 8 Sep 2021 00:17:09 -0700
Subject: [PATCH 19/42] corner case edits

---
 text/trace/0168-sampling-propagation.md | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 5a6556a1d..7419fb4dd 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -189,9 +189,9 @@ set to the value of `s+1` in the range [1, 63].  If the sampling
 probability is zero (the special case where `s` is undefined), use
 `p=63` the specified value for zero probability.
 
-In both cases, set the `sampled` bit if the outgoing `p` minus 1 is
-less than the outgoing `r` minus 1 and `p` is less than 63, i.e.,
-sampled implies `p-1 < r+1` and `p < 63`.
+In both cases, set the `sampled` bit if the outgoing `p` minus one is
+less than the outgoing `r` plus one and `p` is less than 63 (i.e.,
+`p-1 < r+1` and `p < 63` implies sampled).
 
 If the context is not a new root and the incoming context's `r` value
 is not set, the implementation SHOULD notify the user of an error
@@ -208,8 +208,8 @@ the W3C `sampled` flag and copies the incoming `tracestate` keys to
 the child context.
 
 The span's `log_head_adjusted_count` field is set to the incoming
-value of `p` when both `p` and `r` is defined.  When `r` is not
-defined the span's `log_head_adjusted_count` MUST be set to 0
+value of `p` when both `p` and `r` are defined.  When `r` is not
+defined, the span's `log_head_adjusted_count` MUST be set to 0
 indicating unknown probability, because the decision cannot be made
 consistently across the trace.
 
@@ -275,17 +275,17 @@ respect to the incoming and outgoing values for `p`, `r`, and
 | Sampler                | Incoming `r` | Incoming `p` | Incoming `sampled` | Outgoing `r`    | Outgoing `p`   | Outgoing `sampled` |
 | --                     | --           | --           | --                 | --              | --             | --                 |
 | Parent                 | unused       | expected     | respected          | passed through  | passed through | passed through     |
-| TraceIDRatio(Non-Root) | used         | unused       | ignored            | pass through    | set to `s+1`   | set to `p-1<r+1`   |
-| TraceIDRatio(Root)     | n/a          | n/a          | n/a                | random variable | set to `s+1`   | set to `p-1<r+1`   |
+| TraceIDRatio(Non-Root) | used         | unused       | ignored            | passed through  | set to `s+1`   | set to `p-1 < r+1` |
+| TraceIDRatio(Root)     | n/a          | n/a          | n/a                | random variable | set to `s+1`   | set to `p-1 < r+1` |
 
-There are cases where the resulting span's `log_head_adjusted_count` is unknown:
+There are several cases where the resulting span's
+`log_head_adjusted_count` is unknown:
 
 | Sampler                | Unknown condition |
 | --                     | --                |
 | Parent                 | no incoming `p`   |
 | TraceIDRatio(Root)     | no incoming `r`   |
 | TraceIDRatio(Non-Root) | none              |
-|                        |                   |
 
 There are cases where the combination of `p` and `r` and `sampled`
 that cannot be generated by the built-in samplers.  The case where

From cb068a286eb7451d28c94553c17a0a0ca986cbdb Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Wed, 8 Sep 2021 08:22:20 -0700
Subject: [PATCH 20/42] edit

---
 text/trace/0168-sampling-propagation.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 7419fb4dd..ca22e2859 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -275,8 +275,8 @@ respect to the incoming and outgoing values for `p`, `r`, and
 | Sampler                | Incoming `r` | Incoming `p` | Incoming `sampled` | Outgoing `r`    | Outgoing `p`   | Outgoing `sampled` |
 | --                     | --           | --           | --                 | --              | --             | --                 |
 | Parent                 | unused       | expected     | respected          | passed through  | passed through | passed through     |
-| TraceIDRatio(Non-Root) | used         | unused       | ignored            | passed through  | set to `s+1`   | set to `p-1 < r+1` |
-| TraceIDRatio(Root)     | n/a          | n/a          | n/a                | random variable | set to `s+1`   | set to `p-1 < r+1` |
+| TraceIDRatio(Non-Root) | used         | unused       | ignored            | passed through  | set to `s+1`   | set to `p-1 < r+1 && p < 63` |
+| TraceIDRatio(Root)     | n/a          | n/a          | n/a                | random variable | set to `s+1`   | set to `p-1 < r+1 && p < 63` |
 
 There are several cases where the resulting span's
 `log_head_adjusted_count` is unknown:

From c9fa24f282f17b6c6cfdf93dde9c21d4b8d33d4c Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Wed, 8 Sep 2021 16:14:42 -0700
Subject: [PATCH 21/42] from @oertl feedback especially

---
 text/trace/0168-sampling-propagation.md | 116 ++++++++++++++----------
 1 file changed, 68 insertions(+), 48 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index ca22e2859..359a590b0 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -32,7 +32,7 @@ sampled flag](https://www.w3.org/TR/trace-context/#sampled-flag) using
 Two pieces of information are needed to convey consistent head trace
 sampling probability:
 
-1. The head trace sampling probability
+1. The head trace sampling probability.
 2. Source of consistent sampling decisions.
 
 This proposal uses 6 bits of information for each of these and does
@@ -85,20 +85,22 @@ omitting `p`.
 With head trace sampling probabilities limited to powers of two, the
 amount of randomness needed per trace context is limited.  A
 consistent sampling decision is accomplished by propagating a specific
-random variable denoted `r`.  The random variable is a described by a
-geometric distribution having shape parameter `1/2`, listed below:
-
-| `r` Value | Selection Probability |
-| ---------------- | --------------------- |
-| 0 | 1/2 |
-| 1 | 1/4 |
-| 2 | 1/8 |
-| 3 | 1/16 |
-| ... | ... |
-| 0 <= `r` <= 61 | 1/(2**(-`r`+1)) |
-| ... | ... |
-| 60 | 2**-61 |
-| 61 | 2**-61 |
+random variable denoted `r`.  The random variable is described by a
+(truncated) geometric distribution having shape parameter `1/2`, listed below:
+
+| `r` Value        | Selection Probability | Sampling probability   |
+| ---------------- | --------------------- | ----                   |
+| 0                | 1/2                   | 1-in-1                 |
+| 1                | 1/4                   | 1-in-2 and above       |
+| 2                | 1/8                   | 1-in-4 and above       |
+| 3                | 1/16                  | 1-in-8 and above       |
+| ...              | ...                   | ...                    |
+| 0 <= `r` <= 60   | 2**(-`r`-1)          | 1-in-2**`r` and above  |
+| ...              | ...                   | ...                    |
+| 58               | 2**-59                | 1-in-2**58 and above   |
+| 59               | 2**-60                | 1-in-2**59 and above   |
+| 60               | 2**-61                | 1-in-2**60 and above   |
+| 61               | 2**-61                | 1-in-2**61 and above   |
 
 Such a random variable `r` can be generated using the following
 pseudocode.
@@ -114,7 +116,7 @@ func nextRandomness() int {
 ```
 
 This can be computed from a stream of random bits as the number of
-leadieng zeros using efficient instructions on modern computer
+leading zeros using efficient instructions on modern computer
 architectures.
 
 For example, the value 3 means there were three leading zeros and
@@ -125,7 +127,7 @@ but not at probabilities 1-in-16 and smaller.
 
 The consistent sampling randomness valuw (`r`) and and head sampling
 probability value (`p`) will be propagated using two bytes of base16 content
-for each of the two fields, as follows,
+for each of the two fields, as follows:
 
 ```
 tracestate: otel=p:PP;r:RR
@@ -152,13 +154,13 @@ tracestate: otel=r:0a;p:03
 translates to
 
 ```
-base16(probability) = 03 // 1-in-8 head probability
-base16(randomness) = 0a // qualifies for 1-in-1024 sampling or greater
+base16(probability) = 03 // 1-in-4 head probability
+base16(randomness) = 0a // qualifies for 1-in-1024 or greater probability consistent sampling
 ```
 
-Any `TraceIDRatioBased` Sampler configured with probability 2**-10 or
+Any `TraceIDRatioBased` Sampler configured with probability 2**-9 or
 greater will enable sampling this trace, whereas any
-`TraceIDRatioBased` Sampler configured with probability 2**-11 or less
+`TraceIDRatioBased` Sampler configured with probability 2**-10 or less
 will stop sampling this trace.
 
 ## Internal details
@@ -177,7 +179,8 @@ explains how to work with a limited number of power-of-2 sampling rates.
 ### Behavior of the `TraceIDRatioBased` Sampler
 
 The Sampler MUST be configured with a power-of-two probability
-expressed as `2**-s` except for the special case of zero probability.
+expressed as `2**-s` with `s` being an integer in the range [0, 61]
+except for the special case of zero probability.
 
 If the context is a new root, the initial `tracestate` must be created
 with randomness value `r`, as described above, in the range [0, 61].
@@ -189,9 +192,8 @@ set to the value of `s+1` in the range [1, 63].  If the sampling
 probability is zero (the special case where `s` is undefined), use
 `p=63` the specified value for zero probability.
 
-In both cases, set the `sampled` bit if the outgoing `p` minus one is
-less than the outgoing `r` plus one and `p` is less than 63 (i.e.,
-`p-1 < r+1` and `p < 63` implies sampled).
+In both cases, set the sampled bit if the outgoing `p` minus one is
+less than or equal to the outgoing `r` (i.e., `p-1 <= r`).
 
 If the context is not a new root and the incoming context's `r` value
 is not set, the implementation SHOULD notify the user of an error
@@ -216,12 +218,12 @@ consistently across the trace.
 ### Behavior of the `AlwaysOn` Sampler
 
 The `AlwaysOn` Sampler behaves the same as `TraceIDRatioBased` with
-100% sampling probability (i.e., `s=0` yielding `p=1`).
+100% sampling probability (i.e., `p=1`).
 
 ### Behavior of the `AlwaysOff` Sampler
 
 The `AlwaysOff` Sampler behaves the same as `TraceIDRatioBased` with
-zero probability (i.e., `p=63`, `s` undefined).
+zero probability (i.e., `p=63`).
 
 ## Worked 3-bit example
 
@@ -249,7 +251,7 @@ count. Thus there are six defined values of `r` and `s`.  The
 following table shows `r` and the corresponding selection probability,
 along with the calculated adjusted count for each `s`:
 
-| `r` value | `r` selection probability | `s=0` | `s=1` | `s=2` | `s=4` | `s=5` | `s=6` |
+| `r` value | `r` selection probability | `s=0` | `s=1` | `s=2` | `s=3` | `s=4` | `s=5` |
 | --        | --                        | --    | --    | --    | --    | --    | --    |
 | 0         | 1/2                       | 1     | 0     | 0     | 0     | 0     | 0     |
 | 1         | 1/4                       | 1     | 2     | 0     | 0     | 0     | 0     |
@@ -259,24 +261,24 @@ along with the calculated adjusted count for each `s`:
 | 5         | 1/32                      | 1     | 2     | 4     | 8     | 16    | 32    |
 
 Notice that the sum of `r` selection probability times adjusted count
-in each of the `s=*` columns equals 1.  For example, in the `s=5`
+in each of the `s=*` columns equals 1.  For example, in the `s=4`
 column we have `0*1/2 + 0*1/4 + 0*1/8 + 0*1/16 + 16*1/32 + 16*1/32 =
 16/32 + 16/32 = 1`.  In the `s=2` column we have `0*1/2 + 0*1/4 +
 4*1/8 + 4*1/16 + 4*1/32 + 4*1/32 = 4/8 + 4/16 + 4/32 + 4/32 = 1/2 +
 1/4 + 1/8 + 1/8 = 1`.  We conclude that when `r` is chosen with the
 given probabilities, any choice of `s` produces one expected span.
 
-## Summary
+## Invariant checking
 
 The following table summarizes how the three Sampler cases behave with
 respect to the incoming and outgoing values for `p`, `r`, and
 `sampled`:
 
-| Sampler                | Incoming `r` | Incoming `p` | Incoming `sampled` | Outgoing `r`    | Outgoing `p`   | Outgoing `sampled` |
-| --                     | --           | --           | --                 | --              | --             | --                 |
-| Parent                 | unused       | expected     | respected          | passed through  | passed through | passed through     |
-| TraceIDRatio(Non-Root) | used         | unused       | ignored            | passed through  | set to `s+1`   | set to `p-1 < r+1 && p < 63` |
-| TraceIDRatio(Root)     | n/a          | n/a          | n/a                | random variable | set to `s+1`   | set to `p-1 < r+1 && p < 63` |
+| Sampler                | Incoming `r` | Incoming `p` | Incoming `sampled` | Outgoing `r`               | Outgoing `p`               | Outgoing `sampled`         |
+| --                     | --           | --           | --                 | --                         | --                         | --                         |
+| Parent                 | unused       | expected     | respected          | checked and passed through | checked and passed through | checked and passed through |
+| TraceIDRatio(Non-Root) | used         | unused       | ignored            | checked and passed through | set to `s+1`               | set to `p-1 <= r`          |
+| TraceIDRatio(Root)     | n/a          | n/a          | n/a                | random variable            | set to `s+1`               | set to `p-1 <= r`          |
 
 There are several cases where the resulting span's
 `log_head_adjusted_count` is unknown:
@@ -284,19 +286,37 @@ There are several cases where the resulting span's
 | Sampler                | Unknown condition |
 | --                     | --                |
 | Parent                 | no incoming `p`   |
-| TraceIDRatio(Root)     | no incoming `r`   |
-| TraceIDRatio(Non-Root) | none              |
+| TraceIDRatio(Non-Root) | no incoming `r`   |
+| TraceIDRatio(Root)     | none              |
 
-There are cases where the combination of `p` and `r` and `sampled`
-that cannot be generated by the built-in samplers.  The case where
-sampled is true with `p=63` indicating 0% probability may be used when
-recording spans that were selected by a different sampler while a
-probability sampler is also in use.  These cases are known as "zero
-adjusted count" contexts which are sampled with 0% probability.
+The inputs are recognized as out-of-range as follows:
 
-The case where sampled is false with `p=1` indicating 100% probability
-is an illogical condition.  See [Propagating `p` when
-unsampled](#propagating-p-when-unsampled) below.
+| Range invariant | Remedy                           |
+| --              | --                               |
+| `p < 0`         | drop `p` from tracestate         |
+| `p > 63`        | drop `p` from tracestate         |
+| `r < 0`         | drop `r` and `p` from tracestate |
+| `r > 61`        | drop `r` and `p` from tracestate |
+
+There are cases where the values of `p`, `r`, and `sampled` are
+inconsistent with each other.  The `sampled` flag is expected to equal
+the expression `p - 1 <= r`.  When the invariant `sampled <=> p - 1 <= r`
+is violated, the `ParentBased` sampler MUST correct the propagated
+values as discussed below.
+
+The violation is always addressed by honoring the `sampled` flag and
+setting `log_head_adjusted_count` to either 0 (Unknown) or 63 (Zero).
+
+If `sampled` is false and the invariant is bilated, drop `p` from the
+outgoing context to convey unknown head probability.
+
+The case where `sampled` is true with `p=63` indicating 0% probability
+may by regarded as a special case to allow zero adjusted count
+sampling, which permits non-probabilistic sampling to take place in
+the presence of probability sampling.
+
+If `sampled` is true with `p<63`, drop `p` from the outgoing context
+to convey unknown head probability.
 
 ## Prototype
 
@@ -309,9 +329,9 @@ were needed.
 
 ### Not using TraceID randomness
 
-It would be possible, if TraceID were specified to have at least 62
+It would be possible, if TraceID were specified to have at least 61
 uniform random bits, to compute the randomness value described above
-as the number of leading zeros among those 62 random bits.
+as the number of leading zeros among those 61 random bits.
 
 This proposal requires modifying the W3C traceparent specification,
 therefore we do not propose to use bits of the TraceID.

From 1b3ae2353912bb535d5aab1e1cbd74307edad7c2 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Wed, 8 Sep 2021 16:17:53 -0700
Subject: [PATCH 22/42] clarify

---
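Note: the three cases clarified in this patch can be sketched in Go
roughly as follows.  This is a minimal illustration, not the prototype's
code.  The names `correctParentBased` and `dropped` are hypothetical, and
the sketch assumes that `p` and `r` are both present and already in range
(`p` in [1, 63], `r` in [0, 61]).

```go
package main

import "fmt"

// dropped is a hypothetical sentinel meaning "omit p from the outgoing
// tracestate".
const dropped = -1

// correctParentBased is a hypothetical helper.  It assumes p is in
// [1, 63] and r is in [0, 61].  It returns the outgoing p-value and the
// span's log_head_adjusted_count (0 = unknown, 63 = zero adjusted count).
func correctParentBased(sampled bool, p, r int) (outP, logCount int) {
	switch {
	case sampled == (p-1 <= r):
		// Invariant holds: pass p through unchanged.
		return p, p
	case sampled && p == 63:
		// Zero adjusted count sampling: keep p=63.
		return 63, 63
	default:
		// Invariant violated: honor the sampled flag and drop p.
		return dropped, 0
	}
}

func main() {
	fmt.Println(correctParentBased(true, 3, 10))  // invariant holds
	fmt.Println(correctParentBased(false, 3, 10)) // violated, unsampled
	fmt.Println(correctParentBased(true, 63, 10)) // zero adjusted count
	fmt.Println(correctParentBased(true, 12, 10)) // violated, sampled
}
```

The `log_head_adjusted_count` result is only meaningful for spans that
are actually recorded.
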
 text/trace/0168-sampling-propagation.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 359a590b0..4f122c935 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -308,15 +308,18 @@ The violation is always addressed by honoring the `sampled` flag and
 setting `log_head_adjusted_count` to either 0 (Unknown) or 63 (Zero).
 
 If `sampled` is false and the invariant is bilated, drop `p` from the
-outgoing context to convey unknown head probability.
+outgoing context to convey unknown head probability.  Set
+`log_head_adjusted_count` to 0.
 
 The case where `sampled` is true with `p=63` indicating 0% probability
 may by regarded as a special case to allow zero adjusted count
 sampling, which permits non-probabilistic sampling to take place in
-the presence of probability sampling.
+the presence of probability sampling.  Set `log_head_adjusted_count`
+to 63.
 
 If `sampled` is true with `p<63`, drop `p` from the outgoing context
-to convey unknown head probability.
+to convey unknown head probability.  Set `log_head_adjusted_count` to
+0.
 
 ## Prototype
 

From d0c2697331677485fdfb16dc38fbfc52588e9877 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@users.noreply.github.com>
Date: Thu, 9 Sep 2021 09:11:02 -0700
Subject: [PATCH 23/42] Apply suggestions from code review

Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
---
 text/trace/0168-sampling-propagation.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 4f122c935..6c214be4f 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -32,8 +32,8 @@ sampled flag](https://www.w3.org/TR/trace-context/#sampled-flag) using
 Two pieces of information are needed to convey consistent head trace
 sampling probability:
 
-1. The head trace sampling probability.
-2. Source of consistent sampling decisions.
+1. p-value representing the head trace sampling probability.
+2. r-value representing the "randomness" as the source of consistent sampling decisions.
 
 This proposal uses 6 bits of information for each of these and does
 not depend on built-in TraceID randomness, which is not sufficiently
@@ -229,7 +229,7 @@ zero probability (i.e., `p=63`).
 
 The behavior of these tables can be verified by hand using a smaller
 example.  The following table shows how these equations work where
-`r`, `p`, and `s` are limited to 3 bits.
+`r`, `p`, and `s` are limited to 3 bits instead of 6 bits.
 
 Values of `p`, which have the same encoded value and interpretation as
 for the proposed `log_head_adjusted_count` field of OTEP 170, would be
@@ -336,7 +336,7 @@ It would be possible, if TraceID were specified to have at least 61
 uniform random bits, to compute the randomness value described above
 as the number of leading zeros among those 61 random bits.
 
-This proposal requires modifying the W3C traceparent specification,
+However, this would require modifying the W3C traceparent specification,
 therefore we do not propose to use bits of the TraceID.
 
 [This issue has been filed with the W3C trace context group.](https://github.com/w3c/trace-context/issues/463)

From 98f6403d93600217daaa0c9b8c442ba23c129b5d Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Thu, 9 Sep 2021 14:14:10 -0700
Subject: [PATCH 24/42] rewrite explaination for r-value

---
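Note: the p-value table rewritten in this patch can be read as a small
decoding function.  The following Go sketch is illustrative only;
`headProbability` is a hypothetical name, and reporting out-of-range
values as unknown is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"math"
)

// headProbability is a hypothetical helper that decodes a p-value into
// a head sampling probability.  The second result is false when the
// probability is unknown.
func headProbability(p int) (float64, bool) {
	switch {
	case p >= 1 && p <= 62:
		return math.Ldexp(1, 1-p), true // 2**(-p+1)
	case p == 63:
		return 0, true // reserved value for zero probability
	default:
		return 0, false // p=0, or out of range (assumed unknown)
	}
}

func main() {
	for _, p := range []int{0, 1, 2, 3, 62, 63} {
		prob, known := headProbability(p)
		fmt.Println(p, prob, known)
	}
}
```
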
 text/trace/0168-sampling-propagation.md | 131 +++++++++++++++---------
 1 file changed, 80 insertions(+), 51 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 6c214be4f..35cad9b59 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -40,85 +40,114 @@ not depend on built-in TraceID randomness, which is not sufficiently
 specified for probability sampling at this time.  This proposal closely 
 follows [research by Otmar Ertl](https://arxiv.org/pdf/2107.07703.pdf).
 
-### Probability value
+### p-value
 
 To limit the cost of this extension and for statistical reasons
 documented below, we propose to limit head trace sampling probability
 to powers of two.  This limits the available head trace sampling
 probabilities to 1/2, 1/4, 1/8, and so on.  We can compactly encode
 these probabilities as small integer values using the base-2 logarithm
-of the adjusted count.
+of the adjusted count known.
 
 Using six bits of information we can convey known and unknown sampling
 rates as small as 2**-61.  The value 63 is reserved to mean sampling
 with probability 0, which conveys an adjusted count of 0 for the
 associated context.
 
-When propagated, the probability value will be interpreted as shown in
-the folowing table, which uses an offset of +1 in order to place the
-Unknown value at 0:
-
-| Probability Value | Head Probability | Note                   |
-| -----             | -----------      | ----                   |
-| 0                 | Unknown          | Reserved for span data |
-| 1                 | 1                |                        |
-| 2                 | 1/2              |                        |
-| 3                 | 1/4              |                        |
-| ...               | ...              |                        |
-| N                 | 2**(-N+1)        | 1 in 2**(N-1)          |
-| ...               | ...              |                        |
-| 61                | 2**-60           |                        |
-| 62                | 2**-61           |                        |
-| 63                | 0                | Maximum encoded value  |
+When propagated, the "p-value" as it is known will be interpreted as
+shown in the following table.  The p-value for known sampling
+probabilities is the negative base-2 logarithm of the probability,
+offset by +1 so that the 0 p-value can be treated as unknown (for
+backwards compatibility):
+
+| p-value | Head Probability | Note                                                 |
+| -----   | -----------      | ----                                                 |
+| 0       | Unknown          | Do not propagate `p=0`, instead omit from tracestate |
+| 1       | 1                |                                                      |
+| 2       | 1/2              |                                                      |
+| 3       | 1/4              |                                                      |
+| ...     | ...              |                                                      |
+| N       | 2**(-N+1)        | 1 in 2**(N-1)                                        |
+| ...     | ...              |                                                      |
+| 61      | 2**-60           |                                                      |
+| 62      | 2**-61           |                                                      |
+| 63      | 0                | Maximum encoded value                                |
 
 [Described in OTEP
-170](https://github.com/open-telemetry/oteps/pull/170), Span data
-sampled by the `ParentBased` sampler will encode the value that was
-propagated by the parent span as its "probability value" `p`.
+170](https://github.com/open-telemetry/oteps/pull/170), the
+`ParentBased` sampler will use the incoming context's p-value as
+specified here to set the span's `log_head_adjusted_count` field.
 
 The value `p=0` SHOULD NOT be propagated using `tracestate`
 explicitly, because the equivalent interpretation can be obtained by
 omitting `p`.
 
-### Randomness value
+### r-value
 
 With head trace sampling probabilities limited to powers of two, the
 amount of randomness needed per trace context is limited.  A
 consistent sampling decision is accomplished by propagating a specific
-random variable denoted `r`.  The random variable is described by a
-(truncated) geometric distribution having shape parameter `1/2`, listed below:
-
-| `r` Value        | Selection Probability | Sampling probability   |
-| ---------------- | --------------------- | ----                   |
-| 0                | 1/2                   | 1-in-1                 |
-| 1                | 1/4                   | 1-in-2 and above       |
-| 2                | 1/8                   | 1-in-4 and above       |
-| 3                | 1/16                  | 1-in-8 and above       |
-| ...              | ...                   | ...                    |
-| 0 <= `r` <= 60   | 1/(2**(-`r`-1))       | 1-in-2**-`r` and above |
-| ...              | ...                   | ...                    |
-| 58               | 2**-59                | 1-in-2**-58 and above  |
-| 59               | 2**-60                | 1-in-2**-59 and above  |
-| 60               | 2**-61                | 1-in-2**-60 and above  |
-| 61               | 2**-61                | 1-in-2**-61 and above  |
-
-Such a random variable `r` can be generated using the following
-pseudocode.
+random variable known as the r-value.
+
+To develop an intuition for r-values, consider a scenario where every
+bit of the `TraceID` is generated by a uniform random bit generator
+(i.e., every bit is 0 or 1 with equal probability).  A 128-bit
+`TraceID` can therefore be treated as a 128-bit unsigned integer,
+which can be mapped into a fraction with range [0, 1) by dividing by
+2**128, a form known as the TraceID-ratio.  Now, probability sampling
+could be achieved by comparing the TraceID-ratio with the sampling
+probability, setting the `sampled` flag when TraceID-ratio is less
+than the sampling probability.
+
+It is easy to see that with sampling probability 1, all TraceIDs will
+be accepted because every TraceID ratio is strictly less than 1.
+Sampling with probability 50% will select TraceID ratios less than
+0.5, which maps to all TraceIDs less than 2**127 or, equivalently, all
+TraceIDs where the most significant bit is zero.  By the same logic,
+sampling with probability 25% means accepting TraceIDs where the most
+significant two bits are zero.  In general, sampling with probability
+`2**-S` is equivalent to selecting TraceIDs with at least `S` leading
+zeros in this example scenario.
+
+The r-value specified here directly describes the number of leading
+zeros in a random 61-bit string, specified in a way that does not
+require TraceID values to be constructed with random bits in specific
+positions or with hard requirements on their uniformity.  In
+mathematical terms, the r-value is described by a truncated geometric
+distribution having shape parameter `1/2`, listed below:
+
+| `r` Value        | Probability of `r-value` | Implied sampling probabilities |
+| ---------------- | ------------------------ | ----------------------         |
+| 0                | 1/2                      | 1                              |
+| 1                | 1/4                      | 1/2 and above                  |
+| 2                | 1/8                      | 1/4 and above                  |
+| 3                | 1/16                     | 1/8 and above                  |
+| ...              | ...                      | ...                            |
+| 0 <= `r` <= 60   | 2**(-`r`-1)              | 2**-`r` and above              |
+| ...              | ...                      | ...                            |
+| 58               | 2**-59                   | 2**-58 and above               |
+| 59               | 2**-60                   | 2**-59 and above               |
+| 60               | 2**-61                   | 2**-60 and above               |
+| 61               | 2**-61                   | 2**-61 and above               |
+
+Such a random variable `r` can be generated using efficient
+instructions on modern computer architectures.
 
 ```golang
-func nextRandomness() int {
-  r := 0
-  for r < 61 && nextRandomBit() == false {
-    r++
-  }
-  return R
+import (
+	"math/rand"
+	"math/bits"
+)
+
+func nextRValueLeading() int {
+	x := uint64(rand.Int63()) // 63 least-significant bits are random
+	y := x << 1 | 0x7         // 61 most-significant bits are random
+	return bits.LeadingZeros64(y)
 }
 ```
 
-This can be computed from a stream of random bits as the number of
-leading zeros using efficient instructions on modern computer
-architectures.
-
+More examples for calculating r-values are shown in
+[here](https://gist.github.com/jmacd/79c38c1056035c52f6fff7b7fc071274).
 For example, the value 3 means there were three leading zeros and
 corresponds with being sampled at probabilities 1-in-1 through 1-in-8
 but not at probabilities 1-in-16 and smaller.

From 16947f7f098ed52ef5d2ba7601a794458a5f20a6 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Thu, 9 Sep 2021 14:16:13 -0700
Subject: [PATCH 25/42] more

---
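Note: the `otel=p:PP;r:RR` syntax touched by this patch could be parsed
along the following lines.  This Go sketch is illustrative only.  It
handles just the value of the `otel` key, not a full `tracestate` header,
and the names `parseOTelValue` and `unset` are hypothetical.  Silently
ignoring malformed fields is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// unset is a hypothetical sentinel for a field that is absent or
// malformed.
const unset = -1

// parseOTelValue is a hypothetical helper that extracts the p-value and
// r-value from the value of the otel tracestate key, e.g. "r:0a;p:03".
// Each field must carry exactly two base16 characters.
func parseOTelValue(value string) (p, r int) {
	p, r = unset, unset
	for _, field := range strings.Split(value, ";") {
		key, digits, ok := strings.Cut(field, ":")
		if !ok || len(digits) != 2 {
			continue // not a two-character key:value field
		}
		n, err := strconv.ParseUint(digits, 16, 8)
		if err != nil {
			continue // not valid base16
		}
		switch key {
		case "p":
			p = int(n)
		case "r":
			r = int(n)
		}
	}
	return p, r
}

func main() {
	p, r := parseOTelValue("r:0a;p:03")
	fmt.Println(p, r) // prints "3 10"
}
```

Range checks (`p` in [0, 63], `r` in [0, 61]) are left to the caller in
this sketch.
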
 text/trace/0168-sampling-propagation.md | 35 ++++++++++++-------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 35cad9b59..ebd65c2aa 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -114,7 +114,7 @@ zeros in a random 61-bit string, specified in a way that does not
 require TraceID values to be constructed with random bits in specific
 positions or with hard requirements on their uniformity.  In
 mathematical terms, the r-value is described by a truncated geometric
-distribution having shape parameter `1/2`, listed below:
+distribution, listed below:
 
 | `r` Value        | Probability of `r-value` | Implied sampling probabilities |
 | ---------------- | ------------------------ | ----------------------         |
@@ -154,17 +154,16 @@ but not at probabilities 1-in-16 and smaller.
 
 ### Proposed `tracestate` syntax
 
-The consistent sampling randomness valuw (`r`) and and head sampling
-probability value (`p`) will be propagated using two bytes of base16 content
-for each of the two fields, as follows:
+The consistent sampling r-value (`r`) and the head sampling
+probability p-value (`p`) will be propagated using two bytes of base16
+content for each of the two fields, as follows:
 
 ```
 tracestate: otel=p:PP;r:RR
 ```
 
-where `PP` are two bytes of base16 probability value and `RR` are two
-bytes of base16 random value.  These values are omitted when they are
-unknown.
+where `PP` are two bytes of base16 p-value and `RR` are two bytes of
+base16 r-value.  These values are omitted when they are unknown.
 
 This proposal should be taken as a recommendation and will be modified
 to [match whatever format OpenTelemtry specifies for its
@@ -183,8 +182,8 @@ tracestate: otel=r:0a;p:03
 translates to
 
 ```
-base16(probability) = 03 // 1-in-4 head probability
-base16(randomness) = 0a // qualifies for 1-in-1024 or greater probability consistent sampling
+base16(p-value) = 03 // 1-in-4 head probability
+base16(r-value) = 0a // qualifies for 1-in-1024 or greater probability consistent sampling
 ```
 
 Any `TraceIDRatioBased` Sampler configured with probability 2**-9 or
@@ -216,15 +215,15 @@ with randomness value `r`, as described above, in the range [0, 61].
 If the context is not a new root, output a new `tracestate` with the
 same `r` value as the parent context.
 
-When sampled, in both cases, the context's probability value `p` is
-set to the value of `s+1` in the range [1, 63].  If the sampling
-probability is zero (the special case where `s` is undefined), use
-`p=63` the specified value for zero probability.
+When sampled, in both cases, the context's p-value `p` is set to the
+value of `s+1` in the range [1, 63].  If the sampling probability is
+zero (the special case where `s` is undefined), use `p=63`, the
+specified value for zero probability.
 
 In both cases, set the sampled bit if the outgoing `p` minus one is
 less than or equal to the outgoing `r` (i.e., `p-1 <= r`).
 
-If the context is not a new root and the incoming context's `r` value
+If the context is not a new root and the incoming context's r-value
 is not set, the implementation SHOULD notify the user of an error
 condition and follow the incoming context's `sampled` flag.
 
@@ -239,10 +238,10 @@ the W3C `sampled` flag and copies the incoming `tracestate` keys to
 the child context.
 
 The span's `log_head_adjusted_count` field is set to the incoming
-value of `p` when both `p` and `r` are defined.  When `r` is not
-defined, the span's `log_head_adjusted_count` MUST be set to 0
-indicating unknown probability, because the decision cannot be made
-consistently across the trace.
+p-value when both `p` and `r` are defined.  When `r` is not defined,
+the span's `log_head_adjusted_count` MUST be set to 0 indicating
+unknown probability, because the decision cannot be made consistently
+across the trace.
 
 ### Behavior of the `AlwaysOn` Sampler
 

From d9a4d5989568b6d0e91b3112fa2419d6f9844f14 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Thu, 9 Sep 2021 14:22:42 -0700
Subject: [PATCH 26/42] example

---
 text/trace/0168-sampling-propagation.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index ebd65c2aa..2f7be58e7 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -131,7 +131,7 @@ distribution, listed below:
 | 61               | 2**-61                   | 2**-61 and above               |
 
 Such a random variable `r` can be generated using efficient
-instructions on modern computer architectures.
+instructions on modern computer architectures, for example:
 
 ```golang
 import (

From 34ec60493e8123bd63b536e376cdc9f8ff56a906 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Thu, 9 Sep 2021 14:40:18 -0700
Subject: [PATCH 27/42] selection probability -> probabilty of r

---
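Note: the claim that every `s=*` column of the 3-bit table sums to one
expected span can be checked mechanically.  The Go sketch below is
illustrative only; the function names are hypothetical, and the `maxR`
parameter (5 in the worked example) is introduced here just to express
the truncation of the distribution.

```go
package main

import (
	"fmt"
	"math"
)

// rProbability returns the probability of r under the truncated
// geometric distribution: 2**-(r+1) for r < maxR, and 2**-maxR for the
// final value r == maxR.
func rProbability(r, maxR int) float64 {
	if r == maxR {
		return math.Ldexp(1, -maxR)
	}
	return math.Ldexp(1, -(r + 1))
}

// adjustedCount returns 2**s when a sampler configured with 2**-s
// selects this r (s <= r), and 0 otherwise.
func adjustedCount(s, r int) float64 {
	if s <= r {
		return math.Ldexp(1, s)
	}
	return 0
}

// expectedSpans sums probability times adjusted count over all r,
// which is the column sum described in the worked example.
func expectedSpans(s, maxR int) float64 {
	total := 0.0
	for r := 0; r <= maxR; r++ {
		total += rProbability(r, maxR) * adjustedCount(s, r)
	}
	return total
}

func main() {
	for s := 0; s <= 5; s++ {
		fmt.Println(s, expectedSpans(s, 5))
	}
}
```
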
 text/trace/0168-sampling-propagation.md | 32 ++++++++++++-------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 2f7be58e7..0d36338de 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -279,22 +279,22 @@ count. Thus there are six defined values of `r` and `s`.  The
 following table shows `r` and the corresponding selection probability,
 along with the calculated adjusted count for each `s`:
 
-| `r` value | `r` selection probability | `s=0` | `s=1` | `s=2` | `s=3` | `s=4` | `s=5` |
-| --        | --                        | --    | --    | --    | --    | --    | --    |
-| 0         | 1/2                       | 1     | 0     | 0     | 0     | 0     | 0     |
-| 1         | 1/4                       | 1     | 2     | 0     | 0     | 0     | 0     |
-| 2         | 1/8                       | 1     | 2     | 4     | 0     | 0     | 0     |
-| 3         | 1/16                      | 1     | 2     | 4     | 8     | 0     | 0     |
-| 4         | 1/32                      | 1     | 2     | 4     | 8     | 16    | 0     |
-| 5         | 1/32                      | 1     | 2     | 4     | 8     | 16    | 32    |
-
-Notice that the sum of `r` selection probability times adjusted count
-in each of the `s=*` columns equals 1.  For example, in the `s=4`
-column we have `0*1/2 + 0*1/4 + 0*1/8 + 0*1/16 + 16*1/32 + 16*1/32 =
-16/32 + 16/32 = 1`.  In the `s=2` column we have `0*1/2 + 0*1/4 +
-4*1/8 + 4*1/16 + 4*1/32 + 4*1/32 = 4/8 + 4/16 + 4/32 + 4/32 = 1/2 +
-1/4 + 1/8 + 1/8 = 1`.  We conclude that when `r` is chosen with the
-given probabilities, any choice of `s` produces one expected span.
+| `r` value | probability of `r` | `s=0` | `s=1` | `s=2` | `s=3` | `s=4` | `s=5` |
+| --        | --                 | --    | --    | --    | --    | --    | --    |
+| 0         | 1/2                | 1     | 0     | 0     | 0     | 0     | 0     |
+| 1         | 1/4                | 1     | 2     | 0     | 0     | 0     | 0     |
+| 2         | 1/8                | 1     | 2     | 4     | 0     | 0     | 0     |
+| 3         | 1/16               | 1     | 2     | 4     | 8     | 0     | 0     |
+| 4         | 1/32               | 1     | 2     | 4     | 8     | 16    | 0     |
+| 5         | 1/32               | 1     | 2     | 4     | 8     | 16    | 32    |
+
+Notice that the sum of `r` probability times adjusted count in each of
+the `s=*` columns equals 1.  For example, in the `s=4` column we have
+`0*1/2 + 0*1/4 + 0*1/8 + 0*1/16 + 16*1/32 + 16*1/32 = 16/32 + 16/32 =
+1`.  In the `s=2` column we have `0*1/2 + 0*1/4 + 4*1/8 + 4*1/16 +
+4*1/32 + 4*1/32 = 4/8 + 4/16 + 4/32 + 4/32 = 1/2 + 1/4 + 1/8 + 1/8 =
+1`.  We conclude that when `r` is chosen with the given probabilities,
+any choice of `s` produces one expected span.
 
 ## Invariant checking
 

From 48123fe0667c6af1d605a07ae52c26260982ea05 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Thu, 9 Sep 2021 16:17:12 -0700
Subject: [PATCH 28/42] typos

---
 text/trace/0168-sampling-propagation.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 0d36338de..bc740fa0b 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -47,9 +47,9 @@ documented below, we propose to limit head trace sampling probability
 to powers of two.  This limits the available head trace sampling
 probabilities to 1/2, 1/4, 1/8, and so on.  We can compactly encode
 these probabilities as small integer values using the base-2 logarithm
-of the adjusted count known.
+of the adjusted count.
 
-Using six bits of information we can convey known and unknown sampling
+Using six bits of information we can convey unknown and known sampling
 rates as small as 2**-61.  The value 63 is reserved to mean sampling
 with probability 0, which conveys an adjusted count of 0 for the
 associated context.

From 139f248e58352403d53ee8f6b7d5d5dd80d0af8d Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Fri, 10 Sep 2021 09:51:46 -0700
Subject: [PATCH 29/42] another example

---
 text/trace/0168-sampling-propagation.md | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index bc740fa0b..e7ce3ee69 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -131,7 +131,8 @@ distribution, listed below:
 | 61               | 2**-61                   | 2**-61 and above               |
 
 Such a random variable `r` can be generated using efficient
-instructions on modern computer architectures, for example:
+instructions on modern computer architectures.  For example, we may
+compute the number of leading zeros using hardware support:
 
 ```golang
 import (
@@ -146,6 +147,25 @@ func nextRValueLeading() int {
 }
 ```
 
+Or we may compute the number of trailing zeros, for example:
+
+```golang
+import (
+	"math/rand"
+)
+
+func nextRValueTrailing() int {
+	x := uint64(rand.Int63())
+	for r := 0; r < 61; r++ {
+		if x & 0x1 == 0x1 {
+			return r
+		}
+		x = x >> 1
+	}
+	return 61
+}
+```
+
 More examples for calculating r-values are shown in
 [here](https://gist.github.com/jmacd/79c38c1056035c52f6fff7b7fc071274).
 For example, the value 3 means there were three leading zeros and

From 2a37c4ccb542f660ccb6fabfa5af3fa6ffbeee1f Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Fri, 10 Sep 2021 10:00:38 -0700
Subject: [PATCH 30/42] off-by-ones

---
 text/trace/0168-sampling-propagation.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index e7ce3ee69..4f3fd12ba 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -206,9 +206,9 @@ base16(p-value) = 03 // 1-in-4 head probability
 base16(r-value) = 0a // qualifies for 1-in-1024 or greater probability consistent sampling
 ```
 
-Any `TraceIDRatioBased` Sampler configured with probability 2**-9 or
+Any `TraceIDRatioBased` Sampler configured with probability 2**-10 or
 greater will enable sampling this trace, whereas any
-`TraceIDRatioBased` Sampler configured with probability 2**-10 or less
+`TraceIDRatioBased` Sampler configured with probability 2**-11 or less
 will stop sampling this trace.
 
 ## Internal details

From a9c7500b6baa7f7f6867b8a9fd81bf484f82bfab Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Fri, 10 Sep 2021 12:28:05 -0700
Subject: [PATCH 31/42] discuss naming

---
 text/trace/0168-sampling-propagation.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 4f3fd12ba..8860c331b 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -378,6 +378,21 @@ were needed.
 
 ## Trade-offs and mitigations
 
+### Naming question
+
+This proposal changes the logic of the `TraceIDRatioBased` sampler,
+currently part of the OpenTelemetry specification, in a way that makes
+the name no longer meaningful.  The proposed sampler may be named
+`ConsistentSampler` and the existing `TraceIDRatioBased` sampler can
+be deprecated.
+
+Many SDKs already implement the `TraceIDRatioBased` sampler and it has
+been used for probability sampling at trace roots with arbitrary
+(i.e., not power-of-two) probabilities.  Because of this, we may keep
+the current (under-specified) `TraceIDRatioBased` sampler and rename
+it `ProbabilitySampler` to convey that it does not behave in a
+specified way with respect to the bits of the TraceID.
+
 ### Not using TraceID randomness
 
 It would be possible, if TraceID were specified to have at least 61

From b11f70e6438f6a860ae0cd18edaef09d9f4ce0e2 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@users.noreply.github.com>
Date: Fri, 10 Sep 2021 12:28:52 -0700
Subject: [PATCH 32/42] Apply suggestions from code review

Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
---
 text/trace/0168-sampling-propagation.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 8860c331b..3d27ae3ed 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -147,14 +147,14 @@ func nextRValueLeading() int {
 }
 ```
 
-Or we may compute the number of trialing zeros, for example:
+Or we may compute the number of trailing zeros instead, for example:
 
 ```golang
 import (
 	"math/rand"
 )
 
-func nextRValueLeading() int {
+func nextRValueTrailing() int {
 	x := uint64(rand.Int63())
 	for r := 0; r < 61; r++ {
 		if x & 0x1 == 0x1 {

From 2a59cfc21869cc7e30cca48c20e33c4cdf79b7ea Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Mon, 13 Sep 2021 10:03:09 -0700
Subject: [PATCH 33/42] off-by-zero

---
 text/trace/0168-sampling-propagation.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 8860c331b..16ec9a93c 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -236,7 +236,7 @@ If the context is not a new root, output a new `tracestate` with the
 same `r` value as the parent context.
 
 When sampled, in both cases, the context's p-value `p` is set to the
-value of `s+1` in the range [1, 63].  If the sampling probability is
+value of `s+1` in the range [1, 62].  If the sampling probability is
 zero (the special case where `s` is undefined), use `p=63` the
 specified value for zero probability.
 

From 3097dcbaa4c66a1d8586d250f0ef7dfe6ddb3427 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Wed, 15 Sep 2021 11:33:06 -0700
Subject: [PATCH 34/42] lint

---
 text/trace/0168-sampling-propagation.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 31908db9e..088d49046 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -12,7 +12,7 @@ head trace sampling probability associated with a context in order to
 build span-to-metrics pipelines when the built-in `ParentBased`
 Sampler is used.  Further motivation for supporting span-to-metrics
 pipelines is presented in [OTEP
-170](https://github.com/open-telemetry/oteps/pull/170).
+170](https://github.com/open-telemetry/oteps/blob/main/text/trace/0170-sampling-probability.md).
 
 A consistent trace sampling decision is one that can be carried out at
 any node in a trace, which supports collecting partial traces.
@@ -55,7 +55,7 @@ with probability 0, which conveys an adjusted count of 0 for the
 associated context.
 
 When propagated, the "p-value" as it is known will be interpreted as
-shown in the folowing table.  The p-value for known sampling
+shown in the following table.  The p-value for known sampling
 probabilities is the negative base-2 logarithm of the probability,
 offset by +1 to so that the 0 p-value can be treated as unknown (for
 backwards compatibility):
@@ -74,7 +74,7 @@ backwards compatibility):
 | 63      | 0                | Maximum encoded value                                |
 
 [Described in OTEP
-170](https://github.com/open-telemetry/oteps/pull/170), the
+170](https://github.com/open-telemetry/oteps/blob/main/text/trace/0170-sampling-probability.md), the
 `ParentBased` sampler will use the incoming context's p-value as
 specified here to set the span's `log_head_adjusted_count` field.
 
@@ -419,11 +419,11 @@ data to avoid the computational cost of hashing TraceIDs.
 
 Restricting head sampling rates to powers of two does not limit tail
 Samplers from using arbitrary probabilities.  The companion [OTEP
-170](https://github.com/open-telemetry/oteps/pull/170) has discussed
+170](https://github.com/open-telemetry/oteps/blob/main/text/trace/0170-sampling-probability.md) has discussed
 the use of a `sampler.adjusted_count` attribute that would not be
 limited to power-of-two values.  Discussion about how to represent the
 effective adjusted count for tail-sampled Spans belongs in [OTEP
-170](https://github.com/open-telemetry/oteps/pull/170), not this OTEP.
+170](https://github.com/open-telemetry/oteps/blob/main/text/trace/0170-sampling-probability.md), not this OTEP.
 
 Restricting head sampling rates to powers of two does not limit
 Samplers from using arbitrary effective probabilities over a period of
@@ -438,7 +438,7 @@ propagate the `p` value when the context is not sampled, since
 `ParentBased` samplers will not change the decision.  Although one
 use-case was docmented in Google's early Dapper system (known as
 "inflationary sampling", see
-https://github.com/open-telemetry/oteps/pull/170), the same effect can
+https://github.com/open-telemetry/oteps/blob/main/text/trace/0170-sampling-probability.md), the same effect can
 be achieved using a consistent sampling decision in this framework.
 
 ### Default behavior

From 0acc729676bac7e2eec1f14d35d749aa0117ebd9 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Wed, 15 Sep 2021 11:40:57 -0700
Subject: [PATCH 35/42] lint

---
 text/trace/0168-sampling-propagation.md | 36 ++++++++++++-------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 088d49046..970fdd81d 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -18,7 +18,7 @@ A consistent trace sampling decision is one that can be carried out at
 any node in a trace, which supports collecting partial traces.
 OpenTelemetry specifies a built-in `TraceIDRatioBased` Sampler that
 aims to accomplish this goal but was left incomplete (see a
-[TODO](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#traceidratiobased) 
+[TODO](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#traceidratiobased)
 in the v1.0 Trace specification).
 
 We propose to propagate the necessary information alongside the [W3C
@@ -37,7 +37,7 @@ sampling probability:
 
 This proposal uses 6 bits of information for each of these and does
 not depend on built-in TraceID randomness, which is not sufficiently
-specified for probability sampling at this time.  This proposal closely 
+specified for probability sampling at this time.  This proposal closely
 follows [research by Otmar Ertl](https://arxiv.org/pdf/2107.07703.pdf).
 
 ### p-value
@@ -136,14 +136,14 @@ compute the number of leading zeros using hardware support:
 
 ```golang
 import (
-	"math/rand"
-	"math/bits"
+    "math/rand"
+    "math/bits"
 )
 
 func nextRValueLeading() int {
-	x := uint64(rand.Int63()) // 63 least-significant bits are random
-	y := x << 1 | 0x7         // 61 most-significant bits are random
-	return bits.LeadingZeros64(y)
+    x := uint64(rand.Int63()) // 63 least-significant bits are random
+    y := x << 1 | 0x7         // 61 most-significant bits are random
+    return bits.LeadingZeros64(y)
 }
 ```
 
@@ -151,18 +151,18 @@ Or we may compute the number of trailing zeros instead, for example:
 
 ```golang
 import (
-	"math/rand"
+    "math/rand"
 )
 
 func nextRValueTrailing() int {
-	x := uint64(rand.Int63())
-	for r := 0; r < 61; r++ {
-		if x & 0x1 == 0x1 {
-			return r
-		}
-		x = x >> 1
-	}
-	return 61
+    x := uint64(rand.Int63())
+    for r := 0; r < 61; r++ {
+        if x & 0x1 == 0x1 {
+            return r
+        }
+        x = x >> 1
+    }
+    return 61
 }
 ```
 
@@ -415,7 +415,7 @@ to have it implemented in multiple languages.
 Hashing is also computationally expensive. This proposal uses extra
 data to avoid the computational cost of hashing TraceIDs.
 
-### Restriction to power-of-two 
+### Restriction to power-of-two
 
 Restricting head sampling rates to powers of two does not limit tail
 Samplers from using arbitrary probabilities.  The companion [OTEP
@@ -438,7 +438,7 @@ propagate the `p` value when the context is not sampled, since
 `ParentBased` samplers will not change the decision.  Although one
 use-case was docmented in Google's early Dapper system (known as
 "inflationary sampling", see
-https://github.com/open-telemetry/oteps/blob/main/text/trace/0170-sampling-probability.md), the same effect can
+[OTEP 170](https://github.com/open-telemetry/oteps/blob/main/text/trace/0170-sampling-probability.md#dappers-inflationary-sampler)), the same effect can
 be achieved using a consistent sampling decision in this framework.
 
 ### Default behavior

From fa2ded169d1e69ca51dfe9521fdfa34f739e5633 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Mon, 20 Sep 2021 23:56:31 -0700
Subject: [PATCH 36/42] Remove log_head_adjusted_count; remove the +1 bias for
 p-values; r now in [0, 62]

---
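Note (not part of the commit): with the +1 bias removed, the sampling
rule in this patch reduces to `p <= r`.  The sketch below illustrates
that rule and the `ot=r:RR;p:PP` encoding for the `r:0a` example in
the text.  It assumes a sampler configured with probability 2**-s for
`s` in [0, 62]; the helper name `sampleDecision` is hypothetical, and
the zero-probability case (`p=63`) is not handled.

```golang
package main

import "fmt"

// sampleDecision (hypothetical helper) applies the p <= r rule for a
// sampler configured with probability 2**-s, given the trace's r-value.
// It returns the sampled flag and the `ot` tracestate entry to propagate.
func sampleDecision(s, r int) (bool, string) {
	p := s // no +1 bias: p is the negative base-2 logarithm of the probability
	if p <= r {
		return true, fmt.Sprintf("ot=r:%02x;p:%02x", r, p)
	}
	// Unsampled: omit p so the head probability reads as unknown.
	return false, fmt.Sprintf("ot=r:%02x", r)
}

func main() {
	for _, s := range []int{3, 10, 11} {
		sampled, ts := sampleDecision(s, 10)
		fmt.Printf("s=%d sampled=%t tracestate: %s\n", s, sampled, ts)
	}
}
```

For `r=10`, probabilities 2**-10 and greater yield `sampled=true`,
while 2**-11 and smaller yield `sampled=false` with `p` removed.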
 text/trace/0168-sampling-propagation.md | 223 ++++++++++++------------
 1 file changed, 115 insertions(+), 108 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 970fdd81d..423c84e64 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -21,11 +21,12 @@ aims to accomplish this goal but was left incomplete (see a
 [TODO](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#traceidratiobased)
 in the v1.0 Trace specification).
 
-We propose to propagate the necessary information alongside the [W3C
-sampled flag](https://www.w3.org/TR/trace-context/#sampled-flag) using
-`tracestate` with an `otel` vendor tag, which will require
+We propose a Sampler option to propagate the necessary information
+alongside the [W3C sampled flag](https://www.w3.org/TR/trace-context/#sampled-flag)
+using `tracestate` with an `ot` vendor tag, which will require
 (separately) [specifying how the OpenTelemetry project uses
-`tracestate` itself](https://github.com/open-telemetry/opentelemetry-specification/pull/1852).
+`tracestate`
+itself](https://github.com/open-telemetry/opentelemetry-specification/pull/1852).
 
 ## Explanation
 
@@ -35,10 +36,11 @@ sampling probability:
 1. p-value representing the head trace sampling probability.
 2. r-value representing the "randomness" as the source of consistent sampling decisions.
 
-This proposal uses 6 bits of information for each of these and does
-not depend on built-in TraceID randomness, which is not sufficiently
-specified for probability sampling at this time.  This proposal closely
-follows [research by Otmar Ertl](https://arxiv.org/pdf/2107.07703.pdf).
+This proposal uses 6 bits of information to propagate each of these
+and does not depend on built-in TraceID randomness, which is not
+sufficiently specified for probability sampling at this time.  This
+proposal closely follows [research by Otmar
+Ertl](https://arxiv.org/pdf/2107.07703.pdf).
 
 ### p-value
 
@@ -49,38 +51,39 @@ probabilities to 1/2, 1/4, 1/8, and so on.  We can compactly encode
 these probabilities as small integer values using the base-2 logarithm
 of the adjusted count.
 
-Using six bits of information we can convey unknown and known sampling
-rates as small as 2**-61.  The value 63 is reserved to mean sampling
-with probability 0, which conveys an adjusted count of 0 for the
-associated context.
+Using six bits of information we can convey known sampling rates as
+small as 2**-62.  The value 63 is reserved to mean sampling with
+probability 0, which conveys an adjusted count of 0 for the associated
+context.
 
 When propagated, the "p-value" as it is known will be interpreted as
 shown in the following table.  The p-value for known sampling
-probabilities is the negative base-2 logarithm of the probability,
-offset by +1 to so that the 0 p-value can be treated as unknown (for
-backwards compatibility):
-
-| p-value | Head Probability | Note                                                 |
-| -----   | -----------      | ----                                                 |
-| 0       | Unknown          | Do not propagate `p=0`, instead omit from tracestate |
-| 1       | 1                |                                                      |
-| 2       | 1/2              |                                                      |
-| 3       | 1/4              |                                                      |
-| ...     | ...              |                                                      |
-| N       | 2**(-N+1)        | 1 in 2**(N-1)                                        |
-| ...     | ...              |                                                      |
-| 61      | 2**-60           |                                                      |
-| 62      | 2**-61           |                                                      |
-| 63      | 0                | Maximum encoded value                                |
-
-[Described in OTEP
-170](https://github.com/open-telemetry/oteps/blob/main/text/trace/0170-sampling-probability.md), the
-`ParentBased` sampler will use the incoming context's p-value as
-specified here to set the span's `log_head_adjusted_count` field.
-
-The value `p=0` SHOULD NOT be propagated using `tracestate`
-explicitly, because the equivalent interpretation can be obtained by
-omitting `p`.
+probabilities is the negative base-2 logarithm of the probability:
+
+| p-value | Head Probability |
+| -----   | -----------      |
+| 0       | 1                |
+| 1       | 1/2              |
+| 2       | 1/4              |
+| ...     | ...              |
+| N       | 2**-N            |
+| ...     | ...              |
+| 61      | 2**-61           |
+| 62      | 2**-62           |
+| 63      | 0                |
+
+[As specified in OTEP 170 for the Trace data
+model](https://github.com/open-telemetry/oteps/blob/main/text/trace/0170-sampling-probability.md),
+head sampling probability can be stored in exported Span data to
+enable span-to-metrics pipelines to be built.  Because `tracestate` is
+already encoded in the OpenTelemetry Span, this proposal requires
+no changes to the Span protocol.  Accepting this proposal means the
+p-value can be derived from `tracestate` when the head sampling
+probability is known.
+
+An unknown value for `p` cannot be propagated using `tracestate`
+explicitly; simply omitting `p` conveys an unknown head sampling
+probability.
 
 ### r-value
 
@@ -110,25 +113,25 @@ significant two bits are zero.  In general, with exact probability
 this example scenario.
 
 The r-value specified here directly describes the number of leading
-zeros in a random 61-bit string, specified in a way that does not
+zeros in a random 62-bit string, specified in a way that does not
 require TraceID values to be constructed with random bits in specific
 positions or with hard requirements on their uniformity.  In
 mathematical terms, the r-value is described by a truncated geometric
 distribution, listed below:
 
-| `r` Value        | Probability of `r-value` | Implied sampling probabilities |
+| `r` value        | Probability of `r` value | Implied sampling probabilities |
 | ---------------- | ------------------------ | ----------------------         |
 | 0                | 1/2                      | 1                              |
 | 1                | 1/4                      | 1/2 and above                  |
 | 2                | 1/8                      | 1/4 and above                  |
 | 3                | 1/16                     | 1/8 and above                  |
 | ...              | ...                      | ...                            |
-| 0 <= `r` <= 60   | 1/(2**(-`r`-1))          | 2**-`r` and above              |
+| 0 <= r <= 61     | 2**(-r-1)                | 2**(-r) and above              |
 | ...              | ...                      | ...                            |
-| 58               | 2**-59                   | 2**-58 and above               |
 | 59               | 2**-60                   | 2**-59 and above               |
 | 60               | 2**-61                   | 2**-60 and above               |
-| 61               | 2**-61                   | 2**-61 and above               |
+| 61               | 2**-62                   | 2**-61 and above               |
+| 62               | 2**-62                   | 2**-62 and above               |
 
 Such a random variable `r` can be generated using efficient
 instructions on modern computer architectures, for example we may
@@ -147,7 +150,8 @@ func nextRValueLeading() int {
 }
 ```
 
-Or we may compute the number of trailing zeros instead, for example:
+Or we may compute the number of trailing zeros instead, for example
+(not using special instructions):
 
 ```golang
 import (
@@ -179,7 +183,7 @@ probability p-value (`p`) will be propagated using two bytes of base16
 content for each of the two fields, as follows:
 
 ```
-tracestate: otel=p:PP;r:RR
+tracestate: ot=p:PP;r:RR
 ```
 
 where `PP` are two bytes of base16 p-value and `RR` are two bytes of
@@ -193,23 +197,30 @@ chosen because `traceparent` uses base16 encoding.
 
 ### Examples
 
-The following `tracestate` value:
+The following `tracestate` value is accompanied by `sampled=true`:
 
 ```
-tracestate: otel=r:0a;p:03
+tracestate: ot=r:0a;p:03
 ```
 
-translates to
+and translates to
 
 ```
-base16(p-value) = 03 // 1-in-4 head probability
+base16(p-value) = 03 // 1-in-8 head probability
 base16(r-value) = 0a // qualifies for 1-in-1024 or greater probability consistent sampling
 ```
 
-Any `TraceIDRatioBased` Sampler configured with probability 2**-10 or
-greater will enable sampling this trace, whereas any
-`TraceIDRatioBased` Sampler configured with probability 2**-11 or less
-will stop sampling this trace.
+A `ParentBased` Sampler will include `ot=r:0a;p:03` in the stored
+`TraceState` field, allowing consumers to count it with an adjusted
+count of 8 spans.  The `sampled=true` flag remains set.
+
+A `TraceIDRatioBased` Sampler configured with probability 2**-10 or
+greater will enable `sampled=true` and convey a new head sampling
+probability, for example `tracestate: ot=r:0a;p:0a` for 2**-10.
+
+A `TraceIDRatioBased` Sampler configured with probability 2**-11 or
+smaller will set `sampled=false` and remove `p` from the tracestate,
+setting `tracestate: ot=r:0a`.
 
 ## Internal details
 
@@ -236,32 +247,27 @@ If the context is not a new root, output a new `tracestate` with the
 same `r` value as the parent context.
 
 When sampled, in both cases, the context's p-value `p` is set to the
-value of `s+1` in the range [1, 62].  If the sampling probability is
+value of `s` in the range [0, 62].  If the sampling probability is
 zero (the special case where `s` is undefined), use `p=63` the
 specified value for zero probability.
 
-In both cases, set the sampled bit if the outgoing `p` minus one is
-less than or equal to the outgoing `r` (i.e., `p-1 <= r`).
+In both cases, set the sampled bit if the outgoing `p` is less than or
+equal to the outgoing `r` (i.e., `p <= r`).
 
 If the context is not a new root and the incoming context's r-value
 is not set, the implementation SHOULD notify the user of an error
 condition and follow the incoming context's `sampled` flag.
 
-The span's `log_head_adjusted_count` field is set to the outgoing `p`
-unless `r` is unknown, in which case it MUST be set to zero (unknown
-probability).
-
 ### Behavior of the `ParentBased` sampler
 
-The `ParentBased` sampler is modified by this proposal.  It honors
+The `ParentBased` sampler is unmodified by this proposal.  It honors
 the W3C `sampled` flag and copies the incoming `tracestate` keys to
-the child context.
+the child context.  If the incoming context has known head sampling
+probability, so does the Span.
 
-The span's `log_head_adjusted_count` field is set to the incoming
-p-value when both `p` and `r` are defined.  When `r` is not defined,
-the span's `log_head_adjusted_count` MUST be set to 0 indicating
-unknown probability, because the decision cannot be made consistently
-across the trace.
+The span's head probability is known when both `p` and `r` are defined
+in the `ot` sub-key of `tracestate`.  When `r` or `p` is not defined,
+the span's head sampling probability is unknown.
 
 ### Behavior of the `AlwaysOn` Sampler
 
@@ -279,19 +285,17 @@ The behavior of these tables can be verified by hand using a smaller
 example.  The following table shows how these equations work where
 `r`, `p`, and `s` are limited to 3 bits instead of 6 bits.
 
-Values of `p`, which have the same encoded value and interpretation as
-for the proposed `log_head_adjusted_count` field of OTEP 170, would be
-interpreted as follows:
+Values of `p` are interpreted as follows:
 
 | `p` value | Adjusted count |
 | -----     | -----          |
-| 0         | Unknown        |
-| 1         | 1              |
-| 2         | 2              |
-| 3         | 4              |
-| 4         | 8              |
-| 5         | 16             |
-| 6         | 32             |
+| 0         | 1              |
+| 1         | 2              |
+| 2         | 4              |
+| 3         | 8              |
+| 4         | 16             |
+| 5         | 32             |
+| 6         | 64             |
 | 7         | 0              |
 
 Note there are only 6 non-zero, non-unknown values for the adjusted
@@ -299,22 +303,24 @@ count. Thus there are six defined values of `r` and `s`.  The
 following table shows `r` and the corresponding selection probability,
 along with the calculated adjusted count for each `s`:
 
-| `r` value | probability of `r` | `s=0` | `s=1` | `s=2` | `s=3` | `s=4` | `s=5` |
-| --        | --                 | --    | --    | --    | --    | --    | --    |
-| 0         | 1/2                | 1     | 0     | 0     | 0     | 0     | 0     |
-| 1         | 1/4                | 1     | 2     | 0     | 0     | 0     | 0     |
-| 2         | 1/8                | 1     | 2     | 4     | 0     | 0     | 0     |
-| 3         | 1/16               | 1     | 2     | 4     | 8     | 0     | 0     |
-| 4         | 1/32               | 1     | 2     | 4     | 8     | 16    | 0     |
-| 5         | 1/32               | 1     | 2     | 4     | 8     | 16    | 32    |
+| `r` value | probability of `r` | `s=0` | `s=1` | `s=2` | `s=3` | `s=4` | `s=5` | `s=6` |
+| --        | --                 | --    | --    | --    | --    | --    | --    | --    |
+| 0         | 1/2                | 1     | 0     | 0     | 0     | 0     | 0     | 0     |
+| 1         | 1/4                | 1     | 2     | 0     | 0     | 0     | 0     | 0     |
+| 2         | 1/8                | 1     | 2     | 4     | 0     | 0     | 0     | 0     |
+| 3         | 1/16               | 1     | 2     | 4     | 8     | 0     | 0     | 0     |
+| 4         | 1/32               | 1     | 2     | 4     | 8     | 16    | 0     | 0     |
+| 5         | 1/64               | 1     | 2     | 4     | 8     | 16    | 32    | 0     |
+| 6         | 1/64               | 1     | 2     | 4     | 8     | 16    | 32    | 64    |
 
 Notice that the sum of `r` probability times adjusted count in each of
 the `s=*` columns equals 1.  For example, in the `s=4` column we have
-`0*1/2 + 0*1/4 + 0*1/8 + 0*1/16 + 16*1/32 + 16*1/32 = 16/32 + 16/32 =
-1`.  In the `s=2` column we have `0*1/2 + 0*1/4 + 4*1/8 + 4*1/16 +
-4*1/32 + 4*1/32 = 4/8 + 4/16 + 4/32 + 4/32 = 1/2 + 1/4 + 1/8 + 1/8 =
-1`.  We conclude that when `r` is chosen with the given probabilities,
-any choice of `s` produces one expected span.
+`0*1/2 + 0*1/4 + 0*1/8 + 0*1/16 + 16*1/32 + 16*1/64 + 16*1/64 =
+16/32 + 16/64 + 16/64 = 1`.  In the `s=2` column we have `0*1/2 +
+0*1/4 + 4*1/8 + 4*1/16 + 4*1/32 + 4*1/64 + 4*1/64 = 4/8 + 4/16 +
+4/32 + 4/64 + 4/64 = 1/2 + 1/4 + 1/8 + 1/16 + 1/16 = 1`.  We conclude
+that when `r` is chosen with the given probabilities, any choice of
+`s` produces one expected span.
 
 ## Invariant checking
 
@@ -325,11 +331,11 @@ respect to the incoming and outgoing values for `p`, `r`, and
 | Sampler                | Incoming `r` | Incoming `p` | Incoming `sampled` | Outgoing `r`               | Outgoing `p`               | Outgoing `sampled`         |
 | --                     | --           | --           | --                 | --                         | --                         | --                         |
 | Parent                 | unused       | expected     | respected          | checked and passed through | checked and passed through | checked and passed through |
-| TraceIDRatio(Non-Root) | used         | unused       | ignored            | checked and passed through | set to `s+1`               | set to `p-1 <= r`          |
-| TraceIDRatio(Root)     | n/a          | n/a          | n/a                | random variable            | set to `s+1`               | set to `p-1 <= r`          |
+| TraceIDRatio(Non-Root) | used         | unused       | ignored            | checked and passed through | set to `s`                 | set to `p <= r`            |
+| TraceIDRatio(Root)     | n/a          | n/a          | n/a                | random variable            | set to `s`                 | set to `p <= r`            |
 
-There are several cases where the resulting span's
-`log_head_adjusted_count` is unknown:
+There are several cases where the resulting span's head sampling
+probability is unknown:
 
 | Sampler                | Unknown condition |
 | --                     | --                |
@@ -344,30 +350,28 @@ The inputs are recognized as out-of-range as follows:
 | `p < 0`         | drop `p` from tracestate         |
 | `p > 63`        | drop `p` from tracestate         |
 | `r < 0`         | drop `r` and `p` from tracestate |
-| `r > 61`        | drop `r` and `p` from tracestate |
+| `r > 62`        | drop `r` and `p` from tracestate |
 
 There are cases where the combination of `p` and `r` and `sampled` are
 inconsistent with each other.  The `sampled` flag is equivalent to the
-expression `p - 1 <= r`.  When the invariant `sampled <=> p - 1 <= r`
-is violated, the `ParentBased` sampler MUST correct the propagated
-values as discussed below.
+expression `p <= r`.  When the invariant `sampled <=> p <= r` is
+violated, the `ParentBased` sampler MUST correct the propagated values
+as discussed below.
 
 The violation is always addressed by honoring the `sampled` flag and
-setting `log_head_adjusted_count` to either 0 (Unknown) or 63 (Zero).
+correcting `p` to either 63 (for zero adjusted count) or unset (for
+unknown adjusted count).
 
-If `sampled` is false and the invariant is bilated, drop `p` from the
-outgoing context to convey unknown head probability.  Set
-`log_head_adjusted_count` to 0.
+If `sampled` is false and the invariant is violated, drop `p` from the
+outgoing context to convey unknown head probability.
 
 The case where `sampled` is true with `p=63` indicating 0% probability
 may by regarded as a special case to allow zero adjusted count
 sampling, which permits non-probabilistic sampling to take place in
-the presence of probability sampling.  Set `log_head_adjusted_count`
-to 63.
+the presence of probability sampling.  Set `p` to 63.
 
-If `sampled` is true with `p<63`, drop `p` from the outgoing context
-to convey unknown head probability.  Set `log_head_adjusted_count` to
-0.
+If `sampled` is true with `p<63` (but `p>r`), drop `p` from the
+outgoing context to convey unknown head probability.
 
 ## Prototype
 
@@ -450,5 +454,8 @@ of `r` and the setting and propagating of `p` in the tracestate.  If
 opt-out, users would have to disable these features to turn them off.
 The cost and convenience of Sampling features depend on this choice.
 
-This author's recommendation is that these behaviors be opt-out, i.e.,
-on-by-default.  This decision should not block this OTEP.
+This author's recommendation is that these behaviors be opt-in at
+first in order to demonstrate their usefulness.  If they prove
+successful, an on-by-default approach could be proposed using a
+modified W3C trace context `traceparent`, as this would allow p-values
+to be propagated cheaply.

From d119c577bd3248a8dc242a3c3edcff60345861bf Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Tue, 21 Sep 2021 10:39:34 -0700
Subject: [PATCH 37/42] Use 7/16

---
 text/trace/0168-sampling-propagation.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 423c84e64..9295f024e 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -431,8 +431,8 @@ effective adjusted count for tail-sampled Spans belongs in [OTEP
 
 Restricting head sampling rates to powers of two does not limit
 Samplers from using arbitrary effective probabilities over a period of
-time.  For example, choosing 1/2 sampling half of the time and 1/4
-sampling half of the time leads to an effective sampling rate of 3/8.
+time.  For example, choosing 1/2 sampling 3/4 of the time and 1/4
+sampling 1/4 of the time leads to an effective sampling rate of 7/16.
 
 ### Propagating `p` when unsampled
 

From 5ea047e7e155a8d9aa2d1a59c44436f42b7ae8ee Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Tue, 21 Sep 2021 10:40:49 -0700
Subject: [PATCH 38/42] Use 7/16

---
 text/trace/0168-sampling-propagation.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 9295f024e..248951e5b 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -431,8 +431,12 @@ effective adjusted count for tail-sampled Spans belongs in [OTEP
 
 Restricting head sampling rates to powers of two does not limit
 Samplers from using arbitrary effective probabilities over a period of
-time.  For example, choosing 1/2 sampling 3/4 of the time and 1/4
-sampling 1/4 of the time leads to an effective sampling rate of 7/16.
+time.  For example, choosing 1/2 sampling or 1/4 in proportion 3:1
+leads to an effective sampling rate
+
+```
+1/2 * 0.75 + 1/4 * 0.25 = 7/16
+```
 
 ### Propagating `p` when unsampled
 

From 28779fe2b693751c9fb54e49493218264d0090c5 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Tue, 21 Sep 2021 10:40:57 -0700
Subject: [PATCH 39/42] Use 7/16

---
 text/trace/0168-sampling-propagation.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 248951e5b..c9082134d 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -432,7 +432,7 @@ effective adjusted count for tail-sampled Spans belongs in [OTEP
 Restricting head sampling rates to powers of two does not limit
 Samplers from using arbitrary effective probabilities over a period of
 time.  For example, choosing 1/2 sampling or 1/4 in proportion 3:1
-leads to an effective sampling rate
+leads to an effective sampling rate of:
 
 ```
 1/2 * 0.75 + 1/4 * 0.25 = 7/16

From 32c384e33ac930ad112b6fd88e04acc18695ad30 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Mon, 27 Sep 2021 22:08:31 -0700
Subject: [PATCH 40/42] 5%

---
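Note on the arithmetic: a minimal sketch, not part of the OTEP text, of
how a mix of two adjacent power-of-two rates yields an arbitrary
effective rate. The helper names `effective_rate` and `mix_for_target`
are hypothetical, and the sketch assumes the mix is applied as a share
of sampling decisions over time.

```python
from fractions import Fraction


def effective_rate(mix):
    """Return the long-run sampling rate of a mix.

    `mix` is a list of (probability, share_of_time) pairs whose shares
    sum to 1. Exact rational arithmetic avoids float rounding.
    """
    mix = [(Fraction(p), Fraction(w)) for p, w in mix]
    if sum(w for _, w in mix) != 1:
        raise ValueError("shares of time must sum to 1")
    return sum(p * w for p, w in mix)


def mix_for_target(target):
    """Split a target rate in (0, 1] across two adjacent powers of two.

    Returns [(hi, share_hi), (lo, share_lo)] where lo < target <= hi,
    hi = 2**-k and lo = 2**-(k+1).
    """
    target = Fraction(target)
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    k = 0
    # Advance k until 2**-(k+1) drops below the target.
    while Fraction(1, 2 ** (k + 1)) >= target:
        k += 1
    hi = Fraction(1, 2 ** k)
    lo = Fraction(1, 2 ** (k + 1))
    # Solve hi * s + lo * (1 - s) = target for s.
    share_hi = (target - lo) / (hi - lo)
    return [(hi, share_hi), (lo, 1 - share_hi)]
```

For a 1-in-20 target, `mix_for_target(Fraction(1, 20))` selects 1/16
and 1/32 with shares 3/5 and 2/5, matching the 60%/40% split used in
this change. The same helper gives the 3:1 split of 1/2 and 1/4 for
the earlier 7/16 example.
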
 text/trace/0168-sampling-propagation.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index c9082134d..686be273c 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -431,11 +431,12 @@ effective adjusted count for tail-sampled Spans belongs in [OTEP
 
 Restricting head sampling rates to powers of two does not limit
 Samplers from using arbitrary effective probabilities over a period of
-time.  For example, choosing 1/2 sampling or 1/4 in proportion 3:1
-leads to an effective sampling rate of:
+time.  For example, a typical trace sampling rate of 5% (i.e., 1 in
+20) can be accomplished by choosing 1/16 sampling 60% of the time and
+1/32 sampling 40% of the time:
 
 ```
-1/2 * 0.75 + 1/4 * 0.25 = 7/16
+1/16 * 0.6 + 1/32 * 0.4 = 0.05
 ```
 
 ### Propagating `p` when unsampled

From f6ffd02a781f6e17c6eb316fbdaac34638d1e898 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Mon, 27 Sep 2021 22:14:26 -0700
Subject: [PATCH 41/42] mention w3c trace context issue 467 (randomness bit);
 move issue 463 to default-on discussion

---
 text/trace/0168-sampling-propagation.md | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 686be273c..52cf26a87 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -406,7 +406,8 @@ as the number of leading zeros among those 61 random bits.
 However, this would require modifying the W3C traceparent specification,
 therefore we do not propose to use bits of the TraceID.
 
-[This issue has been filed with the W3C trace context group.](https://github.com/w3c/trace-context/issues/463)
+See [W3C
+trace context issue 467](https://github.com/w3c/trace-context/issues/467).
 
 ### Not using TraceID hashing
 
@@ -417,7 +418,7 @@ task to define and specify a good enough hashing function, much less
 to have it implemented in multiple languages.
 
 Hashing is also computationally expensive. This proposal uses extra
-data to avoid the computational cost of hashing TraceIDs.
+data to avoid the computational cost of hashing TraceIDs.  
 
 ### Restriction to power-of-two
 
@@ -464,3 +465,9 @@ first in order to demonstrate their usefulness.  If it proves
 successful, an on-by-default approach could be proposed using a
 modified W3C trace context `traceparent`, as this would allow p-values
 to be propagated cheaply.
+
+See [W3C issue trace context issue
+463](https://github.com/w3c/trace-context/issues/463) which is about
+propagating sampling probability in the `traceparent` header, which
+makes it cheap enough to have on-by-default.
+

From 0a296b5f08d51ea25fe627748985f93a69fc3a8b Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <jmacd@lightstep.com>
Date: Wed, 29 Sep 2021 10:31:51 -0700
Subject: [PATCH 42/42] whitespace

---
 text/trace/0168-sampling-propagation.md | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/text/trace/0168-sampling-propagation.md b/text/trace/0168-sampling-propagation.md
index 52cf26a87..3a1ee490c 100644
--- a/text/trace/0168-sampling-propagation.md
+++ b/text/trace/0168-sampling-propagation.md
@@ -22,7 +22,7 @@ aims to accomplish this goal but was left incomplete (see a
 in the v1.0 Trace specification).
 
 We propose a Sampler option to propagate the necessary information
-alongside the [W3C sampled flag](https://www.w3.org/TR/trace-context/#sampled-flag) 
+alongside the [W3C sampled flag](https://www.w3.org/TR/trace-context/#sampled-flag)
 using `tracestate` with an `ot` vendor tag, which will require
 (separately) [specifying how the OpenTelemetry project uses
 `tracestate`
@@ -216,7 +216,7 @@ count of 8 spans.  The `sampled=true` flag remains set.
 
 A `TraceIDRatioBased` Sampler configured with probability 2**-10 or
 greater will enable `sampled=true` and convey a new head sampling
-probability via `tracestate: ot=r:0a;p:0a`. 
+probability via `tracestate: ot=r:0a;p:0a`.
 
 A `TraceIDRatioBased` Sampler configured with probability 2**-11 or
 smaller will set `sampled=false` and remove `p` from the tracestate,
@@ -418,7 +418,7 @@ task to define and specify a good enough hashing function, much less
 to have it implemented in multiple languages.
 
 Hashing is also computationally expensive. This proposal uses extra
-data to avoid the computational cost of hashing TraceIDs.  
+data to avoid the computational cost of hashing TraceIDs.
 
 ### Restriction to power-of-two
 
@@ -470,4 +470,3 @@ See [W3C issue trace context issue
 463](https://github.com/w3c/trace-context/issues/463) which is about
 propagating sampling probability in the `traceparent` header, which
 makes it cheap enough to have on-by-default.
-