fix(opentelemetry): Fix span & sampling propagation (#11092)
OK, this was a tricky one, but I _think_ it now works as expected.

This PR fixes two fundamental issues with sampling & propagation that
were uncovered by @Lms24 & myself while trying to use OTEL for remix &
sveltekit:

1. `continueTrace` updates the propagation context, but if there is an
active parent span (even a remote one) this is ignored.
2. Sampling inheritance did not work as expected, due to the fact that
OTEL spans cannot differentiate between `sampled=false` (sampled to be
not recorded) and `sampled=undefined` (no sampling decision yet).

## Update to `continueTrace` & trace propagation

My first instinct was to make the trace methods ignore a remote span and
look at the propagation context instead. This has a bunch of problems,
though: for example, the two can get out of sync if the span is set from
outside.

So instead, I now provide a custom `continueTrace` method from
`@sentry/opentelemetry` & `@sentry/node` which should be used instead of
the core one in meta SDKs. This method will, in addition to updating the
propagation context, _also_ create a remote span with the passed in
data, and make it the active span in the callback.

Then, I updated the OTEL start span APIs to always use the active span
if one exists (which was already the behavior we had), and added a
fallback: if there is no active span at all (not even a remote one),
_then_ we look at the propagation context.
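
The lookup order described above can be sketched roughly as follows. This is a minimal, self-contained model and not the actual SDK code: `SketchSpanContext`, `SketchPropagationContext`, `resolveParent` and `continueTraceSketch` are hypothetical names used only for illustration, and the real implementation works on OTEL `Context` objects instead of plain values.

```ts
// Simplified stand-ins for the real OTEL / Sentry types (assumptions for illustration only).
interface SketchSpanContext {
  traceId: string;
  spanId: string;
  isRemote: boolean;
}

interface SketchPropagationContext {
  traceId: string;
  spanId: string;
}

interface ResolvedParent {
  traceId: string;
  parentSpanId: string;
  source: 'active-span' | 'propagation-context';
}

// Hypothetical helper: decide what a newly started span should attach to.
function resolveParent(
  activeSpan: SketchSpanContext | undefined,
  propagationContext: SketchPropagationContext,
): ResolvedParent {
  // 1. An active span always wins, even if it is only a remote one.
  if (activeSpan) {
    return { traceId: activeSpan.traceId, parentSpanId: activeSpan.spanId, source: 'active-span' };
  }

  // 2. No active span at all: fall back to the propagation context.
  return {
    traceId: propagationContext.traceId,
    parentSpanId: propagationContext.spanId,
    source: 'propagation-context',
  };
}

// Hypothetical stand-in for the custom `continueTrace`: it turns the incoming
// trace data into a remote span that the callback sees as the active span.
function continueTraceSketch<T>(
  incoming: { traceId: string; parentSpanId: string },
  callback: (activeSpan: SketchSpanContext) => T,
): T {
  const remoteSpan: SketchSpanContext = {
    traceId: incoming.traceId,
    spanId: incoming.parentSpanId,
    isRemote: true,
  };
  return callback(remoteSpan);
}
```

In this model, a span started inside `continueTraceSketch` picks up the incoming trace via the remote span, while a span started with no active span falls back to the scope's propagation context.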

## Update to sampling inheritance

Previously, we basically did the following:

```ts
const sampled: boolean | undefined = spanContext.traceFlags === TraceFlags.SAMPLED;
// this will always be true or false, never undefined
```

Which means that if we create a remote span from a minimal propagation
context:

```ts
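// This could be a generated propagation context from a scope.
// Note that `sampled` is not set here.
const propagationContext: PropagationContext = { spanId: 'xxx', traceId: 'yyy' };

const spanContext: SpanContext = {
  traceId: propagationContext.traceId,
  spanId: propagationContext.spanId,
  // `sampled` is undefined, so this ends up as TraceFlags.NONE
  traceFlags: propagationContext.sampled ? TraceFlags.SAMPLED : TraceFlags.NONE,
};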
```

We would later always get `sampled: false`, and inherit this decision
for all downstream spans - instead of treating it as `undefined`, and
going through the sampler, as we actually want it to.

In order to "solve" this, I added a new trace state
`SENTRY_TRACE_STATE_SAMPLED_NOT_RECORDING`, which we set if we _know_
this is actually `sampled: false`, and not just unset.

Then, based on this we can interpret `sampled` as being `false` or
`undefined`, respectively.

This is a bit hacky but should work - it means that if we get an
unsampled decision from outside (without this trace state), we'll treat
it as `undefined`, which is OK I would say. Our own sampler sets the
trace state correctly, and so does our propagator, so we inherit
correctly in both cases.
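
To illustrate the idea, here is a minimal, self-contained sketch of how a tri-state `sampled` value can round-trip through a two-state trace flag plus a trace state marker. It uses plain local stand-ins instead of the real OTEL types: `FLAG_NONE`, `FLAG_SAMPLED`, `SamplingSpanContext`, `encodeSampled` and `decodeSampled` are hypothetical names for illustration, and the sketch leaves out the DSC fallback that the actual `getSamplingDecision` in this PR also performs.

```ts
// Local stand-ins for OTEL's trace flags (assumptions for illustration only).
const FLAG_NONE = 0;
const FLAG_SAMPLED = 1;

// Same key as SENTRY_TRACE_STATE_SAMPLED_NOT_RECORDING in this PR.
const SAMPLED_NOT_RECORDING_KEY = 'sentry.sampled_not_recording';

// Simplified span context: the trace state is modelled as a plain string map.
interface SamplingSpanContext {
  traceFlags: number;
  traceState: Record<string, string>;
}

// Encode `true | false | undefined` into the flag plus an optional marker.
function encodeSampled(sampled: boolean | undefined): SamplingSpanContext {
  const traceState: Record<string, string> = {};

  // Only an explicit negative decision gets the marker.
  if (sampled === false) {
    traceState[SAMPLED_NOT_RECORDING_KEY] = '1';
  }

  return { traceFlags: sampled ? FLAG_SAMPLED : FLAG_NONE, traceState };
}

// Decode the flag and marker back into `true | false | undefined`.
function decodeSampled(ctx: SamplingSpanContext): boolean | undefined {
  if (ctx.traceFlags === FLAG_SAMPLED) {
    return true;
  }

  if (ctx.traceState[SAMPLED_NOT_RECORDING_KEY] === '1') {
    return false;
  }

  // The flag is NONE and there is no marker: no sampling decision was made yet.
  return undefined;
}
```

A span context coming from outside with an unset flag and no marker decodes to `undefined` in this sketch, so it would go through the sampler instead of inheriting a negative decision.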

---------

Co-authored-by: Lukas Stracke <lukas.stracke@sentry.io>
mydea and Lms24 authored Mar 14, 2024
1 parent a76e4a2 commit 0f1ea2f
Showing 16 changed files with 603 additions and 79 deletions.
7 changes: 2 additions & 5 deletions packages/astro/src/index.types.ts
@@ -28,11 +28,8 @@ export declare const getActiveSpan: typeof clientSdk.getActiveSpan;
// eslint-disable-next-line deprecation/deprecation
export declare const getCurrentHub: typeof clientSdk.getCurrentHub;
export declare const getClient: typeof clientSdk.getClient;
export declare const startSpan: typeof clientSdk.startSpan;
export declare const startInactiveSpan: typeof clientSdk.startInactiveSpan;
export declare const startSpanManual: typeof clientSdk.startSpanManual;
export declare const withActiveSpan: typeof clientSdk.withActiveSpan;
export declare const getRootSpan: typeof clientSdk.getRootSpan;
export declare const continueTrace: typeof clientSdk.continueTrace;

export declare const Span: clientSdk.Span;

export declare const metrics: typeof clientSdk.metrics & typeof serverSdk.metrics;
3 changes: 1 addition & 2 deletions packages/core/src/utils/spanUtils.ts
@@ -162,8 +162,7 @@ export function spanIsSampled(span: Span): boolean {
// We align our trace flags with the ones OpenTelemetry use
// So we also check for sampled the same way they do.
const { traceFlags } = span.spanContext();
// eslint-disable-next-line no-bitwise
return Boolean(traceFlags & TRACE_FLAG_SAMPLED);
return traceFlags === TRACE_FLAG_SAMPLED;
}

/** Get the status message to use for a JSON representation of a span. */
5 changes: 4 additions & 1 deletion packages/node-experimental/src/index.ts
@@ -47,6 +47,10 @@ export {
extractRequestData,
} from '@sentry/utils';

// These are custom variants that need to be used instead of the core one
// As they have slightly different implementations
export { continueTrace } from '@sentry/opentelemetry';

export {
addBreadcrumb,
isInitialized,
@@ -78,7 +82,6 @@ export {
setCurrentClient,
Scope,
setMeasurement,
continueTrace,
getSpanDescendants,
parameterize,
getCurrentScope,
10 changes: 8 additions & 2 deletions packages/node-experimental/test/integration/scope.test.ts
@@ -65,6 +65,8 @@ describe('Integration | Scope', () => {
trace: {
span_id: spanId,
trace_id: traceId,
// local span ID from propagation context
...(enableTracing ? { parent_span_id: expect.any(String) } : undefined),
},
}),
}),
@@ -110,6 +112,8 @@ describe('Integration | Scope', () => {
status: 'ok',
trace_id: traceId,
origin: 'manual',
// local span ID from propagation context
parent_span_id: expect.any(String),
},
}),
spans: [],
@@ -194,7 +198,8 @@ describe('Integration | Scope', () => {
? {
span_id: spanId1,
trace_id: traceId1,
parent_span_id: undefined,
// local span ID from propagation context
...(enableTracing ? { parent_span_id: expect.any(String) } : undefined),
}
: expect.any(Object),
}),
@@ -220,7 +225,8 @@ describe('Integration | Scope', () => {
? {
span_id: spanId2,
trace_id: traceId2,
parent_span_id: undefined,
// local span ID from propagation context
...(enableTracing ? { parent_span_id: expect.any(String) } : undefined),
}
: expect.any(Object),
}),
@@ -267,6 +267,8 @@ describe('Integration | Transactions', () => {
status: 'ok',
trace_id: expect.any(String),
origin: 'auto.test',
// local span ID from propagation context
parent_span_id: expect.any(String),
},
}),
spans: [expect.any(Object), expect.any(Object)],
@@ -312,6 +314,8 @@ describe('Integration | Transactions', () => {
status: 'ok',
trace_id: expect.any(String),
origin: 'manual',
// local span ID from propagation context
parent_span_id: expect.any(String),
},
}),
spans: [expect.any(Object), expect.any(Object)],
4 changes: 3 additions & 1 deletion packages/opentelemetry/src/constants.ts
@@ -2,8 +2,10 @@ import { createContextKey } from '@opentelemetry/api';

export const SENTRY_TRACE_HEADER = 'sentry-trace';
export const SENTRY_BAGGAGE_HEADER = 'baggage';
export const SENTRY_TRACE_STATE_DSC = 'sentry.trace';

export const SENTRY_TRACE_STATE_DSC = 'sentry.dsc';
export const SENTRY_TRACE_STATE_PARENT_SPAN_ID = 'sentry.parent_span_id';
export const SENTRY_TRACE_STATE_SAMPLED_NOT_RECORDING = 'sentry.sampled_not_recording';

export const SENTRY_SCOPES_CONTEXT_KEY = createContextKey('sentry_scopes');

2 changes: 1 addition & 1 deletion packages/opentelemetry/src/index.ts
@@ -21,7 +21,7 @@ export {
export { isSentryRequestSpan } from './utils/isSentryRequest';

export { getActiveSpan } from './utils/getActiveSpan';
export { startSpan, startSpanManual, startInactiveSpan, withActiveSpan } from './trace';
export { startSpan, startSpanManual, startInactiveSpan, withActiveSpan, continueTrace } from './trace';

// eslint-disable-next-line deprecation/deprecation
export { setupGlobalHub } from './custom/hub';
125 changes: 100 additions & 25 deletions packages/opentelemetry/src/propagator.ts
@@ -1,6 +1,8 @@
import type { Baggage, Context, SpanContext, TextMapGetter, TextMapSetter } from '@opentelemetry/api';
import { context } from '@opentelemetry/api';
import { TraceFlags, propagation, trace } from '@opentelemetry/api';
import { TraceState, W3CBaggagePropagator, isTracingSuppressed } from '@opentelemetry/core';
import type { continueTrace } from '@sentry/core';
import { getClient, getCurrentScope, getDynamicSamplingContextFromClient, getIsolationScope } from '@sentry/core';
import type { DynamicSamplingContext, PropagationContext } from '@sentry/types';
import {
@@ -16,18 +18,20 @@ import {
SENTRY_TRACE_HEADER,
SENTRY_TRACE_STATE_DSC,
SENTRY_TRACE_STATE_PARENT_SPAN_ID,
SENTRY_TRACE_STATE_SAMPLED_NOT_RECORDING,
} from './constants';
import { getScopesFromContext, setScopesOnContext } from './utils/contextData';
import { setIsSetup } from './utils/setupCheck';

/** Get the Sentry propagation context from a span context. */
export function getPropagationContextFromSpanContext(spanContext: SpanContext): PropagationContext {
const { traceId, spanId, traceFlags, traceState } = spanContext;
const { traceId, spanId, traceState } = spanContext;

const dscString = traceState ? traceState.get(SENTRY_TRACE_STATE_DSC) : undefined;
const dsc = dscString ? baggageHeaderToDynamicSamplingContext(dscString) : undefined;
const parentSpanId = traceState ? traceState.get(SENTRY_TRACE_STATE_PARENT_SPAN_ID) : undefined;
const sampled = traceFlags === TraceFlags.SAMPLED;

const sampled = getSamplingDecision(spanContext);

return {
traceId,
@@ -78,32 +82,18 @@ export class SentryPropagator extends W3CBaggagePropagator {
*/
public extract(context: Context, carrier: unknown, getter: TextMapGetter): Context {
const maybeSentryTraceHeader: string | string[] | undefined = getter.get(carrier, SENTRY_TRACE_HEADER);
const maybeBaggageHeader = getter.get(carrier, SENTRY_BAGGAGE_HEADER);
const baggage = getter.get(carrier, SENTRY_BAGGAGE_HEADER);

const sentryTraceHeader = maybeSentryTraceHeader
const sentryTrace = maybeSentryTraceHeader
? Array.isArray(maybeSentryTraceHeader)
? maybeSentryTraceHeader[0]
: maybeSentryTraceHeader
: undefined;

const propagationContext = propagationContextFromHeaders(sentryTraceHeader, maybeBaggageHeader);

// We store the DSC as OTEL trace state on the span context
const traceState = makeTraceState({
parentSpanId: propagationContext.parentSpanId,
dsc: propagationContext.dsc,
});

const spanContext: SpanContext = {
traceId: propagationContext.traceId,
spanId: propagationContext.parentSpanId || '',
isRemote: true,
traceFlags: propagationContext.sampled === true ? TraceFlags.SAMPLED : TraceFlags.NONE,
traceState,
};
const propagationContext = propagationContextFromHeaders(sentryTrace, baggage);

// Add remote parent span context,
const ctxWithSpanContext = trace.setSpanContext(context, spanContext);
const ctxWithSpanContext = getContextWithRemoteActiveSpan(context, { sentryTrace, baggage });

// Also update the scope on the context (to be sure this is picked up everywhere)
const scopes = getScopesFromContext(ctxWithSpanContext);
@@ -128,8 +118,13 @@ export class SentryPropagator extends W3CBaggagePropagator {
export function makeTraceState({
parentSpanId,
dsc,
}: { parentSpanId?: string; dsc?: Partial<DynamicSamplingContext> }): TraceState | undefined {
if (!parentSpanId && !dsc) {
sampled,
}: {
parentSpanId?: string;
dsc?: Partial<DynamicSamplingContext>;
sampled?: boolean;
}): TraceState | undefined {
if (!parentSpanId && !dsc && sampled !== false) {
return undefined;
}

@@ -140,7 +135,11 @@ export function makeTraceState({
? new TraceState().set(SENTRY_TRACE_STATE_PARENT_SPAN_ID, parentSpanId)
: new TraceState();

return dscString ? traceStateBase.set(SENTRY_TRACE_STATE_DSC, dscString) : traceStateBase;
const traceStateWithDsc = dscString ? traceStateBase.set(SENTRY_TRACE_STATE_DSC, dscString) : traceStateBase;

// We also specifically want to store if this is sampled to be not recording,
// or unsampled (=could be either sampled or not)
return sampled === false ? traceStateWithDsc.set(SENTRY_TRACE_STATE_SAMPLED_NOT_RECORDING, '1') : traceStateWithDsc;
}

function getInjectionData(context: Context): {
@@ -161,7 +160,7 @@ function getInjectionData(context: Context): {
dynamicSamplingContext,
traceId: spanContext.traceId,
spanId: spanContext.spanId,
sampled: spanContext.traceFlags === TraceFlags.SAMPLED,
sampled: getSamplingDecision(spanContext),
};
}

@@ -188,7 +187,7 @@ function getInjectionData(context: Context): {
dynamicSamplingContext,
traceId: spanContext.traceId,
spanId: spanContext.spanId,
sampled: spanContext.traceFlags === TraceFlags.SAMPLED,
sampled: getSamplingDecision(spanContext),
};
}

@@ -221,3 +220,79 @@ function getDynamicSamplingContext(

return undefined;
}

function getContextWithRemoteActiveSpan(
ctx: Context,
{ sentryTrace, baggage }: Parameters<typeof continueTrace>[0],
): Context {
const propagationContext = propagationContextFromHeaders(sentryTrace, baggage);

// We store the DSC as OTEL trace state on the span context
const traceState = makeTraceState({
parentSpanId: propagationContext.parentSpanId,
dsc: propagationContext.dsc,
sampled: propagationContext.sampled,
});

const spanContext: SpanContext = {
traceId: propagationContext.traceId,
spanId: propagationContext.parentSpanId || '',
isRemote: true,
traceFlags: propagationContext.sampled ? TraceFlags.SAMPLED : TraceFlags.NONE,
traceState,
};

return trace.setSpanContext(ctx, spanContext);
}

/**
* Takes trace strings and propagates them as a remote active span.
* This should be used in addition to `continueTrace` in OTEL-powered environments.
*/
export function continueTraceAsRemoteSpan<T>(
ctx: Context,
options: Parameters<typeof continueTrace>[0],
callback: () => T,
): T {
const ctxWithSpanContext = getContextWithRemoteActiveSpan(ctx, options);

return context.with(ctxWithSpanContext, callback);
}

/**
* OpenTelemetry only knows about SAMPLED or NONE decision,
* but for us it is important to differentiate between unset and unsampled.
*
* Both of these are identified as `traceFlags === TraceFlags.NONE`,
* but we additionally look at a special trace state to differentiate between them.
*/
export function getSamplingDecision(spanContext: SpanContext): boolean | undefined {
const { traceFlags, traceState } = spanContext;

const sampledNotRecording = traceState ? traceState.get(SENTRY_TRACE_STATE_SAMPLED_NOT_RECORDING) === '1' : false;

// If trace flag is `SAMPLED`, we interpret this as sampled
// If it is `NONE`, it could mean either it was sampled to be not recorded, or that it was not sampled at all
// For us this is an important difference, so we look at the SENTRY_TRACE_STATE_SAMPLED_NOT_RECORDING
// to identify which it is
if (traceFlags === TraceFlags.SAMPLED) {
return true;
}

if (sampledNotRecording) {
return false;
}

// Fall back to DSC as a last resort, that may also contain `sampled`...
const dscString = traceState ? traceState.get(SENTRY_TRACE_STATE_DSC) : undefined;
const dsc = dscString ? baggageHeaderToDynamicSamplingContext(dscString) : undefined;

if (dsc?.sampled === 'true') {
return true;
}
if (dsc?.sampled === 'false') {
return false;
}

return undefined;
}
13 changes: 9 additions & 4 deletions packages/opentelemetry/src/sampler.ts
@@ -1,14 +1,15 @@
/* eslint-disable no-bitwise */
import type { Attributes, Context, SpanContext } from '@opentelemetry/api';
import { TraceFlags, isSpanContextValid, trace } from '@opentelemetry/api';
import { isSpanContextValid, trace } from '@opentelemetry/api';
import { TraceState } from '@opentelemetry/core';
import type { Sampler, SamplingResult } from '@opentelemetry/sdk-trace-base';
import { SamplingDecision } from '@opentelemetry/sdk-trace-base';
import { SEMANTIC_ATTRIBUTE_SENTRY_SAMPLE_RATE, hasTracingEnabled } from '@sentry/core';
import type { Client, ClientOptions, SamplingContext } from '@sentry/types';
import { isNaN, logger } from '@sentry/utils';
import { SENTRY_TRACE_STATE_SAMPLED_NOT_RECORDING } from './constants';

import { DEBUG_BUILD } from './debug-build';
import { getPropagationContextFromSpanContext } from './propagator';
import { getPropagationContextFromSpanContext, getSamplingDecision } from './propagator';
import { setIsSetup } from './utils/setupCheck';

/**
@@ -38,6 +39,7 @@ export class SentrySampler implements Sampler {
}

const parentContext = trace.getSpanContext(context);
const traceState = parentContext?.traceState || new TraceState();

let parentSampled: boolean | undefined = undefined;

@@ -49,7 +51,7 @@ export class SentrySampler implements Sampler {
DEBUG_BUILD &&
logger.log(`[Tracing] Inheriting remote parent's sampled decision for ${spanName}: ${parentSampled}`);
} else {
parentSampled = Boolean(parentContext.traceFlags & TraceFlags.SAMPLED);
parentSampled = getSamplingDecision(parentContext);
DEBUG_BUILD && logger.log(`[Tracing] Inheriting parent's sampled decision for ${spanName}: ${parentSampled}`);
}
}
@@ -76,6 +78,7 @@ export class SentrySampler implements Sampler {
return {
decision: SamplingDecision.NOT_RECORD,
attributes,
traceState: traceState.set(SENTRY_TRACE_STATE_SAMPLED_NOT_RECORDING, '1'),
};
}

@@ -93,6 +96,7 @@ export class SentrySampler implements Sampler {
return {
decision: SamplingDecision.NOT_RECORD,
attributes,
traceState: traceState.set(SENTRY_TRACE_STATE_SAMPLED_NOT_RECORDING, '1'),
};
}

@@ -112,6 +116,7 @@ export class SentrySampler implements Sampler {
return {
decision: SamplingDecision.NOT_RECORD,
attributes,
traceState: traceState.set(SENTRY_TRACE_STATE_SAMPLED_NOT_RECORDING, '1'),
};
}

2 changes: 1 addition & 1 deletion packages/opentelemetry/src/spanProcessor.ts
@@ -20,7 +20,7 @@ function onSpanStart(span: Span, parentContext: Context): void {
let scopes = getScopesFromContext(parentContext);

// We need access to the parent span in order to be able to move up the span tree for breadcrumbs
if (parentSpan) {
if (parentSpan && !parentSpan.spanContext().isRemote) {
addChildSpanToSpan(parentSpan, span);
}
