Add a skeleton of the trace data model #2012

Closed
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,8 @@ release.

### Traces

- Adds text specifying how to interpret Span data in the OpenTelemetry trace data model. ([#2012](https://github.com/open-telemetry/opentelemetry-specification/pull/2012))

### Metrics

- Add optional min / max fields to histogram data model.
1 change: 1 addition & 0 deletions README.md
@@ -42,6 +42,7 @@ Technical committee holds regular meetings, notes are held
- Data Specification
  - [Semantic Conventions](specification/overview.md#semantic-conventions)
  - [Protocol](specification/protocol/README.md)
  - [Traces](specification/trace/datamodel.md)
  - [Metrics](specification/metrics/datamodel.md)
  - [Logs](specification/logs/data-model.md)
- About the Project
169 changes: 169 additions & 0 deletions specification/trace/datamodel.md
@@ -0,0 +1,169 @@
# Trace Data Model

**Status**: [Mixed](../document-status.md)
Member commented:

Why is this Mixed? Do we have anything unstable in the trace data model?


<!-- Re-generate TOC with `markdown-toc --no-first-h1 -i` -->

<!-- toc -->

- [Overview](#overview)
- [Glossary](#glossary)
  * [Trace](#trace)
  * [Span](#span)
  * [Root span](#root-span)
  * [Context](#context)
  * [Span context](#span-context)
  * [Trace flags](#trace-flags)
  * [Tracestate](#tracestate)
- [Span fields](#span-fields)
  * [TraceID](#traceid)
  * [SpanID](#spanid)
  * [TraceState](#tracestate-1)
  * [ParentSpanID](#parentspanid)
  * [Name](#name)
  * [SpanKind](#spankind)
  * [StartTimeUnixNano](#starttimeunixnano)
  * [EndTimeUnixNano](#endtimeunixnano)
  * [Attributes](#attributes)
  * [Events](#events)
  * [Links](#links)
  * [Status](#status)

<!-- tocstop -->

## Overview

**Status**: [Stable](../document-status.md)

The OpenTelemetry data model for tracing consists of a protocol
specification for encoding spans, which represent an individual unit
Member commented on lines +38 to +39:

> protocol specification for encoding spans

This combination sounds very confusing to me. I think you're defining a logical data model that includes entities, their attributes, and relationships between entities. So I don't understand what to make of the words "protocol", "specification", and "encoding" in this context.

Member commented:

Another question is: whose data model are you trying to describe, the API's or the SDK's? They have different shapes.

Contributor Author commented:

I'm trying to write a document that helps someone interpret the OTLP Span protocol without reading a .proto file, where I can add further detail about how to interpret the OTel tracestate field used for probability sampling. Right now the .proto file says this of tracestate:

  // trace_state conveys information about request position in multiple distributed tracing graphs.
  // It is a trace_state in w3c-trace-context format: https://www.w3.org/TR/trace-context/#tracestate-header
  // See also https://github.com/w3c/distributed-tracing for more details about this field.

Tangentially, I'm actually not sure how multi-tenancy is expected to work in W3C and it's not a part of the SDK or the API's model. See #1852 (comment)

In any case, I think there should be one model--what differences between the API and the SDK "shape" should be reflected in this kind of document?

Member commented:

Each layer can have a different logical model of the data it operates on. The API data model would be the smallest surface, SDK's data model could be larger (e.g. in OpenTracing API there was nothing related to sampling, but in Jaeger SDK implementing that API there was is_sampled state on the span context, as well as other fields that were implementation details of the SDKs exposed to its internal interfaces). OTLP as a wire format is another layer which might have yet another data model (e.g. it has things like ResourceSpans, which are not a concept in the API).

So you may be trying to write a document that describes the logical data model of OTLP (aka physical data model since there's no more abstractions left), but such description would be ~90% consisting of the logical data model of the SDK, which in turn is only a small extension of the API data model.

> a document that helps someone interpret the OTLP Span protocol without reading a .proto file

For the record, can't say I sympathize with this goal, what's wrong with reading the specification of the protocol to understand the protocol? It's another thing that .proto should not need to explain what a Span is or what Span name requirements are, that should come from the other parts of the spec.

Contributor Author commented:

The metrics and logs data model files are written without reference to the SDK and API specifications, and I'm not sure why someone should have to understand either to interpret trace data. The reason this happened in the metrics specification is that there are complicated relationships between fields that can't be documented in a single field. An example in trace is "root span". We need to use this term to explain a bunch of things, but it doesn't have a field in the proto. In metrics, we decided that datamodel.md should be the source of truth, not the .proto.

In practice, I haven't followed the trace specification as closely as I have metrics, and now that I'm invested in adding probability sampling to the specification, I think this is sorely needed. As an example, I've just read through the current api.md and sdk.md to try to understand how tracestate is specified. It's incomplete, and the ground is already unstable because W3C refers to "tenants" and we have no equivalent concepts in OTel.

  • There are APIs given for manipulating tracestate, but no explanation of what it is or why you'd use it (WHY WOULD A USER USE THIS API?)
  • There is no mention of tracestate in a Span Link
  • You have to read between the lines to determine that the Span data contains the tracestate returned by the Sampler, but even then it rests on assumptions. E.g., "A Tracestate that will be associated with the Span through the new
    SpanContext." doesn't actually say that the recorded span object will contain the associated tracestate, but again if you go to the .proto all it says is:
  // trace_state conveys information about request position in multiple distributed tracing graphs.
  // It is a trace_state in w3c-trace-context format: https://www.w3.org/TR/trace-context/#tracestate-header
  // See also https://github.com/w3c/distributed-tracing for more details about this field.

So, I'll try again.

of work done in a distributed system.

## Glossary

### Trace

A trace comprises a number of spans, connected to each other through
parent-child relationships, that together describe a unit of work in a
distributed system.

### Span

Each component in a distributed system contributes a span
corresponding to a named operation, representing its part in the
overall, distributed unit of work.

### Root span
Member commented:

Is this actually part of the model? Is there an attribute or trait that tags a span as root span? I think this belongs to the glossary.

Contributor Author commented:

(This is in the Glossary section.)

This is why I bring up the question about multi-tenancy, which I don't see as a part of the OpenTelemetry data model at this time (despite @Oberon00's remarks).


A root span is the span that initiates a unit of work in a distributed
system. The root span is considered to have caused all the subsequent
spans belonging to the trace.

### Context

OpenTelemetry defines [Context](../context/context.md) as a means of
passing values for use in telemetry across program execution
boundaries.

### Span context

Span context is the portion of the OpenTelemetry Context that makes up
the tracing data model. This is specified by reference to the [W3C
Member commented on lines +70 to +71:

> that makes up the tracing data model.

This is not quite clear. Can you expand/reword?

trace context](https://www.w3.org/TR/trace-context/) specification,
which defines four parts of the span context:

1. TraceID
2. SpanID
3. Trace flags
4. Tracestate

The first three of these fields are included in the W3C trace context
[`traceparent`](https://www.w3.org/TR/trace-context/#traceparent-header)
header.
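
As an illustration of the `traceparent` layout described above, here is a
minimal Python sketch that splits a header value into its fields. The names
`TraceParent` and `parse_traceparent` are hypothetical and not part of any
OpenTelemetry API; the sketch assumes the version `00` layout from the W3C
specification and does not attempt to handle future versions.

```python
import re
from typing import NamedTuple

# Assumed layout (W3C trace context, version 00), all lowercase hex:
#   version "-" trace-id "-" parent-id "-" trace-flags
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    """Hypothetical container for the fields carried by traceparent."""
    version: int
    trace_id: str     # 32 hex characters (128 bits)
    span_id: str      # 16 hex characters (64 bits), the W3C parent-id
    trace_flags: int  # 8 bits


def parse_traceparent(header: str) -> TraceParent:
    """Parse a version-00 style traceparent value or raise ValueError."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, span_id, flags = match.groups()
    if version == "ff":
        raise ValueError("version ff is invalid")
    # All-zero identifiers are invalid in the W3C specification.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError("all-zero trace-id or parent-id")
    return TraceParent(int(version, 16), trace_id, span_id, int(flags, 16))


parsed = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
print(parsed.trace_id, parsed.span_id, parsed.trace_flags)
```

Tracestate travels in its own header, which is why it does not appear in
this tuple.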

### Trace flags

The W3C trace context defines one flag at present, `sampled`, which
OpenTelemetry uses to make sampling decisions based on the context.
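
A small sketch of how the `sampled` flag could be read from the 8-bit
trace flags value. The constant and function names are hypothetical; the
bit position (the least significant bit) is taken from the W3C trace
context specification.

```python
# The W3C trace context assigns the least significant bit to "sampled".
SAMPLED_FLAG = 0x01


def is_sampled(trace_flags: int) -> bool:
    """Return True when the sampled bit is set, ignoring all other bits."""
    return bool(trace_flags & SAMPLED_FLAG)


print(is_sampled(0x01), is_sampled(0x00))
```

Masking rather than comparing for equality keeps the check valid if
further flag bits are defined later.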

### Tracestate

The W3C trace context defines a field known as
[`tracestate`](https://www.w3.org/TR/trace-context/#tracestate-header)
which enables extending the context with vendor-specific information.
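
To make the shape of `tracestate` concrete, the following sketch splits a
header value into ordered key/value pairs. The function name
`parse_tracestate` is hypothetical. The sketch assumes the comma-separated
`key=value` list format from the W3C specification and deliberately skips
the character-set and list-length limits that the specification also
defines.

```python
from typing import List, Tuple


def parse_tracestate(header: str) -> List[Tuple[str, str]]:
    """Split a tracestate value into (key, value) pairs, preserving order."""
    members: List[Tuple[str, str]] = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue  # empty list members are tolerated
        key, sep, value = item.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed tracestate member: {item!r}")
        members.append((key, value))
    return members


print(parse_tracestate("rojo=00f067aa0ba902b7, congo=t61rcWkgMzE"))
```

A list is used instead of a dictionary because the order of entries is
meaningful in the W3C format.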

## Span fields

### TraceID
Member commented:

Can we use the same TraceId capitalization as the log data model?


**Status**: [Stable](../document-status.md)

The OpenTelemetry TraceID is defined to be equivalent to the W3C trace
context `trace-id` field, consisting of 128 bits of information and
assigned to a new trace when its root span starts.
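
The identifier sizes can be illustrated with a short sketch. Random
generation is an assumption made here for illustration, since the text
above only fixes the width of the field; the helper names are
hypothetical. The all-zero value is excluded because the W3C trace context
treats it as invalid.

```python
import secrets

TRACE_ID_BYTES = 16  # 128 bits
SPAN_ID_BYTES = 8    # 64 bits


def _random_nonzero_id(num_bytes: int) -> bytes:
    """Draw random bytes, retrying in the unlikely all-zero case."""
    while True:
        candidate = secrets.token_bytes(num_bytes)
        if any(candidate):
            return candidate


def new_trace_id() -> bytes:
    """Hypothetical helper: an identifier for a new trace."""
    return _random_nonzero_id(TRACE_ID_BYTES)


def new_span_id() -> bytes:
    """Hypothetical helper: an identifier for a new span."""
    return _random_nonzero_id(SPAN_ID_BYTES)


# The hex form is what appears in the traceparent header.
print(new_trace_id().hex(), new_span_id().hex())
```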

### SpanID

**Status**: [Stable](../document-status.md)

The OpenTelemetry SpanID is defined to identify the span that is the
parent of a new trace context, equivalent to the W3C trace context
Contributor commented on lines +109 to +110:

> span that is the parent of a new trace context

This sentence is confusing. What is trace context? How can span be a parent of it? Does every SpanId identify a parent?

`parent-id` identifier in the context of a new span, consisting of
64 bits of information.

### TraceState

**Status**: [Stable](../document-status.md)

The OpenTelemetry Span encodes the `tracestate` that was computed when
Member commented:

This is kind of a circular definition. Can we give a semantic definition of what it represents rather than how it's populated?

Contributor Author commented:

I don't know that there is a semantic definition for tracestate, I see it as a thing W3C gives vendors to use as they like. We're already in questionable territory, IMO, because we're calling OpenTelemetry a "vendor" when we declare a use of tracestate for ourselves.

See also #1852 (comment) where it seems there are at least two semantic definitions possible ("universal" and "per-tenant")

the Span started.

### ParentSpanID

**Status**: [Stable](../document-status.md)

The OpenTelemetry Span contains a ParentSpanID field which, for
non-root spans, refers to the W3C `parent-id` identifier that was in
the trace context when the span started (i.e., it is the SpanID of the
parent span).
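
The parent-child relationship can be sketched as follows. The `Span` class
here is a hypothetical stand-in with only the fields needed for the
example, and it assumes that a root span is recorded with an empty
ParentSpanID, since there is no dedicated "root" field.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Span:
    """Hypothetical, minimal span record for illustration only."""
    trace_id: str
    span_id: str
    parent_span_id: str  # assumed empty for a root span
    name: str


def group_by_parent(spans: List[Span]) -> Dict[str, List[Span]]:
    """Index spans by their ParentSpanID; roots land under the empty key."""
    children: Dict[str, List[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_span_id, []).append(span)
    return children


spans = [
    Span("t1", "a", "", "GET /checkout"),
    Span("t1", "b", "a", "charge card"),
    Span("t1", "c", "a", "reserve stock"),
]
index = group_by_parent(spans)
print([s.name for s in index[""]])   # root spans
print([s.name for s in index["a"]])  # children of span "a"
```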

### Name

**Status**: [Stable](../document-status.md)

The OpenTelemetry Span name is a short, human-readable description of
the work performed within the span's context.
Member commented:

Suggested change:
- the work performed within the span's context.
+ the work performed by the application and represented by the span.

Member commented:

I am pretty sure we already have the definition of things like span name elsewhere in the spec. I think we should think about eliminating such duplication and referencing other parts of the spec. To me the reasonable hierarchy would be: start with logical data model that describes what the model entities/attributes represent, which then refer to it in the API spec to describe the operations on those entities (but no longer describing the meaning of entities/attributes).

Contributor Author commented:

Yes, I'll have to refactor this PR.
There is one sentence in api.md that belongs here, and instead that document can link to this one.


### SpanKind

TODO(issue #1929): complete the span data model text.

### StartTimeUnixNano

TODO(issue #1929): complete the span data model text.

### EndTimeUnixNano

TODO(issue #1929): complete the span data model text.

### Attributes

TODO(issue #1929): complete the span data model text.

Define dropped_attributes_count here.

### Events

TODO(issue #1929): complete the span data model text.

Define dropped_events_count here.

### Links

TODO(issue #1929): complete the span data model text.

Define dropped_links_count here.

### Status

TODO(issue #1929): complete the span data model text.