-
Notifications
You must be signed in to change notification settings - Fork 904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a skeleton of the trace data model #2012
Conversation
Span context is the portion of the OpenTelemetry Context that makes up | ||
the tracing data model. This is specified by reference to the [W3C |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that makes up the tracing data model.
This is not quite clear. Can you expand/reword?
|
||
## Span fields | ||
|
||
### TraceID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the same TraceId
capitalization as the log data model?
@@ -0,0 +1,169 @@ | |||
# Trace Data Model | |||
|
|||
**Status**: [Mixed](../document-status.md) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this Mixed? Do we have anything unstable in the trace data model?
The OpenTelemetry data model for tracing consists of a protocol | ||
specification for encoding spans, which represent an individual unit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protocol specification for encoding spans
This combination sounds very confusing to me. I think you're defining a logical data model that includes entities, their attributes, and relationships between entities. So I don't understand what to make of the words "protocol", "specification", and "encoding" in this context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another question is: whose data model are you trying to describe, the API's or the SDK's? They have different shapes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to write a document that helps someone interpret the OTLP Span protocol without reading a .proto
file, where I can add further detail about how to interpret the OTel tracestate
field used for probability sampling. Right now the .proto
file says this of tracestate:
// trace_state conveys information about request position in multiple distributed tracing graphs.
// It is a trace_state in w3c-trace-context format: https://www.w3.org/TR/trace-context/#tracestate-header
// See also https://github.com/w3c/distributed-tracing for more details about this field.
Tangentially, I'm actually not sure how multi-tenancy is expected to work in W3C and it's not a part of the SDK or the API's model. See #1852 (comment)
In any case, I think there should be one model--what differences between the API and the SDK "shape" should be reflected in this kind of document?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each layer can have a different logical model of the data it operates on. The API data model would be the smallest surface, SDK's data model could be larger (e.g. in OpenTracing API there was nothing related to sampling, but in Jaeger SDK implementing that API there was is_sampled
state on the span context, as well as other fields that were implementation details of the SDKs exposed to its internal interfaces). OTLP as a wire format is another layer which might have yet another data model (e.g. it has things like ResourceSpans, which are not a concept in the API).
So you may be trying to write a document that describes the logical data model of OTLP (aka physical data model since there's no more abstractions left), but such description would be ~90% consisting of the logical data model of the SDK, which in turn is only a small extension of the API data model.
a document that helps someone interpret the OTLP Span protocol without reading a .proto file
For the record, can't say I sympathize with this goal, what's wrong with reading the specification of the protocol to understand the protocol? It's another thing that .proto should not need to explain what a Span is or what Span name requirements are, that should come from the other parts of the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metrics and logs data model files are written without reference to the SDK and API specifications, and I'm not sure why someone should have to understand either to interpret trace data. The reason this happened in the metrics specification is that there are complicated relationships between fields that can't be documented in a single field. An example in trace is "root span". We need to use this term to explain a bunch of things, but it doesn't have a field in the proto. In metrics, we decided that datamodel.md
should be the source of truth, not the .proto
.
In practice, I haven't followed the trace specification as closely as I have metrics, and now that I'm invested in adding probability sampling to the specification, I think this is sorely needed. As an example, I've just read through the current api.md
and sdk.md
to try to understand how tracestate
is specified. It's incomplete, and the ground is already unstable because W3C refers to "tenants" and we have no equivalent concepts in OTel.
- There are APIs given for manipulating tracestate, but no explanation of what it is or why you'd use it (WHY WOULD A USER USE THIS API?)
- There is no mention of tracestate in a Span Link
- You have to read between the lines to determine that the Span data contains the tracestate returned by the Sampler, but even then it rests on assumptions. E.g., "A
Tracestate
that will be associated with theSpan
through the new
SpanContext
." doesn't actually say that the recorded span object will contain the associated tracestate, but again if you go to the.proto
all it says is:
// trace_state conveys information about request position in multiple distributed tracing graphs.
// It is a trace_state in w3c-trace-context format: https://www.w3.org/TR/trace-context/#tracestate-header
// See also https://github.com/w3c/distributed-tracing for more details about this field.
So, I'll try again.
corresponding to a named operation, representing its part in the | ||
overall, distributed unit of work. | ||
|
||
### Root span |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually part of the model? Is there an attribute or trait that tags a span as root span? I think this belongs to the glossary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This is in the Glossary section.)
This is why I bring up the question about multi-tenancy, which I don't see as a part of the OpenTelemetry data model at this time (despite @Oberon00's remarks).
|
||
**Status**: [Stable](../document-status.md) | ||
|
||
The OpenTelemetry Span encodes the `tracestate` that was computed when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of a circular definition. Can we give a semantic definition of what it represents rather than how it's populated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know that there is a semantic definition for tracestate, I see it as a thing W3C gives vendors to use as they like. We're already in questionable territory, IMO, because we're calling OpenTelemetry a "vendor" when we declare a use of tracestate for ourselves.
See also #1852 (comment) where it seems there are at least two semantic definitions possible ("universal" and "per-tenant")
**Status**: [Stable](../document-status.md) | ||
|
||
The OpenTelemetry Span name is a short, human-readable description of | ||
the work performed within the span's context. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the work performed within the span's context. | |
the work performed by the application and represented by the span. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pretty sure we already have the definition of things like span name elsewhere in the spec. I think we should think about eliminating such duplication and referencing other parts of the spec. To me the reasonable hierarchy would be: start with logical data model that describes what the model entities/attributes represent, which then refer to it in the API spec to describe the operations on those entities (but no longer describing the meaning of entities/attributes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll have to refactor this PR.
There is one sentence in api.md
that belongs here, and instead that document can link to this one.
The OpenTelemetry SpanID is defined to identify the span that is the | ||
parent of a new trace context, equivalent to the W3C trace context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
span that is the parent of a new trace context
This sentence is confusing. What is trace context
? How can span be a parent of it? Does every SpanId
identify a parent?
I will incorporate the feedback here and re-open the PR when it is ready for review again. @yurishkuro I am most interested in your reply to #2012 (comment) and would be glad for you to join the discussion in #1852 while I continue to edit this document. |
Part of #1929.
Changes
Adds text specifying how to interpret Span data in the OpenTelemetry trace data model.
Related PR #1899
Prior to incorporating the changes of OTEP 168, we need a foundation.