
[Bug] Unable to run workflows with OpenTelemetry and ddtrace #733

Open · connected-bkiiskila opened this issue Jan 16, 2025 · 8 comments
Labels: bug (Something isn't working)

@connected-bkiiskila

What are you really trying to do?

I'm attempting to enable tracing on my worker using the TracingInterceptor and the tracer provided by ddtrace.

Describe the bug

An upstream change in the OpenTelemetry SDK seems to have caused an issue with Temporal's sandbox environment, resulting in an error stating that os.environ.get isn't allowed in a workflow.

The relevant change on the OpenTelemetry side is here; if I run with a version prior to 1.29, I no longer get the error.

Minimal Reproduction

Run a workflow on a client with the TracingInterceptor attached and a valid OpenTelemetry SDK configured; in my case I'm using ddtrace with the DD_TRACE_OTEL_ENABLED environment variable set to true. From what I can tell, though, the call that isn't allowed in the sandbox comes directly from the OpenTelemetry SDK and not the ddtrace patch.
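
For concreteness, a rough sketch of the setup being described (untested; the placeholder workflow, task queue name, and server address are illustrative assumptions rather than the reporter's actual code):

import asyncio

from temporalio import workflow
from temporalio.client import Client
from temporalio.contrib.opentelemetry import TracingInterceptor
from temporalio.worker import Worker


@workflow.defn
class HelloWorkflow:
    # Placeholder workflow for illustration; per the report, running a
    # workflow through the traced worker triggers the os.environ.get error.
    @workflow.run
    async def run(self, name: str) -> str:
        return f"Hello, {name}!"


async def main() -> None:
    # ddtrace is assumed to have installed its OpenTelemetry tracer provider
    # already (e.g. worker started under ddtrace-run with
    # DD_TRACE_OTEL_ENABLED=true in the environment).
    client = await Client.connect(
        "localhost:7233",  # hypothetical server address
        interceptors=[TracingInterceptor()],
    )
    worker = Worker(client, task_queue="otel-repro-queue", workflows=[HelloWorkflow])
    await worker.run()


if __name__ == "__main__":
    asyncio.run(main())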

Environment/Versions

  • OS and processor: macOS Apple Silicon
  • Temporal Version: SDK 1.9.0
  • Running in Docker arm64

Additional context

You can see the stacktrace and error that's displayed on the Temporal UI here: https://gist.github.com/connected-bkiiskila/d11592cb86271b5f343a4d387097d418

cretz (Member) commented Jan 16, 2025

Thanks! Hrmmm...

Looking at your stack trace, it appears that at https://github.com/DataDog/dd-trace-py/blob/eddd51464e8c32db019e239c106317228b65667c/ddtrace/internal/opentelemetry/context.py#L24-L26 opentelemetry.baggage and other modules are unfortunately imported at runtime, which means you may be inadvertently re-importing things for every workflow run. You should be able to add import opentelemetry.baggage somewhere outside of your workflow (e.g. where you start the worker) to run the module initialization code that ddtrace is running lazily. This will also improve your performance. ddtrace (reasonably) assumes modules are cached and imports are not rerun, which holds in other environments, but that's not the case with workflows.
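
For example, something along these lines at the top of the worker's entrypoint module (a rough sketch; assuming the baggage/propagation modules are the ones appearing in the stack trace):

# In the worker process, outside any workflow definition: import the OTel
# modules eagerly so ddtrace's lazy "from opentelemetry.baggage import get_all"
# finds them already initialized in sys.modules instead of re-running their
# module initialization inside the workflow sandbox on every run.
import opentelemetry.baggage
import opentelemetry.baggage.propagation
import opentelemetry.trace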

Though I am a bit surprised our import opentelemetry.baggage.propagation doesn't imply this. You may also have to configure the sandbox restrictions to treat these OTel imports as pass through (https://github.com/temporalio/sdk-python?tab=readme-ov-file#passthrough-modules). It will require testing...

connected-bkiiskila (Author) commented:

I'll give that extra import a shot, see if that works, and report back.

connected-bkiiskila (Author) commented:

Hmmm, that doesn't seem to work either; the only thing I'm having luck with is downgrading to <1.29.

For that passthrough documentation, would I just be flat-out allowing os.environ.get, or is there a way to allow specifically the call within the OpenTelemetry SDK?

cretz (Member) commented Jan 16, 2025

For that passthrough documentation, would I just be flat-out allowing os.environ.get, or is there a way to allow specifically the call within the OpenTelemetry SDK?

The passthrough would be for passing through the imports, unrelated to the restrictions happening on os.environ.get. What's happening now, from your stack trace, is that from opentelemetry.baggage import get_all is being run at runtime in the workflow. We want that to have already been cached. You can probably just add this to the top of your workflow file:

from temporalio import workflow  # provides the unsafe import passthrough helper

with workflow.unsafe.imports_passed_through():
    import opentelemetry.baggage

But this is untested. Even if we do allow os.environ.get through the sandbox, the bigger problem is that you are reimporting OTel libraries for every workflow run, which you don't want to do.

connected-bkiiskila (Author) commented:

Seems like that <1.29 fix was a fluke and only works some of the time for whatever reason, so I'm likely going to have to allow the import of os.environ.
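
If it does come to that, the restriction-customization pattern from the sdk-python README looks applicable; a rough, untested sketch (the exact member path "os" / "environ" / "get" is an assumption inferred from the error message, not something confirmed in this thread):

import dataclasses

from temporalio.worker import Worker
from temporalio.worker.workflow_sandbox import SandboxedWorkflowRunner, SandboxRestrictions

# Assumption: remove the restriction on os.environ.get specifically, mirroring
# the README's datetime.date.today example, rather than passing all of os through.
my_restrictions = dataclasses.replace(
    SandboxRestrictions.default,
    invalid_module_members=SandboxRestrictions.invalid_module_members_default.with_child_unrestricted(
        "os", "environ", "get"
    ),
)

my_worker = Worker(
    ...,  # client, task queue, workflows, etc.
    workflow_runner=SandboxedWorkflowRunner(restrictions=my_restrictions),
)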

cretz (Member) commented Jan 16, 2025

We may also be able to look into it, though it seems to be ddtrace-specific in that they import at runtime. I do think it is worth doing your best to prevent this import from being treated as a new import for every workflow run (since that means it runs all OTel bootstrap code every run and takes up extra memory).

connected-bkiiskila (Author) commented:

Yeah, I'd like to avoid it if possible. Would you suggest creating a ticket on the ddtrace side and referencing this?

connected-bkiiskila changed the title from "[Bug] Unable to run workflows with OpenTelemetry SDK 1.29" to "[Bug] Unable to run workflows with OpenTelemetry and ddtrace" on Jan 16, 2025
cretz (Member) commented Jan 16, 2025

ddtrace is not really doing anything wrong as far as a normal Python library goes; the runtime imports at https://github.com/DataDog/dd-trace-py/blob/eddd51464e8c32db019e239c106317228b65667c/ddtrace/internal/opentelemetry/context.py#L24-L26 are only a problem in Temporal workflows. But we need to get those imports preloaded and passed through. You may be able to do something like:

# Preload the OTel modules once at worker startup so workflow runs reuse the
# cached modules instead of re-importing them inside the sandbox.
import opentelemetry.baggage
import opentelemetry.trace

from temporalio.worker import Worker
from temporalio.worker.workflow_sandbox import SandboxedWorkflowRunner, SandboxRestrictions

my_worker = Worker(
    ...,
    workflow_runner=SandboxedWorkflowRunner(
        restrictions=SandboxRestrictions.default.with_passthrough_modules("opentelemetry", "ddtrace")
    ),
)

when you create your worker.
