
Releases: apollographql/router

v1.52.0-rc.0

24 Jul 11:11
Pre-release

v1.51.0

16 Jul 20:50

🚀 Features

Support conditional coprocessor execution per stage of request lifecycle (PR #5557)

The router now supports conditional execution of the coprocessor for each stage of the request lifecycle (except for the Execution stage).

To configure, define conditions for a specific stage by using selectors based on headers or context entries. For example, based on a supergraph response you can configure the coprocessor not to execute for any subscription:

coprocessor:
  url: http://127.0.0.1:3000 # mandatory URL which is the address of the coprocessor
  timeout: 2s # optional timeout (2 seconds in this example). If not set, defaults to 1 second
  supergraph:
    response: 
      condition:
        not:
          eq:
          - subscription
          - operation_kind: string
      body: true
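To make the condition above concrete, here is a minimal Rust sketch of how a not/eq condition tree could be evaluated before deciding whether to call the coprocessor for a stage. The types Operand, Condition, and StageInput are hypothetical stand-ins, not the router's internals; Operand::OperationKind plays the role of the operation_kind: string selector.

```rust
// Hypothetical model of a stage condition such as
// not: { eq: [ "subscription", { operation_kind: string } ] }.
// These types are illustrative only; they are not the router's internal API.

/// A value operand: either a static literal or a selector resolved from the request.
enum Operand {
    Static(String),
    /// Stands in for the `operation_kind: string` selector.
    OperationKind,
}

enum Condition {
    Eq(Operand, Operand),
    Not(Box<Condition>),
}

/// The request data that selectors can read in this sketch.
struct StageInput {
    operation_kind: String,
}

fn resolve(op: &Operand, input: &StageInput) -> String {
    match op {
        Operand::Static(s) => s.clone(),
        Operand::OperationKind => input.operation_kind.clone(),
    }
}

/// Returns true when the coprocessor should be called for this stage.
fn should_call_coprocessor(cond: &Condition, input: &StageInput) -> bool {
    match cond {
        Condition::Eq(a, b) => resolve(a, input) == resolve(b, input),
        Condition::Not(inner) => !should_call_coprocessor(inner, input),
    }
}

/// Builds the "skip subscriptions" condition from the YAML example.
fn skip_subscriptions() -> Condition {
    Condition::Not(Box::new(Condition::Eq(
        Operand::Static("subscription".to_string()),
        Operand::OperationKind,
    )))
}
```

With this condition, a query response is sent to the coprocessor, while a subscription response is not.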

To learn more, see the documentation about coprocessor conditions.

By @bnjjj in #5557

Add option to deactivate introspection response caching (PR #5583)

The router now supports an option to deactivate introspection response caching. Because introspection happens in the query planner, the router caches introspection responses, and these cached responses may consume too much of the distributed cache or fill it up. Setting this option prevents introspection responses from filling up the router's distributed cache.

To deactivate introspection caching, set supergraph.query_planning.legacy_introspection_caching to false:

supergraph:
  query_planning:
    legacy_introspection_caching: false

By @Geal in #5583

Add 'subgraph_on_graphql_error' selector for subgraph (PR #5622)

The router now supports the subgraph_on_graphql_error selector for the subgraph service, which it already supported for the router and supergraph services. Subgraph service support enables easier detection of GraphQL errors in response bodies of subgraph requests.

An example configuration with subgraph_on_graphql_error configured:

telemetry:
  instrumentation:
    instruments:
      subgraph:
        http.client.request.duration:
          attributes:
            subgraph.graphql.errors: # attribute containing a boolean set to true if response.errors is not empty
              subgraph_on_graphql_error: true

By @bnjjj in #5622

🐛 Fixes

Add response_context in event selector for event_* instruments (PR #5565)

The router now supports creating custom instruments with a value set to event_* and using both a condition executed on an event and the response_context selector in attributes. Previous releases didn't support the response_context selector in attributes.

An example configuration:

telemetry:
  instrumentation:
    instruments:
      supergraph:
        sf.graphql_router.errors:
          value: event_unit
          type: counter
          unit: count
          description: "graphql errors handled by the apollo router"
          condition:
            eq:
            - true
            - on_graphql_error: true
          attributes:
            "operation":
              response_context: "operation_name" # This was not working before
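As a rough illustration of what this instrument counts, the Rust sketch below adds 1 per event whose condition holds and groups the counts by the operation attribute taken from the response context. ResponseContext and EventUnitCounter are hypothetical names; this is a model of the configured behavior, not the router's implementation.

```rust
use std::collections::HashMap;

// Hypothetical sketch of a counter with `value: event_unit`: each event whose
// condition holds adds 1, grouped by the `operation` attribute read from the
// response context. Not the router's implementation.

/// Stand-in for the response context entries available to selectors.
struct ResponseContext {
    entries: HashMap<String, String>,
}

#[derive(Default)]
struct EventUnitCounter {
    /// Count per value of the "operation" attribute.
    counts: HashMap<String, u64>,
}

impl EventUnitCounter {
    /// Records one event if the `eq: [true, on_graphql_error: true]` condition holds.
    fn on_event(&mut self, on_graphql_error: bool, ctx: &ResponseContext) {
        if !on_graphql_error {
            return;
        }
        // `response_context: "operation_name"` supplies the attribute value;
        // a missing entry is grouped under the empty string in this sketch.
        let operation = ctx.entries.get("operation_name").cloned().unwrap_or_default();
        *self.counts.entry(operation).or_insert(0) += 1;
    }
}
```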

By @bnjjj in #5565

Provide valid trace IDs for unsampled traces in Rhai scripts (PR #5606)

The traceid() function in a Rhai script for the router now returns a valid trace ID for all traces.

Previously, traceid() didn't return a trace ID if the trace wasn't selected for sampling.
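The distinction behind this fix is that a trace ID and the sampling decision are separate pieces of a span context. The Rust sketch below contrasts the two behaviors; SpanContext, traceid_before, and traceid_after are hypothetical names, and treating an all-zero ID as invalid follows the W3C Trace Context convention.

```rust
// Hypothetical sketch: a span context carries a trace ID and a separate
// "sampled" flag. Returning the ID regardless of sampling mirrors the fix.
// W3C Trace Context treats an all-zero trace ID as invalid.

struct SpanContext {
    trace_id: u128,
    sampled: bool,
}

/// Old behavior (illustrative): no trace ID for unsampled traces.
fn traceid_before(ctx: &SpanContext) -> Option<String> {
    if ctx.sampled && ctx.trace_id != 0 {
        Some(format!("{:032x}", ctx.trace_id))
    } else {
        None
    }
}

/// New behavior (illustrative): any valid trace ID is returned.
fn traceid_after(ctx: &SpanContext) -> Option<String> {
    if ctx.trace_id != 0 {
        // 32 lowercase hex characters, zero-padded.
        Some(format!("{:032x}", ctx.trace_id))
    } else {
        None
    }
}
```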

By @bnjjj in #5606

Allow query batching and entity caching to work together (PR #5598)

The router now allows entity caching and subgraph batching to run simultaneously. Specifically, this change updates entity caching to ignore a subgraph request if the request is part of a batch.

By @garypen in #5598

Gracefully handle subgraph response with -1 values inside error locations (PR #5633)

The router now gracefully handles responses that contain invalid "-1" positional values for error locations in queries by ignoring those invalid locations.

GraphQL Java and GraphQL Kotlin use { "line": -1, "column": -1 } values when they can't determine an error's location in a query, but the GraphQL specification requires both line and column to be positive numbers. This change resolves that mismatch.

As an example, a subgraph can respond with invalid error locations:

{
    "data": { "topProducts": null },
    "errors": [{
        "message":"Some error on subgraph",
        "locations": [
            { "line": -1, "column": -1 }
        ],
        "path":["topProducts"]
    }]
}

With this change, the router returns a response that ignores the invalid locations:

{
    "data": { "topProducts": null },
    "errors": [{
        "message":"Some error on subgraph",
        "path":["topProducts"]
    }]
}
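The filtering rule can be sketched in Rust as follows: drop any location whose line or column is not positive, and omit the locations key entirely when nothing valid remains. Location, GraphQLError, and drop_invalid_locations are hypothetical types for illustration (the path field is left out), not the router's internal error types.

```rust
// Hypothetical sketch of the filtering described above. The types are
// illustrative, not the router's internal error types.

#[derive(Debug, Clone, PartialEq)]
struct Location {
    line: i64,
    column: i64,
}

#[derive(Debug, Clone, PartialEq)]
struct GraphQLError {
    message: String,
    /// `None` means the `locations` key is absent from the serialized error.
    locations: Option<Vec<Location>>,
}

fn drop_invalid_locations(mut error: GraphQLError) -> GraphQLError {
    error.locations = error.locations.and_then(|locs| {
        // The GraphQL spec requires line and column to be positive.
        let valid: Vec<Location> = locs
            .into_iter()
            .filter(|l| l.line > 0 && l.column > 0)
            .collect();
        // Omit the key entirely when no valid locations remain.
        if valid.is_empty() { None } else { Some(valid) }
    });
    error
}
```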

By @IvanGoncharov in #5633

Return request timeout and rate limited error responses as structured errors (PR #5578)

The router now returns request timeout errors (408 Request Timeout) and request rate limited errors (429 Too Many Requests) as structured GraphQL errors (for example, {"errors": [...]}). Previously, the router returned these as plaintext errors to clients.

Both types of errors are properly tracked in telemetry, including the apollo_router_graphql_error_total metric.
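To show the general shape of a structured error compared with a plaintext body, here is a small Rust sketch. The message strings and the extensions.code values are placeholders chosen for illustration, not necessarily the router's exact output; consult the router documentation for the real values.

```rust
// Hypothetical sketch: map a timeout (408) or rate-limit (429) rejection to a
// GraphQL-shaped JSON body instead of plain text. The message strings and
// extensions.code values are placeholders, not the router's exact output.
fn structured_error_body(status: u16) -> Option<String> {
    let (message, code) = match status {
        408 => ("request timed out", "REQUEST_TIMEOUT"),
        429 => ("too many requests", "REQUEST_RATE_LIMITED"),
        _ => return None,
    };
    // The literals above contain no characters that need JSON escaping.
    Some(format!(
        r#"{{"errors":[{{"message":"{message}","extensions":{{"code":"{code}"}}}}]}}"#
    ))
}
```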

By @IvanGoncharov in #5578

Fix span names and resource mapping for Datadog trace exporter (Issue #5282)

Note

This is an incremental improvement, but we expect more improvements in Router v1.52.0 after #5609 lands.

The router now uses static span names by default. This change fixes the user experience of the Datadog trace exporter when sending traces with Datadog native configuration.

The router has two ways of sending traces to Datadog:

  1. The OpenTelemetry for Datadog approach (which is the recommended method). This is identified by otlp in YAML configuration, and it is not impacted by this fix.
  2. The "Datadog native" configuration. This is identified by the use of a datadog: key in YAML configuration.

This change fixes a bug in the latter approach that broke some Datadog experiences, such as the "Resources" section of the Datadog APM Service Catalog page.

The router now uses static span names by default, with resource mappings providing additional context when requested. This enables behavior that wasn't possible before.

If you want to keep the previous behavior, either update your spans and resource mappings, or keep your spans and configure the router to use dynamic span names and disable resource mapping.

Resource mapping and fixed span names are configured by the enable_span_mapping and fixed_span_names options, respectively:

telemetry:
  exporters:
    tracing:
      datadog:
        enabled: true
        # Enables resource mapping, previously disabled by default, but now enabled.
        enable_span_mapping: true
        # Enables fixed span names, defaults to true.
        fixed_span_names: true

  instrumentation:
    spans:
      mode: spec_compliant

With enable_span_mapping set to true (now default), the following resource mappings are applied:

| OpenTelemetry Span Name | Datadog Span Operation Name |
|---|---|
| request | http.route |
| router | http.route |
| supergraph | graphql.operation.name |
| query_planning | graphql.operation.name |
| subgraph | subgraph.name |
| subgraph_request | graphql.operation.name |
| http_request | http.route |

You can override the default resource mappings by specifying the resource_mapping configuration:

telemetry:
  exporters:
    tracing:
      datadog:
        enabled: true
        resource_mapping:
          # Use `my.span.attribute` as the resource name for the `router` span
          router: "my.span.attribute"
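The way defaults and overrides combine can be sketched in Rust: look up the span name in resource_mapping first, then in the default table, and use that attribute's value as the resource name. Falling back to the span name when no attribute applies is an assumption of this sketch, and default_mapping and resource_name are hypothetical helpers, not the exporter's code.

```rust
use std::collections::HashMap;

// Hypothetical sketch of resource-name selection: an override from
// `resource_mapping` wins over the default table, and the span name is used
// as a fallback (an assumption) when the attribute is missing.

fn default_mapping() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("request", "http.route"),
        ("router", "http.route"),
        ("supergraph", "graphql.operation.name"),
        ("query_planning", "graphql.operation.name"),
        ("subgraph", "subgraph.name"),
        ("subgraph_request", "graphql.operation.name"),
        ("http_request", "http.route"),
    ])
}

fn resource_name(
    span_name: &str,
    attributes: &HashMap<&str, &str>,
    overrides: &HashMap<&str, &str>,
) -> String {
    let defaults = default_mapping();
    // An override for the span takes precedence over the default table.
    let attribute = overrides.get(span_name).or_else(|| defaults.get(span_name));
    attribute
        .and_then(|key| attributes.get(key))
        .map(|value| value.to_string())
        .unwrap_or_else(|| span_name.to_string())
}
```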

To learn more, see the Datadog trace exporter documentation.

By @bnjjj and @BrynCooke in #5386

📚 Documentation

Update documentation for ignore_other_prefixes (PR #5592)

Update JWT authentication documentation to clarify the behavior of the ignore_other_prefixes configuration option.

By @andrewmcgivery in #5592

v1.51.0-rc.0

10 Jul 20:06
Pre-release

v2.0.0-alpha.0

09 Jul 15:12
Pre-release

v1.50.0

02 Jul 19:09
8a450dd

🚀 Features

Support local persisted query manifests for use with offline licenses (Issue #4587)

Adds experimental support for passing persisted query manifests to use instead of the hosted Uplink version.

For example:

persisted_queries:
  enabled: true
  log_unknown: true
  experimental_local_manifests: 
    - ./persisted-query-manifest.json
  safelist:
    enabled: true
    require_id: false
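With safelisting enabled and require_id set to false, a request can carry either a registered ID or a freeform body that matches a registered operation. The Rust sketch below models that check against a locally loaded manifest. Manifest, Decision, and check are hypothetical names, the manifest file format is not modeled, and exact string matching of bodies is a simplifying assumption.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical sketch of a safelist check against a locally loaded manifest
// (safelist enabled, require_id: false). Exact string matching of bodies is
// an assumption made for brevity.

struct Manifest {
    by_id: HashMap<String, String>,
    bodies: HashSet<String>,
}

impl Manifest {
    fn new(operations: &[(&str, &str)]) -> Self {
        let by_id: HashMap<String, String> = operations
            .iter()
            .map(|(id, body)| (id.to_string(), body.to_string()))
            .collect();
        let bodies = by_id.values().cloned().collect();
        Manifest { by_id, bodies }
    }
}

#[derive(Debug, PartialEq)]
enum Decision {
    /// Carries the operation body to execute.
    Allow(String),
    Reject,
}

fn check(manifest: &Manifest, id: Option<&str>, body: Option<&str>, log_unknown: bool) -> Decision {
    match (id, body) {
        // A registered ID resolves to its stored body.
        (Some(id), _) => match manifest.by_id.get(id) {
            Some(known) => Decision::Allow(known.clone()),
            None => Decision::Reject,
        },
        // A freeform body is allowed only if it is in the manifest.
        (None, Some(body)) if manifest.bodies.contains(body) => Decision::Allow(body.to_string()),
        (None, Some(body)) => {
            if log_unknown {
                eprintln!("unknown operation: {body}");
            }
            Decision::Reject
        }
        (None, None) => Decision::Reject,
    }
}
```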

By @lleadbet in #5310

Support conditions on standard telemetry events (Issue #5475)

Enables setting conditions on standard events.
For example:

telemetry:
  instrumentation:
    events:
      router:
        request:
          level: info
          condition: # Only log the router request if you sent `x-log-request` with the value `enabled`
            eq:
            - request_header: x-log-request
            - "enabled"
        response: off
        error: error
        # ...

This feature is not supported for batched requests.

By @bnjjj in #5476

Make status_code available for router_service responses in Rhai scripts (Issue #5357)

Adds response.status_code on Rhai router_service responses. Previously, status_code was only available on subgraph_service responses.

For example:

fn router_service(service) {
    let f = |response| {
        if response.is_primary() {
            print(response.status_code);
        }
    };

    service.map_response(f);
}

By @IvanGoncharov in #5358

Add new values for the supergraph query selector (PR #5433)

Adds support for four new values for the supergraph query selector:

  • aliases: the number of aliases in the query
  • depth: the depth of the query
  • height: the height of the query
  • root_fields: the number of root fields in the query

You can use this data to understand how your graph is used and to help determine where to set limits.

For example:

telemetry:
  instrumentation:
    instruments:
      supergraph:
        'query.depth':
          description: 'The depth of the query'
          value:
            query: depth
          unit: unit
          type: histogram
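To clarify what the four values measure, the Rust sketch below computes them on a simplified operation tree with no fragments, variables, or directives. The definitions follow Apollo's operation-limits documentation as we understand it (depth is the deepest selection-set nesting; height counts unique fields, with an aliased field counted once; aliases counts aliased fields; root_fields counts top-level fields). Identifying unique fields by their path of field names is an assumption, and Field and field are hypothetical helpers.

```rust
use std::collections::HashSet;

// Hypothetical sketch of the four measurements on a simplified operation tree.
// Not the router's implementation.

struct Field {
    name: String,
    alias: Option<String>,
    selections: Vec<Field>,
}

/// Convenience constructor for building trees in examples.
fn field(name: &str, alias: Option<&str>, selections: Vec<Field>) -> Field {
    Field { name: name.to_string(), alias: alias.map(str::to_string), selections }
}

fn depth(selections: &[Field]) -> usize {
    selections.iter().map(|f| 1 + depth(&f.selections)).max().unwrap_or(0)
}

fn aliases(selections: &[Field]) -> usize {
    selections
        .iter()
        .map(|f| usize::from(f.alias.is_some()) + aliases(&f.selections))
        .sum()
}

/// Unique fields are identified by their path of field names (aliases ignored).
fn height(selections: &[Field]) -> usize {
    fn collect(selections: &[Field], prefix: &str, seen: &mut HashSet<String>) {
        for f in selections {
            let path = format!("{prefix}/{}", f.name);
            collect(&f.selections, &path, seen);
            seen.insert(path);
        }
    }
    let mut seen = HashSet::new();
    collect(selections, "", &mut seen);
    seen.len()
}

fn root_fields(selections: &[Field]) -> usize {
    selections.len()
}
```

For { me { id first: name last: name } products { id } }, this sketch gives depth 2, aliases 2, height 5, and root_fields 2.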

By @garypen in #5433

Add the ability to drop metrics using otel views (PR #5531)

Using OpenTelemetry views, you can drop specific metrics that you don't want sent to your APM.

telemetry:
  exporters:
    metrics:
      common:
        service_name: apollo-router
        views:
          - name: apollo_router_http_request_duration_seconds # Instrument name you want to edit. You can use wildcards in names. To target all instruments, use '*'
            aggregation: drop
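The config comment mentions wildcard names. The Rust sketch below shows one way such matching could work, where * matches any run of characters (so * alone matches every instrument). It only illustrates the wildcard rule; matches and is_dropped are hypothetical helpers, and the router's own matcher may differ.

```rust
// Hypothetical sketch of matching a view `name` against instrument names,
// where `*` matches any run of characters. The router's matcher may differ.

fn matches(pattern: &str, name: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let n: Vec<char> = name.chars().collect();
    // Greedy glob matching that backtracks to the most recent `*`.
    let (mut i, mut j) = (0, 0);
    let mut star: Option<usize> = None;
    let mut mark = 0;
    while j < n.len() {
        if i < p.len() && p[i] != '*' && p[i] == n[j] {
            i += 1;
            j += 1;
        } else if i < p.len() && p[i] == '*' {
            star = Some(i);
            mark = j;
            i += 1;
        } else if let Some(s) = star {
            // Let the last `*` absorb one more character and retry.
            i = s + 1;
            mark += 1;
            j = mark;
        } else {
            return false;
        }
    }
    // Trailing `*`s can match the empty string.
    while i < p.len() && p[i] == '*' {
        i += 1;
    }
    i == p.len()
}

/// Returns true if any view with `aggregation: drop` matches the instrument.
fn is_dropped(drop_patterns: &[&str], instrument: &str) -> bool {
    drop_patterns.iter().any(|p| matches(p, instrument))
}
```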

By @bnjjj in #5531

Add operation_name selector for router service in custom telemetry (PR #5392)

Adds an operation_name selector for the router service.
Previously, accessing operation_name was only possible through the response_context router service selector.

For example:

telemetry:
  instrumentation:
    instruments:
      router:
        http.server.request.duration:
          attributes:
            graphql.operation.name:
              operation_name: string

By @bnjjj in #5392

🐛 Fixes

Fix Cache-Control aggregation and age calculation in entity caching (PR #5463)

Enhances the reliability of caching behaviors in the entity cache feature by:

  • Ensuring the proper calculation of max-age and s-maxage fields in the Cache-Control header sent to clients.
  • Setting appropriate default values if a subgraph does not provide a Cache-Control header.
  • Guaranteeing that the Cache-Control header is aggregated consistently, even if the plugin is disabled entirely or on specific subgraphs.
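For intuition about aggregation and age calculation, here is a Rust sketch of one common strategy: a client response is only as cacheable as its least cacheable part, so the result takes the minimum remaining freshness (max-age minus age) and becomes no-store or private if any part is. These rules and the CacheControl and aggregate names are assumptions for illustration, not the router's exact algorithm.

```rust
// Hypothetical sketch of combining per-subgraph Cache-Control values.
// The rules below are assumptions, not the router's exact algorithm.

#[derive(Debug, Clone, Copy, PartialEq)]
struct CacheControl {
    /// Freshness lifetime in seconds (`max-age`).
    max_age: u64,
    /// Seconds the response has already spent in a cache (the `Age` header).
    age: u64,
    no_store: bool,
    private: bool,
}

impl CacheControl {
    /// Remaining freshness, saturating at zero once the entry is stale.
    fn remaining(&self) -> u64 {
        self.max_age.saturating_sub(self.age)
    }
}

/// Aggregates per-subgraph values into one header, expressed as fresh from now
/// (age 0). Returns `None` if there is nothing to aggregate.
fn aggregate(parts: &[CacheControl]) -> Option<CacheControl> {
    let first = parts.first()?;
    let start = CacheControl {
        max_age: first.remaining(),
        age: 0,
        no_store: first.no_store,
        private: first.private,
    };
    Some(parts.iter().skip(1).fold(start, |acc, p| CacheControl {
        max_age: acc.max_age.min(p.remaining()),
        age: 0,
        no_store: acc.no_store || p.no_store,
        private: acc.private || p.private,
    }))
}
```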

By @Geal in #5463

Fix telemetry events when trace isn't sampled and preserve attribute types (PR #5464)

Improves accuracy and performance of event telemetry by:

  • Displaying custom event attributes even if the trace is not sampled
  • Preserving original attribute type instead of converting it to string
  • Ensuring http.response.body.size and http.request.body.size attributes are treated as numbers, not strings

⚠️ Exercise caution if you have monitoring enabled on your logs, as attribute types may have changed. For example, attributes like http.response.status_code are now numbers (200) instead of strings ("200").

By @bnjjj in #5464

Enable coprocessors for subscriptions (PR #5542)

Ensures that coprocessors correctly handle subscriptions by preventing skipped data from being overwritten.

By @bnjjj in #5542

Improve accuracy of query_planning.plan.duration (PR #5)

Previously, the apollo.router.query_planning.plan.duration metric inaccurately included additional processing time beyond query planning. The additional time included pooling time, which is already accounted for in the metric. After this update, apollo.router.query_planning.plan.duration now accurately reflects only the query planning duration without additional processing time.

For example, before the change, metrics reported:

2024-06-21T13:37:27.744592Z WARN  apollo.router.query_planning.plan.duration 0.002475708
2024-06-21T13:37:27.744651Z WARN  apollo.router.query_planning.total.duration 0.002553958

2024-06-21T13:37:27.748831Z WARN  apollo.router.query_planning.plan.duration 0.001635833
2024-06-21T13:37:27.748860Z WARN  apollo.router.query_planning.total.duration 0.001677167

Post-change metrics now accurately reflect:

2024-06-21T13:37:27.743465Z WARN  apollo.router.query_planning.plan.duration 0.00107725
2024-06-21T13:37:27.744651Z WARN  apollo.router.query_planning.total.duration 0.002553958

2024-06-21T13:37:27.748299Z WARN  apollo.router.query_planning.plan.duration 0.000827
2024-06-21T13:37:27.748860Z WARN  apollo.router.query_planning.total.duration 0.001677167

By @xuorig and @lrlna in #5530

Remove deno_crypto package due to security vulnerability (Issue #5484)

Removes deno_crypto due to the vulnerability reported in curve25519-dalek.
Since the router exclusively used deno_crypto for generating UUIDs using the package's random number generator, this vulnerability had no impact on the router.

By @Geal in #5483

Add content-type header to failed auth checks (Issue #5496)

Adds a content-type header when returning AUTH_ERROR from the authentication service.

By @andrewmcgivery in #5497

Implement manual caching for AWS Security Token Service credentials (PR #5508)

In the AWS Security Token Service (STS), the CredentialsProvider chain includes caching, but this functionality was missing for AssumeRoleProvider.
This change introduces a custom CredentialsProvider that functions as a caching layer with these rules:

  • Cache Expiry: Credentials retrieved are stored in the cache based on their credentials.expiry() time if specified, or indefinitely if not.
  • Automatic Refresh: Five minutes before cached credentials expire, an attempt is made to fetch updated credentials.
  • Retry Mechanism: If credential retrieval fails, another attempt is scheduled after a one-minute interval.
  • (Coming soon, not included in this change) Manual Refresh: The CredentialsProvider will expose a refresh_credentials() function. This can be manually invoked, for instance, upon receiving a 401 error during a subgraph call.
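The refresh rules in the list above can be sketched as a scheduling function in Rust: refresh five minutes before expiry, retry one minute after a failed fetch, and never refresh credentials without an expiry. LastFetch and next_refresh are hypothetical names, not the router's or the AWS SDK's API.

```rust
use std::time::{Duration, SystemTime};

// Hypothetical sketch of the refresh schedule described above.
// Names are illustrative, not the router's or the AWS SDK's API.

const REFRESH_BEFORE_EXPIRY: Duration = Duration::from_secs(5 * 60);
const RETRY_AFTER_FAILURE: Duration = Duration::from_secs(60);

/// Outcome of the most recent fetch attempt.
enum LastFetch {
    Succeeded { expiry: Option<SystemTime> },
    Failed,
}

/// When the next fetch should happen, or `None` if the cached credentials never expire.
fn next_refresh(now: SystemTime, last: &LastFetch) -> Option<SystemTime> {
    match last {
        LastFetch::Failed => Some(now + RETRY_AFTER_FAILURE),
        LastFetch::Succeeded { expiry: None } => None,
        LastFetch::Succeeded { expiry: Some(expiry) } => {
            // Refresh five minutes early; if that moment has passed, refresh now.
            let target = expiry.checked_sub(REFRESH_BEFORE_EXPIRY).unwrap_or(now);
            Some(if target < now { now } else { target })
        }
    }
}
```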

By @o0Ignition0o in #5508

📃 Configuration

Align entity caching configuration structure for subgraph overrides (PR #5474)

Aligns the entity cache configuration structure to the same all/subgraphs over...


v1.50.0-rc.0

27 Jun 07:48
Pre-release

v1.49.1

19 Jun 16:19
9b152bd

Important

If you have enabled Distributed query plan caching, this release changes the hashing algorithm used for the cache keys. On account of this, you should anticipate additional cache regeneration cost when updating between these versions while the new hashing algorithm comes into service.

🔒 Security

Replace dependency included in security advisory (Issue #5484)

This removes our use of a dependency that was cited in security advisories RUSTSEC-2024-0344 and GHSA-x4gp-pqpj-f43q.

We have carefully analyzed our usages and determined that Apollo Router is not impacted. We only relied on different functions from the same dependency that were easily replaced. Despite the lack of impact, we have opted to remove the dependency entirely out of an abundance of caution. This not only clears the warning on our side immediately, but also provides a clear path forward in the event that this shows up in any of our users' own scans.

Users may upgrade at their own discretion; because we determined there is no impact, we are not explicitly recommending an upgrade.

See the corresponding GitHub issue.

By @Geal in #5483

🐛 Fixes

Update to Federation v2.8.1 (PR #5483)

The above security fix was in router-bridge, which had already received a Federation version bump. This bump takes Federation to v2.8.1, which fixes a performance-related matter in composition. However, it does not impact query planning, so for the router this update is a no-op: a symbolic bump of the version number rather than a functional change.

By @Geal in #5483

v1.49.1-rc.0

19 Jun 15:12
Pre-release

v1.49.0

18 Jun 18:01
c808b30

🚀 Features

Override tracing span names using custom span selectors (Issue #5261)

Adds the ability to override span names by setting the otel.name span attribute to any custom telemetry selector.

This example changes the span name to router:

telemetry:
  instrumentation:
    spans:
      router:
        otel.name:
          static: router # Override the span name to router

By @bnjjj in #5365

Add description and units to standard instruments (PR #5407)

Adds descriptions and units to the standard instruments available in the router. These descriptions and units are copied directly from the OpenTelemetry semantic conventions and are needed for better integration with APMs.

By @bnjjj in #5407

Add with_lock() method to Extensions to facilitate avoidance of timing issues (PR #5360)

For users who write custom Rust plugins, we've introduced with_lock(), which explicitly restricts the lifetime of the Extensions lock.

Without this method, it was too easy to run into issues interacting with the Extensions since we would inadvertently hold locks for too long. This was a source of bugs in the router and caused a lot of tests to be flaky.
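The general pattern is to hand the locked data to a closure so the guard cannot outlive the call. The Rust sketch below shows that closure-scoped locking on a toy Extensions type; the struct and the with_lock signature here are hypothetical stand-ins, not the router's actual Extensions API.

```rust
use std::sync::Mutex;

// Hypothetical sketch of closure-scoped locking. `Extensions` is a toy
// stand-in, not the router's type, and this signature is illustrative only.

#[derive(Default)]
struct Extensions {
    inner: Mutex<Vec<String>>,
}

impl Extensions {
    /// Runs `f` with exclusive access; the guard is dropped when this returns,
    /// so callers cannot hold the lock beyond the closure.
    fn with_lock<R>(&self, f: impl FnOnce(&mut Vec<String>) -> R) -> R {
        let mut guard = self.inner.lock().expect("lock poisoned");
        f(&mut *guard)
    }
}
```

Because the lock is released as soon as the closure returns, a later call from the same thread cannot deadlock on a guard that was accidentally kept alive.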

By @garypen in #5360

Add support for unix_ms_now in Rhai customizations (Issue #5182)

Rhai customizations can now use the unix_ms_now() function to obtain the current Unix timestamp in milliseconds since the Unix epoch.

For example:

fn supergraph_service(service) {
    let now = unix_ms_now();
}
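For comparison, the same quantity (milliseconds since the Unix epoch) can be computed in Rust with the standard library as shown below. This is only an analogy to what unix_ms_now() returns in Rhai, not its implementation.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Milliseconds since the Unix epoch, computed with the standard library.
/// An analogy to Rhai's unix_ms_now(), not its implementation.
fn unix_ms_now() -> u128 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_millis()
}
```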

By @shaikatzz in #5181

🐛 Fixes

Improve error message produced when subgraphs responses don't include an expected content-type header value (Issue #5359)

To enhance debuggability when a subgraph response lacks an expected content-type header value, the error message now includes additional details.

Examples:

HTTP fetch failed from 'test': subgraph response contains invalid 'content-type' header value \"application/json,application/json\"; expected content-type: application/json or content-type: application/graphql-response+json
HTTP fetch failed from 'test': subgraph response does not contain 'content-type' header; expected content-type: application/json or content-type: application/graphql-response+json
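The Rust sketch below shows a check that produces messages shaped like the examples above. check_content_type is a hypothetical helper, and ignoring media-type parameters such as charset (and comparing case-insensitively) is an assumption; the router's real parsing may differ.

```rust
// Hypothetical sketch of a content-type check producing messages shaped like
// the examples above. Parameter handling is an assumption.

const EXPECTED: &str =
    "expected content-type: application/json or content-type: application/graphql-response+json";

fn check_content_type(subgraph: &str, header: Option<&str>) -> Result<(), String> {
    let value = match header {
        None => {
            return Err(format!(
                "HTTP fetch failed from '{subgraph}': subgraph response does not contain 'content-type' header; {EXPECTED}"
            ))
        }
        Some(v) => v,
    };
    // Compare only the media type before any `;` parameters, case-insensitively.
    let media_type = value.split(';').next().unwrap_or("").trim().to_ascii_lowercase();
    match media_type.as_str() {
        "application/json" | "application/graphql-response+json" => Ok(()),
        _ => Err(format!(
            "HTTP fetch failed from '{subgraph}': subgraph response contains invalid 'content-type' header value \"{value}\"; {EXPECTED}"
        )),
    }
}
```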

By @IvanGoncharov in #5223

Performance improvements for demand control (PR #5405)

Improves performance by removing unneeded logic from the hot path of the recently released public preview of the demand control feature.

By @BrynCooke in #5405

Skip hashing the entire schema on every query plan cache lookup (PR #5374)

This fixes performance issues when looking up query plans for large schemas.

Important

If you have enabled Distributed query plan caching, this release changes the hashing algorithm used for the cache keys. On account of this, you should anticipate additional cache regeneration cost when updating between these versions while the new hashing algorithm comes into service.

By @Geal in #5374

Optimize GraphQL instruments (PR #5375)

When processing selectors for GraphQL instruments, heap allocations should be avoided for optimal performance. This change removes Vec allocations that were previously performed per field, yielding significant performance improvements.

By @BrynCooke in #5375

Log metrics overflow as a warning rather than an error (Issue #5173)

If a metric has too high a cardinality, the following is displayed as a warning instead of an error:

OpenTelemetry metric error occurred: Metrics error: Warning: Maximum data points for metric stream exceeded/ Entry added to overflow

By @bnjjj in #5287

Add support for response_context selectors in error conditions (PR #5288)

Adds the ability to use response_context selectors in custom instruments that have error conditions. For example:

http.server.request.timeout:
  type: counter
  value: unit
  description: "request in timeout"
  unit: request
  attributes:
    graphql.operation.name:
      response_context: operation_name
  condition:
    eq:
    - "request timed out"
    - error: reason

By @bnjjj in #5288

Fix inaccurate apollo_router_opened_subscriptions counter (PR #5363)

Fixes the apollo_router_opened_subscriptions counter which previously only incremented. The counter now also decrements.
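A common way to keep such a counter accurate is to tie the decrement to a guard's Drop, so it runs however the subscription ends. The Rust sketch below shows that pattern; OpenedSubscriptions and SubscriptionGuard are hypothetical names, not the router's implementation.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

// Hypothetical sketch of an up/down counter for open subscriptions: opening
// increments it, and dropping the guard decrements it.

#[derive(Clone, Default)]
struct OpenedSubscriptions(Arc<AtomicI64>);

struct SubscriptionGuard(Arc<AtomicI64>);

impl OpenedSubscriptions {
    fn open(&self) -> SubscriptionGuard {
        self.0.fetch_add(1, Ordering::Relaxed);
        SubscriptionGuard(Arc::clone(&self.0))
    }

    fn current(&self) -> i64 {
        self.0.load(Ordering::Relaxed)
    }
}

impl Drop for SubscriptionGuard {
    /// Runs when the subscription ends, whether normally or on error.
    fn drop(&mut self) {
        self.0.fetch_sub(1, Ordering::Relaxed);
    }
}
```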

By @bnjjj in #5363

📃 Configuration

🛠 Maintenance

Skip GraphOS tests when Apollo key not present (PR #5362)

Some tests require APOLLO_KEY and APOLLO_GRAPH_REF to execute successfully.
These tests are now skipped if those environment variables are not present, allowing external contributors to the router to run the entire test suite successfully.

By @BrynCooke in #5362

📚 Documentation

Standard instrument configuration documentation for subgraphs (PR #5422)

Added documentation about standard instruments available at the subgraph service level:

  • http.client.request.body.size - A histogram of request body sizes for requests handled by subgraphs.
  • http.client.request.duration - A histogram of request durations for requests handled by subgraphs.
  • http.client.response.body.size - A histogram of response body sizes for requests handled by subgraphs.

These instruments are configurable in router.yaml:

telemetry:
  instrumentation:
    instruments:
      subgraph:
        http.client.request.body.size: true # (default false)
        http.client.request.duration: true # (default false)
        http.client.response.body.size: true # (default false)

By @bnjjj in #5422

Update docs frontmatter for consistency and discoverability (PR #5164)

Makes title case consistent for page titles, adds subtitles, and updates meta-descriptions for better discoverability.

By @Meschreiber in #5164

🧪 Experimental

Warm query plan cache using persisted queries on startup (Issue #5334)

Adds support for the router to use persisted queries to warm the query plan cache upon startup using a new experimental_prewarm_query_plan_cache configuration option under persisted_queries.

To enable:

persisted_queries:
  enabled: true
  experimental_prewarm_query_plan_cache: true

By @lleadbet in #5340

Apollo reporting signature enhancements (PR #5062)

Adds a new experimental configuration option to turn on some enhancements for the Apollo reporting stats report key:

  • Signatures will include the full normalized form of input objects
  • Signatures will include aliases
  • Some small normalization improvements

This new configuration (telemetry.apollo.experimental_apollo_signature_normalization_algorithm) only works when in experimental_apollo_metrics_generation_mode: new mode and we don't yet recommend enabling it while we continue to verify that the new functionality works as expected.

By @bonnici in #5062

Add experimental support for sending traces to Studio via OTLP (PR #4982)

As the ecosystem around OpenTelemetry (OTel) has been expanding rapidly, we are evaluating a migration of Apollo's internal tracing system to use an OTel-based protocol.

In the short-term, benefits include:

  • A comprehensive way to visualize the router execution path in GraphOS Studio.
  • Additional spans that were previously not included in Studio traces, such as query parsing, planning, execution, and more.
  • Additional metadata such as subgraph fetch details, router idle / busy timing, and more.

Long-term, we see this as a strategic enhancement to consolidate these two disparate tracing systems.
This will pave the way for future enhancements to more easily plug into the Studio trace visualizer.

Configuration

This change adds a new configuration option experimental_otlp_tracing_sampler. This can be used to send a percentage of traces via OTLP instead...


v1.49.0-rc.1

17 Jun 09:50
Pre-release