The component is causing dependent relayers to crash due to a failed health check on the amplifier API endpoint #2

Open
puhtaytow opened this issue Nov 20, 2024 · 4 comments
Labels
amplifier api integration (everything related to Amplifier API Integration component), bug (Something isn't working), enhancement (New feature or request)

Comments

@puhtaytow
Member

puhtaytow commented Nov 20, 2024

Is your feature request related to a problem? Please describe.

As stated in the issue title, the amplifier API integration component is causing dependent relayers to crash due to a failed health check on the amplifier API endpoint.

2024-11-20T10:31:48.103497Z ERROR start_and_wait_for_shutdown: relayer_engine: /Users/if_you_know_you_know/.cargo/git/checkouts/axelar-relayer-core-ee05e092797b9627/bb7071c/crates/relayer-engine/src/lib.rs:53: A task returned an error, shutting down the system err=
   0: Reqwest error error sending request for url (https://amplifier-devnet-amplifier.devnet.axelar.dev/health)
   1: error sending request for url (https://amplifier-devnet-amplifier.devnet.axelar.dev/health)
   2: client error (SendRequest)
   3: http2 error
   4: keep-alive timed out
   5: operation timed out

Location:
   /Users/if_you_know_you_know/.cargo/git/checkouts/axelar-relayer-core-ee05e092797b9627/bb7071c/crates/relayer-amplifier-api-integration/src/healthcheck.rs:13

Describe the solution you'd like

As a first step, we should support failover by allowing multiple amplifier API endpoints to be configured. If the primary endpoint fails, the system should automatically fall back to the other endpoints.
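A minimal sketch of that fallback order, not the relayer's actual code: `first_healthy` is a hypothetical helper, and `probe` is a stand-in closure for whatever performs the real `/health` request (for example an HTTP client call). It tries each configured endpoint in order and returns the first one that answers, keeping the errors from the ones that did not.

```rust
/// Hypothetical helper: try each endpoint in order and return the first one
/// for which `probe` succeeds. If none succeeds, return every failure so the
/// caller can log them.
pub fn first_healthy<'a, E>(
    endpoints: &'a [String],
    mut probe: impl FnMut(&str) -> Result<(), E>,
) -> Result<&'a str, Vec<(&'a str, E)>> {
    let mut failures = Vec::new();
    for endpoint in endpoints {
        match probe(endpoint.as_str()) {
            // First endpoint that answers wins; later ones are not probed.
            Ok(()) => return Ok(endpoint.as_str()),
            // Remember why this endpoint was skipped and move on.
            Err(err) => failures.push((endpoint.as_str(), err)),
        }
    }
    Err(failures)
}
```

Keeping the probe as a closure means the ordering logic can be tested without network access; the endpoint list itself would come from configuration.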

The health check mechanism should not be able to terminate the entire process. Instead, it should keep monitoring, log each failure, and retry across the available endpoints.
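One way this could look, as a sketch under assumptions rather than the component's real design: `HealthMonitor`, `Health`, and `check_once` are hypothetical names, and `probe` again stands in for the actual `/health` request. The point is that a failed round is logged and reported as a status value instead of being returned as an error that shuts the system down.

```rust
/// Outcome of one health check round. Neither variant is an error, so a
/// failed check cannot propagate up and abort the process.
#[derive(Debug, PartialEq, Eq)]
pub enum Health {
    Healthy { endpoint_index: usize },
    Degraded { consecutive_failures: u32 },
}

/// Hypothetical monitor that remembers which endpoint last answered.
pub struct HealthMonitor {
    endpoints: Vec<String>,
    active: usize,
    consecutive_failures: u32,
}

impl HealthMonitor {
    pub fn new(endpoints: Vec<String>) -> Self {
        assert!(!endpoints.is_empty(), "at least one endpoint is required");
        Self { endpoints, active: 0, consecutive_failures: 0 }
    }

    /// Runs one round: probes the active endpoint first, then the others in
    /// rotation. Failures are logged and counted, never propagated.
    pub fn check_once<E: std::fmt::Display>(
        &mut self,
        mut probe: impl FnMut(&str) -> Result<(), E>,
    ) -> Health {
        let n = self.endpoints.len();
        for offset in 0..n {
            let index = (self.active + offset) % n;
            match probe(self.endpoints[index].as_str()) {
                Ok(()) => {
                    // Stick with the endpoint that answered.
                    self.active = index;
                    self.consecutive_failures = 0;
                    return Health::Healthy { endpoint_index: index };
                }
                Err(err) => {
                    eprintln!("health check failed for {}: {err}", self.endpoints[index]);
                }
            }
        }
        // Every endpoint failed this round; report it and let the caller retry later.
        self.consecutive_failures += 1;
        Health::Degraded { consecutive_failures: self.consecutive_failures }
    }
}
```

A caller would invoke `check_once` on a timer and decide what to do with a `Degraded` status (back off, alert, pause polling) without tearing down the other tasks.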

Additional context

The affected component impacts all dependent relayers. A decision to re-evaluate this at later stages was made by quorum consensus.

puhtaytow added the enhancement (New feature or request) and amplifier api integration (everything related to Amplifier API Integration component) labels Nov 20, 2024
@puhtaytow
Member Author

@roberts-pumpurs @ctoyan @asmie @frenzox @eloylp forgive me if I missed somebody 🙏

puhtaytow added the bug (Something isn't working) label Nov 20, 2024
@eloylp
Member

eloylp commented Nov 20, 2024

> by allowing multiple amplifier API endpoints.

I'm lacking context here. Isn't the amplifier API behind a load balancer?

@puhtaytow
Member Author

> by allowing multiple amplifier API endpoints.
>
> Lacking context here. Isn't the amplifier API behind a load balancer ?

It should be, but it's still common practice to provide two endpoints, since the load balancer is itself a single point of failure.

@eloylp
Member

eloylp commented Nov 21, 2024

🤔 If the amplifier API is behind a load balancer, how do we know that the instance answering /health is the same instance we are retrieving data from?
