Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
# Description

From the original issue:

> When a solver repeatedly wins consecutive auctions but fails to settle its solutions on-chain, it can lead to system downtime. To prevent this, the autopilot must have the capability to temporarily exclude such solvers from participating in competitions. This ensures no single solver can disrupt the system's operations.

This PR implements this by introducing a new struct that checks whether a solver is allowed to participate in the next competition, using two different approaches:

1. The existing `Authenticator`'s `is_solver` on-chain call is moved into the new struct.
2. A new strategy finds non-settling solvers using a SQL query. It selects the last 3 auctions (configurable) with a deadline up to the current block, to avoid selecting pending settlements, and checks whether the same solver (or solvers, in the case of multiple winners) won all of them without settling any. This strategy caches the results to avoid redundant DB queries. The query relies on the `auction_id` column of the `settlements` table, which is updated separately by the `Observer` struct, so the cache is updated only once the `Observer` has a result.

These validators are called sequentially to avoid redundant RPC calls to `Authenticator`: the DB-based validator's cache is checked first and only then is the RPC call sent. Once one of the strategies reports that the solver is not allowed to participate, the solver is deny-listed for 5 minutes (configurable). Each validator can be enabled or disabled separately in case of any issue.

## Metrics

Added a metric that is populated by the DB-based validator once a solver is marked as banned. The idea is to create an alert that fires if there are more than 4 such occurrences within the last 30 minutes for the same solver, meaning that disabling the solver should be considered.

# Open discussions

1. Since the current SQL query filters out auctions whose deadline has not been reached, the following case is possible: a solver gets banned while it still has a pending settlement. If that settlement then lands, the solver remains banned. While this is a niche case, it would be better to unblock the solver before the cache TTL expires. This has not been implemented in the current PR since it requires some refactoring of the `Observer` struct. If this is approved, it can be implemented quickly.
2. Whether it makes sense to introduce a metrics-based strategy, similar to the bad token detector's, where the solver gets banned if >95% (or similar) of its settlements fail.

## How to test

- A new SQL query test.
- Existing e2e tests.

## Related Issues

Fixes #3221

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

- New Features
  - Introduced advanced solver participation controls with configurable eligibility checks, integrating both on-chain and database validations.
  - Enabled asynchronous real-time notifications for settlement updates, enhancing system responsiveness.
  - Added metrics tracking to monitor auction participation and performance.
- Chores
  - Updated internal dependencies and restructured driver configuration.
  - Reorganized the database schema to support improved auction and settlement processing.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
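The DB-based strategy from the description can be illustrated in memory. The following is a minimal sketch, not the SQL query from this PR: `AuctionRecord` and its fields are hypothetical stand-ins for the database rows, solvers are represented by plain string names, and the function assumes that a solver is flagged only when it won every one of the last N finished auctions and settled none of them.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for one auction row (not the real schema).
#[derive(Debug, Clone)]
pub struct AuctionRecord {
    pub id: u64,
    /// Block by which the winning solutions had to be settled.
    pub deadline_block: u64,
    /// Solvers that won this auction (there can be several winners).
    pub winners: Vec<&'static str>,
    /// Winners whose settlement was observed on-chain.
    pub settled_by: Vec<&'static str>,
}

/// Returns the solvers that won all of the last `last_auctions_count`
/// finished auctions and settled none of them.
pub fn find_non_settling_solvers(
    auctions: &[AuctionRecord],
    last_auctions_count: usize,
    current_block: u64,
) -> HashSet<&'static str> {
    // Skip auctions whose deadline has not passed yet: their settlements
    // may still be pending.
    let mut finished: Vec<&AuctionRecord> = auctions
        .iter()
        .filter(|auction| auction.deadline_block <= current_block)
        .collect();
    // Newest first, then keep only the last N.
    finished.sort_by(|a, b| b.id.cmp(&a.id));
    finished.truncate(last_auctions_count);

    // Without enough history nobody is flagged.
    if last_auctions_count == 0 || finished.len() < last_auctions_count {
        return HashSet::new();
    }

    let mut wins: HashMap<&'static str, usize> = HashMap::new();
    let mut settled: HashSet<&'static str> = HashSet::new();
    for auction in &finished {
        // Count each winner at most once per auction.
        let unique_winners: HashSet<&'static str> = auction.winners.iter().copied().collect();
        for winner in unique_winners {
            *wins.entry(winner).or_insert(0) += 1;
        }
        settled.extend(auction.settled_by.iter().copied());
    }

    wins.into_iter()
        .filter(|(solver, count)| *count == last_auctions_count && !settled.contains(solver))
        .map(|(solver, _)| solver)
        .collect()
}
```

Filtering on the deadline before taking the last N auctions is what produces the niche case raised in the first open discussion point: an auction that is still pending is invisible to the check, so a later successful settlement cannot lift a ban that is already in place.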
1 parent 97ddf0c, commit 59a1010

Showing 14 changed files with 628 additions and 29 deletions.
111 changes: 111 additions & 0 deletions
crates/autopilot/src/domain/competition/participation_guard/db.rs
`@@ -0,0 +1,111 @@`

```rust
use {
    crate::{
        domain::{Metrics, eth},
        infra,
    },
    ethrpc::block_stream::CurrentBlockWatcher,
    std::{
        collections::HashMap,
        sync::Arc,
        time::{Duration, Instant},
    },
};

/// Checks the DB by searching for solvers that won N last consecutive auctions
/// but never settled any of them.
#[derive(Clone)]
pub(super) struct Validator(Arc<Inner>);

struct Inner {
    persistence: infra::Persistence,
    banned_solvers: dashmap::DashMap<eth::Address, Instant>,
    ttl: Duration,
    last_auctions_count: u32,
    drivers_by_address: HashMap<eth::Address, Arc<infra::Driver>>,
}

impl Validator {
    pub fn new(
        persistence: infra::Persistence,
        current_block: CurrentBlockWatcher,
        competition_updates_receiver: tokio::sync::mpsc::UnboundedReceiver<()>,
        ttl: Duration,
        last_auctions_count: u32,
        drivers_by_address: HashMap<eth::Address, Arc<infra::Driver>>,
    ) -> Self {
        let self_ = Self(Arc::new(Inner {
            persistence,
            banned_solvers: Default::default(),
            ttl,
            last_auctions_count,
            drivers_by_address,
        }));

        self_.start_maintenance(competition_updates_receiver, current_block);

        self_
    }

    /// Update the internal cache only once the competition auctions table is
    /// updated to avoid redundant DB queries on each block or any other
    /// timeout.
    fn start_maintenance(
        &self,
        mut competition_updates_receiver: tokio::sync::mpsc::UnboundedReceiver<()>,
        current_block: CurrentBlockWatcher,
    ) {
        let self_ = self.clone();
        tokio::spawn(async move {
            while competition_updates_receiver.recv().await.is_some() {
                let current_block = current_block.borrow().number;
                let non_settling_solvers = match self_
                    .0
                    .persistence
                    .find_non_settling_solvers(self_.0.last_auctions_count, current_block)
                    .await
                {
                    Ok(non_settling_solvers) => non_settling_solvers,
                    Err(err) => {
                        tracing::warn!(?err, "error while searching for non-settling solvers");
                        continue;
                    }
                };

                let now = Instant::now();
                let non_settling_solver_names: Vec<&str> = non_settling_solvers
                    .iter()
                    .filter_map(|solver| self_.0.drivers_by_address.get(solver))
                    .map(|driver| {
                        Metrics::get()
                            .non_settling_solver
                            .with_label_values(&[&driver.name]);
                        // Check if solver accepted this feature. This should be removed once the
                        // CIP making this mandatory has been approved.
                        if driver.requested_timeout_on_problems {
                            tracing::debug!(solver = ?driver.name, "disabling solver temporarily");
                            self_
                                .0
                                .banned_solvers
                                .insert(driver.submission_address, now);
                        }
                        driver.name.as_ref()
                    })
                    .collect();

                tracing::debug!(solvers = ?non_settling_solver_names, "found non-settling solvers");
            }
            tracing::error!("stream of settlement updates terminated unexpectedly");
        });
    }
}

#[async_trait::async_trait]
impl super::Validator for Validator {
    async fn is_allowed(&self, solver: &eth::Address) -> anyhow::Result<bool> {
        if let Some(entry) = self.0.banned_solvers.get(solver) {
            return Ok(entry.elapsed() >= self.0.ttl);
        }

        Ok(true)
    }
}
```
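The TTL check in `is_allowed` above can be tried in isolation. The following is a minimal, standard-library-only sketch under stated assumptions: `BanCache` is a hypothetical name, a `Mutex<HashMap>` replaces the `DashMap` from the diff, solvers are identified by string names, and the current time is passed in explicitly so the expiry can be exercised without sleeping.

```rust
use std::{
    collections::HashMap,
    sync::Mutex,
    time::{Duration, Instant},
};

/// Hypothetical stand-in for the `banned_solvers` map plus its TTL.
pub struct BanCache {
    banned: Mutex<HashMap<String, Instant>>,
    ttl: Duration,
}

impl BanCache {
    pub fn new(ttl: Duration) -> Self {
        Self {
            banned: Mutex::new(HashMap::new()),
            ttl,
        }
    }

    /// Records (or refreshes) the moment at which the solver was banned.
    pub fn ban(&self, solver: &str, now: Instant) {
        self.banned
            .lock()
            .unwrap()
            .insert(solver.to_string(), now);
    }

    /// A solver is allowed if it was never banned or if its ban is at least
    /// `ttl` old. Expired entries are not removed, they simply stop matching.
    pub fn is_allowed(&self, solver: &str, now: Instant) -> bool {
        let banned = self.banned.lock().unwrap();
        match banned.get(solver) {
            Some(banned_at) => now.saturating_duration_since(*banned_at) >= self.ttl,
            None => true,
        }
    }
}
```

Because a new ban overwrites the stored timestamp, a solver that keeps being reported as non-settling has its ban extended each time the cache is refreshed.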
74 changes: 74 additions & 0 deletions
crates/autopilot/src/domain/competition/participation_guard/mod.rs
`@@ -0,0 +1,74 @@`

```rust
mod db;
mod onchain;

use {
    crate::{
        arguments::DbBasedSolverParticipationGuardConfig,
        domain::eth,
        infra::{self, Ethereum},
    },
    std::sync::Arc,
};

/// This struct checks whether a solver can participate in the competition by
/// using different validators.
#[derive(Clone)]
pub struct SolverParticipationGuard(Arc<Inner>);

struct Inner {
    /// Stores the validators in order they will be called.
    validators: Vec<Box<dyn Validator + Send + Sync>>,
}

impl SolverParticipationGuard {
    pub fn new(
        eth: Ethereum,
        persistence: infra::Persistence,
        competition_updates_receiver: tokio::sync::mpsc::UnboundedReceiver<()>,
        db_based_validator_config: DbBasedSolverParticipationGuardConfig,
        drivers: impl IntoIterator<Item = Arc<infra::Driver>>,
    ) -> Self {
        let mut validators: Vec<Box<dyn Validator + Send + Sync>> = Vec::new();

        if db_based_validator_config.enabled {
            let current_block = eth.current_block().clone();
            let database_solver_participation_validator = db::Validator::new(
                persistence,
                current_block,
                competition_updates_receiver,
                db_based_validator_config.solver_blacklist_cache_ttl,
                db_based_validator_config.solver_last_auctions_participation_count,
                drivers
                    .into_iter()
                    .map(|driver| (driver.submission_address, driver.clone()))
                    .collect(),
            );
            validators.push(Box::new(database_solver_participation_validator));
        }

        let onchain_solver_participation_validator = onchain::Validator { eth };
        validators.push(Box::new(onchain_solver_participation_validator));

        Self(Arc::new(Inner { validators }))
    }

    /// Checks if a solver can participate in the competition.
    /// Sequentially asks internal validators to avoid redundant RPC calls in
    /// the following order:
    /// 1. DB-based validator: operates fast since it uses in-memory cache.
    /// 2. Onchain-based validator: only then calls the Authenticator contract.
    pub async fn can_participate(&self, solver: &eth::Address) -> anyhow::Result<bool> {
        for validator in &self.0.validators {
            if !validator.is_allowed(solver).await? {
                return Ok(false);
            }
        }

        Ok(true)
    }
}

#[async_trait::async_trait]
trait Validator: Send + Sync {
    async fn is_allowed(&self, solver: &eth::Address) -> anyhow::Result<bool>;
}
```
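The short-circuiting order described in the `can_participate` doc comment can be demonstrated with a synchronous sketch. Everything below is illustrative rather than the code of this PR: the trait is a simplified, non-async version of `Validator`, while `DenyList`, `CountingAllowAll` and `Guard` are hypothetical names, with the counter standing in for the number of RPC calls that would be sent.

```rust
use std::sync::{
    atomic::{AtomicUsize, Ordering},
    Arc,
};

/// Simplified, synchronous version of the validator trait.
pub trait Validator: Send + Sync {
    fn is_allowed(&self, solver: &str) -> Result<bool, String>;
}

/// Cheap check, playing the role of the cached DB-based validator.
pub struct DenyList(pub Vec<String>);

impl Validator for DenyList {
    fn is_allowed(&self, solver: &str) -> Result<bool, String> {
        Ok(!self.0.iter().any(|banned| banned.as_str() == solver))
    }
}

/// Expensive check, playing the role of the on-chain validator.
/// It allows everyone and counts how often it was asked.
pub struct CountingAllowAll {
    pub calls: Arc<AtomicUsize>,
}

impl Validator for CountingAllowAll {
    fn is_allowed(&self, _solver: &str) -> Result<bool, String> {
        self.calls.fetch_add(1, Ordering::SeqCst);
        Ok(true)
    }
}

/// Asks the validators in order and stops at the first rejection.
pub struct Guard {
    validators: Vec<Box<dyn Validator>>,
}

impl Guard {
    pub fn new(validators: Vec<Box<dyn Validator>>) -> Self {
        Self { validators }
    }

    pub fn can_participate(&self, solver: &str) -> Result<bool, String> {
        for validator in &self.validators {
            if !validator.is_allowed(solver)? {
                return Ok(false);
            }
        }
        Ok(true)
    }
}
```

With the deny list placed first, a banned solver never reaches the counting validator, which mirrors why the DB-based validator is pushed before the on-chain one.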
20 changes: 20 additions & 0 deletions
crates/autopilot/src/domain/competition/participation_guard/onchain.rs
`@@ -0,0 +1,20 @@`

```rust
use crate::{domain::eth, infra::Ethereum};

/// Calls Authenticator contract to check if a solver has a sufficient
/// permission.
pub(super) struct Validator {
    pub eth: Ethereum,
}

#[async_trait::async_trait]
impl super::Validator for Validator {
    async fn is_allowed(&self, solver: &eth::Address) -> anyhow::Result<bool> {
        Ok(self
            .eth
            .contracts()
            .authenticator()
            .is_solver(solver.0)
            .call()
            .await?)
    }
}
```
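One detail of the on-chain validator is that a failed call is propagated with `?` as an error rather than being mapped to "allowed" or "denied". The sketch below illustrates that behaviour under stated assumptions: `AuthenticatorClient`, `StubClient` and `OnchainValidator` are hypothetical stand-ins, the call is synchronous, and errors are plain strings instead of `anyhow::Error`.

```rust
/// Hypothetical stand-in for the RPC call to the Authenticator contract.
pub trait AuthenticatorClient {
    fn is_solver(&self, solver: &str) -> Result<bool, String>;
}

/// Simplified, synchronous analogue of the on-chain validator.
pub struct OnchainValidator<C: AuthenticatorClient> {
    pub client: C,
}

impl<C: AuthenticatorClient> OnchainValidator<C> {
    pub fn is_allowed(&self, solver: &str) -> Result<bool, String> {
        // `?` forwards a failed call to the caller as an error; it is not
        // turned into an allow or a deny decision here.
        let allowed = self.client.is_solver(solver)?;
        Ok(allowed)
    }
}

/// Test double: answers from a fixed allow list or simulates an outage.
pub struct StubClient {
    pub allow_list: Vec<&'static str>,
    pub rpc_down: bool,
}

impl AuthenticatorClient for StubClient {
    fn is_solver(&self, solver: &str) -> Result<bool, String> {
        if self.rpc_down {
            return Err("rpc unavailable".to_string());
        }
        Ok(self.allow_list.iter().any(|allowed| *allowed == solver))
    }
}
```

How the caller of `can_participate` treats such an error is decided outside the files shown here, so this sketch makes no assumption about it.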