Add ADR 010: DetectionBatcher & DetectionBatchStream
Signed-off-by: declark1 <44146800+declark1@users.noreply.github.com>
declark1 committed Mar 5, 2025
1 parent e5ac6db commit 24a125c
docs/architecture/adrs/010-detection-batcher.md
# ADR 010: DetectionBatcher & DetectionBatchStream

This ADR documents the addition of two new abstractions for batching (formerly known as "aggregation") streaming detection results:

1. `DetectionBatcher`: a trait to implement pluggable batching logic for a `DetectionBatchStream`. It includes an associated `Batch` type, enabling implementations to return different types of batches.

2. `DetectionBatchStream`: a stream adapter that wraps multiple detection streams and produces a stream of batches using a `DetectionBatcher`.
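A minimal sketch of how the trait might look, assuming hypothetical `push`/`pop_batch` methods and simplifying detections to label strings (the ADR does not show the real signatures). The associated `Batch` type is what lets each implementation decide what it emits; the hypothetical `PassThroughBatcher` simply emits every pushed group as its own batch.

```rust
use std::collections::VecDeque;

/// Sketch of a pluggable batching strategy. Method names and the use of plain
/// label strings for detections are assumptions, not the orchestrator's API.
pub trait DetectionBatcher {
    /// Each implementation picks the type of batch it produces.
    type Batch;

    /// Records the detections one detector produced for one input chunk.
    fn push(&mut self, detector_id: &str, chunk_index: usize, detections: Vec<String>);

    /// Returns the next batch that is ready to be emitted, if any.
    fn pop_batch(&mut self) -> Option<Self::Batch>;
}

/// Hypothetical batcher that treats every pushed group as a complete batch.
#[derive(Default)]
pub struct PassThroughBatcher {
    ready: VecDeque<(String, usize, Vec<String>)>,
}

impl DetectionBatcher for PassThroughBatcher {
    // Batch = (detector id, chunk index, detections), emitted in arrival order.
    type Batch = (String, usize, Vec<String>);

    fn push(&mut self, detector_id: &str, chunk_index: usize, detections: Vec<String>) {
        self.ready.push_back((detector_id.to_string(), chunk_index, detections));
    }

    fn pop_batch(&mut self) -> Option<Self::Batch> {
        self.ready.pop_front()
    }
}
```

A batcher that needs more context (for example, results from every detector for a chunk) would hold detections back in `push` and only return them from `pop_batch` once its condition is met.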

## Motivation

To support the initial streaming requirements outlined in ADR 002, we implemented the `Aggregator` and `Tracker` components.

1. `Aggregator` handles batching detections and building results. Internally, it is implemented as three actors:

   - `AggregationActor`: aggregates detections and sends them to the result channel

   - `GenerationActor`: consumes generations from the generation stream and provides them to the `ResultActor`

   - `ResultActor`: builds results from batches of detections and sends them to the result channel

2. `Tracker` wraps a `BTreeMap` and contains the batching logic. It is used internally by the `AggregationActor`.

The primary issue with these components is that they were designed specifically for the *Streaming Classification With Generation* task and lack the flexibility to be extended to other streaming use cases that require batching detections. For example:
- A use case may require different batching logic
- A use case may need to use different containers to implement its batching logic
- A use case may need to return a different batch type
- A use case may need to build a different result type

Additionally, actors are not used elsewhere in this codebase, and the actor pattern introduces concepts that may be unfamiliar to new contributors, steepening the learning curve.

## Decisions

1. The `DetectionBatcher` trait replaces the `Tracker`, enabling flexible and pluggable batching logic tailored to different use cases.

2. The `DetectionBatchStream` stream adapter replaces the `Aggregator`. Because it is generic over `DetectionBatcher`, it can be reused with any batching strategy, which makes it more flexible.
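The adapter side can be sketched the same way. To keep the example dependency-free, a synchronous `Iterator` stands in for an async `Stream`; the trait is restated compactly so the block compiles on its own, and the method names, round-robin polling order, and `DetectorItem` alias are assumptions for illustration only. The point is that `DetectionBatchStream<B>` holds no batching rules of its own: it feeds items to `B` and yields `B::Batch`.

```rust
/// Hypothetical item from one detector stream: (chunk index, detection labels).
pub type DetectorItem = (usize, Vec<String>);

/// Compact restatement of the hypothetical trait so this block compiles on its own.
pub trait DetectionBatcher {
    type Batch;
    fn push(&mut self, detector_id: &str, chunk_index: usize, detections: Vec<String>);
    fn pop_batch(&mut self) -> Option<Self::Batch>;
}

/// Sketch of the adapter: merges several detector streams and yields whatever
/// batches the batcher `B` reports as ready.
pub struct DetectionBatchStream<B: DetectionBatcher> {
    streams: Vec<(String, Box<dyn Iterator<Item = DetectorItem>>)>,
    batcher: B,
}

impl<B: DetectionBatcher> DetectionBatchStream<B> {
    pub fn new(streams: Vec<(String, Box<dyn Iterator<Item = DetectorItem>>)>, batcher: B) -> Self {
        DetectionBatchStream { streams, batcher }
    }
}

impl<B: DetectionBatcher> Iterator for DetectionBatchStream<B> {
    type Item = B::Batch;

    fn next(&mut self) -> Option<B::Batch> {
        loop {
            // Emit a batch as soon as the batcher reports one ready.
            if let Some(batch) = self.batcher.pop_batch() {
                return Some(batch);
            }
            // Otherwise pull one item from each input (round-robin), dropping exhausted inputs.
            let mut progressed = false;
            let mut i = 0;
            while i < self.streams.len() {
                let next = self.streams[i].1.next();
                match next {
                    Some((chunk_index, detections)) => {
                        self.batcher.push(&self.streams[i].0, chunk_index, detections);
                        progressed = true;
                        i += 1;
                    }
                    None => {
                        self.streams.remove(i);
                    }
                }
            }
            // No input produced anything, so no further batch can become ready.
            if !progressed {
                return None;
            }
        }
    }
}
```

Swapping `B` changes the batching behavior without touching the adapter, which is the kind of flexibility the `Aggregator` lacked.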

3. Building results is decoupled from batching and delegated to the task handler as a post-batching step. Instead of relying on an actor to accumulate and own generation or chat completion message state, a task handler can use a shared vec (e.g. `Arc<RwLock<Vec<T>>>`) or another approach suited to the use case.
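A rough sketch of the shared-vec approach. The ADR does not say which lock or task types a handler uses, so `std::sync::RwLock` and a thread stand in here; `Generation`, `ChunkResult`, `build_result`, and `spawn_generation_consumer` are hypothetical names. One task appends generations under a write lock; when a batch arrives, the handler takes a read lock and builds its result from the recorded text.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical generation chunk received from the generation stream.
#[derive(Debug, Clone)]
pub struct Generation {
    pub index: usize,
    pub text: String,
}

/// Hypothetical result built after batching: chunk indices, their text, and detections.
#[derive(Debug, PartialEq)]
pub struct ChunkResult {
    pub indices: Vec<usize>,
    pub text: String,
    pub detections: Vec<String>,
}

/// Appends generations from a background thread, standing in for a stream consumer task.
pub fn spawn_generation_consumer(
    generations: Arc<RwLock<Vec<Generation>>>,
    chunks: Vec<Generation>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for chunk in chunks {
            // Each write holds the lock only long enough to push one chunk.
            generations.write().expect("generation lock poisoned").push(chunk);
        }
    })
}

/// Post-batching step: builds a result for a batch covering `indices`.
pub fn build_result(
    generations: &Arc<RwLock<Vec<Generation>>>,
    indices: &[usize],
    detections: Vec<String>,
) -> ChunkResult {
    // A read lock lets several result-building steps inspect the history at once.
    let guard = generations.read().expect("generation lock poisoned");
    let text: String = guard
        .iter()
        .filter(|g| indices.contains(&g.index))
        .map(|g| g.text.as_str())
        .collect();
    ChunkResult { indices: indices.to_vec(), text, detections }
}
```

This sketch assumes the relevant entries are already recorded when a batch arrives, which holds if each generation is stored before it is sent to detectors.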

## Notes

1. The existing *Streaming Classification With Generation* batching logic has been re-implemented in `MaxProcessedIndexBatcher`, a `DetectionBatcher` implementation.
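The ADR does not describe `MaxProcessedIndexBatcher`'s rules, so the following is a hypothetical reconstruction based only on its name: it tracks the highest chunk index each detector has reported, treats the minimum of those as the point up to which every detector has finished, and emits pending chunks in order up to that point. It assumes each detector reports chunks in increasing order and that the number of detectors is known up front; the trait is restated so the block compiles on its own.

```rust
use std::collections::{BTreeMap, HashMap};

/// Compact restatement of the hypothetical trait so this block compiles on its own.
pub trait DetectionBatcher {
    type Batch;
    fn push(&mut self, detector_id: &str, chunk_index: usize, detections: Vec<String>);
    fn pop_batch(&mut self) -> Option<Self::Batch>;
}

/// Hypothetical reconstruction; not the orchestrator's actual implementation.
pub struct MaxProcessedIndexBatcher {
    n_detectors: usize,
    /// Highest chunk index reported so far by each detector.
    max_processed: HashMap<String, usize>,
    /// Pending (detector id, label) pairs keyed by chunk index; BTreeMap keeps chunks ordered.
    pending: BTreeMap<usize, Vec<(String, String)>>,
}

impl MaxProcessedIndexBatcher {
    pub fn new(n_detectors: usize) -> Self {
        MaxProcessedIndexBatcher {
            n_detectors,
            max_processed: HashMap::new(),
            pending: BTreeMap::new(),
        }
    }
}

impl DetectionBatcher for MaxProcessedIndexBatcher {
    /// A batch is one chunk index plus every detection reported for it.
    type Batch = (usize, Vec<(String, String)>);

    fn push(&mut self, detector_id: &str, chunk_index: usize, detections: Vec<String>) {
        let seen = self.max_processed.entry(detector_id.to_string()).or_insert(chunk_index);
        *seen = (*seen).max(chunk_index);
        // Create the entry even for empty results so the chunk is still emitted.
        let entry = self.pending.entry(chunk_index).or_insert_with(Vec::new);
        entry.extend(detections.into_iter().map(|label| (detector_id.to_string(), label)));
    }

    fn pop_batch(&mut self) -> Option<Self::Batch> {
        // Nothing is known to be complete until every detector has reported once.
        if self.max_processed.len() < self.n_detectors {
            return None;
        }
        // Every detector has processed all chunks up to the smallest of their maxima.
        let watermark = *self.max_processed.values().min()?;
        let (&chunk_index, _) = self.pending.iter().next()?;
        if chunk_index > watermark {
            return None;
        }
        let detections = self.pending.remove(&chunk_index)?;
        Some((chunk_index, detections))
    }
}
```

Using a `BTreeMap` for pending chunks means the lowest index is always the next candidate, so batches leave in chunk order even when detectors finish at different speeds.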

## Status

Pending
