Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore black box tests removed from sequencer-relayer #1008

Closed
Fraser999 opened this issue Apr 25, 2024 · 0 comments · Fixed by #1033
Closed

Restore black box tests removed from sequencer-relayer #1008

Fraser999 opened this issue Apr 25, 2024 · 0 comments · Fixed by #1033
Labels
code-quality sequencer-relayer pertaining to the astria-sequencer-relayer crate

Comments

@Fraser999
Copy link
Contributor

Fraser999 commented Apr 25, 2024

We need to restore the black box tests removed as part of #963. Ensure we also restore the single test added in f236c79 and subsequently deleted when merging main into #1001.

They should be re-implemented using the new grpc-mock crate.

@Fraser999 Fraser999 added sequencer-relayer pertaining to the astria-sequencer-relayer crate code-quality labels Apr 25, 2024
github-merge-queue bot pushed a commit that referenced this issue Apr 25, 2024
## Summary
The sequencer-relayer has been updated to submit blobs directly to the
Celestia app via its gRPC interface rather than using the Celestia node
as a proxy.

## Background
There have been issues experienced in the past when using the Celestia
node for blob submissions related to sequences and timeouts. We want to
avoid such issues by "cutting out the middle man" and controlling the
flow of submissions directly inside the relayer.

## Changes
This PR adds a lot of new proto files and their corresponding generated
Rust sources, although they have been reduced to the minimal set of
messages/rpcs required for this new feature.

Aside from these, most of the new code has been isolated to a new module
`celestia_client` inside the `relayer` module. It should be reasonably
self-contained, with a view to possibly extracting it to a separate
crate in the future, and/or providing a generic client API should we
wish to add handling for more chains in the future.

At a high level, the new client is constructed via a builder, which may
need multiple attempts to create the client in the case that the
Celestia app is unreachable. Retries are controlled via `tryhard`,
maxing out at 30s and with infinite attempts. A shutdown signal can
cancel this, allowing for graceful exit. The builder retrieves initial
info from the Celestia app (e.g. chain ID, gas costs). The immutable
chain ID is held as a member of the client, but the other values (being
variable) are cached in the `State` struct and can be retrieved and
cached again should an attempt to submit blobs fail.

The `CelestiaClient` only has one public method: `try_submit`. This is
called in the existing `relayer::write::submit_with_retry` function in
much the same way as previously. The main difference here is that the
client's `try_submit` method is also passed a tokio watch receiver
containing the optional last error. When first submitting a given batch
of blobs, the watch channel holds `None`. On each failed attempt, the
`on_retry` closure sends the error returned by `try_submit`. On a fresh
batch of blobs, a new watch channel is constructed, again initialized
with `None`. The purpose of providing the last error to the client is
that it can be interrogated for a failure due to the fee being too low.
Such an error holds information on the required fee, and on reattempt,
the client uses this value in preference to its own calculated fee.

The general flow inside `try_submit` is as follows:
- if the previous submit attempt failed, fetch the cost info from the
Celestia app and cache it
- fetch our account number and sequence number from the Celestia app
- create, sign and send the blob transaction via the Celestia
`BroadcastTx` method
- poll the Celestia app once per second via the `GetTx` method

## Testing
Most of the new functionality has unit tests. The `CelestiaClient`
mainly has member methods calling gRPC requests, however handling the
responses is delegated to free functions. As well as code clarity, this
allows for comprehensive unit tests of these functions without the need
for running/mocking a Celestia server. However, there are still missing
tests for the `try_submit` method. That might be best done via the new
`astria-grpc-mock` crate, or at a higher level by refactoring the
existing black box tests (almost all of which have been removed for
now).

I have run the sequencer against a local network running on k8s and
observed the blob submission to behave as expected via the logging. This
includes temporarily changing the fee to a low value for the first
submission attempt, which caused the retry behaviour to identify and use
the correct fee from the error response of the first attempt. I have
also had the smoke test pass locally.

## Breaking Changelist
- `ASTRIA_SEQUENCER_RELAYER_CELESTIA_BEARER_TOKEN` has been replaced by
`ASTRIA_SEQUENCER_RELAYER_CELESTIA_APP_KEY_FILE` which expects a path to
a hex-encoded secp256k1 secret key corresponding to an account on the
Celestia chain.
- `ASTRIA_SEQUENCER_RELAYER_CELESTIA_ENDPOINT` has been replaced by
`ASTRIA_SEQUENCER_RELAYER_CELESTIA_APP_GRPC_ENDPOINT` which represents
the endpoint of the gRPC server on the Celestia app.

## Further Discussion
- integration testing/black box testing. [Discussed, agreed to restore
them in follow-up. Ticketed at #1008.]
- the polling period for `GetTx` after a successful `BroadcastTx` is
hard-coded to one second. Seems like this should be configurable via an
env var. [Discussed, and the period is now variable depending upon the
result of `GetTx`. Also, given that the Celestia app will likely be
managed by us, we can control its rate-limits, meaning there's less
likelihood of needing to change the polling period in the relayer.]

## Related Issues
Closes #921.
github-merge-queue bot pushed a commit that referenced this issue Apr 30, 2024
## Summary
Removes `astria-celestia-client` as a direct dependency from
astria-conductor and astria-sequencer-relayer.

## Background
We are no longer depending on any of the extension-traits defined in
astria-celestia-client. It is merely used for constructing Celestia
namespaces and reexporting of celestia crates.

## Changes
- Move the `namespace_v0*` construction utilities to `astria-core`
behind a `celestia` feature.
- Remove `astria-celestia-client` from `astria-conductor` and
`astria-sequencer-relayer`.
- The `astria-celestia-mock` crate remains until
#1008 is done.
 
## Testing
Not necessary, just refactoring. All tests still pass.
github-merge-queue bot pushed a commit that referenced this issue Apr 30, 2024
## Summary
A new configuration env var has been added to allow filtering data
submission to a subset of rollups.

## Background
This allows more flexibility in what is actually submitted to the
Celestia chain.

## Changes
A new env var was added, mapped to a new field
`Config::rollup_id_filter`. This represents a list of rollups whose
transactions should be included in blobs submitted to Celestia. Rollups
not in the list do not have their data submitted. However, if the list
is empty, no filtering is applied; all rollups have their data
submitted.

The collection of filtered rollup IDs is passed down to the
`write::conversion::convert` function where sequencer blocks are
converted to blobs. The blobs are filtered inside this function, partly
since this is relatively early in the flow, but also so that the
`ConversionInfo` struct can carry information about blobs being
excluded. This data is logged at the start of `write::submit_blobs` in
an info-level log message. It might be worthwhile adding metrics around
the number of excluded rollups per submission, but I didn't see a clear
benefit to that and it can be added in a follow-up PR if required.

## Testing
- Added unit tests covering parsing of the config string to a collection
of `RollupId`s.
- ~Added a black box test where a sequencer block is constructed with
several transactions, half from included rollups and half from excluded
ones. Existing black box tests ensure that when the filter is empty, no
filtering is done.~ These all got removed in #963, and should be
restored as part of #1008.

## Breaking Changelist
- Added `ASTRIA_SEQUENCER_RELAYER_ONLY_INCLUDE_ROLLUPS` which can be
empty or can be a comma-separated list.
github-merge-queue bot pushed a commit that referenced this issue May 3, 2024
## Summary
The black box tests were removed in a recent PR. This PR reinstates them
using the new mock gRPC framework `astria-grpc-mock`.

## Background
The tests would have needed heavy refactoring in #963, and we wanted to
avoid adding a lot more code to that already-large PR. We decided to
temporarily delete them and reinstate them using `astria-grpc-mock` to
mock responses from the Celestia app.

## Changes
The tests removed in #963 and a single test briefly added in #1001 then
removed again have been restored. I also added a new test to check the
relayer shuts down in a timely manner.

Previously the tests leaned heavily on counts of blobs received by the
mock Celestia app. The current tests retain these checks, but also query
the `/status` endpoint of the sequencer-relayer to confirm its state.
There was also a single test previously which checked the state of the
postsubmit.json file. I didn't think this was necessary given that we're
querying the state in a more "black box" way now (via the http server),
but I didn't remove the function to perform this check
(`TestSequencerRelayer::assert_state_files_are_as_expected`) pending a
decision on whether to reinstate that check or not inside the
`later_height_in_state_leads_to_expected_relay` test.

As @SuperFluffy predicted, all the new protobuf packages added in #963
had to be added to the `pbjson_build` builder to generate the serde
impls required for using them in `astria-grpc-mock`. This accounts for
the vast majority of the new LoC in this PR.

I made a few small changes to the mock framework:
- added `Mock::up_to_n_times` to avoid failures due to a single-use mock
response being sent multiple times
- added `DynamicResponse` to support constructing mock responses which
can be passed the relevant gRPC request
- changed `MockGuard::wait_until_satisfied` to take `self` by ref rather
than value, since if it consumes self and `wait_until_satisfied` times
out, the `MockGuard` panics in its `Drop` impl, meaning we don't get a
good indication from e.g. a `tokio::time::timeout` as to what went wrong

Most of the new functionality lives in `MockCelestiaAppServer` and
`TestSequencerRelayer`. For the former, all gRPCs except for
`BroadcastTx` and `GetTx` are set up to always return valid-enough
responses - i.e. these don't need to be mounted individually in every
test case. In the case of `MockCelestiaAppServer`, the new functionality
is a combination of support for mounting mock responses and functions
with timeouts to query the `/status`, `/healthz` and `/readyz` http
endpoints of the sequencer-relayer.

## Testing
These changes are tests.

Closes #1008.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code-quality sequencer-relayer pertaining to the astria-sequencer-relayer crate
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant