Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aligning timeouts to reflect real-world scenarios #399

Merged
merged 4 commits into from
Jan 29, 2025

Conversation

zajko
Copy link
Contributor

@zajko zajko commented Jan 28, 2025

  • Changing error messages in case of node disconnection;
  • cleaning up unused properties
  • removing request_limit and the logic attached to that since we don't actually handle multiple in-flight requests to binary-port
  • Removing the possibility to define "infinite" as a valid retry amount in node client connector since it can lead to deadlocks. That allowed removal of RpcServerConfigTarget, NodeClientConfigTarget, ExponentialBackoffConfigTarget and MaxAttemptsTarget since we don't need custom code for deserialization of the config file.
  • Added some metrics to track unwanted events (timeouts on connection/sending/receiving data from binary port, detecting response id mismatch)
  • Changed buckets definitions in RESPONSE_TIME_MS_BUCKETS constant
  • Added MAX_COMPONENT_STARTUP_TIMEOUT_SECS guard in case one of the components hangs on startup

…used properties; removing `request_limit` and the logic attached to that since we don't actually handle multiple in-flight requests; Removing the possibility to define "infinite" as a valid retry amount in node client connector since it can lead to deadlocks
@@ -1059,6 +1069,7 @@ impl FramedNodeClient {
.await
.map_err(|_| Error::RequestFailed("timeout".to_owned()))?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd log the actual error here for easier debugging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's very informative, although I see that register_timeout is in the wrong place

@zajko zajko merged commit 55151e6 into casper-network:feat-2.0 Jan 29, 2025
2 checks passed
zajko added a commit that referenced this pull request Feb 21, 2025
* Storing ApiVersion for sse events of the node from which the event was read. It will be stored in the event_log table. (#244)

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* RPC sidecar changes (#231)

* Moving rpc sidecar to event sidecar workspace. Both og those servers will work on one binary
* Updating schemars version because the old one is having bugs (if there is a name collision for components schema only one will be saved)
* Copying casper_types to this project. The reason is that casper_types in release <= 4.0.1 depends on old schemars
* Copying casper_types for 2.0 release to this project. The reason is that rpc sidecar has to use the new types definitions, but for now they are not released (and it's not clear if they will be released prior to node 2.0 release).
* Changing RpcError implementation to fix tests. Some alignments of codestyle to make clippy happy.
* Moving casper-types dependencies to workspace level

* Sync changes from node branch

* Update the schema file

* Delete protocol.md

* Move a DbId fix

* Change error message

* Changes to versioning

* Sync changes to types

* Switch to having a single binary

* Moving config files, fixing compilation issues

* bump 'juliet' to '0.2.1'

* Sync casper-types changes

* Changing RPC sidecar config so that the rpc_server.node_client.exponential_backoff will take a new parameter called max_attempts. I tcan be either "infinite" or a positive, non-zero number.

* Storing ApiVersion in event_log table. Removing is_big_integer_id config from DDLConfiguration because it's no longer needed (new version of sea_query handles the situation of defining big_integer and autoincrement)

* Revert "Storing ApiVersion in event_log table. Removing is_big_integer_id con…"

* Update for node review changes (#15)

* Update for node changes

* Fix lints

* Cleanup

* Cover all values in tag roundtrip tests

* Moving admin server out from the sse sidecar. They are spinned up separately from sse events server. Also the database initialization happens separetely. Is sse events server is defined a storage definition is required. If rest api server is defined a storage definition is required.

* Fix GlobalStateRequest::random

* Changes explicit BoxFuture casting to calling 'boxed()' method

---------

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>
Co-authored-by: Rafał Chabowski <rafal@casperlabs.io>
Co-authored-by: Rafał Chabowski <88321181+rafal-ch@users.noreply.github.com>
Co-authored-by: zajko <ja.zajko@gmail.com>

* Replacing 'deploy' naming with 'transaction'. Replaced /deploy/(...) REST API endpoint with /transaction/deploy/(...) and /transaction/version1/(...) endpoints. (#245)

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Move merkle proof to casper-types, fix compile errors and cleanup (#247)

* Move merkle proof to casper-types and cleanup

* Remove casper types stuff

* Add an addressable entity endpoint (#249)

* Bump versions (#250)

* Sync node fix (#251)

* Update for InformationRequest::Transaction change

* Fix test failures due to version mismatches

* Run formatter

* Bump commit

* Adding network_name as a field in event_log. It is fetched from `/sta… (#248)

* Adding network_name as a field in event_log. It is fetched from `/status` endpoint of the node to which sidecar connects to.

---------

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Changed REST API endpoint so that they don't return simply the stored event. Instead we wrap the sse events in an envelope which looks like: (#252)

```
    {
        "header": {
            "api_version": "2.0.0",
            "network_name": "casper"
        },
        "payload": {(...)}
    }
    ```

Events in endpoints that return lists (like signatures for block) will also have each individual element of the list wrapped in such envelope.
In the above the header fields:
* "api_version" is the api version which was reported in the ApiVersion message for the node that we fetched the event from.
* "network_name" is the "chainspec_name" field that was returned in the "/status" endpoint for the node that we fetched the event from.

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Update usage of ReactorState (#256)

* Update usage of ReactorState

* Update schema

* rpc-sidecar: include new style bid records in auction info

In Condor, we introduced new bid records in global state that are stored
under the `Key::BidAddr` key type. We need to include these records when
creating the auction info.

Signed-off-by: Alexandru Sardan <alexandru@casperlabs.io>

* Work done: (#257)

* restructured metrics (moved to separate module)
* added rest-api specific metrics (current connections, response times)
* added db specific metrics (amount of data fetched from "raw" fields for events)
* added rpc specific metrics

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Fix auction state tests (#260)

* Fix auction state tests

* Fix a clippy lint

* Disable part of the workflow for now

* Workaround for non-PRs

* Added `enable_server` flag to all the internal components of sidecar so it's easier to fine-tune what should actually run. If configuration for all components is either missing or has `enable_server = false`, sidecar exits since it has nothing to do. Also did some refactoring of initialization code. (#262)

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Bump casper types and refresh test schema file (#263)

* Bump casper types and refresh schema file

* Fix test data

* Switch block reporting changes (#261)

* Switch block reporting changes

* Correct typo in name

* Bump types

* Updated oauth definition of endpoints. Reworded some things in documentation. (#264)

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* * Fixed import of dependency in `sqlite_database` - it caused bulding a package to fail (#265)

* Removing lingering references to "casper-event-sidecar". The whole project should now be called "casper-sidecar"
* Moving packaging configuration from "event-sidecar" module to "sidecar" which aggrgates the final binary

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Bump casper types

* Add support for the new speculative execution results

* Use workspace-wide temporary patch for casper-types

* Refreshed documentation so it reflects how users should interact with sidecar after 2.0 update (#266)

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Speculative execution no longer needs 'block_identifier'

* Re-add temporarily commented code

* Remove stray debug artifact

* Bumping dependencies of sidecar (#269)

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Update RPC schema

* Bump `juliet` version to `0.3`

* Added metrics that trace response time of RPC methods. Removed PATH_ABSTRACTION_TIMES_SECONDS which was a temporary check and ment to be removed. Made databse initialization lazy. This ensures that if no db-dependant component starts there will be no DB connection initialization (since it's not needed) (#272)

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Update JSON schema

* Update RPC schema to cover `gas_price_tolerance`

* Remove stray debug print

* Leverage new dictionary request (#267)

* Leverage dictionary request in the node

* Fix compilation errors

* Update git ref

* Fix lint

* Re-enable support for `latest_switch_block_hash` in status

* Update dependencies

* Updated casper-types to include deserialization fixes (#276)

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Use EntityAddr instead of AddressableEntityHash (#275)

* Use EntityAddr instead of AddressableEntityHash

* Fix audit

* Binary port balance query (#274)

* Use binary port balance request

* Point at branch for now

* Add holds to the response

* Cleanup

* Point at feat-2.0

* Add missing PurseIdentifier variant

* Added 'rpc.discover' method to speculative json rpc api (#278)

* Added 'rpc.discover' method to speculative json rpc api

* Aligned test to changes from feat-2.0

---------

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Correct RPC method names (#279)

* Correct RPC method names

* Update schema file

* Bringing back /main /deployss /sigs events to sidecars SSE server. For now it provides 2.x SSE events. (#281)

Applying code review suggestions

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Remove 'juliet' from the RPC sidecar (#289)

* Thread payload via Framed tokio transport

* Remove juliet-related parameters from config

* Add binary port message timeout to config

* Add error handling in 'send_request'

* Fix typo

* Tests in sidecar no longer use juliet

* Clean-up

* Respect the request limit

* Update RPC schema

* Update speculative execution server schema

* Bring back reconnection loop

* Add separate timeout for client readiness

* Update RPC schema

* Update speculative execution server schema

* Point at git branch temporarily

---------

Co-authored-by: Jacek Malec <145967538+jacek-casper@users.noreply.github.com>

* Point casper-node dependency back at the main repo (#291)

* Execution result should be null (#287)

* To make the json RPC more consistent all occurrences of 'skip_serializing_if' for Option type were removed. As a general rule we return 'null' for None and the three exceptions from json RPC are inconsistencies


---------

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Bump casper-types (#293)

* Balance cleanup (#292)

* Making 'emulate_legacy_sse_apis' config value optional, defaulting to no emulations enabled (#295)

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Update casper node dependencies (#297)

* Upgrade casper node dependencies

* Rename for consistency

* Update references to request name

* Mapping for legacy filters (#290)

* Implementing mapping of contemporary 2.x sse events to legacy 1.x /events/main /events/deploys and /events/sigs formats

---------

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Bump casper-node dependencies (#299)

* Add test for binary port reconnection

* Fix reconnection for node client

* Updated flow, wording, and some configs in the main README

* Edited USAGE and formatting error

* Add TOC in main README

* Improved flow in main README

* Removed duplication in the ETC_README for node operators

* Updated example config files

* Remove "event" sidecar wording from files, since it does more now

* Cleaned up rpc_sidecar/README.md

* Add logo and license to the main README

* Minor spelling update

* Add back details after checking with the devs.

* Initial modification in Cargo

* Update dependencies to address block restructure

* Fix translation of BlockV2's to legacy blocks

* Update `read_balance` binary port request after updates to `GlobalStateRequest`

* Fix tests compilation issues

* Fix tests broken by changes to address block restructure, ignore 2 tests that are failing to timeouts

* run cargo fmt

* Address failing test

* Address cargo fmt check

* Address PR ci issue

* Remove unwrap in state_get_auction_info (#302)

* Retrieve entity named keys and entry points (#296)

* Retrieve entity named keys

* Bump casper-node deps

* Fix post-merge error

* Bump casper-node dependencies

* Retrieve entry points

* Point at feat-2.0

* Added documentation for legacy translation. Fixed removed tests. Adde… (#301)

* Added documentation for legacy translation. Fixed removed tests. Added more unit tests for the 2.x to 1.x translation process
---------

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* expand error enums to capture the new values being sent by the binary port.

* Update error handling for `account_put_deploy` and `account_put_transaction` to account for the expanded errors from the binary port

* Update RPC error enum to support the expanded binary port errors.

* Remove TODO

* Fix pre-1.5 auction state retrieval (#310)

* Editing pass with open questions

* Bump casper-node dependencies (#309)

* Bump casper-node dependencies

* Bump again

* Generate binary messages with incrementing IDs

* Handle request id mismatch scenario

* Add tests related to request id

* Prevent deserialization of the original request when validating request id

* Make tests compatible with the "request id" related change

* Added comment message to all the legacy endpoints whic explains the deprecation and refers to documentation for limitation (#311)

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Prevent potential infinite loop when retrying to get a response from binary port

* Bump casper-node dependencies (#316)

* Bump casper-node dependencies

* Fix compile errors

* Fix broken fixtures

* Correct the StandardDeployHashesTranslator

* Update mock

* Bump casper-node dependencies (#317)

* Bump casper-node dependencies

* Update schema

* Updates to final configs for current version.

* Resolved questions and using full labels

* Validate config to check whether any server is enabled (#320)

* Validate config to check whether any server is enabled

* Modify to check for runnable components

* Review feedback

Co-authored-by: Adam Stone <97986246+ACStoneCL@users.noreply.github.com>

* Review feedback

Co-authored-by: Adam Stone <97986246+ACStoneCL@users.noreply.github.com>

* Update diagram labels for consistency

* Add a brief introduction

* Minor error fix

* Temporarily point to forked node repo with necessary changes

* Promote binary port error from `u8` to `u16`

* Update `casper-binary-port` and `casper-types` dependencies

* Satisfy clippy

* Update formatting

* Review feedback incl. updated highlighting

* casper-json-rpc updated highlighting and cleanup

* Review feedback

* Review feedback - remove HTML

* Implement the reward endpoint (#321)

* Implement a reward endpoint

Signed-off-by: Jacek Malec <145967538+jacek-casper@users.noreply.github.com>

* Map new errors

* Error code update

* Update error handling

* Make errors more consistent

---------

Signed-off-by: Jacek Malec <145967538+jacek-casper@users.noreply.github.com>

* Update expected schemas for min/max delegation amounts

* Point `casper-types` and `casper-binary-port` back to `feat-2.0`

* Update expected schemas for min/max delegation amounts

* Update dependencies

* Add installation steps

* Add usage steps for replaying events

* Temporarily point to repo with binary port changes

* Bring back dependencies to `feat-2.0`

* Update schemas to cover `transaction_category`

* Update test fixture to cover `transaction_category`

* Bypass security audits for now.

* Handle error codes recently added to binary port

* Making SSE event storage optional. From now the [sse_server] config section has a disable_event_persistence property. If set to true, Sidecar will not store events to database. [storage.sqlite_config] and [storage.postgresql_config] are optional. Also [storage.sqlite_config] and [storage.postgresql_config] have enabled property. If set to false, Sidecar will treat them as if they are not defined at all. Changed the name of [storage.storage_path] to [storage.storage_folder] (#324)

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Fixing Sidecar config which allows the whole [storage] section to be not-present. [storage] can be omitted if sse server is either disabled or not provided. (#330)

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Expose delegation rate in reward responses (#328)

* Expose delegation rate in reward responses

* Point back at casper-node feat-2

* Expose protocol version in status (#334)

* Expose protocol version in status

* Update schema

* Point back at casper-node feat-2

* Bump dependencies for binary port and types

* Bump `vergen` and `reqwest` to dodge RUSTSEC issues

* Adapt `lagging_clients_should_be_disconnected()` test to updated request dependency

* Return the switch block hash with rewards response (#336)

* Return the switch block hash with rewards response

Signed-off-by: Jacek Malec <145967538+jacek-casper@users.noreply.github.com>

* Point back at casper-node feat-2

---------

Signed-off-by: Jacek Malec <145967538+jacek-casper@users.noreply.github.com>

* Handle purse not found error (#337)

* Bump casper-types dependency (#338)

* Bump casper-types dependency

* Allow audit exception due to no fix available

* Get entity get package (#341)

* Add package and amend get entity endpoint

* Repointing node dependencies

---------

Co-authored-by: Jacek Malec <145967538+jacek-casper@users.noreply.github.com>
Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Updating casper-node dependency to use new Transaction definition; updating tests (#345)

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* add initial retry in request handler

* update gitignore

* update reconnect test

* reduce code duplication

* add config typing for addresses

* fix linter errors

* add optional network name to config

* revert unintended changes

* Updating dependency to casper-node; fixing `state_get_auction_info` code so that it can handle the new `SeigniorageRecipientsV1` and `SeigniorageRecipientsV2` types (#355)

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* validate network name on SSE connection

* validate network name when connecting RPC client

* split address config into IP + port

* update example config files

* updating dependencies to reflect newest "casper-types" (#362)

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* make network_name a top-level config option

* use local duplicate of AuctionState struct

* add keepalive checks

* include request id in the keepalive payload

* consume the response

* Aligning code with recent changes to casper-types, fixing auction_info tests (#366)

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* apply formatting

* Aligning sidecar to latest changes in casper-node (#367)

* Aligning sidecar to latest changes in casper-node

* Fixing schema tests

---------

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* fix tests

* post-merge fixes

* update json schema

* Repointing casper-node to v2.0.0-rc5 (#369)

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Aligning code with recent changes in the node (#374)

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* handle standard interrupt signals

* disable lint

* Bumping dependencies (#378)

* Bumping dependencies

---------

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* aligning with node changes (#382)

* aligning with node changes

---------

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Repointing casper-node dependency to dev; applying fixes to be aligned with newest node code (#385)

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Cutting rc3 (#389)

* Introduced "state_get_auction_info_v2" json rpc method. (#392)

* Introduced "state_get_auction_info_v2" json rpc method.

---------

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Refactoring "state_get_auction_info" and "state_get_auction_info_v2" … (#393)

* Refactoring "state_get_auction_info" and "state_get_auction_info_v2" to make less binary port requests

---------

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>

* Added missing error variants to handle error codes returned by the node. (#394)

* Added missing error variants to handle error codes returned by the node.

* Removing catch-all statement

* Refreshing casper-node dependency, refactoring tests for 'chain_get_e… (#395)

* Refreshing casper-node dependency, refactoring tests for 'chain_get_era_info_by_switch_block'

* applying CR remarks

* Aligning timeouts to reflect real-world scenarios (#399)

* Changing error messages in case of node disconnection;
* cleaning up unused properties
* removing request_limit and the logic attached to that since we don't actually handle multiple in-flight requests to binary-port
* Removing the possibility to define "infinite" as a valid retry amount in node client connector since it can lead to deadlocks. That allowed removal of RpcServerConfigTarget, NodeClientConfigTarget, ExponentialBackoffConfigTarget and MaxAttemptsTarget since we don't need custom code for deserialization of the config file.
* Added some metrics to track unwanted events (timeouts on connection/sending/receiving data from binary port, detecting response id mismatch)
* Changed buckets definitions in RESPONSE_TIME_MS_BUCKETS constant
* Added MAX_COMPONENT_STARTUP_TIMEOUT_SECS guard in case one of the components hangs on startup
* Making keepalive loop use the standard mechnism of sending messages to gain retries and id-checks
* Aligning message_timeout_secs

* rename all occurences of 'signed block' to 'block with signatures'

* fmt

* Aligning sidecar with various changes made to nodes binary port (#402)

* Aligning sidecar with various changes made to nodes binary port

* update cargo.toml

* rename binary request to command

---------

Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>
Co-authored-by: igor-casper <152597353+igor-casper@users.noreply.github.com>
Co-authored-by: igor-casper <igor@casper.network>

* Apply suggestions from code review

Co-authored-by: Maciek <19913370+wojcik91@users.noreply.github.com>

* Applying code review

* Applying CR remarks

* Added missing import

* increase keepalive request rate across configs

* Update CI publish deb to new infra.

* Bumping version of casper-sidecar to allow tag based publish test.

* Cleanup pass

* Making version mask *.*.* and remove unneeded rust cache.

---------

Signed-off-by: Alexandru Sardan <alexandru@casperlabs.io>
Signed-off-by: Jacek Malec <145967538+jacek-casper@users.noreply.github.com>
Co-authored-by: Jakub Zajkowski <jakub@casperlabs.io>
Co-authored-by: jacek-casper <145967538+jacek-casper@users.noreply.github.com>
Co-authored-by: Rafał Chabowski <rafal@casperlabs.io>
Co-authored-by: Rafał Chabowski <88321181+rafal-ch@users.noreply.github.com>
Co-authored-by: Alexandru Sardan <alexandru@casperlabs.io>
Co-authored-by: Alex Sardan <86006646+alsrdn@users.noreply.github.com>
Co-authored-by: ipopescu <iulia@casperlabs.io>
Co-authored-by: Karan Dhareshwar <karan@casperlabs.io>
Co-authored-by: Zach Showalter <zach@casperlabs.io>
Co-authored-by: Karan Dhareshwar <42871449+darthsiroftardis@users.noreply.github.com>
Co-authored-by: Zach Showalter <zacshowa@gmail.com>
Co-authored-by: Joe Sacher <321623+sacherjj@users.noreply.github.com>
Co-authored-by: Iulia Popescu <ipopescu@users.noreply.github.com>
Co-authored-by: Adam Stone <97986246+ACStoneCL@users.noreply.github.com>
Co-authored-by: Maciej Wójcik <maciek@wjck.pl>
Co-authored-by: Maciek <19913370+wojcik91@users.noreply.github.com>
Co-authored-by: igor-casper <igor@casper.network>
Co-authored-by: igor-casper <152597353+igor-casper@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants