fix!: explicitly close rpc connections when changing base node #6662
Conversation
Compare: 91582eb to b25e245
Test Results (Integration tests): 2 files, 1 error, 9 suites, 1h 34m 39s ⏱️. For more details on these parsing errors and failures, see this check. Results for commit e96337b. ♻️ This comment has been updated with latest results.
Test Results (CI): 3 files, 126 suites, 9m 47s ⏱️. Results for commit e96337b. ♻️ This comment has been updated with latest results.
It is an anti-pattern to keep an idle RPC session open. This commit periodically closes idle sessions. RPC establishment was optimised to not require a full RTT, allowing us to establish new sessions as needed.
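As a rough sketch of the idea only (hypothetical types, names, and tokio plumbing; not the actual Tari connectivity code), an idle-session reaper could look something like this:

```rust
use std::time::{Duration, Instant};

use tokio::{sync::Mutex, time};

// Hypothetical handle to an established RPC session; dropping it is assumed
// to close the underlying substream.
struct RpcSession {
    last_used: Instant,
}

impl RpcSession {
    fn is_idle(&self, max_idle: Duration) -> bool {
        self.last_used.elapsed() >= max_idle
    }
}

// Periodically drop sessions that have been idle for longer than `max_idle`.
// Re-establishing a session later is cheap because the handshake no longer
// needs a full round trip.
async fn reap_idle_sessions(sessions: &Mutex<Vec<RpcSession>>, max_idle: Duration) {
    let mut check_interval = time::interval(max_idle);
    loop {
        check_interval.tick().await;
        let mut sessions = sessions.lock().await;
        sessions.retain(|session| !session.is_idle(max_idle));
    }
}
```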
Hi there @sdbondi, looking good. I'm also running this branch for my testing. Just some comments.
base_layer/wallet/src/connectivity_service/base_node_peer_manager.rs (outdated; conversation resolved)
chore: reduce logs
test: cucumber logs
Looks good, a few comments and questions
propagation_source,
// Some other error occurred, we will not propagate, but we'll also not penalise the peer
// for a failure not related to the message contents.
gossipsub::MessageAcceptance::Ignore,
Following the above logic, should we also not reject on a ban reason?
Fair point, though I'm trying to avoid any issues with future gossiping. We handle the validation failure above, so this is not a result of validation failing; we're presumably banning the peer for some other reason which may not be related to the block in the gossip message, e.g. an IO error when requesting the block.
I would say that reducing their gossip score (which may result in them being removed from the mesh) because we could not connect to their RPC is not correct. However, since we were not able to validate the message itself, we cannot Accept it either. Open to thoughts.
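For readers following along, here is a minimal hedged sketch of the acceptance mapping being discussed. The `BlockHandlingOutcome` enum and `acceptance_for` helper are hypothetical; only `gossipsub::MessageAcceptance` is the real libp2p type.

```rust
use libp2p::gossipsub::MessageAcceptance;

// Hypothetical outcome of handling a gossiped block; the real handler in this
// PR distinguishes these cases inline.
enum BlockHandlingOutcome {
    Valid,
    ValidationFailed,
    // e.g. an IO error while requesting the full block over RPC
    UnrelatedError,
}

// Accept: propagate and improve the sender's gossip score.
// Reject: do not propagate and penalise the sender.
// Ignore: do not propagate, but do not penalise either.
fn acceptance_for(outcome: BlockHandlingOutcome) -> MessageAcceptance {
    match outcome {
        BlockHandlingOutcome::Valid => MessageAcceptance::Accept,
        BlockHandlingOutcome::ValidationFailed => MessageAcceptance::Reject,
        BlockHandlingOutcome::UnrelatedError => MessageAcceptance::Ignore,
    }
}
```

The key design choice is that Ignore leaves the sender's gossip score untouched, while Reject actively penalises it.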
*smt = blockchain_db.db_write_access()?.calculate_tip_smt()?;
warn!(target: LOG_TARGET, "Finished loading SMT into memory from stored db");
warn!(target: LOG_TARGET, "Finished loading SMT into memory from stored db in {:.2?}", timer.elapsed());
haha yeah, we should go back to caching it.
Doing it every block becomes too intensive, but we should have a checkpoint every 100 blocks or so.
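A minimal sketch of that checkpointing idea, assuming a hypothetical `SMT_CHECKPOINT_INTERVAL` constant and `persist_smt` callback (neither exists in the codebase):

```rust
// Hypothetical interval taken from the suggestion above; not an actual
// constant in the codebase.
const SMT_CHECKPOINT_INTERVAL: u64 = 100;

// Persist the in-memory SMT only on checkpoint heights so that a restarting
// node replays at most SMT_CHECKPOINT_INTERVAL blocks instead of rebuilding
// the whole tree from the stored db.
fn maybe_checkpoint_smt(height: u64, persist_smt: impl FnOnce()) {
    if height % SMT_CHECKPOINT_INTERVAL == 0 {
        persist_smt();
    }
}
```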
lol yeah, I was just curious whether loading the SMT was the reason for the base node taking forever to start up. An improvement is definitely needed here.
The long startup is probably due to waiting for 5 pings from nodes to decide on a sync peer.
No, it's before that; it takes 20s on our test esme network (from the log above).
Is it the SMT that takes that long?
target: LOG_TARGET,
"Failed to handle incoming transaction message: {:?}", e
);
// NOTE: We assume that all errors are due to a "bad" transaction.
I feel like this might be a bit broad...
But we can use this for now; we just have to monitor it.
Yeah, it could be a problem; let's see.
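One possible refinement, sketched here with a hypothetical error enum rather than the actual error types in the handler, would be to penalise the sender only when the transaction itself is provably at fault:

```rust
// Hypothetical error type for incoming transaction handling; the PR currently
// treats every error as a "bad" transaction.
enum TxHandlingError {
    // The transaction itself is invalid (bad signature, overspend, ...).
    InvalidTransaction,
    // A local or transient failure (db error, timeout) unrelated to the sender.
    Internal,
}

// Only treat the sender as misbehaving when the transaction itself is at fault.
fn should_penalise_sender(err: &TxHandlingError) -> bool {
    matches!(err, TxHandlingError::InvalidTransaction)
}
```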
@@ -41,7 +41,7 @@ impl Default for BaseNodeServiceConfig {
     fn default() -> Self {
         Self {
             base_node_monitor_max_refresh_interval: Duration::from_secs(30),
-            base_node_rpc_pool_size: 10,
+            base_node_rpc_pool_size: 3,
             event_channel_size: 250,
Isn't this perhaps too low?
I think a wallet can have the following running at the same time:
- TMS validation
- OMS validation
- base node status check
- base node tip
- UTXO scanning
I am not sure if I missed any here.
RPC is designed to share clients; basically you'll just have to wait for the previous request to complete before yours is started. Since then I've changed the behaviour to drop RPC sessions once they are not being used, so perhaps a good number is the number of services using it (by your count, 6).
Isn't "base node tip" using the messaging protocol through the liveness service and not RPC? I'll audit this and come up with a minimal number.
It seems it's used in 8 places; it's difficult to know how many are required at the same time. In my rather basic testing it never got above 3, and if it is ever a problem, any of the services can wait some bounded time for responses. To be safe I'll set it to 8.
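To illustrate the sharing behaviour described above, here is a hedged sketch of a bounded client pool built on a tokio semaphore. `RpcClientPool` is hypothetical; the real wallet pool is more involved. Callers queue for one of `pool_size` slots instead of erroring when the pool is exhausted.

```rust
use std::{future::Future, sync::Arc};

use tokio::sync::Semaphore;

// Hypothetical pool: at most `pool_size` RPC clients are in use at once and
// callers wait for a free slot rather than failing when the pool is busy.
struct RpcClientPool {
    permits: Arc<Semaphore>,
}

impl RpcClientPool {
    fn new(pool_size: usize) -> Self {
        Self {
            permits: Arc::new(Semaphore::new(pool_size)),
        }
    }

    // Waits for a free client slot, then runs the request; the wait is bounded
    // only by whatever timeout the caller applies.
    async fn with_client<T>(&self, request: impl Future<Output = T>) -> T {
        let _permit = self.permits.acquire().await.expect("pool semaphore closed");
        request.await
    }
}
```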
more logging
persist changes to peer_manager in the watch
updates to config files
feat: wallet connectivity improvements
I conducted many system-level stress tests with this branch and feel that the libp2p implementation is pretty solid for almost all use cases, and am happy to approve this PR. Outstanding issues are logged in https://github.com/orgs/tari-project/projects/25/views/1.
Nice.
ACK
Description
fix!: explicitly close rpc connections when changing base node
establish relay quickly
fix wallet connectivity logic
Motivation and Context
How Has This Been Tested?
What process can a PR reviewer use to test or verify this change?
Breaking Changes
BREAKING CHANGE: the RPC protocol adds "check bytes" that are periodically sent by the server. These bytes will cause errors in older clients.
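For context only, a heavily hedged sketch of the general "periodic check bytes" idea. The `CHECK_BYTE` value and raw `TcpStream` transport are hypothetical; the real framing and transport are defined by the Tari RPC implementation.

```rust
use std::time::Duration;

use tokio::{io::AsyncWriteExt, net::TcpStream, time};

// Hypothetical marker value; the real wire format is defined by the Tari RPC
// implementation. Older clients break because they receive bytes they do not
// expect and cannot decode the stream.
const CHECK_BYTE: u8 = 0x7E;

// Periodically write a check byte so the server can detect dead connections
// and the client can verify the session is still alive.
async fn send_periodic_check_bytes(
    stream: &mut TcpStream,
    every: Duration,
) -> std::io::Result<()> {
    let mut ticker = time::interval(every);
    loop {
        ticker.tick().await;
        stream.write_all(&[CHECK_BYTE]).await?;
        stream.flush().await?;
    }
}
```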