diff --git a/docs/run/start/quickstart-builder-api.mdx b/docs/adv/advanced/quickstart-builder-api.mdx
similarity index 100%
rename from docs/run/start/quickstart-builder-api.mdx
rename to docs/adv/advanced/quickstart-builder-api.mdx
diff --git a/docs/adv/troubleshooting/client_configurations.md b/docs/adv/troubleshooting/client_configurations.md
index b0416404af..49b52c23d8 100644
--- a/docs/adv/troubleshooting/client_configurations.md
+++ b/docs/adv/troubleshooting/client_configurations.md
@@ -14,7 +14,7 @@ Many execution, consensus, and validator clients need custom flags or parameters

### Consensus Client

-Nothing specific for distributed validators is required. If you are configuring MEV-boost, consult the settings you need [here](../../run/start/quickstart-builder-api.mdx#consensus-clients).
+Nothing specific for distributed validators is required. If you are configuring MEV-boost, consult the settings you need [here](../advanced/quickstart-builder-api.mdx#consensus-clients).

### Validator Client

@@ -27,7 +27,7 @@ Required flags:

### Consensus Client

-Nothing specific for distributed validators is required. If you are configuring MEV-boost, consult the settings you need [here](../../run/start/quickstart-builder-api.mdx#consensus-clients).
+Nothing specific for distributed validators is required. If you are configuring MEV-boost, consult the settings you need [here](../advanced/quickstart-builder-api.mdx#consensus-clients).

### Validator Client

@@ -56,7 +56,7 @@ Required flags:

### Consensus Client

-Nothing specific for distributed validators is required. If you are configuring MEV-boost, consult the settings you need [here](../../run/start/quickstart-builder-api.mdx#consensus-clients).
+Nothing specific for distributed validators is required. If you are configuring MEV-boost, consult the settings you need [here](../advanced/quickstart-builder-api.mdx#consensus-clients).

### Validator Client

diff --git a/docs/adv/troubleshooting/errors.md b/docs/adv/troubleshooting/errors.md
new file mode 100644
index 0000000000..2c1310becd
--- /dev/null
+++ b/docs/adv/troubleshooting/errors.md
@@ -0,0 +1,291 @@
---
sidebar_position: 1
description: Errors & Resolutions
---

# Errors & Resolutions

All operators should first try restarting their node and should check that they are on the latest stable version before attempting any other configuration change. You can restart and update with the following commands:

```shell
docker compose down
git pull
docker compose up
```

You can check your logs using:

```shell
docker compose logs
```

## ENRs & Keys

### How do I get my ENR if I want to generate it again?

`cd` to the directory where your private keys are located (e.g. `cd /path/to/charon/enr/private/key`), then run `docker run --rm -v "$(pwd):/opt/charon" obolnetwork/charon:v1.2.0 enr`. This prints the ENR to your screen.

### What do I do if I lose my `charon-enr-private-key`?

ENR rotation/replacement is not yet supported; it will be added in a future release. It is therefore advised to always keep a backup of your `charon-enr-private-key` in a secure location (e.g. cloud storage or a USB flash drive).

### I can't find the keys anywhere

The `charon-enr-private-key` is generated inside a hidden folder, `.charon`. To view it, run `ls -al` in your terminal. The steps may differ slightly on Windows. On macOS, press `Cmd + Shift + .` to show the `.charon` folder in the Finder application.
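For example, from the directory that holds your node's files (assuming the default `.charon` layout that Charon creates):

```shell
# Plain `ls` skips hidden entries; the -a flag reveals the .charon folder.
ls -al .charon

# The ENR private key should be listed here:
ls -al .charon/charon-enr-private-key
```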
+ + +## Lighthouse + + +### Lighthouse says "downloading historical blocks" + +This means that Lighthouse is still syncing which will throw a lot of errors down the line. Wait for the sync before moving further. + + +### Lighthouse gives the error `failed to request attester duties` + +This indicates there is something wrong with your Lighthouse beacon node. This might be because the request buffer is full as your node is never starting consensus since it never gets the duties. + + +### Lighthouse gives the error `not enough time for a discovery seach` + +This could be linked to a internet connection being too slow or relying on a slow third-party service such as Infura. + + +## Beacon Node + +### `Error communicating with Beacon Node API` & `Error while connecting to beacon node event stream` + +This is likely due to Lighthouse not done syncing, wait and try again once synced. Can also be linked to Teku keystore issue. + +### Clock sync issues +Either your clock server time is off, or you are talking to a remote beacon client that is super slow (this is why we advise against using services like Infura). + + +### My beacon node API is flaky with lots of errors and timeouts + +A good quality beacon node API is critical to validator performance. It is always advised to run your own beacon node to ensure low latencies to boost validator performance. +Using 3rd party services like Infura's beacon node API has significant disadvantages since the quality is often low. Requests often return 500s or timeout. This results in lots of warnings and errors and failed duties. Running a local beacon node is always preferred. + +## Charon Errors + +### `Attester failed in consensus component` + +The required number of operators defined in your cluster-lock file is +probably not online to sign successfully. Make sure all operators are +running the latest version of Charon. To check if some peers are not online: +`docker logs charon-distributed-validator-node-charon-1 2>&1 | grep 'absent'` + +### `Load private key` + +Make sure you have successfully run a DKG before running the node. The key +should be created and placed in the right directory during the ceremony. +Also, make sure you are working in the right directory: +`charon-distributed-validator-node`. + +### `Failed to confirm node connection` +Wait for Teku & Lighthouse sync to be complete. + + +### `Reserve relay circuit: reservation failed` + +`RESERVATION_REFUSED` is returned by the libp2p relay when some maximum +limit has been reached. This is most often due to "maximum reservations per IP/peer". +This is when your Charon node is restarting or in some error loop and constantly +attempting to create new relay reservations reaching the maximum. + +To fix this error, stop your Charon node for 30mins before restarting it. +This should allow the relay enough time to reset your IP/peer limits and +should then allow new reservations. This could also be due to the relay +being overloaded in general, so reaching a server wide "maximum connections" +limit. This is an issue with relay scalability and we are working in a long +term fix for this. + +### `Error opening relay circuit: NO_RESERVATION` + +Error opening relay circuit NO_RESERVATION (204)` indicates the peer +isn't connected to the relay, so the the Charon client cannot connect to the +peer via the relay. That might be because the peer is offline or the peer is +configured to connect to a different relay. 
### `Couldn't fetch duty data from the beacon node`

`msgFetcher` indicates a duty failed in the fetcher component because it could not fetch the required data from the beacon node API. This indicates a problem with the upstream beacon node.

### `Couldn't aggregate attestation due to failed attester duty`

`msgFetcherAggregatorNoAttData` indicates an attestation aggregation duty failed in the fetcher component because it couldn't fetch the prerequisite attestation data. This indicates the associated attester duty failed to obtain a cluster-agreed-upon value.

### `Couldn't aggregate attestation due to insufficient partial v2 committee subscriptions`

`msgFetcherAggregatorZeroPrepares` indicates an attestation aggregation duty failed in the fetcher component because it couldn't fetch the prerequisite aggregated v2 committee subscription. This indicates the associated prepare aggregation duty failed because no partial v2 committee subscriptions were submitted by the cluster's validator clients.

### `Couldn't aggregate attestation due to failed prepare aggregator duty`

`msgFetcherAggregatorFailedPrepare` indicates an attestation aggregation duty failed in the fetcher component because it couldn't fetch the prerequisite aggregated v2 committee subscription. This indicates the associated prepare aggregation duty failed.

### `Couldn't propose block due to insufficient partial randao signatures`

`msgFetcherProposerFewRandaos` indicates a block proposer duty failed in the fetcher component because it couldn't fetch the prerequisite aggregated RANDAO. This indicates the associated randao duty failed due to insufficient partial randao signatures submitted by the cluster's validator clients.

### `Couldn't propose block due to zero partial randao signatures`

`msgFetcherProposerZeroRandaos` indicates a block proposer duty failed in the fetcher component because it couldn't fetch the prerequisite aggregated RANDAO. This indicates the associated randao duty failed because no partial randao signatures were submitted by the cluster's validator clients.

### `Couldn't propose block due to failed randao duty`

`msgFetcherProposerZeroRandaos` indicates a block proposer duty failed in the fetcher component because it couldn't fetch the prerequisite aggregated RANDAO. This indicates the associated randao duty failed.

### `Consensus algorithm didn't complete`

`msgConsensus` indicates a duty failed in the consensus component. This could indicate that insufficient honest peers participated in consensus, or that there are p2p network connection problems.

### `Signed duty not submitted by local validator client`

`msgValidatorAPI` indicates that partial signatures were never submitted by the local validator client. This could indicate that the local validator client is offline, has connection problems with Charon, or has some other problem. See the validator client logs for more details.

### `Bug: partial signature database didn't trigger partial signature exchange`

`msgParSigDBInternal` indicates a bug in the partial signature database, as this is unexpected.

### `No partial signatures received from peers`

`msgParSigEx` indicates that no partial signature for the duty was received from any peer. This indicates that all peers are offline or that there are p2p network connection problems.
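A quick way to eyeball peer connectivity is to search the Charon container's logs for peer-related lines (the service name below assumes the standard `charon-distributed-validator-node` compose file, and the exact log wording varies between Charon versions):

```shell
# Show recent log lines mentioning peers, such as connection and ping status.
docker compose logs charon 2>&1 | grep -i peer | tail -n 20
```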
+ +### `Insufficient partial signatures received, minimum required threshold not reached` + +`msgParSigDBThreshold` indicates that insufficient partial signatures +for the duty was received from peers. This indicates problems with peers or p2p +network connection problems. + +### `Bug: threshold aggregation of partial signatures failed due to inconsistent signed data` + +`msgSigAgg` indicates that BLS threshold aggregation of sufficient +partial signatures failed. This indicates inconsistent signed data. This indicates +a bug in Charon as it is unexpected. + +### `Existing private key lock file found, another charon instance may be running on your machine` + +When you turn on the `--private-key-file-lock` option in Charon, it +checks for a special file called the private key lock file. This file has the +same name as the ENR private key file but with a `.lock` extension. +If the private key lock file exists and is not older than 5 seconds, Charon won't +run. It doesn't allow running multiple Charon instances with the same ENR private +key. If the private key lock file has a timestamp older than 5 seconds, Charon +will replace it and continue with its work. If you`re sure that no other Charon +instances are running, you can delete the private key lock file. + +### `Validator api 5xx response: mismatching validator client key share index, Mth key share submitted to Nth charon peer` + +The issue revolves around an invalid setup or deployment, where the +validators private key shares don't match the ENR private key. There may +have been a mix-up during deployment, leading to a mismatching validator +client key share index. + +For example:Imagine node N is Alice, and node M is Bob, the error would read: +` mismatching validator client key share index, Bob`s key share submitted to Alice`s charon node ` +Bob`s private key share(s) are imported to a VC that is connected to +Alice`s Charon node. This is a invalid setup/deployment. +Alice`s Charon node should only be connected to Alice`s VC. + +Check the partial public key shares of each node inside +cluster-lock.json and see that matches with the public key inside +`node(num)/validator_keys/keystore-0.json`. + +## Grafana + +### How to fix the Grafana dashboard? + +Sometimes, Grafana dashboard doesn't load any data first time around. +You can solve this by following the steps below: +- Click the Wheel Icon > Datasources. +- Click prometheus. +- Change the "Access" field from `Server (default)` to `Browser`. Press "Save & Test". It should fail. +- Change the "Access" field back to `Server (default)` and press "Save & Test". You should be presented with a green success icon saying "Data source is working" and you can return to the dashboard page. + +### `N/A` & `No data` in validator info panel +Can be linked to a Teku keystore issue. + + +## Prometheus + +### `Unauthorized: authentication error: invalid token` + You can ignore this error unless you have been contacted by the Obol Team + with monitoring credentials. In that case, follow [Monitoring your Node](../../run/running/monitoring.md) in our guides. It does not affect cluster performance or prevent the cluster from running. + + +## Docker + +### How to fix `permission denied` errors? + +Permission denied errors can come up in a variety of manners, particularly +on Linux and WSL for Windows systems. In the interest of security, the +charon docker image runs as a non-root user, and this user often does not +have the permissions to write in the directory you have checked out the code +to. 
## Grafana

### How to fix the Grafana dashboard?

Sometimes the Grafana dashboard doesn't load any data the first time around. You can solve this by following the steps below:
- Click the wheel icon > Datasources.
- Click Prometheus.
- Change the "Access" field from `Server (default)` to `Browser`. Press "Save & Test". It should fail.
- Change the "Access" field back to `Server (default)` and press "Save & Test". You should be presented with a green success icon saying "Data source is working", and you can return to the dashboard page.

### `N/A` & `No data` in validator info panel

This can be linked to a Teku keystore issue.

## Prometheus

### `Unauthorized: authentication error: invalid token`

You can ignore this error unless you have been contacted by the Obol Team with monitoring credentials. In that case, follow [Monitoring your Node](../../run/running/monitoring.md) in our guides. It does not affect cluster performance or prevent the cluster from running.

## Docker

### How to fix `permission denied` errors?

Permission denied errors can come up in a variety of ways, particularly on Linux and WSL for Windows systems. In the interest of security, the Charon docker image runs as a non-root user, and this user often does not have permission to write in the directory you have checked the code out to. This can generally be fixed with some of the following:
- Running docker commands with `sudo`, if you haven't [set up docker to be run as a non-root user](https://docs.docker.com/engine/install/linux-postinstall/).
- Changing the permissions of the `.charon` folder with the commands:
  - `mkdir .charon` (if it doesn't already exist);
  - `sudo chmod -R 666 .charon`.

### I see a lot of errors after running `docker compose up`

This is because both Nethermind and Lighthouse start syncing, so there are connectivity issues among the containers. Simply let the containers run for a while; you won't observe frequent errors once Nethermind finishes syncing. You can also add a second beacon node endpoint, for a service like Infura, by appending a comma-separated API URL to the end of `CHARON_BEACON_NODE_ENDPOINTS` in the docker-compose.yml.

### How do I fix the `plugin "loki" not found` error?

If you get the following error when calling `docker compose up`:

`Error response from daemon: error looking up logging plugin loki: plugin "loki" not found`

then it probably means that the Loki docker driver isn't installed. In that case, run the following command to install it:

`docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions`

## Relay

### `Resolve IP of p2p external host flag: lookup replace.with.public.ip.or.hostname: no such host`

Replace `replace.with.public.ip.or.hostname` in the relay/docker-compose.yml with your real public IP or DNS hostname.

### `Timeout resolving bootnode ENR: context deadline exceeded`

The relay through which you are trying to connect to your peers is offline or unreachable.

diff --git a/docs/adv/troubleshooting/errors.mdx b/docs/adv/troubleshooting/errors.mdx
deleted file mode 100644
index 1f93600b70..0000000000
--- a/docs/adv/troubleshooting/errors.mdx
+++ /dev/null
@@ -1,694 +0,0 @@
----
-sidebar_position: 1
-description: Errors & Resolutions
----
-
-# Errors & Resolutions
-
-All operators should try to restart their nodes and should check if they are on the latest stable version before attempting anything other configuration change as we are still in beta and frequently releasing fixes. You can restart and update with the following commands:
-
-```shell
-docker compose down
-git pull
-docker compose up
-```
-
-You can check your logs using
-
-```shell
-docker compose logs
-```
-
-(694 deleted lines: the previous MDX version of this page, which wrapped the same ENR, Lighthouse, Beacon Node, Charon, Teku, Grafana, Prometheus, Docker, Relay, and Lodestar entries in collapsible JSX details/summary blocks; only fragments of that markup survived extraction)
diff --git a/docs/learn/charon/networking.mdx b/docs/learn/charon/networking.mdx index 16eb429cac..ad54d53cc7 100644 --- a/docs/learn/charon/networking.mdx +++ b/docs/learn/charon/networking.mdx @@ -11,7 +11,7 @@ This document describes Charon's networking model which can be divided into two ## Internal Validator Stack -Internal Validator Stack
+Internal Validator Stack
Charon is a middleware DVT client and is therefore connected to an upstream beacon node and a downstream validator client is connected to it. Each operator should run the whole validator stack (all 4 client software types), either on the same machine or on different machines. The networking between diff --git a/docs/learn/futher-reading/ethereum_and_dvt.md b/docs/learn/futher-reading/ethereum_and_dvt.md index 7a62ccaa77..873b4895c5 100644 --- a/docs/learn/futher-reading/ethereum_and_dvt.md +++ b/docs/learn/futher-reading/ethereum_and_dvt.md @@ -26,7 +26,7 @@ The Ethereum website serves as a hub for all things Ethereum, catering to indivi If you haven’t yet heard, Distributed Validator Technology, or DVT, is the next big thing on The Merge section of the Ethereum roadmap. Learn more about this in our blog post: [What is DVT and How Does It Improve Staking on Ethereum?](https://blog.obol.tech/what-is-dvt-and-how-does-it-improve-staking-on-ethereum/) -Image Alt Text +Image Alt Text ***Vitalik's Ethereum Roadmap*** diff --git a/docs/learn/intro/faq.mdx b/docs/learn/intro/faq.mdx index a3acf5e448..27681baee1 100644 --- a/docs/learn/intro/faq.mdx +++ b/docs/learn/intro/faq.mdx @@ -134,6 +134,16 @@ A cluster is a group of nodes that act together as one or several validators whi It is possible to migrate your Charon node to another machine running the same config by moving the `.charon` folder with its contents to your new machine. Make sure the EL and CL on the new machine are synced before proceeding to the move to minimize downtime. +### What is an ENR? + +An ENR is shorthand for an [Ethereum Node Record](https://eips.ethereum.org/EIPS/eip-778). It is a way to represent a node on a public network, with a reliable mechanism to update its information. + +At Obol we use ENRs to identify Charon nodes to one another such that they can form clusters with the right Charon nodes and not impostors. ENRs have private keys they use to sign updates to the [data contained in their ENR](https://enr-viewer.com/). This private key is by default found at `.charon/charon-enr-private-key`, and should be kept secure, and not checked into version control. + +An ENR looks something like this: +`enr:-JG4QAgAOXjGFcTIkXBO30aUMzg2YSo1CYV0OH8Sf2s7zA2kFjVC9ZQ_jZZItdE8gA-tUXW-rWGDqEcoQkeJ98Pw7GaGAYFI7eoegmlkgnY0gmlwhCKNyGGJc2VjcDI1NmsxoQI6SQlzw3WGZ_VxFHLhawQFhCK8Aw7Z0zq8IABksuJEJIN0Y3CCPoODdWRwgj6E` + + ## Distributed Key Generation ### What are the min and max numbers of operators for a Distributed Validator? @@ -186,4 +196,4 @@ Another aspect to be aware of is how the splitting of principal from rewards wor You can check if the containers on your node are outputting errors by running `docker compose logs` on a machine with a running cluster. -Diagnose some common errors and view their resolutions [here](../../adv/troubleshooting/errors.mdx). +Diagnose some common errors and view their resolutions [here](../../adv/troubleshooting/errors.md). diff --git a/docs/learn/intro/obol-vs-others.md b/docs/learn/intro/obol-vs-others.md index f9dde57760..5283ff5e21 100644 --- a/docs/learn/intro/obol-vs-others.md +++ b/docs/learn/intro/obol-vs-others.md @@ -21,12 +21,6 @@ In an Obol DV cluster, nodes use LibP2P to communicate directly with each other, ![Cluster Independence](/img/ClusterIndependence.png) -## Cluster independance: No reliance on a common P2P gossip network - -In an Obol DV cluster, nodes use LibP2P to communicate directly with each other, and communications are end-to-end encrypted with TSL. 
This direct communication of nodes within a cluster improves latency, and makes cluster communications harder to attack with a denial of service (DOS) attack. It also allows an Obol DV cluster to be run within a private network. This may allow cost savings on data egress costs, for operators running cluster nodes across multiple locations of a single cloud provider, for example. - -![Gossip Network](/img/GossipNetwork.png) - ## Works with existing validator clients and PKI We built Obol’s DV implementation as a secure and trust-minimised middleware architecture. Our middleware client, Charon, doesn’t replace anything in the client stack, instead it sits between the consensus and validator clients. Node operators integrating the Charon DVT middleware into their stack can continue to use the same clients and private key infrastructure as before, albeit with a different key generation method. diff --git a/docs/run/prepare/deployment-best-practices.md b/docs/run/prepare/deployment-best-practices.md index fc998c791d..a6808a5857 100644 --- a/docs/run/prepare/deployment-best-practices.md +++ b/docs/run/prepare/deployment-best-practices.md @@ -70,7 +70,7 @@ Cluster sizes that allow for Byzantine Fault Tolerance are recommended as they a ## MEV-Boost Relays -MEV relays are configured at the Consensus Layer or MEV-boost client level. Refer to our [guide](../../run/start/quickstart-builder-api.mdx) to ensure all necessary configuration has been applied to your clients. As with all validators, low latency during proposal opportunities is extremely important. By default, MEV-Boost waits for all configured relays to return a bid, or will timeout if any have not returned a bid within 950ms. This default timeout is generally too slow for a distributed cluster (think of this time as additive to the time it takes the cluster to come to consensus, both of which need to happen within a 2 second window for optimal proposal broadcasting). It is likely better to only list relays that are located geographically near your node, so that once all relays respond (e.g. in < 50ms) your cluster will move forward with the proposal. +MEV relays are configured at the Consensus Layer or MEV-boost client level. Refer to our [guide](../../adv/advanced/quickstart-builder-api.mdx) to ensure all necessary configuration has been applied to your clients. As with all validators, low latency during proposal opportunities is extremely important. By default, MEV-Boost waits for all configured relays to return a bid, or will timeout if any have not returned a bid within 950ms. This default timeout is generally too slow for a distributed cluster (think of this time as additive to the time it takes the cluster to come to consensus, both of which need to happen within a 2 second window for optimal proposal broadcasting). It is likely better to only list relays that are located geographically near your node, so that once all relays respond (e.g. in < 50ms) your cluster will move forward with the proposal. Use Charon's [`test mev` command](../../run/prepare/test-command.mdx#test-mev-relay) to test a number of your preferred relays, and select the two or three relays with the lowest latency to your node(s), you do not need to have the same relays on each node in a cluster. 
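As a sketch with the stock mev-boost CLI, this means pinning a short list of nearby relays and, optionally, tightening the bid timeout (flag names as documented in the mev-boost README; the relay URLs below are placeholders):

```shell
# Run mev-boost with two nearby relays and a tighter getHeader timeout than the 950ms default.
./mev-boost -mainnet \
  -relay-check \
  -relays https://relay-a.example.org,https://relay-b.example.org \
  -request-timeout-getheader 750
```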
diff --git a/docs/run/prepare/how_where_DVs.md b/docs/run/prepare/how_where_DVs.md
index cdecdde8e1..648a64569c 100644
--- a/docs/run/prepare/how_where_DVs.md
+++ b/docs/run/prepare/how_where_DVs.md
@@ -20,7 +20,6 @@ description: How and where to run DVs
## Quickstart Guides
- [Run a DV alone](../start/quickstart_alone.mdx)
- [Run a DV as a group](../start/quickstart_group.mdx)
-- [Run a DV using the SDK](../../adv/advanced/quickstart-sdk.mdx)

## CL+VC Combinations:

@@ -31,19 +30,12 @@ description: How and where to run DVs
- 🟠: Duties may fail for this combination
- πŸ”΄: One or more duties fails consistently

-| Consensus πŸ‘‡ Validator πŸ‘‰ | Teku v24.8.0 | Lighthouse v5.3.0[^lhagg] | Lodestar v1.20.2 | Nimbus v24.7.0 | Prysm [PR](https://github.com/prysmaticlabs/prysm/pull/13995) | Remarks |
-|-------------------------|--------------|-------------------|------------------|----------------|---------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|
-| Teku v24.8.0 | 🟑 | 🟑 | βœ… | βœ… | 🟠 | Teku `bn` needs the `--validators-graffiti-client-append-format=DISABLED` flag in order to produce blocks properly. Teku `vc` are only failing aggregation duties 50% of the time, which are not directly penalised but impact network density at high scale.|
-| Lighthouse v5.3.0 | 🟑 | 🟑 | βœ… | βœ… | 🟠 | Lighthouse `vc` are only failing aggregation duties, which are not directly penalised but impact network density at high scale. |
-| Nimbus v24.7.0 | 🟑 | 🟑 | βœ… | βœ… | βœ… | Nimbus beacon nodes requires that you add the following flag to **charon run**: `charon run --feature-set-enable=json_requests` |
-| Prysm v5.0.3 | 🟑 | 🟑 | βœ… | βœ… | βœ… | Prysm `validator` needs a particular [pull request](https://github.com/prysmaticlabs/prysm/pull/13995) merged and released for aggregation duties to succeed. |
-| Lodestar v1.20.2 | 🟑 | 🟑 | βœ… | βœ… | πŸ”΄ | |
-
-[^lhagg]: sync committee and aggregator duties are not yet supported in a DV setup by Lighthouse, all other duties work as expected.
-
-
-### Note:
- \ No newline at end of file
+| Validator πŸ‘‰ Consensus πŸ‘‡ | Teku v24.10.3 | Lighthouse v5.3.0 | Lodestar v1.23.0 | Nimbus v24.10.0 | Prysm v5.1.2 | Remarks |
+|---------------------------|---------------|-------------------|------------------|-----------------|--------------|---------|
+| Teku v24.10.3 | βœ… | 🟑 | βœ… | βœ… | 🟠 | Teku `beacon node` needs the `--validators-graffiti-client-append-format=DISABLED` flag in order to produce blocks properly. Teku `validator client` is only failing aggregation duties 50% of the time, which are not directly penalised but impact network density at high scale. |
+| Lighthouse v5.3.0 | βœ… | 🟑 | βœ… | βœ… | βœ… | Lighthouse `validator client` is only failing aggregation duties, which are not directly penalised but impact network density at high scale. |
+| Lodestar v1.23.0 | βœ… | 🟑 | βœ… | βœ… | 🟠 | |
+| Nimbus v24.10.0 | βœ… | 🟑 | βœ… | βœ… | 🟠 | |
+| Prysm v5.1.2 | βœ… | 🟑 | βœ… | βœ… | βœ… | Prysm `validator client` is failing aggregation duties 50% of the time, which are not directly penalised but impact network density at high scale. In some combinations rare failures of attestation and proposal duties were observed (0-2% per epoch). |
+
+Note: for the most recent compatibility information, please see the [release notes](https://github.com/ObolNetwork/charon/releases/) from the most recent release of Charon.
\ No newline at end of file diff --git a/docs/run/running/activate-dv.mdx b/docs/run/running/activate-dv.mdx index f1627775fd..f3732606f7 100644 --- a/docs/run/running/activate-dv.mdx +++ b/docs/run/running/activate-dv.mdx @@ -22,16 +22,5 @@ Use any of the following tools to deposit. Please use the third-party tools at y * Obol Distributed Validator Launchpad * ethereum.org Staking Launchpad -* From a SAFE Multisig:
-(Repeat these steps for every validator to deposit in your cluster) - * From the SAFE UI, click on New Transaction then Transaction Builder to create a new custom transaction - * Enter the beacon chain contract for Deposit on mainnet - you can find it here - * Fill the transaction information - * Set amount to 32 in ETH - * Use your deposit-data.json to fill the required data : pubkey,withdrawal credentials,signature,deposit_data_root. Make sure to prefix the input with 0x to format them in bytes - * Click on Add transaction - * Click on Create Batch - * Click on Send Batch, you can click on Simulate to check if the transaction will execute successfully - * Get the minimum threshold of signatures from the other addresses and execute the custom transaction The activation process can take a minimum of 16 hours, with the maximum time to activation being dictated by the length of the activation queue, which can be weeks. diff --git a/docs/run/start/quickstart_group.mdx b/docs/run/start/quickstart_group.mdx index c6f7fa197a..7acd44d5c3 100644 --- a/docs/run/start/quickstart_group.mdx +++ b/docs/run/start/quickstart_group.mdx @@ -44,7 +44,7 @@ Please make sure to create a backup of the private key at `.charon/charon-enr-pr ::: :::tip -If instead of being shown your `enr` you see an error saying `permission denied` then you may need to [update your docker permissions](../../adv/troubleshooting/errors.mdx#docker-permission-denied-error) to allow the command to run successfully. +If instead of being shown your `enr` you see an error saying `permission denied` then you may need to [update your docker permissions](../../adv/troubleshooting/errors.md#docker-permission-denied-error) to allow the command to run successfully. ::: diff --git a/docusaurus.config.js b/docusaurus.config.js index 6270ebd574..19433966cf 100644 --- a/docusaurus.config.js +++ b/docusaurus.config.js @@ -233,6 +233,10 @@ const config = { to: '/sdk', from: '/docs/sdk', }, + { + to: 'next/docs/adv/advanced/quickstart-builder-api', + from: '/docs/run/start/quickstart-builder-api', + }, //Redirect from multiple old paths to the new path // { // to: '/docs/newDoc2', diff --git a/static/img/CharonProtocolLayers.png b/static/img/CharonProtocolLayers.png new file mode 100644 index 0000000000..11f57d6b6e Binary files /dev/null and b/static/img/CharonProtocolLayers.png differ diff --git a/static/img/EthereumRoadmapDec2023.jpeg b/static/img/EthereumRoadmapDec2023.jpeg new file mode 100644 index 0000000000..9e0f4e929f Binary files /dev/null and b/static/img/EthereumRoadmapDec2023.jpeg differ diff --git a/static/img/ObolvsOthers.png b/static/img/ObolvsOthers.png index c8e58ae7c6..bd29eea448 100644 Binary files a/static/img/ObolvsOthers.png and b/static/img/ObolvsOthers.png differ diff --git a/static/img/workflow.jpg b/static/img/workflow.jpg index a93d8396c3..6fc2be2520 100644 Binary files a/static/img/workflow.jpg and b/static/img/workflow.jpg differ diff --git a/versioned_docs/version-v1.2.0/run/prepare/how_where_DVs.md b/versioned_docs/version-v1.2.0/run/prepare/how_where_DVs.md index cdecdde8e1..f1293d9ec8 100644 --- a/versioned_docs/version-v1.2.0/run/prepare/how_where_DVs.md +++ b/versioned_docs/version-v1.2.0/run/prepare/how_where_DVs.md @@ -31,19 +31,12 @@ description: How and where to run DVs - 🟠: Duties may fail for this combination - πŸ”΄: One or more duties fails consistently -| Consensus πŸ‘‡ Validator πŸ‘‰ | Teku v24.8.0 | Lighthouse v5.3.0[^lhagg] | Lodestar v1.20.2 | Nimbus v24.7.0 | Prysm 
[PR](https://github.com/prysmaticlabs/prysm/pull/13995) | Remarks |
-|-------------------------|--------------|-------------------|------------------|----------------|---------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|
-| Teku v24.8.0 | 🟑 | 🟑 | βœ… | βœ… | 🟠 | Teku `bn` needs the `--validators-graffiti-client-append-format=DISABLED` flag in order to produce blocks properly. Teku `vc` are only failing aggregation duties 50% of the time, which are not directly penalised but impact network density at high scale.|
-| Lighthouse v5.3.0 | 🟑 | 🟑 | βœ… | βœ… | 🟠 | Lighthouse `vc` are only failing aggregation duties, which are not directly penalised but impact network density at high scale. |
-| Nimbus v24.7.0 | 🟑 | 🟑 | βœ… | βœ… | βœ… | Nimbus beacon nodes requires that you add the following flag to **charon run**: `charon run --feature-set-enable=json_requests` |
-| Prysm v5.0.3 | 🟑 | 🟑 | βœ… | βœ… | βœ… | Prysm `validator` needs a particular [pull request](https://github.com/prysmaticlabs/prysm/pull/13995) merged and released for aggregation duties to succeed. |
-| Lodestar v1.20.2 | 🟑 | 🟑 | βœ… | βœ… | πŸ”΄ | |
-
-[^lhagg]: sync committee and aggregator duties are not yet supported in a DV setup by Lighthouse, all other duties work as expected.
-
-
-### Note:
- \ No newline at end of file
+| Validator πŸ‘‰ Consensus πŸ‘‡ | Teku v24.10.3 | Lighthouse v5.3.0 | Lodestar v1.23.0 | Nimbus v24.10.0 | Prysm v5.1.2 | Remarks |
+|---------------------------|---------------|-------------------|------------------|-----------------|--------------|---------|
+| Teku v24.10.3 | βœ… | 🟑 | βœ… | βœ… | 🟠 | Teku `beacon node` needs the `--validators-graffiti-client-append-format=DISABLED` flag in order to produce blocks properly. Teku `validator client` is only failing aggregation duties 50% of the time, which are not directly penalised but impact network density at high scale. |
+| Lighthouse v5.3.0 | βœ… | 🟑 | βœ… | βœ… | βœ… | Lighthouse `validator client` is only failing aggregation duties, which are not directly penalised but impact network density at high scale. |
+| Lodestar v1.23.0 | βœ… | 🟑 | βœ… | βœ… | 🟠 | |
+| Nimbus v24.10.0 | βœ… | 🟑 | βœ… | βœ… | 🟠 | |
+| Prysm v5.1.2 | βœ… | 🟑 | βœ… | βœ… | βœ… | Prysm `validator client` is failing aggregation duties 50% of the time, which are not directly penalised but impact network density at high scale. In some combinations rare failures of attestation and proposal duties were observed (0-2% per epoch). |
+
+Note: for the most recent compatibility information, please see the [release notes](https://github.com/ObolNetwork/charon/releases/) from the most recent release of Charon.
\ No newline at end of file