
Consensus service fails to start #842

Closed
cndolo opened this issue Jun 24, 2021 · 6 comments

@cndolo

cndolo commented Jun 24, 2021

Hi!
I would like to operate a validator and have followed the build instructions as meticulously as possible.
The service starts but then fails with the error panicked at 'Failed starting consensus service...'.
I do not understand the lengthy error message and would appreciate any help getting the validator up and running.

I have attached the output as a text file. This is the command I am using to run the service:

SGX_MODE=HW IAS_MODE=DEV target/release/consensus-service \
  --client-responder-id cndolo:443 --peer-responder-id cndolo.com:8443 \
  --network /etc/network.toml --ias-api-key="xxxx" \
  --ias-spid="yyy" \
  --peer-listen-uri='mcp://0.0.0.0:8443/' --msg-signer-key my-private-key \
  --sealed-block-signing-key 0x2c1a561c4ab64cbc04bfa445cdf7bed9b2ad6f6b04d38d3137f3622b29fdb30e \
  --admin-listen-uri=insecure-mca://127.0.0.1:9091/ --origin-block-path ledger/ \
  --ledger-path /home/cn/mobcoin/validator-db/

Some notes on the parameters:

  • ias-api-key: I am passing the primary key associated with my Intel subscription. Is that correct?
  • sealed-block-signing-key: the MRSIGNER value from the release notes.
  • client-responder-id / peer-responder-id: currently these are not actual domains, as my node is not reachable.

I am running the most recent release (v1.1.0) on Ubuntu 20.04 LTS and built the service using the provided Dockerfile.
Thank you!
consensus-service.log

@jcape
Contributor

jcape commented Jun 24, 2021

Hi,

MobileCoin services with enclaves first attempt to attest to themselves and cache the report from Intel inside the enclave so it can be served to clients (the report is refreshed periodically). This gives us some flexibility in the event IAS goes down (i.e. an IAS service outage is not a MobileCoin outage), and it also prevents the service from starting on a system that will not work as-is.

If the self-attestation / self-verification fails, the service prints the VerificationReportData structure (i.e. the contents of the report under signature). The Intel IAS report you're seeing lists four security advisories (SAs) that your system is vulnerable to and that are preventing attestation from completing:

  • INTEL-SA-00161, the L1TF "Foreshadow" vulnerability: this will require a BIOS and firmware update.
  • INTEL-SA-00219, a permissions issue in integrated graphics: this will require disabling integrated graphics in the BIOS.
  • INTEL-SA-00289, the voltage-settings fault injection, aka "PlunderVolt": this will also require a BIOS/firmware update.
  • INTEL-SA-00334, which is the LVI vulnerability. This has been mitigated in our build process, so enclaves produced by the Foundation are known to be secure.
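To see which advisories a failed self-attestation reports on a given machine, a quick sketch like the one below can pull the advisory IDs out of the attached log. It assumes the IDs appear verbatim in the INTEL-SA-NNNNN form somewhere in the log text; `list_advisories` is a hypothetical helper, not part of MobileCoin.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct Intel security advisory ID
# (INTEL-SA-NNNNN) found in a log file, one per line, sorted.
# Assumes the IDs appear verbatim in the log text.
list_advisories() {
  [ -r "${1:-}" ] || { echo "usage: list_advisories LOGFILE" >&2; return 2; }
  grep -oE 'INTEL-SA-[0-9]{5}' "$1" | sort -u
}

# Example:
#   list_advisories consensus-service.log
```

Deduplicating with `sort -u` matters because the same advisory ID may be printed more than once in a long backtrace.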

So the steps to start moving forward will be:

  1. Update BIOS and the relevant Ubuntu intel-firmware package.
  2. Configure the BIOS to disable the integrated graphics (or use a machine without integrated graphics).
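For step 1, it can help to record the BIOS version and the CPU microcode revision the kernel reports, both before and after updating, to confirm the update actually took effect. A minimal sketch follows; `microcode_rev` is a hypothetical helper, and the commented commands assume Ubuntu, where CPU microcode ships in the `intel-microcode` package.

```shell
#!/bin/sh
# Hypothetical helper: print the microcode revision from a /proc/cpuinfo-style
# file (defaults to /proc/cpuinfo). Only the first CPU's entry is printed;
# all cores normally report the same revision.
microcode_rev() {
  awk -F': *' '/^microcode/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Typical use on Ubuntu 20.04 (commented out: needs root and a reboot):
#   cat /sys/class/dmi/id/bios_version         # current BIOS version
#   microcode_rev                              # revision before the update
#   sudo apt update && sudo apt install intel-microcode
#   sudo reboot                                # microcode is loaded at boot
#   microcode_rev                              # newer revision, if one was available
```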

@jcape jcape self-assigned this Jun 24, 2021
@cndolo
Author

cndolo commented Jun 27, 2021

Thank you for your helpful response. I have been working through your recommendations:

Update BIOS and the relevant Ubuntu intel-firmware package.

The UEFI BIOS was up to date; however, there was a microcode update I had not yet installed.

Configure the BIOS to disable the integrated graphics package (or use a machine without integrated graphics)

I am working on a Lenovo T470 ThinkPad, and AFAIK it is not possible to disable the integrated graphics in the BIOS.
Does this mean I will simply not be able to run the consensus service on this machine?

The error message has now changed (despite 2) to SGX_ERROR_NO_DEVICE. Am I right to assume that this is not a MobileCoin issue but rather something related to the configuration of my SGX?

@jcape
Contributor

jcape commented Jun 30, 2021

I am working on a Lenovo T470 ThinkPad, and AFAIK it is not possible to disable the integrated graphics in the BIOS.
Does this mean I will simply not be able to run the consensus service on this machine?

Unfortunately, yes. INTEL-SA-00219 boils down to "On-board GPUs have unrestricted read access to the first 8 bytes of every cache line in the CPU," which undermines the privacy enhancements we're getting out of SGX. There's a good technical discussion of this SA on the Dalek Cryptography GitHub, which ultimately concludes that mitigating it in any userland runtime is simply impractical.

It may be possible to disable the GPU and make Xorg/Wayland use pure software rendering, but obviously that significantly reduces the usefulness of the machine.
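To check whether a machine has an Intel integrated GPU at all (and is therefore likely exposed to INTEL-SA-00219), one option is to inspect `lspci` output. This sketch assumes the iGPU appears as a "VGA compatible" or "Display" controller with Intel as the vendor; `has_intel_igpu` is a hypothetical helper that reads `lspci` output on stdin. A discrete Intel card would also match, so treat a hit as a prompt to look closer rather than a verdict.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if lspci output on stdin lists an
# Intel VGA/Display controller, which on laptops is normally the iGPU.
has_intel_igpu() {
  grep -qiE '(VGA compatible|Display) controller: Intel'
}

# Example:
#   if lspci | has_intel_igpu; then echo "Intel integrated GPU present"; fi
```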

The error message has now changed (despite 2) to SGX_ERROR_NO_DEVICE. Am I right to assume that this is not a MobileCoin issue but rather something related to the configuration of my SGX?

In this case, the error is most likely caused by a missing or unloaded SGX driver. We're currently using the EPID driver, which ships with SGX 2.13.3 (the link is for Ubuntu 20.04). As an out-of-tree kernel driver, it will need to be re-installed after each kernel upgrade.
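A quick way to tell whether any SGX driver is loaded is to look for its device node. The sketch below assumes the usual node names: /dev/isgx for the out-of-tree EPID driver mentioned above, /dev/sgx/enclave for the out-of-tree DCAP driver, and /dev/sgx_enclave for the in-kernel driver (Linux 5.11 and later). `sgx_device` is a hypothetical helper; it takes an alternate /dev directory only so it can be tested.

```shell
#!/bin/sh
# Hypothetical helper: print the first SGX device node found under a /dev
# directory (defaults to /dev). If none exists, SGX_ERROR_NO_DEVICE is the
# expected symptom.
#   isgx          - out-of-tree EPID driver
#   sgx/enclave   - out-of-tree DCAP driver
#   sgx_enclave   - in-kernel driver (Linux 5.11+)
sgx_device() {
  dev="${1:-/dev}"
  for node in isgx sgx/enclave sgx_enclave; do
    if [ -e "$dev/$node" ]; then
      echo "$dev/$node"
      return 0
    fi
  done
  echo "no SGX device node found under $dev" >&2
  return 1
}

# Example: after a kernel upgrade, check the EPID driver module is loaded:
#   sgx_device || lsmod | grep isgx
```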

@cndolo
Author

cndolo commented Jul 1, 2021

Thank you!

Unfortunately yes, Intel-SA-00219 boils down to "On-board GPUs have unrestricted read access to the first 8 bytes of every cache line in the CPU," which undermines the privacy enhancements we're getting out of SGX.

Does this apply to simulation mode as well?

I have made some progress and I am now able to run the sample programs provided by Intel both in Simulation and Hardware mode.

However, consensus-service in SW mode (using the command above) panics with SGX_ERROR_INVALID_PARAMETER. Is that referring to one of the command-line arguments I passed? I have attached the backtrace.

consensus-service.log

@jcape
Contributor

jcape commented Jul 1, 2021

Simulation mode should run, yes.

One thing I missed in the original post: the IAS_MODE and SGX_MODE environment variables in your command are read at compile time and have no effect at runtime. The READMEs aren't very clear on this point, because the examples all use cargo run (which rebuilds when those variables change).
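Because these variables are consumed by the build, switching between hardware and simulation means rebuilding, not just re-running the binary with different variables. A minimal sketch, assuming a checkout of the MobileCoin repository where `cargo build --release --bin consensus-service` produces the target/release/consensus-service binary used above, and assuming IAS_MODE accepts DEV or PROD; `rebuild_consensus`, `run_cmd`, and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rebuild consensus-service for the given SGX_MODE
# (HW or SW) and IAS_MODE (DEV or PROD, default DEV). Both are compile-time
# inputs, so they are set for `cargo build`, not when running the binary.
rebuild_consensus() {
  sgx_mode="${1:-}"
  ias_mode="${2:-DEV}"
  case "$sgx_mode" in HW|SW) ;; *) echo "SGX_MODE must be HW or SW" >&2; return 2 ;; esac
  case "$ias_mode" in DEV|PROD) ;; *) echo "IAS_MODE must be DEV or PROD" >&2; return 2 ;; esac
  # Drop artifacts built in the previous mode so nothing stale is reused.
  run_cmd cargo clean || return
  run_cmd env SGX_MODE="$sgx_mode" IAS_MODE="$ias_mode" \
    cargo build --release --bin consensus-service
}

# Execute a command, or only print it when DRY_RUN is set (useful for checking).
run_cmd() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Example:
#   rebuild_consensus SW && target/release/consensus-service ...
```

Running `cargo clean` first is a blunt but simple way to make sure no hardware-mode enclave artifacts survive into a simulation build.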

My suspicion is that the build itself is still a hardware build, and it's complaining because the aesmd service is not running; my guess is that Intel won't provision new keys to the host while the GPU is still turned on.
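To check the aesmd theory, one low-effort test is whether the AESM daemon's Unix socket exists. This sketch assumes the socket lives at /var/run/aesmd/aesm.socket, the usual location for Intel's SGX PSW on Ubuntu, and that the daemon runs as the systemd unit `aesmd`; `aesm_socket_up` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the AESM daemon's Unix socket exists.
# The default path is an assumption about a standard PSW install; pass a
# different path if yours differs.
aesm_socket_up() {
  sock="${1:-/var/run/aesmd/aesm.socket}"
  [ -S "$sock" ]
}

# Example (systemd-based Ubuntu):
#   aesm_socket_up || sudo systemctl status aesmd
```

Checking for a socket (`-S`) rather than any file means a stale regular file at that path is not mistaken for a running daemon.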

@cndolo
Author

cndolo commented Jul 4, 2021

I had since figured out that the environment variables are compile-time directives and had been passing them to cargo build. Once the aesm service was running, I ran cargo clean before building in SW mode. The consensus-service's behaviour is now as expected, i.e. remote attestation fails (kindly correct me if I am wrong).

OTOH, HW mode still fails with the same errors as when I first opened this issue. I am certain that there are no BIOS or firmware updates to install but the GPU is still on. So I seem to be running around in circles here and my machine is likely not capable of running a validator.

Thanks a lot for your time. If I get the appropriate hardware in the near future, I will look into this again.

@jcape jcape closed this as completed Jul 16, 2021