-
Notifications
You must be signed in to change notification settings - Fork 495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Attributable failures (feature 36/37) #1044
base: master
Are you sure you want to change the base?
Conversation
I've started implementing it in eclair, do you have some test vectors so we can check that we are compatible? |
I don't have test vectors yet, but I can produce them. Will add them to this PR when ready. Capping the max hops at a lower number is fine to me, but do you have a scenario in mind where this would really make the difference? Or is it to more generally that everything above 8 is wasteful? |
4b48481
to
24b10d5
Compare
@thomash-acinq added a happy fat error test vector. |
24b10d5
to
76dbf21
Compare
09-features.md
Outdated
@@ -41,6 +41,7 @@ The Context column decodes as follows: | |||
| 20/21 | `option_anchor_outputs` | Anchor outputs | IN | `option_static_remotekey` | [BOLT #3](03-transactions.md) | | |||
| 22/23 | `option_anchors_zero_fee_htlc_tx` | Anchor commitment type with zero fee HTLC transactions | IN | `option_static_remotekey` | [BOLT #3][bolt03-htlc-tx], [lightning-dev][ml-sighash-single-harmful]| | |||
| 26/27 | `option_shutdown_anysegwit` | Future segwit versions allowed in `shutdown` | IN | | [BOLT #2][bolt02-shutdown] | | |||
| 28/29 | `option_fat_error` | Can generate/relay fat errors in `update_fail_htlc` | IN | | [BOLT #4][bolt04-fat-errors] | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this big gap in the bits has emerged here because of tentative spec changes that may or may not make it. Not sure why that is necessary. I thought for unofficial extensions, the custom range is supposed to be used?
I can see that with unofficial features deployed in the wild, it is easier to keep the same bit when something becomes official. But not sure if that is worth creating the gap here? An alternative is to deploy unofficial features in the custom range first, and then later recognize both the official and unofficial bit. Slightly more code, but this feature list remains clean.
Added fat error signaling to the PR. |
76dbf21
to
2de919a
Compare
I've spent a lot of time trying to make the test vector pass and I've finally found what was wrong:
implying that we need to concatenate them in that order. But in your code you follow a different order:
I think the order message + hop payloads + hmacs is more intuitive as it matches the order of the fields in the packet. |
Oh great catch! Will produce a new vector. |
2de919a
to
bcf022b
Compare
@thomash-acinq updated vector |
Updated LND implementation with sender-picked fat error structure parameters: lightningnetwork/lnd#7139 |
bcf022b
to
6bf0729
Compare
@thomash-acinq, shall we pick up attributable errors again where we left? |
Yes we should. I'm on vacation right now but I will find time for it in the next weeks. |
Looking at this with fresh eyes, I am wondering if a variation on this could be a interesting option (not sure if it has already been brought up previously - this projects now spans multiple years): Extend Legacy intermediate nodes would just drop the attributable error again. Intermediate nodes that do support attr errs would process both failure fields. If such an intermediate nodes encounters a missing attr errs tlv field, it will fill that field with the wrapped legacy failure, to support partial paths. For legacy senders, the legacy failure with its single hmac will always be available. Senders supporting attr errs would process the attr err tlv field, if available. The advantage of this setup is that no signaling in the forward path is needed. Nodes could still advertise the feature in their node announcement, but perhaps even that isn't necessary. At some point, senders may want to avoid nodes not supporting attributable errors. They may be able to identify such nodes via attr errors that wrap a legacy failure, indicating that the node directly downstream doesn't support them. Perhaps it isn't even necessary to duplicate the failure message itself in the attr errs tlv field. Instead it can take it from the legacy field (minus the legacy hmac), so that the attr errs tlv field just contains the payloads and hmacs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great proposal! Quite an elegant idea
There are some elusive details that make this work, which might need some more coverage
`-` | `-` | `-` | `hmac_0'_1` | `hmac_0'_0` | `hmac_1'_0` | ||
|
||
The former `hmac_x'_y` now becomes `hmac_x+1_y`. The left-most hmac for | ||
each hop is discarded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could elaborate a bit more on why these hmacs can be discarded / considered irrelevant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a bit more explanation.
04-onion-routing.md
Outdated
and verifies the HMAC that corresponds to the hop's position in the path, using | ||
each hop's `um` key. | ||
|
||
When the origin node encounters a payload that signals that it is a final |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should also include the "unhappy" path here, focusing on:
- why origin attributes blame/error on first decoding failure (if that's still the intended way in which things work)
- how an intermediary node that maliciously flips bits cannot shake the blame off themselves
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are origin attributes?
I think I've elaborated on this more in the latest iteration of the PR.
04-onion-routing.md
Outdated
The former `hmac_x'_y` now becomes `hmac_x+1_y`. The left-most hmac for | ||
each hop is discarded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can the forwarding node always pick the correct pruning positions if it doesn't know its own position in the path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the forwarding node can pick the correct pruning positions. The block of hmacs that the forwarding node receives contains series of hmacs for all possible path lengths up to 20 hops. The forwarding node obviously knows that it is also part of the path, so there can never be an additional 20 hops (this would bring the total to 21). This also means that the last hmac for every hmac series won't ever be useful.
Will try to add some more text to explain this.
04-onion-routing.md
Outdated
|
||
At each step backwards, one hmac for every hop can be pruned. Rather than | ||
holding on to 20 * 20 = 400 hmacs, pruning reduces the total space requirement | ||
to 210 hmacs. More on pruning below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to 210 hmacs
there was the (max_hops * (max_hops + 1)) / 2
formula in a previous diff, explaining how this number was produced, seems to have been trimmed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-added explanation of 210.
04-onion-routing.md
Outdated
channel. | ||
The per-hop payload consists of the following fields: | ||
* [`byte`:`payload_source`] | ||
* [`uint32`:`hold_time_ms`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I now realize is that even if there's a node on the path that messes with the data and/or hmacs, the upstream nodes will still have valid hmacs and also valid hold_time_ms
.
Information that isn't all that useful in that case though. The sender will simply proceed with penalizing the bad node, retry, and only process timing information after "proper" failures.
If we go for the idea described in #1044 (comment), it may be unnecessary to keep that first payload byte that is either 0 or 1 to indicate intermediate or failing node. Because the 'legacy' hmac is still present, that can also be used to identify the failing node. |
d12376b
to
373b9f9
Compare
Thanks for your review @GeorgeTsagk. Comments addressed. |
PR rebased and updated to reflect the new approach with a tlv extension to |
calculated. The redundant HMACs will cover portions of the zero-initialized | ||
data. | ||
|
||
Finally a new key is generated, using the key type `ammagext`. This key is then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decided to derive a new key rather than continue pulling bits for chacha from the ammag
key, to minimize the chance of somehow reusing the stream. Alternatively could use a different nonce, but I believe that would be new in lightning?
a82caca
to
bdf9e56
Compare
bdf9e56
to
c2459c7
Compare
Updated test vectors |
Failure attribution is important to properly penalize nodes after a payment failure occurs. The goal of the penalty is to give the next attempt a better chance at succeeding. In the happy failure flow, the sender is able to determine the origin of the failure and penalizes a single node or pair of nodes.
Unfortunately it is possible for nodes on the route to hide themselves. If they return random data as the failure message, the sender won't know where the failure happened.
This PR proposes a new failure message format that lets each node commit to the failure message. If one of the nodes corrupts the failure message, the sender will be able to identify that node.
For more information, see https://lists.linuxfoundation.org/pipermail/lightning-dev/2022-October/003723.html.
LND implementation: lightningnetwork/lnd#7139
LDK implementation: lightningdevkit/rust-lightning#3611
Eclair implementation: ACINQ/eclair#2519