Attributable failures (feature 36/37) #1044

joostjager · 2022-11-21T11:28:26Z

Failure attribution is important to properly penalize nodes after a payment failure occurs. The goal of the penalty is to give the next attempt a better chance at succeeding. In the happy failure flow, the sender is able to determine the origin of the failure and penalizes a single node or pair of nodes.

Unfortunately it is possible for nodes on the route to hide themselves. If they return random data as the failure message, the sender won't know where the failure happened.

This PR proposes a new failure message format that lets each node commit to the failure message. If one of the nodes corrupts the failure message, the sender will be able to identify that node.

For more information, see https://lists.linuxfoundation.org/pipermail/lightning-dev/2022-October/003723.html.

LND implementation: lightningnetwork/lnd#7139

LDK implementation: lightningdevkit/rust-lightning#3611

Eclair implementation: ACINQ/eclair#2519

04-onion-routing.md

thomash-acinq · 2022-12-06T14:53:54Z

I've started implementing it in eclair, do you have some test vectors so we can check that we are compatible?
The design seems good to me, but as I've said previously, I think keeping hop payloads and hmacs for 8 nodes only (instead of 27) is enough for almost all use cases and would give us huge size savings.

joostjager · 2022-12-06T15:41:32Z

I don't have test vectors yet, but I can produce them. Will add them to this PR when ready.

Capping the max hops at a lower number is fine with me, but do you have a scenario in mind where this would really make a difference? Or is it more generally that everything above 8 is wasteful?

joostjager · 2022-12-06T16:53:09Z

@thomash-acinq added a happy fat error test vector.

04-onion-routing.md

joostjager · 2022-12-07T09:27:19Z

09-features.md

@@ -41,6 +41,7 @@ The Context column decodes as follows:
 | 20/21 | `option_anchor_outputs`          | Anchor outputs                                            | IN       | `option_static_remotekey` | [BOLT #3](03-transactions.md)         |
 | 22/23 | `option_anchors_zero_fee_htlc_tx` | Anchor commitment type with zero fee HTLC transactions   | IN       | `option_static_remotekey` | [BOLT #3][bolt03-htlc-tx], [lightning-dev][ml-sighash-single-harmful]|
 | 26/27 | `option_shutdown_anysegwit`         | Future segwit versions allowed in `shutdown`              | IN       |                   | [BOLT #2][bolt02-shutdown]   |
+| 28/29 | `option_fat_error`               | Can generate/relay fat errors in `update_fail_htlc`       | IN       |                   | [BOLT #4][bolt04-fat-errors]   |


I think this big gap in the bits has emerged here because of tentative spec changes that may or may not make it. Not sure why that is necessary. I thought for unofficial extensions, the custom range is supposed to be used?

I can see that with unofficial features deployed in the wild, it is easier to keep the same bit when something becomes official. But not sure if that is worth creating the gap here? An alternative is to deploy unofficial features in the custom range first, and then later recognize both the official and unofficial bit. Slightly more code, but this feature list remains clean.
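The alternative described above (recognizing both an unofficial custom-range bit and the later official bit) could look roughly like the sketch below. The official bit pair 28/29 is taken from the diff; `customBit`, the map-based feature set, and all function names are hypothetical and chosen only for illustration.

```go
package main

import "fmt"

const (
	officialBit = 28   // even bit of the 28/29 pair proposed in the diff
	customBit   = 1028 // hypothetical custom-range bit, for illustration only
)

// hasFeature reports whether either bit of an even/odd feature pair is set.
func hasFeature(features map[int]bool, evenBit int) bool {
	return features[evenBit] || features[evenBit+1]
}

// supportsFatErrors accepts both the official and the unofficial signaling.
func supportsFatErrors(features map[int]bool) bool {
	return hasFeature(features, officialBit) || hasFeature(features, customBit)
}

func main() {
	earlyAdopter := map[int]bool{customBit + 1: true} // only the custom bit
	upgraded := map[int]bool{officialBit + 1: true}   // only the official bit
	legacy := map[int]bool{}

	fmt.Println(supportsFatErrors(earlyAdopter))
	fmt.Println(supportsFatErrors(upgraded))
	fmt.Println(supportsFatErrors(legacy))
}
```

The cost is one extra check per feature lookup, in exchange for a gap-free official feature table.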

joostjager · 2022-12-07T09:27:48Z

Added fat error signaling to the PR.

04-onion-routing.md

thomash-acinq · 2022-12-09T17:12:30Z

I've spent a lot of time trying to make the test vector pass and I've finally found what was wrong:
In the spec you write that the hmac covers:

* failure_len, failuremsg, pad_len and pad.
* The first y+1 payloads in payloads. For example, hmac_0_2 would cover all three payloads.
* y downstream hmacs that correspond to downstream node positions relative to x. For example, hmac_0_2 would cover hmac_1_1 and hmac_2_0.

implying that we need to concatenate them in that order. But in your code you follow a different order:

// Include payloads including our own.
_, _ = hash.Write(payloads[:(NumMaxHops-position)*payloadLen])

// Include downstream hmacs.
var hmacsIdx = position + NumMaxHops
for j := 0; j < NumMaxHops-position-1; j++ {
	_, _ = hash.Write(
		hmacs[hmacsIdx*sha256.Size : (hmacsIdx+1)*sha256.Size],
	)

	hmacsIdx += NumMaxHops - j - 1
}

// Include message.
_, _ = hash.Write(message)

I think the order message + hop payloads + hmacs is more intuitive as it matches the order of the fields in the packet.

joostjager · 2022-12-09T17:16:40Z

Oh great catch! Will produce a new vector.

joostjager · 2022-12-12T07:34:33Z

@thomash-acinq updated vector

04-onion-routing.md

joostjager · 2023-01-23T15:38:27Z

Updated LND implementation with sender-picked fat error structure parameters: lightningnetwork/lnd#7139

04-onion-routing.md

joostjager · 2025-02-13T10:34:39Z

@thomash-acinq, shall we pick up attributable errors again where we left off?

thomash-acinq · 2025-02-17T13:00:29Z

@thomash-acinq, shall we pick up attributable errors again where we left off?

Yes, we should. I'm on vacation right now but I will find time for it in the coming weeks.

joostjager · 2025-02-24T10:11:52Z

We could also consider an alternative system where the use-attributable-errors bit is in the update_add_htlc message instead of being inside the onion. That way we always have access to it even if the onion is invalid (or if we decide to reject the HTLC before even reading the onion, for instance if the CLTV is too low). And by adding a new error message to wrap a legacy error, we could even get partial attributability if the first nodes of the route support attributable errors but not the last ones.

Looking at this with fresh eyes, I am wondering if a variation on this could be an interesting option (not sure if it has already been brought up previously; this project now spans multiple years):

Extend update_fail_htlc with a tlv field at the end that holds the attributable error. A failure source that supports it would populate both the legacy failure field and the attr error tlv field.

Legacy intermediate nodes would just drop the attributable error again. Intermediate nodes that do support attr errs would process both failure fields. If such an intermediate node encounters a missing attr errs tlv field, it will fill that field with the wrapped legacy failure, to support partial paths.

For legacy senders, the legacy failure with its single hmac will always be available. Senders supporting attr errs would process the attr err tlv field, if available.

The advantage of this setup is that no signaling in the forward path is needed. Nodes could still advertise the feature in their node announcement, but perhaps even that isn't necessary. At some point, senders may want to avoid nodes not supporting attributable errors. They may be able to identify such nodes via attr errors that wrap a legacy failure, indicating that the node directly downstream doesn't support them.

Perhaps it isn't even necessary to duplicate the failure message itself in the attr errs tlv field. Instead it can take it from the legacy field (minus the legacy hmac), so that the attr errs tlv field just contains the payloads and hmacs.
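A toy model of the relay behavior described above might look like this. It deliberately omits all cryptography: `updateFailHTLC`, `attrData`, `relay`, and the `hops` counter are hypothetical stand-ins, and the counter only tracks how many supporting hops have contributed attribution data.

```go
package main

import "fmt"

// attrData stands in for the attributable-failure TLV contents (payloads
// and hmacs). Here it only counts the hops that have contributed.
type attrData struct {
	hops int
}

// updateFailHTLC models the message with its legacy field and optional TLV.
type updateFailHTLC struct {
	reason []byte    // legacy failure field, always present
	attr   *attrData // optional TLV; nil when absent
}

// relay models one hop passing the failure back towards the sender.
func relay(in updateFailHTLC, supportsAttr bool) updateFailHTLC {
	out := updateFailHTLC{reason: in.reason} // legacy processing omitted
	if !supportsAttr {
		// A legacy node does not know the TLV and drops it.
		return out
	}
	hops := 0
	if in.attr != nil {
		hops = in.attr.hops
	}
	// A supporting node starts fresh attribution data if the field is
	// missing, then adds its own contribution.
	out.attr = &attrData{hops: hops + 1}
	return out
}

func main() {
	// The failure source is a legacy node: no attribution TLV.
	msg := updateFailHTLC{reason: []byte("legacy failure")}
	msg = relay(msg, true) // first supporting hop fills the field
	msg = relay(msg, true) // second supporting hop extends it
	fmt.Println(msg.attr != nil, msg.attr.hops)

	msg = relay(msg, false) // a legacy hop drops the TLV again
	fmt.Println(msg.attr != nil)
}
```

In this model the sender can tell how far attribution reaches back along the route, which is the partial-path property the proposal is after.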

GeorgeTsagk

Great proposal! Quite an elegant idea
There are some elusive details that make this work, which might need some more coverage

GeorgeTsagk · 2025-02-24T11:32:36Z

04-onion-routing.md

+  `-` | `-` | `-` | `hmac_0'_1` | `hmac_0'_0` | `hmac_1'_0`
+
+  The former `hmac_x'_y` now becomes `hmac_x+1_y`. The left-most hmac for
+  each hop is discarded.


Could you elaborate a bit more on why these hmacs can be discarded or considered irrelevant?

Added a bit more explanation.

GeorgeTsagk · 2025-02-24T11:34:35Z

04-onion-routing.md

+and verifies the HMAC that corresponds to the hop's position in the path, using
+each hop's `um` key.
+
+When the origin node encounters a payload that signals that it is a final


should also include the "unhappy" path here, focusing on:

why origin attributes blame/error on first decoding failure (if that's still the intended way in which things work)

how an intermediary node that maliciously flips bits cannot shake the blame off themselves

What are origin attributes?

I think I've elaborated on this more in the latest iteration of the PR.

GeorgeTsagk · 2025-02-24T11:39:20Z

04-onion-routing.md

+  The former `hmac_x'_y` now becomes `hmac_x+1_y`. The left-most hmac for
+  each hop is discarded.


can the forwarding node always pick the correct pruning positions if it doesn't know its own position in the path?

Yes, the forwarding node can pick the correct pruning positions. The block of hmacs that the forwarding node receives contains series of hmacs for all possible path lengths up to 20 hops. The forwarding node obviously knows that it is also part of the path, so there can never be an additional 20 hops (this would bring the total to 21). This also means that the last hmac for every hmac series won't ever be useful.

Will try to add some more text to explain this.

GeorgeTsagk · 2025-02-24T11:46:38Z

04-onion-routing.md

+
+At each step backwards, one hmac for every hop can be pruned. Rather than
+holding on to 20 * 20 = 400 hmacs, pruning reduces the total space requirement
+to 210 hmacs. More on pruning below.


to 210 hmacs

There was a (max_hops * (max_hops + 1)) / 2 formula in a previous diff explaining how this number was produced; it seems to have been trimmed.

Re-added explanation of 210.
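As a quick check of the arithmetic, the sketch below compares the unpruned count with the (max_hops * (max_hops + 1)) / 2 formula. It assumes the series of hmacs shrink by one per hop (20, 19, ..., 1); the function name is made up for this example.

```go
package main

import "fmt"

// prunedHmacCount returns maxHops*(maxHops+1)/2, the number of hmacs
// left after pruning.
func prunedHmacCount(maxHops int) int {
	return maxHops * (maxHops + 1) / 2
}

func main() {
	const maxHops = 20

	// Sum the shrinking series explicitly: 20 + 19 + ... + 1.
	sum := 0
	for perHop := maxHops; perHop >= 1; perHop-- {
		sum += perHop
	}

	fmt.Println(maxHops*maxHops, sum, prunedHmacCount(maxHops)) // 400 210 210
}
```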

joostjager · 2025-02-24T13:18:40Z

04-onion-routing.md

-channel.
+The per-hop payload consists of the following fields:
+   * [`byte`:`payload_source`]
+   * [`uint32`:`hold_time_ms`]


One thing I now realize is that even if there's a node on the path that messes with the data and/or hmacs, the upstream nodes will still have valid hmacs and also valid hold_time_ms.

Information that isn't all that useful in that case though. The sender will simply proceed with penalizing the bad node, retry, and only process timing information after "proper" failures.
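For reference, the two payload fields from the diff could be serialized as in the sketch below. Big-endian byte order is an assumption here, based on the general BOLT convention for integer fields; the type and function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// payloadLen is 1 byte for payload_source plus 4 bytes for hold_time_ms.
const payloadLen = 5

// hopPayload mirrors the two fields listed in the diff.
type hopPayload struct {
	PayloadSource byte
	HoldTimeMs    uint32
}

// encode serializes the payload, assuming big-endian integers.
func (p hopPayload) encode() []byte {
	buf := make([]byte, payloadLen)
	buf[0] = p.PayloadSource
	binary.BigEndian.PutUint32(buf[1:], p.HoldTimeMs)
	return buf
}

// decodeHopPayload parses exactly payloadLen bytes.
func decodeHopPayload(b []byte) (hopPayload, error) {
	if len(b) != payloadLen {
		return hopPayload{}, errors.New("unexpected payload length")
	}
	return hopPayload{
		PayloadSource: b[0],
		HoldTimeMs:    binary.BigEndian.Uint32(b[1:]),
	}, nil
}

func main() {
	p := hopPayload{PayloadSource: 1, HoldTimeMs: 250}
	raw := p.encode()
	fmt.Printf("%x\n", raw) // 01000000fa

	back, err := decodeHopPayload(raw)
	fmt.Println(back, err)
}
```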

04-onion-routing.md

joostjager · 2025-02-25T12:51:33Z

If we go for the idea described in #1044 (comment), it may be unnecessary to keep that first payload byte that is either 0 or 1 to indicate intermediate or failing node. Because the 'legacy' hmac is still present, that can also be used to identify the failing node.

joostjager · 2025-02-27T12:32:35Z

Thanks for your review @GeorgeTsagk. Comments addressed.

joostjager · 2025-02-27T12:33:27Z

PR rebased and updated to reflect the new approach with a tlv extension to update_fail_htlc.

joostjager · 2025-02-27T12:45:01Z

04-onion-routing.md

+calculated. The redundant HMACs will cover portions of the zero-initialized
+data.
+
+Finally a new key is generated, using the key type `ammagext`. This key is then


Decided to derive a new key rather than continue pulling bits for chacha from the ammag key, to minimize the chance of somehow reusing the stream. Alternatively could use a different nonce, but I believe that would be new in lightning?

joostjager · 2025-02-28T14:53:42Z

Updated test vectors

joostjager mentioned this pull request Nov 21, 2022

Lightning Specification Meeting 2022/11/21 #1041

Closed

28 tasks

tnull reviewed Nov 21, 2022

joostjager mentioned this pull request Nov 22, 2022

draft: Staking Credentials token issuance/redemption #1043

Draft

t-bast mentioned this pull request Dec 5, 2022

Lightning Specification Meeting 2022/12/05 #1046

Closed

28 tasks

joostjager mentioned this pull request Dec 6, 2022

Attributable errors lightningnetwork/lightning-onion#60

Open

thomash-acinq mentioned this pull request Dec 6, 2022

Attributable errors ACINQ/eclair#2519

Draft

joostjager force-pushed the fat-errors branch 2 times, most recently from 4b48481 to 24b10d5 Compare December 6, 2022 16:52

joostjager force-pushed the fat-errors branch from 24b10d5 to 76dbf21 Compare December 7, 2022 09:14

joostjager force-pushed the fat-errors branch from 76dbf21 to 2de919a Compare December 7, 2022 14:53

joostjager mentioned this pull request Dec 7, 2022

htlcswitch: attributable errors lightningnetwork/lnd#7139

Open

joostjager force-pushed the fat-errors branch from 2de919a to bcf022b Compare December 12, 2022 07:34

t-bast mentioned this pull request Dec 15, 2022

Lightning Specification Meeting 2022/12/19 #1047

Closed

27 tasks

t-bast mentioned this pull request Dec 30, 2022

Lightning Specification Meeting 2023/01/02 #1048

Closed

27 tasks

t-bast mentioned this pull request Jan 11, 2023

Lightning Specification Meeting 2023/01/16 #1050

Closed

28 tasks

joostjager force-pushed the fat-errors branch from bcf022b to 6bf0729 Compare January 24, 2023 13:55

t-bast mentioned this pull request Jul 12, 2024

Lightning Specification Meeting 2024/07/15 #1183

Closed

22 tasks

t-bast mentioned this pull request Jul 23, 2024

Lightning Specification Meeting 2024/07/29 #1185

Closed

23 tasks

t-bast mentioned this pull request Aug 9, 2024

Lightning Specification Meeting 2024/08/12 #1187

Closed

21 tasks

t-bast mentioned this pull request Aug 23, 2024

Lightning Specification Meeting 2024/08/26 #1191

Closed

20 tasks

t-bast mentioned this pull request Sep 6, 2024

Lightning Specification Meeting 2024/09/09 #1195

Closed

20 tasks

t-bast mentioned this pull request Sep 18, 2024

Can't identify erring node when failure_message is unparseable #332

Closed

t-bast mentioned this pull request Oct 16, 2024

Lightning Specification Meeting 2024/11/04 #1206

Closed

20 tasks

t-bast mentioned this pull request Nov 22, 2024

Lightning Specification Meeting 2024/12/02 #1210

Closed

19 tasks

t-bast mentioned this pull request Dec 9, 2024

Lightning Specification Meeting 2024/12/16 #1213

Closed

19 tasks

t-bast mentioned this pull request Jan 6, 2025

Lightning Specification Meeting 2025/01/13 #1216

Closed

19 tasks

t-bast mentioned this pull request Jan 22, 2025

Lightning Specification Meeting 2025/01/27 #1221

Closed

23 tasks

t-bast mentioned this pull request Feb 4, 2025

Lightning Specification Meeting 2025/02/10 #1224

Closed

21 tasks

t-bast mentioned this pull request Feb 19, 2025

Lightning Specification Meeting 2025/02/24 #1229

Open

18 tasks

GeorgeTsagk reviewed Feb 24, 2025

joostjager force-pushed the fat-errors branch from d12376b to 373b9f9 Compare February 27, 2025 12:29

joostjager changed the title ~~Attributable errors (feature 36/37)~~ Attributable failures (feature 36/37) Feb 27, 2025

joostjager force-pushed the fat-errors branch 2 times, most recently from a82caca to bdf9e56 Compare February 28, 2025 14:45

Attributable failures

c2459c7

joostjager force-pushed the fat-errors branch from bdf9e56 to c2459c7 Compare February 28, 2025 14:49
