Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace uint16_t* with uint8_t* in in_cksum #370

Closed
wants to merge 1 commit into from

Conversation

acst1223
Copy link
Contributor

Currently if we compile dhcpcd with -Os, we will get a checksum error upon receiving packets from the DHCP server. This prevents the client from binding a DHCP lease.

This checksum error comes from the checksums_valid() function volating the strict aliasing. In this function, pip is a union representing the pseudo IP header. It consists of ip of struct ip and w of uint16_t[]. ip is assigned with the IP header data, but w is actually used to calculate the checksum. As struct ip and uint16_t[] are incompatible types, according to the strict aliasing, the compiler thinks the assignment to ip is unused, so it simply ignores the assignment. This leads to the incorrect checksum calculation.

According to the strict aliasing rule, a char* alias with any object pointer, so we can choose to use uint8_t* (unsigned char*) to calculate the checksum. We can replace uint16_t* with uint8_t* in in_cksum, and also remove the usage of w in checksums_valid().

src/dhcp.c Outdated Show resolved Hide resolved
@rsmarples
Copy link
Member

What compiler, kernel and CPU are you seeing this on?

@acst1223
Copy link
Contributor Author

I build dhcpcd on Linux and deploy it to Chrome OS. I've observed this checksum error on both x86_64 and aarch64 CPUs. The compilers I use are x86_64-cros-linux-gnu-clang and aarch64-cros-linux-gnu-clang, respectively.

@acst1223 acst1223 force-pushed the cksum branch 2 times, most recently from ec4c86e to 6f98368 Compare September 27, 2024 02:12
src/dhcp.c Outdated Show resolved Hide resolved
@rsmarples
Copy link
Member

rsmarples commented Sep 27, 2024

I build dhcpcd on Linux and deploy it to Chrome OS. I've observed this checksum error on both x86_64 and aarch64 CPUs. The compilers I use are x86_64-cros-linux-gnu-clang and aarch64-cros-linux-gnu-clang, respectively.

Thanks, I have been able to reproduce this now and your change does indeed fix the issue.
Interestingly enough I was only able to reproduce this using clang-15 on debian-12 - all other OSes I have to hand I could not replicate this on using either gcc or clang.

Currently if we compile dhcpcd with -Os, we will get a checksum
error upon receiving packets from the DHCP server. This
prevents the client from binding a DHCP lease.

This checksum error comes from the checksums_valid() function volating
the strict aliasing. In this function, |pip| is a union representing
the pseudo IP header. It consists of |ip| of struct ip and |w| of
uint16_t[]. |ip| is assigned with the IP header data, but |w| is
actually used to calculate the checksum. As struct ip and uint16_t[] are
incompatible types, according to the strict aliasing, the compiler
thinks the assignment to |ip| is unused, so it simply ignores the
assignment. This leads to the incorrect checksum calculation.

According to the strict aliasing rule, a char* alias with any object
pointer, so we can choose to use uint8_t* (unsigned char*) to calculate
the checksum. We replace uint16_t* with uint8_t* in in_cksum, and also
remove the usage of |w| in checksums_valid().
@acst1223
Copy link
Contributor Author

Thanks, I have been able to reproduce this now and your change does indeed fix the issue. Interestingly enough I was only able to reproduce this using clang-15 on debian-12 - all other OSes I have to hand I could not replicate this on using either gcc or clang.

That's indeed very interesting. Another interesting thing is that this can't be reproduced with -O2 or -O3. The inline introduced by -Os is necessary. Anyway aliasing with uint16_t* here is not safe even with the union.

@rsmarples
Copy link
Member

@acst1223 Thanks for the excellent analysis and patch for the problem.

I'm going to go out on a limb and say this looks like a compiler issue as the code compiles with zero warnings and by simply passing &pip to hwaddr_ntoa and dumping the result to stdout fixes the issue. We didn't modify the content at all, just iterated over it. That is pretty odd eh?

Looking at other sources for reference, no-one uses htons in the main loop and the casting that requires or does the csum byte by byte - it's always uint16_t.
This results in more effort on the CPU and a larger binary. Although we are hardly talking large amounts here.

As you say that aliasing is the root cause. We have the union there just so we can reference the structure as uint16_t.
So can we solve this another way?

According to the strict aliasing rule, a char* alias with any object pointer, so we can choose to use uint8_t* (unsigned char*) to calculate the checksum.

Luckily for us, the reverse is also true!
We can create a backing buffer of uint8_t on the stack to hold the struct and then reference it via a struct ip pointer which we can then pass to in_cksum after populating the ip fields.

Please review 8a840af and let me know your thoughts.
I've added you as co-author as you analysed the root cause and provided a good fix.

@acst1223
Copy link
Contributor Author

I've tried your patch on both x86_64 and aarch64 Chrome OS but unfortunately both give me a checksum error again.

Regarding the aliasing rule, casting any other type pointer to char* is valid but not the opposite. I read this doc and in the Casting to char* section it writes

Casting a char* to a pointer of any type other than a char* and dereferencing it is usually in violation of the strict aliasing rule.

This can explain why your latest patch sometimes won't work.

I agree with you that using bytes to calculate cksum is indeed inefficient and sounds weird. I've also been struggling on this, since I cannot think of a way to calculate it in words without breaking the (so strict) aliasing rule. The patch initially in the thread is the best I could think of so far.

@rsmarples
Copy link
Member

OK, I've redone the patch in commit f1ff96f.
This now uses a custom psudeo header struct especially for ip header checksum calculations as I think the underlying issue might really be using the real ip header in in_cksum.
This also has the advantage of being a smaller header so even less work to do.

Let me know!

@acst1223
Copy link
Contributor Author

acst1223 commented Oct 1, 2024

I tried it but the checksum error remained ... All fields in ip_pseudo except for ipp_len are not compatible with uint16_t.

What about making all fields in the pseudo header uint16_t (or just building an uint16_t array) to make in_cksum work?

@rsmarples
Copy link
Member

I think building the uint16_t array is the better approach.
For some reason I don't understand, we need to memcpy into it rather than directly write into it via a pointer to the buffer which means holding it twice on stack. I don't mind that so much as the pseudo header we need is fairly small.

I also added some commentary around checksum results to make things more clear for future readers.

Please test 9f8ae57

@acst1223
Copy link
Contributor Author

acst1223 commented Oct 2, 2024

Thanks so much for your effort! The latest patch works fine!

I think the reason why direct writing via a pointer doesn't work is the still the aliasing (casting uint16_t* to some other types). The overhead with memcpy is small so I vote for it, too.

Thanks again for your work! The checksum issue is finally settled.

@rsmarples
Copy link
Member

rsmarples commented Oct 2, 2024

That's great news!
I just pushed hopefully a final patch which changes the buffer to uint8_t. uint16_t buffers just bug me and I couldn't think of a reason why this shouldn't work.

So please give 175fc13 a test, let me know and I'll push to master.

@rsmarples
Copy link
Member

a7e642a just has an updated comment about the buffer now being uint8_t.

@acst1223
Copy link
Contributor Author

acst1223 commented Oct 2, 2024

a7e642a also works and we can proceed with it.

uint16_t buffers just bug me and I couldn't think of a reason why this shouldn't work.

Just curious, what do you mean by "uint16_t buffers just bug you"?

@rsmarples
Copy link
Member

a7e642a also works and we can proceed with it.

Excellent, thanks!

uint16_t buffers just bug me and I couldn't think of a reason why this shouldn't work.

Just curious, what do you mean by "uint16_t buffers just bug you"?

Every other buffer in dhcpcd or most code I look at is uint8_t. Anything else is an array.
In my head I see that a buffer just stores opaque data to be sent or received over a wire (physical or virtual).
Where-as an array is just what an array normally is ... a list of defined objects of equal size and shape.

Thanks for helping me test this! Whilst I could replicate your initial failure, I was surprised my fixes still failed for you as they then worked for me. I used valgrind to test with for the first time yesterday (or for day before, I forget) and surprisingly picked up un-initialised memory errors in the in_cksum loop only when the pseudo header was involved. Which is complete nonsense as the memory has to be initialised - I even memset it all to zero at one point just to say "boo, i can do more to initialise" but no, valgrind still complained. memcpy was the only escape for it I could find, so glad it all works now. And has resulted in smaller and hopefully better code.

@acst1223
Copy link
Contributor Author

acst1223 commented Oct 2, 2024

This bug's very strange indeed. Thanks to memcpy anyway!

This cksum bug was a really long struggle. I'm so happy to see we can finally fix this now!

@rsmarples
Copy link
Member

Merged to master as dcef5d9

Thanks again for your efforts.

@rsmarples rsmarples closed this Oct 2, 2024
@acst1223 acst1223 deleted the cksum branch October 3, 2024 05:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants