Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harmless buffer overflow reads reported by valgrind and asan when validating without length set #6

Open
pwithnall opened this issue Nov 8, 2024 · 2 comments · May be fixed by #7
Open

Comments

@pwithnall
Copy link

We’ve recently switched GLib’s UTF-8 validation functions (g_utf8_validate() and g_str_is_ascii()) to use c-utf8. In doing so, we found that asan (via oss-fuzz) and valgrind report buffer overflow reads from the code. See https://gitlab.gnome.org/GNOME/glib/-/issues/3493

The buffer overflow reads are caused by the SIMD 4-byte read reading 1–3 bytes off the end of the input string buffer. They are harmless, as the 1–3 byte over-read is guaranteed to not cross a page boundary (as those are at least 4-byte aligned), and the read data is only used up to the detected nul termination byte. There are no buffer overflow writes.

These buffer overflow reads probably haven’t been detected in c-utf8 before because, as far as I can tell, its use within systemd is limited to situations where the input string length is known, so the “read until you hit the nul terminator” code path is not taken.

While these buffer overflow reads are harmless, and not a security vulnerability, they introduce noise in valgrind and asan output, potentially hiding actual vulnerabilities, and preventing those tools being used to automatically verify that code is completely free of overflows.

We’ve taken a couple of steps in GLib to try and mitigate these false positives (see the ‘related merge requests’ for the GLib issue):

  • Annotate the validation functions as __attribute__((no_sanitize_address)) to turn asan off for them
  • Add an ifunc to switch to an alternative implementation when running under valgrind, which essentially calls strlen() on the input string first, then passes it to the length-is-known codepath, avoiding the buffer overflow read

These mitigations are not perfect (ifuncs don’t appear to be well supported on musl and appear to cause crashes when statically linked), but they’re something.

I’m filing this issue to:

  • Make you aware that the same false positives from asan and valgrind will affect c-utf8 if it’s used with a string input length unset
  • Suggest you might want to take similar mitigations, or find better mitigations (we’re still deciding whether our mitigations are actually going to work reliably, or are going to cause more trouble)
@dvdhrm
Copy link
Member

dvdhrm commented Nov 15, 2024

Thanks a lot for the hint! I was not aware that c-utf8 found its way into glib, but happy to see it being used!

To be honest, I would prefer to just drop the strlen optimization and instead always call strlen() first. We don't use strings without knowing their length in any of our projects, so this should be fine.

@pwithnall
Copy link
Author

To be honest, I would prefer to just drop the strlen optimization and instead always call strlen() first. We don't use strings without knowing their length in any of our projects, so this should be fine.

Sure. As a result, I’m proposing we go with the same in GLib: https://gitlab.gnome.org/GNOME/glib/-/merge_requests/4404

I’ll submit an MR to c-utf8 shortly.

pwithnall added a commit to pwithnall/c-utf8 that referenced this issue Nov 19, 2024
There was provision in the validation loop for continuing to validate
until a nul byte was hit, but unfortunately that often results in buffer
overflow reads when used with heap allocated strings.

This is because the SIMD code reads 4 bytes at a time. If the string’s
terminal nul byte is in bytes 0–3, this results in a 3–1 byte overread.
This is detected and warned about by valgrind and asan, even though it’s
harmless because:
 - it’s only a read,
 - the control logic terminates the loop after detecting the nul byte,
   without basing any decisions on the following bytes,
 - the overread is guaranteed to not cross a page boundary (as they are 4
   byte aligned), so will not cause a page fault.

Unfortunately, there’s no easy way to tell valgrind or asan about this.
We could use the valgrind client API to tell it that the overread is
harmless, but using the API is a significant runtime overhead. Asan
doesn’t allow any warnings to be suppressed apart from telling it to not
instrument the entire function, which would mean that basically all the
code in c-utf8 would be uninstrumented. So we’d never be told about
other, true positive, errors, if they exist.

So, the least bad option seems to be to explicitly calculate the string
length with `strlen()` before validating, if the length is not
specified. Many callers into c-utf8 will already specify the length, in
which case this commit is a no-op.

See also: https://gitlab.gnome.org/GNOME/glib/-/issues/3493

Signed-off-by: Philip Withnall <pwithnall@gnome.org>

Fixes: c-util#6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

2 participants