Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crucible-llvm: Support loading mostly symbolic strings #1061

Open
RyanGlScott opened this issue Feb 15, 2023 · 0 comments
Open

crucible-llvm: Support loading mostly symbolic strings #1061

RyanGlScott opened this issue Feb 15, 2023 · 0 comments

Comments

@RyanGlScott
Copy link
Contributor

Currently, the LLVM memory model in crucible-llvm comes with the restriction that all strings must be symbolic. Here, when I say "string", I really mean "C string", i.e., a fixed-length array of bytes that is followed by a null terminator. This restriction impacts several parts of the crucible-llvm codebase, including our printf and strlen overrides and the various parts of crucible-symio that interface with strings.

This issue tracks possible ways to this lift this restriction. The restriction manifests itself in this part of loadString:

go f p maxChars = do
v <- doLoad bak mem p (bitvectorType 1) (LLVMPointerRepr (knownNat :: NatRepr 8)) noAlignment
x <- Partial.projectLLVM_bv bak v
case BV.asUnsigned <$> asBV x of
Just 0 -> return $ f []
Just c -> do
let c' :: Word8 = toEnum $ fromInteger c
p' <- doPtrAddOffset bak mem p =<< bvLit sym PtrWidth (BV.one PtrWidth)
go (f . (c':)) p' (fmap (\n -> n - 1) maxChars)
Nothing ->
addFailedAssertion bak
$ Unsupported GHC.callStack "Symbolic value encountered when loading a string"

Here, the only place where we actually make use of the string being concrete is in the null terminator check. However, this is not an all-or-nothing proposition: it is possible for a string to contain some symbolic characters but still end with a concrete null terminator character. To help distinguish between these types of strings, I will refer to strings that entirely symbolic characters (including the null terminator) as fully symbolic strings, while I will refer to strings that some symbolic characters followed by a concrete null terminator as mostly symbolic strings.

Handling fully symbolic strings is definitely not possible for Crucible. That would be tantamount to asking Crucible to handle strings of symbolic length, which is something that Crucible is fundamentally unequipped to handle. Handling mostly symbolic strings, however, is definitely possible, as that would allow Crucible to say with certainty where strings start and end. It doesn't matter so much whether the characters preceding the null terminator are symbolic.

One proposal for handling mostly symbolic strings is to remove the Nothing case in loadString.go above, and generalize the Just c case to handle all characters that are not concretely equal to the null terminator. (This would require removing the use of bvLit.) I've successfully employed this technique on a downstream project, and it should work nicely with Crux's crucible_string function, which allocates mostly symbolic strings.

Another possibility is to back C strings with SMT strings, which are supported in recent versons of various SMT solvers. This would largely accomplish the same goals while perhaps being more efficient due to not needing to have separate memory log entries for each individual character. On the other hand, we would need to be able to convert from SMT strings to individual characters and vice versa, and what4 does not yet offer a way to do this. Given the amount of work required for this option, I'm really only mentioning it for the sake of completeness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant