-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix file-local symtab lookups #281
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! In the future, it would be helpful to file an issue first describing the (rather lengthy and nuanced) problem before filing a PR. This would be especially helpful since (AFAIU) this PR doesn't completely fix the underlying problem due to the roundtrip errors—that way, we could associate one issue with multiple PRs.
A test is included; running the
cabal run disasm-test
or simplycabal run llvm-disasm -- disasm_test/test/localstatic.bc
at the commit where this was introduced shows the error; subsequent commits fix the issue. Unfortunately, the test still fails round-trip as described in the known_bugs file added; this is for a somewhat separate issue that I would prefer to fix independently.
Yes, I would greatly appreciate if you could help with this. I strongly suspect that this is due to #260, an issue which is currently blocking support for LLVM 17 and later (see #261).
It's never too late to file an issue :-) |
An LLVM Bitcode file, it is comprised of nested Blocks of information, with
records available at each block. Different blocks hold different types of
information, and the nesting represents program scope and what is
defined/accessible at a particular point within the program.
The
Parse
andFinalize
monads maintain a set of symbol tables to use forlookups, as stored in the
Env
structure referenced by those monads:valSymtab
for value referencesfnSymtab
for function referencesbbSymTab
for basic-block references, either "named" by association witha symbol name (like the function's entry block) or "anonymous", where
there is no direct symbol but there may be a block label (like for a goto
statement or similar surface language construct.)
These lookups are necessary to "relabel" values that reference symbol or label
addresses in the code with the actual targets as resolved by processing the
entirety of the LLVM bitcode file. The llvm-pretty-bc-parser runs in two
phases: the initial phase where it processes the LLVM bitcode stream to create
"Partial" representations of all of the elements, followed by a "finalization"
phase where it performs all of the label references above (via the llvm-pretty
relabel
operation), as well as other fixups to convert the "Partial" datastructures into the structures defined by the
llvm-pretty
AST.However, the tables in the
Env
structure previously only held the directsymbol tables (i.e.
VALUE_SYMTAB_BLOCK
, found immediately within themodule's top-level
MODULE_BLOCK
. Symbol resolution was performed by callingbbEntryName
orrequireBbEntryName
. This was sufficient to resolve globalsymbol references.
When finalizing functions, the Function finalization defined an alternate
resolution type (
BlockLookup
) and function (lookupBlockName
) to use withrelabel
for handling the code within the function. ThebbEntryName
andrequireBbEntryName
did not accept a symbol (identifying the function orDefine
in LLVM IR parlance) so they could only reference the top-levelsymbol tables, whereas
lookupBlockName
was used if therelabel
indicatedthere was an associated symbol: it would find the (partial) symbol table for
the function (
Define
) associated with that symbol and perform thebbSymtab
lookup from that block.
This duplication of functionality should have been a suggestion that something
was not right, particularly when the
bbEntryName
andrequireBbEntryName
uses simply discarded any supplied symbol name. But it worked most of the
time, so this went unnoticed for quite some time (eleven years or so), but it
finally broke when the assumptions about the ability to discard symbol names
was proven to be a bad.
What was it that revealed the invalidity of this assumption?
Answer: Function-local label references.
Consider the following
In this code,
disptab
is a local static. To implement this, clang adds thedefinition of
disptab
to the top-levelVALUE_SYMTAB_BLOCK
. However, thevalues in that list are local references to
has_local_static
. Therelabel
operation will supply the
has_local_static
symbol, but VALUE_SYMTAB_BLOCKfinalization (
finalizeGlobal
) usesrequireBbEntryName
and discards thatsupplied symbol, thus failing to find the symbol in the top-level symbol block
and halting the parse with
fail
--or even worse, finding an unrelated globalbasic-block reference and using that to create invalid references. Neither of
these is any kind of good.
To fix this, this PR enhances the reader environment for Finalizeto also maintain the
Define
symtabs entries obtained during parsing, and then enhancingbbEntryName
(andrequireBbEntryName
) to lookup any symbols and performbbSymtab
resolution in the symboltable associated with that definition. This means the
relabel
-supplied symbolnames are no longer ignored, and also means that
bbEntryName
is capable ofperforming the full resolution that
lookupBlockName
was concocted to handle,so the latter is removed as obsolete. These changes allow us to successfully
parse the bitcode for the
has_local_static
function above (also added as atest in
disasm-test/tests/localstatic.c
).A test is included; running the
cabal run disasm-test
or simplycabal run llvm-disasm -- disasm_test/test/localstatic.bc
at the commit where this was introduced shows the error; subsequent commits fix the issue. Unfortunately, the test still fails round-trip as described in the known_bugs file added; this is for a somewhat separate issue that I would prefer to fix independently.All commits are atomic; this PR can be split up if desired.