Skip to content

Commit

Permalink
[clang] Implement __attribute__((format_matches)) (llvm#116708)
Browse files Browse the repository at this point in the history
This implements ``__attribute__((format_matches))``, as described in the
RFC:
https://discourse.llvm.org/t/rfc-format-attribute-attribute-format-like/83076

The ``format`` attribute only allows the compiler to check that a format
string matches its arguments. If the format string is passed
independently of its arguments, there is no way to have the compiler
check it. ``format_matches(flavor, fmtidx, example)`` allows the
compiler to check format strings against the ``example`` format string
instead of against format arguments. See the changes to AttrDocs.td in
this diff for more information.

Implementation-wise, this change subclasses CheckPrintfHandler and
CheckScanfHandler to allow them to collect specifiers into arrays, and
implements comparing that two specifiers are equivalent.
`checkFormatStringExpr` gets a new `ReferenceFormatString` argument that
is piped down when calling a function with the `format_matches`
attribute (and is `nullptr` otherwise); this is the string that the
actual format string is compared against.

Although this change does not enable -Wformat-nonliteral by default,
IMO, all the pieces are now in place such that it could be.
  • Loading branch information
apple-fcloutier authored Feb 25, 2025
1 parent 253e116 commit c710118
Show file tree
Hide file tree
Showing 14 changed files with 1,411 additions and 180 deletions.
44 changes: 44 additions & 0 deletions clang/docs/ReleaseNotes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,50 @@ Attribute Changes in Clang
This forces the global to be considered small or large in regards to the
x86-64 code model, regardless of the code model specified for the compilation.

- There is a new ``format_matches`` attribute to complement the existing
``format`` attribute. ``format_matches`` allows the compiler to verify that
a format string argument is equivalent to a reference format string: it is
useful when a function accepts a format string without its accompanying
arguments to format. For instance:

.. code-block:: c
static int status_code;
static const char *status_string;
void print_status(const char *fmt) {
fprintf(stderr, fmt, status_code, status_string);
// ^ warning: format string is not a string literal [-Wformat-nonliteral]
}
void stuff(void) {
print_status("%s (%#08x)\n");
// order of %s and %x is swapped but there is no diagnostic
}
Before the introducion of ``format_matches``, this code cannot be verified
at compile-time. ``format_matches`` plugs that hole:

.. code-block:: c
__attribute__((format_matches(printf, 1, "%x %s")))
void print_status(const char *fmt) {
fprintf(stderr, fmt, status_code, status_string);
// ^ `fmt` verified as if it was "%x %s" here; no longer triggers
// -Wformat-nonliteral, would warn if arguments did not match "%x %s"
}
void stuff(void) {
print_status("%s (%#08x)\n");
// warning: format specifier 's' is incompatible with 'x'
// warning: format specifier 'x' is incompatible with 's'
}
Like with ``format``, the first argument is the format string flavor and the
second argument is the index of the format string parameter.
``format_matches`` accepts an example valid format string as its third
argument. For more information, see the Clang attributes documentation.

Improvements to Clang's diagnostics
-----------------------------------

Expand Down
3 changes: 2 additions & 1 deletion clang/include/clang/AST/FormatString.h
Original file line number Diff line number Diff line change
Expand Up @@ -292,7 +292,7 @@ class ArgType {
};

private:
const Kind K;
Kind K;
QualType T;
const char *Name = nullptr;
bool Ptr = false;
Expand Down Expand Up @@ -338,6 +338,7 @@ class ArgType {
}

MatchKind matchesType(ASTContext &C, QualType argTy) const;
MatchKind matchesArgType(ASTContext &C, const ArgType &other) const;

QualType getRepresentativeType(ASTContext &C) const;

Expand Down
10 changes: 10 additions & 0 deletions clang/include/clang/Basic/Attr.td
Original file line number Diff line number Diff line change
Expand Up @@ -1834,6 +1834,16 @@ def Format : InheritableAttr {
let Documentation = [FormatDocs];
}

def FormatMatches : InheritableAttr {
let Spellings = [GCC<"format_matches">];
let Args = [IdentifierArgument<"Type">, IntArgument<"FormatIdx">, ExprArgument<"ExpectedFormat">];
let AdditionalMembers = [{
StringLiteral *getFormatString() const;
}];
let Subjects = SubjectList<[ObjCMethod, Block, HasFunctionProto]>;
let Documentation = [FormatMatchesDocs];
}

def FormatArg : InheritableAttr {
let Spellings = [GCC<"format_arg">];
let Args = [ParamIdxArgument<"FormatIdx">];
Expand Down
126 changes: 126 additions & 0 deletions clang/include/clang/Basic/AttrDocs.td
Original file line number Diff line number Diff line change
Expand Up @@ -3886,6 +3886,132 @@ behavior of the program is undefined.
}];
}

def FormatMatchesDocs : Documentation {
let Category = DocCatFunction;
let Content = [{

The ``format`` attribute is the basis for the enforcement of diagnostics in the
``-Wformat`` family, but it only handles the case where the format string is
passed along with the arguments it is going to format. It cannot handle the case
where the format string and the format arguments are passed separately from each
other. For instance:

.. code-block:: c

static const char *first_name;
static double todays_temperature;
static int wind_speed;

void say_hi(const char *fmt) {
printf(fmt, first_name, todays_temperature);
// ^ warning: format string is not a string literal
printf(fmt, first_name, wind_speed);
// ^ warning: format string is not a string literal
}

int main() {
say_hi("hello %s, it is %g degrees outside");
say_hi("hello %s, it is %d degrees outside!");
// ^ no diagnostic, but %d cannot format doubles
}

In this example, ``fmt`` is expected to format a ``const char *`` and a
``double``, but these values are not passed to ``say_hi``. Without the
``format`` attribute (which cannot apply in this case), the -Wformat-nonliteral
diagnostic unnecessarily triggers in the body of ``say_hi``, and incorrect
``say_hi`` call sites do not trigger a diagnostic.

To complement the ``format`` attribute, Clang also defines the
``format_matches`` attribute. Its syntax is similar to the ``format``
attribute's, but instead of taking the index of the first formatted value
argument, it takes a C string literal with the expected specifiers:

.. code-block:: c

static const char *first_name;
static double todays_temperature;
static int wind_speed;

__attribute__((__format_matches__(printf, 1, "%s %g")))
void say_hi(const char *fmt) {
printf(fmt, first_name, todays_temperature); // no dignostic
printf(fmt, first_name, wind_speed); // warning: format specifies type 'int' but the argument has type 'double'
}

int main() {
say_hi("hello %s, it is %g degrees outside");
say_hi("it is %g degrees outside, have a good day %s!");
// warning: format specifies 'double' where 'const char *' is required
// warning: format specifies 'const char *' where 'double' is required
}

The third argument to ``format_matches`` is expected to evaluate to a **C string
literal** even when the format string would normally be a different type for the
given flavor, like a ``CFStringRef`` or a ``NSString *``.

The only requirement on the format string literal is that it has specifiers
that are compatible with the arguments that will be used. It can contain
arbitrary non-format characters. For instance, for the purposes of compile-time
validation, ``"%s scored %g%% on her test"`` and ``"%s%g"`` are interchangeable
as the format string argument. As a means of self-documentation, users may
prefer the former when it provides a useful example of an expected format
string.

In the implementation of a function with the ``format_matches`` attribute,
format verification works as if the format string was identical to the one
specified in the attribute.

.. code-block:: c

__attribute__((__format_matches__(printf, 1, "%s %g")))
void say_hi(const char *fmt) {
printf(fmt, "person", 546);
// ^ warning: format specifies type 'double' but the
// argument has type 'int'
// note: format string is defined here:
// __attribute__((__format_matches__(printf, 1, "%s %g")))
// ^~
}


At the call sites of functions with the ``format_matches`` attribute, format
verification instead compares the two format strings to evaluate their
equivalence. Each format flavor defines equivalence between format specifiers.
Generally speaking, two specifiers are equivalent if they format the same type.
For instance, in the ``printf`` flavor, ``%2i`` and ``%-0.5d`` are compatible.
When ``-Wformat-signedness`` is disabled, ``%d`` and ``%u`` are compatible. For
a negative example, ``%ld`` is incompatible with ``%d``.

Do note the following un-obvious cases:

* Passing ``NULL`` as the format string does not trigger format diagnostics.
* When the format string is not NULL, it cannot _miss_ specifiers, even in
trailing positions. For instance, ``%d`` is not accepted when the required
format is ``%d %d %d``.
* While checks for the ``format`` attribute tolerate sone size mismatches
that standard argument promotion renders immaterial (such as formatting an
``int`` with ``%hhd``, which specifies a ``char``-sized integer), checks for
``format_matches`` require specified argument sizes to match exactly.
* Format strings expecting a variable modifier (such as ``%*s``) are
incompatible with format strings that would itemize the variable modifiers
(such as ``%i %s``), even if the two specify ABI-compatible argument lists.
* All pointer specifiers, modifiers aside, are mutually incompatible. For
instance, ``%s`` is not compatible with ``%p``, and ``%p`` is not compatible
with ``%n``, and ``%hhn`` is incompatible with ``%s``, even if the pointers
are ABI-compatible or identical on the selected platform. However, ``%0.5s``
is compatible with ``%s``, since the difference only exists in modifier flags.
This is not overridable with ``-Wformat-pedantic`` or its inverse, which
control similar behavior in ``-Wformat``.

At this time, clang implements ``format_matches`` only for format types in the
``printf`` family. This includes variants such as Apple's NSString format and
the FreeBSD ``kprintf``, but excludes ``scanf``. Using a known but unsupported
format silently fails in order to be compatible with other implementations that
would support these formats.

}];
}

def FlagEnumDocs : Documentation {
let Category = DocCatDecl;
let Content = [{
Expand Down
24 changes: 24 additions & 0 deletions clang/include/clang/Basic/DiagnosticSemaKinds.td
Original file line number Diff line number Diff line change
Expand Up @@ -7764,6 +7764,7 @@ def warn_format_nonliteral_noargs : Warning<
def warn_format_nonliteral : Warning<
"format string is not a string literal">,
InGroup<FormatNonLiteral>, DefaultIgnore;
def err_format_nonliteral : Error<"format string is not a string literal">;

def err_unexpected_interface : Error<
"unexpected interface name %0: expected expression">;
Expand Down Expand Up @@ -10066,6 +10067,8 @@ def note_previous_declaration_as : Note<

def warn_printf_insufficient_data_args : Warning<
"more '%%' conversions than data arguments">, InGroup<FormatInsufficientArgs>;
def warn_format_cmp_specifier_arity : Warning<
"%select{fewer|more}0 specifiers in format string than expected">, InGroup<FormatInsufficientArgs>;
def warn_printf_data_arg_not_used : Warning<
"data argument not used by format string">, InGroup<FormatExtraArgs>;
def warn_format_invalid_conversion : Warning<
Expand Down Expand Up @@ -10183,6 +10186,27 @@ def note_format_fix_specifier : Note<"did you mean to use '%0'?">;
def note_printf_c_str: Note<"did you mean to call the %0 method?">;
def note_format_security_fixit: Note<
"treat the string as an argument to avoid this">;
def warn_format_string_type_incompatible : Warning<
"passing '%0' format string where '%1' format string is expected">,
InGroup<Format>;
def warn_format_cmp_role_mismatch : Warning<
"format argument is %select{a value|an indirect field width|an indirect "
"precision|an auxiliary value}0, but it should be %select{a value|an indirect "
"field width|an indirect precision|an auxiliary value}1">, InGroup<Format>;
def warn_format_cmp_modifierfor_mismatch : Warning<
"format argument modifies specifier at position %0, but it should modify "
"specifier at position %1">, InGroup<Format>;
def warn_format_cmp_sensitivity_mismatch : Warning<
"argument sensitivity is %select{unspecified|private|public|sensitive}0, but "
"it should be %select{unspecified|private|public|sensitive}1">, InGroup<Format>;
def warn_format_cmp_specifier_mismatch : Warning<
"format specifier '%0' is incompatible with '%1'">, InGroup<Format>;
def warn_format_cmp_specifier_sign_mismatch : Warning<
"signedness of format specifier '%0' is incompatible with '%1'">, InGroup<Format>;
def warn_format_cmp_specifier_mismatch_pedantic : Extension<
warn_format_cmp_specifier_sign_mismatch.Summary>, InGroup<FormatPedantic>;
def note_format_cmp_with : Note<
"comparing with this %select{specifier|format string}0">;

def warn_null_arg : Warning<
"null passed to a callee that requires a non-null argument">,
Expand Down
57 changes: 47 additions & 10 deletions clang/include/clang/Sema/Sema.h
Original file line number Diff line number Diff line change
Expand Up @@ -2177,6 +2177,7 @@ class Sema final : public SemaBase {
FAPK_Fixed, // values to format are fixed (no C-style variadic arguments)
FAPK_Variadic, // values to format are passed as variadic arguments
FAPK_VAList, // values to format are passed in a va_list
FAPK_Elsewhere, // values to format are not passed to this function
};

// Used to grab the relevant information from a FormatAttr and a
Expand All @@ -2187,12 +2188,15 @@ class Sema final : public SemaBase {
FormatArgumentPassingKind ArgPassingKind;
};

/// Given a FunctionDecl's FormatAttr, attempts to populate the
/// FomatStringInfo parameter with the FormatAttr's correct format_idx and
/// firstDataArg. Returns true when the format fits the function and the
/// FormatStringInfo has been populated.
static bool getFormatStringInfo(const FormatAttr *Format, bool IsCXXMember,
bool IsVariadic, FormatStringInfo *FSI);
/// Given a function and its FormatAttr or FormatMatchesAttr info, attempts to
/// populate the FomatStringInfo parameter with the attribute's correct
/// format_idx and firstDataArg. Returns true when the format fits the
/// function and the FormatStringInfo has been populated.
static bool getFormatStringInfo(const Decl *Function, unsigned FormatIdx,
unsigned FirstArg, FormatStringInfo *FSI);
static bool getFormatStringInfo(unsigned FormatIdx, unsigned FirstArg,
bool IsCXXMember, bool IsVariadic,
FormatStringInfo *FSI);

// Used by C++ template instantiation.
ExprResult BuiltinShuffleVector(CallExpr *TheCall);
Expand All @@ -2215,7 +2219,10 @@ class Sema final : public SemaBase {
FST_Syslog,
FST_Unknown
};
static StringRef GetFormatStringTypeName(FormatStringType FST);
static FormatStringType GetFormatStringType(StringRef FormatFlavor);
static FormatStringType GetFormatStringType(const FormatAttr *Format);
static FormatStringType GetFormatStringType(const FormatMatchesAttr *Format);

bool FormatStringHasSArg(const StringLiteral *FExpr);

Expand Down Expand Up @@ -2362,6 +2369,25 @@ class Sema final : public SemaBase {
bool IsMemberFunction, SourceLocation Loc, SourceRange Range,
VariadicCallType CallType);

/// Verify that two format strings (as understood by attribute(format) and
/// attribute(format_matches) are compatible. If they are incompatible,
/// diagnostics are emitted with the assumption that \c
/// AuthoritativeFormatString is correct and
/// \c TestedFormatString is wrong. If \c FunctionCallArg is provided,
/// diagnostics will point to it and a note will refer to \c
/// TestedFormatString or \c AuthoritativeFormatString as appropriate.
bool
CheckFormatStringsCompatible(FormatStringType FST,
const StringLiteral *AuthoritativeFormatString,
const StringLiteral *TestedFormatString,
const Expr *FunctionCallArg = nullptr);

/// Verify that one format string (as understood by attribute(format)) is
/// self-consistent; for instance, that it doesn't have multiple positional
/// arguments referring to the same argument in incompatible ways. Diagnose
/// if it isn't.
bool ValidateFormatString(FormatStringType FST, const StringLiteral *Str);

/// \brief Enforce the bounds of a TCB
/// CheckTCBEnforcement - Enforces that every function in a named TCB only
/// directly calls other functions in the same TCB as marked by the
Expand Down Expand Up @@ -2577,11 +2603,17 @@ class Sema final : public SemaBase {
VariadicCallType CallType, SourceLocation Loc,
SourceRange Range,
llvm::SmallBitVector &CheckedVarArgs);
bool CheckFormatString(const FormatMatchesAttr *Format,
ArrayRef<const Expr *> Args, bool IsCXXMember,
VariadicCallType CallType, SourceLocation Loc,
SourceRange Range,
llvm::SmallBitVector &CheckedVarArgs);
bool CheckFormatArguments(ArrayRef<const Expr *> Args,
FormatArgumentPassingKind FAPK, unsigned format_idx,
unsigned firstDataArg, FormatStringType Type,
VariadicCallType CallType, SourceLocation Loc,
SourceRange range,
FormatArgumentPassingKind FAPK,
const StringLiteral *ReferenceFormatString,
unsigned format_idx, unsigned firstDataArg,
FormatStringType Type, VariadicCallType CallType,
SourceLocation Loc, SourceRange range,
llvm::SmallBitVector &CheckedVarArgs);

void CheckInfNaNFunction(const CallExpr *Call, const FunctionDecl *FDecl);
Expand Down Expand Up @@ -4576,6 +4608,11 @@ class Sema final : public SemaBase {
FormatAttr *mergeFormatAttr(Decl *D, const AttributeCommonInfo &CI,
IdentifierInfo *Format, int FormatIdx,
int FirstArg);
FormatMatchesAttr *mergeFormatMatchesAttr(Decl *D,
const AttributeCommonInfo &CI,
IdentifierInfo *Format,
int FormatIdx,
StringLiteral *FormatStr);

/// AddAlignedAttr - Adds an aligned attribute to a particular declaration.
void AddAlignedAttr(Decl *D, const AttributeCommonInfo &CI, Expr *E,
Expand Down
4 changes: 4 additions & 0 deletions clang/lib/AST/AttrImpl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -270,4 +270,8 @@ unsigned AlignedAttr::getAlignment(ASTContext &Ctx) const {
return Ctx.getTargetDefaultAlignForAttributeAligned();
}

StringLiteral *FormatMatchesAttr::getFormatString() const {
return cast<StringLiteral>(getExpectedFormat());
}

#include "clang/AST/AttrImpl.inc"
Loading

0 comments on commit c710118

Please sign in to comment.