Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ntuple] fix type name normalization #17637

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

jblomer
Copy link
Contributor

@jblomer jblomer commented Feb 5, 2025

Fixes type name and type alias normalization, especially for types that involve meta (TClass etc.). Introduces patch version 1.0.0.1 of the specification.

Fixes #17570

- Integer template arguments are written in standard decimal notation; only if necessary (long integer types),
a suffix `l` or `ul` is added according to the template parameter type.

The type name of a field has typedefs and usings fully resolved (except for standard integer typedefs).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may belong sooner in the document as the above mentioned rules are applying to the name with fully resolved typedefs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I'm not sure I understand. The normalization rules, as far as I understand, should apply to both the original type name and the resolved type name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The replacement of the typedef is an integral type of the 'normalization' of the name. Without that part, the name are not unique and thus not normalized. I.e. if you have both const vector<vector<Int_t> > and vector<vector<int32_t, typedef_to_regular_allocator>> "only" pretty printing the name does not help much. Similar if the typedef are not replaced, neither would/should the Integer template arguments be calculated/compacted. In the second example, without resolving the typedef we can not even remove the default template argument.

Copy link
Contributor Author

@jblomer jblomer Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user passes us vector<vector<int32_t, typedef_to_regular_allocator>>, its "pretty printed" version will be registered as the field's alias type, i.e. the "original spelling" from the user. Otherwise (vector<vector<int>>), the alias type is empty.

In any case, the type that is relevant is the resolved type name. The alias one is just for information, except for Double32_t.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the original spelling, I guess we ought to decide to either really put it as-is or we need to spell out, separately from the normalization, what applies there.
For example, it is clear that the 'whitespace' rule would apply, maybe the const rule and what else? (Not the typedef for sure, what about the std::[u]int(8|16|32|64)_t or even the fully qualified)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposal is to apply all the normalization rules to the original spelling except typedef resolution.

@jblomer jblomer force-pushed the ntuple-fix-default-template branch from d260d3e to 3560223 Compare February 5, 2025 20:49
@jblomer jblomer requested a review from pcanal February 5, 2025 20:50
@jblomer jblomer force-pushed the ntuple-fix-default-template branch from 3560223 to 5e645de Compare February 5, 2025 20:52
the corresponding (at the time of writing) `std::[u]int(8|16|32|64)_t` standard integer typedef.
- Type names used as template arguments are themselves normalized.
- For user-defined types, default-initialized template arguments are explicitly spelled out;
optional template arguments of stdlib types such as allocators are omitted.
Copy link
Member

@pcanal pcanal Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that for the I/O normalization we have elected to keep the stdlib optional argument whose value is not the default value. ie.:

TClass::GetClass("vector<int, other_alloc>")->GetName()
(const char *) "vector<int,other_alloc>"

I do not recall the detailed argumentation, but at the very least they 'may' be semantically relevant to the user.

Copy link
Contributor Author

@jblomer jblomer Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think that should be looked / discussed in #16754.

Copy link

github-actions bot commented Feb 6, 2025

Test Results

    18 files      18 suites   4d 7h 14m 40s ⏱️
 2 717 tests  2 716 ✅ 0 💤 1 ❌
47 208 runs  47 207 ✅ 0 💤 1 ❌

For more details on these failures, see this check.

Results for commit 3187473.

♻️ This comment has been updated with latest results.

@jblomer jblomer force-pushed the ntuple-fix-default-template branch 3 times, most recently from e4045e6 to 2697372 Compare February 6, 2025 14:20
Copy link
Member

@hahnjo hahnjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments inline

tree/ntuple/v7/doc/BinaryFormatSpecification.md Outdated Show resolved Hide resolved
tree/ntuple/v7/doc/BinaryFormatSpecification.md Outdated Show resolved Hide resolved
tree/ntuple/v7/doc/BinaryFormatSpecification.md Outdated Show resolved Hide resolved
tree/ntuple/v7/src/RFieldBase.cxx Outdated Show resolved Hide resolved
@jblomer jblomer force-pushed the ntuple-fix-default-template branch from 2697372 to 51e2e81 Compare February 7, 2025 13:25
@jblomer jblomer requested review from hahnjo and pcanal February 7, 2025 13:25
@jblomer jblomer force-pushed the ntuple-fix-default-template branch from 51e2e81 to 962b8eb Compare February 7, 2025 14:44
@jblomer jblomer requested a review from hahnjo February 7, 2025 14:47
@hahnjo
Copy link
Member

hahnjo commented Feb 10, 2025

I'm still trying to wrap my head around the type alias commit. I think we should test it more with class templates, like

diff --git a/tree/ntuple/v7/test/rfield_class.cxx b/tree/ntuple/v7/test/rfield_class.cxx
index 02cbc93ed6..52e1bc3ba4 100644
--- a/tree/ntuple/v7/test/rfield_class.cxx
+++ b/tree/ntuple/v7/test/rfield_class.cxx
@@ -278,6 +278,8 @@ TEST(RNTuple, TClassDefaultTemplateParameter)
       model->MakeField<DataVector<int, float>>("f2");
       model->AddField(RFieldBase::Create("f3", "DataVector<int>").Unwrap());
       model->AddField(RFieldBase::Create("f4", "struct DataVector<bool,vector<unsigned>>").Unwrap());
+      model->AddField(RFieldBase::Create("f5", "DataVector<Double32_t>").Unwrap());
+      model->AddField(RFieldBase::Create("f6", "DataVector<int, double>").Unwrap());
       auto writer = RNTupleWriter::Recreate(std::move(model), "ntpl", fileGuard.GetPath());
    }
 
@@ -286,10 +288,18 @@ TEST(RNTuple, TClassDefaultTemplateParameter)
 
    const auto &desc = reader->GetDescriptor();
    EXPECT_EQ("DataVector<std::int32_t,double>", desc.GetFieldDescriptor(desc.FindFieldId("f1")).GetTypeName());
+   EXPECT_EQ("", desc.GetFieldDescriptor(desc.FindFieldId("f1")).GetTypeAlias());
    EXPECT_EQ("DataVector<std::int32_t,float>", desc.GetFieldDescriptor(desc.FindFieldId("f2")).GetTypeName());
+   EXPECT_EQ("", desc.GetFieldDescriptor(desc.FindFieldId("f2")).GetTypeAlias());
    EXPECT_EQ("DataVector<std::int32_t,double>", desc.GetFieldDescriptor(desc.FindFieldId("f3")).GetTypeName());
+   EXPECT_EQ("", desc.GetFieldDescriptor(desc.FindFieldId("f3")).GetTypeAlias());
    EXPECT_EQ("DataVector<bool,std::vector<std::uint32_t>>",
              desc.GetFieldDescriptor(desc.FindFieldId("f4")).GetTypeName());
+   EXPECT_EQ("", desc.GetFieldDescriptor(desc.FindFieldId("f4")).GetTypeAlias());
+   EXPECT_EQ("DataVector<double,double>", desc.GetFieldDescriptor(desc.FindFieldId("f5")).GetTypeName());
+   EXPECT_EQ("DataVector<Double32_t,double>", desc.GetFieldDescriptor(desc.FindFieldId("f5")).GetTypeAlias());
+   EXPECT_EQ("DataVector<std::int32_t,double>", desc.GetFieldDescriptor(desc.FindFieldId("f6")).GetTypeName());
+   EXPECT_EQ("", desc.GetFieldDescriptor(desc.FindFieldId("f6")).GetTypeAlias());
 
    auto v1 = reader->GetView<DataVector<int>>("f1");
    auto v3 = reader->GetView<DataVector<int>>("f3");

However, that doesn't seem to work right now:

553: [ RUN      ] RNTuple.TClassDefaultTemplateParameter
553: /home/jhahnfel/ROOT/src/tree/ntuple/v7/test/rfield_class.cxx:295: Failure
553: Expected equality of these values:
553:   ""
553:   desc.GetFieldDescriptor(desc.FindFieldId("f3")).GetTypeAlias()
553:     Which is: "DataVector<std::int32_t>"
553: /home/jhahnfel/ROOT/src/tree/ntuple/v7/test/rfield_class.cxx:299: Failure
553: Expected equality of these values:
553:   "DataVector<double,double>"
553:   desc.GetFieldDescriptor(desc.FindFieldId("f5")).GetTypeName()
553:     Which is: "DataVector<Double32_t,double>"
553: /home/jhahnfel/ROOT/src/tree/ntuple/v7/test/rfield_class.cxx:300: Failure
553: Expected equality of these values:
553:   "DataVector<Double32_t,double>"
553:   desc.GetFieldDescriptor(desc.FindFieldId("f5")).GetTypeAlias()
553:     Which is: "DataVector<Double32_t>"
553: [  FAILED  ] RNTuple.TClassDefaultTemplateParameter (10 ms)

I believe Double32_t as template argument was already wrong before, but the type alias for f3 looks wrong to me...

@jblomer jblomer force-pushed the ntuple-fix-default-template branch from 962b8eb to 24c9896 Compare February 12, 2025 08:44
@jblomer jblomer requested a review from bellenot as a code owner February 12, 2025 08:44
@jblomer jblomer force-pushed the ntuple-fix-default-template branch 4 times, most recently from e37177c to e54ceac Compare February 12, 2025 22:15
Following clarifications and fixes:
  - Strings in the meta-data are UTF-8 encoded (instead of ASCII),
    matching the section "Naming specification"
  - Extend restrictions on user-defined classes regarding bit-fields and
    template arguments
  - The type name normalization is specified in a dedicated section
Normalize and resolve type name only in the innermost overload.
For types that require meta (TClass, TEnum, etc.), apply the following
normalization steps:
  - Use TClass::GetName() to get the type name normalization from meta
  - Renormalize that name according to the RNTuple specs
Compares a normalized original type name with the normalized resolved
type name to determine if a field type alias (i.e. the use of a typedef)
should be set.

Previously, the logic was buggy comparing only partially normalized types.
@jblomer jblomer force-pushed the ntuple-fix-default-template branch from e54ceac to 3187473 Compare February 12, 2025 22:16
Copy link
Member

@hahnjo hahnjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some last minor comments from my side

@@ -37,7 +37,10 @@ TEST(RNTuple, TypeNameNormalization)
EXPECT_EQ("std::int32_t", RFieldBase::Create("f", "signed").Unwrap()->GetTypeName());
EXPECT_EQ("std::map<std::int32_t,std::int32_t>", RFieldBase::Create("f", "map<int, int>").Unwrap()->GetTypeName());

EXPECT_TRUE(RFieldBase::Create("f", "std::int32_t").Unwrap()->GetTypeAlias().empty());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to keep this newly added test that std::int32_t has an empty type alias

auto arrayLength = std::stoi(arrayDef[1]);
auto arrayLength = std::stoull(arrayDef[1]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want more error checking here, for example std::stoull("123junk") = 123 without error reporting. std::stoull takes a second pos argument that stores the number of processed characters, or we can use the underlying std::strtoull where str_end should point to NULL after successful conversion.

Comment on lines +262 to +263
return GetNormalizedInteger(std::stoll(intTemplateArg));
return GetNormalizedInteger(std::stoull(intTemplateArg));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ntuple] Missing type name normalization for meta fields
3 participants