Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add index_sorting, synthetic_source_mode and force_merge_max_num_segments track parameters to elastic/logs track #522

Merged
merged 5 commits into from
Jan 12, 2024

Conversation

martijnvg
Copy link
Member

@martijnvg martijnvg commented Nov 24, 2023

This allows to enable indexing sorting, that either sorts by hostname and timestamp or
timestamp and hostname.

Additionally this change adds force_merge_max_num_segments and synthetic_source parameters.
The former controls the number of segments after indexing and the latter whether synthetic source is enabled.

All newly added track params are unset by default, meaning index sorting, synthetic source and force merging to N segments are not enabled.

This allows to enable indexing sorting, that either sorts by hostname and timestamp or
timestamp and hostname. When index sorting is enabled than synthetic source is also enabled.
elastic/logs/README.md Outdated Show resolved Hide resolved
}
},
"mappings": {
{% if index_sorting %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it's a bit awkward that index sorting enables synthetic source, is there any downside with configuring it with another parameter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no downside. But I did this with the index.mode=logs in mind. I think this would both enable index sorting and synthetic source. But I can add a synthetic source parameter for now, which would give us more benchmarking flexibility.

@@ -1785,6 +1785,9 @@
"message": {
"norms": false,
"type": "text"
{% if index_sorting %},
"store": true
{% endif %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully one day this will work automagically when synthetic source is enabled.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should just fix this. For the match_only_text this is working as expected. If synthetic source is enabled then store defaults to true.

@jpountz
Copy link
Contributor

jpountz commented Nov 24, 2023

store: true seems missing on more text fields, e.g. kafka.log.trace.message?

Co-authored-by: Adrien Grand <jpountz@gmail.com>
@martijnvg
Copy link
Member Author

store: true seems missing on more text fields, e.g. kafka.log.trace.message?

That field is also covered by this: https://github.com/elastic/rally-tracks/pull/522/files#diff-6da3dab0371882c35be61a7d2fffa07d9b9c39351ebda2987e953459eb104104R161
So I think it is good?

@jpountz
Copy link
Contributor

jpountz commented Nov 24, 2023

OK, cool. I was surprised that we only had to set store: true in a couple places given the total number of text fields, but it looks like they are all covered by being sub fields of another field. 👍

@martijnvg
Copy link
Member Author

but it looks like they are all covered by being sub fields of another field.

Yes, most text fields are defined as sub fields. So no need to store these text fields.

@pmpailis pmpailis closed this Nov 30, 2023
@martijnvg
Copy link
Member Author

@pmpailis why was this pr closed? I intend to merge this in after I have completed my investigation.

@pmpailis
Copy link
Contributor

Hey @martijnvg ! No idea how this happened to be honest. Just merged my PR and at the same time a number of other open PRs were closed :/ Please reopen this one and apologies for the trouble. (I think that it'll close instantly again if I do it)

@pmpailis pmpailis reopened this Nov 30, 2023
@pmpailis pmpailis closed this Nov 30, 2023
@martijnvg martijnvg reopened this Dec 1, 2023
@martijnvg
Copy link
Member Author

@pmpailis No worries. Something strange is going on this PR auto closed again and so did #500.

@gareth-ellis gareth-ellis requested a review from a team December 21, 2023 12:51
@gareth-ellis
Copy link
Member

@elasticmachine update branch

Copy link
Member

@gareth-ellis gareth-ellis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. as I understand it the default run mode isnt being changed. Do you have any plans to update existing nightly benchmark with these new parameters, or are they going to be kept as is? (thus keeping currently nightlies comparable)

@martijnvg
Copy link
Member Author

Do you have any plans to update existing nightly benchmark with these new parameters, or are they going to be kept as is?

No, for now we plan to keep the existing nightly benchmark as is. The changes added are initially be used for one of benchmark runs for specific research around storage optimisations for logs. In the future we may make changes that will also affect the nightly benchmark, however for now this isn't the case.

@martijnvg martijnvg changed the title Add index_sorting track parameter to elastic/logs track. Add index_sorting, synthetic_source_mode and force_merge_max_num_segments track parameters to elastic/logs track Jan 12, 2024
@martijnvg martijnvg merged commit a8ccf0c into elastic:master Jan 12, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants