Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add relevance metrics including pruned tokens to MS Marco ranking track #525

Merged

Conversation

kderusso
Copy link
Member

@kderusso kderusso commented Nov 28, 2023

Adds recall metrics for query pruning using the weighted tokens query and precomputed ELSER tokens.

jimczi and others added 2 commits January 4, 2024 11:08
…e the recall against the original weighted terms query
…NDCG based on a smaller version of the queries used to test performance.
@kderusso kderusso force-pushed the kderusso/msmarco-passage-ranking-iter-from-jim branch from 105b19a to ac79e54 Compare January 4, 2024 16:10
},
{
"name": "pruned-text-expansion-search-maxwand-disabled",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added operations here

},
{
"name": "pruned-weighted-terms-recall-10-10",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added operations here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file was auto-formatted for correct json indentation. I have left comments starting at the two places where I made changes.

@kderusso kderusso changed the title WIP Weighted term benchmarking Add relevance metrics including pruned tokens to MS Marco ranking track Jan 4, 2024
@kderusso kderusso marked this pull request as ready for review January 4, 2024 16:14
Copy link
Contributor

@demjened demjened left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a few minor comments

msmarco-passage-ranking/track.py Outdated Show resolved Hide resolved
msmarco-passage-ranking/track.py Outdated Show resolved Hide resolved
Copy link
Contributor

@Mikep86 Mikep86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Left a few minor comments.

elastic/Makefile Outdated Show resolved Hide resolved
msmarco-passage-ranking/track.json Show resolved Hide resolved
msmarco-passage-ranking/track.py Outdated Show resolved Hide resolved
msmarco-passage-ranking/track.py Show resolved Hide resolved
msmarco-passage-ranking/track.py Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
msmarco-passage-ranking/README.md Outdated Show resolved Hide resolved
def generate_pruned_query(field, query_expansion, boost=1.0):
return {
"query": {
"weighted_tokens": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried running the modified track and noticed weighted_tokens query is not available in 8.11.x. Is this planned for 8.13.0? We need to apply the right versioning as per https://esrally.readthedocs.io/en/stable/track.html#custom-track-repositories. The most recent non-master branch in the track repo is 8.11. We need to bump up to 8.13 before merging this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this is for 8.13.0.

Copy link
Contributor

@gbanasiak gbanasiak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left small remark regarding _tools/requirements.txt file.

Comment on lines +1 to +2
pytrec_eval
numpy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the presence of this file be mentioned in README? Also, please align version pinning with dependencies section.

Copy link
Contributor

@Mikep86 Mikep86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@kderusso kderusso merged commit 30a9b98 into elastic:master Jan 10, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants