Add relevance metrics including pruned tokens to MS Marco ranking track #525

Merged
8 changes: 5 additions & 3 deletions msmarco-passage-ranking/README.md
@@ -8,9 +8,11 @@ To compare search performance, the following strategies are employed:

It's important to highlight that the text-expansion and hybrid strategies are dependent on a dataset that has undergone query token expansion.

Additional properties can be sent in to `text_expansion` and `hybrid`: `prune` will prune insignificant tokens from the text expansion query and `rescore` will issue a rescore query that rescores with the pruned tokens. Both of these values default to `false`.
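
As a rough illustration of what `prune` and `rescore` could translate to, here is a minimal sketch of a request-body builder. It is not the track's actual param source. The helper name `build_text_expansion_body`, the use of a `weighted_tokens` query with a `pruning_config`, the threshold values, and the choice to apply `rescore` only when `prune` is also set are all assumptions made for this example.

```python
from typing import Any, Dict, Optional


def build_text_expansion_body(
    field: str,
    tokens: Dict[str, float],
    prune: bool = False,
    rescore: bool = False,
    num_candidates: int = 100,
    track_total_hits: bool = False,
) -> Dict[str, Any]:
    """Build a search body from pre-computed token weights (hypothetical helper)."""

    def weighted_tokens(pruning_config: Optional[Dict[str, Any]]) -> Dict[str, Any]:
        clause: Dict[str, Any] = {"tokens": tokens}
        if pruning_config is not None:
            clause["pruning_config"] = pruning_config
        return {"weighted_tokens": {field: clause}}

    # Illustrative thresholds only; they are assumptions, not values from this PR.
    pruning: Dict[str, Any] = {
        "tokens_freq_ratio_threshold": 5,
        "tokens_weight_threshold": 0.4,
        "only_score_pruned_tokens": False,
    }

    body: Dict[str, Any] = {
        "query": weighted_tokens(pruning if prune else None),
        "track_total_hits": track_total_hits,
    }
    if prune and rescore:
        # Second pass: score the top candidates again using only the tokens
        # that the first pass pruned away.
        rescore_pruning = {**pruning, "only_score_pruned_tokens": True}
        body["rescore"] = {
            "window_size": num_candidates,
            "query": {"rescore_query": weighted_tokens(rescore_pruning)},
        }
    return body
```

With both flags left at their default of `false`, the sketch produces a plain query with no `pruning_config` and no `rescore` section.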

### Example Document

Documents adhere to the [JSON Lines format](https://jsonlines.org/).
When a single document is pretty printed, it takes the following example format:

<details>
@@ -454,7 +456,7 @@ EnsembleDistil](https://huggingface.co/naver/splade-cocondenser-ensembledistil)

### Example Query

Queries are structured within a JSON array, where each individual object signifies a unique 'query' and its corresponding expansion achieved through ELSER v2, which is stored pre-computed in the 'query_expansion' field.:
Queries are structured within a JSON array, where each individual object signifies a unique 'query' and its corresponding expansion achieved through ELSER v2, which is stored pre-computed in the 'text_expansion_elser' field.:

<details>
<summary><i>Example query object</i></summary>
@@ -639,4 +641,4 @@ title = {From Distillation to Hard Negative Sampling: Making Sparse Neural IR Mo
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution Non Commercial Share Alike 4.0 International}
}
}
2 changes: 2 additions & 0 deletions msmarco-passage-ranking/_tools/requirements.txt
@@ -0,0 +1,2 @@
pytrec_eval
numpy
Comment on lines +1 to +2
Contributor

Should the presence of this file be mentioned in README? Also, please align version pinning with dependencies section.

56 changes: 51 additions & 5 deletions msmarco-passage-ranking/challenges/default.json
@@ -55,52 +55,98 @@
}
},
{
"name": "text-expansion-search-maxwand-enabled",
"operation": "text-expansion-search-maxwand-enabled",
"warmup-iterations": 100,
"iterations": 1000,
"clients": {{search_client | default(1)}}
"clients": {{search_clients | default(1)}}
},
{
"name": "text-expansion-search-maxwand-disabled",
"operation": "text-expansion-search-maxwand-disabled",
"warmup-iterations": 100,
"iterations": 1000,
"clients": {{search_clients | default(1)}}
},
{
"name": "text-expansion-splade-search-maxwand-enabled",
"operation": "text-expansion-splade-search-maxwand-enabled",
"warmup-iterations": 100,
"iterations": 1000,
"clients": {{search_client | default(1)}}
},
"clients": {{search_clients | default(1)}}
},
{
"name": "text-expansion-splade-search-maxwand-disabled",
"operation": "text-expansion-splade-search-maxwand-disabled",
"warmup-iterations": 100,
"iterations": 1000,
"clients": {{search_clients | default(1)}}
},
},
{
"name": "pruned-text-expansion-search-maxwand-enabled",
"operation": "pruned-text-expansion-search-maxwand-enabled",
"warmup-iterations": 100,
"iterations": 1000,
"clients": {{search_clients | default(1)}}
},
{
"name": "pruned-text-expansion-search-maxwand-disabled",
"operation": "pruned-text-expansion-search-maxwand-disabled",
"warmup-iterations": 100,
"iterations": 1000,
"clients": {{search_clients | default(1)}}
},
{
"name": "pruned-rescored-text-expansion-search-maxwand-enabled",
"operation": "pruned-rescored-text-expansion-search-maxwand-enabled",
"warmup-iterations": 100,
"iterations": 1000,
"num_candidates": 100,
"clients": {{search_clients | default(1)}}
},
{
"name": "pruned-rescored-text-expansion-search-maxwand-disabled",
"operation": "pruned-rescored-text-expansion-search-maxwand-disabled",
"warmup-iterations": 100,
"iterations": 1000,
"num_candidates": 100,
"clients": {{search_clients | default(1)}}
},
{
"name": "bm25-search-maxwand-enabled",
"operation": "bm25-search-maxwand-enabled",
"warmup-iterations": 100,
"iterations": 1000,
"clients": {{search_clients | default(1)}}
},
{
"name": "bm25-search-maxwand-disabled",
"operation": "bm25-search-maxwand-disabled",
"warmup-iterations": 100,
"iterations": 1000,
"clients": {{search_clients | default(1)}}
},
{
"name": "hybrid-search-maxwand-enabled",
"operation": "hybrid-search-maxwand-enabled",
"warmup-iterations": 100,
"iterations": 1000,
"clients": {{search_clients | default(1)}}
},
{
"name": "hybrid-search-maxwand-disabled",
"operation": "hybrid-search-maxwand-disabled",
"warmup-iterations": 100,
"iterations": 1000,
"clients": {{search_clients | default(1)}}
},
{
"name": "pruned-weighted-terms-recall-10-10",
"operation": "pruned-weighted-terms-recall-10-10"
},
{
"name": "pruned-weighted-terms-recall-10-100",
"operation": "pruned-weighted-terms-recall-10-100"
}
]
}
}
222 changes: 152 additions & 70 deletions msmarco-passage-ranking/operations/default.json
Member Author

This file was auto-formatted for correct json indentation. I have left comments starting at the two places where I made changes.

@@ -1,70 +1,152 @@
{
"name": "text-expansion-search-maxwand-disabled",
"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "text_expansion",
"text_expansion_field": "text_expansion_elser",
"track_total_hits": true
},
{
"name": "text-expansion-search-maxwand-enabled",
"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "text_expansion",
"text_expansion_field": "text_expansion_elser",
"track_total_hits": false
},
{
"name": "text-expansion-splade-search-maxwand-disabled",
"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "text_expansion",
"text_expansion_field": "text_expansion_splade",
"track_total_hits": true
},
{
"name": "text-expansion-splade-search-maxwand-enabled",
"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "text_expansion",
"text_expansion_field": "text_expansion_splade",
"track_total_hits": false
},
{
"name": "bm25-search-maxwand-disabled",
"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "bm25",
"track_total_hits": true
},
{
"name": "bm25-search-maxwand-enabled",
"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "bm25",
"track_total_hits": false
},
{
"name": "hybrid-search-maxwand-disabled",
"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "hybrid",
"text_expansion_field": "text_expansion_elser",
"track_total_hits": true
},
{
"name": "hybrid-search-maxwand-enabled",
"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "hybrid",
"text_expansion_field": "text_expansion_elser",
"track_total_hits": false
}
{
"name": "text-expansion-search-maxwand-disabled",
"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "text_expansion",
"text_expansion_field": "text_expansion_elser",
"track_total_hits": true
},
{
"name": "text-expansion-search-maxwand-enabled",
"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "text_expansion",
"text_expansion_field": "text_expansion_elser",
"track_total_hits": false
},
{
"name": "text-expansion-splade-search-maxwand-disabled",
"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "text_expansion",
"text_expansion_field": "text_expansion_splade",
"track_total_hits": true
},
{
"name": "text-expansion-splade-search-maxwand-enabled",
"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "text_expansion",
"text_expansion_field": "text_expansion_splade",
"track_total_hits": false
},
{
"name": "pruned-text-expansion-search-maxwand-disabled",
Member Author

Added operations here

"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "text_expansion",
"track_total_hits": true,
"prune": true
},
{
"name": "pruned-text-expansion-search-maxwand-enabled",
"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "text_expansion",
"track_total_hits": false,
"prune": true
},
{
"name": "pruned-rescored-text-expansion-search-maxwand-disabled",
"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "text_expansion",
"track_total_hits": true,
"prune": true,
"rescore": true
},
{
"name": "pruned-rescored-text-expansion-search-maxwand-enabled",
"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "text_expansion",
"track_total_hits": false,
"prune": true,
"rescore": true
},
{
"name": "bm25-search-maxwand-disabled",
"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "bm25",
"track_total_hits": true
},
{
"name": "bm25-search-maxwand-enabled",
"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "bm25",
"track_total_hits": false
},
{
"name": "hybrid-search-maxwand-disabled",
"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "hybrid",
"text_expansion_field": "text_expansion_elser",
"track_total_hits": true
},
{
"name": "hybrid-search-maxwand-enabled",
"operation-type": "search",
"param-source": "query_param_source",
"query_source": "queries.json",
"query_strategy": "hybrid",
"text_expansion_field": "text_expansion_elser",
"track_total_hits": false
},
{
"name": "pruned-weighted-terms-recall-10-10",
Member Author

Added operations here

"operation-type": "weighted_terms_recall",
"param-source": "weighted_terms_recall_param_source",
"top_k": 10,
"num_candidates": 10,
"query_source": "queries-small.json",
"qrels_source": "qrels-small.tsv",
"text_expansion_field": "text_expansion_elser",
"include-in-reporting": true
},
{
"name": "pruned-weighted-terms-recall-10-100",
"operation-type": "weighted_terms_recall",
"param-source": "weighted_terms_recall_param_source",
"top_k": 10,
"num_candidates": 100,
"query_source": "queries-small.json",
"qrels_source": "qrels-small.tsv",
"text_expansion_field": "text_expansion_elser",
"include-in-reporting": true
},
{
"name": "pruned-weighted-terms-recall-100-100",
"operation-type": "weighted_terms_recall",
"param-source": "weighted_terms_recall_param_source",
"top_k": 100,
"num_candidates": 100,
"query_source": "queries-small.json",
"qrels_source": "qrels-small.tsv",
"text_expansion_field": "text_expansion_elser",
"include-in-reporting": true
},
{
"name": "pruned-weighted-terms-recall-100-1000",
"operation-type": "weighted_terms_recall",
"param-source": "weighted_terms_recall_param_source",
"top_k": 100,
"num_candidates": 1000,
"query_source": "queries-small.json",
"qrels_source": "qrels-small.tsv",
"text_expansion_field": "text_expansion_elser",
"include-in-reporting": true
}
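
The `weighted_terms_recall` operations above pair a `top_k` cutoff with a `qrels_source` of relevance judgments. As a hedged sketch of the kind of metric those names suggest, the following computes recall at `k` in plain Python. The runner's real implementation is not shown in this diff and may instead rely on `pytrec_eval` from `_tools/requirements.txt`. The helper names `load_qrels` and `recall_at_k`, and the assumption that the qrels file uses TREC-style rows (query id, iteration, document id, relevance), are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def load_qrels(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse qrels rows, assumed to be: query_id, iteration, doc_id, relevance."""
    relevant: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed rows
        query_id, _, doc_id, relevance = parts
        if int(relevance) > 0:
            relevant[query_id].add(doc_id)
    return dict(relevant)


def recall_at_k(ranked_doc_ids: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked_doc_ids[:k]) & relevant)
    return hits / len(relevant)


if __name__ == "__main__":
    # Toy data for illustration; not taken from qrels-small.tsv.
    qrels = load_qrels(["q1\t0\td1\t1", "q1\t0\td7\t1", "q1\t0\td9\t0"])
    print(recall_at_k(["d3", "d1", "d4"], qrels["q1"], k=2))
```

Under this reading, `num_candidates` would control how many hits are retrieved or rescored before the top `top_k` are compared against the judgments, though that interpretation is also an assumption.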