
Log model responses to detect out of distribution cases #28

Open
santanavagner opened this issue Jan 30, 2025 · 3 comments
Labels: enhancement (New feature or request)
@santanavagner (Member)

Description (Actual Behavior)

A limitation of the current similarity-based approach is that, when an input prompt has a very low similarity score, the recommendations may have nothing to do with the entered prompt (or be null) because a topic/term falls outside the distribution of the dataset used to train the sentence transformer. Hence, to identify these cases, we need (1) to log model responses (already pointed out in issue #20) and (2) to keep track of low similarity scores, so we can detect these out-of-distribution cases and report edge cases regarding the sentence transformers used.
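As a first sketch of (1) and (2), assuming a Python service (the function, field, and logger names below are illustrative, not the project's actual API), each response could be logged as a JSON line together with its top similarity score:

```python
import json
import logging
import time

logger = logging.getLogger("model_responses")

# Illustrative threshold; per the discussion below, for all-MiniLM-L6-v2
# a top score under ~0.1 suggests an out-of-distribution prompt.
LOW_SIMILARITY_THRESHOLD = 0.1

def log_model_response(prompt: str, top_score: float, recommendations: list) -> None:
    """Log one model response as a JSON line so low-similarity
    (out-of-distribution) cases can be mined later."""
    record = {
        "timestamp": time.time(),
        "prompt": prompt,
        "top_similarity": top_score,
        "num_recommendations": len(recommendations),
        "possible_ood": top_score < LOW_SIMILARITY_THRESHOLD,
    }
    logger.info(json.dumps(record))
```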

Expected Behavior

Once a logging mechanism is in place, we should employ a log-analytics component to detect the terms/topics with low similarity scores.
The goal is to detect edge cases and inform developers using the API.
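A minimal analytics pass over such a log (assuming the JSON-lines format sketched above) could simply count the prompts whose top score fell below the threshold:

```python
import json
from collections import Counter

def find_ood_prompts(log_path: str, threshold: float = 0.1) -> Counter:
    """Scan a JSON-lines log (format from the sketch above) and count
    prompts whose top similarity score fell below the threshold."""
    ood_prompts = Counter()
    with open(log_path) as f:
        for line in f:
            record = json.loads(line)
            if record.get("top_similarity", 1.0) < threshold:
                ood_prompts[record["prompt"]] += 1
    return ood_prompts

# Example: surface the ten most frequent out-of-distribution prompts.
# print(find_ood_prompts("model_responses.log").most_common(10))
```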

Possible Approach

We can provide a warning at the API level, so that developers have a better sense of why some input prompts get no recommendations. This information can help developers connecting to the API select a different sentence transformer or fine-tune an existing one. The goal is to increase transparency for developers using the API.
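One way to surface this, sketched here with illustrative names rather than the API's actual response schema, is to attach a warning field to the response whenever the best score is below the threshold:

```python
def build_response(prompt: str, recommendations: list, top_score: float,
                   threshold: float = 0.1) -> dict:
    """Attach a transparency warning when the top similarity score is low."""
    response = {"prompt": prompt, "recommendations": recommendations}
    if top_score < threshold:
        response["warning"] = (
            f"Top similarity score ({top_score:.3f}) is below {threshold}; the "
            "prompt may be out of distribution for the current sentence "
            "transformer. Consider selecting a different model or fine-tuning "
            "an existing one."
        )
    return response
```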

Steps to Reproduce

N/A.

Context

Recommendation.

@santanavagner santanavagner added the enhancement New feature or request label Jan 30, 2025
@cassiasamp (Collaborator)

Hi @santanavagner, do you have any numerical examples of a low similarity score?
Would it be, for instance, less than 0.5, or even lower?

@santanavagner (Member, Author)

Hi @cassiasamp,

It depends on the embedding being used. For all-MiniLM-L6-v2 it would be less than 0.1.

I'd suggest testing our Swagger UI with a prompt that we know will result in recommendations, then changing the prompt to topics that are not in our embeddings. That way, you'll find examples of low similarity scores for that sentence transformer.
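To reproduce this outside Swagger, standard sentence-transformers usage is enough (the two prompts below are made up for illustration; only scoring against the project's actual embeddings gives the real numbers):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

known_prompt = "Generate a marketing email for a new product launch"
unrelated_prompt = "Lattice gauge couplings in quantum chromodynamics"

emb_known, emb_unrelated = model.encode([known_prompt, unrelated_prompt])
score = util.cos_sim(emb_known, emb_unrelated).item()

# Scores near zero (below ~0.1 for this model) indicate the prompt is
# likely outside the distribution the recommendations were built from.
print(f"cosine similarity: {score:.3f}")
```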

@cassiasamp (Collaborator)

Thanks @santanavagner,
I will be working on this issue 🤘

@cassiasamp cassiasamp self-assigned this Mar 6, 2025