Add evaluation client #12

TechPreacher
Contributor

@TechPreacher TechPreacher commented Dec 17, 2024

This PR adds an evaluation client that evaluates answer correctness against a ground truth, as well as document retrieval, for RAG-based Copilot Studio agents.

Sample Evaluation

Evaluation Dataset.csv

Name: Asus Zenbook Duo 2024
Test Type: Response Match
Test Utterance: How can I charge the removable keyboard of the Asus Zenbook Duo 2024?
Expected Response: To charge the removable keyboard of the Asus Zenbook Duo 2024 insert the bundled power adapter into the USB-C port on the keyboard or dock it with the laptop to charge via the built-in connector.
Sources: https://tenant.sharepoint.com/sample/Document%20Library%201/2.pdf;https://tenant.sharepoint.com/sample/Document%20Library%201/3.pdf

Evaluation Dataset Results.csv

Name: Asus Zenbook Duo 2024
Test Type: Response Match
Test Utterance: How can I charge the removable keyboard of the Asus Zenbook Duo 2024?
Expected Response: To charge the removable keyboard of the Asus Zenbook Duo 2024 insert the bundled power adapter into the USB-C port on the keyboard or dock it with the laptop to charge via the built-in connector.
Sources: https://tenant.sharepoint.com/sample/Document%20Library%201/2.pdf;https://tenant.sharepoint.com/sample/Document%20Library%201/3.pdf
Agent Response: 
To charge the removable keyboard of the Asus Zenbook Duo 2024 you can follow these steps:
1. Connect the AC power cord to the AC/DC adapter.
2. Connect the DC power connector into your keyboard’s power (DC) input port.
3. Plug the AC power adapter into a 100V~240V power source [1].
[1]: https://tenant.sharepoint.com/sample/Document%20Library%201/2.pdf
Answer Score: 65
Sources Score: 1/2

Explanation

This RAG agent was created by adding a SharePoint knowledge source to the agent. The evaluator client asks the agent to answer the question provided in the "Test Utterance" field and compares the answer with the "Expected Response" answer.

Based on how well the agent's answer matches the expected response semantically, a score between 0 and 100 is given, with 0 being the worst and 100 a perfect answer. The value is stored in the "Answer Score" field. The "Sources" field provides the two URLs the agent should have used to answer the question.

The evaluator client checks whether the agent cited those URLs as reference links in its answer. The result is reported as x/y, where x is the number of expected links the agent returned and y is the number of links expected. In the sample above, the agent cited one of the two expected documents, so the "Sources Score" is 1/2.
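
The x/y computation described above can be sketched as follows. This is a minimal illustration in Python, not the EvalClient's actual C# code: the function name `sources_score`, the URL regex, and the exact-match comparison are assumptions made for the example. Only the semicolon-separated "Sources" field and the x/y output format come from the sample above.

```python
import re


def sources_score(expected_sources: str, agent_response: str) -> str:
    """Return 'x/y': x = expected URLs cited in the response, y = URLs expected.

    Hypothetical helper for illustration; not part of the EvalClient sample.
    """
    # The "Sources" field in the sample dataset separates URLs with semicolons.
    expected = [url.strip() for url in expected_sources.split(";") if url.strip()]

    # Collect every URL that appears in the agent response, dropping any
    # trailing punctuation picked up from the surrounding sentence.
    cited = {
        url.rstrip(".,;)")
        for url in re.findall(r"https?://\S+", agent_response)
    }

    # Count how many of the expected URLs the agent actually cited.
    matched = sum(1 for url in expected if url in cited)
    return f"{matched}/{len(expected)}"


expected = (
    "https://tenant.sharepoint.com/sample/Document%20Library%201/2.pdf;"
    "https://tenant.sharepoint.com/sample/Document%20Library%201/3.pdf"
)
response = (
    "3. Plug the AC power adapter into a 100V~240V power source [1].\n"
    "[1]: https://tenant.sharepoint.com/sample/Document%20Library%201/2.pdf"
)
print(sources_score(expected, response))  # prints 1/2
```

Exact string comparison keeps the sketch simple; a real evaluator might also need to normalize URL encoding or casing before comparing.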

@svandenhoven svandenhoven left a comment

The EvalClient app runs when isolated from the other samples. When the .sln of the whole solution is opened, it gives this error:

Projects that use central package version management should not define the version on the PackageReference items but on the PackageVersion items: Azure.Identity;Microsoft.Extensions.Configuration.UserSecrets;Microsoft.Extensions.AI.OpenAI;Microsoft.Agents.Client;Microsoft.Extensions.Hosting;Microsoft.Identity.Client;CsvHelper;Microsoft.Agents.CopilotStudio.Client;Microsoft.Identity.Client.Extensions.Msal;Azure.AI.OpenAI;Microsoft.Agents.Authentication.
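
For context, this error typically appears when a project in a solution that uses NuGet central package management still carries `Version` attributes on its `PackageReference` items. A minimal sketch of the usual layout is shown below, using CsvHelper from the list above. The version number is a placeholder, not the version this PR uses, and the thread does not say whether this is how the issue was eventually fixed.

```xml
<!-- Directory.Packages.props at the solution root: versions are declared once here -->
<Project>
  <PropertyGroup>
    <ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
  </PropertyGroup>
  <ItemGroup>
    <!-- 0.0.0 is a placeholder version for illustration only -->
    <PackageVersion Include="CsvHelper" Version="0.0.0" />
  </ItemGroup>
</Project>
```

```xml
<!-- EvalClient.csproj: the reference carries no Version attribute -->
<ItemGroup>
  <PackageReference Include="CsvHelper" />
</ItemGroup>
```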

src/samples/EvalClient/EvaluationService.cs (outdated, resolved)
src/samples/EvalClient/EvaluationService.cs (resolved)
src/samples/EvalClient/Program.cs (outdated, resolved)
src/samples/EvalClient/EvalClient.csproj (outdated, resolved)
@tracyboehrer
Member

@TechPreacher This is great! We will take some action on this after the holidays. Long term, before the GA, we will handle samples differently. Namely using released package versions instead of being part of the SDK solution and using project references. We aren't quite ready to make that change, so what you are doing is the correct thing for now.

@TechPreacher
Contributor Author

> The EvalClient app runs when isolated from the other samples. When the .sln of the whole solution is opened it gives error
>
> Projects that use central package version management should not define the version on the PackageReference items but on the PackageVersion items: Azure.Identity;Microsoft.Extensions.Configuration.UserSecrets;Microsoft.Extensions.AI.OpenAI;Microsoft.Agents.Client;Microsoft.Extensions.Hosting;Microsoft.Identity.Client;CsvHelper;Microsoft.Agents.CopilotStudio.Client;Microsoft.Identity.Client.Extensions.Msal;Azure.AI.OpenAI;Microsoft.Agents.Authentication.

This is fixed now. Thanks for noticing!

@TechPreacher
Contributor Author

TechPreacher commented Dec 19, 2024

> @TechPreacher This is great! We will take some action on this after the holidays. Long term, before the GA, we will handle samples differently. Namely using released package versions instead of being part of the SDK solution and using project references. We aren't quite ready to make that change, so what you are doing is the correct thing for now.

Thanks, @tracyboehrer! You can always reach me internally at "saschac".

@TechPreacher TechPreacher force-pushed the feature/saschac-add-evaluation-client branch from 331e238 to 8dbe42b January 15, 2025 10:02
@MattB-msft
Member

@TechPreacher
Please rebase your fork on the main repo to pick up the changes to the build validation logic.
thanks

@TechPreacher TechPreacher requested a review from a team as a code owner January 16, 2025 08:01
@github-actions github-actions bot added the ML: Samples, ML: Core, ML: Tests, and From Fork labels Jan 16, 2025
@TechPreacher TechPreacher force-pushed the feature/saschac-add-evaluation-client branch 2 times, most recently from d015f64 to 155723c January 16, 2025 08:07
@github-actions github-actions bot added the ML: Samples label and removed the ML: Samples, ML: Core, and ML: Tests labels Jan 16, 2025
@TechPreacher
Contributor Author

> @TechPreacher Please rebase your fork on the main repo to pick up the changes to the build validation logic. thanks

Done. Tests now pass. :)

@svandenhoven svandenhoven left a comment

Have tested the solution and it works. My previous remarks have been resolved.

There are some warnings in the code. It might be good to review them.

@TechPreacher TechPreacher force-pushed the feature/saschac-add-evaluation-client branch from d125a12 to 17e5cdc January 16, 2025 18:16
@TechPreacher TechPreacher force-pushed the feature/saschac-add-evaluation-client branch from 17e5cdc to 59b5346 January 16, 2025 18:17
@tracyboehrer tracyboehrer merged commit 4100d31 into microsoft:main Jan 16, 2025
4 checks passed