Add evaluation client #12

TechPreacher · 2024-12-17T13:09:27Z

This PR adds an evaluation client that evaluates answer correctness against a ground truth, as well as document retrieval, for RAG-based Copilot Studio agents.

Sample Evaluation

Evaluation Dataset.csv

Name: Asus Zenbook Duo 2024
Test Type: Response Match
Test Utterance: How can I charge the removable keyboard of the Asus Zenbook Duo 2024?
Expected Response: To charge the removable keyboard of the Asus Zenbook Duo 2024 insert the bundled power adapter into the USB-C port on the keyboard or dock it with the laptop to charge via the built-in connector.
Sources: https://tenant.sharepoint.com/sample/Document%20Library%201/2.pdf;https://tenant.sharepoint.com/sample/Document%20Library%201/3.pdf

Evaluation Dataset Results.csv

Name: Asus Zenbook Duo 2024
Test Type: Response Match
Test Utterance: How can I charge the removable keyboard of the Asus Zenbook Duo 2024?
Expected Response: To charge the removable keyboard of the Asus Zenbook Duo 2024 insert the bundled power adapter into the USB-C port on the keyboard or dock it with the laptop to charge via the built-in connector.
Sources: https://tenant.sharepoint.com/sample/Document%20Library%201/2.pdf;https://tenant.sharepoint.com/sample/Document%20Library%201/3.pdf
Agent Response: 
To charge the removable keyboard of the Asus Zenbook Duo 2024 you can follow these steps:
1. Connect the AC power cord to the AC/DC adapter.
2. Connect the DC power connector into your keyboard’s power (DC) input port.
3. Plug the AC power adapter into a 100V~240V power source [1].
[1]: https://tenant.sharepoint.com/sample/Document%20Library%201/2.pdf
Answer Score: 65
Sources Score: 1/2

Explanation

This RAG agent was created by adding a SharePoint knowledge source to the agent. The evaluator client asks the agent to answer the question provided in the "Test Utterance" field and compares the answer with the "Expected Response" answer.

The evaluator assigns a score between 0 and 100 based on how well the agent's answer matches the expected response semantically, with 0 being the worst and 100 a perfect answer. The value is stored in the "Answer Score" field. The "Sources" field provides the 2 URLs the agent should have used to answer the question.

The evaluator client checks whether the agent used the provided URLs to answer the question by returning them as reference links. It reports the result as x/y, where x is the number of the expected links provided by the agent and y is the number of links expected. In the sample above, this value appears in the "Sources Score" field.
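The x/y source check described above can be sketched as follows. This is a minimal Python illustration, not the PR's C# implementation: the function names `split_sources` and `sources_score` are hypothetical, and it assumes that x counts the expected URLs that appear verbatim in the agent's response and that the "Sources" field separates URLs with semicolons, as in the sample dataset.

```python
def split_sources(field: str) -> list[str]:
    """Split the semicolon-separated "Sources" field into individual URLs."""
    return [url.strip() for url in field.split(";") if url.strip()]


def sources_score(expected_urls: list[str], agent_response: str) -> str:
    """Return "x/y": x expected URLs found in the response, y URLs expected.

    Assumption: a URL counts as used when it appears verbatim in the
    response text (for example in a "[1]: <url>" reference line).
    """
    found = sum(1 for url in expected_urls if url in agent_response)
    return f"{found}/{len(expected_urls)}"


if __name__ == "__main__":
    base = "https://tenant.sharepoint.com/sample/Document%20Library%201/"
    sources_field = f"{base}2.pdf;{base}3.pdf"
    response = (
        "3. Plug the AC power adapter into a 100V~240V power source [1].\n"
        f"[1]: {base}2.pdf"
    )
    print(sources_score(split_sources(sources_field), response))  # prints "1/2"
```

With the sample row above, only 2.pdf is cited by the agent, so the sketch yields the same "1/2" shown in the results file.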

svandenhoven

The EvalClient app runs when isolated from the other samples. When the .sln of the whole solution is opened, it gives this error:

Projects that use central package version management should not define the version on the PackageReference items but on the PackageVersion items: Azure.Identity;Microsoft.Extensions.Configuration.UserSecrets;Microsoft.Extensions.AI.OpenAI;Microsoft.Agents.Client;Microsoft.Extensions.Hosting;Microsoft.Identity.Client;CsvHelper;Microsoft.Agents.CopilotStudio.Client;Microsoft.Identity.Client.Extensions.Msal;Azure.AI.OpenAI;Microsoft.Agents.Authentication.
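To illustrate the condition this error describes, here is a small Python sketch that lists the `PackageReference` items in a project file that still carry a `Version` attribute. The helper name `versioned_package_references` is hypothetical and not part of the PR; it only handles the attribute form of `Version`, and the version number in the usage example is a placeholder.

```python
import xml.etree.ElementTree as ET


def versioned_package_references(csproj_xml: str) -> list[str]:
    """Return package names whose PackageReference defines a Version attribute.

    Under central package version management, versions are expected on
    PackageVersion items instead, so each name returned here would be
    reported by the error above.
    """
    root = ET.fromstring(csproj_xml)
    names = []
    for element in root.iter():
        # Drop an XML namespace prefix if the project file declares one.
        tag = element.tag.rsplit("}", 1)[-1]
        if tag == "PackageReference" and "Version" in element.attrib:
            names.append(element.attrib.get("Include", ""))
    return names


if __name__ == "__main__":
    # "1.0.0" is a placeholder version used only for this illustration.
    sample = """
    <Project Sdk="Microsoft.NET.Sdk">
      <ItemGroup>
        <PackageReference Include="CsvHelper" Version="1.0.0" />
        <PackageReference Include="Azure.Identity" />
      </ItemGroup>
    </Project>
    """
    print(versioned_package_references(sample))  # prints ['CsvHelper']
```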

src/samples/EvalClient/EvaluationService.cs

src/samples/EvalClient/Program.cs

src/samples/EvalClient/EvalClient.csproj

tracyboehrer · 2024-12-19T14:06:30Z

@TechPreacher This is great! We will take some action on this after the holidays. Long term, before the GA, we will handle samples differently. Namely using released package versions instead of being part of the SDK solution and using project references. We aren't quite ready to make that change, so what you are doing is the correct thing for now.

TechPreacher · 2024-12-19T15:20:39Z

The EvalClient app runs when isolated from the other samples. When the .sln of the whole solution is opened, it gives this error:

Projects that use central package version management should not define the version on the PackageReference items but on the PackageVersion items: Azure.Identity;Microsoft.Extensions.Configuration.UserSecrets;Microsoft.Extensions.AI.OpenAI;Microsoft.Agents.Client;Microsoft.Extensions.Hosting;Microsoft.Identity.Client;CsvHelper;Microsoft.Agents.CopilotStudio.Client;Microsoft.Identity.Client.Extensions.Msal;Azure.AI.OpenAI;Microsoft.Agents.Authentication.

This is fixed now. Thanks for noticing!

TechPreacher · 2024-12-19T15:22:27Z

@TechPreacher This is great! We will take some action on this after the holidays. Long term, before the GA, we will handle samples differently. Namely using released package versions instead of being part of the SDK solution and using project references. We aren't quite ready to make that change, so what you are doing is the correct thing for now.

Thanks @tracyboehrer ! You can always reach me internally at "saschac".

.gitignore

MattB-msft · 2025-01-15T20:11:20Z

@TechPreacher
Please rebase your fork on the main repo to pick up the changes to the build validation logic.
thanks

TechPreacher · 2025-01-16T08:51:21Z

@TechPreacher Please rebase your fork on the main repo to pick up the changes to the build validation logic. thanks

Done. Tests now pass. :)

svandenhoven

Have tested the solution and it works. My previous remarks have been resolved.

There are some warnings in the code. It might be good to review them.

svandenhoven suggested changes Dec 18, 2024

TechPreacher force-pushed the feature/saschac-add-evaluation-client branch from 331e238 to 8dbe42b on January 15, 2025 10:02

tracyboehrer reviewed Jan 15, 2025

.gitignore: review thread resolved

tracyboehrer approved these changes Jan 15, 2025

TechPreacher requested a review from a team as a code owner January 16, 2025 08:01

github-actions bot added the ML: Samples, ML: Core, ML: Tests, and From Fork labels Jan 16, 2025

TechPreacher force-pushed the feature/saschac-add-evaluation-client branch 2 times, most recently from d015f64 to 155723c on January 16, 2025 08:07

github-actions bot added the ML: Samples label and removed the ML: Samples, ML: Core, and ML: Tests labels Jan 16, 2025

TechPreacher closed this Jan 16, 2025

TechPreacher reopened this Jan 16, 2025

svandenhoven approved these changes Jan 16, 2025

TechPreacher force-pushed the feature/saschac-add-evaluation-client branch from d125a12 to 17e5cdc on January 16, 2025 18:16

Squash commits

59b5346

TechPreacher force-pushed the feature/saschac-add-evaluation-client branch from 17e5cdc to 59b5346 on January 16, 2025 18:17

tracyboehrer merged commit 4100d31 into microsoft:main Jan 16, 2025
4 checks passed
