Integration tests for input detection cases in `/api/v1/task/server-streaming-classification-with-text-generation` #312

mdevino · 2025-02-24T22:11:31Z

This addresses #299.

The list of implemented test cases can be found in "Integration tests for input detections in /api/v1/task/server-streaming-classification-with-text-generation [NLP]" #299; they are implemented in tests/streaming_classification_with_gen_nlp.rs. In addition, tests for isolated chunker and generation calls are present in tests/chunker.rs and tests/generation_nlp.rs, respectively.
As "Integration tests for detection content endpoint" #298 became outdated and had not been merged, I closed it in favor of this PR. The tests for it are present in tests/detection_content.rs.
Created tests/common to share reusable code among test crates. The structure is as follows:
- tests/common/chunker.rs hosts code associated to chunkers.
- tests/common/detectors.rs hosts code associated to detectors.
- tests/common/errors.rs currently contains structs representing errors returned from detectors and orchestrator. The idea behind having these classes instead of using already existing ones under /src is to make API changes more evident (as tests would fail due to deserialization issues in case of API changes).
- tests/common/orchestrator.rs hosts code associated to the orchestrator, and also contains a test server for it.
- tests/common/mod.rs declares the submodules and hosts code that is not associated to any of the above.
Updated orchestrator test config file:
- Chunkers: added sentence_chunker
- Detectors: added angle_brackets detector with two configs, one associated with whole_doc_chunker and one associated with sentence_chunker.
Dependency changes:
- Updated dependencies to a later version (latest at the time of the commit).
- Added mocktail and test-log as dev-dependencies, and removed tracing-test.

gkumbhat

Couple of high level points:

Can you please add more details in the PR description. The current description doesn't seem to be matching all the changes proposed in this PR
Can we move all the integration tests to tests/integration that way they don't get mixed with unit tests

Cargo.toml

tests/canary_test.rs

tests/chunker.rs

mdevino · 2025-02-25T16:24:35Z

Can we move all the integration tests to tests/integration that way they don't get mixed with unit tests

@gkumbhat I could, but don't we implement unit tests in the appropriate files in /src?

mdevino · 2025-02-25T16:48:03Z

Can you please add more details in the PR description. The current description doesn't seem to be matching all the changes proposed in this PR

@gkumbhat Done. Let me know if any further changes are required.

declark1 · 2025-02-25T16:57:50Z

Can we move all the integration tests to tests/integration that way they don't get mixed with unit tests

@gkumbhat I could, but don't we implement unit tests in the appropriate files in /src?

FYI - in Rust /tests is for integration tests only (as it's external to /src), while unit tests are placed in modules under /src
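To illustrate the split described above, here is a minimal sketch that is not taken from this repository. The function `count_sentences` and the file layout in the comments are hypothetical; the only point is where each kind of test lives.

```rust
// src/lib.rs of a hypothetical crate: the unit test sits next to the code it tests.

/// Counts sentence-ending punctuation marks; a toy stand-in for a chunker.
pub fn count_sentences(text: &str) -> usize {
    text.chars().filter(|c| matches!(c, '.' | '!' | '?')).count()
}

#[cfg(test)]
mod tests {
    use super::*;

    // Unit test: compiled only for `cargo test`, and it can reach private items.
    #[test]
    fn counts_three_sentences() {
        assert_eq!(count_sentences("Hi there! how are you? I am great!"), 3);
    }
}

// A file such as tests/chunker.rs is compiled as its own crate, so it can only
// use the public API of the library (here, `count_sentences`).
```

Because each file under tests/ is a separate crate, shared helpers usually go in a submodule such as tests/common/mod.rs, which matches the layout proposed in this PR.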

declark1 · 2025-02-25T17:02:48Z

tests/common/errors.rs currently contains structs representing errors returned from detectors and orchestrator. The idea behind having these classes instead of using already existing ones under /src is to make API changes more evident (as tests would fail due to deserialization issues in case of API changes).

I suggest that we just use anyhow::Error for tests, which all errors can be converted into similar to Box<dyn std::error::Error>

mdevino · 2025-02-25T17:10:43Z

tests/common/errors.rs currently contains structs representing errors returned from detectors and orchestrator. The idea behind having these classes instead of using already existing ones under /src is to make API changes more evident (as tests would fail due to deserialization issues in case of API changes).

I suggest that we just use anyhow::Error for tests, which all errors can be converted into similar to Box<dyn std::error::Error>

@declark1 Would this enable accessing fields inside the error? I ask because I use OrchestratorError.code and .details for assertions.

mdevino · 2025-02-25T17:12:53Z

tests/common/orchestrator.rs

+pub struct TestOrchestratorServer {
+    base_url: Url,
+    health_url: Url,
+    client: reqwest::Client,
+    _handle: JoinHandle<Result<(), anyhow::Error>>,
+}


@declark1 I remember you suggesting some kind of builder pattern for the orchestrator config (so we could ditch the test configuration file). Could we address this on a separate issue?

declark1 · 2025-02-25T18:05:54Z

tests/common/errors.rs currently contains structs representing errors returned from detectors and orchestrator. The idea behind having these classes instead of using already existing ones under /src is to make API changes more evident (as tests would fail due to deserialization issues in case of API changes).

I suggest that we just use anyhow::Error for tests, which all errors can be converted into similar to Box<dyn std::error::Error>

@declark1 Would this enable accessing fields inside the error? I ask because I use OrchestratorError.code and .details for assertions.

Yea true, I forgot that you need to assert the specific error code. Can't you just use the existing server::Error and client::Error types?

mdevino · 2025-02-25T21:06:00Z

tests/common/errors.rs currently contains structs representing errors returned from detectors and orchestrator. The idea behind having these classes instead of using already existing ones under /src is to make API changes more evident (as tests would fail due to deserialization issues in case of API changes).

I suggest that we just use anyhow::Error for tests, which all errors can be converted into similar to Box<dyn std::error::Error>

@declark1 Would this enable accessing fields inside the error? I ask because I use OrchestratorError.code and .details for assertions.

Yea true, I forgot that you need to assert the specific error code. Can't you just use the existing server::Error and client::Error types?

I think I tried that, but needed something that implements serde's Serialize/Deserialize traits to parse the responses for assertion. As I wasn't sure if adding these traits to server::Error could have any side-effects, I opted for creating a separate struct altogether.

gkumbhat · 2025-02-26T18:58:20Z

tests/chunker.rs

+async fn test_isolated_chunker_unary_call() -> Result<(), anyhow::Error> {
+    // Add detector mock
+    let chunker_id = "sentence_chunker";
+    let input_test = "Hi there! how are you? I am great!";


nit:

Suggested change

let input_test = "Hi there! how are you? I am great!";

let input_text = "Hi there! how are you? I am great!";

tests/common/orchestrator.rs

gkumbhat · 2025-02-26T19:12:54Z

tests/detection_content.rs

General question, can we move to a paradigm (in next iteration) to start the servers once and later on run tests, that way there is no start and close of server at every test. This would simplify the tests

It could be done, but I would advise against it as it would introduce the chance of tests interfering with each other.

right, but starting a new server and tearing it down for each test could slow down the test execution due to the overhead.

Additionally, this is adding a lot of boilerplate to each test.

Could you point at what you're calling boilerplate?

Most lines in a test body are related to setting up mock requests/responses. I don't see this going away with running a single orchestrator instance.

In addition, the orchestrator configuration would need to account for all tests. This adds the complexity of having to know all existing mocks to add a new test, otherwise it could impact existing ones.

I strongly advise against reusing the orchestrator server.
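For context on the isolation argument, one common way to keep per-test servers from colliding is to bind each one to port 0 so the OS assigns a free port. The sketch below uses only the standard library; `bind_ephemeral` is a hypothetical helper and is not claimed to be how TestOrchestratorServer or the mocks in this PR are implemented.

```rust
use std::net::{SocketAddr, TcpListener};

/// Binds to port 0 so the OS picks a free port.
/// Each test that calls this gets its own listener and its own address.
pub fn bind_ephemeral() -> std::io::Result<(TcpListener, SocketAddr)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    // `local_addr` reports the port that was actually assigned.
    let addr = listener.local_addr()?;
    Ok((listener, addr))
}
```

With this approach, tests can run in parallel without sharing a server or a fixed port, at the cost of the per-test startup overhead raised above.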

tests/test.config.yaml

tests/generation_nlp.rs

gkumbhat · 2025-02-26T21:31:33Z

tests/streaming_classification_with_gen_nlp.rs

+        .send()
+        .await?;
+
+    // // Example showing how to create an event stream from a bytes stream.


nit: Is this example here to show how to write a test involving an event stream? In that case, can we move it to the top and make it clearer?

Since each test is lengthy, it would be good for it not to take up extra lines with such examples. Additionally, it would not be clear why this code is commented out.

I've used this piece of code to troubleshoot a few test cases. I've moved it to the top of the file with a suggestion on how to use it.

tests/streaming_classification_with_gen_nlp.rs

gkumbhat · 2025-02-26T21:33:43Z

tests/streaming_classification_with_gen_nlp.rs

+    assert!(messages.len() == 3);
+    assert!(messages[0].generated_text == Some("I".into()));
+    assert!(messages[1].generated_text == Some(" am".into()));
+    assert!(messages[2].generated_text == Some(" great!".into()));


can we also assert / check that the detection block is not available, showing "no detections"
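A rough sketch of what such a check could look like follows. `StreamedMessage` and its `input_detections` field are hypothetical stand-ins, not the orchestrator's actual response type, and the helper uses `assert_eq!` so that a failure prints both values.

```rust
/// Hypothetical, simplified stand-in for one streamed response message.
#[derive(Debug, PartialEq)]
pub struct StreamedMessage {
    pub generated_text: Option<String>,
    pub input_detections: Option<Vec<String>>,
}

/// Checks the generated text in order and that no message carries detections.
pub fn assert_text_without_detections(messages: &[StreamedMessage], expected: &[&str]) {
    assert_eq!(messages.len(), expected.len());
    for (message, text) in messages.iter().zip(expected) {
        assert_eq!(message.generated_text.as_deref(), Some(*text));
        assert!(
            message.input_detections.is_none(),
            "unexpected detections: {:?}",
            message
        );
    }
}
```

A call such as `assert_text_without_detections(&messages, &["I", " am", " great!"])` would then cover both the text and the "no detections" expectation in one place.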

tests/streaming_classification_with_gen_nlp.rs

gkumbhat · 2025-02-26T21:59:40Z

I could, but don't we implement unit tests in the appropriate files in /src?

@mdevino you are right. I changed the repos and missed that the unit tests are embedded in this one.

gkumbhat · 2025-03-03T21:13:47Z

tests/common/orchestrator.rs

+use bytes::Bytes;
+use eventsource_stream::{EventStream, Eventsource};
+use fms_guardrails_orchestr8::{config::OrchestratorConfig, orchestrator::Orchestrator};
+use futures::{stream::BoxStream, Stream, StreamExt};


nit: StreamExt isn't getting used ?

It is being used on line 205 (boxed()). My understanding is that clippy would complain if it weren't being used.

Hmm, that's true, clippy should complain. But I'm not sure what I'm missing, since I don't see a reference to StreamExt on line 205.

https://github.com/foundation-model-stack/fms-guardrails-orchestrator/pull/312/files#diff-1a9212e8f97b31a5e45cc8b28ebcbe726993229a7c05f9f844e266e9b360cb5aR205

Also did a Ctrl+F
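A possible explanation for the confusion: `boxed()` is a method provided by the `StreamExt` extension trait, so the import is needed for the call to compile even though the trait name never appears at the call site, which is why a text search finds nothing. The standard-library-only sketch below reproduces the same effect with a made-up trait, `BoxedExt`, defined for iterators instead of streams.

```rust
mod ext {
    /// Extension trait in the style of `futures::StreamExt`: it adds a method
    /// to every iterator through a blanket impl.
    pub trait BoxedExt: Iterator + Sized + 'static {
        fn boxed_iter(self) -> Box<dyn Iterator<Item = Self::Item>> {
            Box::new(self)
        }
    }

    impl<I: Iterator + 'static> BoxedExt for I {}
}

// Without this import, `.boxed_iter()` below does not resolve, even though
// the name `BoxedExt` is never written again in this file.
use ext::BoxedExt;

pub fn boxed_numbers() -> Box<dyn Iterator<Item = u32>> {
    (1u32..=3).boxed_iter()
}
```

Removing the `use` line makes the compiler report that the method is not found and suggest importing the trait, which is the same behaviour expected for `StreamExt` and `boxed()`.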

gkumbhat · 2025-03-03T22:40:20Z

tests/generation_nlp.rs

+
+/// Asserts that the NlpClient correctly invokes the streaming endpoint.
+#[test(tokio::test)]
+async fn test_nlp_streaming_call() -> Result<(), anyhow::Error> {


just noticing that while the tests for detectors above are at orchestrator level, i.e we are starting up an orchestrator server and the expected response being compared is also orchestrator level output, but over here, we are comparing with the client's response. So these tests are essentially testing the client code itself and not orchestrator code.

I see that you have mentioned in the PR description that these are essentially the "isolated" ones, but a couple of points / questions:

are those tests coming in future ones?

This seems a bit confusing that we have in the same folder and with similar naming test files, we have some tests testing orchestrator level API integration and others testing internal client level code.

I can remove the client tests. Their main intent was to make sure I was mocking the calls correctly.

gkumbhat · 2025-03-03T22:43:51Z

tests/streaming_classification_with_gen_nlp.rs

+            }
+    );
+    assert!(messages[0].input_token_count == mock_tokenization_response.token_count as u32);
+    assert!(messages[0].warnings == Some(vec![DetectionWarning{ id: Some(fms_guardrails_orchestr8::models::DetectionWarningReason::UnsuitableInput), message: Some("Unsuitable input detected. Please check the detected entities on your input and try again with the unsuitable input removed.".into()) }]));


nit: formatting-wise, this seems a bit odd; maybe we should split it into 2 lines for better readability.

I've broken this into two lines, but shouldn't this be configured in the linter/formatter?

gkumbhat · 2025-03-03T22:45:03Z

tests/test_config.yaml

+      hostname: localhost
+    type: sentence
+detectors:
+  test_detector:


where is this one getting used ?

This detector was already there when I started writing tests. I've removed that and tests keep passing, so I'll push the change to remove it.

gkumbhat · 2025-03-03T22:48:13Z

tests/streaming_classification_with_gen_nlp.rs

+                contents: vec!["This should return a 500".into()],
+                detector_params: DetectorParams::new(),
+            }),
+            MockResponse::json(&expected_detector_error).with_code(StatusCode::NOT_FOUND),


The error code here doesn't match the error code in the body, i.e. 404 (not found) vs 500. This shows that we give precedence to the body over the header of the request, which is fine, but generally these 2 codes should be the same, so this test looks odd.

mdevino requested review from gkumbhat, evaline-ju and declark1 as code owners February 24, 2025 22:11

mdevino mentioned this pull request Feb 25, 2025

Integration tests for detection content endpoint #298

Closed

gkumbhat reviewed Feb 25, 2025

gkumbhat reviewed Feb 26, 2025

gkumbhat reviewed Mar 3, 2025

mdevino added 14 commits March 6, 2025 09:16

Refactor common integration test code

8080a70

Signed-off-by: Mateus Devino <mdevino@ibm.com>

/detection/content base test case

a895290

Signed-off-by: Mateus Devino <mdevino@ibm.com>

Replace json macros with strong types

ee1440d

Signed-off-by: Mateus Devino <mdevino@ibm.com>

Update test detector name to mention whole_doc_chunker

251cc47

Signed-off-by: Mateus Devino <mdevino@ibm.com>

Testing mock for chunker gRPC call

1efa6d0

Signed-off-by: Mateus Devino <mdevino@ibm.com>

Update detection_content.rs::test_single_detection() to include detec…

a4557af

…tor_id Signed-off-by: Mateus Devino <mdevino@ibm.com>

Update detection_content.rs::test_single_detection_whole_doc() mocks …

520836a

…to use random ports Signed-off-by: Mateus Devino <mdevino@ibm.com>

Make test_single_detection_whole_doc() more meaningful

658d4b7

Signed-off-by: Mateus Devino <mdevino@ibm.com>

Add test case: detection_content.rs::test_single_detection_sentence_c…

149bd52

…hunker Signed-off-by: Mateus Devino <mdevino@ibm.com>

Move specific code back to canary_test.rs

32fc513

Signed-off-by: Mateus Devino <mdevino@ibm.com>

refactor: extract constants

db69cce

Signed-off-by: Mateus Devino <mdevino@ibm.com>

refactor: move grpc server macros to tests common module

6b771e4

Signed-off-by: Mateus Devino <mdevino@ibm.com>

Add copyright notice to common test module

a0a14b3

Signed-off-by: Mateus Devino <mdevino@ibm.com>

refactor: create function for orchestrator config

0087e0e

Signed-off-by: Mateus Devino <mdevino@ibm.com>

mdevino and others added 28 commits March 6, 2025 09:24

test case: test_input_detector_sentence_chunker_no_detections

8ee9690

Signed-off-by: Mateus Devino <mdevino@ibm.com>

test case: test_input_detector_returns_404

d4d2ec0

Signed-off-by: Mateus Devino <mdevino@ibm.com>

test case: test_input_detector_returns_503

862184d

Signed-off-by: Mateus Devino <mdevino@ibm.com>

test case: test_input_detector_returns_non_compliant_message

c121aa2

Signed-off-by: Mateus Devino <mdevino@ibm.com>

refactor: move ensure_global_rustls_state() call into TestOrchestrato…

ed19f8a

…rServer::run() Signed-off-by: Mateus Devino <mdevino@ibm.com>

test case: test_input_detector_whole_doc_with_detections()

f84d809

Signed-off-by: Mateus Devino <mdevino@ibm.com>

test case: test_input_detector_sentence_chunker_no_detections()

fbb8dbf

Signed-off-by: Mateus Devino <mdevino@ibm.com>

test case: test_input_detector_sentence_chunker_with_detections()

9d004c6

Signed-off-by: Mateus Devino <mdevino@ibm.com>

refactor: rename/move integration testing constants

ea50ec3

Signed-off-by: Mateus Devino <mdevino@ibm.com>

test case: test_input_detector_returns_an_error()

5995ff3

Signed-off-by: Mateus Devino <mdevino@ibm.com>

test case: test_generation_server_returns_an_error()

f521e0c

Signed-off-by: Mateus Devino <mdevino@ibm.com>

test case: test_orchestrator_receives_a_non_compliant_request()

bfa759c

Signed-off-by: Mateus Devino <mdevino@ibm.com>

Add grpc_dns_probe_interval tests

7a2981e

Signed-off-by: Mateus Devino <mdevino@ibm.com>

Rename streaming tests file

865fbaa

Signed-off-by: Mateus Devino <mdevino@ibm.com>

Create test-specific header name constants

8de734f

Signed-off-by: Mateus Devino <mdevino@ibm.com>

Undo src/server.rs changes

b5b5927

Signed-off-by: Mateus Devino <mdevino@ibm.com>

Fix test_input_detector_returns_503() detector mock response status

df3b946

Signed-off-by: Mateus Devino <mdevino@ibm.com>

refactor: move input to variable in tests/chunker.rs

256d789

Signed-off-by: Mateus Devino <mdevino@ibm.com>

Document chunker::test_isolated_chunker_unary_call()

bc9f167

Signed-off-by: Mateus Devino <mdevino@ibm.com>

Rename tests/test.config.yaml

ce0d5b7

Signed-off-by: Mateus Devino <mdevino@ibm.com>

Document test cases on comments

c89765b

Signed-off-by: Mateus Devino <mdevino@ibm.com>

Add assertions to make sure no detections are returned

07924dc

Signed-off-by: Mateus Devino <mdevino@ibm.com>

Update mocktail to 0.1.2-alpha

b7a13ef

Signed-off-by: Mateus Devino <mdevino@ibm.com>

Fix comments

c0e0522

Signed-off-by: Mateus Devino <mdevino@ibm.com>

Remove client tests

644437e

Signed-off-by: Mateus Devino <mateus@mdevino.com>

Remove test_detector from test config

4d968de

Signed-off-by: Mateus Devino <mateus@mdevino.com>

Fix test_input_detector_returns_500() status code

8c21edf

Signed-off-by: Mateus Devino <mateus@mdevino.com>

Extract warning message as constant

36f2489

Signed-off-by: Mateus Devino <mateus@mdevino.com>

mdevino force-pushed the streaming_classification_with_gen_integration_testing branch from 640d35b to 36f2489 Compare March 6, 2025 12:24

Fix mocktail dependency on Cargo.toml

f4cbe47

Signed-off-by: Mateus Devino <mdevino@ibm.com>
