Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration tests for input detection cases in /api/v1/task/server-streaming-classification-with-text-generation #312

Open
wants to merge 64 commits into
base: main
Choose a base branch
from

Conversation

mdevino
Copy link
Collaborator

@mdevino mdevino commented Feb 24, 2025

This addreses #299.

  • The list of implemented test cases can be found on Integration tests for input detections in /api/v1/task/server-streaming-classification-with-text-generation [NLP] #299 and were implemented on tests/streaming_classification_with_gen.nlp. Besides, tests to isolated chunker and generation calls are present in tests/chunker.rs and tests/generation_nlp.rs, respectively.
  • As Integration tests for detection content endpoint #298 got outdated and wasn't merged until now, I closed it and favor of this PR. The tests for it are present in tests/detection_content.rs
  • Created tests/common to shared reusable code among test crates. The structure is as follows:
    • tests/common/chunker.rs hosts code associated to chunkers.
    • tests/common/detectors.rs hosts code associated to detectors.
    • tests/common/errors.rs currently contains structs representing errors returned from detectors and orchestrator. The idea behind having these classes instead of using already existing ones under /src is to make API changes more evident (as tests would fail due to deserialization issues in case of API changes).
    • tests/common/orchestrator.rs hosts code associated to the orchestrator, and also contains a test server for it.
    • tests/common/mod.rs declares the submodules and hosts code that is not associated to any of the above.
  • Updated orchestrator test config file:
    • Chunkers: added sentence_chunker
    • Detectors: added angle_brackets detector with two configs - one associed to whole_doc_chunker and one associated to sentence_chunker.
  • Dependency changes:
    • Updated dependencies to a later version (latest at the time of the commit).
    • Added mocktail and test-log as dev-dependencies, and removed tracing-test.

Copy link
Collaborator

@gkumbhat gkumbhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of high level points:

  1. Can you please add more details in the PR description. The current description doesn't seem to be matching all the changes proposed in this PR
  2. Can we move all the integration tests to tests/integration that way they don't get mixed with unit tests

@mdevino
Copy link
Collaborator Author

mdevino commented Feb 25, 2025

Can we move all the integration tests to tests/integration that way they don't get mixed with unit tests

@gkumbhat I could, but don't we implement unit tests in the appropriate files in /src?

@mdevino
Copy link
Collaborator Author

mdevino commented Feb 25, 2025

Can you please add more details in the PR description. The current description doesn't seem to be matching all the changes proposed in this PR

@gkumbhat Done. Let me know if any further changes are required.

@declark1
Copy link
Collaborator

Can we move all the integration tests to tests/integration that way they don't get mixed with unit tests

@gkumbhat I could, but don't we implement unit tests in the appropriate files in /src?

FYI - in Rust /tests is for integration tests only (as it's external to /src), while unit tests are placed in modules under /src

@declark1
Copy link
Collaborator

tests/common/errors.rs currently contains structs representing errors returned from detectors and orchestrator. The idea behind having these classes instead of using already existing ones under /src is to make API changes more evident (as tests would fail due to deserialization issues in case of API changes).

I suggest that we just use anyhow::Error for tests, which all errors can be converted into similar to Box<dyn std::error::Error>

@mdevino
Copy link
Collaborator Author

mdevino commented Feb 25, 2025

tests/common/errors.rs currently contains structs representing errors returned from detectors and orchestrator. The idea behind having these classes instead of using already existing ones under /src is to make API changes more evident (as tests would fail due to deserialization issues in case of API changes).

I suggest that we just use anyhow::Error for tests, which all errors can be converted into similar to Box<dyn std::error::Error>

@declark1 Would this enable accessing fields inside the error? I ask because I use OrchestratorError.code and .details for assertions.

Comment on lines +55 to +59
pub struct TestOrchestratorServer {
base_url: Url,
health_url: Url,
client: reqwest::Client,
_handle: JoinHandle<Result<(), anyhow::Error>>,
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@declark1 I remember you suggesting some kind of builder pattern for the orchestrator config (so we could ditch the test configuration file). Could we address this on a separate issue?

@declark1
Copy link
Collaborator

tests/common/errors.rs currently contains structs representing errors returned from detectors and orchestrator. The idea behind having these classes instead of using already existing ones under /src is to make API changes more evident (as tests would fail due to deserialization issues in case of API changes).

I suggest that we just use anyhow::Error for tests, which all errors can be converted into similar to Box<dyn std::error::Error>

@declark1 Would this enable accessing fields inside the error? I ask because I use OrchestratorError.code and .details for assertions.

Yea true, I forgot that you need to assert the specific error code. Can't you just use the existing server::Error and client::Error types?

@mdevino
Copy link
Collaborator Author

mdevino commented Feb 25, 2025

tests/common/errors.rs currently contains structs representing errors returned from detectors and orchestrator. The idea behind having these classes instead of using already existing ones under /src is to make API changes more evident (as tests would fail due to deserialization issues in case of API changes).

I suggest that we just use anyhow::Error for tests, which all errors can be converted into similar to Box<dyn std::error::Error>

@declark1 Would this enable accessing fields inside the error? I ask because I use OrchestratorError.code and .details for assertions.

Yea true, I forgot that you need to assert the specific error code. Can't you just use the existing server::Error and client::Error types?

I think I tried that, but needed something that implements serde's Serialize/Deserialize traits to parse the responses for assertion. As I wasn't sure if adding these traits to server::Error could have any side-effects, I opted for creating a separate struct altogether.

tests/chunker.rs Outdated
async fn test_isolated_chunker_unary_call() -> Result<(), anyhow::Error> {
// Add detector mock
let chunker_id = "sentence_chunker";
let input_test = "Hi there! how are you? I am great!";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
let input_test = "Hi there! how are you? I am great!";
let input_text = "Hi there! how are you? I am great!";

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question, can we move to a paradigm (in next iteration) to start the servers once and later on run tests, that way there is no start and close of server at every test. This would simplify the tests

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could but done, but I would advise against it as it would introduce the chance of tests interfering with each other.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, but starting a new server and tearing it down for each test could slow down the test execution due to the overhead.

Additionally, this is adding lot of boiler plate in each test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you point at what you're calling boilerplate?

Most lines in a text body are related to setting mock request/responses. I don't see this going away with running a single orchestrator instance.

In addition, the orchestrator configuration would need to account for all tests. This adds the complexity of having to know all existing mocks to add a new test, otherwise it could impact existing ones.

I strongly advise against reusing the orchestrator server.

.send()
.await?;

// // Example showing how to create an event stream from a bytes stream.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is this example here to show how to write even stream involving test? In that case, can we move this at the top and make it more clearer?

Since each test is lengthy, it would be good for it not take up extra lines with such examples. Additionally, it would be not clear why this code is commented out

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used this piece of code to troubleshoot a few test cases. I've moved it to the top of the file with a suggestion on how to use it.

assert!(messages.len() == 3);
assert!(messages[0].generated_text == Some("I".into()));
assert!(messages[1].generated_text == Some(" am".into()));
assert!(messages[2].generated_text == Some(" great!".into()));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also assert / check that the detection block is not available, showing "no detections"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@gkumbhat
Copy link
Collaborator

gkumbhat commented Feb 26, 2025

I could, but don't we implement unit tests in the appropriate files in /src?

@mdevino you are right. I changed the repos and missed that the unit tests are embedded in this one.

use bytes::Bytes;
use eventsource_stream::{EventStream, Eventsource};
use fms_guardrails_orchestr8::{config::OrchestratorConfig, orchestrator::Orchestrator};
use futures::{stream::BoxStream, Stream, StreamExt};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: StreamExt isn't getting used ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is being used on line 205 (boxed()). My understanding is that clippy would complain if it weren't being used.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. thats true clippy should complain. But not sure, where am missing, since I don't see reference to StreamExt on 205.

https://github.com/foundation-model-stack/fms-guardrails-orchestrator/pull/312/files#diff-1a9212e8f97b31a5e45cc8b28ebcbe726993229a7c05f9f844e266e9b360cb5aR205

Also did a Ctrl+F


/// Asserts that the NlpClient correctly invokes the streaming endpoint.
#[test(tokio::test)]
async fn test_nlp_streaming_call() -> Result<(), anyhow::Error> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just noticing that while the tests for detectors above are at orchestrator level, i.e we are starting up an orchestrator server and the expected response being compared is also orchestrator level output, but over here, we are comparing with the client's response. So these tests are essentially testing the client code itself and not orchestrator code.

I see that you have mentioned in the PR description that these are essentially the "isolated" ones, but a couple of points / questions:

  1. are those tests coming in future ones?
  2. This seems a bit confusing that we have in the same folder and with similar naming test files, we have some tests testing orchestrator level API integration and others testing internal client level code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove the client tests. Their main intent was to make sure I was mocking the calls correctly.

}
);
assert!(messages[0].input_token_count == mock_tokenization_response.token_count as u32);
assert!(messages[0].warnings == Some(vec![DetectionWarning{ id: Some(fms_guardrails_orchestr8::models::DetectionWarningReason::UnsuitableInput), message: Some("Unsuitable input detected. Please check the detected entities on your input and try again with the unsuitable input removed.".into()) }]));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: formatting wise, this seems a bit odd, may be we should split these into 2 lines for better readability.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've broken this into two lines, but shouldn't this be configured in the linter/formatter?

hostname: localhost
type: sentence
detectors:
test_detector:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this one getting used ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This detector was already there when I started writing tests. I've removed that and tests keep passing, so I'll push the change to remove it.

contents: vec!["This should return a 500".into()],
detector_params: DetectorParams::new(),
}),
MockResponse::json(&expected_detector_error).with_code(StatusCode::NOT_FOUND),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the error code here doesn't match the error code in the body, i.e 404 (not found) vs 500. This is pointing out that we give precedence to body than the header of the request, which is fine, but generally these 2 codes should be same, so this test is looking odd

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

mdevino added 14 commits March 6, 2025 09:16
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
…tor_id

Signed-off-by: Mateus Devino <mdevino@ibm.com>
…to use random ports

Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
…hunker

Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
mdevino and others added 28 commits March 6, 2025 09:24
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
…rServer::run()

Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Signed-off-by: Mateus Devino <mateus@mdevino.com>
Signed-off-by: Mateus Devino <mateus@mdevino.com>
Signed-off-by: Mateus Devino <mateus@mdevino.com>
Signed-off-by: Mateus Devino <mateus@mdevino.com>
@mdevino mdevino force-pushed the streaming_classification_with_gen_integration_testing branch from 640d35b to 36f2489 Compare March 6, 2025 12:24
Signed-off-by: Mateus Devino <mdevino@ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants