Integration tests for input detection cases in /api/v1/task/server-streaming-classification-with-text-generation
#312
base: main
Conversation
Couple of high-level points:
- Can you please add more details to the PR description? The current description doesn't seem to match all the changes proposed in this PR.
- Can we move all the integration tests to tests/integration? That way they don't get mixed with unit tests.
@gkumbhat I could, but don't we implement unit tests in the appropriate files in /src?
@gkumbhat Done. Let me know if any further changes are required.
FYI - in Rust
I suggest that we just use
@declark1 Would this enable accessing fields inside the error? I ask because I use
pub struct TestOrchestratorServer {
    base_url: Url,
    health_url: Url,
    client: reqwest::Client,
    _handle: JoinHandle<Result<(), anyhow::Error>>,
}
@declark1 I remember you suggesting some kind of builder pattern for the orchestrator config (so we could ditch the test configuration file). Could we address this on a separate issue?
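The builder pattern mentioned above could look something like the following. This is a minimal, dependency-free sketch; `TestConfig`, `TestConfigBuilder`, and the field names are hypothetical and not the actual `OrchestratorConfig` API.

```rust
// Hypothetical sketch of a builder replacing the shared test config file.
// Each test declares only the chunkers/detectors it needs, so adding a new
// test cannot break the configuration of existing ones.
#[derive(Debug, Default, Clone, PartialEq)]
struct TestConfig {
    chunkers: Vec<String>,
    detectors: Vec<String>,
}

#[derive(Default)]
struct TestConfigBuilder {
    config: TestConfig,
}

impl TestConfigBuilder {
    fn new() -> Self {
        Self::default()
    }
    fn with_chunker(mut self, id: &str) -> Self {
        self.config.chunkers.push(id.to_string());
        self
    }
    fn with_detector(mut self, id: &str) -> Self {
        self.config.detectors.push(id.to_string());
        self
    }
    fn build(self) -> TestConfig {
        self.config
    }
}

fn main() {
    // A test would build exactly the config it needs, inline.
    let config = TestConfigBuilder::new()
        .with_chunker("sentence_chunker")
        .with_detector("angle_brackets")
        .build();
    assert_eq!(config.chunkers, vec!["sentence_chunker".to_string()]);
    println!("{config:?}");
}
```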
Yea true, I forgot that you need to assert the specific error code. Can't you just use the existing
I think I tried that, but I needed something that implements serde's Serialize/Deserialize traits to parse the responses for assertion. As I wasn't sure if adding these traits to
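The idea discussed here, a test-local mirror of the error body so assertions can reach its fields, can be sketched as below. In the real tests the struct would derive serde's `Deserialize`; this dependency-free version parses a fixed JSON shape by hand, and the `code`/`details` field names are assumptions about the error body, not the actual orchestrator schema.

```rust
// Hypothetical test-local mirror of an orchestrator/detector error body.
// Keeping it separate from the types under /src means an API change breaks
// these tests loudly (deserialization fails) instead of silently passing.
#[derive(Debug, PartialEq)]
struct OrchestratorError {
    code: u16,
    details: String,
}

// Hand-rolled parser for the fixed shape {"code":<u16>,"details":"<text>"}.
// A real test would use serde_json instead.
fn parse_error(body: &str) -> Option<OrchestratorError> {
    let code_start = body.find("\"code\":")? + "\"code\":".len();
    let code_end = body[code_start..].find(',')? + code_start;
    let code: u16 = body[code_start..code_end].trim().parse().ok()?;
    let details_start = body.find("\"details\":\"")? + "\"details\":\"".len();
    let details_end = body[details_start..].find('"')? + details_start;
    Some(OrchestratorError {
        code,
        details: body[details_start..details_end].to_string(),
    })
}

fn main() {
    let body = r#"{"code":404,"details":"detector not found"}"#;
    let err = parse_error(body).unwrap();
    // With fields exposed, the test can assert on the specific error code.
    assert_eq!(err.code, 404);
    assert_eq!(err.details, "detector not found");
    println!("{err:?}");
}
```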
tests/chunker.rs
async fn test_isolated_chunker_unary_call() -> Result<(), anyhow::Error> {
    // Add detector mock
    let chunker_id = "sentence_chunker";
    let input_test = "Hi there! how are you? I am great!";
nit:
- let input_test = "Hi there! how are you? I am great!";
+ let input_text = "Hi there! how are you? I am great!";
Done.
tests/detection_content.rs
General question: can we move to a paradigm (in the next iteration) where we start the servers once and then run the tests, so that there is no server start and shutdown for every test? This would simplify the tests.
It could be done, but I would advise against it, as it would introduce the chance of tests interfering with each other.
Right, but starting a new server and tearing it down for each test could slow down the test execution due to the overhead.
Additionally, this adds a lot of boilerplate to each test.
Could you point at what you're calling boilerplate?
Most lines in a test body are related to setting up mock requests/responses. I don't see this going away with running a single orchestrator instance.
In addition, the orchestrator configuration would need to account for all tests. This adds the complexity of having to know all existing mocks when adding a new test; otherwise it could impact existing ones.
I strongly advise against reusing the orchestrator server.
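For reference, the shared-server alternative being debated would typically hinge on a once-initialized static. This is a dependency-free sketch with a dummy `SharedServer` standing in for `TestOrchestratorServer`; the struct, function names, and URL are illustrative only. It also shows the trade-off: every test observes the same instance, hence the same configuration and mocks.

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for the real test server type.
#[derive(Debug)]
struct SharedServer {
    base_url: String,
}

// Started at most once for the whole test binary.
static SERVER: OnceLock<SharedServer> = OnceLock::new();

fn shared_server() -> &'static SharedServer {
    SERVER.get_or_init(|| {
        // In the real suite this closure would bind a port and spawn the
        // orchestrator; here it just constructs a placeholder.
        SharedServer {
            base_url: "http://localhost:8080".to_string(),
        }
    })
}

fn main() {
    // Two "tests" calling shared_server() get the very same instance,
    // which is exactly why they can interfere with each other.
    let a = shared_server();
    let b = shared_server();
    assert!(std::ptr::eq(a, b));
    println!("server at {}", a.base_url);
}
```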
    .send()
    .await?;

// Example showing how to create an event stream from a bytes stream.
nit: Is this example here to show how to write an event-stream-based test? In that case, can we move it to the top and make it clearer?
Since each test is lengthy, it would be good for it not to take up extra lines with such examples. Additionally, it would not be clear why this code is commented out.
I've used this piece of code to troubleshoot a few test cases. I've moved it to the top of the file with a suggestion on how to use it.
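For context on what that commented-out helper does: the real code turns a `reqwest` bytes stream into an SSE event stream via `eventsource_stream::Eventsource`. The dependency-free sketch below only splits a raw SSE body into its `data:` payloads, to show the wire format the tests consume; the function name and the payload shape are illustrative.

```rust
// Minimal sketch of SSE payload extraction. Server-sent events arrive as
// blocks of "data: <payload>" lines separated by blank lines; the streaming
// tests assert on the JSON carried in each payload.
fn sse_data_lines(body: &str) -> Vec<String> {
    body.lines()
        .filter_map(|line| line.strip_prefix("data:"))
        .map(|payload| payload.trim_start().to_string())
        .collect()
}

fn main() {
    // Hypothetical raw body for two streamed generation events.
    let body = "data: {\"generated_text\":\"I\"}\n\ndata: {\"generated_text\":\" am\"}\n\n";
    let events = sse_data_lines(body);
    assert_eq!(events.len(), 2);
    println!("{events:?}");
}
```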
assert!(messages.len() == 3);
assert!(messages[0].generated_text == Some("I".into()));
assert!(messages[1].generated_text == Some(" am".into()));
assert!(messages[2].generated_text == Some(" great!".into()));
Can we also assert/check that the detection block is not available, showing "no detections"?
Done.
@mdevino you are right. I changed repos and missed that the unit tests are embedded in this one.
use bytes::Bytes;
use eventsource_stream::{EventStream, Eventsource};
use fms_guardrails_orchestr8::{config::OrchestratorConfig, orchestrator::Orchestrator};
use futures::{stream::BoxStream, Stream, StreamExt};
nit: StreamExt isn't getting used?
It is being used on line 205 (boxed()). My understanding is that clippy would complain if it weren't being used.
Hmm, that's true, clippy should complain. But I'm not sure what I'm missing, since I don't see a reference to StreamExt on line 205. I also did a Ctrl+F.
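The likely resolution of this thread: `.boxed()` is an extension-trait method, so `use futures::StreamExt` must be in scope for the call to compile even though `StreamExt` never appears at the call site, which is why Ctrl+F finds nothing. A dependency-free analogue with a toy extension trait (all names here are illustrative):

```rust
// Toy extension trait mirroring how futures::StreamExt works: the trait
// provides a method on an existing type, and callers import the trait
// without ever writing its name at the call site.
trait Doubled {
    fn doubled(&self) -> i32;
}

impl Doubled for i32 {
    fn doubled(&self) -> i32 {
        self * 2
    }
}

mod caller {
    // Removing this import breaks the x.doubled() call below, yet
    // `Doubled` is never mentioned again -- just like StreamExt and
    // `.boxed()`.
    use super::Doubled;

    pub fn answer() -> i32 {
        let x: i32 = 21;
        x.doubled()
    }
}

fn main() {
    assert_eq!(caller::answer(), 42);
    println!("{}", caller::answer());
}
```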
tests/generation_nlp.rs
/// Asserts that the NlpClient correctly invokes the streaming endpoint.
#[test(tokio::test)]
async fn test_nlp_streaming_call() -> Result<(), anyhow::Error> {
Just noticing that while the tests for detectors above are at the orchestrator level, i.e. we start up an orchestrator server and the expected response being compared is also orchestrator-level output, here we are comparing against the client's response. So these tests are essentially testing the client code itself and not the orchestrator code.
I see that you have mentioned in the PR description that these are essentially the "isolated" ones, but a couple of points/questions:
- Are those tests coming in future ones?
- It seems a bit confusing that, in the same folder and with similarly named test files, some tests exercise orchestrator-level API integration while others exercise internal client-level code.
I can remove the client tests. Their main intent was to make sure I was mocking the calls correctly.
    }
);
assert!(messages[0].input_token_count == mock_tokenization_response.token_count as u32);
assert!(messages[0].warnings == Some(vec![DetectionWarning{ id: Some(fms_guardrails_orchestr8::models::DetectionWarningReason::UnsuitableInput), message: Some("Unsuitable input detected. Please check the detected entities on your input and try again with the unsuitable input removed.".into()) }]));
nit: formatting-wise, this seems a bit odd; maybe we should split this into two lines for better readability.
I've broken this into two lines, but shouldn't this be configured in the linter/formatter?
tests/test_config.yaml
hostname: localhost
type: sentence
detectors:
  test_detector:
Where is this one getting used?
This detector was already there when I started writing tests. I've removed it and the tests keep passing, so I'll push that change.
    contents: vec!["This should return a 500".into()],
    detector_params: DetectorParams::new(),
}),
MockResponse::json(&expected_detector_error).with_code(StatusCode::NOT_FOUND),
The error code here doesn't match the error code in the body, i.e. 404 (Not Found) vs 500. This points out that we give precedence to the body over the header of the response, which is fine, but generally these two codes should be the same, so this test looks odd.
Fixed.
Signed-off-by: Mateus Devino <mdevino@ibm.com>
This addresses #299.

Integration tests for /api/v1/task/server-streaming-classification-with-text-generation [NLP] (#299) were implemented in tests/streaming_classification_with_gen.nlp. Besides, tests for isolated chunker and generation calls are present in tests/chunker.rs and tests/generation_nlp.rs, respectively. Tests are also present in tests/detection_content.rs.

tests/common holds reusable code shared among test crates. The structure is as follows:
- tests/common/chunker.rs hosts code associated with chunkers.
- tests/common/detectors.rs hosts code associated with detectors.
- tests/common/errors.rs currently contains structs representing errors returned from detectors and the orchestrator. The idea behind having these instead of using the already existing ones under /src is to make API changes more evident (tests would fail due to deserialization issues in case of API changes).
- tests/common/orchestrator.rs hosts code associated with the orchestrator, and also contains a test server for it.
- tests/common/mod.rs declares the submodules and hosts code that is not associated with any of the above.

The test configuration defines a sentence_chunker and an angle_brackets detector with two configs - one associated with whole_doc_chunker and one associated with sentence_chunker.