OPCT-226: cmd/report UX enhancements (#76)
The `opct report` command improves the UX of reviewing the conformance results archive with OPCT by:

- creating an intuitive HTML report that lets users quickly explore issues and navigate to the logs for each test failure
- introducing several gates/SLOs/checks to be used as post-processors, giving better visibility into the results based on the existing knowledge base/CI data or external systems
- providing a better CLI UI for exploring results

## Changes (overview)

### Improvements

- Documentation updates for review guides
- `report`: the counters now display each value as a percentage of the total, giving a quick idea of the failure rate. In general, we expect fewer than 1% of failures in a regular installation for OpenShift Conformance.

~~~
==> Result Summary by test suite:
┌───────────────────────────────────────────┐
│ 05-openshift-cluster-upgrade: ✅          │
├───────────────────────────┬───────────────┤
│ Total tests               │ 1             │
│ Passed                    │ 0             │
│ Failed                    │ 0             │
│ Timeout                   │ 0             │
│ Skipped                   │ 1             │
│ Result Job                │ passed        │
└───────────────────────────┴───────────────┘
┌───────────────────────────────────────────┐
│ 10-openshift-kube-conformance: ✅         │
├───────────────────────────┬───────────────┤
│ Total tests               │ 399           │
│ Passed                    │ 399           │
│ Failed                    │ 0             │
│ Timeout                   │ 0             │
│ Skipped                   │ 0             │
│ Filter Failed Suite       │ 0 (0.00%)     │
│ Filter Failed KF          │ 0 (0.00%)     │
│ Filter Replay             │ 0 (0.00%)     │
│ Filter Failed Baseline    │ 0 (0.00%)     │
│ Filter Failed Priority    │ 0 (0.00%)     │
│ Filter Failed API         │ 0 (0.00%)     │
│ Failures (Priotity)       │ 0 (0.00%)     │
│ Result - Job              │ passed        │
│ Result - Processed        │ passed        │
└───────────────────────────┴───────────────┘
┌───────────────────────────────────────────┐
│ 20-openshift-conformance-validated: ❌    │
├───────────────────────────┬───────────────┤
│ Total tests               │ 3783          │
│ Passed                    │ 1574          │
│ Failed                    │ 16            │
│ Timeout                   │ 0             │
│ Skipped                   │ 2193          │
│ Filter Failed Suite       │ 14 (0.37%)    │
│ Filter Failed KF          │ 14 (0.37%)    │
│ Filter Replay             │ 13 (0.34%)    │
│ Filter Failed Baseline    │ 13 (0.34%)    │
│ Filter Failed Priority    │ 13 (0.34%)    │
│ Filter Failed API         │ 1 (0.03%)     │
│ Failures (Priotity)       │ 1 (0.03%)     │
│ Result - Job              │ failed        │
│ Result - Processed        │ failed        │
└───────────────────────────┴───────────────┘
~~~

- `report`: a headline with occurrences grouped by test tag is displayed before each list of failed tests, for both conformance plugins

~~~
=> 10-openshift-kube-conformance: (36 failures, 28 flakes)

--> Failed tests to Review (without flakes) - Immediate action:
[total=8] [sig-apps=2 (25.00%)] [sig-cli=2 (25.00%)] [sig-node=2 (25.00%)] [sig-api-machinery=1 (12.50%)] [sig-arch=1 (12.50%)]
[...]

--> Failed flake tests - Statistic from OpenShift CI
[total=28] [sig-api-machinery=12 (42.86%)] [sig-node=4 (14.29%)] [sig-network-edge=4 (14.29%)] [sig-trt=3 (10.71%)] [sig-architecture=3 (10.71%)] [bz-OLM=1 (3.57%)] [sig-arch=1 (3.57%)]
[...]

=> 20-openshift-conformance-validated: (102 failures, 43 flakes)

--> Failed tests to Review (without flakes) - Immediate action:
[total=59] [sig-builds=15 (25.42%)] [sig-apps=10 (16.95%)] [sig-cli=7 (11.86%)] [sig-imageregistry=7 (11.86%)] [sig-auth=6 (10.17%)] [sig-network=6 (10.17%)] [sig-arch=2 (3.39%)] [sig-devex=1 (1.69%)] [sig-instrumentation=1 (1.69%)] [sig-api-machinery=1 (1.69%)] [sig-scheduling=1 (1.69%)] [sig-node=1 (1.69%)] [bz-Unknown=1 (1.69%)]
[...]

--> Failed flake tests - Statistic from OpenShift CI
[total=43] [sig-api-machinery=12 (27.91%)] [sig-node=6 (13.95%)] [sig-arch=5 (11.63%)] [sig-network-edge=4 (9.30%)] [sig-trt=3 (6.98%)] [sig-instrumentation=3 (6.98%)] [sig-network=2 (4.65%)] [sig-architecture=2 (4.65%)] [sig-autoscaling=1 (2.33%)] [bz-OLM=1 (2.33%)] [bz-Routing=1 (2.33%)] [bz-Unknown=1 (2.33%)] [bz-kube-apiserver=1 (2.33%)] [bz-DNS=1 (2.33%)]
[...]
~~~

- The final binary [pass/fail] result message (after filters) has been removed from the OpenShift conformance plugin.
  This field is temporarily removed to prevent mistakes when issues occur in the filter pipeline, and to keep the focus on the failures rather than on the binary value, since the goal is to bring the number of failed tests down to zero.
- `run --watch --watch-interval`: a custom watch interval can now be set to decrease the number of log lines in CI
- `report --diff`: created as an alias of `--baseline`, setting it as deprecated
- `report` now shows the "Checks" section, which applies rules that help the reviewer decide where to focus and avoid submitting results to the partner support case prematurely.

### Error counters

- `report`/Plugins: searches the plugin logs for failure patterns and aggregates the matches as "Suite Errors"
- `report`/Must-gather: searches the pod logs (must-gather) for failure patterns and aggregates the matches as "Workload Errors"

### HTML report

- `report`: extracts important information, processes it, and saves it to a local directory to be served by a local web server, allowing one to quickly navigate to the failures for each test using a browser.
- `report`: ranks by error count
- `report` HTML: scrapes the test documentation for [Kubernetes Conformance](https://github.com/cncf/k8s-conformance/blob/master/docs/KubeConformance-1.26.md#custom-resource-openapi-publish-stop-serving-version) and shows the link for each item when it is available in the Kubernetes suites
- `report` menu "Workload errors": added, showing information about the pod log counters
- `report` menu "Suite errors": added, showing information about the plugin log counters
- `report` menu "Checks": added, showing information about the result checklist
- `report` tab "CAMGI": redirects to the CAMGI static HTML page extracted from must-gather when it is present; otherwise it shows how to use CAMGI when it has not been processed by the plugin
- `report` tab "Filter": redirects to a static HTML page with all tests, allowing the user to explore the test details
- `report` tab "Events": redirects to a static HTML page with events created by must-gather (extracted at runtime)

The WebUI includes many other features.

#### Plugin Runtime

- `status`: replaces the message `waiting for post-processor...` with `complete` when the pod is finished.

~~~
# from
Sat, 15 Jul 2023 00:39:31 -03> Global Status: running
JOB_NAME                           | STATUS   | RESULTS | PROGRESS            | MESSAGE
05-openshift-cluster-upgrade       | complete |         | 0/0 (0 failures)    | waiting for post-processor...
10-openshift-kube-conformance      | complete |         | 5/377 (0 failures)  | waiting for post-processor...
20-openshift-conformance-validated | running  |         | 0/3684 (0 failures) | status=waiting-for=10-openshift-kube-conformance=(0/-372/0)=[3/1080]
99-openshift-artifacts-collector   | running  |         | 0/0 (0 failures)    | status=blocked-by=20-openshift-conformance-validated=(0/-3684/0)=[0/1080]

# to
Sat, 15 Jul 2023 01:17:58 -03> Global Status: running
JOB_NAME                           | STATUS   | RESULTS | PROGRESS            | MESSAGE
05-openshift-cluster-upgrade       | complete |         | 0/0 (0 failures)    | complete
10-openshift-kube-conformance      | complete |         | 5/377 (0 failures)  | complete
20-openshift-conformance-validated | running  |         | 5/3684 (0 failures) | status=running=T/C/P/F/S=3684/5/3/0/2
99-openshift-artifacts-collector   | running  |         | 0/0 (0 failures)    | status=waiting-for=20-openshift-conformance-validated=(0/-3679/0)=[8/1080]
~~~

- Support for the re-architected/refactored openshift-tests plugin: the main plugin/step orchestrating the conformance workflow (aka `openshift-tests-plugin`) has been rewritten in Golang, specifically to:
  - A) unblock complex/parallel operations in the plugin runtime;
  - B) delegate conformance executions to the `tests` image as a sidecar container, avoiding plugin development when `openshift-tests` requires new dependencies;
  - C) [drastically] decrease the number of failures in the test results that are caused by the OPCT test environment;
  - D) allow a JUnit test processor to run before submitting to the aggregator server.
- Support for the `Replay` plugin

### Bug fixes

- `report`: Flake tests now query the correct OCP version on the Sippy API
- `report`: the Flake Filter report no longer duplicates tests in the immediate action list
- `report`: the Flake Filter report does not show items with less than 5% of flake count in OCP CI (by Sippy)

~~~
Flakes  Perc     TestName
288     23.782%  [bz-DNS][invariant] alert/KubePodNotReady should not be at or above pending in ns/openshift-dns
967     79.851%  [bz-OLM][invariant] alert/KubePodNotReady should not be at or above pending in ns/openshift-marketplace
[...]
~~~

- The filter pipeline no longer breaks when the suite list (collected by the artifact collector plugin) is empty. Example:

~~~
Total tests by conformance suites:
- kubernetes/conformance: 0
- openshift/conformance: 0
~~~

## Done checklist

- [x] create documentation for `report`
- [x] create documentation for the `report` checklist section
- [x] review tests
- [x] review feature description
- [x] create dedicated Jira Epic/cards for known issues

### Documentation checklist

- [x] Review Rules #77 #80
- [x] Command documentation `report`: #113
- [x] Command documentation `adm baseline *`: #113
- [x] Guide to 'Deep Dive' in the `report` options: #113
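
The grouped headline described under Improvements (`[total=8] [sig-apps=2 (25.00%)] ...`) can be illustrated with a minimal Go sketch. This is not the code from this change: the names `GroupByTag`, `Headline`, and `TagCount`, the "untagged" fallback, and the alphabetical tie-break are all assumptions made for illustration, and the test names in `main` are hypothetical. It only shows one way to count failed tests by their leading bracketed tag and render each count as a percentage of the total.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// leadingTag matches the first bracketed tag of a test name, e.g. "[sig-apps]".
var leadingTag = regexp.MustCompile(`^\[([^\]]+)\]`)

// TagCount holds how many failed tests carry a given leading tag
// (hypothetical type, used only in this sketch).
type TagCount struct {
	Tag     string
	Count   int
	Percent float64
}

// GroupByTag counts failed tests by leading tag and sorts the groups by
// count, descending. Ties are broken alphabetically, which is an assumption
// and may differ from the ordering the real report uses.
func GroupByTag(failed []string) []TagCount {
	counts := map[string]int{}
	for _, name := range failed {
		tag := "untagged" // assumed fallback for names without a leading tag
		if m := leadingTag.FindStringSubmatch(name); m != nil {
			tag = m[1]
		}
		counts[tag]++
	}
	out := make([]TagCount, 0, len(counts))
	for tag, n := range counts {
		pct := 100 * float64(n) / float64(len(failed))
		out = append(out, TagCount{Tag: tag, Count: n, Percent: pct})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Tag < out[j].Tag
	})
	return out
}

// Headline renders the groups in the "[total=N] [tag=n (p%)]" style.
func Headline(failed []string) string {
	s := fmt.Sprintf("[total=%d]", len(failed))
	for _, tc := range GroupByTag(failed) {
		s += fmt.Sprintf(" [%s=%d (%.2f%%)]", tc.Tag, tc.Count, tc.Percent)
	}
	return s
}

func main() {
	// Hypothetical failed test names, not taken from a real results archive.
	failed := []string{
		"[sig-apps] hypothetical test A",
		"[sig-apps] hypothetical test B",
		"[sig-cli] hypothetical test C",
		"[sig-node] hypothetical test D",
	}
	fmt.Println(Headline(failed))
}
```

Percentages are computed against the number of failed tests in the list being summarized, which matches the sample output above (for example, 2 of 8 is 25.00%).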