OPCT-226: cmd/report UX enhancements (#76)

This change brings several UX improvements to `opct report` for reviewing the conformance results archive with OPCT, such as:

- creating an intuitive HTML report that lets users quickly explore issues and navigate to the logs of each test failure
- introducing several gates/SLOs/checks, used as post-processors, to give better visibility into the results based on the existing knowledge base/CI data or external systems
- providing a better CLI UI for exploring results

## Changes (overview)

### Improvements

- Documentation updates for review guides
- `report`: the counters now also display their percentage of the total, so reviewers can quickly gauge the failures. In general, we expect fewer than 1% of failures in a regular installation for OpenShift Conformance.

~~~
==> Result Summary by test suite:
┌───────────────────────────────────────────┐
│ 05-openshift-cluster-upgrade: ✅          │
├───────────────────────────┬───────────────┤
│ Total tests               │ 1             │
│ Passed                    │ 0             │
│ Failed                    │ 0             │
│ Timeout                   │ 0             │
│ Skipped                   │ 1             │
│ Result Job                │ passed        │
└───────────────────────────┴───────────────┘
┌───────────────────────────────────────────┐
│ 10-openshift-kube-conformance: ✅         │
├───────────────────────────┬───────────────┤
│ Total tests               │ 399           │
│ Passed                    │ 399           │
│ Failed                    │ 0             │
│ Timeout                   │ 0             │
│ Skipped                   │ 0             │
│ Filter Failed Suite       │ 0 (0.00%)     │
│ Filter Failed KF          │ 0 (0.00%)     │
│ Filter Replay             │ 0 (0.00%)     │
│ Filter Failed Baseline    │ 0 (0.00%)     │
│ Filter Failed Priority    │ 0 (0.00%)     │
│ Filter Failed API         │ 0 (0.00%)     │
│ Failures (Priotity)       │ 0 (0.00%)     │
│ Result - Job              │ passed        │
│ Result - Processed        │ passed        │
└───────────────────────────┴───────────────┘
┌───────────────────────────────────────────┐
│ 20-openshift-conformance-validated: ❌    │
├───────────────────────────┬───────────────┤
│ Total tests               │ 3783          │
│ Passed                    │ 1574          │
│ Failed                    │ 16            │
│ Timeout                   │ 0             │
│ Skipped                   │ 2193          │
│ Filter Failed Suite       │ 14 (0.37%)    │
│ Filter Failed KF          │ 14 (0.37%)    │
│ Filter Replay             │ 13 (0.34%)    │
│ Filter Failed Baseline    │ 13 (0.34%)    │
│ Filter Failed Priority    │ 13 (0.34%)    │
│ Filter Failed API         │ 1 (0.03%)     │
│ Failures (Priotity)       │ 1 (0.03%)     │
│ Result - Job              │ failed        │
│ Result - Processed        │ failed        │
└───────────────────────────┴───────────────┘
~~~

- `report`: a headline with occurrences grouped by test tag is displayed before each failed-test list for both conformance plugins:

~~~
=> 10-openshift-kube-conformance: (36 failures, 28 flakes)

 --> Failed tests to Review (without flakes) - Immediate action:
[total=8] [sig-apps=2 (25.00%)] [sig-cli=2 (25.00%)] [sig-node=2 (25.00%)] [sig-api-machinery=1 (12.50%)] [sig-arch=1 (12.50%)]
[...]

 --> Failed flake tests - Statistic from OpenShift CI
[total=28] [sig-api-machinery=12 (42.86%)] [sig-node=4 (14.29%)] [sig-network-edge=4 (14.29%)] [sig-trt=3 (10.71%)] [sig-architecture=3 (10.71%)] [bz-OLM=1 (3.57%)] [sig-arch=1 (3.57%)]
[...]

=> 20-openshift-conformance-validated: (102 failures, 43 flakes)

 --> Failed tests to Review (without flakes) - Immediate action:
[total=59] [sig-builds=15 (25.42%)] [sig-apps=10 (16.95%)] [sig-cli=7 (11.86%)] [sig-imageregistry=7 (11.86%)] [sig-auth=6 (10.17%)] [sig-network=6 (10.17%)] [sig-arch=2 (3.39%)] [sig-devex=1 (1.69%)] [sig-instrumentation=1 (1.69%)] [sig-api-machinery=1 (1.69%)] [sig-scheduling=1 (1.69%)] [sig-node=1 (1.69%)] [bz-Unknown=1 (1.69%)]
[...]

 --> Failed flake tests - Statistic from OpenShift CI
[total=43] [sig-api-machinery=12 (27.91%)] [sig-node=6 (13.95%)] [sig-arch=5 (11.63%)] [sig-network-edge=4 (9.30%)] [sig-trt=3 (6.98%)] [sig-instrumentation=3 (6.98%)] [sig-network=2 (4.65%)] [sig-architecture=2 (4.65%)] [sig-autoscaling=1 (2.33%)] [bz-OLM=1 (2.33%)] [bz-Routing=1 (2.33%)] [bz-Unknown=1 (2.33%)] [bz-kube-apiserver=1 (2.33%)] [bz-DNS=1 (2.33%)]
[...]
~~~

- The final binary [pass/fail] result message (after filters) has been removed from the OpenShift conformance plugin.
  This field is temporarily removed to prevent mistakes when issues happen in the filter pipeline, and to keep the focus on the failures rather than on a binary value, since the goal is to bring the number of failed tests to zero.
- `run --watch --watch-interval`: the status can use a custom watch interval, reducing the volume of logs in CI.
- `report --diff`: created as an alias of `--baseline` and marked as deprecated.
- `report` now shows the "Checks" section, which applies rules that help the reviewer decide where to focus and prevent results from being submitted to the partner support case prematurely.

### Error counters

- `report`/Plugins: searches the plugin logs for failures by pattern and aggregates them as "Suite Errors".
- `report`/Must-gather: searches the pod logs (must-gather) for failures by pattern and aggregates them as "Workload Errors".

### HTML report

![Screenshot from 2023-08-08 16-39-24](https://github.com/redhat-openshift-ecosystem/provider-certification-tool/assets/3216894/6b6bfbe6-58d6-4cbf-ae61-c75ab83b8ca7)

- `report`: extracts important information, processes it, and saves it to a local directory served by a local web server, so users can quickly navigate to the failures of each test in a browser.
- `report`: ranks by error count.
- `report` HTML: scrapes the test documentation for [Kubernetes Conformance](https://github.com/cncf/k8s-conformance/blob/master/docs/KubeConformance-1.26.md#custom-resource-openapi-publish-stop-serving-version) and shows the link for each item when it is available in the Kubernetes suites.
- `report` menu "Workload errors": added, showing the pod log error counters.
- `report` menu "Suite errors": added, showing the plugin log error counters.
- `report` menu "Checks": added, showing the result checklist.
- `report` tab "CAMGI": redirects to the CAMGI static HTML page extracted from the must-gather when it is present; otherwise, shows how to use CAMGI when it was not processed by the plugin.
- `report` tab "Filter": redirects to a static HTML page listing all tests, allowing the user to explore the test details.
- `report` tab "Events": redirects to a static HTML page with the events collected by must-gather (extracted at runtime).

Many other features are available in the WebUI.

### Plugin Runtime

- `status`: replaces the message `waiting for post-processor...` with `complete` when the pod has finished.

~~~
# from
Sat, 15 Jul 2023 00:39:31 -03> Global Status: running
JOB_NAME | STATUS | RESULTS | PROGRESS | MESSAGE
05-openshift-cluster-upgrade | complete | | 0/0 (0 failures) | waiting for post-processor...
10-openshift-kube-conformance | complete | | 5/377 (0 failures) | waiting for post-processor...
20-openshift-conformance-validated | running | | 0/3684 (0 failures) | status=waiting-for=10-openshift-kube-conformance=(0/-372/0)=[3/1080]
99-openshift-artifacts-collector | running | | 0/0 (0 failures) | status=blocked-by=20-openshift-conformance-validated=(0/-3684/0)=[0/1080]

# to
Sat, 15 Jul 2023 01:17:58 -03> Global Status: running
JOB_NAME | STATUS | RESULTS | PROGRESS | MESSAGE
05-openshift-cluster-upgrade | complete | | 0/0 (0 failures) | complete
10-openshift-kube-conformance | complete | | 5/377 (0 failures) | complete
20-openshift-conformance-validated | running | | 5/3684 (0 failures) | status=running=T/C/P/F/S=3684/5/3/0/2
99-openshift-artifacts-collector | running | | 0/0 (0 failures) | status=waiting-for=20-openshift-conformance-validated=(0/-3679/0)=[8/1080]
~~~

- Support for the new openshift-tests plugin re-architecture/refactor: the main plugin/step orchestrating the conformance workflow (aka `openshift-tests-plugin`) has been refactored in Go, mainly to:
  - A) unblock complex/parallel operations in the plugin runtime;
  - B) delegate conformance executions to the `tests` image as a sidecar container, avoiding plugin development work whenever `openshift-tests` requires new dependencies;
  - C) [drastically] decrease the number of failures impacting the test results that are caused by the OPCT test environment;
  - D) allow JUnit test processing before submitting results to the aggregator server.
- Support for the `Replay` plugin.

### Bug fixes

- `report`: flake tests now query the correct OCP version on the Sippy API.
- `report`: the Flake Filter report prevents duplicated tests in the immediate action list.
- `report`: the Flake Filter report no longer shows items whose flake percentage in OCP CI (by Sippy) is lower than 5%:

~~~
Flakes  Perc     TestName
288     23.782%  [bz-DNS][invariant] alert/KubePodNotReady should not be at or above pending in ns/openshift-dns
967     79.851%  [bz-OLM][invariant] alert/KubePodNotReady should not be at or above pending in ns/openshift-marketplace
[...]
~~~

- The filter pipeline no longer breaks when the suite list (collected by the artifacts collector plugin) is empty. Example:

~~~
Total tests by conformance suites:
 - kubernetes/conformance: 0
 - openshift/conformance: 0
~~~

## Done checklist

- [x] Create documentation for `report`
- [x] Create documentation for the `report` checklist section
- [x] Review tests
- [x] Review feature description
- [x] Create a dedicated Jira Epic/cards for known issues

### Documentation checklist

- [x] Review Rules #77 #80
- [x] Command documentation `report`: #113
- [x] Command documentation `adm baseline *`: #113
- [x] Guide to 'Deep Dive' in the `report` options: #113
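The percentages added to the result summary (for example `14 (0.37%)` out of 3783 total tests) and the per-tag headline (for example `[sig-apps=2 (25.00%)]` out of `[total=8]`) both come down to a count divided by a total. The Go sketch below illustrates that idea under stated assumptions and is not the OPCT implementation: it assumes a test's group is its first bracketed tag, orders ties alphabetically, and uses hypothetical helpers named `tagOf`, `percent`, and `headline`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// tagOf returns the first bracketed tag of a test name, for example
// "sig-apps" for "[sig-apps] ...". Names without a leading tag are grouped
// under "unknown" (an assumption of this sketch).
func tagOf(testName string) string {
	if strings.HasPrefix(testName, "[") {
		if end := strings.Index(testName, "]"); end > 1 {
			return testName[1:end]
		}
	}
	return "unknown"
}

// percent formats count as a share of total with two decimals and
// returns "0.00%" for an empty total instead of dividing by zero.
func percent(count, total int) string {
	if total == 0 {
		return "0.00%"
	}
	return fmt.Sprintf("%.2f%%", float64(count)*100/float64(total))
}

// headline renders "[total=N] [tag=count (pct)] ...", ordering tags by
// descending count and breaking ties alphabetically.
func headline(failed []string) string {
	counts := map[string]int{}
	for _, name := range failed {
		counts[tagOf(name)]++
	}
	tags := make([]string, 0, len(counts))
	for tag := range counts {
		tags = append(tags, tag)
	}
	sort.Slice(tags, func(i, j int) bool {
		if counts[tags[i]] != counts[tags[j]] {
			return counts[tags[i]] > counts[tags[j]]
		}
		return tags[i] < tags[j]
	})
	var b strings.Builder
	fmt.Fprintf(&b, "[total=%d]", len(failed))
	for _, tag := range tags {
		fmt.Fprintf(&b, " [%s=%d (%s)]", tag, counts[tag], percent(counts[tag], len(failed)))
	}
	return b.String()
}

func main() {
	// Placeholder test names; only the leading tag matters here.
	failed := []string{
		"[sig-apps] example test A",
		"[sig-apps] example test B",
		"[sig-cli] example test C",
		"[sig-node] example test D",
	}
	fmt.Println(headline(failed))

	// Summary counter from the result table: 14 of 3783 tests.
	fmt.Printf("14 (%s)\n", percent(14, 3783))
}
```

Running it prints `[total=4] [sig-apps=2 (50.00%)] [sig-cli=1 (25.00%)] [sig-node=1 (25.00%)]` followed by `14 (0.37%)`, matching the rounding seen in the summary table.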
redhat-openshift-ecosystem · Aug 8, 2024 · f321a27
1 parent e0e2687
commit f321a27

Showing 82 changed files with 9,096 additions and 2,062 deletions.
diff --git a/.github/workflows/e2e.yaml b/.github/workflows/e2e.yaml
@@ -42,9 +42,16 @@ jobs:
           echo "> Setting run permissions to OPCT:"
           chmod u+x ${OPCT}
 
-          echo "> Running OPCT report:"
+          echo "> Running OPCT report (simple):"
           ${OPCT} report /tmp/result.tar.gz
 
+          echo "> Running OPCT report (advanced):"
+          ${OPCT} report /tmp/result.tar.gz \
+            --log-level=debug \
+            --save-to=/tmp/results-data \
+            --skip-server=true \
+            --skip-baseline-api=true
+
   e2e-cmd_adm-parse-etcd-logs:
     name: "e2e-cmd_adm-parse-etcd-logs"
     runs-on: ubuntu-latest
@@ -146,3 +153,36 @@ jobs:
           ${CUSTOM_BUILD_PATH} adm parse-metrics \
             --input ${LOCAL_TEST_DATA} --output /tmp/metrics
           tree /tmp/metrics
+
+  e2e-cmd_adm-baseline:
+    name: "e2e-cmd_adm-baseline"
+    runs-on: ubuntu-latest
+    steps:
+      - name: Download artifacts
+        uses: actions/download-artifact@v4
+        with:
+          name: opct-linux-amd64
+          path: /tmp/build/
+
+      - name: Preparing testdata
+        env:
+          OPCT: /tmp/build/opct-linux-amd64
+        run: |
+          echo "> Setting exec permissions to OPCT:"
+          chmod u+x ${OPCT}
+
+      - name: "e2e adm baseline: opct adm baseline (list|get)"
+        env:
+          OPCT: /tmp/build/opct-linux-amd64
+        run: |
+          echo -e "\n\t#>> List latest baseline results"
+          ${OPCT} adm baseline list
+
+          echo -e "\n\t#>> List all baseline results"
+          ${OPCT} adm baseline list --all
+
+          echo -e "\n\t#>> Retrieve a baseline result by name"
+          ${OPCT} adm baseline get --name 4.16_None_latest --dump
+
+          echo -e "\n\t#>> Retrieve a baseline result by release and platform"
+          ${OPCT} adm baseline get --release 4.15 --platform None
diff --git a/.github/workflows/go.yaml b/.github/workflows/go.yaml
@@ -22,9 +22,15 @@ jobs:
     name: linters
     uses: ./.github/workflows/pre_linters.yaml
 
+  reviewer:
+    name: reviewer
+    uses: ./.github/workflows/pre_reviewer.yaml
+
   go-test:
     runs-on: ubuntu-latest
-    needs: linters
+    needs:
+      - linters
+      - reviewer
     steps:
       - uses: actions/checkout@v4
       - uses: actions/setup-go@v5

diff --git a/.github/workflows/pre_reviewer.yaml b/.github/workflows/pre_reviewer.yaml
@@ -0,0 +1,51 @@
+---
+name: reviewer
+
+on:
+  workflow_call: {}
+
+# golangci-lint-action requires those permissions to annotate issues in the PR.
+permissions:
+  contents: read
+  checks: write
+  issues: read
+  pull-requests: write
+
+env:
+  GO_VERSION: 1.22
+  GOLANGCI_LINT_VERSION: v1.59
+
+jobs:
+  # reviewdog / misspell:  https://github.com/reviewdog/action-misspell
+  misspell:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: reviewdog/action-misspell@v1
+        with:
+          github_token: ${{ secrets.github_token }}
+          reporter: github-pr-review
+          #level: warning
+          locale: "US"
+
+  # reviewdog / suggester: https://github.com/reviewdog/action-suggester
+  go_fmt:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - run: gofmt -w -s .
+      - uses: reviewdog/action-suggester@v1
+        with:
+          tool_name: gofmt
+
+  # https://github.com/reviewdog/action-hadolint
+  # containerfile:
+  #   name: runner / hadolint
+  #   runs-on: ubuntu-latest
+  #   steps:
+  #     - name: Check out code
+  #       uses: actions/checkout@v4
+  #     - name: hadolint
+  #       uses: reviewdog/action-hadolint@v1
+  #       with:
+  #         reporter: github-pr-review
diff --git a/.github/workflows/static-website.yml b/.github/workflows/static-website.yml
@@ -4,8 +4,12 @@ name: Documentation
 on:
   # Static pages are build only targeting the main branch
   push:
-    branches: ["main"]
-    paths: ['mkdocs.yml', 'docs/**', 'hack/docs-requirements.txt']
+    branches:
+      - "main"
+    paths:
+      - 'mkdocs.yml'
+      - 'docs/**'
+      - 'hack/docs-requirements.txt'
 
   workflow_dispatch:
 

diff --git a/.gitignore b/.gitignore
@@ -5,6 +5,7 @@ kubeconfig
 
 # build files
 dist/
+build/
 
 # changelog is generated automaticaly by hack/generate-changelog.sh
 # available only in the rendered webpage (built by mkdocs).

diff --git a/Makefile b/Makefile
@@ -5,7 +5,7 @@ export GO111MODULE=on
 export CGO_ENABLED=0
 
 BUILD_DIR ?= $(PWD)/build
-IMG ?= quay.io/ocp-cert/opct
+IMG ?= quay.io/opct/opct
 VERSION=$(shell git rev-parse --short HEAD)
 RELEASE_TAG ?= 0.0.0
 BIN_NAME ?= opct
@@ -57,15 +57,30 @@ build-darwin-arm64: build
 linux-amd64-container: build-linux-amd64
 	podman build -t $(IMG):latest -f hack/Containerfile --build-arg=RELEASE_TAG=$(RELEASE_TAG) .
 
-.PHONY: image-mirror-sonobuoy
-image-mirror-sonobuoy:
-	./hack/image-mirror-sonobuoy/mirror.sh
+# Publish devel binaries (non-official). Must be used only for troubleshooting in development/support.
+.PHONY: publish-amd64-devel
+publish-amd64-devel: build-linux-amd64
+	aws s3 cp $(BUILD_DIR)/opct-linux-amd64 s3://openshift-provider-certification/bin/opct-linux-amd64-devel
 
-# Utils dev
-.PHONY: update-go
-update-go:
-	go get -u
-	go mod tidy
+.PHONY: publish-darwin-arm64-devel
+publish-darwin-arm64-devel: build-darwin-arm64
+	aws s3 cp $(BUILD_DIR)/opct-darwin-arm64 s3://openshift-provider-certification/bin/opct-darwin-arm64-devel
+
+.PHONY: publish-devel
+publish-devel: publish-amd64-devel
+publish-devel: publish-darwin-arm64-devel
+
+#
+# Test
+#
+
+.PHONY: test-lint
+test-lint:
+	@echo "Running linting tools"
+	# Download https://github.com/golangci/golangci-lint/releases/tag/v1.59.1
+	golangci-lint run --timeout=10m
+	# yamllint: pip install yamllint
+	yamllint .github/workflows/*.yaml
 
 .PHONY: test
 test:
@@ -90,3 +105,13 @@ build-changelog:
 .PHONY: build-docs
 build-docs: build-changelog
 	mkdocs build --site-dir ./site
+
+.PHONY: image-mirror-sonobuoy
+image-mirror-sonobuoy:
+	./hack/image-mirror-sonobuoy/mirror.sh
+
+# Utils dev
+.PHONY: update-go
+update-go:
+	go get -u
+	go mod tidy
diff --git a/cmd/root.go b/cmd/opct/root.go
@@ -5,21 +5,24 @@ import (
 	"os"
 
 	log "github.com/sirupsen/logrus"
+	logwriter "github.com/sirupsen/logrus/hooks/writer"
 
 	"github.com/spf13/cobra"
 	"github.com/spf13/viper"
 	"github.com/vmware-tanzu/sonobuoy/cmd/sonobuoy/app"
 
 	"github.com/redhat-openshift-ecosystem/provider-certification-tool/pkg/cmd/adm"
 	"github.com/redhat-openshift-ecosystem/provider-certification-tool/pkg/cmd/get"
+	"github.com/redhat-openshift-ecosystem/provider-certification-tool/pkg/cmd/report"
 	"github.com/redhat-openshift-ecosystem/provider-certification-tool/pkg/destroy"
-	"github.com/redhat-openshift-ecosystem/provider-certification-tool/pkg/report"
 	"github.com/redhat-openshift-ecosystem/provider-certification-tool/pkg/retrieve"
 	"github.com/redhat-openshift-ecosystem/provider-certification-tool/pkg/run"
 	"github.com/redhat-openshift-ecosystem/provider-certification-tool/pkg/status"
 	"github.com/redhat-openshift-ecosystem/provider-certification-tool/pkg/version"
 )
 
+const logFile = "opct.log"
+
 // rootCmd represents the base command when called without any subcommands
 var rootCmd = &cobra.Command{
 	Use:   "opct",
@@ -40,6 +43,24 @@ var rootCmd = &cobra.Command{
 		log.SetFormatter(&log.TextFormatter{
 			FullTimestamp: true,
 		})
+
+		log.SetOutput(os.Stdout)
+		fdLog, err := os.OpenFile(logFile, os.O_WRONLY|os.O_APPEND|os.O_CREATE, 0644)
+		if err != nil {
+			log.Errorf("error opening file %s: %v", logFile, err)
+		} else {
+			log.AddHook(&logwriter.Hook{ // Send logs with level higher than warning to stderr
+				Writer: fdLog,
+				LogLevels: []log.Level{
+					log.PanicLevel,
+					log.FatalLevel,
+					log.ErrorLevel,
+					log.WarnLevel,
+					log.InfoLevel,
+					log.DebugLevel,
+				},
+			})
+		}
 	},
 }
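One of the bug fixes in the description hides Flake Filter entries with a low flake rate in OpenShift CI. A minimal Go sketch of that threshold follows, assuming each entry carries the flake count and percentage shown in the `Flakes Perc TestName` output. The `flakeStat` type, `filterFlakes`, `formatRow`, and the choice to keep entries at exactly 5% are hypothetical.

```go
package main

import "fmt"

// flakeStat is a hypothetical record for one test's flake statistics in
// OpenShift CI; the fields mirror the "Flakes Perc TestName" columns.
type flakeStat struct {
	Flakes   int     // flaky runs reported for the test
	Perc     float64 // flake percentage reported for the test
	TestName string
}

// minFlakePerc is the 5% cut-off from the bug fix; keeping entries at
// exactly 5% is an assumption of this sketch.
const minFlakePerc = 5.0

// filterFlakes keeps entries at or above the threshold, preserving order.
func filterFlakes(stats []flakeStat) []flakeStat {
	kept := make([]flakeStat, 0, len(stats))
	for _, s := range stats {
		if s.Perc >= minFlakePerc {
			kept = append(kept, s)
		}
	}
	return kept
}

// formatRow renders one entry in the "Flakes Perc TestName" layout.
func formatRow(s flakeStat) string {
	return fmt.Sprintf("%d\t%.3f%%\t%s", s.Flakes, s.Perc, s.TestName)
}

func main() {
	stats := []flakeStat{
		{288, 23.782, "[bz-DNS][invariant] alert/KubePodNotReady should not be at or above pending in ns/openshift-dns"},
		{2, 1.5, "[sig-example] hypothetical rarely flaky test"}, // made-up entry below 5%
	}
	fmt.Println("Flakes\tPerc\tTestName")
	for _, s := range filterFlakes(stats) {
		fmt.Println(formatRow(s))
	}
}
```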