OPCT-226: cmd/report UX enhancements (#76)
The `opct report` command provides several UX improvements for
reviewing the conformance results archive with OPCT, such as:
- creating an intuitive HTML report that lets users quickly explore
  issues and navigate to the logs for each test failure
- introducing several gates/SLOs/checks to be used as post-processors,
  giving better visibility into the results based on the existing
  knowledge base, CI data, or external systems
- providing a better CLI UI for exploring results

## Changes (overview)

### Improvements

- Documentation updates for review guides
- `report`: the counters now display each value as a percentage of the
total, giving a quick idea of the failure rate. In general, we expect
fewer than 1% of failures in a regular installation for OpenShift
Conformance
~~~
==> Result Summary by test suite:
┌───────────────────────────────────────────┐
│ 05-openshift-cluster-upgrade: ✅          │
├───────────────────────────┬───────────────┤
│ Total tests               │ 1             │
│ Passed                    │ 0             │
│ Failed                    │ 0             │
│ Timeout                   │ 0             │
│ Skipped                   │ 1             │
│ Result Job                │ passed        │
└───────────────────────────┴───────────────┘
┌───────────────────────────────────────────┐
│ 10-openshift-kube-conformance: ✅         │
├───────────────────────────┬───────────────┤
│ Total tests               │ 399           │
│ Passed                    │ 399           │
│ Failed                    │ 0             │
│ Timeout                   │ 0             │
│ Skipped                   │ 0             │
│ Filter Failed Suite       │ 0 (0.00%)     │
│ Filter Failed KF          │ 0 (0.00%)     │
│ Filter Replay             │ 0 (0.00%)     │
│ Filter Failed Baseline    │ 0 (0.00%)     │
│ Filter Failed Priority    │ 0 (0.00%)     │
│ Filter Failed API         │ 0 (0.00%)     │
│ Failures (Priotity)       │ 0 (0.00%)     │
│ Result - Job              │ passed        │
│ Result - Processed        │ passed        │
└───────────────────────────┴───────────────┘
┌───────────────────────────────────────────┐
│ 20-openshift-conformance-validated: ❌    │
├───────────────────────────┬───────────────┤
│ Total tests               │ 3783          │
│ Passed                    │ 1574          │
│ Failed                    │ 16            │
│ Timeout                   │ 0             │
│ Skipped                   │ 2193          │
│ Filter Failed Suite       │ 14 (0.37%)    │
│ Filter Failed KF          │ 14 (0.37%)    │
│ Filter Replay             │ 13 (0.34%)    │
│ Filter Failed Baseline    │ 13 (0.34%)    │
│ Filter Failed Priority    │ 13 (0.34%)    │
│ Filter Failed API         │ 1 (0.03%)     │
│ Failures (Priotity)       │ 1 (0.03%)     │
│ Result - Job              │ failed        │
│ Result - Processed        │ failed        │
└───────────────────────────┴───────────────┘

~~~
- `report`: a headline with occurrences grouped by test tag is
displayed before each list of failed tests, for both conformance plugins
~~~
 => 10-openshift-kube-conformance: (36 failures, 28 flakes)
 --> Failed tests to Review (without flakes) - Immediate action:
[total=8] [sig-apps=2 (25.00%)] [sig-cli=2 (25.00%)] [sig-node=2
(25.00%)] [sig-api-machinery=1 (12.50%)]
[sig-arch=1 (12.50%)]
[...]
--> Failed flake tests - Statistic from OpenShift CI
[total=28] [sig-api-machinery=12 (42.86%)] [sig-node=4 (14.29%)]
[sig-network-edge=4 (14.29%)] [sig-trt=3 (10.71%)]
[sig-architecture=3 (10.71%)] [bz-OLM=1 (3.57%)] [sig-arch=1 (3.57%)]
[...]
 => 20-openshift-conformance-validated: (102 failures, 43 flakes)

 --> Failed tests to Review (without flakes) - Immediate action:
[total=59] [sig-builds=15 (25.42%)] [sig-apps=10 (16.95%)] [sig-cli=7
(11.86%)] [sig-imageregistry=7 (11.86%)]
[sig-auth=6 (10.17%)] [sig-network=6 (10.17%)] [sig-arch=2 (3.39%)]
[sig-devex=1 (1.69%)]
[sig-instrumentation=1 (1.69%)] [sig-api-machinery=1 (1.69%)]
[sig-scheduling=1 (1.69%)] [sig-node=1 (1.69%)]
[bz-Unknown=1 (1.69%)]
[...]
 --> Failed flake tests - Statistic from OpenShift CI
[total=43] [sig-api-machinery=12 (27.91%)] [sig-node=6 (13.95%)]
[sig-arch=5 (11.63%)] [sig-network-edge=4 (9.30%)]
[sig-trt=3 (6.98%)] [sig-instrumentation=3 (6.98%)] [sig-network=2
(4.65%)] [sig-architecture=2 (4.65%)]
[sig-autoscaling=1 (2.33%)] [bz-OLM=1 (2.33%)] [bz-Routing=1 (2.33%)]
[bz-Unknown=1 (2.33%)]
[bz-kube-apiserver=1 (2.33%)] [bz-DNS=1 (2.33%)]
[...]
~~~
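
The grouped headline can be approximated with a short Go sketch. It assumes the group key is the first bracketed tag of the test name. `GroupByTag`, `Headline`, the `untagged` bucket and the sample test names are hypothetical and are not taken from the OPCT source.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// tagRe matches the first bracketed tag of a test name, e.g. "[sig-apps]".
var tagRe = regexp.MustCompile(`^\[([^\]]+)\]`)

// TagCount holds how many failed tests carry a given leading tag.
type TagCount struct {
	Tag   string
	Count int
}

// GroupByTag counts tests by their leading tag. The result is sorted by
// count (descending) and then by tag name, so the output is stable.
func GroupByTag(tests []string) []TagCount {
	counts := map[string]int{}
	for _, name := range tests {
		tag := "untagged" // hypothetical bucket for names without a leading tag
		if m := tagRe.FindStringSubmatch(name); m != nil {
			tag = m[1]
		}
		counts[tag]++
	}
	out := make([]TagCount, 0, len(counts))
	for tag, n := range counts {
		out = append(out, TagCount{Tag: tag, Count: n})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Tag < out[j].Tag
	})
	return out
}

// Headline renders the grouped counters in the "[tag=N (P%)]" style.
func Headline(tests []string) string {
	total := len(tests)
	parts := []string{fmt.Sprintf("[total=%d]", total)}
	for _, tc := range GroupByTag(tests) {
		pct := 100 * float64(tc.Count) / float64(total)
		parts = append(parts, fmt.Sprintf("[%s=%d (%.2f%%)]", tc.Tag, tc.Count, pct))
	}
	return strings.Join(parts, " ")
}

func main() {
	// Made-up test names, for illustration only.
	failed := []string{
		"[sig-apps] Deployment should roll out",
		"[sig-apps] StatefulSet should scale",
		"[sig-cli] oc adm should work",
		"[sig-node] Pods should start",
	}
	fmt.Println(Headline(failed))
}
```

Sorting by count first keeps the tags with the most failures at the front of the headline, which appears to match the ordering in the sample output above.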

- The final binary [pass/fail] result message (after filters) has been
removed from the openshift conformance plugin. The field is temporarily
removed to prevent mistakes when issues happen in the filter pipeline,
and to keep the focus on the failures rather than on the binary value,
considering that the goal is to bring the number of failed tests to zero.
- `run --watch --watch-interval`: the status watcher accepts a custom
watch interval, decreasing the number of log lines in CI
- `report --diff`: created as an alias of `--baseline`, setting it as
deprecated
- `report` now shows the "Checks" section, applying some rules to help
the reviewer decide where to focus and to prevent results from being
submitted to the partner support case prematurely.

### Error counters

- `report`/Plugins: searches for failures by pattern in the plugin logs
and aggregates them as "Suite Errors"
- `report`/Must-gather: searches for failures by pattern in the pod logs
(must-gather) and aggregates them as "Workload Errors"

### HTML report

![Screenshot from 2023-08-08
16-39-24](https://github.com/redhat-openshift-ecosystem/provider-certification-tool/assets/3216894/6b6bfbe6-58d6-4cbf-ae61-c75ab83b8ca7)

- `report`: extracts important information, processes it, and saves it
to a local directory to be served by a local web server, allowing one to
quickly navigate to the failures for each test using a browser.
- `report`: rank by error count
- `report` HTML: scrapes the test documentation for [Kubernetes
Conformance](https://github.com/cncf/k8s-conformance/blob/master/docs/KubeConformance-1.26.md#custom-resource-openapi-publish-stop-serving-version),
and shows the link for each item when it is available in the Kubernetes
suites
- `report` menu "Workload errors": added, showing information about the
pod logs counters
- `report` menu "Suite errors": added, showing information about
the pod logs counters
- `report` menu "Checks": added, showing information about the
result checklist
- `report` tab "CAMGI": redirects to the CAMGI static HTML page extracted
from must-gather when it is present; otherwise shows how to use CAMGI
when it has not been processed by the plugin
- `report` tab "Filter": redirects to a static HTML page listing all tests,
allowing the user to explore the test details
- `report` tab "Events": redirects to a static HTML page with events
created by Must-gather (extracted at runtime)

The WebUI includes many other features.

#### Plugin Runtime

- `status`: replaces the message `waiting for post-processor...` with
`complete` when the pod is finished.

~~~
# from
Sat, 15 Jul 2023 00:39:31 -03> Global Status: running
JOB_NAME | STATUS | RESULTS | PROGRESS | MESSAGE
05-openshift-cluster-upgrade | complete | | 0/0 (0 failures) | waiting
for post-processor...
10-openshift-kube-conformance | complete | | 5/377 (0 failures) |
waiting for post-processor...
20-openshift-conformance-validated | running | | 0/3684 (0 failures) |
status=waiting-for=10-openshift-kube-conformance=(0/-372/0)=[3/1080]
99-openshift-artifacts-collector | running | | 0/0 (0 failures) |
status=blocked-by=20-openshift-conformance-validated=(0/-3684/0)=[0/1080]

# to
Sat, 15 Jul 2023 01:17:58 -03> Global Status: running
JOB_NAME | STATUS | RESULTS | PROGRESS | MESSAGE
05-openshift-cluster-upgrade | complete | | 0/0 (0 failures) | complete
10-openshift-kube-conformance | complete | | 5/377 (0 failures) |
complete
20-openshift-conformance-validated | running | | 5/3684 (0 failures) |
status=running=T/C/P/F/S=3684/5/3/0/2
99-openshift-artifacts-collector | running | | 0/0 (0 failures) |
status=waiting-for=20-openshift-conformance-validated=(0/-3679/0)=[8/1080]
~~~

- Supporting the new openshift-tests plugin re-architecture/refactoring:
the main plugin/step orchestrating the conformance workflow (aka
`openshift-tests-plugin`) has been refactored in Golang, especially
targeting:
  - A) unblocking complex/parallel operations in the plugin runtime;
  - B) delegating conformance executions to the `tests` image as a sidecar
    container, avoiding plugin development when new dependencies are
    required by `openshift-tests`;
  - C) [drastically] decreasing the number of failures impacting the test
    results related to the OPCT test environment;
  - D) allowing a JUnit test processor to run before submitting to the
    aggregator server.
- Supporting plugin `Replay`

### Bug fixes

- `report`: the flake test lookup now queries the correct OCP version in
the Sippy API
- `report`: Flake Filter report prevents duplicated tests from immediate
action
- `report`: Flake Filter report does not show items reporting than 5% of
flake count in OCP CI (by Sippy)
~~~
Flakes	Perc		 TestName
288 23.782% [bz-DNS][invariant] alert/KubePodNotReady should not be at
or above pending in ns/openshift-dns
967 79.851% [bz-OLM][invariant] alert/KubePodNotReady should not be at
or above pending in ns/openshift-marketplace
[...]
~~~
- The filter pipeline no longer breaks when the suite list (collected
by the artifact collector plugin) is empty. Example:
~~~
 Total tests by conformance suites:
 - kubernetes/conformance: 0 
 - openshift/conformance: 0 
~~~
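
One place where an empty suite needs care is the percentage counter shown in the summary tables. The sketch below is a hypothetical helper, not the fix applied in this commit: `FormatCounter` renders `N (P%)` and guards the division so that a total of zero prints `0.00%` instead of `NaN`.

```go
package main

import "fmt"

// FormatCounter renders "N (P%)", where P is count as a percentage of total.
// When total is zero (an empty suite list), the percentage is reported as
// 0.00 rather than the NaN that a float division by zero would produce.
func FormatCounter(count, total int) string {
	pct := 0.0
	if total > 0 {
		pct = 100 * float64(count) / float64(total)
	}
	return fmt.Sprintf("%d (%.2f%%)", count, pct)
}

func main() {
	fmt.Println(FormatCounter(14, 3783)) // 14 (0.37%)
	fmt.Println(FormatCounter(0, 0))     // 0 (0.00%)
}
```

The first call uses the "Filter Failed Suite" numbers from the `20-openshift-conformance-validated` table above.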

## Done checklist

- [x] create documentation for `report`
- [x] create documentation for the `report` checklist section
- [x] review tests
- [x] review feature description
- [x] Create a dedicated Jira Epic/cards for known issues

### Documentation checklist

- [x] Review Rules: #77, #80
- [x] Command documentation `report`: #113
- [x] Command documentation `adm baseline *`: #113
- [x] Guide to 'Deep Dive' in the `report` options: #113
mtulio authored Aug 8, 2024
1 parent e0e2687 commit f321a27
Showing 82 changed files with 9,096 additions and 2,062 deletions.
42 changes: 41 additions & 1 deletion .github/workflows/e2e.yaml
@@ -42,9 +42,16 @@ jobs:
echo "> Setting run permissions to OPCT:"
chmod u+x ${OPCT}
echo "> Running OPCT report:"
echo "> Running OPCT report (simple):"
${OPCT} report /tmp/result.tar.gz
echo "> Running OPCT report (advanced):"
${OPCT} report /tmp/result.tar.gz \
--log-level=debug \
--save-to=/tmp/results-data \
--skip-server=true \
--skip-baseline-api=true
e2e-cmd_adm-parse-etcd-logs:
name: "e2e-cmd_adm-parse-etcd-logs"
runs-on: ubuntu-latest
@@ -146,3 +153,36 @@ jobs:
${CUSTOM_BUILD_PATH} adm parse-metrics \
--input ${LOCAL_TEST_DATA} --output /tmp/metrics
tree /tmp/metrics
e2e-cmd_adm-baseline:
name: "e2e-cmd_adm-baseline"
runs-on: ubuntu-latest
steps:
- name: Download artifacts
uses: actions/download-artifact@v4
with:
name: opct-linux-amd64
path: /tmp/build/

- name: Preparing testdata
env:
OPCT: /tmp/build/opct-linux-amd64
run: |
echo "> Setting exec permissions to OPCT:"
chmod u+x ${OPCT}
- name: "e2e adm baseline: opct adm baseline (list|get)"
env:
OPCT: /tmp/build/opct-linux-amd64
run: |
echo -e "\n\t#>> List latest baseline results"
${OPCT} adm baseline list
echo -e "\n\t#>> List all baseline results"
${OPCT} adm baseline list --all
echo -e "\n\t#>> Retrieve a baseline result by name"
${OPCT} adm baseline get --name 4.16_None_latest --dump
echo -e "\n\t#>> Retrieve a baseline result by release and platform"
${OPCT} adm baseline get --release 4.15 --platform None
8 changes: 7 additions & 1 deletion .github/workflows/go.yaml
@@ -22,9 +22,15 @@ jobs:
name: linters
uses: ./.github/workflows/pre_linters.yaml

reviewer:
name: reviewer
uses: ./.github/workflows/pre_reviewer.yaml

go-test:
runs-on: ubuntu-latest
needs: linters
needs:
- linters
- reviewer
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
51 changes: 51 additions & 0 deletions .github/workflows/pre_reviewer.yaml
@@ -0,0 +1,51 @@
---
name: reviewer

on:
workflow_call: {}

# golangci-lint-action requires those permissions to annotate issues in the PR.
permissions:
contents: read
checks: write
issues: read
pull-requests: write

env:
GO_VERSION: 1.22
GOLANGCI_LINT_VERSION: v1.59

jobs:
# reviewdog / misspell: https://github.com/reviewdog/action-misspell
misspell:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: reviewdog/action-misspell@v1
with:
github_token: ${{ secrets.github_token }}
reporter: github-pr-review
#level: warning
locale: "US"

# reviewdog / suggester: https://github.com/reviewdog/action-suggester
go_fmt:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: gofmt -w -s .
- uses: reviewdog/action-suggester@v1
with:
tool_name: gofmt

# https://github.com/reviewdog/action-hadolint
# containerfile:
# name: runner / hadolint
# runs-on: ubuntu-latest
# steps:
# - name: Check out code
# uses: actions/checkout@v4
# - name: hadolint
# uses: reviewdog/action-hadolint@v1
# with:
# reporter: github-pr-review
8 changes: 6 additions & 2 deletions .github/workflows/static-website.yml
@@ -4,8 +4,12 @@ name: Documentation
on:
# Static pages are build only targeting the main branch
push:
branches: ["main"]
paths: ['mkdocs.yml', 'docs/**', 'hack/docs-requirements.txt']
branches:
- "main"
paths:
- 'mkdocs.yml'
- 'docs/**'
- 'hack/docs-requirements.txt'

workflow_dispatch:

1 change: 1 addition & 0 deletions .gitignore
@@ -5,6 +5,7 @@ kubeconfig

# build files
dist/
build/

# changelog is generated automaticaly by hack/generate-changelog.sh
# available only in the rendered webpage (built by mkdocs).
43 changes: 34 additions & 9 deletions Makefile
@@ -5,7 +5,7 @@ export GO111MODULE=on
export CGO_ENABLED=0

BUILD_DIR ?= $(PWD)/build
IMG ?= quay.io/ocp-cert/opct
IMG ?= quay.io/opct/opct
VERSION=$(shell git rev-parse --short HEAD)
RELEASE_TAG ?= 0.0.0
BIN_NAME ?= opct
@@ -57,15 +57,30 @@ build-darwin-arm64: build
linux-amd64-container: build-linux-amd64
podman build -t $(IMG):latest -f hack/Containerfile --build-arg=RELEASE_TAG=$(RELEASE_TAG) .

.PHONY: image-mirror-sonobuoy
image-mirror-sonobuoy:
./hack/image-mirror-sonobuoy/mirror.sh
# Publish devel binaries (non-official). Must be used only for troubleshooting in development/support.
.PHONY: publish-amd64-devel
publish-amd64-devel: build-linux-amd64
aws s3 cp $(BUILD_DIR)/opct-linux-amd64 s3://openshift-provider-certification/bin/opct-linux-amd64-devel

# Utils dev
.PHONY: update-go
update-go:
go get -u
go mod tidy
.PHONY: publish-darwin-arm64-devel
publish-darwin-arm64-devel: build-darwin-arm64
aws s3 cp $(BUILD_DIR)/opct-darwin-arm64 s3://openshift-provider-certification/bin/opct-darwin-arm64-devel

.PHONY: publish-devel
publish-devel: publish-amd64-devel
publish-devel: publish-darwin-arm64-devel

#
# Test
#

.PHONY: test-lint
test-lint:
@echo "Running linting tools"
# Download https://github.com/golangci/golangci-lint/releases/tag/v1.59.1
golangci-lint run --timeout=10m
# yamllint: pip install yamllint
yamllint .github/workflows/*.yaml

.PHONY: test
test:
@@ -90,3 +105,13 @@ build-changelog:
.PHONY: build-docs
build-docs: build-changelog
mkdocs build --site-dir ./site

.PHONY: image-mirror-sonobuoy
image-mirror-sonobuoy:
./hack/image-mirror-sonobuoy/mirror.sh

# Utils dev
.PHONY: update-go
update-go:
go get -u
go mod tidy
23 changes: 22 additions & 1 deletion cmd/root.go → cmd/opct/root.go
@@ -5,21 +5,24 @@ import (
"os"

log "github.com/sirupsen/logrus"
logwriter "github.com/sirupsen/logrus/hooks/writer"

"github.com/spf13/cobra"
"github.com/spf13/viper"
"github.com/vmware-tanzu/sonobuoy/cmd/sonobuoy/app"

"github.com/redhat-openshift-ecosystem/provider-certification-tool/pkg/cmd/adm"
"github.com/redhat-openshift-ecosystem/provider-certification-tool/pkg/cmd/get"
"github.com/redhat-openshift-ecosystem/provider-certification-tool/pkg/cmd/report"
"github.com/redhat-openshift-ecosystem/provider-certification-tool/pkg/destroy"
"github.com/redhat-openshift-ecosystem/provider-certification-tool/pkg/report"
"github.com/redhat-openshift-ecosystem/provider-certification-tool/pkg/retrieve"
"github.com/redhat-openshift-ecosystem/provider-certification-tool/pkg/run"
"github.com/redhat-openshift-ecosystem/provider-certification-tool/pkg/status"
"github.com/redhat-openshift-ecosystem/provider-certification-tool/pkg/version"
)

const logFile = "opct.log"

// rootCmd represents the base command when called without any subcommands
var rootCmd = &cobra.Command{
Use: "opct",
@@ -40,6 +43,24 @@ var rootCmd = &cobra.Command{
log.SetFormatter(&log.TextFormatter{
FullTimestamp: true,
})

log.SetOutput(os.Stdout)
fdLog, err := os.OpenFile(logFile, os.O_WRONLY|os.O_APPEND|os.O_CREATE, 0644)
if err != nil {
log.Errorf("error opening file %s: %v", logFile, err)
} else {
log.AddHook(&logwriter.Hook{ // Mirror entries of all listed levels to the log file
Writer: fdLog,
LogLevels: []log.Level{
log.PanicLevel,
log.FatalLevel,
log.ErrorLevel,
log.WarnLevel,
log.InfoLevel,
log.DebugLevel,
},
})
}
},
}
