[metrics 4/x] Metrics exporter rules #732

Merged

Member

@zeeke zeeke commented Jul 10, 2024

PrometheusRules allow recording predefined queries. Use the
`sriov_kubepoddevice` metric to add the pod and namespace labels
to the SR-IOV VF metrics.

Here is an example of the raw exported metrics:

sriov_kubepoddevice{container="testpmd",dev_type="openshift.io/inteldpdk",namespace="cnf-4916",pciAddr="0000:17:01.4",pod="dpdk-intel-client"} 1

sriov_vf_rx_broadcast{numa_node="0",pciAddr="0000:17:01.4",pf="ens2f0",vf="4"} 1.0926018e+07
sriov_vf_rx_bytes{numa_node="0",pciAddr="0000:17:01.4",pf="ens2f0",vf="4"} 7.83952134e+08
sriov_vf_rx_dropped{numa_node="0",pciAddr="0000:17:01.4",pf="ens2f0",vf="4"} 0
sriov_vf_rx_multicast{numa_node="0",pciAddr="0000:17:01.4",pf="ens2f0",vf="4"} 0
sriov_vf_rx_packets{numa_node="0",pciAddr="0000:17:01.4",pf="ens2f0",vf="4"} 1.0926018e+07
sriov_vf_tx_bytes{numa_node="0",pciAddr="0000:17:01.4",pf="ens2f0",vf="4"} 0
sriov_vf_tx_dropped{numa_node="0",pciAddr="0000:17:01.4",pf="ens2f0",vf="4"} 1.0926018e+07
sriov_vf_tx_packets{numa_node="0",pciAddr="0000:17:01.4",pf="ens2f0",vf="4"} 0

The proposed Prometheus rules make the following new metrics available for querying:

  • network:sriov_vf_tx_packets
  • network:sriov_vf_rx_packets
  • network:sriov_vf_tx_bytes
  • network:sriov_vf_rx_bytes
  • network:sriov_vf_tx_dropped
  • network:sriov_vf_rx_dropped
  • network:sriov_vf_rx_broadcast
  • network:sriov_vf_rx_multicast
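
The idea behind these rules can be sketched in a few lines of Python. This is only an illustration of the join, not the operator's code: it assumes the recording rules match each `sriov_vf_*` series to `sriov_kubepoddevice` on the shared `pciAddr` label, roughly like the PromQL expression `sriov_vf_rx_packets * on (pciAddr) group_left (pod, namespace) sriov_kubepoddevice`. The helper names `join_pod_labels` and `recording_rule_name` are hypothetical.

```python
# Sample series copied from the raw metrics shown in the PR description.
pod_devices = [
    {"container": "testpmd", "dev_type": "openshift.io/inteldpdk",
     "namespace": "cnf-4916", "pciAddr": "0000:17:01.4",
     "pod": "dpdk-intel-client", "value": 1.0},
]
vf_rx_packets = [
    {"numa_node": "0", "pciAddr": "0000:17:01.4", "pf": "ens2f0",
     "vf": "4", "value": 1.0926018e+07},
]


def join_pod_labels(vf_series, device_series, on="pciAddr",
                    extra=("pod", "namespace")):
    """Copy the `extra` labels onto each VF series whose `on` label matches.

    Assumed equivalent of:
      <vf_metric> * on (pciAddr) group_left (pod, namespace) sriov_kubepoddevice
    """
    by_key = {dev[on]: dev for dev in device_series}
    joined = []
    for series in vf_series:
        dev = by_key.get(series[on])
        if dev is None:
            continue  # VF not attached to a pod: the join yields no series
        out = dict(series)
        out.update({label: dev[label] for label in extra})
        # sriov_kubepoddevice is 1, so the product keeps the VF counter value.
        out["value"] = series["value"] * dev["value"]
        joined.append(out)
    return joined


def recording_rule_name(metric):
    """Name of the recorded series, following the list above."""
    return f"network:{metric}"


result = join_pod_labels(vf_rx_packets, pod_devices)
print(recording_rule_name("sriov_vf_rx_packets"), result[0])
```

With the sample data, the recorded series keeps the VF labels (`pf`, `vf`, `numa_node`, `pciAddr`) and gains `pod` and `namespace`, which is what makes the metrics filterable per workload.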


Thanks for your PR.
To run the vendor CIs, use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendor CIs, use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.

Best regards.

@zeeke zeeke force-pushed the metrics-exporter-rules branch from e1551a6 to 86719de Compare July 10, 2024 15:53
@zeeke zeeke force-pushed the metrics-exporter-rules branch from 86719de to 0069e5e Compare July 11, 2024 08:00
@zeeke zeeke force-pushed the metrics-exporter-rules branch from 0069e5e to fed3f8c Compare July 11, 2024 09:58
@zeeke zeeke force-pushed the metrics-exporter-rules branch from fed3f8c to 5ba65f8 Compare July 15, 2024 10:59
@zeeke zeeke force-pushed the metrics-exporter-rules branch from 5ba65f8 to 5b085ea Compare July 15, 2024 12:16
@zeeke zeeke force-pushed the metrics-exporter-rules branch 4 times, most recently from 3eced4d to 7b25cc6 Compare July 19, 2024 13:58
@zeeke zeeke force-pushed the metrics-exporter-rules branch from 7b25cc6 to f004d91 Compare August 5, 2024 11:16
@zeeke zeeke marked this pull request as ready for review August 5, 2024 11:17
coveralls commented Aug 5, 2024

Pull Request Test Coverage Report for Build 10903994186

Details

  • 1 of 1 (100.0%) changed or added relevant line in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.004%) to 45.052%

Totals Coverage Status
Change from base Build 10903544901: 0.004%
Covered Lines: 6628
Relevant Lines: 14712

💛 - Coveralls

@zeeke zeeke force-pushed the metrics-exporter-rules branch 3 times, most recently from d8ce5ef to 9493e50 Compare August 20, 2024 11:36
Collaborator

@SchSeba SchSeba left a comment

LGTM

@@ -29,6 +29,7 @@ rules:
- monitoring.coreos.com
resources:
- servicemonitors
- prometheusrules
Collaborator

Do we support deletion of Prometheus objects?

Collaborator

For example, when sno is redeployed with Prometheus disabled (e.g. in the Helm chart).

Member Author

Good point. We have to support object deletion as well. I handle it in

Member Author

zeeke commented Sep 12, 2024

@adrianchiris can we move this forward?

Collaborator

@adrianchiris adrianchiris left a comment

LGTM

PrometheusRules allow recording pre-defined queries. Use
`sriov_kubepoddevice` metric to add `pod|namespace` pair
to the sriov metrics.

Feature is enabled via the `METRICS_EXPORTER_PROMETHEUS_DEPLOY_RULE`
environment variable.

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
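
As a rough illustration of the feature gate described in this commit message, the check below reads `METRICS_EXPORTER_PROMETHEUS_DEPLOY_RULE` and decides whether the rules should be deployed. It is a minimal sketch in Python rather than the operator's Go code, and it assumes the variable is treated as enabled only when set to `true`; the exact values the operator accepts are not stated in this PR. The function name `deploy_rules_enabled` is hypothetical.

```python
import os

ENV_VAR = "METRICS_EXPORTER_PROMETHEUS_DEPLOY_RULE"


def deploy_rules_enabled(environ=None):
    """Return True when the PrometheusRule deployment is switched on.

    Assumption: only the string "true" (any case, surrounding spaces
    ignored) enables the feature; unset or any other value disables it.
    """
    env = os.environ if environ is None else environ
    return env.get(ENV_VAR, "").strip().lower() == "true"


if __name__ == "__main__":
    print("deploy PrometheusRule:", deploy_rules_enabled())
```

Passing the environment as a parameter keeps the check easy to exercise without touching the real process environment.
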
When the `metricsExporter` feature is turned off, deployed resources
should be removed. These changes fix the error:

```
2024-08-28T14:07:57.699760017Z    ERROR    controller/controller.go:266    Reconciler error    {"controller": "sriovoperatorconfig", "controllerGroup": "sriovnetwork.openshift.io", "controllerKind": "SriovOperatorConfig", "SriovOperatorConfig": {"name":"default","namespace":"openshift-sriov-network-operator"}, "namespace": "openshift-sriov-network-operator", "name": "default", "reconcileID": "fa841c50-dbb8-4c4c-9ddd-b98624fd2a24", "error": "failed to delete object &{map[apiVersion:monitoring.coreos.com/v1 kind:ServiceMonitor metadata:map[name:sriov-network-metrics-exporter namespace:openshift-sriov-network-operator] spec:map[endpoints:[map[bearerTokenFile:/var/run/secrets/kubernetes.io/serviceaccount/token honorLabels:true interval:30s port:sriov-network-metrics scheme:https tlsConfig:map[caFile:/etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt insecureSkipVerify:false serverName:sriov-network-metrics-exporter-service.openshift-sriov-network-operator.svc]]] namespaceSelector:map[matchNames:[openshift-sriov-network-operator]] selector:map[matchLabels:map[name:sriov-network-metrics-exporter-service]]]]} with err: could not delete object (monitoring.coreos.com/v1, Kind=ServiceMonitor) openshift-sriov-network-operator/sriov-network-metrics-exporter: servicemonitors.monitoring.coreos.com \"sriov-network-metrics-exporter\" is forbidden: User \"system:serviceaccount:openshift-sriov-network-operator:sriov-network-operator\" cannot delete resource \"servicemonitors\" in API group \"monitoring.coreos.com\" in the namespace \"openshift-sriov-network-operator\""}
```

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
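
The `forbidden` error above is what Kubernetes RBAC returns when no rule bound to the service account grants the requested verb. The sketch below models that check in Python to show why adding `delete` (and the `prometheusrules` resource) to the role resolves it. The verb lists in `rules_before` and `rules_after` are hypothetical and are not copied from the operator's manifests; only the API group and resource names come from this PR.

```python
def _matches(allowed, wanted):
    """RBAC list match: '*' is a wildcard, otherwise exact membership."""
    return "*" in allowed or wanted in allowed


def allows(rules, api_group, resource, verb):
    """Return True if any rule grants `verb` on `resource` in `api_group`."""
    return any(
        _matches(rule.get("apiGroups", []), api_group)
        and _matches(rule.get("resources", []), resource)
        and _matches(rule.get("verbs", []), verb)
        for rule in rules
    )


GROUP = "monitoring.coreos.com"

# Hypothetical role before the fix: no delete verb, no prometheusrules.
rules_before = [
    {"apiGroups": [GROUP], "resources": ["servicemonitors"],
     "verbs": ["get", "create", "update"]},
]

# Hypothetical role after the fix.
rules_after = [
    {"apiGroups": [GROUP],
     "resources": ["servicemonitors", "prometheusrules"],
     "verbs": ["get", "create", "update", "delete"]},
]

for name, rules in (("before", rules_before), ("after", rules_after)):
    print(name, "can delete servicemonitors:",
          allows(rules, GROUP, "servicemonitors", "delete"))
```
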
@zeeke zeeke force-pushed the metrics-exporter-rules branch from 504ce7d to b49cf15 Compare September 17, 2024 13:23
@adrianchiris adrianchiris merged commit aecb4bb into k8snetworkplumbingwg:master Sep 19, 2024
13 checks passed
zeeke added a commit to zeeke/sriov-network-operator that referenced this pull request Sep 20, 2024
Make the operator create PrometheusRules so that metrics can be
browsed in the Developer Console.

refs:
- k8snetworkplumbingwg/sriov-network-operator#732

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
zeeke added a commit to zeeke/sriov-network-operator that referenced this pull request Sep 20, 2024
zeeke added a commit to zeeke/sriov-network-operator that referenced this pull request Oct 11, 2024
zeeke added a commit to zeeke/sriov-network-operator that referenced this pull request Dec 10, 2024
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/sriov-network-operator that referenced this pull request Feb 4, 2025