add metrics. #186

Merged
merged 1 commit into from
Sep 12, 2024

@morvencao morvencao commented Sep 2, 2024

@morvencao morvencao force-pushed the br_add_metrics branch 4 times, most recently from d33e9a2 to 755fae3 Compare September 3, 2024 12:23
@morvencao
Contributor Author

/hold Wait for the merge of open-cluster-management-io/sdk-go#76

@morvencao
Contributor Author

/assign @clyang82

@clyang82
Contributor

clyang82 commented Sep 5, 2024

Could you append the resulting metrics output?

@morvencao
Contributor Author

morvencao commented Sep 5, 2024

Metrics after running the e2e tests:

# HELP advisory_lock_count Number of advisory lock requests.
# TYPE advisory_lock_count counter
advisory_lock_count{status="OK",type="events"} 16
advisory_lock_count{status="OK",type="instances"} 22
advisory_lock_count{status="OK",type="resource_status"} 38
advisory_lock_count{status="OK",type="resources"} 10
# HELP advisory_lock_duration Advisory Lock durations in seconds.
# TYPE advisory_lock_duration histogram
advisory_lock_duration_bucket{status="OK",type="events",le="0.1"} 16
advisory_lock_duration_bucket{status="OK",type="events",le="0.2"} 16
advisory_lock_duration_bucket{status="OK",type="events",le="0.5"} 16
advisory_lock_duration_bucket{status="OK",type="events",le="1"} 16
advisory_lock_duration_bucket{status="OK",type="events",le="2"} 16
advisory_lock_duration_bucket{status="OK",type="events",le="10"} 16
advisory_lock_duration_bucket{status="OK",type="events",le="+Inf"} 16
advisory_lock_duration_sum{status="OK",type="events"} 0.07476534900000001
advisory_lock_duration_count{status="OK",type="events"} 16
advisory_lock_duration_bucket{status="OK",type="instances",le="0.1"} 22
advisory_lock_duration_bucket{status="OK",type="instances",le="0.2"} 22
advisory_lock_duration_bucket{status="OK",type="instances",le="0.5"} 22
advisory_lock_duration_bucket{status="OK",type="instances",le="1"} 22
advisory_lock_duration_bucket{status="OK",type="instances",le="2"} 22
advisory_lock_duration_bucket{status="OK",type="instances",le="10"} 22
advisory_lock_duration_bucket{status="OK",type="instances",le="+Inf"} 22
advisory_lock_duration_sum{status="OK",type="instances"} 0.032695262999999995
advisory_lock_duration_count{status="OK",type="instances"} 22
advisory_lock_duration_bucket{status="OK",type="resource_status",le="0.1"} 38
advisory_lock_duration_bucket{status="OK",type="resource_status",le="0.2"} 38
advisory_lock_duration_bucket{status="OK",type="resource_status",le="0.5"} 38
advisory_lock_duration_bucket{status="OK",type="resource_status",le="1"} 38
advisory_lock_duration_bucket{status="OK",type="resource_status",le="2"} 38
advisory_lock_duration_bucket{status="OK",type="resource_status",le="10"} 38
advisory_lock_duration_bucket{status="OK",type="resource_status",le="+Inf"} 38
advisory_lock_duration_sum{status="OK",type="resource_status"} 0.40729072899999996
advisory_lock_duration_count{status="OK",type="resource_status"} 38
advisory_lock_duration_bucket{status="OK",type="resources",le="0.1"} 10
advisory_lock_duration_bucket{status="OK",type="resources",le="0.2"} 10
advisory_lock_duration_bucket{status="OK",type="resources",le="0.5"} 10
advisory_lock_duration_bucket{status="OK",type="resources",le="1"} 10
advisory_lock_duration_bucket{status="OK",type="resources",le="2"} 10
advisory_lock_duration_bucket{status="OK",type="resources",le="10"} 10
advisory_lock_duration_bucket{status="OK",type="resources",le="+Inf"} 10
advisory_lock_duration_sum{status="OK",type="resources"} 0.08142236900000001
advisory_lock_duration_count{status="OK",type="resources"} 10
# HELP grpc_server_called_total Total number of RPCs called on the server.
# TYPE grpc_server_called_total counter
grpc_server_called_total{source="sourceclient-testfpnxv",type="Publish"} 6
grpc_server_called_total{source="sourceclient-testfpnxv",type="Subscribe"} 3
# HELP grpc_server_message_received_total Total number of messages received on the server from agent and client.
# TYPE grpc_server_message_received_total counter
grpc_server_message_received_total{source="sourceclient-testfpnxv",type="Publish"} 6
grpc_server_message_received_total{source="sourceclient-testfpnxv",type="Subscribe"} 3
# HELP grpc_server_message_sent_total Total number of messages sent by the server to agent and client.
# TYPE grpc_server_message_sent_total counter
grpc_server_message_sent_total{source="sourceclient-testfpnxv",type="Publish"} 6
grpc_server_message_sent_total{source="sourceclient-testfpnxv",type="Subscribe"} 30
# HELP grpc_server_processed_duration_seconds Histogram of the duration of RPCs processed on the server.
# TYPE grpc_server_processed_duration_seconds histogram
grpc_server_processed_duration_seconds_bucket{source="sourceclient-testfpnxv",type="Publish",le="0.005"} 0
grpc_server_processed_duration_seconds_bucket{source="sourceclient-testfpnxv",type="Publish",le="0.01"} 5
grpc_server_processed_duration_seconds_bucket{source="sourceclient-testfpnxv",type="Publish",le="0.025"} 5
grpc_server_processed_duration_seconds_bucket{source="sourceclient-testfpnxv",type="Publish",le="0.05"} 6
grpc_server_processed_duration_seconds_bucket{source="sourceclient-testfpnxv",type="Publish",le="0.1"} 6
grpc_server_processed_duration_seconds_bucket{source="sourceclient-testfpnxv",type="Publish",le="0.25"} 6
grpc_server_processed_duration_seconds_bucket{source="sourceclient-testfpnxv",type="Publish",le="0.5"} 6
grpc_server_processed_duration_seconds_bucket{source="sourceclient-testfpnxv",type="Publish",le="1"} 6
grpc_server_processed_duration_seconds_bucket{source="sourceclient-testfpnxv",type="Publish",le="2.5"} 6
grpc_server_processed_duration_seconds_bucket{source="sourceclient-testfpnxv",type="Publish",le="5"} 6
grpc_server_processed_duration_seconds_bucket{source="sourceclient-testfpnxv",type="Publish",le="10"} 6
grpc_server_processed_duration_seconds_bucket{source="sourceclient-testfpnxv",type="Publish",le="+Inf"} 6
grpc_server_processed_duration_seconds_sum{source="sourceclient-testfpnxv",type="Publish"} 0.075799143
grpc_server_processed_duration_seconds_count{source="sourceclient-testfpnxv",type="Publish"} 6
# HELP grpc_server_processed_total Total number of RPCs processed on the server, regardless of success or failure.
# TYPE grpc_server_processed_total counter
grpc_server_processed_total{code="OK",source="sourceclient-testfpnxv",type="Publish"} 6
grpc_server_processed_total{code="OK",source="sourceclient-testfpnxv",type="Subscribe"} 3
# HELP resource_processed_total Number of processed resources.
# TYPE resource_processed_total counter
resource_processed_total{action="update",id="01dbff20-f86d-41dd-b972-ea9bbfdef7f0"} 4
resource_processed_total{action="update",id="240b3897-d620-4749-8ad9-eb0a0e631ef6"} 4
resource_processed_total{action="update",id="3a3d6f1f-75f2-4169-be70-c9ef90bbf743"} 6
resource_processed_total{action="update",id="4628bd82-5a8d-457a-bcc3-ade46f54a690"} 2
resource_processed_total{action="update",id="629a9314-1670-410d-9dde-a3dada2667df"} 12
resource_processed_total{action="update",id="8b1cc8aa-5ad1-4aba-b6cd-5ca90b0a0444"} 3
resource_processed_total{action="update",id="d4d3b89c-e11e-4a51-a9c0-e3da68955c7f"} 6
# HELP rest_api_inbound_request_count Number of requests served.
# TYPE rest_api_inbound_request_count counter
rest_api_inbound_request_count{code="200",method="GET",path="/api/maestro/v1/resource-bundles/-"} 2
rest_api_inbound_request_count{code="200",method="GET",path="/api/maestro/v1/resources/-"} 8
rest_api_inbound_request_count{code="200",method="PATCH",path="/api/maestro/v1/resources/-"} 1
rest_api_inbound_request_count{code="201",method="POST",path="/api/maestro/v1/resources"} 4
rest_api_inbound_request_count{code="204",method="DELETE",path="/api/maestro/v1/resources/-"} 5
rest_api_inbound_request_count{code="404",method="GET",path="/api/maestro/v1/resource-bundles/-"} 1
rest_api_inbound_request_count{code="404",method="GET",path="/api/maestro/v1/resources/-"} 1
# HELP rest_api_inbound_request_duration Request duration in seconds.
# TYPE rest_api_inbound_request_duration histogram
rest_api_inbound_request_duration_bucket{code="200",method="GET",path="/api/maestro/v1/resource-bundles/-",le="0.1"} 2
rest_api_inbound_request_duration_bucket{code="200",method="GET",path="/api/maestro/v1/resource-bundles/-",le="1"} 2
rest_api_inbound_request_duration_bucket{code="200",method="GET",path="/api/maestro/v1/resource-bundles/-",le="10"} 2
rest_api_inbound_request_duration_bucket{code="200",method="GET",path="/api/maestro/v1/resource-bundles/-",le="30"} 2
rest_api_inbound_request_duration_bucket{code="200",method="GET",path="/api/maestro/v1/resource-bundles/-",le="+Inf"} 2
rest_api_inbound_request_duration_sum{code="200",method="GET",path="/api/maestro/v1/resource-bundles/-"} 0.004490571
rest_api_inbound_request_duration_count{code="200",method="GET",path="/api/maestro/v1/resource-bundles/-"} 2
rest_api_inbound_request_duration_bucket{code="200",method="GET",path="/api/maestro/v1/resources/-",le="0.1"} 8
rest_api_inbound_request_duration_bucket{code="200",method="GET",path="/api/maestro/v1/resources/-",le="1"} 8
rest_api_inbound_request_duration_bucket{code="200",method="GET",path="/api/maestro/v1/resources/-",le="10"} 8
rest_api_inbound_request_duration_bucket{code="200",method="GET",path="/api/maestro/v1/resources/-",le="30"} 8
rest_api_inbound_request_duration_bucket{code="200",method="GET",path="/api/maestro/v1/resources/-",le="+Inf"} 8
rest_api_inbound_request_duration_sum{code="200",method="GET",path="/api/maestro/v1/resources/-"} 0.021776432999999998
rest_api_inbound_request_duration_count{code="200",method="GET",path="/api/maestro/v1/resources/-"} 8
rest_api_inbound_request_duration_bucket{code="200",method="PATCH",path="/api/maestro/v1/resources/-",le="0.1"} 1
rest_api_inbound_request_duration_bucket{code="200",method="PATCH",path="/api/maestro/v1/resources/-",le="1"} 1
rest_api_inbound_request_duration_bucket{code="200",method="PATCH",path="/api/maestro/v1/resources/-",le="10"} 1
rest_api_inbound_request_duration_bucket{code="200",method="PATCH",path="/api/maestro/v1/resources/-",le="30"} 1
rest_api_inbound_request_duration_bucket{code="200",method="PATCH",path="/api/maestro/v1/resources/-",le="+Inf"} 1
rest_api_inbound_request_duration_sum{code="200",method="PATCH",path="/api/maestro/v1/resources/-"} 0.013018656
rest_api_inbound_request_duration_count{code="200",method="PATCH",path="/api/maestro/v1/resources/-"} 1
rest_api_inbound_request_duration_bucket{code="201",method="POST",path="/api/maestro/v1/resources",le="0.1"} 4
rest_api_inbound_request_duration_bucket{code="201",method="POST",path="/api/maestro/v1/resources",le="1"} 4
rest_api_inbound_request_duration_bucket{code="201",method="POST",path="/api/maestro/v1/resources",le="10"} 4
rest_api_inbound_request_duration_bucket{code="201",method="POST",path="/api/maestro/v1/resources",le="30"} 4
rest_api_inbound_request_duration_bucket{code="201",method="POST",path="/api/maestro/v1/resources",le="+Inf"} 4
rest_api_inbound_request_duration_sum{code="201",method="POST",path="/api/maestro/v1/resources"} 0.03167025
rest_api_inbound_request_duration_count{code="201",method="POST",path="/api/maestro/v1/resources"} 4
rest_api_inbound_request_duration_bucket{code="204",method="DELETE",path="/api/maestro/v1/resources/-",le="0.1"} 5
rest_api_inbound_request_duration_bucket{code="204",method="DELETE",path="/api/maestro/v1/resources/-",le="1"} 5
rest_api_inbound_request_duration_bucket{code="204",method="DELETE",path="/api/maestro/v1/resources/-",le="10"} 5
rest_api_inbound_request_duration_bucket{code="204",method="DELETE",path="/api/maestro/v1/resources/-",le="30"} 5
rest_api_inbound_request_duration_bucket{code="204",method="DELETE",path="/api/maestro/v1/resources/-",le="+Inf"} 5
rest_api_inbound_request_duration_sum{code="204",method="DELETE",path="/api/maestro/v1/resources/-"} 0.057168279
rest_api_inbound_request_duration_count{code="204",method="DELETE",path="/api/maestro/v1/resources/-"} 5
rest_api_inbound_request_duration_bucket{code="404",method="GET",path="/api/maestro/v1/resource-bundles/-",le="0.1"} 1
rest_api_inbound_request_duration_bucket{code="404",method="GET",path="/api/maestro/v1/resource-bundles/-",le="1"} 1
rest_api_inbound_request_duration_bucket{code="404",method="GET",path="/api/maestro/v1/resource-bundles/-",le="10"} 1
rest_api_inbound_request_duration_bucket{code="404",method="GET",path="/api/maestro/v1/resource-bundles/-",le="30"} 1
rest_api_inbound_request_duration_bucket{code="404",method="GET",path="/api/maestro/v1/resource-bundles/-",le="+Inf"} 1
rest_api_inbound_request_duration_sum{code="404",method="GET",path="/api/maestro/v1/resource-bundles/-"} 0.001403407
rest_api_inbound_request_duration_count{code="404",method="GET",path="/api/maestro/v1/resource-bundles/-"} 1
rest_api_inbound_request_duration_bucket{code="404",method="GET",path="/api/maestro/v1/resources/-",le="0.1"} 1
rest_api_inbound_request_duration_bucket{code="404",method="GET",path="/api/maestro/v1/resources/-",le="1"} 1
rest_api_inbound_request_duration_bucket{code="404",method="GET",path="/api/maestro/v1/resources/-",le="10"} 1
rest_api_inbound_request_duration_bucket{code="404",method="GET",path="/api/maestro/v1/resources/-",le="30"} 1
rest_api_inbound_request_duration_bucket{code="404",method="GET",path="/api/maestro/v1/resources/-",le="+Inf"} 1
rest_api_inbound_request_duration_sum{code="404",method="GET",path="/api/maestro/v1/resources/-"} 0.001669214
rest_api_inbound_request_duration_count{code="404",method="GET",path="/api/maestro/v1/resources/-"} 1
# HELP resources_spec_resync_duration_seconds The duration of the resource spec resync in seconds.
# TYPE resources_spec_resync_duration_seconds histogram
resources_spec_resync_duration_seconds_bucket{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifestbundles",le="0.1"} 1
resources_spec_resync_duration_seconds_bucket{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifestbundles",le="0.2"} 1
resources_spec_resync_duration_seconds_bucket{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifestbundles",le="0.5"} 1
resources_spec_resync_duration_seconds_bucket{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifestbundles",le="1"} 1
resources_spec_resync_duration_seconds_bucket{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifestbundles",le="2"} 1
resources_spec_resync_duration_seconds_bucket{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifestbundles",le="10"} 1
resources_spec_resync_duration_seconds_bucket{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifestbundles",le="30"} 1
resources_spec_resync_duration_seconds_bucket{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifestbundles",le="+Inf"} 1
resources_spec_resync_duration_seconds_sum{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifestbundles"} 0.000883473
resources_spec_resync_duration_seconds_count{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifestbundles"} 1
resources_spec_resync_duration_seconds_bucket{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifests",le="0.1"} 1
resources_spec_resync_duration_seconds_bucket{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifests",le="0.2"} 1
resources_spec_resync_duration_seconds_bucket{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifests",le="0.5"} 1
resources_spec_resync_duration_seconds_bucket{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifests",le="1"} 1
resources_spec_resync_duration_seconds_bucket{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifests",le="2"} 1
resources_spec_resync_duration_seconds_bucket{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifests",le="10"} 1
resources_spec_resync_duration_seconds_bucket{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifests",le="30"} 1
resources_spec_resync_duration_seconds_bucket{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifests",le="+Inf"} 1
resources_spec_resync_duration_seconds_sum{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifests"} 0.001288316
resources_spec_resync_duration_seconds_count{cluster="32326ea5-77aa-487a-b7c5-916e7862571e",source="maestro",type="io.open-cluster-management.works.v1alpha1.manifests"} 1
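
For anyone reading the histograms above: each `_bucket` value is cumulative (it counts every observation less than or equal to `le`), and the mean is `_sum` divided by `_count`. The following is a minimal Go sketch of that arithmetic, using only the standard library. The helper names `meanSeconds` and `perBucket` are illustrative and are not part of this PR.

```go
package main

import "fmt"

// meanSeconds returns the average observation of a Prometheus histogram,
// computed from its _sum and _count samples. It returns 0 for an empty histogram.
func meanSeconds(sum float64, count uint64) float64 {
	if count == 0 {
		return 0
	}
	return sum / float64(count)
}

// perBucket converts cumulative bucket counts (the le="..." series, in
// ascending order of le) into the number of observations in each bucket.
func perBucket(cumulative []uint64) []uint64 {
	out := make([]uint64, len(cumulative))
	var prev uint64
	for i, c := range cumulative {
		out[i] = c - prev
		prev = c
	}
	return out
}

func main() {
	// Values copied from grpc_server_processed_duration_seconds, type="Publish".
	fmt.Printf("mean Publish duration: %.6fs\n", meanSeconds(0.075799143, 6))

	// Cumulative counts for le = 0.005, 0.01, 0.025, 0.05, 0.1
	// (the remaining buckets all stay at 6 and are omitted here).
	fmt.Println(perBucket([]uint64{0, 5, 5, 6, 6}))
}
```

Applied to the Publish samples, this gives a mean of roughly 12.6 ms, with five of the six calls landing in the (0.005, 0.01] bucket and one in (0.025, 0.05].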

@clyang82
Contributor

clyang82 commented Sep 6, 2024

/ok-to-test

@morvencao morvencao force-pushed the br_add_metrics branch 4 times, most recently from ec38997 to 59b13af Compare September 9, 2024 08:06
Signed-off-by: morvencao <lcao@redhat.com>
Contributor

@clyang82 clyang82 left a comment

LGTM

@clyang82 clyang82 merged commit 22436c2 into openshift-online:main Sep 12, 2024
7 checks passed
@morvencao morvencao deleted the br_add_metrics branch September 12, 2024 04:41
@@ -142,6 +148,15 @@ func (s *sqlResourceService) Update(ctx context.Context, resource *api.Resource)
		return nil, handleUpdateError("Resource", err)
	}

	// Create the set of labels that we will add to all the resource process:
	labels := prometheus.Labels{
		metricsIDLabel: updated.ID,
Contributor

Is the id field the resource ID?
If so, this label will lead to a cardinality explosion, since every resource creates its own time series. Consider removing the label.
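
To make the concern concrete, here is a minimal Go sketch that counts how many distinct time series a counter would expose with and without a per-resource `id` label. It uses only the standard library and does not touch the Prometheus client. The label names mirror the `resource_processed_total` output earlier in this thread; the generated resource IDs and the helpers `seriesCount` and `updates` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesCount returns how many distinct time series a metric would expose
// if only the label names in keep were attached to it. Each distinct
// combination of label values is one series.
func seriesCount(observations []map[string]string, keep []string) int {
	names := append([]string(nil), keep...)
	sort.Strings(names)
	seen := make(map[string]struct{})
	for _, obs := range observations {
		parts := make([]string, 0, len(names))
		for _, n := range names {
			parts = append(parts, fmt.Sprintf("%s=%q", n, obs[n]))
		}
		seen[strings.Join(parts, ",")] = struct{}{}
	}
	return len(seen)
}

// updates builds one hypothetical "update" observation per resource.
func updates(n int) []map[string]string {
	obs := make([]map[string]string, 0, n)
	for i := 0; i < n; i++ {
		obs = append(obs, map[string]string{
			"action": "update",
			"id":     fmt.Sprintf("resource-%d", i), // hypothetical ID
		})
	}
	return obs
}

func main() {
	obs := updates(10000)
	fmt.Println("with id label:   ", seriesCount(obs, []string{"action", "id"}))
	fmt.Println("without id label:", seriesCount(obs, []string{"action"}))
}
```

The series count grows linearly with the number of resources when `id` is kept, and stays at one per action when it is dropped.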

Contributor Author

Good advice! We're keeping this label for two reasons: it helps diagnose frequent updates to a single resource, and we don't yet have enough resources to cause cardinality issues.
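
As an illustration of the diagnostic use case, the sketch below scans Prometheus text-format output and reports which resource ID has the highest `resource_processed_total` value. It assumes the exposition text is already available as a string (how it is fetched is out of scope here), and the function name `busiestResource` is hypothetical rather than something this PR adds.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// sampleRe matches one resource_processed_total sample and captures
// the value of the id label and the sample value.
var sampleRe = regexp.MustCompile(`^resource_processed_total\{[^}]*id="([^"]+)"[^}]*\}\s+(\S+)$`)

// busiestResource returns the id with the highest resource_processed_total
// value, summed across any other labels. Ties go to the smaller id.
func busiestResource(exposition string) (string, float64) {
	totals := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		m := sampleRe.FindStringSubmatch(strings.TrimSpace(sc.Text()))
		if m == nil {
			continue // comment line or a different metric
		}
		v, err := strconv.ParseFloat(m[2], 64)
		if err != nil {
			continue // skip samples with an unparsable value
		}
		totals[m[1]] += v
	}
	var bestID string
	var best float64
	for id, v := range totals {
		if v > best || (v == best && id < bestID) {
			bestID, best = id, v
		}
	}
	return bestID, best
}

func main() {
	// Three samples copied from the e2e metrics output in this thread.
	out := `# TYPE resource_processed_total counter
resource_processed_total{action="update",id="3a3d6f1f-75f2-4169-be70-c9ef90bbf743"} 6
resource_processed_total{action="update",id="4628bd82-5a8d-457a-bcc3-ade46f54a690"} 2
resource_processed_total{action="update",id="629a9314-1670-410d-9dde-a3dada2667df"} 12`
	id, n := busiestResource(out)
	fmt.Println(id, n)
}
```

In the e2e output posted above, the most frequently updated resource is `629a9314-1670-410d-9dde-a3dada2667df`, with 12 updates.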
