Inconsistently-collected metrics #11950
Unanswered
zack-littke-smith-ai asked this question in Help
Replies: 0 comments
Hi there! Newish Linkerd user here, so forgive me if these questions are common knowledge. When investigating the collected values of some core metrics, I became somewhat suspicious of them (I don't want to be suspicious of my metrics!). Let me be more specific:
- `response_total` and `route_response_total` collect `status_code` (which is strange because this is a gRPC-only service) but only sometimes collect `grpc_status_code`, even though they seem to collect `grpc_status`.
- Why does `response_total` collect 5XX connection errors while `route_response_total` does not?
- Why do some series report `status_code: 200` when they also report `classification: failure`? Which of these should I trust?
- What is the difference between `route_actual_response_total` and `route_response_total`? `route_actual_response_total` hilariously only has three results across the entire internet. What does "Total count of actual route HTTP responses." mean?

These issues are with a bread-and-butter metric that I'd really prefer to trust, and in light of them I am considering instrumenting our services directly instead of using Linkerd's metrics. Am I being too rash?
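In case it helps anyone reproduce this, here is a minimal sketch of how the label inconsistencies above could be checked against a scrape in Prometheus text format. The sample lines and their values are made up for illustration (they are not real proxy output), and `parse` and `audit` are hypothetical helper names; only the metric and label names come from the observations above.

```python
import re
from collections import defaultdict

# Hypothetical sample in Prometheus text exposition format. The values and
# label combinations are invented for illustration only.
SAMPLE = '''
response_total{status_code="200",grpc_status="0",classification="success"} 120
response_total{status_code="200",grpc_status="14",classification="failure"} 3
response_total{status_code="503",classification="failure"} 2
route_response_total{status_code="200",grpc_status="0",classification="success"} 120
'''

LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse(text):
    """Yield (metric_name, labels_dict, value) for each labelled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blanks and HELP/TYPE comments
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip samples without labels
        labels = dict(LABEL_RE.findall(m.group('labels')))
        yield m.group('name'), labels, float(m.group('value'))


def audit(text):
    """Group series that show the two inconsistencies described above."""
    findings = defaultdict(list)
    for name, labels, value in parse(text):
        # Series that carry an HTTP status but no gRPC status at all.
        if 'grpc_status' not in labels:
            findings['missing_grpc_status'].append(
                (name, labels.get('status_code'), value)
            )
        # Series with a 2xx HTTP status that are still classified as failures.
        if (labels.get('status_code', '').startswith('2')
                and labels.get('classification') == 'failure'):
            findings['2xx_but_failure'].append(
                (name, labels.get('grpc_status'), value)
            )
    return dict(findings)


if __name__ == '__main__':
    for kind, rows in audit(SAMPLE).items():
        print(kind, rows)
```

Pointing `audit` at the text of a real scrape instead of `SAMPLE` should list which series lack a gRPC status label and which pair a 2xx status with a failure classification.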