Add experiment data
Signed-off-by: Richard Wall <richard.wall@venafi.com>
wallrj committed Apr 11, 2024
1 parent 8d78e39 commit bbffca0
Showing 30 changed files with 1,808 additions and 12 deletions.
33 changes: 21 additions and 12 deletions in content/docs/devops-tips/large-clusters.md
@@ -53,12 +53,15 @@ It takes 34 minutes to reconcile all 2000 Certificate resources.
 In the cert-manager controller logs you will see messages such as:
 > `I0409 12:42:48.601911 1 request.go:697] Waited for 1.241596263s due to client-side throttling, not priority and fairness, request: PUT:https://10.96.0.1:443/apis/cert-manager.io/v1/namespaces/team-mplnf/certificates/app-1/status`
 
-<img src="/docs/devops-tips/large-clusters/default-cpu-1.png" alt="Scatter chart showing cert-manager CPU usage and cluster resource counts over time with default cert-manager configuration" />
+<img src="/docs/devops-tips/large-clusters/experiment.2024-04-07-1/Screenshot 2024-04-09 121627.png"
+alt="Scatter chart showing cert-manager CPU usage and cluster resource counts over time with default cert-manager configuration" />
 
 After disabling client-side rate-limiting and repeating the experiment, the CPU use is much more uniform.
 cert-manager is now free to make API requests as rapidly as the API server will allow and makes use of all available CPU time.
 It takes 22 minutes to reconcile all 2000 Certificate resources.
-<img src="/docs/devops-tips/large-clusters/default-cpu-2.png" alt="Scatter chart showing cert-manager CPU usage and cluster resource counts over time with client-side rate-limiting disabled" />
+
+<img src="/docs/devops-tips/large-clusters/experiment.2024-04-07-4/Screenshot 2024-04-09 130635.png"
+alt="Scatter chart showing cert-manager CPU usage and cluster resource counts over time with client-side rate-limiting disabled" />
 
 ## Prefer ECDSA keys over RSA for performance
 
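The throttling message in the controller log is emitted by client-go, whose default client-side rate limiter is a token bucket configured by a QPS (refill rate) and a burst (bucket size). To illustrate why that limiter stretches out a large reconcile, here is a minimal Python sketch that computes the earliest time a client could finish sending a given number of requests through such a bucket. It ignores API server latency, and the function name and the example request count, QPS and burst are hypothetical, not cert-manager's actual settings.

```python
def min_seconds_to_send(n_requests: int, qps: float, burst: int) -> float:
    """Earliest time (s) at which the last of n_requests can leave a token bucket.

    The bucket starts full with `burst` tokens and refills at `qps` tokens per
    second; every request consumes one token. API server latency is ignored.
    """
    if n_requests < 0 or qps <= 0 or burst < 0:
        raise ValueError("need n_requests >= 0, qps > 0 and burst >= 0")
    # The first `burst` requests go out immediately; each further one waits 1/qps.
    return max(0, n_requests - burst) / qps


# Hypothetical values, not cert-manager's actual settings or request counts.
print(min_seconds_to_send(10_000, qps=20, burst=50))  # 497.5
print(min_seconds_to_send(40, qps=20, burst=50))      # 0.0
```

Raising the QPS or burst, or removing the client-side limiter as in the second experiment, lowers this floor and leaves the API server to regulate the load.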
@@ -68,7 +71,8 @@ Note that the CPU usage is significantly lower than with the RSA 4096 experiment
 And the rate of reconciliation is significantly higher: `~285/min` vs `~58/min`.
 (and this is probably limited by the rate at which the benchmark creates the Certificates)
 
-<img src="/docs/devops-tips/large-clusters/cpu-server-side-ecdsa.png" alt="Scatter chart showing cert-manager CPU usage and cluster resource counts over time with server-side rate-limiting and 2000 ECDSA Certificates" />
+<img src="/docs/devops-tips/large-clusters/experiment.2024-04-09-3/Screenshot 2024-04-09 194601.png"
+alt="Scatter chart showing cert-manager CPU usage and cluster resource counts over time with server-side rate-limiting and 2000 ECDSA Certificates" />
 
 ## Restrict the use of large RSA keys
 
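Converting the approximate reconciliation rates from the ECDSA section into wall-clock time for 2000 Certificates makes the gap concrete. This is a back-of-the-envelope sketch that uses only the `~285/min` and `~58/min` figures quoted in the text and assumes a steady rate throughout.

```python
def minutes_to_reconcile(n_certificates: int, rate_per_minute: float) -> float:
    """Time to reconcile n_certificates at a steady rate (Certificates per minute)."""
    return n_certificates / rate_per_minute


ecdsa = minutes_to_reconcile(2000, 285)  # ~285/min with ECDSA keys
rsa = minutes_to_reconcile(2000, 58)     # ~58/min in the RSA 4096 experiment
print(f"ECDSA: {ecdsa:.1f} min, RSA 4096: {rsa:.1f} min, ratio: {rsa / ecdsa:.1f}x")
# prints: ECDSA: 7.0 min, RSA 4096: 34.5 min, ratio: 4.9x
```

Because the text notes that the ECDSA rate was probably capped by how fast the benchmark created Certificates, the roughly 4.9x ratio is best read as a lower bound.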
@@ -82,20 +86,25 @@ Certificate took 9 minutes and 37 seconds to become ready.
 
 ## Set appropriate memory requests and limits
 
-Here are some `memory.request` recommendations for each of the cert-manager components in different scenarios.
+Here are some [memory request](https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/#motivation-for-memory-requests-and-limits) recommendations for each of the cert-manager components in different scenarios.
 The values are calculated from the maximum memory value for each component
 during each scenario, with a 15% buffer, rounded up to the nearest `5Mi`.
 The minimum is set to `50Mi`.
 
 | Scenario | controller (Mi) | cainjector (Mi) | webhook (Mi) |
 |----------------------------------|-----------------|-----------------|--------------|
-| [2000 RSA 4096 Certificates][1] | 350 | 150 | 50 |
-| [5000 RSA 4096 Certificates][2] | 800 | 300 | 50 |
-| [2000 ECDSA 256 Certificates][3] | 300 | 150 | 50 |
-| [5000 ECDSA 256 Certificates][3] | | | |
+| [2000 RSA 4096 Certificates][1] | 300 | 125 | 50 |
+| [5000 RSA 4096 Certificates][2] | 690 | 240 | 50 |
+| [2000 ECDSA 256 Certificates][3] | 245 | 95 | 50 |
+| [5000 ECDSA 256 Certificates][4] | 465 | 150 | 50 |
 
-[1]: /docs/devops-tips/large-clusters/memory-server-side-ecdsa.png
-[2]: /docs/devops-tips/large-clusters/memory-server-side-ecdsa.png
-[3]: /docs/devops-tips/large-clusters/default-memory-2000-ecdsa.png
-[4]: /docs/devops-tips/large-clusters/memory-server-side-ecdsa.png
+[1]: /docs/devops-tips/large-clusters/experiment.2024-04-09-1/index.yaml
+[2]: /docs/devops-tips/large-clusters/experiment.2024-04-09-2/index.yaml
+[3]: /docs/devops-tips/large-clusters/experiment.2024-04-09-3/index.yaml
+[4]: /docs/devops-tips/large-clusters/experiment.2024-04-10-1/index.yaml
+
+> ℹ️ The memory calculation is adapted from the [Robusta KRR simple algorithm](https://github.com/robusta-dev/krr#algorithm).
+>
+> 📖️ Read [What Everyone Should Know About Kubernetes Memory Limits](https://home.robusta.dev/blog/kubernetes-memory-limit),
+> to learn why the best practice is to set memory limit equal to memory request.
 
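The rule behind the memory table (peak memory per component during a scenario, plus a 15% buffer, rounded up to the nearest `5Mi`, with a `50Mi` minimum) is simple enough to express as a function, and the linked article recommends using the same value for the memory limit. Below is a minimal Python sketch, assuming the input is a component's peak memory usage in bytes. `recommend_memory_mi` and `memory_resources` are hypothetical helpers, not the script that produced the table, and the example inputs are made up rather than taken from the experiments.

```python
MI = 1024 * 1024  # bytes per mebibyte (Kubernetes "Mi")


def recommend_memory_mi(peak_bytes: int, buffer_percent: int = 15,
                        step_mi: int = 5, floor_mi: int = 50) -> int:
    """Memory request in Mi: peak + buffer, rounded up to step_mi, at least floor_mi.

    Uses integer ceiling division (-(-a // b)) so values that land exactly on a
    step boundary are not shifted by floating-point error.
    """
    # Add the buffer, rounding up to a whole byte.
    buffered = -(-peak_bytes * (100 + buffer_percent) // 100)
    # Round up to the next multiple of step_mi Mi, expressed in Mi.
    step = step_mi * MI
    rounded_mi = -(-buffered // step) * step_mi
    return max(floor_mi, rounded_mi)


def memory_resources(request_mi: int) -> dict:
    """Container `resources` stanza with the memory limit equal to the request."""
    quantity = f"{request_mi}Mi"
    return {"requests": {"memory": quantity}, "limits": {"memory": quantity}}


# Hypothetical peak measurements, not data from the experiments.
print(recommend_memory_mi(100 * MI))  # 115
print(recommend_memory_mi(101 * MI))  # 120
print(recommend_memory_mi(30 * MI))   # 50 (raised to the floor)
print(memory_resources(recommend_memory_mi(100 * MI)))
```

The dictionary mirrors the `resources.requests.memory` and `resources.limits.memory` fields of a Kubernetes container spec.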
4 changes: 4 additions and 0 deletions (new file)
@@ -0,0 +1,4 @@
+# README
+
+* Default installation of cert-manager 1.14
+* 2000 RSA 4096 Certificates