Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

codify custom search attributes for TemporalNamespace #776

Draft
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

TheHiddenLayer
Copy link
Contributor

@TheHiddenLayer TheHiddenLayer commented Jul 23, 2024

Closes #567

This PR codifies custom search attributes, so one can specify:

apiVersion: temporal.io/v1beta1
kind: TemporalNamespace
metadata:
  name: default
  namespace: demo
spec:
  customSearchAttributes:
    name1: value1
    name2: value2
    name3: value2
  // additional fields

@TheHiddenLayer
Copy link
Contributor Author

TODO(@TheHiddenLayer) add tests?

@TheHiddenLayer
Copy link
Contributor Author

TheHiddenLayer commented Jul 24, 2024

Hi @alexandrevilain,

  1. I'm experiencing various breakages running make test-e2e-dev.
  2. I needed helm and buildx to properly build the container when running make test-e2e-dev. Thinking of updating the local development docs to include those deps. Also, do you prefer the local development docs stay minimal/brief? I noticed there's a lot of commands in the Makefile that aren't mentioned in the local development docs.
  3. Any way you could give me some guidance on what tests you want to see for custom search attributes? This is actually my first time contributing to a k8s operator, and I'm referencing this blog post for the basics.

P.S. regarding:

  1. I'm experiencing various breakages running make test-e2e-dev.

the error is different each time. Here's the error I happened to see this time:

I0724 23:18:56.709046     135 round_trippers.go:553] GET https://temporal-becb55e-control-plane:6443/healthz?timeout=10s  in 5 milliseconds
I0724 23:18:57.177835     135 round_trippers.go:553] GET https://temporal-becb55e-control-plane:6443/healthz?timeout=10s  in 2 milliseconds
I0724 23:18:57.664057     135 round_trippers.go:553] GET https://temporal-becb55e-control-plane:6443/healthz?timeout=10s  in 2 milliseconds
I0724 23:18:58.167209     135 round_trippers.go:553] GET https://temporal-becb55e-control-plane:6443/healthz?timeout=10s  in 3 milliseconds
I0724 23:19:06.648034     135 round_trippers.go:553] GET https://temporal-becb55e-control-plane:6443/healthz?timeout=10s  in 7993 milliseconds
[api-check] The API server is not healthy after 4m0.02803015s

Unfortunately, an error has occurred:
	context deadline exceeded

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.

@TheHiddenLayer
Copy link
Contributor Author

TheHiddenLayer commented Jul 25, 2024

This is running on an M3 apple silicon with colima

fails again, this time with a different error.

docker save temporal-operator > /tmp/temporal-operator.tar
OPERATOR_IMAGE_PATH=/tmp/temporal-operator.tar go test ./tests/e2e -v -timeout 60m -args "--v=4"
I0724 20:24:34.366456   59239 kind.go:157] Creating kind cluster temporal-fcc1c45
I0724 20:24:34.367030   59239 command.go:37] "Found Provider tooling already installed on the machine" command="kind"
I0724 20:24:34.574632   59239 kind.go:171] Launching:kind create cluster --name temporal-fcc1c45
F0724 20:24:35.841364   59239 env.go:375] Setup failure: failed to create kind cluster: exit status 1 : exit status 1: Creating cluster "temporal-fcc1c45" ...
 • Ensuring node image (kindest/node:v1.30.0) 🖼  ...
 ✓ Ensuring node image (kindest/node:v1.30.0) 🖼
 • Preparing nodes 📦   ...
 ✗ Preparing nodes 📦 
Deleted nodes: ["temporal-fcc1c45-control-plane"]
ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"
FAIL	github.com/alexandrevilain/temporal-operator/tests/e2e	2.034s
FAIL
make: *** [test-e2e-dev] Error 1

@TheHiddenLayer
Copy link
Contributor Author

even while running the make test-e2e-dev on the main branch I get

=== RUN   TestPersistence/cassandra_persistence/Temporal_cluster_ready_after_upgrade#03
panic: test timed out after 1h0m0s

@alexandrevilain are you able to run the whole e3e test suite without any errors or timeouts?

@TheHiddenLayer TheHiddenLayer marked this pull request as ready for review August 6, 2024 21:32
@@ -136,13 +140,126 @@ func (r *TemporalNamespaceReconciler) Reconcile(ctx context.Context, req ctrl.Re
}
}

err = r.reconcileCustomSearchAttributes(ctx, &logger, namespace, cluster)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexandrevilain is this the right place to start reconciling custom search attribtues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I should wait for the namespace to be marked ready, before reconciling custom search attributes, right? But where should I put that waiting logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my current implementation is causing the TemporalNamespace_ready e2e test to fail

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is the good place for this :)

@alexandrevilain
Copy link
Owner

Sorry @TheHiddenLayer I'm currently having issues with internet connectivity while I'm in holidays. I can't even pull the cert-manager image ...

@alexandrevilain
Copy link
Owner

Hi @TheHiddenLayer following PR succesfully ended e2e without any retry: #777

@TheHiddenLayer
Copy link
Contributor Author

Hi @TheHiddenLayer following PR succesfully ended e2e without any retry: #777

Nice, I'm glad that it worked. It could be something with my logic then? Pls LMK what you think about this.

@@ -58,16 +62,19 @@ type TemporalNamespaceReconciler struct {
func (r *TemporalNamespaceReconciler) Reconcile(ctx context.Context, req ctrl.Request) (_ ctrl.Result, reterr error) {
logger := log.FromContext(ctx)

logger.Info("Starting reconciliation")
logger.Info("=== (NS) STARTING RECONCILIATION")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're ok on the fast that those kind of logging will be removed once the code is working, right ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly. These logging statements are for my debugging. I pushed them up to keep my branch up to date. Unfortunately, they show up in the PR.

@@ -136,13 +140,126 @@ func (r *TemporalNamespaceReconciler) Reconcile(ctx context.Context, req ctrl.Re
}
}

err = r.reconcileCustomSearchAttributes(ctx, &logger, namespace, cluster)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is the good place for this :)

@TheHiddenLayer
Copy link
Contributor Author

TheHiddenLayer commented Aug 25, 2024

@alexandrevilain I haven't forgotten about this PR. I'm still planning to finish it off. I’ve been tied up lately with work. Thanks for your patience!

Copy link

@TheHiddenLayer TheHiddenLayer marked this pull request as draft August 29, 2024 16:55
@alexandrevilain
Copy link
Owner

Hi @TheHiddenLayer !
Do you need some help for the last few things to fix ?

@TheHiddenLayer
Copy link
Contributor Author

TheHiddenLayer commented Sep 26, 2024

hey @alexandrevilain! the feature itself works as expected. I've done manual tests confirming:

  • Add an attr whose name exists with same type -> no change (b/c no reconciliation needed)
  • Add an attr whose name exists with different type -> properly returns error (b/c user first needs to delete existing attr with same name)
  • Add an attr with invalid/gibberish type -> correctly returns parsing error (invalid type)
  • Add an attr with ":" in the name -> adds attr correctly (this tests a potential bug in postgres where key might not work if it has a colon)
  • Add a brand new attr -> adds attr correctly
  • Remove an existing attr -> removes attr correctly

BUT I didn't deeply investigate why this doesn't pass some E2E tests, so any assistance on that would be great.

I wanted to filter out and run a few tests that were failing to isolate my debugging, but the option flags I passed into the test command didn't do anything. So, the fact that I had to run the entire E2E pipeline (30-50min on my computer) has made this arduous to debug.

Copy link

sonarqubecloud bot commented Oct 1, 2024

@TheHiddenLayer
Copy link
Contributor Author

I'll clean up the comments/dead code/debugging log statements

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Custom Search Attributes to TemporalNamespace
2 participants