
"invalid cursor script" from redis on clicking 'stop' button for server #252

Closed
kellyrowland opened this issue Aug 20, 2024 · 7 comments · Fixed by #253
Labels
bug Something isn't working

Comments

kellyrowland commented Aug 20, 2024

Hi - I recently started seeing an error from redis while doing some development work on a test hub.

The behavior is that I start a server, navigate back to the hub control panel, and click the 'stop' button. The 'stop' button disappears immediately, but if I refresh the hub control panel page, the button reappears with the server still running.

Unfortunately I'm not sure what caused this, so I don't have an exact set of steps to reproduce it. I've been testing out different parts of the authenticator code for the hub, but that doesn't seem like it should affect redis/traefik-proxy.

I'm running JupyterHub 5.1.0, JupyterLab 4.2.4, and the latest traefik-proxy code installed from GitHub via pip.

I've attached a log from the hub container from yesterday afternoon: hub-5445698fc7-mc62r_hub.log

Seeing this again today, the relevant error lines seem to be:

2024-08-20T21:10:07.952716223Z [I 2024-08-20 14:10:07.952 JupyterHub log:192] 204 DELETE /hub/api/users/rowlandk/servers/perlmutter-login-node-base?_xsrf=[secret] (rowlandk@10.42.10.129) 1078.71ms
2024-08-20T21:10:07.953886053Z Exception in callback TraefikRedisProxy._kv_atomic_delete.<locals>.<lambda>(<Task finishe...r_script:5.')>) at /opt/env/lib/python3.11/site-packages/jupyterhub_traefik_proxy/redis.py:148
2024-08-20T21:10:07.953896173Z handle: <Handle TraefikRedisProxy._kv_atomic_delete.<locals>.<lambda>(<Task finishe...r_script:5.')>) at /opt/env/lib/python3.11/site-packages/jupyterhub_traefik_proxy/redis.py:148>
2024-08-20T21:10:07.953900133Z Traceback (most recent call last):
2024-08-20T21:10:07.953902793Z   File "/usr/local/lib/python3.11/asyncio/events.py", line 84, in _run
2024-08-20T21:10:07.953905213Z     self._context.run(self._callback, *self._args)
2024-08-20T21:10:07.953908033Z   File "/opt/env/lib/python3.11/site-packages/jupyterhub_traefik_proxy/redis.py", line 149, in <lambda>
2024-08-20T21:10:07.953911024Z     "Deleted %i keys in %s", f.result(), prefix
2024-08-20T21:10:07.953913834Z                              ^^^^^^^^^^
2024-08-20T21:10:07.953916294Z   File "/opt/env/lib/python3.11/site-packages/redis/commands/core.py", line 5172, in __call__
2024-08-20T21:10:07.953919214Z     return await client.evalsha(self.sha, len(keys), *args)
2024-08-20T21:10:07.953921724Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-20T21:10:07.953924214Z   File "/opt/env/lib/python3.11/site-packages/redis/asyncio/client.py", line 610, in execute_command
2024-08-20T21:10:07.953927004Z     return await conn.retry.call_with_retry(
2024-08-20T21:10:07.953929194Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-20T21:10:07.954001916Z   File "/opt/env/lib/python3.11/site-packages/redis/asyncio/retry.py", line 59, in call_with_retry
2024-08-20T21:10:07.954014546Z     return await do()
2024-08-20T21:10:07.954022696Z            ^^^^^^^^^^
2024-08-20T21:10:07.954025356Z   File "/opt/env/lib/python3.11/site-packages/redis/asyncio/client.py", line 584, in _send_command_parse_response
2024-08-20T21:10:07.954027767Z     return await self.parse_response(conn, command_name, **options)
2024-08-20T21:10:07.954030576Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-20T21:10:07.954033117Z   File "/opt/env/lib/python3.11/site-packages/redis/asyncio/client.py", line 631, in parse_response
2024-08-20T21:10:07.954035717Z     response = await connection.read_response()
2024-08-20T21:10:07.954038127Z                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-20T21:10:07.954040327Z   File "/opt/env/lib/python3.11/site-packages/redis/asyncio/connection.py", line 569, in read_response
2024-08-20T21:10:07.954043167Z     raise response from None
2024-08-20T21:10:07.954045337Z redis.exceptions.ResponseError: invalid cursor script: b62ca590c1fc35c60eac74a06eae6a49d500ed5c, on @user_script:5.

Please let me know if there's any other information that I can provide or things that I might test out.

@kellyrowland kellyrowland added the bug Something isn't working label Aug 20, 2024
@kellyrowland (Author)

I did try redis-cli flushall over in the redis container to see if that cleaned anything out, but the issue still persists.


minrk commented Aug 21, 2024

Thanks for the detailed report! Does restarting the Hub fix it?

Deletion is handled by a registered script. My first guess was that the script cache had been cleared and we weren't using the client properly to re-register the script, but the docs suggest we're doing that right, so if restarting the Hub doesn't fix it, I don't think that's the cause.

If that's not it, it may be something about having multiple concurrent script-deletes outstanding. I think that ought to count as a redis.asyncio bug, but it should be possible to work around it by awaiting the deletes in order, or by rewriting things so a single script delete covers all the prefixes.
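The "awaiting them all in order" workaround can be sketched as follows. This is a minimal, self-contained illustration only: fake_kv_atomic_delete is a hypothetical stand-in for the Redis script call inside TraefikRedisProxy._kv_atomic_delete, not the project's actual code.

```python
import asyncio

async def fake_kv_atomic_delete(prefix: str) -> int:
    # Hypothetical stand-in for the real per-prefix delete script;
    # pretend the script ran and matched one key.
    await asyncio.sleep(0)
    return 1

async def delete_prefixes_in_order(prefixes):
    # Awaiting each delete before issuing the next means only one
    # script call is ever in flight, avoiding multiple concurrent
    # script-deletes outstanding at once.
    deleted = 0
    for prefix in prefixes:
        deleted += await fake_kv_atomic_delete(prefix)
    return deleted

total = asyncio.run(delete_prefixes_in_order(["/traefik/a", "/traefik/b"]))
print(total)  # 2
```

The alternative minrk mentions, one script delete covering all the prefixes, would instead pass every prefix to a single script invocation.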


minrk commented Aug 21, 2024

What is your version of redis-server (redis-server --version)? I think something might have changed since 7.2.5.


minrk commented Aug 21, 2024

Indeed, this has started coming up in our CI with the latest redis-server version, so it's easily reproducible and I have a fix coming shortly.


minrk commented Aug 21, 2024

Should be fixed by #253 if you want to test. We should see it in the tests, since the CI run against the latest redis is reproducing this error right now.

@kellyrowland (Author)

Thanks for the quick look at this and bug fix!

I can confirm that downgrading from redis:latest to redis:7.2.5 fixes the issue running against the code in this repo; I'll test out your branch against redis:7.4.0 as well.

@kellyrowland (Author)

Also confirmed that the changes in #253 work with redis:7.4.0. Thank you!
