Permissions error in file-transfer service is treated as transient when it will never resolve itself #3374

Open
jarhodes314 opened this issue Jan 31, 2025 · 0 comments
Labels
bug Something isn't working

Describe the bug
When a configuration snapshot is triggered and tedge does not have permission to write to or create /var/tedge/file-transfer/<device_id>/, the file-transfer service returns a 500 Internal Server Error. The uploader code appears to treat this as a transient error, so it retries the upload even though it is essentially guaranteed to fail without human intervention.

To Reproduce

  1. Delete /var/tedge/file-transfer/<device_id>/
  2. sudo chown root /var/tedge/file-transfer/ && sudo chmod 755 /var/tedge/file-transfer
  3. Trigger a configuration-snapshot operation in Cumulocity

The operation gets stuck in the executing state. You can confirm that a permissions error is the cause from the tedge-agent output:

$ systemctl status tedge-agent
Jan 31 17:45:21 james-Latitude-5511 tedge-agent[687003]:     Permission denied (os error 13)
Jan 31 17:45:21 james-Latitude-5511 tedge-agent[687003]: 2025-01-31T17:45:21.24358042Z  WARN upload::upload: Temporary failure: HTTP status server error (500 Internal Server Error) for url (http://127.0.0.1:8000/tedge/file-transfer/jrh-test.5/config_snapshot/tedge-configuration-plugin-c8y-mapper-141376). Retrying in 22s
Jan 31 17:45:43 james-Latitude-5511 tedge-agent[687003]: 2025-01-31T17:45:43.741483458Z ERROR tedge_agent::http_server::error: Request to upload to "jrh-test.5/config_snapshot/tedge-configuration-plugin-c8y-mapper-141376" failed: Directory Error. Check permissions for /var/tedge/file-transfer/jrh-test.5/config_snapshot.
Jan 31 17:45:43 james-Latitude-5511 tedge-agent[687003]: Caused by:
Jan 31 17:45:43 james-Latitude-5511 tedge-agent[687003]:     Permission denied (os error 13)
Jan 31 17:45:43 james-Latitude-5511 tedge-agent[687003]: 2025-01-31T17:45:43.741968038Z  WARN upload::upload: Temporary failure: HTTP status server error (500 Internal Server Error) for url (http://127.0.0.1:8000/tedge/file-transfer/jrh-test.5/config_snapshot/tedge-configuration-plugin-c8y-mapper-141376). Retrying in 36s
Jan 31 17:46:20 james-Latitude-5511 tedge-agent[687003]: 2025-01-31T17:46:20.488489158Z ERROR tedge_agent::http_server::error: Request to upload to "jrh-test.5/config_snapshot/tedge-configuration-plugin-c8y-mapper-141376" failed: Directory Error. Check permissions for /var/tedge/file-transfer/jrh-test.5/config_snapshot.
Jan 31 17:46:20 james-Latitude-5511 tedge-agent[687003]: Caused by:
Jan 31 17:46:20 james-Latitude-5511 tedge-agent[687003]:     Permission denied (os error 13)
Jan 31 17:46:20 james-Latitude-5511 tedge-agent[687003]: 2025-01-31T17:46:20.489027966Z  WARN upload::upload: Temporary failure: HTTP status server error (500 Internal Server Error) for url (http://127.0.0.1:8000/tedge/file-transfer/jrh-test.5/config_snapshot/tedge-configuration-plugin-c8y-mapper-141376). Retrying in 45s

Expected behavior
If the uploader cannot upload to the file-transfer service because of a misconfiguration such as this, it should abort the request rather than retry it. I suspect we just want different transient-error detection depending on whether we are connecting to an internal API (where it is probably reasonable to assume a 500 is a permanent error) or an external one (including the Cumulocity proxy, as that will echo the status code from Cumulocity).
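
To make the suggestion concrete, here is a minimal sketch of target-dependent error classification. It is not taken from the thin-edge codebase: `Target`, `Failure` and `classify` are hypothetical names, and the rule that every 5xx other than an internal 500 remains retryable is an assumption made for illustration.

```rust
/// Where the upload is being sent. Hypothetical type, not part of thin-edge.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Target {
    /// A local thin-edge API, such as the file-transfer service.
    Internal,
    /// Cumulocity, or the Cumulocity proxy (which echoes Cumulocity's status code).
    External,
}

/// Whether a failed request is worth retrying. Hypothetical type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Failure {
    Transient,
    Permanent,
}

/// Classify the HTTP status of a failed upload.
/// Intended to be called only with error statuses (4xx/5xx).
pub fn classify(target: Target, status: u16) -> Failure {
    match (target, status) {
        // A 500 from an internal API most likely means a local
        // misconfiguration (e.g. directory permissions), so give up.
        (Target::Internal, 500) => Failure::Permanent,
        // Assumption: any other server error may resolve itself, so retry.
        (_, 500..=599) => Failure::Transient,
        // Everything else (e.g. 4xx) will not be fixed by retrying.
        _ => Failure::Permanent,
    }
}
```

With something along these lines, `classify(Target::Internal, 500)` would yield `Failure::Permanent` and the operation could fail immediately, while the same status from the Cumulocity proxy would still be retried.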

jarhodes314 added the bug label on Jan 31, 2025