- Prefer RabbitMQ or Redis as the broker (never use a relational database as a production broker).
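  For instance, pointing a Celery app at a Redis broker (app name and URLs are placeholders):

  ```python
  from celery import Celery

  app = Celery("proj", broker="redis://localhost:6379/0")
  # or, with RabbitMQ:
  # app = Celery("proj", broker="amqp://guest:guest@localhost:5672//")
  ```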
- Do not pass complex objects as task parameters, e.g. avoid Django model instances:

  ```python
  # Good: pass the id and fetch a fresh object inside the task
  @app.task
  def my_task(user_id):
      user = User.objects.get(id=user_id)
      print(user.name)
      # ...
  ```

  ```python
  # Bad: the serialized instance may be stale (or unserializable) by the time the task runs
  @app.task
  def my_task(user):
      print(user.name)
      # ...
  ```
- Do not wait for other tasks inside a task (e.g. by calling `.get()` on a result); this blocks the worker and can deadlock the pool. Chain tasks instead, as in the sketch below.
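  A minimal sketch of the alternative, with hypothetical `fetch_data` and `process_data` tasks: link them with `chain` so Celery passes the first task's return value to the second, instead of blocking inside a task.

  ```python
  from celery import chain

  @app.task
  def fetch_data(url):
      ...

  @app.task
  def process_data(payload):
      ...

  # Bad: blocks the worker (and may deadlock) waiting for a subtask
  # payload = fetch_data.delay("https://example.com/data").get()

  # Good: fetch_data's return value is fed into process_data
  chain(fetch_data.s("https://example.com/data"), process_data.s()).apply_async()
  ```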
- Prefer idempotent tasks:
  - "Idempotence is the property of certain operations in mathematics and computer science, that can be applied multiple times without changing the result beyond the initial application." - Wikipedia
- Prefer atomic tasks:
  - "An operation (or set of operations) is atomic ... if it appears to the rest of the system to occur instantaneously. Atomicity is a guarantee of isolation from concurrent processes. Additionally, atomic operations commonly have a succeed-or-fail definition—they either successfully change the state of the system, or have no apparent effect." - Wikipedia
- Retry when possible. But make sure tasks are idempotent and atomic before doing so. (Retrying)
- Set `max_retries` to keep broken tasks from retrying forever, as in the sketch below.
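  A minimal retry sketch with a hypothetical `sync_user` task that calls a flaky external API: `bind=True` makes `self.retry` available, and `max_retries` caps the attempts.

  ```python
  from requests.exceptions import RequestException

  @app.task(bind=True, max_retries=3, default_retry_delay=60)
  def sync_user(self, user_id):
      try:
          push_to_external_api(user_id)  # hypothetical flaky call
      except RequestException as exc:
          # Gives up after 3 retries instead of retrying forever
          raise self.retry(exc=exc)
  ```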
- Back off exponentially if things look like they are not going to get fixed soon. Throw in a random factor (jitter) so many retrying tasks don't all hit the service at the same instant:

  ```python
  import random

  def exponential_backoff(task_self):
      minutes = task_self.default_retry_delay / 60
      rand = random.uniform(minutes, minutes * 1.3)
      return int(rand ** task_self.request.retries) * 60

  # in the task:
  raise self.retry(exc=e, countdown=exponential_backoff(self))
  ```
- Use `autoretry_for` to reduce the boilerplate code for retrying tasks:
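  A minimal sketch (the task and exception choice are illustrative): list the exceptions to retry on, and tune the retry policy with `retry_kwargs`.

  ```python
  import requests
  from requests.exceptions import RequestException

  @app.task(autoretry_for=(RequestException,), retry_kwargs={"max_retries": 5})
  def fetch(url):
      return requests.get(url).text
  ```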
- Use `retry_backoff` to reduce the boilerplate code when doing exponential backoff:
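  A minimal sketch continuing the example above (same imports): `retry_backoff=True` applies exponential backoff between retries, and `retry_jitter=True` randomizes the delays.

  ```python
  @app.task(
      autoretry_for=(RequestException,),
      retry_backoff=True,     # exponential backoff between retries
      retry_backoff_max=600,  # cap the delay at 10 minutes
      retry_jitter=True,      # randomize delays to avoid thundering herds
  )
  def fetch(url):
      return requests.get(url).text
  ```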
- For tasks that require a high level of reliability, use `acks_late` in combination with `retry`. Again, make sure tasks are idempotent and atomic. (Should I use retry or acks_late?)
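  A minimal sketch (names hypothetical): with `acks_late=True` the message is acknowledged only after the task finishes, so if the worker dies mid-task the message is redelivered, which is exactly why the task must be idempotent.

  ```python
  class TransientError(Exception):
      """Hypothetical error for failures worth retrying."""

  @app.task(bind=True, acks_late=True, max_retries=3)
  def reliable_task(self, record_id):
      try:
          process_record(record_id)  # hypothetical; must be idempotent
      except TransientError as exc:
          raise self.retry(exc=exc)
  ```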
- Set hard and soft time limits. Recover gracefully if things take longer than expected:

  ```python
  from celery.exceptions import SoftTimeLimitExceeded

  @app.task(time_limit=60, soft_time_limit=45)
  def my_task():
      try:
          something_possibly_long()
      except SoftTimeLimitExceeded:
          recover()
  ```
- Use multiple queues to have more control over throughput and make things more scalable. (Routing Tasks)
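  A minimal routing sketch (module and queue names hypothetical): send heavy tasks to a dedicated queue and point dedicated workers at it.

  ```python
  app.conf.task_routes = {
      "proj.tasks.generate_report": {"queue": "reports"},
  }

  # then start a worker that only consumes that queue:
  #   celery -A proj worker -Q reports --concurrency=2
  ```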
- Extend the base task class to define default behaviour. (Custom Task Classes)
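  A minimal sketch (class and helper names hypothetical): override hooks such as `on_failure` on a base class, then opt tasks in via `base=`.

  ```python
  class BaseTaskWithAlerting(app.Task):
      def on_failure(self, exc, task_id, args, kwargs, einfo):
          # Called when a task raises; einfo carries the traceback
          alert_the_team(task_id, einfo)  # hypothetical helper

  @app.task(base=BaseTaskWithAlerting)
  def my_task():
      ...
  ```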
- Use canvas features to control task flows and deal with concurrency. (Canvas: Designing Work-flows)
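  A minimal canvas sketch with hypothetical tasks: `chain` runs tasks in sequence passing results along, `group` runs them in parallel, and `chord` calls a callback once every task in a group has finished.

  ```python
  from celery import chain, chord, group

  # Sequence: parse receives fetch's return value, store receives parse's
  chain(fetch.s("https://example.com"), parse.s(), store.s()).apply_async()

  # Fan out in parallel, then aggregate all the results
  chord(group(fetch.s(u) for u in urls), aggregate.s()).apply_async()
  ```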
- Log as much as possible. Use `get_task_logger` (sketched below) to automatically get the task name and unique id as part of the logs.
- In case of failure, make sure stack traces get logged and people get notified (services like Sentry are a good idea).
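  A minimal logging sketch: `get_task_logger` lives in `celery.utils.log` and returns a logger whose records carry the task name and id.

  ```python
  from celery.utils.log import get_task_logger

  logger = get_task_logger(__name__)

  @app.task
  def my_task(user_id):
      logger.info("Started processing user %s", user_id)
  ```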
- Monitor activity using Flower. (Flower: Real-time Celery web-monitor)
- Use `task_always_eager` to test that your tasks are getting called.
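  A minimal testing sketch, reusing the hypothetical `activate_user` task from the idempotency example above: with `task_always_eager` enabled, `.delay()` executes the task locally and synchronously, so the test can assert on its effects without running a worker.

  ```python
  # e.g. in the test settings
  app.conf.task_always_eager = True
  app.conf.task_eager_propagates = True  # re-raise task exceptions in tests

  def test_activate_user():
      activate_user.delay(user_id=1)  # runs synchronously under eager mode
      assert User.objects.get(id=1).is_active
  ```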
- Celery: an overview of the architecture and how it works by Vinta.
- Celery in the wild: tips and tricks to run async tasks in the real world by Vinta.
- Celery Best Practices by Balthazar Rouberol.
- Dealing with resource-consuming tasks on Celery by Vinta.
- Tips and Best Practices from the official documentation.
- Task Queues by Full Stack Python.
- Flower: Real-time Celery web-monitor from the official documentation.
- Celery Best Practices: practical approach by Adil.
- 3 Gotchas for Celery from Wiredcraft.
- Celery - Best Practices by Deni Bertovic.
- Hacker News thread on the above post.
- [video] Painting on a Distributed Canvas: An Advanced Guide to Celery Workflows by David Gouldin.
- Celery in Production by Dan Poirier from Caktus Group.
- [video] Implementing Celery, Lessons Learned by Michael Robellard.
- [video] Advanced Celery by Ask Solem Hoel.