- Prefer RabbitMQ or Redis as your broker (never use a relational database as a production broker).
- Do not pass complex objects as task parameters, e.g. avoid Django model instances:

```python
# Good: pass the id and fetch a fresh instance inside the task
@app.task
def my_task(user_id):
    user = User.objects.get(id=user_id)
    print(user.name)
    # ...
```

```python
# Bad: the serialized instance may be stale (or unserializable) by the time the task runs
@app.task
def my_task(user):
    print(user.name)
    # ...
```
- Do not wait for the result of another task inside a task; blocking on a subtask can deadlock the worker pool.
- Prefer idempotent tasks.
"Idempotence is the property of certain operations in mathematics and computer science, that can be applied multiple times without changing the result beyond the initial application." - Wikipedia
- Prefer atomic tasks.
"An operation (or set of operations) is atomic ... if it appears to the rest of the system to occur instantaneously. Atomicity is a guarantee of isolation from concurrent processes. Additionally, atomic operations commonly have a succeed-or-fail definition—they either successfully change the state of the system, or have no apparent effect." - Wikipedia
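The succeed-or-fail idea can be sketched in plain Python: stage all changes on a copy and commit only if every operation succeeds (in Django tasks, `transaction.atomic()` gives you this for database writes):

```python
import copy

def atomic_apply(state, operations):
    """Apply every operation or none: stage changes on a copy and
    commit only when all of them succeed (a sketch of succeed-or-fail)."""
    draft = copy.deepcopy(state)
    try:
        for op in operations:
            op(draft)
    except Exception:
        return state   # failure: caller keeps the untouched original
    return draft       # success: all changes appear at once
```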
- Retry when possible. But make sure tasks are idempotent and atomic before doing so. (Retrying)
- Set `max_retries` to avoid broken tasks retrying forever.
- Back off exponentially if things look like they are not going to get fixed soon. Throw in a random factor to avoid overwhelming services:
```python
import random

def exponential_backoff(task_self):
    minutes = task_self.default_retry_delay / 60
    rand = random.uniform(minutes, minutes * 1.3)
    return int(rand ** task_self.request.retries) * 60

# in the task (declared with bind=True):
self.retry(exc=e, countdown=exponential_backoff(self))
```
- For tasks that require a high level of reliability, use `acks_late` in combination with `retry`. Again, make sure tasks are idempotent and atomic. (Should I use retry or acks_late?)
- Set hard and soft time limits. Recover gracefully if things take longer than expected:
```python
from celery.exceptions import SoftTimeLimitExceeded

@app.task(time_limit=60, soft_time_limit=45)
def my_task():
    try:
        something_possibly_long()
    except SoftTimeLimitExceeded:
        recover()
```
- Use multiple queues to have more control over throughput and make things more scalable. (Routing Tasks)
- Extend the base task class to define default behaviour. (Custom Task Classes)
- Use canvas features to control task flows and deal with concurrency. (Canvas: Designing Work-flows)
- Log as much as possible.
- In case of failure, make sure stack traces get logged and people get notified (services like Sentry are a good idea).
- Monitor activity using Flower. (Flower: Real-time Celery web-monitor)
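Flower runs as a web dashboard next to your workers; assuming your Celery app lives in a module named `proj`:

```shell
pip install flower
celery -A proj flower --port=5555
# then open http://localhost:5555 in a browser
```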
- Use `task_always_eager` to test that your tasks are getting called.
- Tips and Best Practices from the official documentation.
- Task Queues by Full Stack Python
- Flower: Real-time Celery web-monitor from the official documentation.
- 3 Gotchas for Celery from Wiredcraft
- Celery - Best Practices by Deni Bertovic
- Hacker News thread on the above post
- [video] Painting on a Distributed Canvas: An Advanced Guide to Celery Workflows by David Gouldin
- Celery in Production by Dan Poirier from Caktus Group
- [video] Implementing Celery, Lessons Learned by Michael Robellard
- [video] Advanced Celery by Ask Solem Hoel
- Celery Best Practices by Balthazar Rouberol