checklist-en.md

1. Best Practices

  • Prefer RabbitMQ or Redis as broker (never use a relational database as production broker).

  • Do not pass complex objects as task parameters. E.g.: avoid Django model objects and pass their ids instead:

    # Good
    @app.task
    def my_task(user_id):
        user = User.objects.get(id=user_id)
        print(user.name)
        # ...
    
    # Bad
    @app.task
    def my_task(user):
        print(user.name)
        # ...
    
  • Do not wait for other tasks inside a task.

  • Prefer idempotent tasks:

    • "Idempotence is the property of certain operations in mathematics and computer science, that can be applied multiple times without changing the result beyond the initial application." - Wikipedia.
  • Prefer atomic tasks:

    • "An operation (or set of operations) is atomic ... if it appears to the rest of the system to occur instantaneously. Atomicity is a guarantee of isolation from concurrent processes. Additionally, atomic operations commonly have a succeed-or-fail definition—they either successfully change the state of the system, or have no apparent effect." - Wikipedia.
  • Retry when possible. But make sure tasks are idempotent and atomic before doing so. (Retrying)

  • Set a retry limit (max_retries in Celery) so that broken tasks do not keep retrying forever.

  • Back off exponentially if things look like they are not going to get fixed soon. Throw in a random factor so that retries do not all hit a service at the same moment:

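    import random

    def exponential_backoff(task_self):
        # default_retry_delay is in seconds; work in minutes
        minutes = task_self.default_retry_delay / 60
        # random factor of up to 30% so retries do not fire in lockstep
        rand = random.uniform(minutes, minutes * 1.3)
        # delay grows with the number of retries so far; returned in seconds
        return int(rand ** task_self.request.retries) * 60
    
    # in the task
    raise self.retry(exc=e, countdown=exponential_backoff(self))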
    
  • Use autoretry_for to reduce the boilerplate code for retrying tasks.

  • Use retry_backoff to reduce the boilerplate code when doing exponential backoff.

  • For tasks that require a high level of reliability, use acks_late in combination with retry. Again, make sure tasks are idempotent and atomic. (Should I use retry or acks_late?)

  • Set hard and soft time limits. Recover gracefully if things take longer than expected:

    from celery.exceptions import SoftTimeLimitExceeded
    
    # time_limit / soft_time_limit are the per-task options, in seconds
    @app.task(time_limit=60, soft_time_limit=45)
    def my_task():
        try:
            something_possibly_long()
        except SoftTimeLimitExceeded:
            recover()
    
  • Use multiple queues to have more control over throughput and make things more scalable. (Routing Tasks)

  • Extend the base task class to define default behaviour. (Custom Task Classes)

  • Use canvas features to control task flows and deal with concurrency. (Canvas: Designing Work-flows)
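The advice above to make tasks idempotent before retrying them can be illustrated without Celery at all. The following is a minimal sketch in plain Python; the `balance` and `processed_orders` stores and both function names are hypothetical stand-ins for whatever a real task would write to. It contrasts a function that is unsafe to run twice with one that is safe to run twice.

```python
# Hypothetical in-memory stores standing in for database tables.
balance = {"total": 0}
processed_orders = {}


def add_to_balance(amount):
    """Not idempotent: every call (including a retry) adds again."""
    balance["total"] += amount
    return balance["total"]


def record_order_charge(order_id, amount):
    """Idempotent: keyed by order_id, so a repeated call changes nothing.

    dict.setdefault stores the amount only if the key is absent and
    returns the stored value either way.
    """
    return processed_orders.setdefault(order_id, amount)


# Simulate a task that ran, then got retried with the same arguments.
add_to_balance(10)
add_to_balance(10)

record_order_charge(42, 10)
record_order_charge(42, 10)

print(balance)           # the total was incremented twice
print(processed_orders)  # the order was recorded once
```

In a real task the same idea usually means keying the write on a stable identifier passed as a task argument, so that a retry or a duplicate delivery finds the work already done.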

2. Monitoring & Tests

  • Log as much as possible. Use get_task_logger to automatically get the task name and unique id as part of the logs.
  • In case of failure, make sure stack traces get logged and people get notified (services like Sentry are a good idea).
  • Monitor activity using Flower. (Flower: Real-time Celery web-monitor)
  • Use task_always_eager to test that your tasks are getting called.
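To illustrate the point about logging stack traces on failure, here is a small sketch using only the standard `logging` module rather than Celery's get_task_logger. The logger name `tasks.example`, the `flaky_task` function, and the in-memory log stream are assumptions made for the example. It shows `logger.exception`, which logs at ERROR level and appends the current traceback.

```python
import io
import logging

# Capture log output in memory so the example is self-contained.
log_stream = io.StringIO()
logger = logging.getLogger("tasks.example")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(log_stream))
logger.propagate = False


def flaky_task(value):
    """Hypothetical task body that fails when value is 0."""
    logger.info("flaky_task started with value=%r", value)
    try:
        return 10 / value
    except ZeroDivisionError:
        # Logs the message plus the full traceback, then re-raises so the
        # caller (or a retry mechanism) still sees the failure.
        logger.exception("flaky_task failed for value=%r", value)
        raise


try:
    flaky_task(0)
except ZeroDivisionError:
    pass

print(log_stream.getvalue())
```

Re-raising after logging keeps the failure visible to whatever handles retries or notifications, instead of swallowing it inside the task.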

3. Resources to check