An orchestrator is a system that provides automation for deploying, scaling, and otherwise managing containers.
To stay focused on our main goal, we’re going to limit the tools and libraries we use to the following:
- Go
- chi
- Docker SDK
- BoltDB
- Linux
- goprocinfo
An orchestrator is made up of the following components:
- The task
- The job
- The scheduler
- The manager
- The worker
- The cluster
- The command-line interface (CLI)
Some of these components can be seen here:
The manager is the brain of an orchestrator and the main entry point for users. To run jobs in the orchestration system, users submit their jobs to the manager. The manager, using the scheduler, then finds a machine where the job’s tasks can run. The manager also periodically collects metrics from each of its workers, which are used in the scheduling process.
The manager should do the following:
- Accept requests from users to start and stop tasks.
- Schedule tasks onto worker machines.
- Keep track of tasks, their states, and the machine on which they run.
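As a rough illustration of these three responsibilities, here is a minimal sketch of a manager that accepts start and stop requests, assigns tasks to workers in round-robin order, and records each task's state and worker. Every name in it (`Manager`, `TaskRecord`, `StartTask`, `StopTask`, `Lookup`, and the state constants) is hypothetical and chosen for this example only. A real manager would talk to workers over the network and would use a proper scheduler.

```go
package main

import "errors"

// State is a hypothetical set of task states tracked by the manager.
type State int

const (
	Pending State = iota
	Scheduled
	Running
	Completed
	Failed
)

// TaskRecord is what the manager remembers about one task.
type TaskRecord struct {
	ID     string
	State  State
	Worker string // machine the task was assigned to
}

// Manager keeps track of workers and the tasks placed on them.
type Manager struct {
	workers []string
	next    int // index of the worker that receives the next task
	tasks   map[string]*TaskRecord
}

// NewManager returns a manager that knows about the given workers.
func NewManager(workers []string) *Manager {
	return &Manager{workers: workers, tasks: make(map[string]*TaskRecord)}
}

// StartTask assigns the task to the next worker in round-robin order
// and records the assignment. It returns the chosen worker's name.
func (m *Manager) StartTask(id string) (string, error) {
	if len(m.workers) == 0 {
		return "", errors.New("manager: no workers available")
	}
	if _, exists := m.tasks[id]; exists {
		return "", errors.New("manager: task already exists")
	}
	worker := m.workers[m.next]
	m.next = (m.next + 1) % len(m.workers)
	m.tasks[id] = &TaskRecord{ID: id, State: Scheduled, Worker: worker}
	return worker, nil
}

// StopTask marks a known task as completed.
func (m *Manager) StopTask(id string) error {
	rec, ok := m.tasks[id]
	if !ok {
		return errors.New("manager: unknown task")
	}
	rec.State = Completed
	return nil
}

// Lookup returns a copy of the record for the given task, if any.
func (m *Manager) Lookup(id string) (TaskRecord, bool) {
	rec, ok := m.tasks[id]
	if !ok {
		return TaskRecord{}, false
	}
	return *rec, true
}
```

With two workers, three successive `StartTask` calls in this sketch land on the first, second, and then first worker again.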
The task is the smallest unit of work in an orchestration system and typically runs in a container.
A task should specify the following:
- The amount of memory, CPU, and disk it needs to run effectively
- What the orchestrator should do in case of failures, typically called a restart policy
- The name of the container image used to run the task
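To make the list concrete, a task specification could be modeled roughly as follows. This is only a sketch: the `Task` struct, its field names, the three `RestartPolicy` values, and the `Validate` method are assumptions made for illustration, not a definition taken from any particular orchestrator.

```go
package main

import "errors"

// RestartPolicy names what the orchestrator should do when a task fails.
// The three values below are illustrative assumptions.
type RestartPolicy string

const (
	RestartNever     RestartPolicy = "never"
	RestartOnFailure RestartPolicy = "on-failure"
	RestartAlways    RestartPolicy = "always"
)

// Task is a hypothetical description of the smallest unit of work.
type Task struct {
	Name          string
	Image         string        // name of the container image used to run the task
	CPU           float64       // CPU cores the task needs
	MemoryBytes   int64         // memory the task needs
	DiskBytes     int64         // disk space the task needs
	RestartPolicy RestartPolicy // what to do if the task fails
}

// Validate returns an error if the task is missing a required field.
func (t Task) Validate() error {
	switch {
	case t.Image == "":
		return errors.New("task: image is required")
	case t.CPU <= 0 || t.MemoryBytes <= 0 || t.DiskBytes <= 0:
		return errors.New("task: cpu, memory, and disk must be positive")
	case t.RestartPolicy == "":
		return errors.New("task: restart policy is required")
	}
	return nil
}
```

Validating the specification up front lets an orchestrator reject an incomplete task before any scheduling work is done.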
The job is an aggregation of tasks. It contains one or more tasks that together form a larger logical grouping and perform a set of functions.
A job should specify the following high-level details, which apply to all the tasks it defines:
- Each task that makes up the job
- Which data centers the job should run in
- How many instances of each task should run
- The type of the job (should it run continuously or run to completion and stop?)
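One way to picture these details is as a small data structure. The sketch below is an assumption-laden illustration: the `Job`, `TaskSpec`, and `JobType` names, the `service` and `batch` type values, and the `TotalInstances` helper are all hypothetical.

```go
package main

// JobType says whether a job runs continuously or runs to completion.
// The two values are illustrative assumptions.
type JobType string

const (
	JobService JobType = "service" // runs continuously
	JobBatch   JobType = "batch"   // runs to completion and stops
)

// TaskSpec describes one task in the job and how many copies to run.
type TaskSpec struct {
	Name  string
	Image string
	Count int // number of instances of this task
}

// Job is a hypothetical grouping of tasks with job-wide settings.
type Job struct {
	Name        string
	Type        JobType
	Datacenters []string   // data centers the job should run in
	Tasks       []TaskSpec // each task that makes up the job
}

// TotalInstances returns how many task instances the job asks for
// across all of its tasks.
func (j Job) TotalInstances() int {
	total := 0
	for _, t := range j.Tasks {
		total += t.Count
	}
	return total
}
```

A job with three instances of one task and one instance of another would report four instances in total, which is the number of placements the scheduler would then have to make.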
The scheduler decides what machine can best host the tasks defined in the job. The decision-making process can be as simple as selecting a node from a set of machines in a round-robin fashion or as complex as the Enhanced Parallel Virtual Machine (E-PVM) scheduler.
The scheduler should perform these functions:
- Determine a set of candidate machines on which a task could run
- Score the candidate machines from best to worst
- Pick the machine with the best score
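These three steps map naturally onto an interface with one method per step. The sketch below assumes a deliberately simple scoring rule, the fraction of a node's memory that would be in use after placing the task, where a lower score is better. It is neither round-robin nor E-PVM. The `Scheduler` interface, the `LeastLoaded` type, the `Node` and `Task` fields, and the `Schedule` helper are hypothetical names for this example.

```go
package main

// Task carries only what this sketch needs for scheduling.
type Task struct {
	Name   string
	Memory int64 // memory the task needs, in bytes
}

// Node describes one candidate machine.
type Node struct {
	Name        string
	MemoryTotal int64
	MemoryUsed  int64
}

// Scheduler has one method per step: filter, score, pick.
type Scheduler interface {
	SelectCandidateNodes(t Task, nodes []Node) []Node
	Score(t Task, candidates []Node) map[string]float64
	Pick(scores map[string]float64, candidates []Node) (Node, bool)
}

// LeastLoaded prefers the node with the lowest memory utilization.
type LeastLoaded struct{}

// SelectCandidateNodes keeps only nodes with enough free memory.
func (LeastLoaded) SelectCandidateNodes(t Task, nodes []Node) []Node {
	var out []Node
	for _, n := range nodes {
		if n.MemoryTotal > 0 && n.MemoryTotal-n.MemoryUsed >= t.Memory {
			out = append(out, n)
		}
	}
	return out
}

// Score maps each candidate's name to the memory utilization it would
// have after the task is placed on it. Lower is better.
func (LeastLoaded) Score(t Task, candidates []Node) map[string]float64 {
	scores := make(map[string]float64, len(candidates))
	for _, n := range candidates {
		scores[n.Name] = float64(n.MemoryUsed+t.Memory) / float64(n.MemoryTotal)
	}
	return scores
}

// Pick returns the candidate with the lowest score. Ties go to the
// earlier candidate. It reports false when there are no candidates.
func (LeastLoaded) Pick(scores map[string]float64, candidates []Node) (Node, bool) {
	var best Node
	found := false
	for _, n := range candidates {
		if !found || scores[n.Name] < scores[best.Name] {
			best, found = n, true
		}
	}
	return best, found
}

// Schedule runs the three steps in order for a single task.
func Schedule(s Scheduler, t Task, nodes []Node) (Node, bool) {
	candidates := s.SelectCandidateNodes(t, nodes)
	return s.Pick(s.Score(t, candidates), candidates)
}
```

Keeping the steps behind an interface means a different policy can be swapped in without changing the code that calls `Schedule`.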
The worker provides the muscles of an orchestrator. It is responsible for running the tasks assigned to it by the manager. If a task fails for any reason, the worker must attempt to restart it. The worker also makes metrics about its tasks and overall machine health available for the manager to poll.
The worker is responsible for the following:
- Running tasks as Docker containers
- Accepting tasks to run from a manager
- Providing relevant statistics to the manager for the purpose of scheduling tasks
- Keeping track of its tasks and their states
The cluster is the logical grouping of all the previous components. An orchestration cluster could be run from a single physical or virtual machine. More commonly, however, a cluster is built from multiple machines, from as few as five to as many as thousands or more. The cluster is the level at which topics like high availability (HA) and scalability come into play.
The CLI, the main user interface, should allow a user to do the following:
- Start and stop tasks
- Get the status of tasks
- See the state of machines (i.e., the workers)
- Start the manager
- Start the worker
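A command-line front end for these operations can start as a small argument parser. In the sketch below, the subcommand names (`run`, `stop`, `status`, `node`, `manager`, `worker`) and the `Command` and `Parse` identifiers are made up for illustration; the text does not prescribe them.

```go
package main

import "fmt"

// Command is the parsed form of one CLI invocation.
type Command struct {
	Name string // subcommand, such as "run" or "status"
	Arg  string // task identifier, when the subcommand needs one
}

// commands maps each hypothetical subcommand to whether it requires
// a task argument.
var commands = map[string]bool{
	"run":     true,  // start a task
	"stop":    true,  // stop a task
	"status":  false, // get the status of tasks
	"node":    false, // see the state of the workers
	"manager": false, // start the manager
	"worker":  false, // start the worker
}

// Parse turns raw arguments (without the program name) into a Command.
func Parse(args []string) (Command, error) {
	if len(args) == 0 {
		return Command{}, fmt.Errorf("usage: <command> [task]")
	}
	needsArg, ok := commands[args[0]]
	if !ok {
		return Command{}, fmt.Errorf("unknown command %q", args[0])
	}
	cmd := Command{Name: args[0]}
	if needsArg {
		if len(args) < 2 {
			return Command{}, fmt.Errorf("%s: task argument required", args[0])
		}
		cmd.Arg = args[1]
	}
	return cmd, nil
}
```

In a real program, `Parse` would typically be called with `os.Args[1:]`, and the returned `Command` would be dispatched to the code that talks to the manager or starts a worker.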