Decouple the runner from the outpack_server store.
In the previous design, outpack_server, the runner API and the workers all shared a single outpack directory and a Git repository (for the most part: the workers used a clone of the shared repository for the actual execution). This creates a very tight and brittle coupling between all the components. It makes it impossible to deploy the different components on separate machines. It requires careful reasoning about data races and conflicts between the components. It prevents us from sharing worker processes across multiple Packit instances, and it prevents us from using multiple Git repositories within a single instance.

The new design completely splits up the storage.

- The API server and each worker have their own local Git clones of the repositories, which are pulled directly from the upstream (e.g. GitHub).
- The API servers and workers store bare Git clones of the repositories, without any worktree. When running a report, workers create a new worktree in a temporary directory, run the report and delete the worktree. This ensures a completely clean slate every time.
- The workers use their own outpack store, which is not shared with any other process.
- The workers can pull and push packets using any protocol supported by orderly2. In practice, we will be using HTTP to interact with the outpack_server used by Packit.

Currently, the workers create a new outpack store for each run, meaning they do not cache any of the packet dependencies and need to download them from the outpack_server from scratch every time. Given that, at least for now, workers and outpack_server will be operating on the same or nearby machines, this seems like a reasonable overhead.

Ideally we would keep a per-worker cache, but we need to be careful not to mix packets between different instances. One possible approach may be to re-use the file store, but start from an empty metadata store every time. This way large unnecessary file downloads are avoided, while preserving some degree of isolation between runs and instances.
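The "bare clone plus temporary worktree" lifecycle described above can be sketched roughly as follows. This is only an illustration, not the commit's implementation: the helpers `git()` and `with_temporary_worktree()` are hypothetical names, the sketch shells out to the `git` command line (assumed to be on the `PATH`) rather than using gert, and it checks out a detached HEAD for simplicity.

```r
# Hypothetical helper: run a git command and fail loudly on a non-zero exit.
git <- function(args, repo = NULL) {
  if (!is.null(repo)) {
    args <- c("-C", repo, args)
  }
  status <- system2("git", shQuote(args), stdout = FALSE, stderr = FALSE)
  if (status != 0) {
    stop(sprintf("git %s failed with status %d",
                 paste(args, collapse = " "), status))
  }
  invisible(status)
}

# Hypothetical helper: check out `ref` from the (bare) repository `repo` into
# a fresh directory under `base`, call `fn(path)` on it, and remove the
# worktree again when done, whether or not `fn` succeeded.
with_temporary_worktree <- function(repo, ref, base, fn) {
  path <- tempfile("worktree-", tmpdir = base)
  git(c("worktree", "add", "--detach", path, ref), repo = repo)
  on.exit(git(c("worktree", "remove", "--force", path), repo = repo),
          add = TRUE)
  fn(path)
}
```

Because the cleanup is registered with `on.exit`, a failing report run still leaves the bare repository without stray worktrees, which is what gives each run its clean slate.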
Showing 20 changed files with 583 additions and 552 deletions.
@@ -1,66 +1,34 @@
-runner_run <- function(orderly_root, reportname, parameters, branch, ref, ...) {
-  # Setup
-  worker_id <- Sys.getenv("RRQ_WORKER_ID")
-  worker_path <- file.path(orderly_root, ".packit", "workers", worker_id)
-  point_head_to_ref(worker_path, branch, ref)
-
-  # Initial cleanup
-  git_clean(worker_path)
-
-  # Run
-  id <- withr::with_envvar(
-    c(ORDERLY_SRC_ROOT = file.path(worker_path, "src", reportname)),
-    orderly2::orderly_run(reportname, parameters = parameters,
-                          root = orderly_root, ...)
-  )
-
-  # Cleanup
-  git_clean(worker_path)
-
+runner_run <- function(url, branch, ref, reportname, parameters, location, ...) {
+  storage <- Sys.getenv("ORDERLY_WORKER_STORAGE")
+  stopifnot(nzchar(storage) && fs::dir_exists(storage))
+
+  repositories <- fs::dir_create(storage, "git")
+  worktree_base <- fs::dir_create(storage, "worktrees")
+
+  repo <- git_sync(repositories, url)
+
+  # We could create the worktree with a detached HEAD and not bother with
+  # creating the branch, but then Orderly's metadata wouldn't be as complete.
+  #
+  # Using a named branch does introduce some persistent state in the Git
+  # repository, which the runner generally does its best to avoid. That is an
+  # acceptable compromise given that repositories are private to each worker.
+  #
+  # The branch/ref association here does not have to match the remote: if
+  # the upstream repository has just been pushed to, the commit associated with
+  # this run may be an older commit of that branch. We will happily use that
+  # older commit.
+  gert::git_branch_create(branch, ref = ref, force = TRUE, checkout = FALSE,
+                          repo = repo)
+  worktree <- create_temporary_worktree(repo, branch, worktree_base)
+
+  orderly2::orderly_init(worktree)
+  orderly2::orderly_location_add("upstream", location$type, location$args,
+                                 root = worktree)
+
+  id <- orderly2::orderly_run(reportname, parameters = parameters,
+                              fetch_metadata = TRUE, allow_remote = TRUE,
+                              location = "upstream", root = worktree, ...)
+  orderly2::orderly_location_push(id, "upstream", root = worktree)
   id
 }
-
-point_head_to_ref <- function(worker_path, branch, ref) {
-  gert::git_fetch(repo = worker_path)
-  gert::git_branch_checkout(branch, repo = worker_path)
-  gert::git_reset_hard(ref, repo = worker_path)
-}
-
-add_dir_parent_if_empty <- function(files_to_delete, path) {
-  contained_files <- list.files(path, full.names = TRUE)
-  if (length(setdiff(contained_files, files_to_delete)) > 0) {
-    return(files_to_delete)
-  }
-  add_dir_parent_if_empty(c(files_to_delete, path), dirname(path))
-}
-
-get_empty_dirs <- function(worker_path) {
-  dirs <- fs::dir_ls(worker_path, recurse = TRUE, type = "directory")
-  Reduce(add_dir_parent_if_empty, c(list(character()), dirs))
-}
-
-git_clean <- function(worker_path) {
-  # gert does not have git clean but this should achieve the same thing
-  tryCatch(
-    {
-      gert::git_stash_save(
-        include_untracked = TRUE,
-        include_ignored = TRUE,
-        repo = worker_path
-      )
-      gert::git_stash_drop(repo = worker_path)
-    },
-    error = function(e) {
-      # we don't need to rethrow the error here since it doesn't break any
-      # further report runs
-      if (e$message != "cannot stash changes - there is nothing to stash.") {
-        # TODO add logger here
-        message(e$message)
-      }
-      NULL
-    }
-  )
-  # however git ignores all directories, only cares about files, so we may
-  # have empty directories left
-  unlink(get_empty_dirs(worker_path), recursive = TRUE)
-}
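The new `runner_run` relies on `git_sync(repositories, url)`, whose definition is not shown in this hunk. As a rough idea of what such a step might involve, here is a minimal sketch under stated assumptions: the name `git_sync_sketch` is hypothetical, the per-URL directory name is simply a sanitised form of the URL (a real implementation would more likely use a hash to rule out collisions), and the `git` command line is used instead of gert.

```r
# Hypothetical sketch only: keep one bare clone per upstream URL under
# `repositories`, cloning on first use and fetching on later calls.
git_sync_sketch <- function(repositories, url) {
  # Assumption: derive the directory name from the URL by replacing every run
  # of non-alphanumeric characters with an underscore.
  path <- file.path(repositories, gsub("[^A-Za-z0-9]+", "_", url))
  if (dir.exists(path)) {
    # Existing clone: fetch new objects into remote-tracking refs, leaving
    # any local branches created for previous runs untouched.
    args <- c("-C", path, "fetch", "--quiet", "--prune", "origin",
              "+refs/heads/*:refs/remotes/origin/*")
  } else {
    # First use: create a bare clone, i.e. a repository without a worktree.
    args <- c("clone", "--quiet", "--bare", url, path)
  }
  status <- system2("git", shQuote(args))
  if (status != 0) {
    stop(sprintf("git failed with status %d while syncing %s", status, url))
  }
  path
}
```

Fetching only has to make the requested commit available locally: `runner_run` then points the branch at `ref` itself, so the local branch names do not need to track the remote.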