Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

⚙️ Add auto-cleanup for GoodJob #574

Merged
merged 6 commits into from
Dec 9, 2023

Conversation

jeremyf
Copy link
Contributor

@jeremyf jeremyf commented Dec 1, 2023

This commit introduces automatic clean-up of finished successful jobs. It is here to help with the overall performance of GoodJobs as well as the application database. Note the cleanup schedule is a guess on what's appropriate.

Code Walk Through of Good Jobs Configuration

The
README explains the configuration for the Scheduler.

- `cleanup_interval_seconds` (integer) Number of seconds a Scheduler will wait before cleaning up preserved jobs. Defaults to `nil`. Can also be set with  the environment variable `GOOD_JOB_CLEANUP_INTERVAL_SECONDS`.

Here's the implementation of
GoodJob::Configuration regarding the cleanup_interval_seconds. Which is in the GoodJob::CleanupTracker.

def cleanup_interval_seconds
  value = (
    rails_config[:cleanup_interval_seconds] ||
    env['GOOD_JOB_CLEANUP_INTERVAL_SECONDS'] ||
    DEFAULT_CLEANUP_INTERVAL_SECONDS
  )
  value.present? ? value.to_i : nil
end

The
GoodJob::CleanupTracker has a #cleanup? method that looks at either job counts or elapsed seconds. Which informs the GoodJob::Scheduler.

def cleanup?
  (cleanup_interval_jobs && job_count > cleanup_interval_jobs) ||
    (cleanup_interval_seconds && last_at < Time.current - cleanup_interval_seconds) ||
    false
end

The
GoodJob::Scheduler observes the tasks as they complete. And one of those is conditionally running #cleanup.

def task_observer(time, output, thread_error)
  error = thread_error || (output.is_a?(GoodJob::ExecutionResult) ? output.unhandled_error : nil)
  GoodJob._on_thread_error(error) if error

  instrument("finished_job_task", { result: output, error: thread_error, time: time })
  return unless output

  @cleanup_tracker.increment
  if @cleanup_tracker.cleanup?
    cleanup
  else
    create_task
  end
end

The
GoodJob::Scheduler's #cleanup method delegates the clean_up to the performer; which is a GoodJob::JobPerformer.

def cleanup
  @cleanup_tracker.reset

  future = Concurrent::Future.new(args: [self, @performer], executor: executor) do |_thr_scheduler, thr_performer|
    Rails.application.executor.wrap do
      thr_performer.cleanup
    end
  end

  observer = lambda do |_time, _output, thread_error|
    GoodJob._on_thread_error(thread_error) if thread_error
    create_task
  end
  future.add_observer(observer, :call)
  future.execute
end

The
GoodJob::JobPerformer then runs the general process GoodJob.cleanup_preserved_jobs (which is available via the CLI).

def cleanup
  GoodJob.cleanup_preserved_jobs
end

The
GoodJob.cleanup_preserved_jobs method is the one that ultimately cleans up preserved jobs.

Note that the include_discarded does some logical hoops with some grammatical antics (e.g. old_jobs.not_discarded unless include_discarded). We are not including discarded jobs so the query will limit to jobs that are not_discarded.

def self.cleanup_preserved_jobs(older_than: nil)
  configuration = GoodJob::Configuration.new({})
  older_than ||= configuration.cleanup_preserved_jobs_before_seconds_ago
  timestamp = Time.current - older_than
  include_discarded = configuration.cleanup_discarded_jobs?

  ActiveSupport::Notifications.instrument("cleanup_preserved_jobs.good_job", { older_than: older_than, timestamp: timestamp }) do |payload|
    old_jobs = GoodJob::ActiveJobJob.where('finished_at <= ?', timestamp)
    old_jobs = old_jobs.not_discarded unless include_discarded
    old_jobs_count = old_jobs.count

    GoodJob::Execution.where(job: old_jobs).destroy_all
    payload[:destroyed_records_count] = old_jobs_count
  end
end

Related to:

Story

Refs #issuenumber

Expected Behavior Before Changes

Expected Behavior After Changes

Screenshots / Video

Notes

@jeremyf jeremyf requested a review from laritakr December 1, 2023 16:51
This commit introduces automatic clean-up of finished successful jobs.
It is here to help with the overall performance of GoodJobs as well as
the application database.  Note the cleanup schedule is a guess on
what's appropriate.

<details><summary>Code Walk Through of Good Jobs Configuration</summary>

The
[README](https://github.com/bensheldon/good_job/blob/11b05e525d6cc0d4023b8b8b6b9824c40503b712/README.md?plain=1#L280)
explains the configuration for the Scheduler.

```markdown
- `cleanup_interval_seconds` (integer) Number of seconds a Scheduler will wait before cleaning up preserved jobs. Defaults to `nil`. Can also be set with  the environment variable `GOOD_JOB_CLEANUP_INTERVAL_SECONDS`.

```

Here's the implementation of
[GoodJob::Configuration](https://github.com/bensheldon/good_job/blob/11b05e525d6cc0d4023b8b8b6b9824c40503b712/lib/good_job/configuration.rb#L211-L220)
regarding the cleanup_interval_seconds.  Which is in the `GoodJob::CleanupTracker`.

```ruby
def cleanup_interval_seconds
  value = (
    rails_config[:cleanup_interval_seconds] ||
    env['GOOD_JOB_CLEANUP_INTERVAL_SECONDS'] ||
    DEFAULT_CLEANUP_INTERVAL_SECONDS
  )
  value.present? ? value.to_i : nil
end

```

The
[GoodJob::CleanupTracker](https://github.com/bensheldon/good_job/blob/11b05e525d6cc0d4023b8b8b6b9824c40503b712/lib/good_job/cleanup_tracker.rb#L23-L29)
has a `#cleanup?` method that looks at either job counts or elapsed
seconds.  Which informs the `GoodJob::Scheduler`.

```ruby
def cleanup?
  (cleanup_interval_jobs && job_count > cleanup_interval_jobs) ||
    (cleanup_interval_seconds && last_at < Time.current - cleanup_interval_seconds) ||
    false
end
```

The
[GoodJob::Scheduler](https://github.com/bensheldon/good_job/blob/11b05e525d6cc0d4023b8b8b6b9824c40503b712/lib/good_job/scheduler.rb#L180-L193)
observes the tasks as they complete.  And one of those is conditionally
running `#cleanup`.

```ruby
def task_observer(time, output, thread_error)
  error = thread_error || (output.is_a?(GoodJob::ExecutionResult) ? output.unhandled_error : nil)
  GoodJob._on_thread_error(error) if error

  instrument("finished_job_task", { result: output, error: thread_error, time: time })
  return unless output

  @cleanup_tracker.increment
  if @cleanup_tracker.cleanup?
    cleanup
  else
    create_task
  end
end
```

The
[GoodJob::Scheduler](https://github.com/bensheldon/good_job/blob/11b05e525d6cc0d4023b8b8b6b9824c40503b712/lib/good_job/scheduler.rb#L233-L250)'s
`#cleanup` method delegates the clean_up to the performer; which is a `GoodJob::JobPerformer`.

```ruby
def cleanup
  @cleanup_tracker.reset

  future = Concurrent::Future.new(args: [self, @Performer], executor: executor) do |_thr_scheduler, thr_performer|
    Rails.application.executor.wrap do
      thr_performer.cleanup
    end
  end

  observer = lambda do |_time, _output, thread_error|
    GoodJob._on_thread_error(thread_error) if thread_error
    create_task
  end
  future.add_observer(observer, :call)
  future.execute
end

```

The
[GoodJob::JobPerformer](https://github.com/bensheldon/good_job/blob/11b05e525d6cc0d4023b8b8b6b9824c40503b712/lib/good_job/job_performer.rb#L60-L64)
then runs the general process `GoodJob.cleanup_preserved_jobs` (which is
available via the CLI).

```ruby
def cleanup
  GoodJob.cleanup_preserved_jobs
end

```

The
[GoodJob.cleanup_preserved_jobs](https://github.com/bensheldon/good_job/blob/11b05e525d6cc0d4023b8b8b6b9824c40503b712/lib/good_job.rb#L130-L153)
method is the one that ultimately cleans up preserved jobs.

Note that the `include_discarded` does some logical hoops with some
grammatical antics (e.g. `old_jobs.not_discarded unless
include_discarded`).  We are not including discarded jobs so the query
will limit to jobs that are not_discarded.

```ruby
def self.cleanup_preserved_jobs(older_than: nil)
  configuration = GoodJob::Configuration.new({})
  older_than ||= configuration.cleanup_preserved_jobs_before_seconds_ago
  timestamp = Time.current - older_than
  include_discarded = configuration.cleanup_discarded_jobs?

  ActiveSupport::Notifications.instrument("cleanup_preserved_jobs.good_job", { older_than: older_than, timestamp: timestamp }) do |payload|
    old_jobs = GoodJob::ActiveJobJob.where('finished_at <= ?', timestamp)
    old_jobs = old_jobs.not_discarded unless include_discarded
    old_jobs_count = old_jobs.count

    GoodJob::Execution.where(job: old_jobs).destroy_all
    payload[:destroyed_records_count] = old_jobs_count
  end
end

```

</details>

Related to:

- notch8/adventist-dl#690
@jeremyf jeremyf force-pushed the adding-auto-clean-up-to-good-job branch from 04f4229 to 2426ef9 Compare December 1, 2023 16:52
jeremyf added a commit to samvera/hyku that referenced this pull request Dec 1, 2023
This commit introduces automatic clean-up of finished successful Good
Job jobs.  It is here to help with the overall performance of GoodJobs
as well as the application database.  Note the cleanup schedule is a
guess on what's appropriate.

<details><summary>Code Walk Through of Good Jobs Configuration</summary>

The
[README](https://github.com/bensheldon/good_job/blob/11b05e525d6cc0d4023b8b8b6b9824c40503b712/README.md?plain=1#L280)
explains the configuration for the Scheduler.

```markdown
- `cleanup_interval_seconds` (integer) Number of seconds a Scheduler will wait before cleaning up preserved jobs. Defaults to `nil`. Can also be set with  the environment variable `GOOD_JOB_CLEANUP_INTERVAL_SECONDS`.

```

Here's the implementation of
[GoodJob::Configuration](https://github.com/bensheldon/good_job/blob/11b05e525d6cc0d4023b8b8b6b9824c40503b712/lib/good_job/configuration.rb#L211-L220)
regarding the cleanup_interval_seconds.  Which is in the `GoodJob::CleanupTracker`.

```ruby
def cleanup_interval_seconds
  value = (
    rails_config[:cleanup_interval_seconds] ||
    env['GOOD_JOB_CLEANUP_INTERVAL_SECONDS'] ||
    DEFAULT_CLEANUP_INTERVAL_SECONDS
  )
  value.present? ? value.to_i : nil
end

```

The
[GoodJob::CleanupTracker](https://github.com/bensheldon/good_job/blob/11b05e525d6cc0d4023b8b8b6b9824c40503b712/lib/good_job/cleanup_tracker.rb#L23-L29)
has a `#cleanup?` method that looks at either job counts or elapsed
seconds.  Which informs the `GoodJob::Scheduler`.

```ruby
def cleanup?
  (cleanup_interval_jobs && job_count > cleanup_interval_jobs) ||
    (cleanup_interval_seconds && last_at < Time.current - cleanup_interval_seconds) ||
    false
end
```

The
[GoodJob::Scheduler](https://github.com/bensheldon/good_job/blob/11b05e525d6cc0d4023b8b8b6b9824c40503b712/lib/good_job/scheduler.rb#L180-L193)
observes the tasks as they complete.  And one of those is conditionally
running `#cleanup`.

```ruby
def task_observer(time, output, thread_error)
  error = thread_error || (output.is_a?(GoodJob::ExecutionResult) ? output.unhandled_error : nil)
  GoodJob._on_thread_error(error) if error

  instrument("finished_job_task", { result: output, error: thread_error, time: time })
  return unless output

  @cleanup_tracker.increment
  if @cleanup_tracker.cleanup?
    cleanup
  else
    create_task
  end
end
```

The
[GoodJob::Scheduler](https://github.com/bensheldon/good_job/blob/11b05e525d6cc0d4023b8b8b6b9824c40503b712/lib/good_job/scheduler.rb#L233-L250)'s
`#cleanup` method delegates the clean_up to the performer; which is a `GoodJob::JobPerformer`.

```ruby
def cleanup
  @cleanup_tracker.reset

  future = Concurrent::Future.new(args: [self, @Performer], executor: executor) do |_thr_scheduler, thr_performer|
    Rails.application.executor.wrap do
      thr_performer.cleanup
    end
  end

  observer = lambda do |_time, _output, thread_error|
    GoodJob._on_thread_error(thread_error) if thread_error
    create_task
  end
  future.add_observer(observer, :call)
  future.execute
end

```

The
[GoodJob::JobPerformer](https://github.com/bensheldon/good_job/blob/11b05e525d6cc0d4023b8b8b6b9824c40503b712/lib/good_job/job_performer.rb#L60-L64)
then runs the general process `GoodJob.cleanup_preserved_jobs` (which is
available via the CLI).

```ruby
def cleanup
  GoodJob.cleanup_preserved_jobs
end

```

The
[GoodJob.cleanup_preserved_jobs](https://github.com/bensheldon/good_job/blob/11b05e525d6cc0d4023b8b8b6b9824c40503b712/lib/good_job.rb#L130-L153)
method is the one that ultimately cleans up preserved jobs.

Note that the `include_discarded` does some logical hoops with some
grammatical antics (e.g. `old_jobs.not_discarded unless
include_discarded`).  We are not including discarded jobs so the query
will limit to jobs that are not_discarded.

```ruby
def self.cleanup_preserved_jobs(older_than: nil)
  configuration = GoodJob::Configuration.new({})
  older_than ||= configuration.cleanup_preserved_jobs_before_seconds_ago
  timestamp = Time.current - older_than
  include_discarded = configuration.cleanup_discarded_jobs?

  ActiveSupport::Notifications.instrument("cleanup_preserved_jobs.good_job", { older_than: older_than, timestamp: timestamp }) do |payload|
    old_jobs = GoodJob::ActiveJobJob.where('finished_at <= ?', timestamp)
    old_jobs = old_jobs.not_discarded unless include_discarded
    old_jobs_count = old_jobs.count

    GoodJob::Execution.where(job: old_jobs).destroy_all
    payload[:destroyed_records_count] = old_jobs_count
  end
end

```

</details>

Related to:

- notch8/adventist-dl#690
- notch8/utk-hyku#574
- https://github.com/scientist-softserv/nnp/issues/309
@ShanaLMoore ShanaLMoore self-requested a review December 4, 2023 14:52
@@ -12,6 +12,9 @@ ENGINE_MOUNT=authorities
FCREPO_HOST=fcrepo
FCREPO_PORT=8080
FCREPO_URL=http://fcrepo:8080/rest
GOOD_JOB_CLEANUP_DISCARDED_JOBS=false
Copy link
Contributor

@ShanaLMoore ShanaLMoore Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these values need to be on the server, you also should add them to the [server_name-deploy].tmpl.yml files

Such as:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ShanaLMoore is correct here. I've added them so this can get merged and be part of the QA run

@ShanaLMoore ShanaLMoore self-requested a review December 4, 2023 15:28
Copy link
Contributor

@ShanaLMoore ShanaLMoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refer to comment

jeremyf and others added 5 commits December 5, 2023 11:58
The maintainers yanked 0.3.2 version; so we're falling back to 0.3.1.

Related to:

- dryruby/json-canonicalization#2
- notch8/palni-palci#918
…lization

🧹 Pin to json-canonicalization 0.3.1
* upgrade bulkrax to be less likely to miss existing works when searching for them

* ⚙️ Upgrade to v6.0.0 of Bulkrax

---------

Co-authored-by: Jeremy Friesen <jeremy.n.friesen@gmail.com>
…st-softserv/utk-hyku into adding-auto-clean-up-to-good-job
@orangewolf orangewolf merged commit 82fbad5 into main Dec 9, 2023
3 of 4 checks passed
@orangewolf orangewolf deleted the adding-auto-clean-up-to-good-job branch December 9, 2023 01:09
@orangewolf orangewolf restored the adding-auto-clean-up-to-good-job branch January 9, 2024 17:28
@orangewolf orangewolf deleted the adding-auto-clean-up-to-good-job branch January 9, 2024 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants