
Add a page to manage jobs in the job queue. #2232

Merged: 6 commits merged on Jan 8, 2024


@drgrice1 (Member)

All jobs for a course are listed on this page. The table displays the job id (this is used to reference specific jobs in action messages; otherwise it would not be shown), task name, created time, started time, finished time, and state. If the job has completed, the state column also contains a button that opens a popover containing the job result.
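A minimal Python sketch of one row of the jobs table, for illustration only. The class and method names are hypothetical (the real page is a Mojolicious template), and treating both `finished` and `failed` as "completed" is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Assumption: both terminal Minion states count as "completed".
COMPLETED_STATES = {"finished", "failed"}


@dataclass
class JobRow:
    """Hypothetical model of one row of the Job Manager table."""

    id: int  # shown so that action messages can refer to specific jobs
    task: str
    created: float  # epoch seconds
    started: Optional[float]
    finished: Optional[float]
    state: str  # "inactive", "active", "finished", or "failed"
    result: Any = None

    def has_result_button(self) -> bool:
        # The result button only appears in the state column once the job is done.
        return self.state in COMPLETED_STATES


queued = JobRow(1, "send_instructor_email", 0.0, None, None, "inactive")
done = JobRow(2, "send_instructor_email", 0.0, 1.0, 2.0, "finished", ["..."])
print(queued.has_result_button(), done.has_result_button())  # False True
```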

Note that the Minion job queue automatically removes finished jobs from the job queue after two days (that is the default, at least, and we don't change it). So the real importance of this page is to allow the instructor to see the status of recently completed or in-progress jobs.
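The retention rule can be modelled as a small pruning function. This is a sketch, not Minion's code: Minion exposes the two-day window as its `remove_after` setting (172800 seconds by default), and measuring a job's age from its finish time is an assumption of this model. Failed jobs are deliberately kept, as confirmed later in this thread.

```python
# Minion's default retention window: 172800 seconds (two days).
REMOVE_AFTER = 2 * 24 * 60 * 60


def prune(jobs, now):
    """Drop 'finished' jobs older than REMOVE_AFTER; keep everything else.

    Failed, inactive, and active jobs are never pruned by this rule.
    """
    return [
        job
        for job in jobs
        if not (job["state"] == "finished" and now - job["finished"] > REMOVE_AFTER)
    ]


now = 10 * 24 * 60 * 60
jobs = [
    {"id": 1, "state": "finished", "finished": now - 3 * 24 * 60 * 60},  # too old
    {"id": 2, "state": "finished", "finished": now - 60 * 60},  # recent
    {"id": 3, "state": "failed", "finished": now - 30 * 24 * 60 * 60},  # kept
]
print([job["id"] for job in prune(jobs, now)])  # [2, 3]
```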

At this point the actions available on the page are filter, sort, and delete. Jobs can be filtered by id, task name, or state. Jobs can be sorted by clicking on the headers, or by using the sort form. Jobs that are not active can be deleted.
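Filtering and multi-column sorting can be sketched as below. The helper names are hypothetical and exact-match filtering is an assumption; the page's real matching rules may differ. The sort relies on Python's stable sort: sorting by the least significant key first and the primary key last gives a multi-key ordering.

```python
def filter_jobs(jobs, field, value):
    """Keep jobs whose id, task, or state equals value (hypothetical helper)."""
    if field not in ("id", "task", "state"):
        raise ValueError(f"cannot filter on {field!r}")
    return [job for job in jobs if str(job[field]) == str(value)]


def sort_jobs(jobs, fields):
    """Sort by (field, descending) pairs, primary key first.

    Each pass is stable (also with reverse=True), so applying the keys from
    least to most significant yields a correct multi-key sort.
    """
    result = list(jobs)
    for field, descending in reversed(fields):
        result.sort(key=lambda job: job[field], reverse=descending)
    return result


jobs = [
    {"id": 1, "task": "send_instructor_email", "state": "finished"},
    {"id": 2, "task": "other_task", "state": "failed"},
    {"id": 3, "task": "send_instructor_email", "state": "failed"},
]
print([j["id"] for j in filter_jobs(jobs, "task", "send_instructor_email")])  # [1, 3]
print([j["id"] for j in sort_jobs(jobs, [("state", False), ("id", True)])])  # [3, 2, 1]
```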

Minion does not allow deletion of active jobs (an active job is one that is currently running), so active jobs cannot be selected on this page. An option to stop running jobs could perhaps be added later if jobs turn out to hang, but active jobs cannot be stopped directly: the Minion worker runs in a different process, so the Mojolicious app would need to broadcast a signal to the Minion worker to do so.

An inactive job (i.e., a job that has been queued but has not started running yet) can be selected and deleted. However, the inactive job could start before the form is submitted. In that case the job cannot be deleted, and an alert will say so.
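The deletion rules, including the race where an inactive job starts before the form is submitted, can be sketched as follows. This is a hypothetical model rather than the page's code; it mirrors Minion's documented restriction that only inactive, failed, or finished jobs can be removed.

```python
def delete_jobs(jobs, selected_ids):
    """Delete selected jobs, re-checking each job's state at submit time.

    `jobs` maps job id -> state. A job that was inactive when the page was
    rendered may have started since, so active jobs are refused and
    reported instead of deleted. Returns (deleted ids, alert messages).
    """
    deleted, alerts = [], []
    for job_id in selected_ids:
        state = jobs.get(job_id)
        if state is None:
            alerts.append(f"Job {job_id} no longer exists.")
        elif state == "active":
            alerts.append(f"Job {job_id} is running and cannot be deleted.")
        else:
            del jobs[job_id]
            deleted.append(job_id)
    return deleted, alerts


jobs = {1: "finished", 2: "inactive"}
jobs[2] = "active"  # job 2 starts after the form was rendered
print(delete_jobs(jobs, [1, 2]))
```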

To reliably associate a course with a job, there is a new rule for tasks: the job must pass the course id via the "notes" option of the Minion enqueue method. The existing tasks have been updated to do this. There is also a backwards compatibility check that finds jobs which passed the course id in their job arguments, in either of the two ways the two existing tasks did it before.
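The lookup order can be sketched as: notes first, then legacy arguments. The `courseID` key name and both legacy argument layouts below are hypothetical stand-ins; the real tasks' previous layouts are not spelled out here.

```python
def course_id_for(job):
    """Return the course id associated with a Minion job, or None.

    Current rule: the task passes the course id in the job notes. The two
    fallbacks model older jobs that carried it only in their arguments;
    both argument layouts here are hypothetical.
    """
    notes = job.get("notes") or {}
    if "courseID" in notes:  # key name is an assumption
        return notes["courseID"]
    args = job.get("args") or []
    if args and isinstance(args[0], dict) and "courseID" in args[0]:
        return args[0]["courseID"]  # hypothetical legacy layout 1
    if args and isinstance(args[0], str):
        return args[0]  # hypothetical legacy layout 2
    return None


print(course_id_for({"notes": {"courseID": "MTH101"}, "args": []}))
print(course_id_for({"notes": {}, "args": [{"courseID": "MTH102"}]}))
print(course_id_for({"notes": {}, "args": ["MTH103", 42]}))
```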

Since the job fail/finish messages are now displayed in the UI, those messages are now translated. The exceptions are the first few messages in each task, which are emitted before the course environment is established; a course environment is required to obtain the language of the course.
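Why the earliest messages stay untranslated can be sketched with a translator that falls back to the source text until a course environment is available. `course_env`, `lexicons`, and `make_translator` are hypothetical stand-ins for WeBWorK's course environment and localization machinery.

```python
def make_translator(course_env=None, lexicons=None):
    """Return a function that translates job messages when it can.

    Until the course environment exists the course language is unknown, so
    messages are returned untranslated.
    """

    def maketext(text):
        if course_env is None or lexicons is None:
            return text  # too early in the task: no course language yet
        language = course_env.get("language", "en")
        return lexicons.get(language, {}).get(text, text)

    return maketext


early = make_translator()
later = make_translator({"language": "de"}, {"de": {"Job finished.": "Auftrag beendet."}})
print(early("Job finished."), "|", later("Job finished."))
```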

The send_instructor_email task no longer sends an email to the instructor after sending the emails to the students. Instead the job result contains all of the information that would have been in that email. This is a far more reliable way of getting that information to the instructor sending the email. The instructor just needs to go to the "Job Manager" page to see the result. The message on the "Email" page tells the instructor this.
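A hypothetical sketch of the new shape of the task: per-recipient outcomes are collected and returned as the job result rather than emailed to the instructor. `send` is an assumed callable that raises on delivery failure; recording only the exception's message mirrors the shorter error reporting described below.

```python
def send_instructor_email(recipients, send):
    """Send each message, then return a summary as the job result instead
    of emailing the summary to the instructor.

    `send` is any callable that raises an exception when delivery fails.
    """
    lines, failed = [], 0
    for address in recipients:
        try:
            send(address)
            lines.append(f"Message sent to {address}.")
        except Exception as err:  # keep only the short message, no traceback
            failed += 1
            lines.append(f"Failed to send message to {address}: {err}")
    summary = f"{len(recipients) - failed} of {len(recipients)} messages sent."
    return [summary] + lines


def fake_send(address):
    if address.endswith("@invalid"):
        raise RuntimeError("connection refused")


print(send_instructor_email(["a@example.edu", "b@invalid"], fake_send))
```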

This page is also available in the admin course, where all jobs for all courses are shown. There, the jobs table has an additional column showing the id of the course that enqueued the job.
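The admin/course scoping can be sketched as below. The admin course id `admin` and putting the course column first are assumptions of this sketch, not details confirmed by the pull request.

```python
BASE_COLUMNS = ["id", "task", "created", "started", "finished", "state"]


def jobs_table(jobs, course_id, admin_course_id="admin"):
    """Return (columns, rows) for the Job Manager table.

    In the admin course every job is listed and a course column is added;
    elsewhere only the current course's jobs are listed.
    """
    if course_id == admin_course_id:
        return ["course"] + BASE_COLUMNS, list(jobs)
    return BASE_COLUMNS, [job for job in jobs if job["course"] == course_id]


jobs = [{"id": 1, "course": "MTH101"}, {"id": 2, "course": "MTH102"}]
print(jobs_table(jobs, "MTH101"))
print(jobs_table(jobs, "admin"))
```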

Also, the errors that are reported when sending emails are made less verbose by calling the message method of the Mojo::Exception which does not include the traceback.

@drgrice1 force-pushed the job-manager branch 7 times, most recently from c2c9bbf to 3e74e23 (October 22, 2023 20:00)
@somiaj (Contributor) commented Oct 22, 2023

I like this idea, trying to think of a good way to test this, as the only place I use the queue is on my live server. Is there a way I can copy the queue from a live server just to check out the page?

@drgrice1 (Member, Author)

You could copy the DATA/webwork2_job_queue.db file from your server to your test server. Archive a course that uses the job queue, and also copy the archived course file to your test server. Then open the Job Manager page in that course on your test server.

You can also just run some jobs in a course on your test server by configuring LTI grade passback (but with invalid credentials or even an invalid LMS to begin with). The jobs will still run. Furthermore, the failure will show up in the Job Manager. You can do the same with sending emails. Even if you don't have SMTP configured, the job will run, and the errors will be shown in the Job Manager.

@drgrice1 (Member, Author)

@somiaj: By the way, if you copy the sqlite db file and a course archive, make sure you either disable grade pass back for the course, or don't have the job queue running when you test this. You don't want to pass back grades from your test server. Note that you don't need the job queue running to test this pull request. It doesn't use the job queue worker process, just the database.

@somiaj (Contributor) commented Oct 23, 2023

@drgrice1 thanks for the warning/reminder. I always just make the secret invalid and use a fake URL on my dev system, so this should give me lots of errors to look at. What are the webwork2_job_queue.db-{shm,wal} files for?

@drgrice1 (Member, Author)

I am not entirely sure. They are used internally by sqlite. I suspect sqlite uses them for caching and locking and such. You only need the .db file though.
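For reference, those are most likely SQLite's write-ahead-log (WAL) companion files: `-wal` holds recently committed changes and `-shm` is the shared-memory index used for coordination between connections. SQLite checkpoints the WAL into the main file and deletes both when the last connection closes, so copying only the `.db` file is fine once the app and job queue are stopped. The Python sketch below (standard-library `sqlite3` only; file names are illustrative) shows the companion files appearing, and uses SQLite's online backup API to take a consistent single-file snapshot while a connection is still open.

```python
import os
import sqlite3
import tempfile


def demo_wal_files(directory):
    """Show the companion files SQLite creates in WAL mode, then make a
    single-file snapshot with the online backup API."""
    db_path = os.path.join(directory, "queue.db")
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT)")
    conn.execute("INSERT INTO jobs (state) VALUES ('finished')")
    conn.commit()
    # While a connection is open, recent commits live in queue.db-wal and
    # queue.db-shm holds the shared-memory index.
    companions = sorted(f for f in os.listdir(directory) if f.startswith("queue.db-"))
    # The backup API copies a consistent snapshot that includes WAL contents.
    snapshot = sqlite3.connect(os.path.join(directory, "snapshot.db"))
    conn.backup(snapshot)
    count = snapshot.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
    snapshot.close()
    conn.close()
    return companions, count


with tempfile.TemporaryDirectory() as d:
    print(demo_wal_files(d))
```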

@somiaj (Contributor) commented Oct 23, 2023

Here are the results of my tests. I like the new page. However, even though all of my jobs should have failed, they didn't (though maybe the errors from trying to send to localhost with no response weren't caught). Instead, some of my LTI mass updates finished and others failed due to what appears to be a bug.

This is the message I get on the updates that failed. Note I could not find a way to actually copy/paste any of the info in the info popup, so I had to take a screenshot.

[Screenshot of the error message]

Unsure if this bug is new with this PR, or something that has been hiding.

@somiaj (Contributor) commented Oct 23, 2023

The updates that are failing are for a single user and all sets.

@drgrice1 (Member, Author)

There was a mistake on line 79 of LTIMassUpdate.pm. It has been fixed.

Note that the LTI update job will often succeed even if it can not actually send grades. The actual update method in WeBWorK::Authen::LTIAdvanced::SubmitGrade (or WeBWorK::Authen::LTIAdvantage::SubmitGrade) can fail and the job still succeed. For those failures you will need to check the logs/webwork2.log file. Job success only means that the submit_course_grade method was called and returned.
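A hypothetical Python model of that behaviour: passback failures only reach the log, and the task returns normally, so Minion marks the job finished. The assumption that `submit_course_grade` signals failure by returning False is made for illustration only.

```python
import logging

log = logging.getLogger("webwork2")  # stands in for logs/webwork2.log


def lti_mass_update(users, submit_course_grade):
    """Hypothetical model of the task at this point in the thread.

    `submit_course_grade` is assumed to return False when passback fails.
    Such failures are only logged, so the job itself still finishes.
    """
    for user in users:
        if not submit_course_grade(user):
            log.warning("Grade passback failed for %s", user)
    return "Grades updated."  # the job result is the same either way


# Passback "fails" for user b, yet the job result is unchanged.
print(lti_mass_update(["a", "b"], lambda user: user == "a"))
```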

@somiaj (Contributor) commented Oct 23, 2023

Now I get this error when restarting webwork2.service

Oct 22 20:10:51 webwork hypnotoad[1998]: Can't load application from file "/opt/webwork/webwork2-develop/bin/webwork2": Version control conflict marker at /opt/webwork/webwork2-develop/lib/Mojolicious/WeBWorK/Tasks/LTIMassUpdate.pm line 79, near "<<<<<<<"
Oct 22 20:10:51 webwork hypnotoad[1998]: Version control conflict marker at /opt/webwork/webwork2-develop/lib/Mojolicious/WeBWorK/Tasks/LTIMassUpdate.pm line 81, near "======="
Oct 22 20:10:51 webwork hypnotoad[1998]: Version control conflict marker at /opt/webwork/webwork2-develop/lib/Mojolicious/WeBWorK/Tasks/LTIMassUpdate.pm line 83, near ">>>>>>>"

@somiaj (Contributor) commented Oct 23, 2023

Grr, never mind... my error: I did a pull instead of a reset.

@somiaj (Contributor) commented Oct 23, 2023

Thanks, this is working fine now. Would it be possible to get errors from submit_course_grade? I could see this confusing a user who sees that the update job has finished successfully, yet when they check their LMS, the grades haven't been updated. In general it should be fine, but it could help users who don't have access to the logs if something goes wrong.

@somiaj (Contributor) left a comment

Nice addition.

@drgrice1 (Member, Author)

I plan to do more work on the LTI mass update task to make failures come through to this page. The challenge is that the same code is used for when students submit answers. So care is needed to get messages out of the task without interfering with that. I have some ideas though.

I think that is for another pull request though.

As to sending emails, you must have localhost set up to accept emails in some way. Otherwise, the failure would show up on this page.

@somiaj (Contributor) commented Oct 23, 2023

Actually, I don't see any records of my updates failing in logs/webwork2.log, and enabling $debug_lti_grade_passback doesn't give any info about the jobs failing either. So I can't find any evidence of LTI update failures on the WeBWorK side; it seems to think they are always successful.

@somiaj (Contributor) commented Oct 23, 2023

@drgrice1 thanks for the info. Yeah, that can be in another pull request, as this isn't changing the fact that info about failed LTI updates doesn't currently seem to be logged anywhere.

@drgrice1 (Member, Author)

Are you sure the users have LTI pass back info? If not there is no failure. There is nowhere to pass grades back to, so the users will be quietly skipped. To have pass back info the user must have accessed an assignment from the LMS.

@somiaj (Contributor) commented Oct 23, 2023

They should. This is an archive of a live course in which grade pass back is currently working, made right before doing all these tests. The only thing I changed was to set the secret to invalid and the URL to https://localhost so that any attempt to pass the grades back would fail.

@drgrice1 (Member, Author)

I suspect there is some issue with your test setup. Check permissions on the webwork2.log file. Make sure that the job queue and the webwork2 app are both running as the same user, and that both have permission to write to the file.

I am certain that the log is written to by the job.

@drgrice1 (Member, Author)

Also, on your test server are you running the minion worker in production mode? If not, then it will not write to the webwork2.log file. Instead it will show the messages in the terminal that you are running the worker in.

@somiaj (Contributor) commented Oct 23, 2023

Yeah, I figured it out. I looked at the logs for the webwork2-job-queue.service service and discovered I didn't have LTI set up correctly: an undefined $ce->{LTIVersion} was either failing or being treated as LTI 1.3, and things weren't working as a result. The job is now running, but taking a long time to finish. Thanks for working through this with me. In any case, this PR seems ready to go, and the remaining issues are on my end.

@drgrice1 force-pushed the job-manager branch 2 times, most recently from 49c7d4f to e4c686b (October 23, 2023 11:24)
@drgrice1 (Member, Author)

@somiaj: I switched to a modal dialog for showing the results. This should do what you want.

@somiaj (Contributor) commented Oct 27, 2023

I do like the modal a bit better if you decide to keep all the details logged, thanks. Though if you do decide to revert it so it only gives a few lines of info, the modal won't be needed. I could see that info being helpful in rare situations to debug the jobs, but in most cases probably not needed.

Probably best to see if anyone else has any opinions on how much info should be shared via this page, but overall it gives a nice record of grade updates and emails sent via the job queue.

One thing I noticed is that 'failed' jobs seem to stick around a lot longer than the other jobs. All my 'finished' jobs from last testing this on Oct 22nd are no longer available, but the 'failed' ones are still there. Looking at the admin course, I see some failed jobs from other courses from back in May still present (from when I was having trouble with the LTI update then). So it seems that failed jobs aren't removed from the database automatically like finished jobs are.

@drgrice1 (Member, Author)

Yes, "failed" jobs are not removed from the database automatically. Only jobs that have transitioned to the "finished" state are automatically removed.

@drgrice1 force-pushed the job-manager branch 4 times, most recently from 0d9a9b1 to 32021c4 (November 11, 2023 11:30)
@drgrice1 force-pushed the job-manager branch 2 times, most recently from 687d25a to 0e53b0c (November 20, 2023 23:19)
@drgrice1 force-pushed the job-manager branch 3 times, most recently from 7677b3e to dc9221f (December 4, 2023 22:36)
@drgrice1 force-pushed the job-manager branch 2 times, most recently from eb417d1 to 863d5bb (December 5, 2023 00:31)
@drgrice1 (Member, Author) commented Dec 5, 2023

The achievement email notification task is updated with the changes needed to work with this. So this is ready for further review.

@Alex-Jordan (Contributor) left a comment

I just did basic testing with this and I don't have anything to report. I did not do more thorough testing involving copying stuff from production over to my test server.

This seems like a good new feature. If something was overlooked or is not working as intended, that would likely come out if we merge this now and use it more before 2.19 is released.

…esult.

Note that the job will still succeed, but now some failures/messages will
appear in the job result.  If `debug_lti_grade_passback` or
`debug_lti_parameters` is enabled, then the job result will be rather
extensive: everything that is sent to the log will also be in the result.

Also add logs back that were removed initially.  Since Minion removes
jobs after two days, the messages in the result are lost with them.  So
also log them so that they are more permanently stored.
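The revised behaviour can be sketched as one helper that writes each message to the log and also appends it to the job result. The names below are hypothetical; `debug` stands in for options such as `debug_lti_grade_passback`, and the assumption that `submit_course_grade` returns False on failure is for illustration only.

```python
import logging

log = logging.getLogger("webwork2")


def lti_mass_update(users, submit_course_grade, debug=False):
    """Hypothetical model of the revised task.

    Messages go to the log and into the job result: the result shows up in
    the Job Manager, while the log survives after Minion removes the job.
    """
    result = []

    def record(level, message):
        log.log(level, message)
        result.append(message)

    for user in users:
        if not submit_course_grade(user):
            record(logging.WARNING, f"Grade passback failed for {user}.")
        elif debug:  # stands in for the debug options
            record(logging.DEBUG, f"Grade passback succeeded for {user}.")
    return result  # the job still finishes; failures are in the result


print(lti_mass_update(["a", "b"], lambda user: user == "a"))
```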
Also use a list group instead of a native list for the lines of the job
result.
effect so that the job queue result is more thorough.
…or cleanup.

Also use the notes to pass the course id as in the other tasks.
@pstaabp pstaabp merged commit 8a23597 into openwebwork:develop Jan 8, 2024
1 check passed
@drgrice1 drgrice1 deleted the job-manager branch January 8, 2024 20:54
@pstaabp pstaabp mentioned this pull request Feb 21, 2024
4 participants