This document describes the components of the video workflow for OCW.
SECTIONS
- Overview
- Google Drive Sync and AWS Transcoding
- YouTube Submission
- Captioning and 3Play Transcript Request
- Completing the Workflow
- Management Commands
- Testing PRs with Transcoding
## Overview

This assumes that Google Drive sync, YouTube integration, AWS MediaConvert, and 3Play submission are all enabled; all four are required for the video workflow.
A high-level description of the process is below, and each subsequent section contains additional details, including links to the relevant code.
- Browse to a course site in the Studio UI, go to the Resources page, and click the icon to the right of the `Sync w/ Google Drive` button to open the site's Google Drive folder in the Google Drive UI.
- Upload a video with the name `<video_name>.<video_extension>` to the `videos_final` folder on Google Drive, where `<video_extension>` is a valid video extension, such as `mp4`. If there are pre-existing captions that should be uploaded with the video (as opposed to requesting captions/transcripts from 3Play), they must be named exactly `<video_name>_captions.vtt` and `<video_name>_transcript.pdf` and uploaded to the `files_final` folder on Google Drive (see the example folder layout after this list).
- Sync using the Studio UI. This uploads the video to S3.
- As soon as the upload to S3 is complete, Studio initiates a Celery task to submit the video to the AWS MediaConvert service.
- Once transcoding is complete, the video is uploaded to YouTube (set as unlisted until the course is published).
- After the video has been successfully uploaded to YouTube, and if there are no pre-existing captions, Studio sends a transcript request to 3Play.
- Once 3Play completes the transcript job, the captions (`.vtt` format) and transcript (`.pdf` format) are fetched and associated with the video.
- On any publish action, the video metadata and YouTube metadata are updated, assuming the information has been received from the external services.
- The YouTube video is set to public once the course has been published to live/production.
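As an illustration of the naming conventions above (the course and file names are hypothetical), a course's Google Drive folder might look like this:

```
<course_folder>/
├── files_final/
│   ├── lecture01_captions.vtt
│   └── lecture01_transcript.pdf
└── videos_final/
    └── lecture01.mp4
```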
## Google Drive Sync and AWS Transcoding

Users upload videos in a valid video format to the `videos_final` folder. A file's presence in this folder determines its `is_video` property. The file is processed by the `process_drive_file` function, which triggers the `stream_to_s3` and `transcode_gdrive_video` functions; these submit the AWS MediaConvert transcoding job.
The parameters of the AWS transcode request are defined through the AWS interface, and the role is defined here. Some example JSONs used for triggering MediaConvert jobs are in this folder.
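For orientation, below is a minimal sketch of submitting a MediaConvert job with `boto3`; the endpoint URL, role ARN, and job settings are placeholders rather than OCW Studio's actual configuration, which lives in the JSON specs and settings mentioned above.

```python
import boto3

# Placeholder endpoint; in practice this comes from settings
# (see VIDEO_S3_TRANSCODE_ENDPOINT).
client = boto3.client(
    "mediaconvert",
    endpoint_url="https://abcd1234.mediaconvert.us-east-1.amazonaws.com",
)

response = client.create_job(
    # IAM role that grants MediaConvert access to the S3 buckets.
    Role="arn:aws:iam::<AWS_ACCOUNT_ID>:role/<AWS_ROLE_NAME>",
    Settings={
        # Real settings are loaded from the example JSON job specs.
        "Inputs": [{"FileInput": "s3://<bucket>/<prefix>/<video_name>.mp4"}],
        "OutputGroups": [],  # elided; defined by the job spec
    },
)
print(response["Job"]["Id"])  # persisted as VideoJob.job_id
```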
The `TranscodeJobView` endpoint listens for the webhook that is sent when the transcoding job is complete.
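A minimal sketch of such a webhook receiver is below; the class name, parsing, and persistence step are illustrative, not the actual `TranscodeJobView` implementation.

```python
import json

from django.http import HttpResponse
from django.utils.decorators import method_decorator
from django.views import View
from django.views.decorators.csrf import csrf_exempt


@method_decorator(csrf_exempt, name="dispatch")
class TranscodeWebhookView(View):
    """Receive MediaConvert job-state-change notifications (illustrative)."""

    def post(self, request):
        payload = json.loads(request.body)
        detail = payload.get("detail", {})
        if detail.get("status") == "COMPLETE":
            job_id = detail["jobId"]
            # Here the real view would look up the VideoJob by job_id and
            # record the transcode outputs listed in outputGroupDetails.
        return HttpResponse(status=200)
```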
## YouTube Submission

Videos are uploaded to YouTube via the `resumable_upload` function. The YouTube upload success notification is sent by email when the `update_youtube_statuses` task is complete; exceptions in this task trigger the YouTube upload failure notification. When the course is published to draft/staging, the video is set to `unlisted`. However, when it is published to live/production, the video is made public on YouTube via the `update_youtube_metadata` function. When a video is made public on YouTube, all subscribers to the channel are notified; the OCW YouTube channel has nearly 5 million subscribers, so be careful with this setting.
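For reference, a privacy-status update with the YouTube Data API (via `google-api-python-client`) looks roughly like the sketch below; this is not the actual `update_youtube_metadata` code, and the credential handling is assumed.

```python
from googleapiclient.discovery import build


def set_privacy(credentials, video_id: str, privacy: str) -> None:
    """Set a video's privacy status to "unlisted", "private", or "public"."""
    youtube = build("youtube", "v3", credentials=credentials)
    youtube.videos().update(
        part="status",
        body={"id": video_id, "status": {"privacyStatus": privacy}},
    ).execute()
```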
## Captioning and 3Play Transcript Request

If there are no pre-existing captions, a 3Play transcript request is generated via the `threeplay_transcript_api_request` function.
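A rough sketch of such a request using `requests` is below; the endpoint and parameter names are placeholders, not necessarily what the 3Play API or `threeplay_transcript_api_request` actually uses.

```python
import requests

# Placeholder endpoint and parameters; consult the 3Play API docs and the
# threeplay_transcript_api_request implementation for the real ones.
response = requests.post(
    "https://api.3playmedia.com/v3/transcripts",
    data={
        "api_key": "<THREEPLAY_API_KEY>",
        "source_url": "https://www.youtube.com/watch?v=<youtube_id>",
        "name": "<video_name>",
    },
    timeout=30,
)
response.raise_for_status()
```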
## Completing the Workflow

Once the workflow finishes, the updates to the `Video` and `WebsiteContent` objects are nearly complete. The only remaining steps are triggered on course publish: updating the video metadata via `update_transcripts_for_website` and updating the YouTube metadata via `update_youtube_metadata`.
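In outline, the publish-time step amounts to the following; the function signatures here are assumed for illustration, so check the actual task definitions.

```python
def on_course_publish(website, version):
    """Illustrative sketch of the publish-time updates (signatures assumed)."""
    # Refresh caption/transcript metadata for the site's videos.
    update_transcripts_for_website(website)
    # Sync YouTube metadata; a "live" publish also flips videos to public.
    update_youtube_metadata(website, version=version)
```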
## Management Commands

In cases where something has gone wrong with the data, often due to legacy data issues, there are management commands that can be run to resolve the problems. The commands are defined here. They are:
- `backpopulate_video_downloads`: In the existing video workflow, the MediaConvert job creates a downloadable version as well as the YouTube version. Initially, these downloadable versions were not stored in the same S3 path as the course site's other resource content; running this command moves them to the appropriate location.
- `clear_webvtt_files`: Some captions were initially saved without an extension; this management command deletes them from S3 and clears the resource metadata, allowing them to be re-created.
- `sync_missing_captions`: This management command syncs captions and transcripts from 3Play to course videos that are missing them.
- `sync_transcripts`: This management command syncs captions and transcripts for any videos missing them from one course (`from_course`) to another (`to_course`).
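Each command is run through Django's management interface, for example as below; the arguments shown for `sync_transcripts` are illustrative, so check the command definitions for the exact interface.

```bash
python manage.py backpopulate_video_downloads
python manage.py clear_webvtt_files
python manage.py sync_missing_captions
python manage.py sync_transcripts <from_course> <to_course>
```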
## Testing PRs with Transcoding

Before working on, testing, or reviewing any PR that requires a video to be uploaded to YouTube, make sure that AWS buckets (instead of local Minio storage) are being used for testing. To do that, set `OCW_STUDIO_ENVIRONMENT` to any value other than `dev`.
Set the following variables to the same values as for RC:

- `AWS_ACCOUNT_ID`
- `AWS_ACCESS_KEY_ID`
- `AWS_REGION`
- `AWS_ROLE_NAME`
- `AWS_SECRET_ACCESS_KEY`
- `AWS_STORAGE_BUCKET_NAME`
- `DRIVE_SERVICE_ACCOUNT_CREDS`
- `DRIVE_SHARED_ID`
- `VIDEO_S3_TRANSCODE_ENDPOINT`
- `VIDEO_S3_TRANSCODE_PREFIX`
Upload the video to the course's Google Drive folder, as described in the Google Drive Sync and AWS Transcoding section above. Wait for the video transcoding job to complete; the wait is roughly proportional to the length of the video, so for a very short video it should only take a few minutes.
Next, the response to the transcode request needs to be simulated, because the AWS MediaConvert service sends its webhook notification to the RC URL rather than to the local OCW Studio instance.

To simulate the response, use cURL, Postman, or an equivalent tool to POST a message to `https://localhost:8043/api/transcode-jobs/`, with the body as in the example below, updated to match the relevant environment variables, course name, and video name:
```json
{
  "version": "0",
  "id": "c120fe11-87db-c292-b3e5-1cc90740f6e1",
  "detail-type": "MediaConvert Job State Change",
  "source": "aws.mediaconvert",
  "account": "<settings.AWS_ACCOUNT_ID>",
  "detail": {
    "timestamp": 1629911639065,
    "accountId": "<settings.AWS_ACCOUNT_ID>",
    "queue": "arn:aws:mediaconvert:us-east-1:919801701561:queues/Default",
    "jobId": "<VideoJob.job_id>",
    "status": "COMPLETE",
    "userMetadata": {},
    "outputGroupDetails": [
      {
        "outputDetails": [
          {
            "outputFilePaths": [
              "s3://<settings.AWS_STORAGE_BUCKET_NAME>/aws_mediaconvert_transcodes/<Website.short_id>/<DriveFile.file_id>/<original_video_filename_base>_youtube.mp4"
            ],
            "durationInMs": 45466,
            "videoDetails": {
              "widthInPx": 320,
              "heightInPx": 176
            }
          },
          {
            "outputFilePaths": [
              "s3://<settings.AWS_STORAGE_BUCKET_NAME>/aws_mediaconvert_transcodes/<Website.short_id>/<DriveFile.file_id>/<original_video_filename_base>_360p_16_9.mp4"
            ],
            "durationInMs": 45466,
            "videoDetails": {
              "widthInPx": 640,
              "heightInPx": 360
            }
          },
          {
            "outputFilePaths": [
              "s3://<settings.AWS_STORAGE_BUCKET_NAME>/aws_mediaconvert_transcodes/<Website.short_id>/<DriveFile.file_id>/<original_video_filename_base>_360p_4_3.mp4"
            ],
            "durationInMs": 45466,
            "videoDetails": {
              "widthInPx": 480,
              "heightInPx": 360
            }
          }
        ],
        "type": "FILE_GROUP"
      }
    ]
  }
}
```
Make sure to set the values in `<>`. In particular, set:

- `<settings.AWS_ACCOUNT_ID>`
- `<VideoJob.job_id>`
- `<settings.AWS_STORAGE_BUCKET_NAME>/aws_mediaconvert_transcodes/<Website.short_id>/<DriveFile.file_id>/<original_video_filename_base>`
The `DriveFile` will be the one associated with the video; it can be found in Django admin at http://localhost:8043/admin/gdrive_sync/drivefile/.
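For example, with the JSON payload above saved to a file named `payload.json` (a hypothetical filename):

```bash
# -k skips TLS verification, since local dev certificates are typically self-signed
curl -k -X POST https://localhost:8043/api/transcode-jobs/ \
  -H "Content-Type: application/json" \
  -d @payload.json
```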
If this completes successfully, the `VideoJob` status in Django admin should be `COMPLETE`, and there should now be three new `VideoFile` objects with populated `status`, `destination`, and `s3_key` fields.
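One way to spot-check the result is from a Django shell; the import path and ordering below are assumptions, while the field names come from the description above.

```python
# Run inside `python manage.py shell`
from videos.models import VideoFile  # import path assumed

for video_file in VideoFile.objects.order_by("-id")[:3]:
    print(video_file.status, video_file.destination, video_file.s3_key)
```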