This document describes the components of the video workflow for OCW.
SECTIONS
- Overview
- Google Drive Sync and AWS Transcoding
- YouTube Submission
- Captioning and 3Play Transcript Request
- Completing the Workflow
- Management Commands
- Testing PRs with Transcoding
## Overview

This assumes that Google Drive sync, YouTube integration, AWS MediaConvert, and 3Play submission are all enabled; all four are required for the video workflow.
A high-level description of the process is below, and each subsequent section contains additional details, including links to the relevant code.
- Browse to a course site in the Studio UI, go to the Resources page, and click the icon to the right of the `Sync w/ Google Drive` button to open the site's Google Drive folder in the Google Drive UI.
- Upload a video with the name `<video_name>.<video_extension>` to the `videos_final` folder on Google Drive, where `<video_extension>` is a valid video extension, such as `mp4`. If there are pre-existing captions that should be uploaded with the video (as opposed to requesting captions/transcripts from 3Play), they must be named exactly `<video_name>_captions.vtt` and `<video_name>_transcript.pdf` and uploaded to the `files_final` folder on Google Drive (see the example folder layout after this list).
- Sync using the Studio UI. This uploads the video to S3.
- As soon as the upload to S3 is complete, Studio initiates a Celery task to submit the video to the AWS MediaConvert service.
- Once transcoding is complete, the video is uploaded to YouTube (set as unlisted until the course is published).
- After the video has been successfully uploaded to YouTube, and if there are no pre-existing captions, Studio sends a transcript request to 3Play.
- Once 3Play completes the transcript job, the captions (`.vtt` format) and transcript (`.pdf` format) are fetched and associated with the video.
- On any publish action, the video metadata and YouTube metadata are updated, assuming the information has been received from the external services.
- The YouTube video is set to public once the course has been published to live/production.
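As an illustration of the naming conventions above (the course and file names are hypothetical), a course's Google Drive folder might look like this:

```
<course_folder>/
├── files_final/
│   ├── lecture01_captions.vtt
│   └── lecture01_transcript.pdf
└── videos_final/
    └── lecture01.mp4
```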
## Google Drive Sync and AWS Transcoding

Users upload videos in a valid video format to the `videos_final` folder. A file's presence in this folder determines its `is_video` property. The file is processed by the `process_drive_file` function, which triggers the `stream_to_s3` and `transcode_gdrive_video` functions; these submit the AWS MediaConvert transcoding job.
The parameters of the AWS transcode request are defined through the AWS interface, and the role is defined here. Some example JSONs used for triggering MediaConvert jobs are in this folder.
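For orientation, below is a minimal sketch of submitting a MediaConvert job with `boto3`; the endpoint URL, role ARN, and job settings are placeholders rather than OCW Studio's actual configuration, which lives in the JSON specs and settings mentioned above.

```python
import boto3

# Placeholder endpoint; in practice this comes from settings
# (see VIDEO_S3_TRANSCODE_ENDPOINT).
client = boto3.client(
    "mediaconvert",
    endpoint_url="https://abcd1234.mediaconvert.us-east-1.amazonaws.com",
)

response = client.create_job(
    # IAM role that grants MediaConvert access to the S3 buckets.
    Role="arn:aws:iam::<AWS_ACCOUNT_ID>:role/<AWS_ROLE_NAME>",
    Settings={
        # Real settings are loaded from the example JSON job specs.
        "Inputs": [{"FileInput": "s3://<bucket>/<prefix>/<video_name>.mp4"}],
        "OutputGroups": [],  # elided; defined by the job spec
    },
)
print(response["Job"]["Id"])  # persisted as VideoJob.job_id
```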
The `TranscodeJobView` endpoint listens for the webhook that is sent when the transcoding job is complete.
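A minimal sketch of such a webhook receiver is below; the class name, parsing, and persistence step are illustrative, not the actual `TranscodeJobView` implementation.

```python
import json

from django.http import HttpResponse
from django.utils.decorators import method_decorator
from django.views import View
from django.views.decorators.csrf import csrf_exempt


@method_decorator(csrf_exempt, name="dispatch")
class TranscodeWebhookView(View):
    """Receive MediaConvert job-state-change notifications (illustrative)."""

    def post(self, request):
        payload = json.loads(request.body)
        detail = payload.get("detail", {})
        if detail.get("status") == "COMPLETE":
            job_id = detail["jobId"]
            # Here the real view would look up the VideoJob by job_id and
            # record the transcode outputs listed in outputGroupDetails.
        return HttpResponse(status=200)
```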
## YouTube Submission

Videos are uploaded to YouTube via the `resumable_upload` function. The YouTube upload success notification is sent by email when the `update_youtube_statuses` task is complete; exceptions in this task trigger the YouTube upload failure notification. When the course is published to draft/staging, the video is set to `unlisted`. However, when it is published to live/production, the video is made public on YouTube via the `update_youtube_metadata` function. When a video is made public on YouTube, all subscribers to the channel are notified; the OCW YouTube channel has nearly 5 million subscribers, so be careful with this setting.
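For reference, a privacy-status update with the YouTube Data API (via `google-api-python-client`) looks roughly like the sketch below; this is not the actual `update_youtube_metadata` code, and the credential handling is assumed.

```python
from googleapiclient.discovery import build


def set_privacy(credentials, video_id: str, privacy: str) -> None:
    """Set a video's privacy status to "unlisted", "private", or "public"."""
    youtube = build("youtube", "v3", credentials=credentials)
    youtube.videos().update(
        part="status",
        body={"id": video_id, "status": {"privacyStatus": privacy}},
    ).execute()
```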
## Captioning and 3Play Transcript Request

If there are no pre-existing captions, a 3Play transcript request is generated via the `threeplay_transcript_api_request` function.
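A rough sketch of such a request using `requests` is below; the endpoint and parameter names are placeholders, not necessarily what the 3Play API or `threeplay_transcript_api_request` actually uses.

```python
import requests

# Placeholder endpoint and parameters; consult the 3Play API docs and the
# threeplay_transcript_api_request implementation for the real ones.
response = requests.post(
    "https://api.3playmedia.com/v3/transcripts",
    data={
        "api_key": "<THREEPLAY_API_KEY>",
        "source_url": "https://www.youtube.com/watch?v=<youtube_id>",
        "name": "<video_name>",
    },
    timeout=30,
)
response.raise_for_status()
```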
## Completing the Workflow

Once the workflow finishes, the updates to the `Video` and `WebsiteContent` objects are nearly complete. The only remaining steps are triggered on course publish: updating the video metadata via `update_transcripts_for_website` and updating the YouTube metadata via `update_youtube_metadata`.
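In outline, the publish-time step amounts to the following; the function signatures here are assumed for illustration, so check the actual task definitions.

```python
def on_course_publish(website, version):
    """Illustrative sketch of the publish-time updates (signatures assumed)."""
    # Refresh caption/transcript metadata for the site's videos.
    update_transcripts_for_website(website)
    # Sync YouTube metadata; a "live" publish also flips videos to public.
    update_youtube_metadata(website, version=version)
```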
## Management Commands

In cases where something has gone wrong with the data, often due to legacy data issues, there are management commands that can be run to resolve the problems. The commands are defined here. They are:
- `backpopulate_video_downloads`: In the existing video workflow, the MediaConvert job creates a downloadable version as well as the YouTube version. Initially, these downloadable versions were not stored in the same S3 path as the course site's other resource content; running this command moves them to the appropriate location.
- `clear_webvtt_files`: Some captions were initially saved without an extension; this management command deletes them from S3 and clears the resource metadata, allowing them to be re-created.
- `sync_missing_captions`: This management command syncs captions and transcripts from 3Play to course videos that are missing them.
- `sync_transcripts`: This management command syncs captions and transcripts for any videos missing them from one course (`from_course`) to another (`to_course`).
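Each command is run through Django's management interface, for example as below; the arguments shown for `sync_transcripts` are illustrative, so check the command definitions for the exact interface.

```bash
python manage.py backpopulate_video_downloads
python manage.py clear_webvtt_files
python manage.py sync_missing_captions
python manage.py sync_transcripts <from_course> <to_course>
```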
## Testing PRs with Transcoding

Before working on, testing, or reviewing any PR that requires a video to be uploaded to YouTube, make sure that AWS buckets (instead of local Minio storage) are being used for testing. To do that, set `OCW_STUDIO_ENVIRONMENT` to any value other than `dev`.
Set the following variables to the same values as for RC:

- `AWS_ACCOUNT_ID`
- `AWS_ACCESS_KEY_ID`
- `AWS_REGION`
- `AWS_ROLE_NAME`
- `AWS_SECRET_ACCESS_KEY`
- `AWS_STORAGE_BUCKET_NAME`
- `DRIVE_SERVICE_ACCOUNT_CREDS`
- `DRIVE_SHARED_ID`
- `VIDEO_S3_TRANSCODE_ENDPOINT`
- `VIDEO_S3_TRANSCODE_PREFIX`
Upload the video to the course's Google Drive folder, as described in the Google Drive Sync and AWS Transcoding section above. Wait for the video transcoding job to complete; the wait is roughly proportional to the length of the video, so for a very short video it should only take a few minutes.
Next, the response to the transcode request needs to be simulated, because the AWS MediaConvert service sends its webhook notification to the RC URL rather than to the local OCW Studio instance.

To simulate the response, use cURL, Postman, or an equivalent tool to POST a message to `https://localhost:8043/api/transcode-jobs/`, with the body as in the example below, updated to match the relevant environment variables, course name, and video name:
```json
{
  "version": "0",
  "id": "c120fe11-87db-c292-b3e5-1cc90740f6e1",
  "detail-type": "MediaConvert Job State Change",
  "source": "aws.mediaconvert",
  "account": "<settings.AWS_ACCOUNT_ID>",
  "detail": {
    "timestamp": 1629911639065,
    "accountId": "<settings.AWS_ACCOUNT_ID>",
    "queue": "arn:aws:mediaconvert:us-east-1:919801701561:queues/Default",
    "jobId": "<VideoJob.job_id>",
    "status": "COMPLETE",
    "userMetadata": {},
    "outputGroupDetails": [
      {
        "outputDetails": [
          {
            "outputFilePaths": [
              "s3://<settings.AWS_STORAGE_BUCKET_NAME>/aws_mediaconvert_transcodes/<Website.short_id>/<DriveFile.file_id>/<original_video_filename_base>_youtube.mp4"
            ],
            "durationInMs": 45466,
            "videoDetails": {
              "widthInPx": 320,
              "heightInPx": 176
            }
          },
          {
            "outputFilePaths": [
              "s3://<settings.AWS_STORAGE_BUCKET_NAME>/aws_mediaconvert_transcodes/<Website.short_id>/<DriveFile.file_id>/<original_video_filename_base>_360p_16_9.mp4"
            ],
            "durationInMs": 45466,
            "videoDetails": {
              "widthInPx": 640,
              "heightInPx": 360
            }
          },
          {
            "outputFilePaths": [
              "s3://<settings.AWS_STORAGE_BUCKET_NAME>/aws_mediaconvert_transcodes/<Website.short_id>/<DriveFile.file_id>/<original_video_filename_base>_360p_4_3.mp4"
            ],
            "durationInMs": 45466,
            "videoDetails": {
              "widthInPx": 480,
              "heightInPx": 360
            }
          }
        ],
        "type": "FILE_GROUP"
      }
    ]
  }
}
```
Make sure to set the values in `<>`. In particular, set:

- `<settings.AWS_ACCOUNT_ID>`
- `<VideoJob.job_id>`
- `<settings.AWS_STORAGE_BUCKET_NAME>/aws_mediaconvert_transcodes/<Website.short_id>/<DriveFile.file_id>/<original_video_filename_base>`
The `DriveFile` will be the one associated with the video; it can be found in Django admin at http://localhost:8043/admin/gdrive_sync/drivefile/.
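For example, with the JSON payload above saved to a file named `payload.json` (a hypothetical filename):

```bash
# -k skips TLS verification, since local dev certificates are typically self-signed
curl -k -X POST https://localhost:8043/api/transcode-jobs/ \
  -H "Content-Type: application/json" \
  -d @payload.json
```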
If this completes successfully, the `VideoJob` status in Django admin should be `COMPLETE`, and there should now be three new `VideoFile` objects with populated `status`, `destination`, and `s3_key` fields.
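One way to spot-check the result is from a Django shell; the import path and ordering below are assumptions, while the field names come from the description above.

```python
# Run inside `python manage.py shell`
from videos.models import VideoFile  # import path assumed

for video_file in VideoFile.objects.order_by("-id")[:3]:
    print(video_file.status, video_file.destination, video_file.s3_key)
```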