feat: Adjusting to be compatible with ADC regions
RanbirAulakh committed Feb 27, 2024
1 parent fad2f04 commit 19255a4
Showing 16 changed files with 1,592 additions and 568 deletions.
32 changes: 7 additions & 25 deletions README.md
@@ -6,9 +6,8 @@ This project demonstrates a CDK Construct Library which includes the constructs

* [Useful commands](#useful-commands)
* [Components](#components)
* [Model Runner Components](#model-runner-components)
* [OSML Components](#osml-components)
* [Feature Flags](#feature-flags)
* [Deploying to ADC](#deploying-to-adc)
* [Support & Feedback](#support--feedback)
* [Security](#security)
* [License](#license)
@@ -23,29 +22,12 @@ This project demonstrates a CDK Construct Library which includes the constructs

This package contains an assortment of CDK components that may be re-used and re-purposed for your own projects. These components are split into four groups:

1. `MR*`: these are model runner components. These create the model runner infrastructure, which is responsible for leveraging the user-provided AI model against images.
2. `OSML*`: these are OversightML components. These create the OSML infrastructure for storing imagery, queueing jobs, setting up SageMaker endpoints, creating a VPC, and more.

### Model Runner Components

* `MRAutoScaling`: Creates a custom autoscaling implementation for model runner. Will automatically accommodate ADC regions as well as public/commercial regions. If set, this component will use the settings defined in the MRAutoscalingConfig.
* `MRDataplane`: This construct is responsible for managing the data plane of the model runner application. This construct makes use of many OSML constructs to create resources like the VPC, DDB tables, SQS queues, SNS topics, ECS clusters, and more. If set, this component will use the settings defined in the MRDataplaneConfig
* `MRMonitoring`: Creates a CloudWatch Dashboard for monitoring the status of the Model Runner. Tracks metrics like the number of requests in the ImageRequestQueue, SageMakerEndpoint latency, etc.
* `MRSMRole`: Creates a SageMaker execution role for hosting CV models at the SM endpoint. This role will give SageMaker full access to SQS, S3, DynamoDB, SageMaker, CloudWatch, SecretsManager, and ECS.
* `MRTaskRole`: Creates a role for Fargate/ECS, so they can access everything they need. This role will give Fargate/ECS full access to SQS, S3, DynamoDB, SageMaker, CloudWatch, SecretsManager, and ECS.
* `MRTesting`: Creates a construct for testing the Model Runner. This construct will provision resources for storing test images, for storing test results, and everything else needed for Model Runner to run tests against the testing models provided in the [osml-model-runner-test package](https://github.com/aws-solutions-library-samples/osml-model-runner-test). If set, this component will use the settings defined in the MRTestingConfig.

### OSML Components

* `OSMLAccount`: An interface that handles settings such as whether to enable auto-scaling, whether to enable monitoring, whether to use an existing VPC or create a new one, and more.
* `OSMLBucket`: Creates an OSML bucket and access logging bucket. This construct makes use of security best practices, such as encryption, enforcing SSL, and access logging.
* `OSMLECRContainer`: This construct takes a local directory and copies it to a docker image asset and deploys it to an ECR repository with the "latest" tag if a repository is provided.
* `OSMLQueue`: Creates an encrypted Queue and Dead Letter Queue.
* `OSMLRepository`: Creates an encrypted ECR repository for storing Docker images. The repository can be configured to auto-delete images when the repository is removed from the stack or the stack is deleted.
* `OSMLSMEndpoint`: Creates a SageMaker endpoint for the specified model. A model is specified by providing the URI for the container image of the model.
* `OSMLTable`: Creates a DynamoDB table with the specified partition key and sort key (if one is provided). The table will come with encryption enabled and point-in-time-recovery enabled.
* `OSMLTopic`: Creates an encrypted SNS Topic.
* `OSMLVpc`: Creates or imports a VPC for OSML. If one is created, it will have 2 subnets - one will be public, and the other will be private with egress (it can touch the internet, but not vice-versa).
1. `osml_*`: these are OversightML (osml) components. These create the OSML infrastructure for storing imagery, queueing jobs, setting up SageMaker endpoints, creating a VPC, and more.
2. `mr*`: these are model runner (mr) components. These create the model runner infrastructure, which is responsible for leveraging the user-provided AI model against images.
3. `me*`: these are model endpoint (me) components. These create the model endpoint infrastructure, which is responsible for leveraging the user-provided AI models.
4. `ts*`: these are tile server (ts) components. These create the tile server infrastructure for queueing jobs, the Lambda sweeper, and more.

You can find documentation for each component on our [GitHub project page](https://aws-solutions-library-samples.github.io/osml-cdk-constructs/).

## Feature Flags

1 change: 1 addition & 0 deletions lib/osml/model_endpoint/me_container.ts
@@ -119,6 +119,7 @@ export class MEContainer extends Construct {
this,
"MEContainerECRDeployment",
{
account: props.account,
sourceUri: this.config.ME_DEFAULT_CONTAINER,
repositoryName: this.config.ME_CONTAINER_REPOSITORY,
removalPolicy: this.removalPolicy,
5 changes: 2 additions & 3 deletions lib/osml/model_endpoint/roles/me_http_role.ts
@@ -76,9 +76,8 @@ export class MEHTTPRole extends Construct {
managedPolicies: [
ManagedPolicy.fromAwsManagedPolicyName("CloudWatchFullAccess"),
ManagedPolicy.fromAwsManagedPolicyName(
"AmazonElasticContainerRegistryPublicFullAccess"
),
ManagedPolicy.fromAwsManagedPolicyName("CloudWatchFullAccess")
"AmazonEC2ContainerRegistryFullAccess"
)
],
description:
"Allows the OversightML HTTP model endpoint to access necessary resources."
2 changes: 1 addition & 1 deletion lib/osml/model_endpoint/roles/me_sm_role.ts
@@ -55,7 +55,7 @@ export class MESMRole extends Construct {
ManagedPolicy.fromAwsManagedPolicyName("CloudWatchFullAccess"),
ManagedPolicy.fromAwsManagedPolicyName("SecretsManagerReadWrite"),
ManagedPolicy.fromAwsManagedPolicyName(
"AmazonElasticContainerRegistryPublicFullAccess"
"AmazonEC2ContainerRegistryFullAccess"
)
],
description:
1 change: 1 addition & 0 deletions lib/osml/model_runner/mr_container.ts
@@ -109,6 +109,7 @@ export class MRContainer extends Construct {
this,
"MRContainerECRDeployment",
{
account: props.account,
sourceUri: this.mrAppContainerConfig.MR_DEFAULT_CONTAINER,
repositoryName: this.mrAppContainerConfig.MR_CONTAINER_REPOSITORY,
removalPolicy: this.removalPolicy,
142 changes: 110 additions & 32 deletions lib/osml/model_runner/mr_dataplane.ts
@@ -12,13 +12,15 @@ import {
ContainerImage,
FargateService,
FireLensLogDriver,
FireLensLogDriverProps,
FirelensLogRouterType,
LogDriver,
LogDrivers,
obtainDefaultFluentBitECRImage,
Protocol,
TaskDefinition
} from "aws-cdk-lib/aws-ecs";
import { IRole } from "aws-cdk-lib/aws-iam";
import { Effect, IRole, PolicyStatement } from "aws-cdk-lib/aws-iam";
import { LogGroup, RetentionDays } from "aws-cdk-lib/aws-logs";
import { SqsSubscription } from "aws-cdk-lib/aws-sns-subscriptions";
import { Construct } from "constructs";
@@ -80,11 +82,12 @@ export class MRDataplaneConfig {
public MR_CONTAINER_NAME: string = "OSMLModelRunnerContainer",
public MR_TASK_MEMORY: number = 16384,
public MR_TASK_CPU: number = 8192,
public MR_CONTAINER_MEMORY: number = 15360,
public MR_CONTAINER_MEMORY: number = 10240,
public MR_CONTAINER_CPU: number = 7168,
public MR_LOGGING_MEMORY: number = 1024,
public MR_LOGGING_CPU: number = 1024,
public MR_WORKERS_PER_CPU: number = 2,
public MR_DEFAULT_DESIRE_COUNT: number = 1,
public MR_REGION_SIZE: string = "(8192, 8192)",
public MR_ENABLE_IMAGE_STATUS: boolean = true,
public MR_ENABLE_REGION_STATUS: boolean = false
@@ -210,6 +213,24 @@ export class MRDataplane extends Construct {
}).role;
}

// Set up an ADC (isolated) regional S3 endpoint for GDAL to use
class S3FactISO implements region_info.IFact {
public readonly region = "us-iso-east-1";
public readonly name =
region_info.FactName.servicePrincipal("s3.amazonaws.com");
public readonly value = "s3.us-iso-east-1.c2s.ic.gov";
}

class S3FactISOB implements region_info.IFact {
public readonly region = "us-isob-east-1";
public readonly name =
region_info.FactName.servicePrincipal("s3.amazonaws.com");
public readonly value = "s3.us-isob-east-1.sc2s.sgov.gov";
}

region_info.Fact.register(new S3FactISO(), true);
region_info.Fact.register(new S3FactISOB(), true);

// Set up a regional S3 endpoint for GDAL to use
this.regionalS3Endpoint = region_info.Fact.find(
props.account.region,
@@ -346,32 +367,6 @@ export class MRDataplane extends Construct {
// Build our container to run our service
const containerEnv = this.buildContainerEnv(props);

// Build a container definition to run our service
this.containerDefinition = this.taskDefinition.addContainer(
"MRContainerDefinition",
{
containerName: this.mrDataplaneConfig.MR_CONTAINER_NAME,
image: props.mrContainerImage,
memoryLimitMiB: this.mrDataplaneConfig.MR_CONTAINER_MEMORY,
cpu: this.mrDataplaneConfig.MR_CONTAINER_CPU,
environment: containerEnv,
startTimeout: Duration.minutes(1),
stopTimeout: Duration.minutes(1),
// Create a log group for console output (STDOUT)
logging: new FireLensLogDriver({
options: {
Name: "cloudwatch",
region: props.account.region,
log_key: "log",
log_format: "json/emf",
log_group_name: this.logGroup.logGroupName,
log_stream_prefix: "${TASK_ID}/"
}
}),
disableNetworking: false
}
);

// Add port mapping to container
this.taskDefinition.defaultContainer?.addPortMappings({
containerPort: 80,
@@ -396,15 +391,58 @@ export class MRDataplane extends Construct {
cluster: this.cluster,
minHealthyPercent: 100,
securityGroups: this.securityGroups,
vpcSubnets: props.osmlVpc.selectedSubnets
vpcSubnets: props.osmlVpc.selectedSubnets,
desiredCount: this.mrDataplaneConfig.MR_DEFAULT_DESIRE_COUNT
});

// Set up Logging Options
const loggingOptions: { [key: string]: any } = {
options: {
Name: "cloudwatch",
region: props.account.region,
log_group_name: this.logGroup.logGroupName,
log_format: "json/emf",
log_key: "log",
log_stream_prefix: "${TASK_ID}/"
}
};

const logging = LogDrivers.firelens(
loggingOptions as FireLensLogDriverProps
);

// Build a fluent bit log router for the MR container
this.taskDefinition.addFirelensLogRouter("MRFireLensContainer", {
image: obtainDefaultFluentBitECRImage(
let fluentBitImage;
if (props.account.isAdc) {
// Get Fluent Bit Container from ADC Repo
if (props.account.region === "us-iso-east-1") {
fluentBitImage = ContainerImage.fromRegistry(
`${props.account.id}.dkr.ecr.us-iso-east-1.c2s.ic.gov/aws-for-fluent-bit:latest`
);
loggingOptions.options.endpoint =
"https://logs.us-iso-east-1.c2s.ic.gov";
} else if (props.account.region === "us-isob-east-1") {
fluentBitImage = ContainerImage.fromRegistry(
`${props.account.id}.dkr.ecr.us-isob-east-1.sc2s.sgov.gov/aws-for-fluent-bit:latest`
);
loggingOptions.options.endpoint =
"https://logs.us-isob-east-1.sc2s.sgov.gov";
} else {
fluentBitImage = obtainDefaultFluentBitECRImage(
this.taskDefinition,
this.taskDefinition.defaultContainer?.logDriverConfig
);
}
} else {
fluentBitImage = obtainDefaultFluentBitECRImage(
this.taskDefinition,
this.taskDefinition.defaultContainer?.logDriverConfig
),
);
}

// Build a fluent bit log router for the MR container
this.taskDefinition.addFirelensLogRouter("MRFireLensContainer", {
image: fluentBitImage,
essential: true,
firelensConfig: {
type: FirelensLogRouterType.FLUENTBIT
@@ -429,6 +467,46 @@ export class MRDataplane extends Construct {
retries: 3
}
});

// Build a container definition to run our service
this.containerDefinition = this.taskDefinition.addContainer(
"MRContainerDefinition",
{
containerName: this.mrDataplaneConfig.MR_CONTAINER_NAME,
image: props.mrContainerImage,
memoryLimitMiB: this.mrDataplaneConfig.MR_CONTAINER_MEMORY,
cpu: this.mrDataplaneConfig.MR_CONTAINER_CPU,
environment: containerEnv,
startTimeout: Duration.minutes(1),
stopTimeout: Duration.minutes(1),
// Create a log group for console output (STDOUT)
logging: logging,
disableNetworking: false
}
);

if (props.account.isAdc) {
const partition: string = region_info.Fact.find(
props.account.region,
region_info.FactName.PARTITION
)!;

// Grant the execution role permission to pull the Fluent Bit image
// from the ECR repository in this account
this.taskDefinition.addToExecutionRolePolicy(
new PolicyStatement({
effect: Effect.ALLOW,
actions: [
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage"
],
resources: [
`arn:${partition}:ecr:${props.account.region}:${props.account.id}:repository/aws-for-fluent-bit`
]
})
);
}
}

buildContainerEnv(props: MRDataplaneProps) {
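
The ADC branches in `mr_dataplane.ts` build three strings from the same ingredients (account ID, region, DNS suffix): the Fluent Bit image URI, the CloudWatch Logs endpoint, and the ECR repository ARN. A minimal sketch of how those strings relate is below. It is not code from this commit: `ADC_REGIONS`, `FluentBitSettings`, and `adcFluentBitSettings` are hypothetical names, and the partition values are written out by hand here, whereas the commit looks the partition up with `region_info.Fact.find`.

```typescript
// Hypothetical helper for illustration only; none of these names exist in the commit.
interface AdcRegionInfo {
  partition: string; // ARN partition for the region
  dnsSuffix: string; // DNS suffix shared by the region's service endpoints
}

// DNS suffixes are copied from the endpoints hard-coded in the diff.
// Partition names are written out here as an assumption; the commit
// resolves them through region_info.Fact.find instead.
const ADC_REGIONS: Record<string, AdcRegionInfo | undefined> = {
  "us-iso-east-1": { partition: "aws-iso", dnsSuffix: "c2s.ic.gov" },
  "us-isob-east-1": { partition: "aws-iso-b", dnsSuffix: "sc2s.sgov.gov" }
};

interface FluentBitSettings {
  imageUri: string; // image pulled from the account's own ECR registry
  logsEndpoint: string; // CloudWatch Logs endpoint passed to Fluent Bit
  repositoryArn: string; // resource for the ecr:* pull permissions
}

// Returns undefined for regions outside the table, where the diff falls
// back to obtainDefaultFluentBitECRImage.
function adcFluentBitSettings(
  accountId: string,
  region: string
): FluentBitSettings | undefined {
  const info = ADC_REGIONS[region];
  if (info === undefined) {
    return undefined;
  }
  const repository = "aws-for-fluent-bit";
  return {
    imageUri: `${accountId}.dkr.ecr.${region}.${info.dnsSuffix}/${repository}:latest`,
    logsEndpoint: `https://logs.${region}.${info.dnsSuffix}`,
    repositoryArn: `arn:${info.partition}:ecr:${region}:${accountId}:repository/${repository}`
  };
}
```

Keeping the suffix in one table would mean a new isolated region needs a single new entry rather than another `else if` branch, though the commit as written keeps the branches inline.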
2 changes: 1 addition & 1 deletion lib/osml/model_runner/roles/mr_task_role.ts
@@ -81,7 +81,7 @@ export class MRTaskRole extends Construct {
ManagedPolicy.fromAwsManagedPolicyName("CloudWatchFullAccess"),
ManagedPolicy.fromAwsManagedPolicyName("SecretsManagerReadWrite"),
ManagedPolicy.fromAwsManagedPolicyName(
"AmazonElasticContainerRegistryPublicFullAccess"
"AmazonEC2ContainerRegistryFullAccess"
)
],
description:
3 changes: 2 additions & 1 deletion lib/osml/model_runner/testing/mr_endpoints.ts
@@ -249,7 +249,8 @@ export class MREndpoints extends Construct {
this.securityGroupId = props.osmlVpc.vpcDefaultSecurityGroup;
}
}
if (props.deployHttpCenterpointModel != false) {
// Disabled in ADC regions because EphemeralStorage is not available there
if (props.deployHttpCenterpointModel != false && !props.account.isAdc) {
// Check if a role was provided for the HTTP endpoint
if (props.httpEndpointRole != undefined) {
// Import passed custom role for the HTTP endpoint
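
The new guard in `mr_endpoints.ts` combines an opt-out flag with the ADC check. Because the flag is compared with `!= false`, leaving it unset still deploys the endpoint outside ADC. The sketch below restates that condition as a standalone function so the cases are easy to see. `shouldDeployHttpEndpoint` is a hypothetical name, and the parameter type is an assumption about how the prop is declared.

```typescript
// Hypothetical restatement of the guard; not a function from the commit.
// For a boolean-or-undefined flag, `!== false` behaves the same as the
// `!= false` comparison used in the diff.
function shouldDeployHttpEndpoint(
  deployHttpCenterpointModel: boolean | undefined,
  isAdc: boolean
): boolean {
  return deployHttpCenterpointModel !== false && !isAdc;
}

// Enumerate the cases: only an explicit `false` or an ADC account disables it.
const cases: Array<[boolean | undefined, boolean]> = [
  [undefined, false],
  [true, false],
  [false, false],
  [undefined, true],
  [true, true]
];
for (const [flag, isAdc] of cases) {
  console.log(`flag=${String(flag)} isAdc=${isAdc} ->`, shouldDeployHttpEndpoint(flag, isAdc));
}
```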
8 changes: 7 additions & 1 deletion lib/osml/model_runner/testing/mr_sync.ts
@@ -3,7 +3,7 @@
*/

import { RemovalPolicy } from "aws-cdk-lib";
import { Stream, StreamMode } from "aws-cdk-lib/aws-kinesis";
import { CfnStream, Stream, StreamMode } from "aws-cdk-lib/aws-kinesis";
import { Construct } from "constructs";

import { OSMLAccount } from "../../osml_account";
@@ -123,6 +123,12 @@ export class MRSync extends Construct {
streamMode: StreamMode.PROVISIONED,
shardCount: 1
});

// https://github.com/aws/aws-cdk/issues/19652
if (props.account.isAdc) {
const cfnStream = this.resultStream.node.defaultChild as CfnStream;
cfnStream.addPropertyDeletionOverride("StreamModeDetails");
}
}
}
}
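
The `addPropertyDeletionOverride("StreamModeDetails")` call in `mr_sync.ts` is a CDK escape hatch: it removes one property from the synthesized CloudFormation resource while leaving the rest intact. The sketch below imitates that effect on a plain object so it runs without `aws-cdk-lib`. `CfnResourceLike` and `withoutProperty` are hypothetical names, and the sample resource is an illustrative approximation of what a provisioned stream synthesizes to, not output captured from this project.

```typescript
// Illustrative stand-in for a synthesized CloudFormation resource.
interface CfnResourceLike {
  Type: string;
  Properties: Record<string, unknown>;
}

// Returns a copy of the resource with one top-level property removed,
// which is the visible effect of a property deletion override.
function withoutProperty(
  resource: CfnResourceLike,
  name: string
): CfnResourceLike {
  const properties: Record<string, unknown> = { ...resource.Properties };
  delete properties[name];
  return { ...resource, Properties: properties };
}

// Assumed shape of the stream resource before the override is applied.
const synthesizedStream: CfnResourceLike = {
  Type: "AWS::Kinesis::Stream",
  Properties: {
    ShardCount: 1,
    StreamModeDetails: { StreamMode: "PROVISIONED" }
  }
};

const adcStream = withoutProperty(synthesizedStream, "StreamModeDetails");
console.log(JSON.stringify(adcStream.Properties));
```

The override in the diff only runs when `props.account.isAdc` is true, so stacks outside ADC still synthesize with `StreamModeDetails` present.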
11 changes: 10 additions & 1 deletion lib/osml/osml_ecr_deployment.ts
@@ -10,13 +10,21 @@ import { DockerImageName, ECRDeployment } from "cdk-ecr-deployment";
import { Construct } from "constructs";

import { OSMLRepository } from "./osml_repository";
import { OSMLAccount } from "./osml_account";

/**
* Interface representing the properties for the OSMLECRDeployment Construct.
*
* @interface OSMLECRDeploymentProps
*/
export interface OSMLECRDeploymentProps {
/**
* The OSML account associated with this ECR deployment.
*
* @type {OSMLAccount}
*/
account: OSMLAccount;

/**
* The URI of the source for the container image.
*
@@ -89,7 +97,8 @@ export class OSMLECRDeployment extends Construct {
// Build an ECR repository for the model runner container.
this.ecrRepository = new OSMLRepository(this, `ECRRepository${id}`, {
repositoryName: props.repositoryName,
removalPolicy: props.removalPolicy
removalPolicy: props.removalPolicy,
isAdc: props.account.isAdc
}).repository;

// Get the latest image associated with the repository.
6 changes: 6 additions & 0 deletions lib/osml/osml_repository.ts
@@ -22,6 +22,12 @@ export interface OSMLRepositoryProps {
* @type {RemovalPolicy}
*/
removalPolicy: RemovalPolicy;

/**
* Whether the repository is deployed in an ADC region.
* @type {boolean}
*/
isAdc?: boolean;
}

/**
3 changes: 3 additions & 0 deletions lib/osml/osml_vpc.ts
@@ -79,6 +79,8 @@ export class OSMLVpc extends Construct {
constructor(scope: Construct, id: string, props: OSMLVpcProps) {
super(scope, id);

const isIsoB = props.account.region === "us-isob-east-1";

// if an osmlVpc ID is not explicitly given, use the default osmlVpc
if (props.vpcId) {
this.vpc = Vpc.fromLookup(this, "OSMLImportVPC", {
@@ -89,6 +91,7 @@ export class OSMLVpc extends Construct {
// Create a new VPC
const vpc = new Vpc(this, "OSMLVPC", {
vpcName: props.vpcName,
maxAzs: isIsoB ? 2 : 3,
subnetConfiguration: [
{
cidrMask: 23,
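
The `maxAzs` change in `osml_vpc.ts` caps the VPC at two availability zones in `us-isob-east-1` and three everywhere else. The diff does not say why that region is special. A minimal sketch of the same rule as a standalone function follows; `maxAzsForRegion` is a hypothetical name, not something the commit defines.

```typescript
// Hypothetical helper mirroring the ternary in the diff:
// two AZs in us-isob-east-1, three in every other region.
function maxAzsForRegion(region: string): number {
  const isIsoB = region === "us-isob-east-1";
  return isIsoB ? 2 : 3;
}

for (const region of ["us-isob-east-1", "us-iso-east-1", "us-west-2"]) {
  console.log(region, maxAzsForRegion(region));
}
```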
1 change: 1 addition & 0 deletions lib/osml/tile_server/ts_container.ts
@@ -109,6 +109,7 @@ export class TSContainer extends Construct {
this,
"TSContainerECRDeployment",
{
account: props.account,
sourceUri: this.config.TS_CONTAINER,
repositoryName: this.config.TS_REPOSITORY,
removalPolicy: this.removalPolicy,