[ETL-636] Raw Lambda (#138)
* raw lambda initial commit

* yield compressed data up to part threshold

* complete implementation and add tests

* minor update to dispatch lambda module docstring

* Add analogous prod stacks
philerooski authored Sep 6, 2024
1 parent fbd83a6 commit a68065b
Showing 12 changed files with 756 additions and 4 deletions.
13 changes: 13 additions & 0 deletions config/develop/namespaced/lambda-raw-role.yaml
@@ -0,0 +1,13 @@
template:
  path: lambda-raw-role.yaml
stack_name: "{{ stack_group_config.namespace }}-lambda-raw-role"
dependencies:
  - develop/namespaced/sqs-dispatch-to-raw.yaml
  - develop/s3-cloudformation-bucket.yaml
  - develop/s3-raw-bucket.yaml
parameters:
  SQSQueueArn: !stack_output_external "{{ stack_group_config.namespace }}-sqs-dispatch-to-raw::PrimaryQueueArn"
  S3SourceBucketName: {{ stack_group_config.input_bucket_name }}
  S3TargetBucketName: {{ stack_group_config.raw_bucket_name }}
stack_tags:
  {{ stack_group_config.default_stack_tags }}
17 changes: 17 additions & 0 deletions config/develop/namespaced/lambda-raw.yaml
@@ -0,0 +1,17 @@
template:
  type: sam
  path: src/lambda_function/raw/template.yaml
  artifact_bucket_name: {{ stack_group_config.template_bucket_name }}
  artifact_prefix: "{{ stack_group_config.namespace }}/src/lambda"
dependencies:
  - develop/namespaced/lambda-raw-role.yaml
  - develop/namespaced/sqs-dispatch-to-raw.yaml
  - develop/s3-cloudformation-bucket.yaml
  - develop/s3-raw-bucket.yaml
stack_name: "{{ stack_group_config.namespace }}-lambda-raw"
parameters:
  RoleArn: !stack_output_external "{{ stack_group_config.namespace }}-lambda-raw-role::RoleArn"
  SQSQueueArn: !stack_output_external "{{ stack_group_config.namespace }}-sqs-dispatch-to-raw::PrimaryQueueArn"
  S3RawBucket: {{ stack_group_config.raw_bucket_name }}
  S3RawKeyPrefix: "{{ stack_group_config.namespace }}/json/"
stack_tags: {{ stack_group_config.default_stack_tags }}
2 changes: 1 addition & 1 deletion config/develop/namespaced/sqs-dispatch-to-raw.yaml
@@ -3,7 +3,7 @@ template:
 parameters:
   MessageRetentionPeriod: "1209600"
   ReceiveMessageWaitTimeSeconds: "20"
-  VisibilityTimeout: "120"
+  VisibilityTimeout: "900"
   SNSTopicSubscription: !stack_output_external "{{ stack_group_config.namespace }}-sns-dispatch::SnsTopicArn"
 dependencies:
   - develop/namespaced/sns-dispatch.yaml
13 changes: 13 additions & 0 deletions config/prod/namespaced/lambda-raw-role.yaml
@@ -0,0 +1,13 @@
template:
  path: lambda-raw-role.yaml
stack_name: "{{ stack_group_config.namespace }}-lambda-raw-role"
dependencies:
  - prod/namespaced/sqs-dispatch-to-raw.yaml
  - prod/s3-cloudformation-bucket.yaml
  - prod/s3-raw-bucket.yaml
parameters:
  SQSQueueArn: !stack_output_external "{{ stack_group_config.namespace }}-sqs-dispatch-to-raw::PrimaryQueueArn"
  S3SourceBucketName: {{ stack_group_config.input_bucket_name }}
  S3TargetBucketName: {{ stack_group_config.raw_bucket_name }}
stack_tags:
  {{ stack_group_config.default_stack_tags }}
17 changes: 17 additions & 0 deletions config/prod/namespaced/lambda-raw.yaml
@@ -0,0 +1,17 @@
template:
  type: sam
  path: src/lambda_function/raw/template.yaml
  artifact_bucket_name: {{ stack_group_config.template_bucket_name }}
  artifact_prefix: "{{ stack_group_config.namespace }}/src/lambda"
dependencies:
  - prod/namespaced/lambda-raw-role.yaml
  - prod/namespaced/sqs-dispatch-to-raw.yaml
  - prod/s3-cloudformation-bucket.yaml
  - prod/s3-raw-bucket.yaml
stack_name: "{{ stack_group_config.namespace }}-lambda-raw"
parameters:
  RoleArn: !stack_output_external "{{ stack_group_config.namespace }}-lambda-raw-role::RoleArn"
  SQSQueueArn: !stack_output_external "{{ stack_group_config.namespace }}-sqs-dispatch-to-raw::PrimaryQueueArn"
  S3RawBucket: {{ stack_group_config.raw_bucket_name }}
  S3RawKeyPrefix: "{{ stack_group_config.namespace }}/json/"
stack_tags: {{ stack_group_config.default_stack_tags }}
2 changes: 1 addition & 1 deletion config/prod/namespaced/sqs-dispatch-to-raw.yaml
@@ -3,7 +3,7 @@ template:
 parameters:
   MessageRetentionPeriod: "1209600"
   ReceiveMessageWaitTimeSeconds: "20"
-  VisibilityTimeout: "120"
+  VisibilityTimeout: "900"
   SNSTopicSubscription: !stack_output_external "{{ stack_group_config.namespace }}-sns-dispatch::SnsTopicArn"
 dependencies:
   - prod/namespaced/sns-dispatch.yaml
5 changes: 3 additions & 2 deletions src/lambda_function/dispatch/app.py
@@ -2,9 +2,10 @@
 Dispatch Lambda
 
 This Lambda polls the input-to-dispatch SQS queue and publishes to the dispatch SNS topic.
-Its purpose is to inspect each export and dispatch each file as a separate job in which
-the file will be decompressed and uploaded to S3.
+Its purpose is to inspect each export and dispatch each file with a non-zero size as a
+separate job.
 """
+
 import json
 import logging
 import os
36 changes: 36 additions & 0 deletions src/lambda_function/raw/README.md
@@ -0,0 +1,36 @@
# Raw Lambda

The raw Lambda polls the dispatch-to-raw SQS queue and uploads an object to the raw S3 bucket.
Its purpose is to compress a single JSON file from an export (zip archive) and store it to S3.
It makes heavy use of Python file objects and multipart uploads and can download/compress/upload
with a relatively low, fixed memory overhead with respect to the size of the uncompressed JSON.
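
The fixed-memory idea can be illustrated with a minimal sketch. This is not the Lambda's actual code: the function name `yield_compressed_parts` and its parameters `part_threshold` and `read_size` are hypothetical, and the 8 MiB default is an assumption. The generator gzips a file object incrementally and yields a part once the compressed buffer reaches the threshold, so memory use is bounded by the threshold plus the read size rather than by the size of the JSON.

```python
import gzip
import io
from typing import BinaryIO, Iterator


def yield_compressed_parts(
    source: BinaryIO,
    part_threshold: int = 8 * 1024 * 1024,  # assumed default, not from the source
    read_size: int = 1024 * 1024,
) -> Iterator[bytes]:
    """Gzip `source` incrementally and yield the compressed bytes in parts.

    Every part except the last is at least `part_threshold` bytes long.
    Concatenating all parts gives one valid gzip stream.
    """
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        while True:
            chunk = source.read(read_size)
            if not chunk:
                break
            gz.write(chunk)
            if buffer.tell() >= part_threshold:
                # Hand off what has been compressed so far, then reuse the buffer.
                yield buffer.getvalue()
                buffer.seek(0)
                buffer.truncate()
    # Leaving the `with` block flushes the remaining data and the gzip trailer.
    if buffer.tell() > 0:
        yield buffer.getvalue()
```

Each yielded part could then be sent as one part of an S3 multipart upload. S3 requires every part except the last to be at least 5 MiB, so the threshold should not be set below that. The `source` argument can be any binary file object, for example the one returned by `zipfile.ZipFile.open()` for a single member of an export.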

## Development

The Serverless Application Model Command Line Interface (SAM CLI) is an
extension of the AWS CLI that adds functionality for building and testing
Lambda applications.

To use the SAM CLI, you need the following tools.

* SAM CLI - [Install the SAM CLI](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html)
* Docker - [Install Docker community edition](https://hub.docker.com/search/?type=edition&offering=community)

You may need the following for local testing.
* [Python 3 installed](https://www.python.org/downloads/)

You will also need to configure your AWS credentials, if you have not already done so.

## Creating a local build

Use the SAM CLI to build and test your lambda locally.
Build your application with the `sam build` command.

```bash
cd src/lambda_function/raw/
sam build
```

## Tests

Tests are available in `tests/test_raw_lambda.py`.