Releases: aws-solutions-library-samples/data-lakes-on-aws

Serverless Data Lake Framework 2.0.1

15 Jan 16:59

This is a minor release for SDLF 2.x. See the release notes for SDLF 2.0.0 to learn about all the changes in this major version.

For users of SDLF 1.x, version 1 is still available on the master branch. Development of newer versions of SDLF (2.x) happens on the main branch. The workshop still contains sections for version 1 as well.

What's Changed

  • remove sdlf-utils exclusion in static-checking workflow by @cnfait in #234
  • remove unused checkjob lambda from sdlf-stageB by @gwenika in #237
  • [preview] Replace Deequ with Glue Data Quality by @Mreddy19 in #235
  • wait for glue crawler to complete in stageB by @cnfait in #240
  • update s3 access logs and artifacts buckets names to avoid length issue by @cnfait in #241

Full Changelog: 2.0.0...2.0.1

Thanks

We thank @kangsoon93 for raising the issue on S3 bucket names, and we thank all the contributors/users for their work on this release!

Serverless Data Lake Framework 2.0.0

01 Dec 16:35

Serverless Data Lake Framework 2.0.0 is now available.

The workshop has been updated.

For users of SDLF 1.x, version 1 is still available on the master branch. Development of newer versions of SDLF (2.x) happens on the main branch. The workshop still contains sections for version 1 as well.

GPG signature files are included for archives in this release's assets files. You can use our public key to verify that the downloaded archive file is original and unmodified.
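As a rough illustration of that verification step, the sketch below wraps `gpg --verify` for a detached signature. It assumes `gpg` is installed and that the public key has already been imported (for example with `gpg --import`). The archive and signature file names are hypothetical placeholders; use the names of the assets you actually downloaded.

```python
import subprocess
from pathlib import Path


def gpg_verify_command(archive: Path, signature: Path) -> list[str]:
    """Build the `gpg --verify` invocation for a detached signature file."""
    # gpg expects the signature file first, then the signed data file.
    return ["gpg", "--verify", str(signature), str(archive)]


def verify_archive(archive: Path, signature: Path) -> bool:
    """Return True when gpg exits with status 0, i.e. the signature checks out."""
    result = subprocess.run(
        gpg_verify_command(archive, signature),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


# Hypothetical usage (file names are placeholders, not the real asset names):
# ok = verify_archive(Path("sdlf.tar.gz"), Path("sdlf.tar.gz.asc"))
```

A zero exit status only means the signature matches a key in your keyring; confirming that the key itself is the project's published key is a separate step.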

We welcome your feedback!

What’s New

SDLF 2.0 is very much in the same spirit as SDLF 1.0: the constructs are the same and CloudFormation is still used to provision the infrastructure. Version 2.0 aims to fix long-standing issues with SDLF, extend its usage to more data architecture patterns, and bring commonly requested features to the framework.

  • SDLF components are now CloudFormation modules
    • there is one module per component: foundations, team, pipeline, stageA, stageB, dataset.
    • datalakeLibrary is used to build a Lambda layer; it is not a CloudFormation module.
    • deploy.sh takes care of deploying the CICD infrastructure used to build these modules, and registers them in the private CloudFormation registry of each account. Modules are updated whenever there is a deployment that requires them.
  • SDLF CICD pipelines now live in the Shared DevOps account
    • CloudFormation stacks are created in child accounts through cross-account IAM roles.
  • SDLF can deploy to an arbitrary number of child accounts, all driven from a single DevOps account.
    • pDomain (which defaults to datalake) can be provided when deploying foundations.
    • each domain can have the usual three environments (dev, test, prod).
  • Deploying foundations and teams is now done from a new repository called sdlf-main.
    • this repository is created during the initial setup with deploy.sh.
    • foundations and teams deployment happens in datadomain-{domain}-{env}.yaml.
    • sdlf-main works the same way as everything else in SDLF: main, test and dev branches are expected.
    • it is easier to know which teams have been created, and to remove them as they don’t share the same set of parameters in parameters-{env}.json.
  • Deploying pipelines and datasets is now done from a new repository called sdlf-main-{domain}-{team}.
    • this repository is created when a new team is created.
    • pipelines deployment happens in pipelines.yaml and datasets in datasets.yaml.
    • sdlf-main-{domain}-{team} works the same way as everything else in SDLF: main, test and dev branches are expected.
    • it is easier to know which pipelines and datasets have been created, and to remove them as they don’t share the same set of parameters in parameters-{env}.json.
  • Mappings between datasets and transforms in stageB are now defined directly when defining a dataset.
    • this mapping used to be done by a CodeBuild project and a script in sdlf-datalakeLibrary. They are no longer needed and have been removed.
    • it is now defined through the pPipelineDetails parameter when defining a dataset in sdlf-dataset. This parameter goes even further and can be used to store more information that stages can use. These details are stored in the Datasets DynamoDB table (as was already the case in SDLF 1.x).
  • Stages in a pipeline are now driven by EventBridge rules exclusively.
    • the rule can be an event pattern or a schedule (cron expression).
    • stageA no longer sends messages to a queue for stageB to process. StageB is configured with an event pattern to listen for stageA runs (pEventPattern in the example), and then processes these events on a schedule (pSchedule).
    • it is easier now to have pipelines with a single stage, pipelines with dependent stages and overall more complex pipelines than in SDLF 1.x, as long as there is an event pattern to listen for.
  • VPC support is available as an optional feature.
  • sdlf-pipLibrary is now part of an optional feature called Lambda Layer Deployer. Files related to this feature are part of a team main repository (sdlf-main-{domain}-{team}) under the layers folder.
  • The Glue Job Deployer is now an optional feature built in SDLF rather than an add-on from sdlf-utils. Files related to this feature are part of a team main repository (sdlf-main-{domain}-{team}) under the transforms folder.
  • New optional component: sdlf-monitoring, with CloudTrail, ELK forwarding and SNS.
    • in SDLF 1.x CloudTrail is optional but enabled by default. In 2.0 it is optional and not enabled unless sdlf-monitoring is deployed.
  • Deequ support has been removed entirely. While it wasn’t enabled by default in SDLF 1.x, dedicated infrastructure was still created while deploying sdlf-foundations. This is no longer the case.
    • sdlf-stage-dataquality will soon be available as an example of how to add a third stage making use of Glue Data Quality.
  • Outside the initial deploy.sh, there are no more shell scripts.
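
The repository and file naming schemes described in the list above can be summarised in a small sketch. The name templates (sdlf-main-{domain}-{team}, datadomain-{domain}-{env}.yaml, parameters-{env}.json) and the default domain datalake come from these notes. The branch-to-environment mapping (in particular main to prod) and the team name used in the usage note are assumptions made for illustration, not something the release notes state.

```python
# Assumed mapping from Git branch to environment; the notes only say that
# main, test and dev branches are expected and that environments are
# dev, test and prod.
BRANCH_TO_ENV = {"dev": "dev", "test": "test", "main": "prod"}


def team_repository(domain: str, team: str) -> str:
    """Name of the repository holding a team's pipelines and datasets."""
    return f"sdlf-main-{domain}-{team}"


def datadomain_file(domain: str, env: str) -> str:
    """File in sdlf-main where foundations and teams are deployed."""
    return f"datadomain-{domain}-{env}.yaml"


def parameters_file(env: str) -> str:
    """Per-environment parameters file."""
    return f"parameters-{env}.json"


def files_for_branch(branch: str, domain: str = "datalake") -> dict[str, str]:
    """Collect the sdlf-main file names relevant to a given branch."""
    env = BRANCH_TO_ENV[branch]
    return {
        "datadomain": datadomain_file(domain, env),
        "parameters": parameters_file(env),
    }
```

For example, with a hypothetical team called engineering in the default domain, `team_repository("datalake", "engineering")` gives `sdlf-main-datalake-engineering`, and `files_for_branch("dev")` gives `datadomain-datalake-dev.yaml` and `parameters-dev.json`.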

Full Changelog: 1.5.2...2.0.0

Thanks

We thank all the contributors/users for their work on this release!

Serverless Data Lake Framework 2.0.0-rc.4

01 Dec 11:45

Work is ongoing on a new major version of the Serverless Data Lake Framework. This is a pre-release, not ready for production workloads.

We welcome your feedback!

What's Changed

Read Serverless Data Lake Framework 2.0.0-beta.0 to know more about what's new in SDLF 2.0.0!

  • refactor: remove usage of boto3 resource by @cnfait in #228
  • move the -{env} suffix from layers and transforms too by @cnfait in #229
  • add logs permissions to glue crawler role by @cnfait in #230
  • remove everything that we do not officially support at launch by @cnfait in 1afb677
  • remove role name for legislators glue job role, update path by @cnfait in 13a36d2
  • sdlf-monitoring: remove ELK stack by @cnfait in 0970b53

Full Changelog: 2.0.0-rc.3...2.0.0-rc.4

Thanks

We thank all the contributors/users for their work on this release.

Serverless Data Lake Framework 2.0.0-rc.3

20 Nov 08:33

Work is ongoing on a new major version of the Serverless Data Lake Framework. This is a pre-release, not ready for production workloads.

We welcome your feedback!

What's Changed

Read Serverless Data Lake Framework 2.0.0-beta.0 to know more about what's new in SDLF 2.0.0!

  • access control using lakeformation only by @cnfait in #219
  • add job in static-checking github action: black by @gwenika in #220
  • add job in static-checking github action: ruff by @gwenika in #221
  • add job in static-checking github action: shellcheck by @gwenika in #222
  • fix all static checking findings (black, ruff, shellcheck, cfn-lint) by @cnfait in #223
  • Fix all cfn_nag findings by @cnfait in #225
  • remove the -{env} suffix from files in team repository by @cnfait in #226
  • add workshop examples to sdlf-utils by @cnfait in #227

Full Changelog: 2.0.0-rc.2...2.0.0-rc.3

Thanks

We thank all the contributors/users for their work on this release.

Serverless Data Lake Framework 2.0.0-rc.2

07 Nov 09:30

Work is ongoing on a new major version of the Serverless Data Lake Framework. This is a pre-release, not ready for production workloads.

We welcome your feedback!

What's Changed

Read Serverless Data Lake Framework 2.0.0-beta.0 to know more about what's new in SDLF 2.0.0!

  • get size and last modified metadata from s3 in a single request by @cnfait in #215
  • Crossaccount roles simplification by @cnfait in #216
  • add batch capability to updating object metadata in datalakeLibrary by @cnfait in #217
  • fix single account workshop/demo setup by @cnfait in #218

Full Changelog: 2.0.0-rc.1...2.0.0-rc.2

Serverless Data Lake Framework 2.0.0-rc.1

10 Oct 09:22

Work is ongoing on a new major version of the Serverless Data Lake Framework. This is a pre-release, not ready for production workloads.

We welcome your feedback!

What's Changed

Read Serverless Data Lake Framework 2.0.0-beta.0 to know more about what's new in SDLF 2.0.0!

  • create an athena workgroup per team by @gwenika in #208
  • add optional glue job deployer feature by @cnfait in #209
  • Optional VPC support for SDLF by @cnfait in #211
  • trigger rMain pipelines on sdlf-cicd repository change by @cnfait in 4fa2a4a
  • sdlf-team: log groups for datasets/pipelines-dynamodb lambda functions by @cnfait in #212
  • restore min/max_items_process feature at the pipeline level by @cnfait in #213
  • provide vpc connection to glue jobs in glue-job-deployer, add disable-proxy-v2 to default arguments by @cnfait in #214

Full Changelog: 2.0.0-rc.0...2.0.0-rc.1

Thanks

We thank all the contributors/users for their work on this release, in particular @gwenika.

Serverless Data Lake Framework 2.0.0-rc.0

19 Sep 07:37

Work is ongoing on a new major version of the Serverless Data Lake Framework. This is a pre-release, not ready for production workloads.

We welcome your feedback!

What's Changed

Read Serverless Data Lake Framework 2.0.0-beta.0 to know more about what's new in SDLF 2.0.0!

  • automatically create codepipeline infrastructure for new stages by @cnfait in #190
  • StageA state machine update by @cnfait in #191
  • Python 3.11 as default for CodeBuild runtimes. Also update Codebuild image to latest AmazonLinux2 by @Mreddy19 in #192
  • Python 3.11 as default for Lambda functions by @Mreddy19 in #193
  • StageB state machine update by @cnfait in #194
  • Enforce the use of aws-cli >= 2.13.0 by @Mreddy19 in #196
  • fix sdlf-stageB glue arguments from dynamodb by @cnfait in #198
  • Configurable lambda stageA is now configurable through dynamodb by @cnfait in #199
  • StageB: configurable glue job name through dynamodb by @cnfait in #200
  • deploy datalakelibrary using cloudformation by @cnfait in #201
  • use 'main' as the default branch name instead of 'master' by @cnfait in #202
  • Use KMS for base encryption of S3 buckets by @cnfait in #203
  • Build CloudFormation modules as part of sdlf-main/sdlf-main-* repository pipelines by @cnfait in #204
  • update GlueVersion from 2.0 to 4.0 by @cnfait in #205

Full Changelog: 2.0.0-beta.4...2.0.0-rc.0

Thanks

We thank all the contributors/users for their work on this release, in particular @Mreddy19.

Serverless Data Lake Framework 2.0.0-beta.4

04 Aug 16:14

Work is ongoing on a new major version of the Serverless Data Lake Framework. This is a pre-release, not ready for production workloads.

We welcome your feedback!

What's Changed

Read Serverless Data Lake Framework 2.0.0-beta.0 to know more about what's new in SDLF 2.0.0!

  • Team-specific IAM role for pipelines and datasets creation by @cnfait in #183
  • sdlf-pipeline cloudformation short form syntax by @cnfait in #184
  • Codebuild awscli version by @mousamm in #185
  • sdlf-cicd refactoring by @cnfait in #186
  • Fix BuildLambdaLayers (previously known as sdlf-pipLibrary) by @cnfait in #187
  • use sdlf-main-{domain}-{team} naming scheme instead of sdlf-{domain}-{team}-main by @cnfait in #188
  • restore StageA+StageB codepipeline by @cnfait in #189

Full Changelog: 2.0.0-beta.3...2.0.0-beta.4

Thanks

We thank all the contributors/users for their work on this release, in particular @mousamm.

Serverless Data Lake Framework 2.0.0-beta.3

22 Jul 11:06

Work is ongoing on a new major version of the Serverless Data Lake Framework. This is a pre-release, not ready for production workloads.

We welcome your feedback!

What's Changed

Read Serverless Data Lake Framework 2.0.0-beta.0 to know more about what's new in SDLF 2.0.0!

  • sdlf-foundations: use s3 eventbridge support by @cnfait in #180
  • Fix cfn_nag issues in sdlf-foundations by @cnfait in #181
  • store the aws-sam-cli zip in s3 when setting up sdlf by @cnfait in #182
  • sdlf-monitoring: migrate from Elasticsearch 7.10 to OpenSearch 2.5 by @tomaszwrzonski in #167

Full Changelog: 2.0.0-beta.2...2.0.0-beta.3

Thanks

We thank all the contributors/users for their work on this release, in particular @tomaszwrzonski.

Serverless Data Lake Framework 2.0.0-beta.2

11 Jul 21:47

Work is ongoing on a new major version of the Serverless Data Lake Framework. This is a pre-release, not ready for production workloads.

We welcome your feedback!

What's Changed

Read Serverless Data Lake Framework 2.0.0-beta.0 to know more about what's new in SDLF 2.0.0!

  • Explicit CloudFormation removal policy by @cnfait in #163
  • Fix sdlf-monitoring template issues by @cnfait in #164
  • allow tagging infrastructure with stack tags by @cnfait in #169
  • IAM permissions cleanup by @cnfait in #170
  • migration to EventBridge Scheduler by @cnfait in #171
  • CloudFormationManagedUploadInfrastructure permissions update by @cnfait in #172
  • move the states execution role from sdlf-team to sdlf-stage* by @cnfait in #173
  • remove old test infrastructure by @cnfait in #174
  • sdlf-team: fix cfn_nag issues by @cnfait in #175
  • replace jaidisido with cnfait in github issue templates by @cnfait in #176
  • disable cfn_nag W76 SPCM too high by @cnfait in #177
  • sdlf-stage*: fix cfn_nag issues by @cnfait in #178

Full Changelog: 2.0.0-beta.1...2.0.0-beta.2