Commit

gather tests, correct key_prefix to key, add missing params to prod glue role
rxu17 committed Sep 11, 2024
1 parent a21c0d6 commit 6fbff2b
Showing 7 changed files with 29 additions and 24 deletions.
25 changes: 16 additions & 9 deletions .github/workflows/upload-and-deploy.yaml
@@ -135,10 +135,11 @@ jobs:
- name: Test scripts with pytest (lambda, etc.)
run: |
- pipenv run python -m pytest tests/test_s3_event_config_lambda.py -v
- pipenv run python -m pytest tests/test_s3_to_glue_lambda.py -v
- pipenv run python -m pytest tests/test_lambda_dispatch.py -v
- pipenv run python -m pytest tests/test_consume_logs.py -v
+ pipenv run python -m pytest \
+   tests/test_s3_event_config_lambda.py \
+   tests/test_s3_to_glue_lambda.py \
+   tests/test_lambda_dispatch.py \
+   tests/test_consume_logs.py -v
- name: Test dev synapse folders for STS access with pytest
run: >
@@ -249,17 +250,23 @@ jobs:
if: github.ref_name != 'main'
run: echo "NAMESPACE=$GITHUB_REF_NAME" >> $GITHUB_ENV

- - name: Run Pytest unit tests under AWS 3.0
+ - name: Run Pytest unit tests under AWS Glue 3.0
if: matrix.tag_name == 'aws_glue_3'
run: |
- su - glue_user --command "cd $GITHUB_WORKSPACE && python3 -m pytest tests/test_s3_to_json.py -v"
- su - glue_user --command "cd $GITHUB_WORKSPACE && python3 -m pytest tests/test_compare_parquet_datasets.py -v"
+ su - glue_user --command "cd $GITHUB_WORKSPACE && python3 -m pytest \
+   tests/test_s3_to_json.py \
+   tests/test_compare_parquet_datasets.py -v"
- - name: Run Pytest unit tests under AWS 4.0
+ - name: Run unit tests for JSON to Parquet under AWS Glue 4.0
if: matrix.tag_name == 'aws_glue_4'
run: >
su - glue_user --command "cd $GITHUB_WORKSPACE &&
- python3 -m pytest tests/test_json_to_parquet.py --namespace $NAMESPACE -v &&
+ python3 -m pytest tests/test_json_to_parquet.py --namespace $NAMESPACE -v"
+ - name: Run unit tests for Great Expectations on Parquet under AWS Glue 4.0
+   if: matrix.tag_name == 'aws_glue_4'
+   run: >
+     su - glue_user --command "cd $GITHUB_WORKSPACE && \
+     python3 -m pytest tests/test_run_great_expectations_on_parquet.py -v"
sceptre-deploy-develop:
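The workflow change folds several one-file pytest runs into a single invocation; pytest collects every path it is given into one session and exits non-zero if any module fails. A minimal local sketch of that behavior, using throwaway test files (assumes pytest is installed, as it is in the workflow's pipenv environment):

```python
import pathlib
import tempfile

import pytest  # assumed installed, mirroring the CI environment

# Two throwaway test modules, collected in ONE pytest session -- the same
# consolidation the workflow step now does by passing all paths at once.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "test_a.py").write_text("def test_a():\n    assert 1 + 1 == 2\n")
(tmp / "test_b.py").write_text("def test_b():\n    assert 'glue'.upper() == 'GLUE'\n")

# pytest.main accepts multiple paths; it returns 0 only when every
# collected test passes, so the CI step still fails if any file fails.
exit_code = pytest.main([str(tmp / "test_a.py"), str(tmp / "test_b.py"), "-q"])
assert exit_code == 0
```

Besides being shorter, a single session shares collection and fixture setup across the listed files instead of paying interpreter startup per file.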
2 changes: 1 addition & 1 deletion config/develop/namespaced/glue-workflow.yaml
@@ -21,7 +21,7 @@ parameters:
S3SourceBucketName: {{ stack_group_config.input_bucket_name }}
CloudformationBucketName: {{ stack_group_config.template_bucket_name }}
ShareableArtifactsBucketName: {{ stack_group_config.shareable_artifacts_vpn_bucket_name }}
- ExpectationSuiteKeyPrefix: "{{ stack_group_config.namespace }}/src/glue/resources/data_values_expectations.json"
+ ExpectationSuiteKey: "{{ stack_group_config.namespace }}/src/glue/resources/data_values_expectations.json"
stack_tags:
{{ stack_group_config.default_stack_tags }}
sceptre_user_data:
1 change: 1 addition & 0 deletions config/prod/glue-job-role.yaml
@@ -6,5 +6,6 @@ parameters:
S3IntermediateBucketName: {{ stack_group_config.intermediate_bucket_name }}
S3ParquetBucketName: {{ stack_group_config.processed_data_bucket_name }}
S3ArtifactBucketName: {{ stack_group_config.template_bucket_name }}
+ S3ShareableArtifactBucketName: {{ stack_group_config.shareable_artifacts_vpn_bucket_name }}
stack_tags:
{{ stack_group_config.default_stack_tags }}
2 changes: 1 addition & 1 deletion config/prod/namespaced/glue-workflow.yaml
@@ -21,7 +21,7 @@ parameters:
S3SourceBucketName: {{ stack_group_config.input_bucket_name }}
CloudformationBucketName: {{ stack_group_config.template_bucket_name }}
ShareableArtifactsBucketName: {{ stack_group_config.shareable_artifacts_vpn_bucket_name }}
- ExpectationSuiteKeyPrefix: "{{ stack_group_config.namespace }}/src/glue/resources/data_values_expectations.json"
+ ExpectationSuiteKey: "{{ stack_group_config.namespace }}/src/glue/resources/data_values_expectations.json"
stack_tags:
{{ stack_group_config.default_stack_tags }}
sceptre_user_data:
10 changes: 5 additions & 5 deletions src/glue/jobs/run_great_expectations_on_parquet.py
@@ -38,7 +38,7 @@ def read_args() -> dict:
"cfn-bucket",
"namespace",
"data-type",
- "expectation-suite-key-prefix",
+ "expectation-suite-key",
],
)
for arg in args:
@@ -241,21 +241,21 @@ def get_batch_request(
def read_json(
s3: boto3.client,
s3_bucket: str,
- key_prefix: str,
+ key: str,
) -> Dict[str, str]:
"""Reads in a json object
Args:
s3 (boto3.client): s3 client connection
s3_bucket (str): name of the s3 bucket to read from
- key_prefix (str): s3 key prefix of the
+ key (str): s3 key prefix of the
location of the json to read from
Returns:
Dict[str, str]: the data read in from json
"""
# read in the json filelist
- s3_response_object = s3.get_object(Bucket=s3_bucket, Key=key_prefix)
+ s3_response_object = s3.get_object(Bucket=s3_bucket, Key=key)
json_content = s3_response_object["Body"].read().decode("utf-8")
expectations = json.loads(json_content)
return expectations
@@ -373,7 +373,7 @@ def main():
expectations_data = read_json(
s3=s3,
s3_bucket=args["cfn_bucket"],
- key_prefix=args["expectation_suite_key_prefix"],
+ key=args["expectation_suite_key"],
)
logger.info("adds_expectations_from_json")
add_expectations_from_json(
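The rename from `key_prefix` to `key` matches S3 semantics: `get_object` takes the exact object key, while only listing APIs such as `list_objects_v2` match keys by prefix. A minimal sketch of the distinction using a hypothetical in-memory stand-in for the client (illustration only, not boto3):

```python
import io
import json

class FakeS3Client:
    """Hypothetical in-memory stand-in for an S3 client (illustration only)."""

    def __init__(self, objects):
        self._objects = objects  # full object key -> raw bytes

    def get_object(self, Bucket, Key):
        # Exact-key lookup: passing only a prefix here would fail.
        return {"Body": io.BytesIO(self._objects[Key])}

    def list_objects_v2(self, Bucket, Prefix):
        # Listing, by contrast, matches every key that STARTS WITH the prefix.
        keys = [k for k in self._objects if k.startswith(Prefix)]
        return {"Contents": [{"Key": k} for k in keys]}

# Key layout mirrors the one used in the templates (namespace-prefixed path).
suite_key = "main/src/glue/resources/data_values_expectations.json"
s3 = FakeS3Client({suite_key: json.dumps({"suite": "demo"}).encode("utf-8")})

# get_object needs the full key ...
body = s3.get_object(Bucket="cfn-bucket", Key=suite_key)["Body"].read()
assert json.loads(body) == {"suite": "demo"}

# ... while a prefix is only meaningful to the listing API.
listing = s3.list_objects_v2(Bucket="cfn-bucket", Prefix="main/src/")
assert listing["Contents"][0]["Key"] == suite_key
```

Since the job reads exactly one expectation-suite file, naming the parameter `key` and passing it straight through to `Key=` states the contract precisely.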
5 changes: 2 additions & 3 deletions templates/glue-workflow.j2
@@ -86,7 +86,7 @@ Parameters:
Type: String
Description: The name of the bucket where shareable artifacts are stored.

- ExpectationSuiteKeyPrefix:
+ ExpectationSuiteKey:
Type: String
Description: The s3 key prefix of the expectation suite.

@@ -322,7 +322,7 @@ Resources:
"--cfn-bucket": !Ref CloudformationBucketName
"--parquet-bucket": !Ref ParquetBucketName
"--shareable-artifacts-bucket": !Ref ShareableArtifactsBucketName
- "--expectation-suite-key-prefix": !Sub "${Namespace}/src/glue/resources/data_values_expectations.json"
+ "--expectation-suite-key": !Ref ExpectationSuiteKey
"--additional-python-modules": "great_expectations~=0.18,urllib3<2"
Description: This trigger runs the great expectation parquet job for this data type after completion of the JSON to Parquet job for this data type
Type: CONDITIONAL
@@ -331,7 +331,6 @@ Resources:
- JobName: !Sub "${Namespace}-{{ dataset['stackname_prefix'] }}-Job"
State: SUCCEEDED
LogicalOperator: EQUALS
- Logical: AND
StartOnCreation: true
WorkflowName: !Ref JsonToParquetWorkflow
{% endfor %}
8 changes: 3 additions & 5 deletions tests/test_run_great_expectations_on_parquet.py
@@ -215,7 +215,7 @@ def test_that_get_batch_request_details_are_correct(test_spark):

def test_read_json_correctly_returns_expected_values():
s3_bucket = "test-bucket"
- key_prefix = "test-prefix"
+ key = "test-key"

# Mock the S3 response
mock_s3_response = MagicMock()
@@ -229,13 +229,11 @@ def test_read_json_correctly_returns_expected_values():
mock_s3_client.return_value.get_object.return_value = mock_s3_response

# Call the function
- result = run_gx_on_pq.read_json(
-     mock_s3_client.return_value, s3_bucket, key_prefix
- )
+ result = run_gx_on_pq.read_json(mock_s3_client.return_value, s3_bucket, key)

# Verify that the S3 client was called with the correct parameters
mock_s3_client.return_value.get_object.assert_called_once_with(
- Bucket=s3_bucket, Key=key_prefix
+ Bucket=s3_bucket, Key=key
)

# Verify the result
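The updated test stubs the S3 call with `unittest.mock` and asserts on the exact keyword the client receives. A self-contained version of the same pattern, with a local `read_json` that mirrors the renamed signature (the helper and values here are illustrative, not the module's own):

```python
import io
import json
from unittest.mock import MagicMock

def read_json(s3, s3_bucket, key):
    # Mirrors the renamed signature: `key` is the full S3 object key.
    content = s3.get_object(Bucket=s3_bucket, Key=key)["Body"].read().decode("utf-8")
    return json.loads(content)

# Stub the client: get_object returns a file-like Body, as boto3's would.
mock_s3 = MagicMock()
mock_s3.get_object.return_value = {
    "Body": io.BytesIO(json.dumps({"expectation_suite_name": "demo"}).encode("utf-8"))
}

result = read_json(mock_s3, "test-bucket", "test-key")

# The assertion that motivated the diff: the client must be called with
# Key= set to the exact object key, not a prefix.
mock_s3.get_object.assert_called_once_with(Bucket="test-bucket", Key="test-key")
assert result == {"expectation_suite_name": "demo"}
```

`assert_called_once_with` checks keyword names as well as values, so the test would catch a regression back to `Key=key_prefix` call sites built around prefix-shaped arguments.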
