The Centralized Logging with OpenSearch solution (AWS Solutions) works as expected when traffic to the Lambda functions is low.
Under high volume, however, the Lambda function starts failing with the error "An error occurred (ExpiredToken) when calling the GetObject operation: The provided token has expired.", which leaves gaps and missing metrics in the OpenSearch dashboard.
Our investigation indicates that code running in warm Lambda environments reuses IAM sessions when it connects to S3 to read objects. The solution's code does not account for a session expiring while the environment stays warm, and the current retry mechanism retries a few times and then fails. We understand that the retry count can be increased, but in this case the retries can still be exhausted, because it can take anywhere from 15 to 60 minutes before the sessions are cleared out and the Lambda function returns to a working state.
Once the session has expired, the code keeps using the expired token to fetch the S3 objects and fails with the error quoted above.
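To illustrate the failure mode, here is a minimal sketch of a reader that rebuilds its S3 client when a call fails with `ExpiredToken`, instead of retrying with the same cached client. This is not the solution's actual code: `RefreshingS3Reader`, `error_code`, and `default_client_factory` are hypothetical names, and the assumption that a new boto3 session picks up fresh credentials depends on how the solution obtains them, which this report does not state.

```python
from typing import Any, Callable

# Error code taken from the message quoted in this issue.
EXPIRED_CODES = {"ExpiredToken"}


def error_code(exc: Exception) -> str:
    """Return the AWS error code carried by a botocore-style exception, or ''."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code", "")


def default_client_factory() -> Any:
    """Assumption: a brand-new boto3 session resolves fresh credentials."""
    import boto3  # imported lazily so the class can be exercised with a fake client

    return boto3.session.Session().client("s3")


class RefreshingS3Reader:
    """Hypothetical wrapper: rebuild the client once when the token has expired."""

    def __init__(self, client_factory: Callable[[], Any] = default_client_factory):
        self._factory = client_factory
        self._client = client_factory()

    def get_object(self, bucket: str, key: str) -> Any:
        try:
            return self._client.get_object(Bucket=bucket, Key=key)
        except Exception as exc:
            # Anything other than an expired token is not ours to handle.
            if error_code(exc) not in EXPIRED_CODES:
                raise
            # Drop the cached client and retry exactly once with a new one.
            self._client = self._factory()
            return self._client.get_object(Bucket=bucket, Key=key)
```

In a Lambda function the reader would typically be created once at module level so that warm invocations share it, while an expired token no longer poisons the environment until it is recycled.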
Our current workaround, which resolves the issue in this scenario without major code changes, uses AWS Lambda Destinations. We duplicate the main Lambda function, add the line "event = event['requestPayload']" as the first line of lambda_handler() in the duplicate, and configure the duplicate as the on-failure destination of the main Lambda function.
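As a rough sketch of that one-line change: when Lambda delivers a failed asynchronous invocation to a destination, it wraps the original event under the `requestPayload` key of an invocation record. The handler below unwraps that record when it sees one and otherwise passes the event through, so the same code could in principle be deployed to both functions. `unwrap_destination_event` and `process_event` are hypothetical names; `process_event` stands in for the solution's existing processing logic.

```python
from typing import Any


def unwrap_destination_event(event: Any) -> Any:
    """Return the original event if `event` is a Lambda destination record."""
    if (
        isinstance(event, dict)
        and "requestPayload" in event
        and "requestContext" in event
    ):
        return event["requestPayload"]
    return event


def process_event(event: Any) -> Any:
    """Placeholder for the pipeline's real processing (hypothetical)."""
    return event


def lambda_handler(event: Any, context: Any) -> Any:
    # Equivalent to `event = event['requestPayload']` in the duplicate function,
    # but harmless when the function is invoked directly.
    event = unwrap_destination_event(event)
    return process_event(event)
```

The check on `requestContext` is only there to reduce the chance of unwrapping an ordinary event that happens to contain a `requestPayload` field.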
Although this solves the issue, it may prove cumbersome to manage the duplicate Lambda functions for a high number of pipelines.
Expected Behavior
We opened a support case with AWS. Based on a discussion with the Lambda and IAM teams, the recommended approach is for the code to first check whether the IAM session is still valid and how much time it has left. If the session is close to expiring, the code should end it and create a new session for processing requests, which removes the need for the second Lambda function as a destination.
Current Behavior
Pipeline ingestion stops, and no data appears in OpenSearch dashboards for up to 60 minutes, until a new IAM session is generated in the background.
Reproduction Steps
Create a pipeline on CentralizedLogging v2.3.0 with the EventBridge as a Lambda Function.
Leave the maximum session duration of the Lambda function's IAM role at its default of 1 hour.
Look for this error in the pipeline's logs: "An error occurred (ExpiredToken) when calling the GetObject operation: The provided token has expired."
The timestamps of these errors match the gaps and missing data in the OpenSearch dashboard.
Possible Solution
Implement logic that first checks whether the IAM session is still valid and how much time it has left. If the session is close to expiring, end it and create a new session for processing requests. This would remove the need for the second Lambda function as a destination.
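One way this check could look, as a sketch only: cache the client together with the expiry time of its credentials, and rebuild both when the remaining lifetime drops below a safety margin. It assumes the credentials come from an STS `AssumeRole` call, whose response includes an `Expiration` timestamp; this report does not say how the solution actually obtains its session. `ExpiringClientCache`, `assume_role_credentials`, `build_s3_client`, the session name, and the five-minute margin are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Dict, Optional

Credentials = Dict[str, Any]


def assume_role_credentials(role_arn: str) -> Credentials:
    """Fetch temporary credentials via STS (assumed credential source)."""
    import boto3  # lazy import keeps the cache testable without boto3

    sts = boto3.client("sts")
    resp = sts.assume_role(RoleArn=role_arn, RoleSessionName="log-processor")
    return resp["Credentials"]  # includes a timezone-aware "Expiration"


def build_s3_client(creds: Credentials) -> Any:
    """Create an S3 client bound to the given temporary credentials."""
    import boto3

    return boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )


class ExpiringClientCache:
    """Reuse a client across warm invocations, but never one about to expire."""

    def __init__(
        self,
        fetch_credentials: Callable[[], Credentials],
        build_client: Callable[[Credentials], Any],
        refresh_margin: timedelta = timedelta(minutes=5),
        now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
    ):
        self._fetch = fetch_credentials
        self._build = build_client
        self._margin = refresh_margin
        self._now = now
        self._client: Optional[Any] = None
        self._expires_at: Optional[datetime] = None

    def client(self) -> Any:
        # Refresh on first use, or when the remaining lifetime is within the margin.
        if self._client is None or self._expires_at - self._now() <= self._margin:
            creds = self._fetch()
            self._client = self._build(creds)
            self._expires_at = creds["Expiration"]
        return self._client
```

A handler would then call `cache.client().get_object(...)` on every invocation rather than holding on to a client created at cold start. The margin should be longer than the longest expected invocation so that a client does not expire in the middle of a request.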
Additional Information/Context
No response
Solution Version
2.3.0
AWS Region. e.g., us-east-1
No response
Other information
No response