Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-8704] Instant time to completion time state upgrade migration for Flink StreamReadMonitoringFunction #12539

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

voonhous
Copy link
Member

@voonhous voonhous commented Dec 25, 2024

Change Logs

Instant time to completion time state upgrade migration for Flink StreamReadMonitoringFunction.

When performing a version upgrade, users are required to stop their job with savepoint, then restart their job with a newer Hudi-Flink-Bundle. To ensure forward compatibility, we added a state upgrader interface for StreamReadMonitoring function to allow forward compatibility for readers.

  1. Added an interface to handle state upgrades.
  2. Implemented StreamReadMonitoringStateUpgrader to handle issuedInstant (requestInstant) -> issuedInstant + issuedOffset (completionTime) upgrade
  3. Moved outdated legacy validation logic into StreamReadMonitoringStateUpgrader to avoid confusion.

Impact

None

Risk level (write none, low medium or high below)

None

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@voonhous voonhous marked this pull request as draft December 25, 2024 16:04
@voonhous voonhous changed the title Draft: [HUDI-8704] Instant time to completion time state upgrade migration for Flink StreamReadMonitoringFunction [HUDI-8704] Instant time to completion time state upgrade migration for Flink StreamReadMonitoringFunction Dec 25, 2024
@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Dec 25, 2024
@voonhous voonhous marked this pull request as ready for review February 3, 2025 11:40
@voonhous voonhous requested a review from danny0405 February 3, 2025 11:40
@github-actions github-actions bot added size:L PR with lines of changes in (300, 1000] and removed size:M PR with lines of changes in (100, 300] labels Feb 3, 2025
@voonhous
Copy link
Member Author

voonhous commented Feb 3, 2025

@danny0405 Apologies that this took so long, it is ready for review now.

@hudi-bot
Copy link

hudi-bot commented Feb 4, 2025

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

getClass().getSimpleName(), issuedInstant, issuedOffset, conf.get(FlinkOptions.TABLE_NAME), path);
}
this.issuedInstant = retrievedStates.get(0);
this.issuedOffset = retrievedStates.get(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expected this be a simple change, check the issuedOffset against the timeline for the first time of read, if the issuedOffset is an instant time, then just switch to it's completion time or modification time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the misunderstanding, i do not quite understand what you mean, I was under the impression that this would be a backward compatibility improvement when migrating from a Hudi 0.x table to a Hudi 1.0 table.

From what i understand:

version attribute available in state?
0.x issuedInstant true
0.x issuedOffset false
1.x issuedInstant true
1.x issuedOffset true

Hence, i was under the impression that we needed a way to add a issuedOffset when upgrading from 0.x, where issuedOffset will be null in the state backend during first read after upgrade.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:L PR with lines of changes in (300, 1000]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants