[HUDI-8850] Fix record merge mode related issues and improve test coverage in Spark SQL #12725

Merged (9 commits) on Jan 30, 2025

@yihua (Contributor) commented Jan 28, 2025

Change Logs

This PR fixes record merge mode related issues and improves test coverage in Spark SQL:

  • Table version 6 does not store the record merge mode in the table config. The file group reader now infers the record merge mode when it is missing from the table config, so it can read Hudi tables of version 6.
  • For INSERT INTO in Spark SQL, buildHoodieInsertConfig did not determine the record merge mode write config correctly. That logic is removed, since the SQL writer automatically picks up the correct record merge mode from the table config.
  • In MERGE INTO in Spark SQL, the record merge mode check is fixed to avoid NPE.
  • More tests are added in TestMergeModeCommitTimeOrdering and TestMergeModeEventTimeOrdering to cover different merge mode cases on both table version 6 and table version 8.
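
The inference and null-safety ideas in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the class `MergeModeSketch`, its methods, the property keys used here, and the mapping from payload class or ordering field to merge mode are all assumptions made for the example.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; names mirror concepts from the PR, not Hudi's actual classes.
final class MergeModeSketch {

    enum RecordMergeMode { COMMIT_TIME_ORDERING, EVENT_TIME_ORDERING, CUSTOM }

    // Property keys assumed for this illustration.
    static final String MERGE_MODE_KEY = "hoodie.record.merge.mode";
    static final String PAYLOAD_CLASS_KEY = "hoodie.compaction.payload.class";
    static final String ORDERING_FIELD_KEY = "hoodie.table.precombine.field";

    // Use the stored merge mode when present (newer table versions);
    // otherwise infer it from older properties (e.g. table version 6).
    static RecordMergeMode resolve(Map<String, String> tableConfig) {
        String configured = tableConfig.get(MERGE_MODE_KEY);
        if (configured != null && !configured.trim().isEmpty()) {
            return RecordMergeMode.valueOf(configured.trim().toUpperCase(Locale.ROOT));
        }
        return infer(tableConfig.get(PAYLOAD_CLASS_KEY), tableConfig.get(ORDERING_FIELD_KEY));
    }

    // Assumed inference rule: a known payload class decides the mode; with no
    // payload class, an ordering field implies event-time ordering.
    static RecordMergeMode infer(String payloadClass, String orderingField) {
        if (payloadClass == null || payloadClass.isEmpty()) {
            boolean hasOrderingField = orderingField != null && !orderingField.isEmpty();
            return hasOrderingField
                ? RecordMergeMode.EVENT_TIME_ORDERING
                : RecordMergeMode.COMMIT_TIME_ORDERING;
        }
        if (payloadClass.endsWith("OverwriteWithLatestAvroPayload")) {
            return RecordMergeMode.COMMIT_TIME_ORDERING;
        }
        if (payloadClass.endsWith("DefaultHoodieRecordPayload")) {
            return RecordMergeMode.EVENT_TIME_ORDERING;
        }
        return RecordMergeMode.CUSTOM;
    }

    // Null-safe check: comparing with == (constant on the left) never
    // dereferences a possibly-null mode, unlike mode.equals(...).
    static boolean isEventTimeOrdering(RecordMergeMode mode) {
        return RecordMergeMode.EVENT_TIME_ORDERING == mode;
    }

    public static void main(String[] args) {
        // A version-6-style config with no stored merge mode.
        Map<String, String> legacy = Map.of(
            PAYLOAD_CLASS_KEY, "org.apache.hudi.common.model.DefaultHoodieRecordPayload");
        System.out.println(resolve(legacy));
        System.out.println(isEventTimeOrdering(null));
    }
}
```

Resolving the mode in one place means readers and writers do not each need their own fallback logic, which is the design direction the INSERT INTO change describes. The null-safe comparison shows one common way to avoid an NPE when the mode is absent; the actual fix in MERGE INTO may differ.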

Impact

Fixes bugs related to record merge mode and improves test coverage.

Risk level

low

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Jan 28, 2025
@yihua yihua force-pushed the HUDI-8850-insert-merge-mode branch from 44ef790 to 806d7f7 Compare January 29, 2025 03:01
@nsivabalan nsivabalan added the release-1.0.1 Patches targetted for 1.0.1 release label Jan 29, 2025
@yihua yihua force-pushed the HUDI-8850-insert-merge-mode branch from 806d7f7 to 7d8234d Compare January 29, 2025 07:32
@github-actions github-actions bot added size:L PR with lines of changes in (300, 1000] and removed size:M PR with lines of changes in (100, 300] labels Jan 29, 2025
@yihua yihua force-pushed the HUDI-8850-insert-merge-mode branch from 7d8234d to 7419a76 Compare January 29, 2025 20:19
@yihua yihua changed the title from "[HUDI-8850] Fix determination of merge mode write config for INSERT INTO in Spark SQL" to "[HUDI-8850] Fix record merge mode related issues and improve test coverage in Spark SQL" Jan 30, 2025
@yihua yihua marked this pull request as ready for review January 30, 2025 05:29

@codope codope merged commit 27a950e into apache:master Jan 30, 2025
43 checks passed
linliu-code pushed a commit to linliu-code/hudi that referenced this pull request Jan 31, 2025
…erage in Spark SQL (apache#12725)

This PR fixes record merge mode issues and improves test coverage in Spark SQL:

- Table version 6 lacks record merge mode in the config. The file group reader now infers it if missing, enabling v6 table reads.
- For INSERT INTO in Spark SQL, buildHoodieInsertConfig didn't set record merge mode properly. The logic is now removed.
- The SQL writer now gets the correct record merge mode from the table config automatically.
- In MERGE INTO in Spark SQL, the record merge mode check is fixed to avoid NPE.
- More tests are added in TestMergeModeCommitTimeOrdering and TestMergeModeEventTimeOrdering for different merge modes. The tests cover both table version 6 and 8.
Labels

  • release-1.0.1: Patches targetted for 1.0.1 release
  • size:L: PR with lines of changes in (300, 1000]