Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-8844] Support secondary and expression index creation through async indexer and write configs #12653

Merged
merged 9 commits into from
Jan 25, 2025

Conversation

codope
Copy link
Member

@codope codope commented Jan 16, 2025

Change Logs

  • Added new write configs for secondary index (SI) and expression index (EI) conformant to sql syntax for create index.
  • Use those configs to check if we need to update new SI/EI definition in index definition file.
  • Initialize SI/EI if a new one is being created through write configs.
  • Add asyn indexer and spark datasource to validate index gets built successfully.

Impact

Support secondary and expression index creation through async indexer and write configs.

Risk level (write none, low medium or high below)

medium

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Jan 16, 2025
Copy link
Contributor

@lokeshj1703 lokeshj1703 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @codope for working on this! Please find my comments below.
Can we also add tests for creation of expression index with write configs?
Couple of cases we can cover:-

  1. Creation of two or more expression index using write configs.
  2. Further upserts without removing the expression index configs should not recreate the index
  3. Failure scenarios with incomplete configs

@codope codope force-pushed the hudi-8844-si-indexer branch from 8021eba to 7baa38c Compare January 20, 2025 11:30
@lokeshj1703
Copy link
Contributor

@codope We would also need to look into indempotency of the index creation using dataframe.

@@ -133,6 +133,7 @@ public Option<HoodieIndexPlan> execute() {

private HoodieIndexPartitionInfo buildIndexPartitionInfo(MetadataPartitionType partitionType, HoodieInstant indexUptoInstant) {
// for expression index, we need to pass the index name as the partition name
// TODO: see the index partition info is built correctly. Should we register the index here?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove the comment then?

@codope codope force-pushed the hudi-8844-si-indexer branch from 7baa38c to 4864f44 Compare January 23, 2025 17:52
* For the second time, the index definition file will be updated if exists.
* Table Config is updated if necessary.
*/
public static void register(HoodieTableMetaClient metaClient, HoodieIndexDefinition indexDefinition) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to reviewer: this is not new logic. Just moved the util method from HoodieSparkIndexClient to this util class.

}
}

public static HoodieIndexDefinition getSecondaryOrExpressionIndexDefinition(HoodieTableMetaClient metaClient, String userIndexName, String indexType, Map<String, Map<String, String>> columns,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to reviewer: this is not new logic. Just moved the util method from HoodieSparkIndexClient to this util class.

return metaClient.getTableConfig().getMetadataPartitions().stream().anyMatch(partition -> partition.equals(indexName));
}

private static boolean isEligibleForSecondaryOrExpressionIndex(HoodieTableMetaClient metaClient, String indexType, Map<String, String> options, Map<String, Map<String, String>> columns) throws Exception {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to reviewer: this is not new logic. Just moved the util method from HoodieSparkIndexClient to this util class.

@codope codope force-pushed the hudi-8844-si-indexer branch from 4864f44 to af9db02 Compare January 23, 2025 18:20
Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few minor comments

@codope codope force-pushed the hudi-8844-si-indexer branch from 4118634 to d1caff0 Compare January 24, 2025 17:46
@codope codope force-pushed the hudi-8844-si-indexer branch from d1caff0 to 137a2d1 Compare January 25, 2025 01:28
@codope codope force-pushed the hudi-8844-si-indexer branch from 137a2d1 to 3c71355 Compare January 25, 2025 07:40
@github-actions github-actions bot added size:XL PR with lines of changes > 1000 and removed size:L PR with lines of changes in (300, 1000] labels Jan 25, 2025
@apache apache deleted a comment from hudi-bot Jan 25, 2025
@codope
Copy link
Member Author

codope commented Jan 25, 2025

@hudi-bot run azure

@github-actions github-actions bot added size:L PR with lines of changes in (300, 1000] and removed size:XL PR with lines of changes > 1000 labels Jan 25, 2025
@hudi-bot
Copy link

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@codope codope merged commit 739b79a into apache:master Jan 25, 2025
43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:L PR with lines of changes in (300, 1000]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants