Semantic Dedup doesn't work with UCX #283

Open
praateekmahajan opened this issue Oct 8, 2024 · 2 comments
Labels: bug (Something isn't working)

Comments

praateekmahajan (Collaborator) commented Oct 8, 2024

Describe the bug

Semantic Dedup often hangs at the call to semantic_cluster_dedup.extract_dedup_data.

Steps/Code to reproduce bug

Run semantic dedup with a client created as client = get_client(device_type='gpu', protocol='ucx').

Environment overview

Tried with cudf-cu12==24.8.*, cudf-cu12==24.10.a and cudf-cu12==24.12.a.

The same run succeeds when protocol='tcp'.
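
To tell a real hang from a slow run when comparing the two protocols, it can help to wrap the stuck step in a deadline. This is only a sketch: run_with_deadline is a hypothetical helper, not part of any library, and the lambdas stand in for a function that would build the client with the protocol under test and call extract_dedup_data.

```python
import threading
import time


def run_with_deadline(step, timeout_s):
    """Run step() in a daemon thread and wait at most timeout_s seconds.

    Returns (finished, result, error, elapsed_s). A step that is still
    running at the deadline is reported as finished=False; the thread is
    not killed, it is simply abandoned (daemon threads do not block exit).
    """
    outcome = {}

    def target():
        try:
            outcome["result"] = step()
        except Exception as exc:  # report the failure instead of losing it
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    start = time.perf_counter()
    worker.start()
    worker.join(timeout_s)
    elapsed_s = time.perf_counter() - start
    finished = not worker.is_alive()
    return finished, outcome.get("result"), outcome.get("error"), elapsed_s


# Stand-in steps (assumptions): replace each lambda with the real call
# made under protocol='tcp' or protocol='ucx'.
steps = {
    "quick step": lambda: "done",
    "stuck step": lambda: time.sleep(0.5),
}
for name, step in steps.items():
    finished, result, error, elapsed_s = run_with_deadline(step, timeout_s=0.05)
    status = "finished" if finished else "still running at deadline"
    print(f"{name}: {status} after {elapsed_s:.2f}s")
```

For the real workload the deadline would need to be well above the time the tcp run takes, otherwise a slow but healthy ucx run would be misreported as stuck.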

praateekmahajan added the bug (Something isn't working) label Oct 8, 2024
praateekmahajan (Collaborator, Author) commented

Also, from a quick experiment, the classifiers (domain / quality) appear to be about 30% slower when using UCX.

praateekmahajan (Collaborator, Author) commented

We should check whether PR #80, "Patch Distributed UCX comms to allow configuring connect timeout" (docs here), helps resolve this issue.
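
For reference, a minimal sketch of raising the connect timeout through Dask's configuration. It assumes the relevant setting is Dask's distributed.comm.timeouts.connect key and that the UCX comm honors it once the patch above is in place; neither is confirmed in this thread. The helper name and the "120s" value are illustrative only.

```python
import os

# Dask maps environment variables of the form DASK_A__B__C to the config
# key a.b.c, so this variable corresponds to distributed.comm.timeouts.connect.
CONNECT_TIMEOUT_ENV = "DASK_DISTRIBUTED__COMM__TIMEOUTS__CONNECT"


def set_dask_connect_timeout(value="120s"):
    """Export the connect timeout so Dask picks it up when its config loads.

    Call this before importing dask or distributed, and before starting the
    cluster, so that worker processes inherit the variable. The default of
    "120s" is an arbitrary example, not a recommended value.
    """
    os.environ[CONNECT_TIMEOUT_ENV] = value
    return value


set_dask_connect_timeout("120s")
print(CONNECT_TIMEOUT_ENV, "=", os.environ[CONNECT_TIMEOUT_ENV])
```

If the hang disappears with a longer timeout, that would point at slow UCX endpoint setup rather than a deadlock in the dedup step itself.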
