UMAP-rs not efficient #2

jianshu93 · 2023-11-10T19:42:24Z

Dear Cell Ranger team,

It seems scan-rs still uses the very old vantage-point data structure for nearest neighbor search, a key step of both UMAP and t-SNE. However, recent breakthroughs in nearest neighbor search, e.g., proximity-graph-based algorithms such as HNSW and NSG, can be much faster while remaining accurate in terms of recall. More importantly, they can be parallelized efficiently. In addition to the NNS step, the other UMAP steps, including cross-entropy optimization and embedding-space initialization, are all single-threaded and thus slow for large datasets with millions or billions of samples (datasets of this scale will soon be easy to obtain). I think the non-linear dimensionality reduction step can be further improved and accelerated.
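To illustrate the parallelization point, here is a minimal sketch that splits the k-nearest-neighbor queries across threads using only the Rust standard library. It uses exact brute-force search as a stand-in, not HNSW or NSG, and the names `sq_dist`, `knn_one`, and `knn_parallel` are hypothetical helpers for this example, not part of the scan-rs API.

```rust
use std::thread;

/// Squared Euclidean distance between two equal-length vectors.
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact k nearest neighbors of point `i` by brute force (O(n) distance
/// evaluations per query). A graph index would replace this function.
fn knn_one(data: &[Vec<f32>], i: usize, k: usize) -> Vec<usize> {
    let mut cand: Vec<(f32, usize)> = data
        .iter()
        .enumerate()
        .filter(|(j, _)| *j != i) // skip the query point itself
        .map(|(j, p)| (sq_dist(&data[i], p), j))
        .collect();
    // Sort by distance, breaking ties by index so results are deterministic.
    cand.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
    cand.into_iter().take(k).map(|(_, j)| j).collect()
}

/// Hypothetical helper: k-NN lists for all points, with the queries split
/// into contiguous chunks that are processed on separate threads.
fn knn_parallel(data: &[Vec<f32>], k: usize, n_threads: usize) -> Vec<Vec<usize>> {
    let n = data.len();
    let mut out: Vec<Vec<usize>> = vec![Vec::new(); n];
    if n == 0 {
        return out;
    }
    let t = n_threads.max(1);
    let chunk = ((n + t - 1) / t).max(1); // ceil(n / t), at least 1
    thread::scope(|s| {
        // Each thread owns a disjoint slice of the output, so no locking is needed.
        for (c, slot) in out.chunks_mut(chunk).enumerate() {
            s.spawn(move || {
                for (off, dst) in slot.iter_mut().enumerate() {
                    *dst = knn_one(data, c * chunk + off, k);
                }
            });
        }
    });
    out
}
```

Calling `knn_parallel(&data, k, n_threads)` returns one neighbor-index list per input point. Because each query only reads the shared data and writes its own output slot, the same splitting pattern would apply to queries against a proximity-graph index.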

Thanks,

Jianshu
