Dear Cell-ranger team,
It seems scan-rs still uses the fairly old vantage-point tree data structure for nearest neighbor search (NNS), a key step of UMAP and t-SNE. However, recent advances in nearest neighbor search, such as proximity-graph-based algorithms (e.g., HNSW, NSG), can be much faster while remaining accurate in terms of recall. More importantly, they can be efficiently parallelized. Beyond the NNS step, the other UMAP steps, including cross-entropy optimization and embedding-space initialization, are all single-threaded and therefore slow for large datasets with millions or billions of samples (datasets of that scale will soon be easy to obtain). I think the non-linear dimensionality reduction step could be improved and accelerated further.
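To illustrate the idea, here is a minimal, standard-library-only sketch of greedy search on a proximity graph with queries spread across threads. It is not how scan-rs or any HNSW/NSG library is implemented: the names `build_knn_graph`, `greedy_search`, and `parallel_search` are hypothetical, the graph is built by brute force purely for the demo, and a real index would use a beam of candidates and (for HNSW) multiple layers rather than a single greedy walk.

```rust
use std::thread;

type Point = Vec<f32>;

/// Squared Euclidean distance between two vectors of equal length.
fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Toy proximity graph: link every point to its `k` exact nearest neighbors.
/// This brute-force O(n^2) build is an assumption for the demo only; HNSW and
/// NSG construct their graphs incrementally and far more cheaply.
fn build_knn_graph(points: &[Point], k: usize) -> Vec<Vec<usize>> {
    (0..points.len())
        .map(|i| {
            let mut others: Vec<usize> = (0..points.len()).filter(|&j| j != i).collect();
            others.sort_by(|&a, &b| {
                dist2(&points[i], &points[a]).total_cmp(&dist2(&points[i], &points[b]))
            });
            others.truncate(k);
            others
        })
        .collect()
}

/// Greedy walk: from `entry`, hop to the neighbor closest to `query` and stop
/// when no neighbor improves on the current node (a local minimum).
fn greedy_search(points: &[Point], graph: &[Vec<usize>], entry: usize, query: &[f32]) -> usize {
    let mut current = entry;
    let mut best = dist2(&points[current], query);
    loop {
        let mut improved = false;
        for &n in &graph[current] {
            let d = dist2(&points[n], query);
            if d < best {
                best = d;
                current = n;
                improved = true;
            }
        }
        if !improved {
            return current;
        }
    }
}

/// Queries are independent of each other, so they can be split into chunks
/// and searched concurrently against the shared, read-only graph.
fn parallel_search(
    points: &[Point],
    graph: &[Vec<usize>],
    queries: &[Point],
    threads: usize,
) -> Vec<usize> {
    let threads = threads.max(1);
    let chunk = ((queries.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        // Spawn all workers first, then join, so the chunks run concurrently.
        let handles: Vec<_> = queries
            .chunks(chunk)
            .map(|qs| {
                s.spawn(move || {
                    qs.iter()
                        .map(|q| greedy_search(points, graph, 0, q))
                        .collect::<Vec<usize>>()
                })
            })
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    // 100 points on a line, so the 2-NN graph is a simple chain.
    let points: Vec<Point> = (0..100).map(|i| vec![i as f32, 0.0]).collect();
    let graph = build_knn_graph(&points, 2);
    let queries: Vec<Point> = vec![vec![10.2, 0.0], vec![57.8, 0.0], vec![99.0, 0.0]];
    let found = parallel_search(&points, &graph, &queries, 2);
    println!("nearest indices: {:?}", found);
}
```

The point of the sketch is only that each query touches a small part of the graph and shares no mutable state with other queries, which is why this family of methods scales well across cores.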
Thanks,
Jianshu