Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frechet Distance Implementation #2

Open
atharvaaalok opened this issue Feb 12, 2025 · 3 comments
Open

Frechet Distance Implementation #2

atharvaaalok opened this issue Feb 12, 2025 · 3 comments

Comments

@atharvaaalok
Copy link
Owner

To implement the Frechet Distance.

A list of repositories:

Other resources:
Paper Walking your frog fast in 4 LoC

@JSS95
Copy link

JSS95 commented Feb 13, 2025

Hello, Atharva. Here is some information you might find helpful:

  • Many packages claim to provide the Fréchet distance, but most of them implement only the discrete Fréchet distance (DFD). However, DFD is sufficient for most cases, so if DFD meets your needs, I suggest finding a reliable package and importing it into your GeoSimilarity project.

  • The continuous Fréchet distance (CFD) can be exactly computed only for polygonal chains. Of course, you can approximate any curve using a polygonal chain and compute the CFD. However, achieving a good approximation requires many chain segments, making CFD computationally unscalable. The better approach is to sample a large number of points from the curve and use DFD, which remains scalable.

  • Fred is one of the few packages that actually supports CFD. It provides parallel computation, which is not (yet) supported in my CurveSimilarities. If you really need CFD, you may want to use Fred instead.

Judging from your logo, I assume you also want to find the optimal matching of points between two curves.

  • Classical DFD and CFD algorithms do not explicitly consider optimal matching because they are max-based measures. To define an optimal matching between all points on the curves, you need a summation-based (or integral-based) measure.

  • However, DFD and CFD can be modified with additional conditions to define a "correct" matching. This is known as "locally correct Fréchet matching," introduced by K. Buchin in 2019. I had planned to implement it in CurveSimilarities, but it is not yet complete. IIRC, I did implement the locally correct CFD, so you may try it if you're interested.

  • Dynamic time warping (DTW) is a discrete, summation-based counterpart of DFD. DTW has been widely used for several decades, and you can easily find good packages.

  • Integral Fréchet distance (IFD) is a continuous, integral-based counterpart of CFD. It was introduced by M. Buchin in 2007 but has not been extensively studied. It is also referred to as continuous dynamic time warping (CDTW).

  • A. Maheshwari introduced an approximation algorithm for IFD in 2018, and K. Buchin developed an exact algorithm for CDTW in 2022, though it is restricted to 1D sequences. Both algorithms are practically unscalable unless your polygonal chains have very low complexity.

  • CurveSimilarities provides an experimental algorithm to approximate IFD, but it does not guarantee asymptotic convergence.

Finally, according to the README, it seems you aim to define a loss function between surfaces as well.

  • IIRC, the Fréchet distance can be defined between surfaces, but it is generally NP-hard. For real-valued surfaces, refer to this paper. (I haven't check this yet).

  • I am unsure whether there are extensions of DFD, DTW, and IFD for surfaces.

@avitase
Copy link

avitase commented Feb 13, 2025

Hi @atharvaaalok, here are some comments from my end as well:

  • I suggest viewing the Fréchet distance as a function acting on point-wise distance matrices rather than on curves directly. For example, for two polygonal curves
    p = [[0, 0], [3, 4], [7, 4], [7, 8], [5, 3], [9, 3], [11, 5]]
    and
    q = [[3, 0], [1, 6], [9, 6], [11, 8]]
    the Euclidean distance (in 2D) induces the distance matrix
>>> d = torch.cdist(p, q)
>>> d
tensor([[ 3.0000,  6.0828, 10.8167, 13.6015],
        [ 4.0000,  2.8284,  6.3246,  8.9443],
        [ 5.6569,  6.3246,  2.8284,  5.6569],
        [ 8.9443,  6.3246,  2.8284,  4.0000],
        [ 3.6056,  5.0000,  5.0000,  7.8102],
        [ 6.7082,  8.5440,  3.0000,  5.3852],
        [ 9.4340, 10.0499,  2.2361,  3.0000]])

and the Fréchet distance of this matrix is the element

>>> d[4, 2].item()
5.0

Hence, the API shouldn't be frechet_distance(p, q) but frechet_distance(d) where it's left to the user to decide how to cook up d by, for example, taking a Euclidean distance, a great-circle distance, or the Manhatten distance.

  • Being agnostic to the underlying metric is not a unique feature of the discrete Fréchet distance, but I am not aware of an efficient implementation of the continuous Fréchet distance that works for, e.g., great-circle distances.
  • The discrete Fréchet distance approximates the continuous Fréchet distance, bounded by the step size of the discretization, and the latter is a proper metric by, for example, obeying the triangle inequality. IMO, this measure is therefore much more powerful than others, most prominently, DTW.
  • As already alluded to above, the Fréchet distance only selects one element from a given distance matrix, hence its derivative w.r.t. this matrix is binary and extremely sparse: Ignoring edge cases, all derivatives are zero except for a single element at the largest distance; here the derivative is one. Therefore, I even suggest implementing the Fréchet distance as an arg_* version that directly returns the index ((4, 2) in the example above), rather than the element (5.0).
  • Here is a layman's implementation. Use it like this:
>>> x = p.detach().clone().requires_grad_(True)
>>> d = torch.cdist(x, q)
>>> arg_frechet_distance(d)
(4, 2)
>>> fd = FrechetDistance.apply(d)
>>> fd
tensor(5., grad_fn=<FrechetDistanceBackward>)
>>> fd.backward()
>>> x.grad
tensor([[ 0.0000,  0.0000],
        [ 0.0000,  0.0000],
        [ 0.0000,  0.0000],
        [ 0.0000,  0.0000],
        [-0.8000, -0.6000],
        [ 0.0000,  0.0000],
        [ 0.0000,  0.0000]])

@JSS95
Copy link

JSS95 commented Feb 13, 2025

@avitase 's comments are correct. I would like to add some more points:

  • I think it's impossible to find CFD from d.
  • dfd(d) is the way to go. But if you want to make the function call consistent with other similarity measures, you can make it polymorphic and allow both dfd(p, q, metric='euclidean') and dfd(d, metric='precomputed'). scikit-learn does this for manifold learning.
  • Implementing arg_dfd() will be easy. If you need arg_cfd(), check curvesimilarities.significant_events()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants