Integrate Kalman Filter-based Torrent Health Estimation #8188

grimadas · 2024-10-03T11:10:14Z

The problem

As highlighted in this comment, relying solely on self-assessments isn’t scalable. Navigating through a sea of misleading or fake health signals is challenging. We need a mechanism to (1) filter out spam and irrelevant information and (2) reliably rank popularity and emerging trends.

Solution

Why not apply some tried-and-true signal processing techniques to see if they can cut through the noise?

My plan is to integrate a Kalman Filter-based algorithm into Tribler to estimate torrent health and filter out dead torrents based on seeder reports. At the moment, I have a prototype that uses the filterpy library, specifically its Unscented Kalman Filter (UKF) implementation. This algorithm allows us to combine seeder reports from various peers while accounting for measurement noise and adjusting for the reliability scores of different sources. It is also pretty fast to run.
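To make the idea concrete, here is a minimal sketch of fusing seeder reports with a plain scalar Kalman filter in pure Python. It is not the UKF prototype and does not use filterpy; the names `SeederEstimate`, `predict`, `update`, the `drift_per_s` parameter, and the way reliability scales the measurement noise are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SeederEstimate:
    """Scalar Kalman state: estimated seeder count and its variance."""
    mean: float
    variance: float


def predict(state: SeederEstimate, elapsed_s: float, drift_per_s: float = 0.01) -> SeederEstimate:
    """Time update: the mean is kept, the uncertainty grows with elapsed time."""
    # Assumption: a random-walk model whose variance grows linearly in time.
    return SeederEstimate(state.mean, state.variance + drift_per_s * elapsed_s)


def update(state: SeederEstimate, reported: float, base_noise: float, reliability: float) -> SeederEstimate:
    """Measurement update: less reliable sources get a larger noise variance."""
    # Assumption: reliability in (0, 1] divides the base measurement noise.
    noise = base_noise / max(reliability, 1e-6)
    gain = state.variance / (state.variance + noise)
    mean = state.mean + gain * (reported - state.mean)
    variance = (1.0 - gain) * state.variance
    return SeederEstimate(mean, variance)


# Usage: fold in three reports, the last one from a barely trusted peer.
state = SeederEstimate(mean=0.0, variance=100.0)
for reported, reliability in [(12, 0.9), (10, 0.8), (500, 0.05)]:
    state = predict(state, elapsed_s=60)
    state = update(state, reported, base_noise=4.0, reliability=reliability)
```

Note that down-weighting alone does not fully neutralise the report of 500 seeders: it still pulls the estimate upwards, which is why a separate outlier check is useful.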

To adapt to the dynamic nature of torrent networks, I have made a few adjustments:

Torrent health checks are performed at different time intervals, so each one is considered reliable only to a certain degree; the model includes mechanisms to estimate the likelihood that a torrent's health has changed over time.
Outliers in health reports are defined as values lying outside a 95-99% confidence interval.
If a peer consistently provides unreliable reports, its reputation is decreased drastically. If a report seems valid, the reputation score is slightly increased.
These reputation scores are then incorporated as weights in the predict_health function, which computes the current best estimate of torrent health at a given timestamp.
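The outlier rule and the asymmetric reputation update described above could look roughly like the sketch below. The function names, the `penalty` and `reward` values, and the choice of a multiplicative drop versus an additive gain are assumptions, not the prototype's actual code. The z-values 1.96 and 2.576 are the standard two-sided bounds for 95% and 99% intervals of a normal distribution.

```python
import math


def is_outlier(reported: float, predicted: float, variance: float,
               noise: float, z: float = 1.96) -> bool:
    """Flag a report that falls outside the z-sigma interval around the prediction.

    z=1.96 corresponds to a 95% interval, z=2.576 to a 99% interval.
    """
    # The spread of the expected report combines state and measurement variance.
    sigma = math.sqrt(variance + noise)
    return abs(reported - predicted) > z * sigma


def update_reputation(score: float, outlier: bool,
                      penalty: float = 0.5, reward: float = 0.02) -> float:
    """Asymmetric update: sharp multiplicative drop on outliers, small capped gain otherwise."""
    if outlier:
        return score * penalty
    return min(1.0, score + reward)


# Usage: a peer reporting 500 seeders when roughly 10 are expected loses half its score.
score = 0.8
flagged = is_outlier(reported=500, predicted=10.0, variance=4.0, noise=4.0)
score = update_reputation(score, flagged)
```

With these hypothetical defaults, a single bad report costs as much as many good reports earn, so a peer cannot cheaply alternate between honest and fake reports.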

Development plan:

Integrate the current prototype into the Tribler client and run it locally to test its effectiveness using real network health checks. Evaluate how adequate the algorithm is.
Numerical examples with real data, plus a performance analysis.
Refactor the Kalman Filter to use only numpy, removing the reliance on scipy to keep the solution lightweight (the scipy dependency is too heavy).
Experimental release

adlai · 2024-10-11T09:47:13Z

Why are both scipy and numpy together considered too much, if numpy alone is not?

qstokkink · 2024-11-27T13:34:57Z

This approach seems viable. As a small POC, I stripped out both numpy and scipy: https://gist.github.com/qstokkink/823c566d532c4d3556fd100f7d9105e6

As an added benefit, the version without those libraries (which I named "quinten") is also ~10x faster.

synctext · 2024-11-28T19:59:03Z

Is 10 really a usable lowest seeder count? Users wait for a week sometimes to see if a seeder comes back.

grimadas · 2025-02-27T15:46:43Z

Some more nuances and inefficiencies of the current content discovery and torrent checker. I find some decisions a bit arbitrary, but I don't know if we want to change them:

The torrent checker uses three methods to get health info (Tracker, DHT, metadata fetch). Any non-zero health check result is then used, with a bias towards the highest number reported by any of the sources. The results are not combined; in the end, only data from a single source is used.
We send out 5 random torrent infos plus 5 requests (each covering 5 random torrents), and each of these potentially triggers more SQL requests. That eats up space and is IO-heavy, especially on HDDs. We need to add caching plus something smarter for torrent selection and gossip.
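As an illustration of the first point, the sketch below combines the three sources with inverse-variance weighting instead of picking the highest number. This is one possible alternative, not what the torrent checker does today; `fuse_reports` and the per-source variances are hypothetical and would need to be calibrated against real checks.

```python
from typing import Dict, Tuple


def fuse_reports(reports: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Combine (seeders, variance) pairs per source using inverse-variance weights.

    Returns the fused seeder estimate and its variance.
    """
    if not reports:
        raise ValueError("at least one report is required")
    if any(var <= 0 for _, var in reports.values()):
        raise ValueError("variances must be positive")
    # A source with a small variance (high trust) gets a large weight.
    weights = {name: 1.0 / var for name, (_, var) in reports.items()}
    total = sum(weights.values())
    mean = sum(weights[name] * value for name, (value, _) in reports.items()) / total
    return mean, 1.0 / total


# Usage: the variances here are made-up placeholders, not measured values.
estimate, variance = fuse_reports({
    "tracker": (40.0, 25.0),
    "dht": (12.0, 9.0),
    "metadata": (10.0, 4.0),
})
```

Compared with taking the maximum, a single inflated source no longer decides the outcome on its own, and the fused variance shrinks as more sources agree.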

qstokkink added type: enhancement component: content discovery labels Oct 4, 2024
