The problem
As highlighted in this comment, relying solely on self-assessments isn’t scalable. Navigating through a sea of misleading or fake health signals is challenging. We need a mechanism to (1) filter out spam and irrelevant information and (2) reliably rank popularity and emerging trends.
Solution
Why not apply some tried-and-true signal processing techniques to see if they can cut through the noise?
My plan is to integrate a Kalman Filter-based algorithm into Tribler to estimate torrent health and filter out dead torrents based on seeder reports. At the moment, I have a prototype that uses the filterpy library, specifically its Unscented Kalman Filter (UKF) implementation. This algorithm allows us to combine seeder reports from various peers while accounting for measurement noise and adjusting for the reliability scores of different sources. It is also pretty fast to run.
To adapt to the dynamic nature of torrent networks, I have made a few adjustments:
Health checks are performed at different times, so each one is considered reliable only to a certain degree; the model includes mechanisms to estimate how likely a torrent's health is to have changed over time.
Outliers in health reports are defined as values lying outside a 95-99% confidence interval.
If a peer consistently provides unreliable reports, its reputation is decreased drastically. If a report seems valid, the peer's reputation score is increased slightly.
These reputation scores are then incorporated as weights in the predict_health function, which computes the current best estimate of torrent health for a given timestamp.
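The adjustments above can be sketched with a much simpler filter than the prototype's UKF. The following is a minimal scalar Kalman filter, not the prototype itself: the class name `HealthFilter`, the noise constants, the starting reputation of 0.5 and the reputation step sizes are all assumptions chosen for illustration. Only the name `predict_health` comes from the description above, and its signature here is a guess.

```python
import math
from dataclasses import dataclass, field


@dataclass
class HealthFilter:
    """Scalar Kalman filter over the seeder count of one torrent (illustrative)."""
    x: float = 0.0            # current seeder estimate
    p: float = 1e4            # estimate variance (large: nothing known yet)
    t: float = 0.0            # timestamp of the last accepted report, in seconds
    q_per_sec: float = 0.01   # assumed drift variance added per second
    base_r: float = 25.0      # assumed report noise variance at reputation 1.0
    gate: float = 1.96        # about a 95% two-sided interval
    reputation: dict = field(default_factory=dict)

    def predict_health(self, now):
        """Return (estimate, variance) at time `now` without changing state."""
        dt = max(0.0, now - self.t)
        # Uncertainty grows with the time since the last accepted report.
        return self.x, self.p + self.q_per_sec * dt

    def report(self, peer, seeders, now):
        """Fuse one seeder report; return True if it was accepted."""
        rep = self.reputation.get(peer, 0.5)
        x_pred, p_pred = self.predict_health(now)
        r = self.base_r / max(rep, 0.05)   # low reputation: noisier report
        innovation = seeders - x_pred
        s = p_pred + r                     # innovation variance
        if abs(innovation) > self.gate * math.sqrt(s):
            # Outlier: ignore the value and cut the reputation sharply.
            self.reputation[peer] = rep * 0.5
            return False
        k = p_pred / s                     # Kalman gain
        self.x = x_pred + k * innovation
        self.p = (1.0 - k) * p_pred
        self.t = now
        # Plausible report: raise the reputation slightly, capped at 1.0.
        self.reputation[peer] = min(1.0, rep + 0.05)
        return True
```

For example, after `f = HealthFilter()`, two consistent reports such as `f.report("a", 100, now=0.0)` and `f.report("b", 104, now=60.0)` are fused, while a wildly different `f.report("spam", 5000, now=120.0)` is rejected and halves that peer's reputation. Because the variance grows with elapsed time, a genuine large change in health is eventually accepted once enough time has passed.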
Development plan:
Integrate the current prototype into the Tribler client and run it locally to test its effectiveness using real network health checks. Evaluate how adequate the algorithm is.
Numerical examples with real data. Performance analysis.
Refactor the Kalman Filter to use only numpy, removing the reliance on scipy (too heavy a dependency) to keep the solution lightweight.
Experimental release
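On the numpy-only refactor in the plan above: one piece a UKF needs beyond plain matrix products is a matrix square root for generating sigma points, and numpy's own Cholesky factorization can provide that. The sketch below builds scaled sigma points in the style of Van der Merwe's formulation using numpy alone. The function name `sigma_points` and the default parameters are assumptions for illustration, not the prototype's code or filterpy's API.

```python
import numpy as np


def sigma_points(x, P, alpha=0.5, beta=2.0, kappa=1.0):
    """Return (points, mean_weights, cov_weights) for state x and covariance P."""
    n = x.shape[0]
    lam = alpha ** 2 * (n + kappa) - n
    # Lower-triangular L with L @ L.T == (n + lam) * P; its columns are the offsets.
    L = np.linalg.cholesky((n + lam) * P)
    points = np.empty((2 * n + 1, n))
    points[0] = x
    for i in range(n):
        points[i + 1] = x + L[:, i]
        points[n + i + 1] = x - L[:, i]
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha ** 2 + beta)
    return points, wm, wc
```

The weighted mean of the returned points reproduces `x`, and the weighted outer products of their deviations reproduce `P`, which is a quick sanity check when swapping out a scipy-based square root.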
Some more nuances and inefficiencies of the current content discovery and torrent checker. I find some decisions a bit arbitrary, but I don't know if we want to change them:
The torrent checker uses three methods to get health info (Tracker, DHT, metadata fetch). Any non-zero health result is then used. There is a bias towards the highest number reported by any of the sources. The sources are not combined; in the end, only data from a single source is used.
We send out 5 random torrent infos plus 5 requests (5 random torrents each), and each of these can trigger further SQL queries. That eats up space and is I/O-heavy, especially on HDDs. We need to add caching plus something smarter for torrent selection and gossip.
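On the first point, one possible alternative to keeping only the highest number is to weight each source by how noisy it is assumed to be. The sketch below uses an inverse-variance weighted mean. The function name `fuse_sources`, the example readings and the variances are all hypothetical, and whether each source yields a comparable seeder count is itself an assumption.

```python
def fuse_sources(readings):
    """Inverse-variance weighted mean of (seeders, variance) pairs.

    Returns (estimate, variance), or None when no usable reading is given.
    """
    weighted = [(value, 1.0 / var) for value, var in readings if var > 0]
    if not weighted:
        return None
    total = sum(w for _, w in weighted)
    estimate = sum(v * w for v, w in weighted) / total
    return estimate, 1.0 / total


# Hypothetical readings: (reported seeders, assumed noise variance).
readings = {
    "tracker": (120, 400.0),
    "dht": (80, 100.0),
    "metadata": (90, 100.0),
}
highest = max(value for value, _ in readings.values())   # max-based behaviour
fused, fused_var = fuse_sources(readings.values())        # all sources combined
```

With these made-up numbers, the max-based choice reports 120 seeders, while the fused estimate lands close to 89 because the two lower readings are assumed to be less noisy. The fused variance is also smaller than that of any single source, which is the benefit of combining them.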