Block some high frequency spider bots (#2816)
This introduces rules to robots.txt to block some spiders, based on:

- Blocking spiders that exist only for LLM training or for the enrichment of their operating company, and offer no value to genuine searchers.
- Blocking crawlers from the package docs under /p when they hit those pages frequently but without obvious gains for genuine searchers.

These bots are behaving as worst-case users, based on our tests for ocaml/infrastructure#161: they are hitting our most expensive pages frequently, in patterns that bypass the cache (newly added by @mtelvers).
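A minimal sketch of what such robots.txt rules could look like. The bot names below are illustrative examples of LLM-training crawlers, not the exact list added by this commit; only the /p path comes from the change itself:

```
# Block crawlers that exist only for LLM training / company enrichment
# (illustrative user agents, not necessarily the exact set in this commit)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Keep a hypothetical high-frequency crawler off the expensive
# package docs under /p while leaving the rest of the site open
User-agent: ExampleHeavyBot
Disallow: /p
```

Each `User-agent` line starts a group, and the `Disallow` rules in that group apply only to the named crawler; well-behaved bots match the most specific group for their user agent.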