Usecase? #1

Closed
samyogdhital opened this issue Feb 14, 2025 · 5 comments

Comments

@samyogdhital

Hello there.
I am running a self-hosted version of Firecrawl. I saw your comment in /dzhng/deep-research/issues/77.

I wanted to ask: have you configured crawlrouter yourself for the deep-research repo locally?

If so, how did you configure it?
I mean, is it hot-swappable, or do I have to do some configuration in the deep-research repo?

Thanks. I think this is an awesome tool.

@loorisr
Owner

loorisr commented Feb 14, 2025

Hello,

https://github.com/dzhng/deep-research needs a Firecrawl instance that exposes the /search endpoint. It only uses the /search endpoint, to get a SERP and scrape the resulting pages.

This repo lets you use, for example, SearxNG as a self-hosted search engine, and then use your self-hosted version of Firecrawl, Crawl4AI, or Jina (which can also be self-hosted) to scrape the pages.

In the deep-research repo you have to set FIRECRAWL_BASE_URL="http://localhost:8000" (or the IP where you run crawlrouter).
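
A minimal sketch of what pointing a client at crawlrouter through that variable could look like. It only builds the request and does not send it. The `/v1/search` path and the `query`/`limit` body fields are assumptions for illustration, and `build_search_request` is a hypothetical helper, not part of deep-research or crawlrouter.

```python
import json
import os
import urllib.request


def build_search_request(query: str, limit: int = 5) -> urllib.request.Request:
    """Build (but do not send) a POST request to the search endpoint."""
    # FIRECRAWL_BASE_URL is the variable named in this thread; the default
    # mirrors the localhost example given above.
    base_url = os.environ.get("FIRECRAWL_BASE_URL", "http://localhost:8000")
    # ASSUMPTION: the "/v1/search" path and the body fields are illustrative.
    url = base_url.rstrip("/") + "/v1/search"
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Because the base URL comes from the environment, switching between a hosted Firecrawl and a local crawlrouter would only need a change to the env file, not to the calling code.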

@samyogdhital
Author

> This repo allows you to use for example SearxNG as a self hosted search engine and then use your self-hosted version of Firecrawl or Crawl4AI or Jina (that can be self hosted) to scrape the pages.

Brother, I desperately need this. I was actively looking for a solution and was even ready to implement it myself. Thanks for this.
It's the exact use case I am working on too.

I am closely following this repo.

Is there any roadmap for this project?
What are you planning to do?

@samyogdhital
Author

@loorisr For deep-research, will just swapping the URL work, or do I have to make some changes?
I have not looked deeply into this repo. I am out today and will look into it tomorrow. If you have already done the integration, it would be really awesome to know, brother.

Again, I am following this project. If there is a roadmap, please let me know. I may as well help with the code.

@loorisr
Owner

loorisr commented Feb 14, 2025

I'm glad it can help :)

I'm using it with deep-research and it works fine. You just need to set FIRECRAWL_BASE_URL in the env file of deep-research to wherever you host crawlrouter.

Then on crawlrouter you need to set SEARCH_BACKEND to the one you want (for example, searxng) and SCRAPE_BACKEND to, for example, crawl4ai or firecrawl.
Of course, you need a working instance of SearxNG (with JSON mode activated), and the same for crawl4ai/firecrawl. This docker compose should work; it is very close to the one I'm using.
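
As a rough illustration of the two settings and of what "JSON mode" means on the SearxNG side, here is a small sketch. SEARCH_BACKEND and SCRAPE_BACKEND are the variable names from this comment. `load_backends` and `searxng_json_url` are hypothetical helpers, and the SearxNG base URL is a placeholder. The `format=json` query parameter only works if JSON output is enabled in the SearxNG settings.

```python
import os
from urllib.parse import urlencode


def load_backends(env=None) -> dict:
    """Read the two backend settings and fail early if one is missing."""
    env = os.environ if env is None else env
    search = env.get("SEARCH_BACKEND", "").strip().lower()
    scrape = env.get("SCRAPE_BACKEND", "").strip().lower()
    if not search or not scrape:
        raise ValueError("SEARCH_BACKEND and SCRAPE_BACKEND must both be set")
    return {"search": search, "scrape": scrape}


def searxng_json_url(base_url: str, query: str) -> str:
    """Build a SearxNG search URL that asks for JSON instead of HTML."""
    params = urlencode({"q": query, "format": "json"})
    return base_url.rstrip("/") + "/search?" + params
```

Opening the URL returned by `searxng_json_url` in a browser is a quick way to check that the SearxNG instance really returns JSON before wiring it into crawlrouter.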

For the roadmap, I'm currently implementing the /crawl endpoint. It will also speed up the /search endpoint when scrape mode is activated (by default /search only returns the URL, title, and a description of each link, but deep-research also needs the complete page scraped).
I will also flesh out the implementation a bit (options).
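
To make the difference between the default /search output and scrape mode concrete, here is a sketch under stated assumptions. It is not crawlrouter's implementation. The `markdown` field name and the `enrich_results` helper are hypothetical, and scraping the pages concurrently is just one plausible way to speed this step up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def enrich_results(
    results: list[dict],
    scrape: Callable[[str], str],
    max_workers: int = 4,
) -> list[dict]:
    """Return copies of the search results with scraped page content added."""
    urls = [r["url"] for r in results]
    # Scrape pages in parallel; pool.map keeps the input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = list(pool.map(scrape, urls))
    # ASSUMPTION: "markdown" is an illustrative field name for the page body.
    return [{**r, "markdown": page} for r, page in zip(results, pages)]
```

The `scrape` argument stands in for whichever scrape backend is configured, so the same function can be exercised with a stub before a real backend is available.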

After that, it will depend on what people need: other backends (https://scrapingant.com, ...) or other functions!

@samyogdhital
Author

This is a promising project.
