
Search Engine is a project that implements a basic search engine using C++, Python, and Cython. It builds a reverse index and ranks pages with the PageRank algorithm based on keyword relevance and page importance.


BianchTech/Search_Engine


Search Engine




Search Engine is a simple, efficient engine that builds a reverse index for keyword searching and ranks results using the PageRank algorithm.

⚙️ Installation

The project is still in alpha and its implementation is at an early stage, so install it inside a virtual environment created with venv:

python3 -m venv .env
source .env/bin/activate
pip install search-engine-cpp

🚀 Usage

from search_engine.crawler import Crawler

crawler = Crawler("https://en.wikipedia.org", "/wiki/", "Cat", test_mode=True)
graph = crawler.run(limit=10)
my_dict = graph.compute_page_rank()
# Sort the entries by score, highest first, and keep the top three.
top = sorted(my_dict.items(), key=lambda item: item[1], reverse=True)[:3]

print(top)

📋 Requirements for Contributions

Before compiling the project, ensure your environment meets the following requirements:

  • CMake 3.10 or higher
  • Google Test for unit testing
  • A compiler that supports C++11 or later

📂 Project Structure

The project is organized as follows:

  • src/: Main implementation of the search engine, including reverse indexing and the PageRank algorithm.
  • tests/: Unit tests to verify the functionality of the system.
  • CMakeLists.txt: Configuration file for building the project with CMake.

🔧 Building the Project

To compile the project, follow these steps:

  1. Create a build directory and navigate into it:

    mkdir build && cd build
  2. Run CMake to generate the build files:

    cmake ..
  3. Compile the project using make:

    make

🧪 Running Tests for Contributions

Run unit tests to ensure the correctness of the system.

  1. After building the project, navigate to the build directory and execute:

    ./tests/unit-tests/LibUnitTests

This will run the tests covering search engine functionality, reverse indexing, and the PageRank algorithm.


🏃 Running Examples for Contributions

First, build the project by running:

poetry install
poetry build

After building it, run this command to see the library working:

poetry run python Examples/graph_example.py

⚙️ How It Works

  • Reverse Indexing: Maps each keyword to the documents where it appears (this structure is commonly called an inverted index).
  • PageRank: An algorithm that assigns an importance score to each document based on the structure of the links between documents.
  • Querying: Searches for documents related to a keyword and ranks them according to their PageRank score.
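
The three steps above can be sketched in plain Python. This is only an illustration of the general technique, not the library's actual implementation or API: the function names (build_reverse_index, page_rank, query), the damping factor of 0.85, the iteration count, and the toy corpus are all assumptions made for this example.

```python
from collections import defaultdict


def build_reverse_index(docs):
    """Map each lowercase keyword to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def page_rank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                # Each page shares its rank equally among its outgoing links.
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


def query(keyword, index, rank):
    """Return documents containing the keyword, highest PageRank first."""
    hits = index.get(keyword.lower(), set())
    return sorted(hits, key=lambda d: rank[d], reverse=True)


# Hypothetical toy corpus: page texts and the links between pages.
docs = {
    "a": "cats are small mammals",
    "b": "dogs and cats are pets",
    "c": "birds can fly",
}
links = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

index = build_reverse_index(docs)
rank = page_rank(links)
results = query("cats", index, rank)
print(results)
```

In this toy graph, page "b" is linked from both "a" and "c", so it ends up with the highest score and is listed first among the pages that contain "cats".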

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


👥 Contributors

We welcome all contributions to this project! Please make sure to follow the guidelines outlined in the CONTRIBUTING.md file.
Thanks to all contributors


Join the BianchTech Open-Source Community! 🚀

Be part of a growing community focused on innovation and collaboration! Contribute to impactful open-source projects, learn, and grow alongside like-minded developers.

💡 Ready to join? Just drop us a message in the Discussions section on GitHub. Let’s build the future together! 🌟


Keep learning,
Pedro;)