KnowledgeHunter

KnowledgeHunter is a research and report generation platform that combines AI models with real-time web data extraction to produce comprehensive research reports. It offers dynamic query processing, customizable report templates, and real-time progress updates.

Overview

KnowledgeHunter automates research workflows by:

  • Dynamic Query Processing: Intelligent trimming and parameter determination via AI models.
  • Iterative Research Loops: Automatically generates follow-up queries to refine and deepen research.
  • Multi-Modal Data Extraction: Uses Firecrawl to search the web, validate YouTube videos, and detect media content.
  • Vision Model Integration: Applies a vision model to analyze and validate image content, ensuring only high-quality images are considered.
  • Adaptive Research Modes: Choose between Quick Hunter (balanced, faster research) and Deep Hunter (comprehensive, detailed analysis) modes.
  • Rich Report Rendering: Produces final reports in Markdown, including table rendering and rich media embeds.
  • Real-Time Progress Indicators: Provides live progress updates over WebSockets so users can follow research in real time.
  • UI/UX Enhancements: A refreshed, responsive, and intuitive user interface with improved interactions, error boundaries, and clear status feedback.

Updated Research Flow

The updated process distinguishes between Quick Hunter and Deep Hunter modes, integrates vision-based image analysis and YouTube video validation, and enriches report rendering with markdown table support. Below is the updated Mermaid flowchart:

flowchart TD
  %% Main Entry Point
  A[handleResearch] -->|Orchestrates| B[determineResearchParameters]
  A --> C[researchQuery]
  A --> D[expandQuery]
  A --> E[isResearchSufficient]
  A --> F[formatReport]
  A --> G[generateClarifyingQuestions]

  %% researchQuery subflow
  C --> H[Firecrawl.search]
  C --> I[fetchWithTimeout HTML]
  I --> J[Extract HTML Content]
  J --> K[detectMediaContent]

  %% detectMediaContent subflow
  K --> L[Process YouTube URLs isYouTubeVideoValid]
  K --> M[Collect Image URLs]
  M --> N[analyzeImagesWithVision]
  N --> O[getImageDimensions]
  O --> K

  %% Post-processing in researchQuery
  C --> P[processBatchFindings]

  %% Helper Functions used across flows
  B -.-> Q[trimPrompt]
  P -.-> Q
  D -.-> Q
  E -.-> Q
  F -.-> Q
  G -.-> Q

  %% All OpenAI API calls use the completions.create helper
  B --> R[openai.chat.completions.create]
  P --> R
  D --> R
  E --> R
  F --> R
  G --> R

  %% Labels for clarity
  subgraph Helpers [Helper Functions]
    Q[trimPrompt]
    R[openai.chat.completions.create]
  end
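
The orchestration in the flowchart can be read as an iterative loop: research a query, check whether the findings are sufficient, and otherwise expand into follow-up queries. The sketch below is a minimal illustration of that reading, not the code in deep-research.ts. The `Finding` and `ResearchSteps` types, the `runResearchLoop` name, and the `maxDepth` parameter are all assumptions; only the step names `researchQuery`, `isResearchSufficient`, and `expandQuery` come from the flowchart.

```typescript
// Hypothetical types: the real signatures in deep-research.ts may differ.
interface Finding {
  query: string;
  summary: string;
}

interface ResearchSteps {
  researchQuery(query: string): Promise<Finding[]>;
  isResearchSufficient(findings: Finding[]): Promise<boolean>;
  expandQuery(query: string, findings: Finding[]): Promise<string[]>;
}

// Runs queries round by round until the findings are judged sufficient
// or the iteration budget (maxDepth, an assumed parameter) is exhausted.
async function runResearchLoop(
  initialQuery: string,
  steps: ResearchSteps,
  maxDepth = 3,
): Promise<Finding[]> {
  const findings: Finding[] = [];
  let queue: string[] = [initialQuery];

  for (let depth = 0; depth < maxDepth && queue.length > 0; depth++) {
    const nextQueue: string[] = [];
    for (const query of queue) {
      findings.push(...(await steps.researchQuery(query)));
      if (await steps.isResearchSufficient(findings)) {
        return findings; // stop early once the research is sufficient
      }
      // Otherwise collect follow-up queries for the next round.
      nextQueue.push(...(await steps.expandQuery(query, findings)));
    }
    queue = nextQueue;
  }
  return findings;
}
```

Passing the steps in as an object keeps the loop independent of Firecrawl and OpenAI, so it can be exercised with stubs.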

Key Features

Adaptive Research Modes: Quick Hunter vs Deep Hunter

  • Quick Hunter: Uses a fixed minimal set of research parameters for faster, balanced research output.
  • Deep Hunter: Dynamically calculates optimal breadth and depth based on query complexity to perform an in-depth investigation.
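
As a rough illustration of the two modes, the sketch below returns fixed parameters for Quick Hunter and bounded, externally suggested parameters for Deep Hunter. The names `ResearchMode`, `ResearchParameters`, and `resolveParameters`, as well as every numeric value, are assumptions made for this example; in the project the Deep Hunter values are determined by an AI model.

```typescript
type ResearchMode = "quick" | "deep";

interface ResearchParameters {
  breadth: number; // queries explored per round
  depth: number; // number of follow-up rounds
}

// Assumed fixed values for Quick Hunter; the project's real numbers may differ.
const QUICK_PARAMETERS: ResearchParameters = { breadth: 2, depth: 1 };

// Rounds a suggested value and clamps it into [min, max].
function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, Math.round(value)));
}

function resolveParameters(
  mode: ResearchMode,
  suggested?: Partial<ResearchParameters>,
): ResearchParameters {
  if (mode === "quick") {
    return { ...QUICK_PARAMETERS }; // suggestions are ignored in quick mode
  }
  // Deep mode: accept suggested values (e.g. from a model), within bounds.
  return {
    breadth: clamp(suggested?.breadth ?? 4, 1, 10),
    depth: clamp(suggested?.depth ?? 2, 1, 5),
  };
}
```

Clamping the suggested values keeps a single unusual model response from triggering an unbounded number of searches.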

Enhanced Media Analysis

  • Vision Model for Image Analysis: Uses one of OpenAI's vision-enabled models to analyze and validate image content before including it in the report.
  • YouTube Video Validation: Verifies YouTube video availability and validity by checking for error markers and parsing the embedded player response.
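
The player-response check could look roughly like the sketch below, which works on an already-fetched page as a string. It assumes the watch page embeds a JSON object assigned to `ytInitialPlayerResponse` and that a playable video reports `playabilityStatus.status` as `"OK"`; both are observations about the page format rather than a documented API, and the function names are hypothetical.

```typescript
// Assumption: the page contains `ytInitialPlayerResponse = {...}`.
const PLAYER_RESPONSE_MARKER = "ytInitialPlayerResponse = ";

// Returns the balanced {...} object that begins at `start`,
// ignoring braces that appear inside JSON string literals.
function extractJsonObject(text: string, start: number): string | null {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // braces never balanced
}

// True only when a player response is found and reports status "OK".
function isVideoPlayable(html: string): boolean {
  const markerIndex = html.indexOf(PLAYER_RESPONSE_MARKER);
  if (markerIndex === -1) return false;
  const objectStart = html.indexOf("{", markerIndex);
  if (objectStart === -1) return false;
  const json = extractJsonObject(html, objectStart);
  if (json === null) return false;
  try {
    const parsed = JSON.parse(json) as {
      playabilityStatus?: { status?: string };
    };
    return parsed.playabilityStatus?.status === "OK";
  } catch {
    return false; // malformed JSON is treated as not playable
  }
}
```

Treating every parse failure as "not playable" errs on the side of leaving a video out of the report rather than embedding a broken one.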

Improved Report Rendering & UI/UX

  • Markdown Table Rendering: Final reports can now include well-formatted markdown tables to clearly present comparative data.
  • Progress Indicators: Real-time progress feedback is provided throughout the research process, helping users monitor the evolving analysis.
  • UI/UX Improvements: A refreshed frontend with enhanced styling, intuitive interactions, and robust error boundaries (e.g., in markdown rendering) ensures a seamless user experience.

Real-Time Updates & Interactivity

  • Live Progress Updates: Uses WebSockets to push continuous progress updates (e.g., current query status, percentage complete, and interim findings).
  • Interactive Query Refinement: The system generates clarifying questions and follow-up queries, allowing users to refine their research inputs interactively.

Architecture & Technology Stack

Frontend

  • React & TypeScript: Modern, type-safe UI built with React.
  • Tailwind CSS: Highly customizable and responsive styling.
  • Framer Motion: Smooth animations for transitions and UI feedback.
  • Component Library: Reusable components for dialogs, forms, notifications, and more.

Backend

  • Express & Node.js: Robust REST API and WebSocket server.
  • PostgreSQL & Drizzle ORM: Structured, type-safe database interactions.
  • Clerk Authentication: Secure user authentication and session management.
  • Firecrawl & OpenAI Integration: For advanced web crawling, search, and AI-driven analysis.

Shared

  • Zod & Drizzle-Zod: Schema validation for both client and server.
  • Common Types: Shared schemas and types ensure consistency across the codebase.

Advanced Research Capabilities

  • Dynamic Query Processing: Intelligent trimming and parameter determination using AI models
  • Iterative Query Handling: Automatically loops over query iterations and generates follow-up questions
  • Web Data Extraction: Integrates with Firecrawl to search the web and detect media content
  • Context Accumulation: Compiles and analyzes findings, URLs, and media content
  • Real-Time Updates: Provides live progress updates via WebSockets

Report Customization & Rendering

  • Multiple Report Templates: Offers various styles and citation formats
  • Flexible Section Ordering: Lets users customize the report layout
  • Advanced Markdown Rendering: Robust markdown (with table support) is rendered through a dedicated component wrapped in error boundaries
  • Export Options: Supports PDF, DOCX, and HTML exports along with a live preview

User Experience

  • Clean, Modern Interface: A responsive, mobile-friendly design with dark mode support
  • Interactive Query Refinement: Real-time adjustments to research queries
  • Real-Time Progress: Keeps users informed with continuous updates

AI Integration & Technical Features

  • OpenAI-Powered Analysis: Uses state-of-the-art models for data summarization and analysis
  • Source Verification: Ensures accurate research outputs with context-aware follow-up queries
  • Error Resilience: Implements error boundaries (for example, in markdown rendering) to prevent crashes
  • Modern Tech Stack: Built using React, TypeScript, Express, PostgreSQL, Clerk authentication, and RESTful APIs

Prerequisites

Before you begin, ensure you have the following installed and configured:

  • Node.js 20.x or higher
  • PostgreSQL database
  • A Clerk account
  • An OpenAI API key
  • A Firecrawl API key

Environment Variables

Create a .env file in the root directory with the following variables:

# Database Configuration
DATABASE_URL=postgresql://user:password@host:port/database
PGHOST=your_pg_host
PGPORT=your_pg_port
PGUSER=your_pg_user
PGPASSWORD=your_pg_password
PGDATABASE=your_pg_database

# API Keys
OPENAI_API_KEY=your_openai_api_key
FIRECRAWL_API_KEY=your_firecrawl_api_key
CLERK_PUBLISHABLE_KEY=your_clerk_publishable_key
CLERK_SECRET_KEY=your_clerk_secret_key
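
A startup check can catch a missing variable before the server begins handling requests. The sketch below is one possible approach and is not part of the repository; the `findMissingEnvVars` name is hypothetical, and treating only `DATABASE_URL` and the four API keys as required is an assumption.

```typescript
// Variables assumed to be mandatory, taken from the .env listing above.
const REQUIRED_ENV_VARS = [
  "DATABASE_URL",
  "OPENAI_API_KEY",
  "FIRECRAWL_API_KEY",
  "CLERK_PUBLISHABLE_KEY",
  "CLERK_SECRET_KEY",
] as const;

// Returns the names of required variables that are unset or blank.
function findMissingEnvVars(
  env: Record<string, string | undefined>,
): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

At startup this could be called with `process.env`, failing with a clear message when the returned list is not empty.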

Setup Instructions

  1. Clone the repository:

git clone https://github.com/meffordh/KnowledgeHunter.git
cd KnowledgeHunter

  2. Install dependencies:

npm install

  3. Configure environment variables:

    • Create a .env file in the root directory
    • Add all required environment variables as shown above

  4. Start the development server:

npm run dev

Project Structure

├── client/                # Frontend React application
│   ├── src/
│   │   ├── components/   # Reusable UI components
│   │   ├── hooks/        # Custom React hooks
│   │   ├── lib/          # Utility functions and helper modules
│   │   └── pages/        # Page components
├── server/               # Backend Express application
│   ├── auth.ts          # Authentication setup
│   ├── deep-research.ts # Research logic (modularized for clarity)
│   ├── routes.ts        # API endpoints and routes
│   └── storage.ts       # Database interface
└── shared/              # Shared types and schemas

Additional Updates & Modularization

Recent Improvements (February 2025)

Enhanced URL Fetching and Content Processing

  • Smart URL Handling: Implemented timeout management (2000 ms by default) so that slow or unresponsive URLs fail fast instead of stalling the research run
  • Improved Error Detection: Added early detection and skipping of problematic URLs (PDFs, government sites)
  • Redirect Management: Enhanced handling of URL redirects with manual redirect mode to prevent redirect chains
  • Header Optimization: Added realistic browser-like headers to improve compatibility with various web servers
  • Graceful Error Recovery: Enhanced error logging and fallback mechanisms for different types of network failures
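
The points above can be sketched together as follows, using the `fetch` and `AbortController` globals available in Node.js 20. This is an illustration under stated assumptions, not the project's `fetchWithTimeout`: the function names are hypothetical, the skip rules (".pdf" paths and ".gov" hosts) are a guess at what "problematic URLs" means, and the header values are placeholders.

```typescript
// Assumed skip rules: unparseable URLs, non-HTTP schemes, PDFs, .gov hosts.
function shouldSkipUrl(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return true; // unparseable URLs are not worth fetching
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return true;
  if (url.pathname.toLowerCase().endsWith(".pdf")) return true;
  return url.hostname.endsWith(".gov");
}

// Fetches HTML with a hard timeout. Returns null when the URL is skipped,
// the response is not a 2xx (including redirects), or the request fails.
async function fetchHtmlWithTimeout(
  rawUrl: string,
  timeoutMs = 2000,
): Promise<string | null> {
  if (shouldSkipUrl(rawUrl)) return null;
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch(rawUrl, {
      signal: controller.signal,
      redirect: "manual", // do not follow redirect chains
      headers: {
        // Placeholder header values, not the ones used by the project.
        "User-Agent": "Mozilla/5.0 (compatible; ExampleResearchBot/1.0)",
        Accept: "text/html,application/xhtml+xml",
      },
    });
    if (!response.ok) return null;
    return await response.text();
  } catch {
    return null; // aborted by the timeout, or a network error
  } finally {
    clearTimeout(timer); // avoid leaving a pending timer behind
  }
}
```

Returning `null` instead of throwing lets the caller move on to the next URL without wrapping every call in its own error handling.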

Markdown Rendering Modularization

To avoid cluttering deep-research.ts, we extracted the markdown rendering into its own component. For example, in SafeMarkdown.tsx:

// SafeMarkdown.tsx
import React from 'react';
import ReactMarkdown from 'react-markdown';
import ErrorBoundary from './ErrorBoundary';

const SafeMarkdown = ({ content }: { content: string }) => (
  <ErrorBoundary fallback={<div>Error rendering markdown.</div>}>
    <ReactMarkdown>{content}</ReactMarkdown>
  </ErrorBoundary>
);

export default SafeMarkdown;

Error Boundaries

The new error boundary component ensures that any errors (e.g., "this.setData is not a function") during markdown rendering do not crash the entire application.

Real-Time Progress Updates

The sendProgress function (using WebSockets) provides users with live updates throughout the research process.
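
To make the idea concrete, the sketch below shows one way such a function could serialize and send an update. It is not the project's implementation: the `ProgressMessage` fields and status values are assumptions, and `SocketLike` is a minimal stand-in type so the example does not depend on a particular WebSocket library.

```typescript
// Hypothetical message shape; field names are assumptions for illustration.
interface ProgressMessage {
  status: "IN_PROGRESS" | "COMPLETED" | "ERROR";
  currentQuery?: string;
  percentComplete: number; // expected range 0 to 100
  findings?: string[];
}

// Minimal structural type covering the parts of a WebSocket used here.
interface SocketLike {
  readyState: number;
  send(data: string): void;
}

const SOCKET_OPEN = 1; // numeric value of WebSocket.OPEN

// Serializes a progress update and sends it if the socket is open.
// Returns true when the message was handed to the socket.
function sendProgress(socket: SocketLike, message: ProgressMessage): boolean {
  if (socket.readyState !== SOCKET_OPEN) return false;
  // Keep the percentage within 0..100 before it reaches the client.
  const percentComplete = Math.min(100, Math.max(0, message.percentComplete));
  socket.send(JSON.stringify({ ...message, percentComplete }));
  return true;
}
```

Checking `readyState` first means a client that disconnected mid-run does not cause the research process itself to throw.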

Contributing

Contributions, issues, and feature requests are welcome! Please check Issues or open a pull request.