Edge Node Knowledge Mining

Getting Started

Follow the instructions below to set up and run the project on your local machine.

Prerequisites

Ensure you have the following installed on your system:

- Node.js (v20 or higher)
- Redis (for BullMQ job queues)

Installation

Clone the repository:

git clone https://github.com/OriginTrail/edge-node-knowledge-mining-js
cd edge-node-knowledge-mining-js

Install the dependencies:
```
npm install
```
Make sure Redis is running on its default port (6379).

Configuration

The project requires environment variables to be set. Use the provided .env.example file as a template:

Copy .env.example to .env:
```
cp .env.example .env
```

Populate the .env file with the required values. Example:

PORT=5005
UI_ENDPOINT=http://localhost:5173
AUTH_SERVICE_ENDPOINT=http://localhost:3001

KNOWLEDGE_MINING_QUEUE=knowledge-mining-queue
KNOWLEDGE_MINING_CONCURRENCY=20

OPENAI_API_KEY=your_openai_api_key
UNSTRUCTURED_API_URL=your_unstructured_api_url
UNSTRUCTURED_API_KEY=your_unstructured_api_key
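
Catching a missing or malformed value before the service starts can be easier than debugging it later. The following is a minimal sketch of such a check, not the project's own startup code. It assumes that every variable in the example above is required except `PORT`, which falls back to the documented default of 5005; the function name `validateEnv` is hypothetical.

```javascript
// Hypothetical pre-flight check for the variables listed in .env.example.
// Assumption: all keys below are required; PORT is optional (default 5005).
const REQUIRED_KEYS = [
  'UI_ENDPOINT',
  'AUTH_SERVICE_ENDPOINT',
  'KNOWLEDGE_MINING_QUEUE',
  'KNOWLEDGE_MINING_CONCURRENCY',
  'OPENAI_API_KEY',
  'UNSTRUCTURED_API_URL',
  'UNSTRUCTURED_API_KEY',
];

// Returns a list of human-readable problems; an empty list means the
// environment object passed the checks.
function validateEnv(env) {
  const problems = [];

  for (const key of REQUIRED_KEYS) {
    if (!env[key] || env[key].trim() === '') {
      problems.push(`${key} is missing`);
    }
  }

  // PORT falls back to the documented default when it is not set.
  const port = Number(env.PORT ?? 5005);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    problems.push('PORT must be an integer between 1 and 65535');
  }

  // Only validate the number format when a value was provided at all.
  if (env.KNOWLEDGE_MINING_CONCURRENCY) {
    const concurrency = Number(env.KNOWLEDGE_MINING_CONCURRENCY);
    if (!Number.isInteger(concurrency) || concurrency < 1) {
      problems.push('KNOWLEDGE_MINING_CONCURRENCY must be a positive integer');
    }
  }

  return problems;
}
```

Calling `validateEnv(process.env)` after the `.env` file has been loaded (for example with Dotenv) returns an empty array when everything looks plausible.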

Usage

Start the service:
```
npm start
```
The service will start on the configured port (default: 5005).

API Routes

1. Trigger Pipeline

POST /trigger-pipeline

This endpoint triggers a knowledge mining pipeline with a file upload.

Request:

Headers:
- Authorization: Bearer token for authentication.
Body (form-data):
- pipelineId (string, required): The ID of the pipeline to trigger. The ID is the name of the file in which the pipeline is defined (for example, simple_json_to_jsonld or pdf_to_jsonld).
- fileFormat (string, optional): Format of the uploaded file (for example, json or csv).
- file (file, required): File to be processed.

Example cURL:

curl -X POST http://localhost:5005/trigger-pipeline \
  -H "Authorization: Bearer <your_token>" \
  -F "pipelineId=pdf_to_jsonld" \
  -F "fileFormat=pdf" \
  -F "file=@example.pdf"

Response:

Success (200):

{
  "pipelineId": "pdf_to_jsonld",
  "runId": "jobId123",
  "message": "Pipeline triggered successfully",
  "success": true
}

Error (400):

{ "error": "Missing pipelineId" }

{ "error": "No selected file" }

Error (500):

{ "error": "Failed to trigger pipeline" }
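
The same request can be sent from Node.js. The sketch below relies on the global `fetch`, `FormData`, and `Blob` that ship with Node.js 18 and later; the function names `buildTriggerRequest` and `triggerPipeline` are hypothetical and exist only for this example. Building the request is kept separate from sending it so the form fields can be inspected without a running service.

```javascript
const { readFile } = require('node:fs/promises');
const path = require('node:path');

// Hypothetical helper: builds the URL and fetch options for
// POST /trigger-pipeline without sending anything.
function buildTriggerRequest({ baseUrl, token, pipelineId, fileFormat, fileBytes, fileName }) {
  const form = new FormData();
  form.append('pipelineId', pipelineId);
  if (fileFormat) {
    form.append('fileFormat', fileFormat); // optional field
  }
  form.append('file', new Blob([fileBytes]), fileName);

  return {
    url: new URL('/trigger-pipeline', baseUrl).toString(),
    options: {
      method: 'POST',
      // Do not set Content-Type manually: fetch adds the multipart boundary.
      headers: { Authorization: `Bearer ${token}` },
      body: form,
    },
  };
}

// Hypothetical helper: reads the file from disk, sends the request, and
// returns the parsed JSON body (which carries the runId on success).
async function triggerPipeline({ baseUrl, token, pipelineId, fileFormat, filePath }) {
  const fileBytes = await readFile(filePath);
  const { url, options } = buildTriggerRequest({
    baseUrl,
    token,
    pipelineId,
    fileFormat,
    fileBytes,
    fileName: path.basename(filePath),
  });

  const response = await fetch(url, options);
  const body = await response.json();
  if (!response.ok) {
    throw new Error(`Trigger failed (${response.status}): ${body.error}`);
  }
  return body;
}
```

A call such as `triggerPipeline({ baseUrl: 'http://localhost:5005', token, pipelineId: 'pdf_to_jsonld', fileFormat: 'pdf', filePath: 'example.pdf' })` mirrors the cURL example, and the returned `runId` is what the status endpoint expects.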

2. Check Pipeline Status

GET /check-pipeline-status

This endpoint retrieves the status of a specific pipeline run.

Request:

Headers:
- Authorization: Bearer token for authentication.
Query Parameters:
- pipelineId (string, required): The ID of the pipeline.
- runId (string, required): The ID of the specific run to check.

Example cURL:

curl -X GET "http://localhost:5005/check-pipeline-status?pipelineId=pdf_to_jsonld&runId=jobId123" \
  -H "Authorization: Bearer <your_token>"

Response:

Success (200):

{
  "id": "jobId123",
  "status": "completed",
  "ka": <knowledge_asset_object>
}

Error (400):

{ "error": "Missing pipelineId or runId" }

Error (404):
```
{ "error": "Pipeline not found" }
```

Error (500):

{ "error": "Failed to fetch pipeline status" }
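
Because a run is processed asynchronously, a client typically polls this endpoint until the run finishes. The following sketch shows one way to do that under stated assumptions: only the `completed` status appears in this README, so treating `failed` as a terminal state is a guess based on common job-queue conventions, and the names `buildStatusUrl` and `waitForPipeline`, along with the interval and attempt defaults, are hypothetical.

```javascript
// Hypothetical helper: builds the status URL with properly encoded
// query parameters.
function buildStatusUrl(baseUrl, pipelineId, runId) {
  const url = new URL('/check-pipeline-status', baseUrl);
  url.searchParams.set('pipelineId', pipelineId);
  url.searchParams.set('runId', runId);
  return url.toString();
}

// Hypothetical helper: polls until the run reports "completed", then
// returns the response body (including the "ka" field).
// fetchImpl is injectable so the loop can be exercised without a server.
async function waitForPipeline({
  baseUrl,
  token,
  pipelineId,
  runId,
  intervalMs = 2000,
  maxAttempts = 30,
  fetchImpl = fetch,
}) {
  const url = buildStatusUrl(baseUrl, pipelineId, runId);

  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const response = await fetchImpl(url, {
      headers: { Authorization: `Bearer ${token}` },
    });
    const body = await response.json();

    if (!response.ok) {
      throw new Error(`Status check failed (${response.status}): ${body.error}`);
    }
    if (body.status === 'completed') {
      return body;
    }
    // Assumption: a "failed" status means the run will not recover.
    if (body.status === 'failed') {
      throw new Error(`Run ${runId} failed`);
    }

    // Wait before the next check.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }

  throw new Error(`Run ${runId} did not complete after ${maxAttempts} checks`);
}
```

A fixed polling interval keeps the example short; a production client might prefer exponential backoff to reduce load on the service.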

Dependencies

The project uses the following dependencies:

- Axios: HTTP client
- BullMQ: Job queue library (requires Redis)
- Cookie-parser: Parse HTTP cookies
- CORS: Enable Cross-Origin Resource Sharing
- Dotenv: Manage environment variables
- Express: Web framework for Node.js
- ioredis: Redis client
- JSON-LD: JSON-LD library
- Multer: Middleware for handling multipart/form-data
- OpenAI: OpenAI API client
- Unstructured Client: Client for interacting with the Unstructured API
