Follow the instructions below to set up and run the project on your local machine.
Ensure you have the following installed on your system:
- Node.js (v20 or higher)
- Redis (for BullMQ job queues)
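The Node.js requirement above can be checked from a script before installing anything. The following is a minimal sketch; the helper name `meetsNodeRequirement` is hypothetical and not part of the project.

```javascript
// Hypothetical helper (not part of the project): returns true when the given
// Node.js version string has a major version of at least `minMajor`.
function meetsNodeRequirement(version = process.versions.node, minMajor = 20) {
  // Accept both "20.11.1" and "v20.11.1" style strings.
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

if (!meetsNodeRequirement()) {
  console.warn(`Node.js v20 or higher is required, found ${process.versions.node}`);
}
```

Calling `meetsNodeRequirement()` with no arguments checks the version of the Node.js process that runs the script.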
- Clone the repository:
  git clone https://github.com/OriginTrail/edge-node-knowledge-mining-js
  cd edge-node-knowledge-mining-js
- Install the dependencies:
  npm install
- Make sure Redis is running on its default port (6379).
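One way to confirm that something is listening on the Redis port is a plain TCP probe. This is only a sketch using Node's built-in `net` module: the helper name `checkTcpPort` is hypothetical, and a successful connection shows that the port is open, not that the listener is actually Redis.

```javascript
const net = require("node:net");

// Hypothetical helper (not part of the project): resolves to true if a TCP
// connection to host:port succeeds within timeoutMs, false otherwise.
function checkTcpPort(host = "127.0.0.1", port = 6379, timeoutMs = 1000) {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    const finish = (isOpen) => {
      socket.destroy(); // always release the socket before resolving
      resolve(isOpen);
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true));
    socket.once("timeout", () => finish(false));
    socket.once("error", () => finish(false));
  });
}
```

For example, `checkTcpPort().then((isOpen) => console.log(isOpen ? "port 6379 is open" : "port 6379 is not reachable"))` probes the default Redis port on the local machine.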
The project requires environment variables to be set. Use the provided .env.example file as a template:
- Copy .env.example to .env:
  cp .env.example .env
- Populate the .env file with the required values. Example:
  PORT=5005
  UI_ENDPOINT=http://localhost:5173
  AUTH_SERVICE_ENDPOINT=http://localhost:3001
  KNOWLEDGE_MINING_QUEUE=knowledge-mining-queue
  KNOWLEDGE_MINING_CONCURRENCY=20
  OPENAI_API_KEY=your_openai_api_key
  UNSTRUCTURED_API_URL=your_unstructured_api_url
  UNSTRUCTURED_API_KEY=your_unstructured_api_key
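To catch a missing value early, the variables from the example above can be validated at startup. The sketch below is not the project's own loader: the function name `loadConfig` and the shape of the returned object are hypothetical, and it assumes that every variable in the example except PORT must be set (PORT falls back to 5005).

```javascript
// Assumption: all variables from the example, except PORT, must be present.
const EXPECTED_KEYS = [
  "UI_ENDPOINT",
  "AUTH_SERVICE_ENDPOINT",
  "KNOWLEDGE_MINING_QUEUE",
  "KNOWLEDGE_MINING_CONCURRENCY",
  "OPENAI_API_KEY",
  "UNSTRUCTURED_API_URL",
  "UNSTRUCTURED_API_KEY",
];

// Hypothetical helper: validates an environment object and returns typed settings.
function loadConfig(env = process.env) {
  const missing = EXPECTED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const port = Number.parseInt(env.PORT || "5005", 10);
  const concurrency = Number.parseInt(env.KNOWLEDGE_MINING_CONCURRENCY, 10);
  if (!Number.isInteger(port) || !Number.isInteger(concurrency)) {
    throw new Error("PORT and KNOWLEDGE_MINING_CONCURRENCY must be integers");
  }
  return {
    port,
    concurrency,
    queueName: env.KNOWLEDGE_MINING_QUEUE,
    uiEndpoint: env.UI_ENDPOINT,
    authServiceEndpoint: env.AUTH_SERVICE_ENDPOINT,
    openAiApiKey: env.OPENAI_API_KEY,
    unstructuredApiUrl: env.UNSTRUCTURED_API_URL,
    unstructuredApiKey: env.UNSTRUCTURED_API_KEY,
  };
}
```

Passing the environment as an argument keeps the function easy to test; in the service itself it would typically be called once, after Dotenv has loaded the .env file.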
- Start the service:
  npm start
- The service will start on the configured port (default: 5005).
POST /trigger-pipeline
This endpoint triggers a knowledge mining pipeline with a file upload.
Request:
- Headers:
  - Authorization: Bearer token for authentication.
- Body (form-data):
  - pipelineId (string, required): The ID of the pipeline to trigger. The ID is the filename of the file where the pipeline is defined (simple_json_to_jsonld, pdf_to_jsonld...).
  - fileFormat (string, optional): Format of the uploaded file (json, csv...).
  - file (file, required): File to be processed.
Example cURL:
curl -X POST http://localhost:5005/trigger-pipeline \
-H "Authorization: Bearer <your_token>" \
-F "pipelineId=12345" \
-F "fileFormat=pdf" \
-F "file=@example.pdf"
Response:
- Success (200):
{ "pipelineId": "12345", "runId": "jobId123", "message": "Pipeline triggered successfully", "success": true }
- Error (400):
{ "error": "Missing pipelineId" }
{ "error": "No selected file" }
- Error (500):
{ "error": "Failed to trigger pipeline" }
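The same request can be sent from Node.js 20 with the built-in `fetch`, `FormData`, and `Blob` globals. This is a sketch of a client, not code from the project: the function names `buildTriggerRequest` and `triggerPipeline` and their parameter names are hypothetical, and the base URL assumes the default port.

```javascript
// Hypothetical client helper: builds the URL and fetch options for
// POST /trigger-pipeline from the documented form-data fields.
function buildTriggerRequest({
  baseUrl = "http://localhost:5005", // assumes the default port
  token,
  pipelineId,
  fileFormat,
  fileBytes,
  fileName,
}) {
  if (!pipelineId) throw new Error("pipelineId is required");
  if (!fileBytes) throw new Error("file is required");

  const form = new FormData();
  form.append("pipelineId", pipelineId);
  if (fileFormat) form.append("fileFormat", fileFormat); // optional field
  form.append("file", new Blob([fileBytes]), fileName);

  return {
    url: new URL("/trigger-pipeline", baseUrl).toString(),
    // Content-Type is left unset so fetch can add the multipart boundary itself.
    options: {
      method: "POST",
      headers: { Authorization: `Bearer ${token}` },
      body: form,
    },
  };
}

// Sends the request and returns the parsed JSON body, or throws with the
// server's "error" message on a non-2xx response.
async function triggerPipeline(params, fetchImpl = fetch) {
  const { url, options } = buildTriggerRequest(params);
  const response = await fetchImpl(url, options);
  const payload = await response.json();
  if (!response.ok) throw new Error(payload.error ?? `HTTP ${response.status}`);
  return payload;
}
```

A call might look like `triggerPipeline({ token, pipelineId: "pdf_to_jsonld", fileFormat: "pdf", fileBytes: fs.readFileSync("example.pdf"), fileName: "example.pdf" })`; the `runId` in the resolved object is what the status endpoint expects.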
GET /check-pipeline-status
This endpoint retrieves the status of a specific pipeline run.
Request:
- Headers:
  - Authorization: Bearer token for authentication.
- Query Parameters:
  - pipelineId (string, required): The ID of the pipeline.
  - runId (string, required): The ID of the specific run to check.
Example cURL:
curl -X GET "http://localhost:5005/check-pipeline-status?pipelineId=12345&runId=jobId123" \
-H "Authorization: Bearer <your_token>"
Response:
- Success (200):
{ "id": "jobId123", "status": "completed", "ka": <knowledge_asset_object> }
- Error (400):
{ "error": "Missing pipelineId or runId" }
- Error (404):
{ "error": "Pipeline not found" }
- Error (500):
{ "error": "Failed to fetch pipeline status" }
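Because a pipeline run is processed as a queued job, a client usually has to poll this endpoint until the run finishes. The sketch below shows one way to do that with the built-in `fetch`. The function names, the polling interval, and the attempt limit are hypothetical, and treating "failed" as a terminal status alongside the documented "completed" is an assumption based on BullMQ's job state names.

```javascript
// Hypothetical client helper: builds the GET /check-pipeline-status URL.
function buildStatusUrl({ baseUrl = "http://localhost:5005", pipelineId, runId }) {
  if (!pipelineId || !runId) throw new Error("pipelineId and runId are required");
  const url = new URL("/check-pipeline-status", baseUrl);
  url.searchParams.set("pipelineId", pipelineId);
  url.searchParams.set("runId", runId);
  return url.toString();
}

// Polls the status endpoint until the run reaches a terminal status.
// Assumption: "failed" is terminal; only "completed" appears in the docs above.
async function waitForPipeline({
  token,
  intervalMs = 2000,
  maxAttempts = 30,
  fetchImpl = fetch,
  ...ids // baseUrl (optional), pipelineId, runId
}) {
  const url = buildStatusUrl(ids);
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const response = await fetchImpl(url, {
      headers: { Authorization: `Bearer ${token}` },
    });
    const payload = await response.json();
    if (!response.ok) throw new Error(payload.error ?? `HTTP ${response.status}`);
    if (payload.status === "completed" || payload.status === "failed") {
      return payload;
    }
    // Wait before the next check so the service is not flooded with requests.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Run ${ids.runId} did not finish after ${maxAttempts} checks`);
}
```

With the values from the cURL example, `waitForPipeline({ token, pipelineId: "12345", runId: "jobId123" })` resolves to the status object, including the `ka` field once the run has completed.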
The project uses the following dependencies:
- Axios: HTTP client
- BullMQ: Job queue library (requires Redis)
- Cookie-parser: Parse HTTP cookies
- CORS: Enable Cross-Origin Resource Sharing
- Dotenv: Manage environment variables
- Express: Web framework for Node.js
- ioredis: Redis client
- JSON-LD: JSON-LD library
- Multer: Middleware for handling multipart/form-data
- OpenAI: OpenAI API client
- Unstructured Client: Client for interacting with the Unstructured API