This repository has been archived by the owner on May 27, 2020. It is now read-only.

cybercongress/crawler

://cyber wiki index

Installation

Note: Requires Go 1.12+

git clone https://github.com/cybercongress/crawler
cd crawler
go build -o crawler

Preparation

  1. Launch the IPFS daemon.
  2. Download enwiki-latest-all-titles to the crawler root directory:
ipfs get QmddV5QP87BZGiSUCf9x9hsqM73b83rsPC6AYMNqkjKMGx -o enwiki-latest-all-titles
  3. Add an account to cyberdcli:
docker exec -ti cyberd cyberdcli keys add <name> --recover

Usage

Submit links

The crawler tool provides two main functions. The first parses wiki titles and submits links between keywords and wiki pages.

./crawler submit-links-to-cyber ./enwiki-latest-all-titles --home=<path-to-cyberdcli> --address=<account> --passphrase=<passphrase> --chunk=100

Note: Uses only the local cyberd node.

Note: Submitting links does not add DURAs to IPFS.

Note: --chunk sets how many link messages are added to one transaction.
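
To illustrate what a chunk size means in practice, here is a minimal sketch of splitting a list of links into batches, one batch per transaction. It is not the crawler's actual code: the Link type and the chunkLinks function are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// Link is a hypothetical stand-in for one keyword -> page link message.
type Link struct {
	From string
	To   string
}

// chunkLinks splits links into consecutive batches of at most size elements.
// A non-positive size yields no batches.
func chunkLinks(links []Link, size int) [][]Link {
	if size <= 0 {
		return nil
	}
	var batches [][]Link
	for start := 0; start < len(links); start += size {
		end := start + size
		if end > len(links) {
			end = len(links)
		}
		batches = append(batches, links[start:end])
	}
	return batches
}

func main() {
	links := make([]Link, 250)
	for i, batch := range chunkLinks(links, 100) {
		fmt.Printf("tx %d carries %d link messages\n", i, len(batch))
	}
}
```

With 250 links and a chunk size of 100, the sketch produces three batches of 100, 100, and 50 link messages.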

Note: Each command accepts a --help flag, for example:

./crawler submit-links-to-cyber --help

Here, enwiki-latest-all-titles is the titles file obtained from the official Wikipedia dumps.

Uploading duras to IPFS

The crawler also has a separate command, upload-duras-to-ipfs, that uploads files to the local IPFS node. All DURAs are collected under a single root unixfs directory.

./crawler upload-duras-to-ipfs enwiki-latest-all-titles

Issues

If you have any problems with or questions about the crawler, please contact us through a GitHub issue.

Contributing

You are invited to contribute new features, fixes, or updates, large or small. We are always thrilled to receive pull requests and do our best to process them as fast as we can. You can find detailed information in our contribution guide.

Changelog

Stay up to date with our Changelog.