Simple Python web crawler using BeautifulSoup to test the theory that, for 97% of Wikipedia articles, repeatedly clicking the first link in the article body (skipping links that are translations or italicized) eventually leads to the article on Knowledge. That final article used to be Philosophy, but it has recently changed to Knowledge.
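The first-link rule can be sketched with BeautifulSoup roughly as follows. This is a minimal sketch, not the project's actual code, and it rests on a few assumptions: the article text sits under `div#mw-content-text` (usually inside `div.mw-parser-output`), only top-level paragraphs count as "the body" (which skips infoboxes and tables), and "translation" links are approximated as namespaced links such as `Help:IPA/...` pronunciation guides (any `/wiki/` link containing a colon). The function name `find_first_link` is hypothetical.

```python
from typing import Optional
from urllib.parse import urljoin

from bs4 import BeautifulSoup

WIKI_BASE = "https://en.wikipedia.org"


def find_first_link(html: str) -> Optional[str]:
    """Hypothetical helper: return the absolute URL of the first qualifying
    article link in the page body, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: article text lives under div#mw-content-text,
    # usually wrapped in div.mw-parser-output.
    content = soup.find(id="mw-content-text")
    if content is None:
        return None
    body = content.find(class_="mw-parser-output") or content
    # Only direct-child paragraphs, so links in infoboxes/tables are ignored.
    for paragraph in body.find_all("p", recursive=False):
        for anchor in paragraph.find_all("a", href=True):
            href = anchor["href"]
            # Skip citations ("#cite_note-..."), red links ("/w/index.php?...")
            # and namespaced links such as "Help:IPA/..." (assumed "translations").
            if not href.startswith("/wiki/") or ":" in href:
                continue
            # Skip italicized links.
            if anchor.find_parent("i") is not None:
                continue
            return urljoin(WIKI_BASE, href)
    return None
```

Classic "first link" rules also skip links inside parentheses; this sketch does not attempt that, so on some articles it may pick a different link than the crawler described here.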
The program starts with a random Wikipedia article, finds and opens the first Wikipedia link in that article's body, then finds and opens the first Wikipedia link in the body of the new article, and so on, until one of three things happens:
- The pre-determined "target URL" is reached. Here, the target URL is https://en.wikipedia.org/wiki/Knowledge.
- The pre-determined "maximum links" number is reached. Here, it is set to 25 links.
- The last link opened has already been opened during this run, meaning the program has hit a cycle.
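The three stopping conditions above can be expressed as a small loop. This is a hedged sketch rather than the project's code: the names `continue_crawl` and `crawl` are hypothetical, and the link-finding step is passed in as a callable (`get_first_link`) so the loop can be exercised without network access. In practice that callable would download the page and pull out the first body link, ideally pausing between requests to be polite to Wikipedia. The "no link found" break is an extra guard the three conditions above do not list.

```python
from typing import Callable, List, Optional

TARGET_URL = "https://en.wikipedia.org/wiki/Knowledge"
MAX_LINKS = 25


def continue_crawl(history: List[str], target_url: str,
                   max_links: int = MAX_LINKS) -> bool:
    """Hypothetical helper: return False when any stopping condition is met."""
    last = history[-1]
    if last == target_url:
        print("Reached the target article.")
        return False
    # history holds the start page plus every link followed so far.
    if len(history) > max_links:
        print(f"Stopped after {max_links} links.")
        return False
    if last in history[:-1]:
        print("Hit a cycle at", last)
        return False
    return True


def crawl(start_url: str,
          get_first_link: Callable[[str], Optional[str]],
          target_url: str = TARGET_URL,
          max_links: int = MAX_LINKS) -> List[str]:
    """Follow first links from start_url and return the visited URLs in order."""
    history = [start_url]
    while continue_crawl(history, target_url, max_links):
        next_url = get_first_link(history[-1])
        if next_url is None:  # extra guard: article has no usable link
            print("No link found on", history[-1])
            break
        history.append(next_url)
    return history
```

Checking the target before the cycle test means a run that arrives at Knowledge is always reported as a success, and comparing against `history[:-1]` detects a cycle the moment a page is revisited.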
To run the program, download the wiki-web-crawler.py file to your home folder, then run it from Terminal in that folder:
python3 wiki-web-crawler.py
This program was built as the final project for the Introduction to Python course on Udacity.