TextRank
- Based on Google's PageRank Algorithm
- Graph-based Ranking Statistical Model
- Each Node is a Sentence
- Ranking Sentences with Underlying Assumption that Summary Sentences are Similar to most other Sentences
- Higher-Ranking Sentence = Sentence that is more similar to the other sentences in the text
Click here for more information on TextRank
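The ranking idea described above can be illustrated with a minimal sketch. It is not the pyTextRank code used in this repository: the Jaccard word-overlap similarity, the damping factor of 0.85, the fixed iteration count, and the function names `tokenize`, `similarity`, `textrank`, and `summarize` are all assumptions chosen to keep the example short.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def similarity(a, b):
    """Jaccard overlap between two token sets (an assumed similarity measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def textrank(sentences, damping=0.85, iterations=50):
    """Score each sentence with a PageRank-style iteration over a similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # weights[i][j] is the edge weight between sentence i and sentence j (no self-loops).
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional to the edge weight.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def summarize(sentences, top_k=2):
    """Return the top_k highest-scoring sentences in their original order."""
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_k])]


if __name__ == "__main__":
    article = [
        "The cat sat on the mat.",
        "The cat chased the mouse on the mat.",
        "Stock prices fell sharply today.",
        "The mouse hid under the mat from the cat.",
    ]
    for sentence in summarize(article, top_k=2):
        print(sentence)
```

In this toy article, the sentence about stock prices shares no words with the others, so it receives only the baseline score and is left out of the summary.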
- Python Implementation of TextRank
- Slight Improvements to TextRank
- Lemmatization instead of Stemming
- NLP Combinations
- All NLP
- Stop words Removal Only
- Lemmatization Only
- No NLP
There are two pyTextRank notebooks. The first shows how a single article is processed, along with its output. The second processes multiple articles and stores the generated summaries in a separate folder.
Additional Steps to Run Different NLP Combinations
The original GitHub code for the pyTextRank model has been modified by the authors of this repository to run different NLP combinations. The steps for each combination are listed below.
1. All NLP
No additional steps are required. The base model already applies all NLP techniques (Stop words Removal + Lemmatization).
2. Stop words Removal Only
Comment out this chunk of code from pyTextRank.py
line 227 if pos_family in POS_LEMMA:
...
line 229 word = word._replace(root=tok_lemma)
3. Lemmatization Only
Remove all words from the stop word list
4. No NLP
Step 1 - Remove all words from the stop word list
Step 2 - Comment out this chunk of code from pyTextRank.py
line 227 if pos_family in POS_LEMMA:
...
line 229 word = word._replace(root=tok_lemma)
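The four combinations above amount to switching two preprocessing steps on or off. The sketch below shows that idea with two flags instead of the manual edits to pyTextRank.py. It does not use pyTextRank itself: the `STOP_WORDS` set, the `LEMMAS` lookup table, and the `preprocess` function are hypothetical stand-ins for illustration only.

```python
# Hypothetical, tiny stand-ins for a real stop word list and lemmatizer.
STOP_WORDS = {"the", "a", "is", "are", "on"}
LEMMAS = {"cats": "cat", "sitting": "sit", "mats": "mat"}


def preprocess(words, remove_stop_words=True, lemmatize=True):
    """Apply the selected NLP steps to a list of words."""
    result = []
    for word in words:
        token = word.lower()
        if remove_stop_words and token in STOP_WORDS:
            continue  # Stop words Removal: drop the token entirely
        if lemmatize:
            token = LEMMAS.get(token, token)  # Lemmatization: map to base form if known
        result.append(token)
    return result


# The four combinations listed in this README, expressed as flag settings.
COMBINATIONS = {
    "All NLP": {"remove_stop_words": True, "lemmatize": True},
    "Stop words Removal Only": {"remove_stop_words": True, "lemmatize": False},
    "Lemmatization Only": {"remove_stop_words": False, "lemmatize": True},
    "No NLP": {"remove_stop_words": False, "lemmatize": False},
}

if __name__ == "__main__":
    words = ["The", "cats", "are", "sitting", "on", "mats"]
    for name, flags in COMBINATIONS.items():
        print(name, preprocess(words, **flags))
```

Emptying the stop word list has the same effect as `remove_stop_words=False` here, and commenting out the lemma replacement corresponds to `lemmatize=False`.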
- Chinese Python Implementation of TextRank
- NLP Combinations
- No NLP
- Stop words Removal Only
Completed by Melvin and Joe