Skip to content

yuanshuo1022/text-similarity-server

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

text-similarity-server

基于词向量的文本相似度分析--后端

下载依赖:
pip install

基本依赖项:
opencc
jieba
woed2Vec
······

demo使用:

下载数据(.xml.bz2文件随意选择)
链接 :https://dumps.wikimedia.org/zhwiki/latest/

数据抽取(无需解压.xml.bz2文件)
python wiki_process.py zhwiki-latest-pages-articles.xml.bz2 zhwiki-latest.txt
繁体转简体:
https://github.com/yuanshuo1022/text-similarity-server/blob/master/demo/chinese_simple.py

清洗预料:
https://github.com/yuanshuo1022/text-similarity-server/blob/master/demo/clean_corpus.py

模型训练:
https://github.com/yuanshuo1022/text-similarity-server/blob/master/demo/wordModel.py

可视化:
https://github.com/yuanshuo1022/text-similarity-server/blob/master/demo/WordCloud_visualization.py
https://github.com/yuanshuo1022/text-similarity-server/blob/master/demo/wiki_word2vec_visualization.py

web应用启动使用:

安装flask框架
pip install 
注:flask-cors如果安装不上这在pycharm中设置:file-setting-Project-python Interpreter中找到对应依赖添加

启动应用
python main.py

About

基于词向量的文本相似度分析--后端

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published