Skip to content

callforpapers-source/inter-word-embedding

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

inter-word-embedding

inter-sampling is a non-neural network approach for semantic word embedding that uses context distribution along with euclidean distance and vocab sampling. Now we are using SVD for dimension reduction and concurrency. Methods and formulas will be documented in the first opportunity.

note: code is dirty, without comment, and not user-friendly.

Part of a pre-trained model uploaded. Just run the notebook and extract data/text5-AC-MODEL-4000.tar.gz. If one wants to train another model, it just needs to open the inter-sampling.go and change the "data/text5" to whatever you want. Other variables are as follow:

  • DATASET_FILE: a filename contained with text data. Each line, one document.
  • WINDOW: number of vocabs that script should use to be base of judgment.
  • MAX_RATE: Do not change it. It defines the max rate of vectors. Eight means the vectors are between 0-8.
  • DIMENSION: Dimension specifier. default 100. The higher, the better and slower.
  • SAMPLE_ACCURACY: number of samples that specifies vectors. The higher, the better and slower.

About

a non-neural network approach for word embedding

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published