inter-word-embedding

inter-sampling is a non-neural network approach for semantic word embedding that uses context distribution along with euclidean distance and vocab sampling. Now we are using SVD for dimension reduction and concurrency. Methods and formulas will be documented in the first opportunity.

note: code is dirty, without comment, and not user-friendly.

Part of a pre-trained model uploaded. Just run the notebook and extract data/text5-AC-MODEL-4000.tar.gz. If one wants to train another model, it just needs to open the inter-sampling.go and change the "data/text5" to whatever you want. Other variables are as follow:

DATASET_FILE: a filename contained with text data. Each line, one document.
WINDOW: number of vocabs that script should use to be base of judgment.
MAX_RATE: Do not change it. It defines the max rate of vectors. Eight means the vectors are between 0-8.
DIMENSION: Dimension specifier. default 100. The higher, the better and slower.
SAMPLE_ACCURACY: number of samples that specifies vectors. The higher, the better and slower.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
data		data
LICENSE		LICENSE
README.md		README.md
inter-sampling.go		inter-sampling.go
plotting-vectors.ipynb		plotting-vectors.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

inter-word-embedding

About

Releases

Packages

Languages

License

callforpapers-source/inter-word-embedding

Folders and files

Latest commit

History

Repository files navigation

inter-word-embedding

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages