Converts messages (text data) into a TFIDF matrix and uses Multinomial Naive Bayes to detect whether a given message is spam or ham

MakrandBhandari/Spam-Detection-using-Multinomial-Naive-Bayes-Classifier

Spam-Detection-using-Multinomial-Naive-Bayes-Classifier

  • Any message, whether spam or ham, is text data in an unstructured format.
  • The TFIDF vectorization technique is used to convert the text data (unstructured data) into structured numeric features. Its advantage is that it gives higher weight to rare, informative words and lower weight to frequent, uninformative words with respect to the whole corpus.
  • Term Frequency: TF(t) = (no. of times term t occurs in a given document) / (total no. of words in that document) = r/w. It is essentially a count vectorizer normalized by document length.
  • Inverse Document Frequency: IDF(t) = log_e(total no. of documents / (1 + no. of documents containing term t)) = log_e(N / (1 + n)). This factor is what raises the weight of rare words and lowers the weight of words that appear in many documents of the corpus.
  • TFIDF matrix = TF * IDF
  • A Multinomial Naive Bayes classifier (normally used for discrete count features, although fractional TFIDF weights also work in practice) is trained on the TFIDF matrix to predict whether a given message is spam or ham.
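
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's actual code: the toy messages, the helper names (`tokenize`, `fit_idf`, `tfidf`, `train_nb`, `predict`) and the smoothing value `alpha=1.0` are all hypothetical. It follows the TF and IDF formulas exactly as written above; because log_e(N / (1 + n)) becomes negative for a term that appears in every document, the sketch clips such weights to 0 so that Naive Bayes only sees non-negative features. Note that scikit-learn's `TfidfVectorizer` uses a differently smoothed IDF by default, so its numbers would not match this sketch.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Simplest possible tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()

def fit_idf(docs):
    # IDF(t) = ln(N / (1 + n)), where n = number of documents containing t.
    n_docs = len(docs)
    doc_freq = Counter(t for doc in docs for t in set(tokenize(doc)))
    return {t: math.log(n_docs / (1 + n)) for t, n in doc_freq.items()}

def tfidf(doc, idf):
    # TF(t) = r / w (count of t divided by document length), then TF * IDF.
    tokens = tokenize(doc)
    counts = Counter(tokens)
    # Terms unseen during fitting are dropped; negative weights are clipped to 0.
    return {t: max(0.0, (c / len(tokens)) * idf[t])
            for t, c in counts.items() if t in idf}

def train_nb(docs, labels, idf, alpha=1.0):
    # Multinomial Naive Bayes with TFIDF weights used as fractional counts.
    class_weights = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for t, w in tfidf(doc, idf).items():
            class_weights[label][t] += w
    vocab = list(idf)
    model = {}
    for label, n_label in Counter(labels).items():
        total = sum(class_weights[label].values()) + alpha * len(vocab)
        log_prior = math.log(n_label / len(docs))
        log_lik = {t: math.log((class_weights[label][t] + alpha) / total)
                   for t in vocab}
        model[label] = (log_prior, log_lik)
    return model

def predict(doc, idf, model):
    # Score each class: log prior + sum of (feature weight * log likelihood).
    feats = tfidf(doc, idf)
    scores = {label: log_prior + sum(w * log_lik[t] for t, w in feats.items())
              for label, (log_prior, log_lik) in model.items()}
    return max(scores, key=scores.get)

# Toy corpus, invented purely for illustration.
train_docs = [
    "win free prize now",
    "free cash prize claim now",
    "claim free prize",
    "are we meeting today",
    "see you at lunch today",
    "call me later",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

idf = fit_idf(train_docs)
model = train_nb(train_docs, train_labels, idf)

print(predict("free prize now", idf, model))
print(predict("see you today", idf, model))
```

On a real dataset the whitespace tokenizer would usually be replaced by proper preprocessing (punctuation and stop-word handling), and a library implementation would be preferable to this hand-rolled version.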
