Unstructured Data Analysis (Graduate) @Korea University
- Syllabus (download)
- Term project groups
- Group 1: 박성훈, 이수빈(2018021120), 이준걸, 박혜준
- Group 2: 이정호, 천우진, 유초롱, 조규원
- Group 3: 백승호, 목충협, 변준형, 이영재
- Group 4: 박건빈, 이수빈(2018020530), 변윤선, 권순찬
- Group 5: 최종현, 이정훈, 박중민, 노영빈
- Group 6: 백인성, 김은비, 신욱수, 강현규
- Group 7: 전성찬, 박현지, 문관영
- Group 8: 조용원, 정승섭, 민다빈, 최민서
- Group 9: 박명현, 장은아, 유건령
- CS224d @Stanford: Deep Learning for Natural Language Processing
- Course Homepage: http://cs224d.stanford.edu/
- YouTube Video: https://www.youtube.com/playlist?list=PLlJy-eBtNFt4CSVWYqscHDdP58M3zFHIG
- CS224n @Stanford: Natural Language Processing with Deep Learning
- Course Homepage: http://web.stanford.edu/class/cs224n/syllabus.html
- YouTube Video: https://www.youtube.com/playlist?list=PL3FW7Lu3i5Jsnh1rnUwq_TcylNr7EkRe6
- Deep Natural Language Processing @Oxford
- Course Homepage: https://github.com/oxford-cs-deepnlp-2017/lectures
- The usefulness of large amounts of text data and the associated challenges
- Overview of text analytics methods
- Text data collection: Web scraping
- Introduction to Natural Language Processing (NLP)
- Lexical analysis (see the R tokenization sketch below)
- Syntax analysis
- Other topics in NLP
- Reading materials
- Cambria, E., & White, B. (2014). Jumping NLP curves: A review of natural language processing research. IEEE Computational Intelligence Magazine, 9(2), 48-57. (PDF)
- Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12(Aug), 2493-2537. (PDF)
- Young, T., Hazarika, D., Poria, S., & Cambria, E. (2017). Recent trends in deep learning based natural language processing. arXiv preprint arXiv:1708.02709. (PDF)
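A minimal base-R sketch of the lexical analysis step referenced above (lowercasing, punctuation removal, whitespace tokenization, stopword filtering). The sample sentence and the short stopword list are illustrative only.

```r
# Minimal lexical analysis sketch in base R: lowercase, strip punctuation,
# tokenize on whitespace, and drop a small illustrative stopword list.
tokenize <- function(text, stopwords = c("the", "a", "an", "of", "and", "is")) {
  text   <- tolower(text)
  text   <- gsub("[[:punct:]]+", " ", text)        # replace punctuation with spaces
  tokens <- unlist(strsplit(text, "[[:space:]]+")) # split on whitespace
  tokens <- tokens[tokens != ""]                   # drop empty strings
  tokens[!(tokens %in% stopwords)]                 # remove stopwords
}

tokenize("Text analytics turns a large amount of unstructured text into data.")
# e.g. "text" "analytics" "turns" "large" "amount" "unstructured" "text" "into" "data"
```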
- Perceptron, Multi-layer Perceptron
- Convolutional Neural Networks (CNN)
- Recurrent Neural Networks (RNN)
- Practical Techniques
- Bag of words
- Word weighting (see the TF-IDF sketch below)
- N-grams
- Word2Vec
- GloVe
- FastText
- Doc2Vec
- Reading materials
- Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb), 1137-1155. (PDF)
- Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. (PDF)
- Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111-3119). (PDF)
- Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532-1543). (PDF)
- Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606. (PDF)
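A small base-R sketch of the bag-of-words representation with TF-IDF word weighting, as referenced in the list above. The three toy documents are illustrative; in practice a dedicated text-mining package would build these matrices far more efficiently.

```r
# Bag-of-words term-document counts with TF-IDF weighting, in base R.
docs <- c("text mining finds patterns in text",
          "deep learning models learn word vectors",
          "word vectors capture word similarity")

tokens <- lapply(docs, function(d) unlist(strsplit(tolower(d), "[[:space:]]+")))
vocab  <- sort(unique(unlist(tokens)))

# Term frequency matrix: one row per term, one column per document.
tf <- sapply(tokens, function(tok) table(factor(tok, levels = vocab)))
rownames(tf) <- vocab

# Inverse document frequency and the TF-IDF weighted matrix.
df    <- rowSums(tf > 0)          # number of documents containing each term
idf   <- log(length(docs) / df)
tfidf <- tf * idf                 # idf (length = number of terms) recycles row-wise

round(tfidf, 2)
```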
- Dimensionality Reduction
- Supervised Feature Selection
- Unsupervised Feature Extraction: Latent Semantic Analysis (LSA) and t-SNE
- R Example (see the LSA sketch below)
- Reading materials
- Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391. (PDF)
- Dumais, S. T. (2004). Latent semantic analysis. Annual review of information science and technology, 38(1), 188-230.
- Maaten, L. V. D., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov), 2579-2605. (PDF) (Homepage)
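Following the R Example item above, a minimal sketch of LSA as a truncated SVD of a term-document matrix. It assumes the `tfidf` matrix from the bag-of-words sketch earlier; keeping k = 2 latent dimensions is an arbitrary illustrative choice.

```r
# Latent Semantic Analysis sketch: truncated SVD of a (terms x documents)
# TF-IDF matrix. Assumes `tfidf` from the bag-of-words sketch above.
k   <- 2                                            # latent semantic dimensions
dec <- svd(tfidf)

term_space <- dec$u[, 1:k] %*% diag(dec$d[1:k])     # term coordinates
doc_space  <- diag(dec$d[1:k]) %*% t(dec$v[, 1:k])  # document coordinates (k x docs)

# Documents can now be compared by cosine similarity in the reduced space.
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
cosine(doc_space[, 1], doc_space[, 2])
```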
- Document similarity metrics
- Clustering overview
- K-Means clustering (see the clustering sketch below)
- Hierarchical clustering
- Density-based clustering
- Reading materials
- Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: a review. ACM Computing Surveys (CSUR), 31(3), 264-323. (PDF)
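A base-R sketch tying document similarity metrics to the clustering methods above: cosine distance between TF-IDF document vectors, hierarchical clustering on that distance, and k-means on the same vectors. It again assumes the `tfidf` matrix from the earlier sketch; two clusters is an illustrative choice, not a recommendation.

```r
# Document similarity and clustering sketch (base R).
# Assumes `tfidf` (terms x documents) from the bag-of-words sketch above.
X <- t(tfidf)                                  # documents as rows

# Cosine similarity / distance between all pairs of documents.
norms    <- sqrt(rowSums(X^2))
cos_sim  <- (X %*% t(X)) / (norms %o% norms)
cos_dist <- as.dist(1 - cos_sim)

# Hierarchical clustering on the cosine distance.
hc <- hclust(cos_dist, method = "average")
cutree(hc, k = 2)                              # two illustrative clusters

# K-means directly on the TF-IDF document vectors.
set.seed(1)
kmeans(X, centers = 2)$cluster
```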
- Topic modeling overview
- Probabilistic Latent Semantic Analysis: pLSA
- LDA: Document Generation Process (see the simulation sketch below)
- Reading materials
- Hofmann, T. (1999, July). Probabilistic latent semantic analysis. In Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence (pp. 289-296). Morgan Kaufmann Publishers Inc. (PDF)
- Hofmann, T. (2017, August). Probabilistic latent semantic indexing. In ACM SIGIR Forum (Vol. 51, No. 2, pp. 211-218). ACM.
- Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77-84. (PDF)
- Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993-1022. (PDF)
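A base-R simulation of the LDA document generation process listed above: draw a topic mixture for a document from a Dirichlet prior, then for each word position draw a topic and then a word from that topic's word distribution. The vocabulary, number of topics, and hyperparameters are illustrative.

```r
# LDA generative process sketch (base R): how a document is assumed to be written.
set.seed(1)
vocab <- c("game", "team", "score", "market", "price", "stock")
K     <- 2      # number of topics
alpha <- 0.5    # document-topic Dirichlet hyperparameter
beta  <- 0.5    # topic-word Dirichlet hyperparameter

rdirichlet <- function(n, a) {   # Dirichlet draws via normalized Gamma variables
  g <- matrix(rgamma(n * length(a), shape = a), nrow = n, byrow = TRUE)
  g / rowSums(g)
}

phi <- rdirichlet(K, rep(beta, length(vocab)))   # topic-word distributions (K x V)

generate_doc <- function(n_words) {
  theta  <- drop(rdirichlet(1, rep(alpha, K)))   # this document's topic mixture
  topics <- sample(1:K, n_words, replace = TRUE, prob = theta)
  sapply(topics, function(z) sample(vocab, 1, prob = phi[z, ]))
}

generate_doc(8)   # eight words drawn from the generative model
```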
- LDA Inference: Gibbs Sampling (see the sampler sketch below)
- LDA Evaluation
- Recommended video lectures
- LDA by D. Blei (Lecture Video)
- Variational Inference for LDA by D. Blei (Lecture Video)
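A compact base-R sketch of collapsed Gibbs sampling for LDA, as referenced above. The toy corpus (word ids into a small vocabulary), hyperparameters, and iteration count are illustrative; production implementations such as the CRAN packages topicmodels or lda are far more efficient.

```r
# Collapsed Gibbs sampling sketch for LDA (base R), on an illustrative toy corpus
# of word ids. Counts: document-topic (n_dk), topic-word (n_kw), topic totals (n_k).
set.seed(1)
vocab <- c("game", "team", "score", "market", "price", "stock")
docs  <- list(c(1, 2, 3, 2), c(4, 5, 6, 5), c(1, 3, 2, 1), c(6, 4, 5, 6))
K <- 2; V <- length(vocab); D <- length(docs)
alpha <- 0.5; beta <- 0.1

n_dk <- matrix(0, D, K); n_kw <- matrix(0, K, V); n_k <- rep(0, K)
z <- lapply(docs, function(d) sample(1:K, length(d), replace = TRUE))
for (d in 1:D) for (i in seq_along(docs[[d]])) {
  w <- docs[[d]][i]; k <- z[[d]][i]
  n_dk[d, k] <- n_dk[d, k] + 1; n_kw[k, w] <- n_kw[k, w] + 1; n_k[k] <- n_k[k] + 1
}

for (iter in 1:200) {
  for (d in 1:D) for (i in seq_along(docs[[d]])) {
    w <- docs[[d]][i]; k <- z[[d]][i]
    # Remove the current assignment from the counts.
    n_dk[d, k] <- n_dk[d, k] - 1; n_kw[k, w] <- n_kw[k, w] - 1; n_k[k] <- n_k[k] - 1
    # Full conditional p(z = k | everything else), up to a constant.
    p <- (n_dk[d, ] + alpha) * (n_kw[, w] + beta) / (n_k + V * beta)
    k <- sample(1:K, 1, prob = p)
    # Add the new assignment back.
    z[[d]][i] <- k
    n_dk[d, k] <- n_dk[d, k] + 1; n_kw[k, w] <- n_kw[k, w] + 1; n_k[k] <- n_k[k] + 1
  }
}

round((n_kw + beta) / (n_k + V * beta), 2)   # estimated topic-word distributions
```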
- Document classification overview
- Naive Bayesian classifier (see the R sketch at the end of this section)
- RNN-based document classification
- CNN-based document classification
- Reading materials
- Kim, Y. (2014). Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882. (PDF)
- Zhang, X., Zhao, J., & LeCun, Y. (2015). Character-level convolutional networks for text classification. In Advances in neural information processing systems (pp. 649-657). (PDF)
- Lee, G., Jeong, J., Seo, S., Kim, C., & Kang, P. (2018). Sentiment classification with word localization based on weakly supervised learning with a convolutional neural network. Knowledge-Based Systems, 152, 70-82. (PDF)
- Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., & Hovy, E. (2016). Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1480-1489). (PDF)
- Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. (PDF)
- Luong, M. T., Pham, H., & Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025. (PDF)
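To close the document classification section, a base-R sketch of the Naive Bayesian classifier listed above: a multinomial model with Laplace (add-one) smoothing trained on a tiny illustrative labeled corpus.

```r
# Multinomial Naive Bayes document classifier with add-one smoothing (base R).
# The labeled training corpus is illustrative only.
train_docs   <- c("great match winning goal team",
                  "team loses game after penalty",
                  "stocks rally as market gains",
                  "market falls and prices drop")
train_labels <- c("sports", "sports", "finance", "finance")

tok     <- function(x) unlist(strsplit(tolower(x), "[[:space:]]+"))
vocab   <- sort(unique(unlist(lapply(train_docs, tok))))
classes <- unique(train_labels)

# Per-class log word probabilities with Laplace smoothing, and log class priors.
word_logp <- sapply(classes, function(cl) {
  counts <- table(factor(unlist(lapply(train_docs[train_labels == cl], tok)),
                         levels = vocab))
  log((as.numeric(counts) + 1) / (sum(counts) + length(vocab)))
})
rownames(word_logp) <- vocab
prior_logp <- log(as.numeric(table(train_labels)[classes]) / length(train_labels))

classify <- function(doc) {
  w <- tok(doc); w <- w[w %in% vocab]    # ignore words unseen in training
  scores <- prior_logp + colSums(word_logp[w, , drop = FALSE])
  classes[which.max(scores)]
}

classify("the team scored a late goal")  # expected: "sports"
```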