Course Overview
Grading
Final Project
Course Resources
Schedule and Readings
Understanding language is fundamental to human interaction. Our brains have evolved language-specific circuitry that helps us learn it very quickly; however, this also means that we have great difficulty explaining how exactly meaning arises from sounds and symbols. This course is a broad introduction to linguistic phenomena and our attempts to analyze them with machine learning. We will cover a wide range of concepts with a focus on practical applications such as information extraction, machine translation, sentiment analysis, and summarization.
Prerequisites:
- Language: All assignments will be in Python using Jupyter notebooks, NumPy, and TensorFlow.
- Time: There are 6-7 substantial assignments in this course as well as a term project. Make sure you give yourself enough time to be successful! In particular, you may be in for a rough semester if you have other significant commitments at work or home, or take both this and any of 210 (Capstone), 261, or 271 :)
- MIDS 207 (Machine Learning): We assume you know what gradient descent is. We'll review simple linear classifiers and softmax at a high level, but make sure you've at least heard of these! You should also be comfortable with linear algebra, which we'll use for vector representations and when we discuss deep learning.
Contacts and resources:
- Course website: GitHub datasci-w266/2021-summer-main
- Piazza - we'll use this for Q&A, and this will be the fastest way to reach the course staff. Note that you can post anonymously, and/or make posts visible only to instructors for private questions.
- Email list for course staff (expect a somewhat slower response here): mids-nlp-instructors@googlegroups.com
Live Sessions:
- (Section 1) Monday 6:30 - 8p Pacific (Joachim Rahmfeld)
- (Section 2) Tuesday 2 - 3:30p Pacific (Peter Grabowski)
- (Section 3) Tuesday 4 - 5:30p Pacific (Daniel Cer)
- (Section 4) Wednesday 6:30 - 8p Pacific (Mike Tamir, Paul Spiegelhalter)
- (Section 5) Thursday 4 - 5:30p Pacific (Zack Alexander)
- (Section 6) Friday 4 - 5:30p Pacific (Mark Butler)
Teaching Staff Office Hours:
- Zack Alexander: Thursday immediately after his live session
- Mark Butler: Monday at 1pm Pacific and Friday immediately after his live session
- Daniel Cer: Friday at noon Pacific
- Peter Grabowski: Tuesday at 1pm Pacific
- Joachim Rahmfeld: Wednesday at noon Pacific
- Mike Tamir/Paul Spiegelhalter: Wednesday immediately after their live session
- Drew Plant: Monday at 6pm Pacific
- Gurdit Chahal: Tuesday at 3pm Pacific
Office hours are for the whole class; students from any section are welcome to attend any of the times above.
Async Instructors:
- Dan Gillick
- James Kunz
- Kuzman Ganchev
Your grade report can be found at https://w266grades.appspot.com.
Your grade will be determined as follows:
- Weekly Assignments: 40%
- Final Project: 60%
- Participation: Up to 10% bonus
There will be a number of smaller assignments throughout the term for you to exercise what you learned in async and live sessions. Some assignments may be more difficult than others, and may be weighted accordingly.
Participation will be graded holistically, based on live session participation as well as participation on Piazza (or other activities that improve the course this semester or into the future). Do not stress about this part.
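As a rough illustration of how the weights above combine, here is a minimal sketch. It assumes the participation bonus is added as percentage points on top of the weighted score and that `assignment_pct` is already the (possibly weighted) average over all assignments; the function name and both assumptions are ours, not an official formula, and the result is the numerical grade before any curve.

```python
ASSIGNMENT_WEIGHT = 0.40          # Weekly Assignments: 40%
PROJECT_WEIGHT = 0.60             # Final Project: 60%
MAX_PARTICIPATION_BONUS = 10.0    # Participation: up to 10% bonus


def numeric_grade(assignment_pct, project_pct, participation_bonus=0.0):
    """Combine component scores (each on a 0-100 scale) into a numerical grade.

    Assumption: the bonus is added as percentage points, capped at 10.
    """
    bonus = min(max(participation_bonus, 0.0), MAX_PARTICIPATION_BONUS)
    return (ASSIGNMENT_WEIGHT * assignment_pct
            + PROJECT_WEIGHT * project_pct
            + bonus)


if __name__ == "__main__":
    # Near-perfect assignments, a solid project, and a small bonus.
    print(round(numeric_grade(98, 85, participation_bonus=3), 1))
```

Because the project carries more weight than the assignments, a ten-point difference in project score moves the result by six points, while the same difference in assignment score moves it by four.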
We curve the numerical grade to a letter grade. While we don't release the curve, it usually results in about a quarter of the class each receiving A, A-, B+, and B. Exceptional cases receive A+, C, or F, as appropriate.
A word of warning: Given that we (effectively) release solutions to assignments in the form of unit tests, it shouldn't be surprising that most students earn near-perfect scores. Since the variance is so low, assignment scores aren't the primary driver of the final letter grade for most students. A good assignment score is necessary, but not sufficient, for a strong grade in the class. A well-structured, novel project with good analysis is what makes the difference between a high B/B+ and an A-/A.
As mentioned above: this course is a lot of work. Give it the time it deserves and you'll be rewarded intellectually and on your transcript.
We recognize that sometimes things happen in life outside the course, especially in MIDS where we all have full-time jobs and family responsibilities to attend to. To help with these situations, we are giving you 5 "late days" to use throughout the term as you see fit. Each late day gives you a 24-hour (or any part thereof) extension to any deliverable in the course except the final project presentation or report. (UC Berkeley needs grades submitted very shortly after the end of classes.)
Once you run out of late days, each 24-hour period (or any part thereof) results in a 10 percentage point deduction on that deliverable's grade.
You can use a maximum of 2 late days on any single deliverable. We will not be accepting any submissions more than 48 hours past the original due date, even if you have late days. (We want to be more flexible here, but your fellow students also want their graded assignments back promptly!)
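The late-day arithmetic above can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, it assumes available late days are always spent before any deduction applies, and it does not model the final project presentation or report, which get no extension.

```python
import math

MAX_LATE_DAYS_PER_DELIVERABLE = 2   # cap on late days for a single deliverable
HARD_CUTOFF_HOURS = 48              # nothing is accepted past this point
PENALTY_PER_PERIOD = 10             # percentage points per 24-hour period


def apply_late_policy(hours_late, late_days_left):
    """Return (accepted, late_days_used, penalty_points) for one deliverable.

    Assumption: remaining late days are used first, then deductions apply.
    """
    if hours_late <= 0:
        return True, 0, 0           # on time
    if hours_late > HARD_CUTOFF_HOURS:
        return False, 0, 0          # more than 48 hours late: not accepted
    # Any part of a 24-hour period counts as a full period.
    periods = math.ceil(hours_late / 24)
    used = min(periods, MAX_LATE_DAYS_PER_DELIVERABLE, late_days_left)
    penalty = (periods - used) * PENALTY_PER_PERIOD
    return True, used, penalty


if __name__ == "__main__":
    print(apply_late_policy(30, late_days_left=5))   # 30 hours late, full budget
    print(apply_late_policy(30, late_days_left=1))   # 30 hours late, one late day left
    print(apply_late_policy(49, late_days_left=5))   # past the 48-hour cutoff
```

Note that a submission 30 hours late counts as two periods, so it costs two late days, or one late day plus a 10-point deduction if only one late day remains.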
We don't anticipate granting extensions beyond these policies. Plan your time accordingly!
If you run into a more serious issue that will affect your ability to complete the course, please email the instructors mailing list and cc MIDS student services. A word of warning though: in previous sections, we have had students ask for INC grades because their lives were otherwise busy. Mostly we have declined, opting instead for the student to complete the course to the best of their ability and have a grade assigned based on that work. (MIDS prefers to avoid giving INCs, as they have been abused in the past.) The sooner you start this process, the more options we (and the department) have to help. Don't wait until you're suffering from the consequences to tell us what's going on!
See the Final Project Guidelines
We believe in the social aspects of learning, both among students and between students and instructors. Knowledge is not built solely at an individual level; it also grows out of shared activity among the people engaged in it. Participation and communication are key aspects of this course and are vital to the learning experiences of you and your classmates.
We would therefore like to remind all students of the following requirements for live class sessions:
- Students are required to join live class sessions from a study environment with video turned on and with a headset for clear audio, without background movement or background noise, and with an internet connection suitable for video streaming.
- You are expected to engage in class discussions, breakout room discussions and exercises, and to be present and attentive for your and other teams’ in-class presentations.
- Keep your microphone on mute when not talking to avoid background noise. Do your best to minimize distractions in the background video, and ensure that your camera is on while you are engaged in discussions.
That said, in exceptional circumstances, if you are unable to meet in a space with no background movement, or if your connection is poor, make arrangements with your instructor (beforehand if possible) to explain your situation. Sometimes connections and circumstances make turning off video the best option. If this is a recurring issue in your study environment, you are responsible for finding a different environment that will allow you to fully participate in classes, without distraction to your classmates.
Failure to adhere to these requirements will result in an initial warning from your instructor(s), followed by a possible reduction in grades or a failing grade in the course.
We are not using any particular textbook for this course. We’ll list some relevant readings each week. Here are some general resources:
- Speech and Language Processing (2nd edition) (Jurafsky and Martin)
- Speech and Language Processing (3rd edition draft) (Jurafsky and Martin) - free online!
- NLTK Book - Accompanies NLTK (Natural Language ToolKit) and includes useful, practical descriptions (with python code) of basic concepts.
- Deep Learning (Goodfellow, Bengio, and Courville)
We’ll be posting materials to the course GitHub repo.
Note: the syllabus below might be subject to change. We'll be sure to announce anything major on Piazza.
The course will be taught in Python, and we'll be making heavy use of NumPy, TensorFlow, Keras, and Jupyter (IPython) notebooks. We'll also be using Git for distributing and submitting materials. If you want to brush up on any of these, we recommend:
- Git tutorials: Introduction / Cheat Sheet, or interactive tutorial
- Python / NumPy: Stanford's CS231n has an excellent tutorial.
- TensorFlow: We'll go over the basics of TensorFlow and Keras in Assignment 2.
Effective TensorFlow is a great reference, ranging from the absolute basics through advanced topics like multi-GPU training, tf.learn, and debugging.
You can also check out the tutorials on the TensorFlow website, but these can be somewhat confusing if you're not familiar with the underlying models. Also, look at the Keras Guide as we will be using Keras in this class.
A few useful papers that don’t fit under a particular week. All optional, but interesting!
- (optional) Chris Olah’s blog and Distill
- (optional) GloVe: Global Vectors for Word Representation (Pennington, Socher, and Manning, 2014)
We'll update the table below with assignments as they become available, as well as additional materials throughout the semester. Keep an eye on GitHub for updates!
Dates are tentative: assignments in particular may change topics and dates. (Updated slides for each week will be posted during the live session week.)
Live Session Slides: [available with @berkeley.edu address]
Note: we will update this table as we release (approximately weekly) assignments. Each assignment will be released around the last live session of the week and due approximately one week later.
Topic | Release | Deadline
---|---|---
Assignment 0 | Course Set-up | May 16
Assignment 1 | Assignment 1 | May 23
Assignment 2 | Assignment 2 | May 30
Project Proposal | Final Project Guidelines | Jun 5
Assignment 3 | Assignment 3 | Jun 6
Assignment 4 | Assignment 4 | Jun 13
Assignment 5 | Assignment 5 | Jun 20
Assignment 6 | Assignment 6 | Jun 27
Assignment 7 | Assignment 7 | Jul 6
Project Reports | due July 31 (hard deadline) |
Project Presentations | in-class August 2-6 |

Week | Async to Watch | Topics | Materials
---|---|---|---
Week 1 (May 3) | Introduction | |
Week 2 (May 10) | 5.2 Softmax Classification; 5.4 Neural network recap; 5.6 Neural network training loss | |
Week 3 (May 17) | Classification and Sentiment (up to 2.6), 4.2, 4.12 - 4.17, 6.10, 6.12 | |
Week 4 (May 24) | Classification and Sentiment (2.7 onwards). Note: you should review Async 5.3, 5.4, and 5.5. | |
Week 5 (May 31) | Language Modeling I, 4.1-4.4, 5.8, 5.11 | | Language model introduction; distributed representations
Interlude (Extra Material) | Units of Meaning: Words, Morphology, Sentences | |
Week 6 (June 7) | Language Modeling II | |
Week 7 (June 14) | Machine Translation I; Machine Translation II | |
Week 8 (June 21) | No Async | |
Week 9 (June 28) | Entities | |
Week 10 (July 5) | Summarization | |
Week 11 (July 12) | No Async | |
Week 12 (July 19) | Part of Speech Supplementary Videos; Dependency Parsing (Parsing I); Constituency Parsing (Parsing II) | | [Optional: Interactive HMM Demo]
Week 13 (July 26) | Information Retrieval | |
Week 14 (Aug 2) | In class project presentations | |

Thanks for a great semester!