Course Overview
Google Translate instantly translates between any pair of over eighty human languages like French and English. How does it do that? Why does it make the errors that it does? And how can you build something better? Modern translation systems like Google Translate and Bing Translator learn to translate by reading millions of words of already translated text. This course will show you how they work. We cover fundamental building blocks from linguistics, machine learning (especially deep learning), algorithms, and data structures, showing how they apply to a difficult real-world artificial intelligence problem.
Instructor
- Lane Schwartz - One-on-one discussions are available by appointment.
Office Hours
- Office hours will be held via Zoom; times will be announced near the end of the first week of class
Time and place
- Online asynchronous
This class will be conducted entirely online, in an entirely asynchronous mode. What does that mean?
Readings and videos:
- You will be responsible for approximately 40 pages per week of assigned readings
- You will be responsible for watching approximately 3 hours per week of pre-recorded videos
At any time, you can use the course forum on Campuswire:
- to ask the instructor and TAs private questions
- to ask questions that the instructor and your classmates can see and answer
Follow the forum questions:
- I will regularly post updates and announcements on the forum
- You are expected to read the questions posted by your classmates and the answers that are posted to those questions. You are responsible for material discussed in the forum, even if it was not covered in the readings or videos.
Assessments:
- You will be responsible for completing a log (approximately weekly) recording the extent to which you engaged with the readings and videos
- You will be responsible for completing online quizzes testing your understanding of the content from the readings and videos
- You will be responsible for completing an online mid-term exam testing your understanding of the content from the readings and videos
- You will be responsible for completing homework assignments and turning them in electronically (details will be provided with each assignment)
- You will be responsible for completing a final project and turning it in electronically (details will be provided when assigned)
Quizzes and reading logs will be hosted on PrairieLearn.
Course Objectives
The main goal of this course is to introduce students to the models and algorithms that underlie modern machine translation techniques.
By the end of this course, you will be able to:
- Understand the issues and tasks involved in preparing corpora for use in machine translation, and be able to perform such tasks
- Implement a statistical word alignment model and an algorithm to estimate such a model from a sentence-aligned parallel corpus
- Understand phrase-based statistical machine translation and implement a decoding algorithm for a phrase-based MT model
- Understand statistical and neural language models and explain how these models relate to statistical and neural machine translation
- Describe and implement a recurrent neural sequence-to-sequence translation model and decoding algorithm using a current neural toolkit
- Conduct a thorough literature review on the state-of-the-art in machine translation for a particular language pair
- Participate in a WMT shared task by training and testing a complete end-to-end machine translation system for a particular language pair, and writing a paper thoroughly describing the system
Course Materials
Required textbooks
Statistical Machine Translation (errata) by Philipp Koehn
- Electronic access is available through CambridgeCore by selecting Institutional Login through Shibboleth.
- You can rent or purchase the Kindle eBook version of this text from Amazon
- You can purchase a paper copy from the campus bookstore or from Amazon
Neural Machine Translation by Philipp Koehn
- Electronic access is available through CambridgeCore by selecting Institutional Login through Shibboleth.
- You can purchase the Kindle eBook version of this text
- You can purchase a paper copy from the campus bookstore or from Amazon
Course Organization
This course is organized into 9 major topics, with each topic taking 1-2 weeks, for a total of 15 weeks of content.
Each major topic will include:
- An overview of the topic and its contents
- A list of learning objectives which will summarize expected learning goals
- A list of required readings for the topic
- A playlist of required videos for the topic
Major topics
- Introduction to machine translation
- Managing and modelling data
- Statistical word-based machine translation models & algorithms
- Statistical phrase-based machine translation models & algorithms
- Statistical language models
- Feed-forward and recurrent neural language models
- Recurrent neural translation models & algorithms
- Transformer-based neural translation models & algorithms
- Building and running an end-to-end machine translation system
Course Policies
Academic Integrity
This course follows the University of Illinois Student Code regarding Academic Integrity. The College of Liberal Arts and Sciences also has an excellent web page on the topic. You are required to thoroughly read these resources no later than the Wednesday of the first week of class, and to thoroughly understand your responsibilities with regard to Academic Integrity.
All work submitted for this class must be solely your own. Violations of Academic Integrity include, but are not limited to, copying, cheating, and unapproved collaboration. Violations will not be tolerated and can result in a failing grade. Ignorance is not an excuse.
Do not hesitate to ask the instructor(s) if you are ever in doubt about what constitutes plagiarism, cheating, or any other breach of academic integrity.
Communications
Course announcements, assignments, and due dates will all be communicated to students as announcements on the course forum on Campuswire, which should be the primary mechanism for communication in this course. Students may post questions privately so that only the instructor can see the question. Other questions may be viewed by classmates, so that classmates can provide a peer response in addition to that provided by the instructor. Questions may also be asked anonymously, so that neither the instructor nor classmates will see the poster’s name.
Office hours and one-on-one meetings with students will take place over Zoom.
Grading
Students will be assessed on the extent to which they have attained the learning goals & outcomes through a combination of practical exercises, homework assignments, projects, quizzes and exams.
- Quizzes and reading log: 10%
- Homework:
  - HW1: 10%
  - HW2: 10%
  - HW3: 10%
  - HW4: 10%
  - HW5: 10%
- Project:
  - Proposal: 5%
  - Literature review: 10%
  - Checkpoint 1: 5%
  - Checkpoint 2: 5%
  - Final report: 15%
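As a rough illustration of how the weights above combine, the following sketch computes a weighted course percentage. The dictionary keys and the function name `course_percentage` are hypothetical names chosen for this example, and the sketch assumes each component is scored on a 0-100 scale; it is not an official grade calculator.

```python
# Weights in percent, copied from the grading breakdown above.
# The key names are illustrative assumptions, not official identifiers.
WEIGHTS = {
    "quizzes_and_reading_log": 10,
    "hw1": 10,
    "hw2": 10,
    "hw3": 10,
    "hw4": 10,
    "hw5": 10,
    "project_proposal": 5,
    "project_literature_review": 10,
    "project_checkpoint_1": 5,
    "project_checkpoint_2": 5,
    "project_final_report": 15,
}


def course_percentage(scores):
    """Return the weighted course percentage, with no rounding applied.

    `scores` maps each key in WEIGHTS to a score between 0 and 100
    (an assumption of this sketch).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Weights sum to 100, so dividing by 100 yields a percentage.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
```

For example, under these assumptions a student who scores 90 on every component except a 50 on the final report would end up with a course percentage of 84.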
Grades will be assessed on a 10-point fixed letter grade system with no rounding. Grading on a curve will not be used. In the table below, square brackets and parentheses are used to indicate inclusive and exclusive endpoints, respectively.
Letter grade | Percentage range |
---|---|
A+ | [97-100] |
A | [93-97) |
A- | [90-93) |
B+ | [87-90) |
B | [83-87) |
B- | [80-83) |
C+ | [77-80) |
C | [73-77) |
C- | [70-73) |
D+ | [67-70) |
D | [63-67) |
D- | [60-63) |
F | [0-60) |
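The interval notation in the table can be read as a simple lookup: each letter grade has an inclusive lower bound, and no rounding is applied before the comparison. The sketch below shows one way to express that; the names `GRADE_FLOORS` and `letter_grade` are hypothetical and chosen only for this example.

```python
# Inclusive lower bound of each range in the table above, highest first.
GRADE_FLOORS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
    (0, "F"),
]


def letter_grade(percentage):
    """Map a course percentage in [0, 100] to a letter grade, without rounding."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # The first floor that the percentage meets or exceeds determines the grade.
    for floor, letter in GRADE_FLOORS:
        if percentage >= floor:
            return letter
```

Because there is no rounding, a percentage of 92.99 falls in the [90-93) range and maps to A-, while exactly 93 maps to A.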
Late work
Late assignment submissions will be penalized by 10 percentage points for each day past the deadline, for up to four days. Assignments submitted five or more days past the deadline will not be given credit. If there is an unforeseeable emergency which prevents you from submitting an assignment on time, please contact the instructor as soon as you can.
For some or all homework assignments, a solution may be presented to the class after the original homework deadline. Under no circumstances will work be accepted after a solution has been presented to the class.
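The per-day penalty described above can be sketched as a small function. This is only an illustration under stated assumptions: `days_late` is a whole number of days already counted by the caller (the policy does not say how partial days are counted), the penalty is subtracted from the score in percentage points, and the result is floored at zero. The function name and constants are hypothetical, and the rule about solutions presented to the class is not modeled.

```python
PENALTY_PER_DAY = 10       # percentage points deducted per day late
LAST_DAY_WITH_CREDIT = 4   # submissions five or more days late earn no credit


def late_adjusted_score(raw_score, days_late):
    """Apply the late penalty to a raw score on a 0-100 scale.

    Assumes `days_late` is a whole number of days past the deadline;
    flooring the result at zero is also an assumption of this sketch.
    """
    if days_late <= 0:
        return raw_score
    if days_late > LAST_DAY_WITH_CREDIT:
        return 0
    return max(0, raw_score - PENALTY_PER_DAY * days_late)
```

Under these assumptions, a submission that would have earned 85 receives 65 if it is two days late and no credit if it is five days late.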
Readings and videos
Students are expected to regularly review the schedule of assigned readings and video lectures. The schedule is subject to change.
Students are required to complete all assigned readings and video lectures prior to the class for which they are assigned.
Students are expected to read and participate in Q&As and class discussions on the course forum on Campuswire.
Learning Goals & Outcomes
Students are expected to attentively read assigned readings, attentively view assigned videos, and complete all assigned work prior to the specified deadlines.
Students who do so are expected to attain the learning goals and outcomes.
Disabilities
To obtain disability-related academic adjustments and/or auxiliary aids, students with disabilities must contact the course instructor and the Disability Resources and Educational Services (DRES) as soon as possible. To contact DRES, you may visit 1207 S. Oak St., Champaign, call +1-217-333-4603, e-mail disability@illinois.edu or go to the DRES website.
If specific accommodations will be requested for this course, the student is asked to inform the course instructor as soon as possible, ideally within the first week of class or as soon as a DRES letter has been prepared, by following these steps:
- Post a private message addressed to both the instructor and the TA on the course forum on Campuswire
- Use the subject heading DRES letter
- Use the dres topic tag
- In the body of the post, the student should attach a PDF of their DRES letter
- In the body of the post, the student should list which specific accommodations mentioned in the DRES letter are being requested for this class
Religious Observances
Illinois law requires the University to reasonably accommodate its students’ religious beliefs, observances, and practices in regard to admissions, class attendance, and the scheduling of examinations and work requirements. You should examine this syllabus at the beginning of the semester for potential conflicts between course deadlines and any of your religious observances. If a conflict exists, you should notify your instructor of the conflict and follow the procedures described here to request appropriate accommodations. This should be done in the first two weeks of classes.
Sexual Misconduct Reporting Obligation
The University of Illinois is committed to combating sexual misconduct. Faculty and staff members are required to report any instances of sexual misconduct to the University’s Title IX Office. In turn, an individual with the Title IX Office will provide information about rights and options, including accommodations, support services, the campus disciplinary process, and law enforcement options.
A list of the designated University employees who, as counselors, confidential advisors, and medical professionals, do not have this reporting responsibility and can maintain confidentiality, can be found here. Other information about resources and reporting is available here.