Introduction
What is machine learning?
"Machine learning is a subfield of artificial intelligence (AI) concerned with algorithms that allow computers to learn. What this means in most cases, is that an algorithm is given a set of data and infers information about the properties of the data--and that information allows it to make predictions about other data it might see in the future. This is possible because almost all nonrandom data contains patterns, and these patterns allow the machine to generalize. In order to generalize, it trains a model with what it determines are the important aspects of the data."
--Toby Segaran, Programming Collective Intelligence
A more concise definition:
Machine learning allows computers to observe input and produce a desired output, either by example or through identifying latent patterns in the input.
This course takes an application-driven approach to current topics in machine learning. The course covers supervised learning (regression, classification, structured prediction) and unsupervised learning (dimensionality reduction, Bayesian modeling, clustering). The course will also consider challenges resulting from learning applications. We will cover popular algorithms (naive Bayes, SVM, perceptron, HMM, k-means, deep learning) and will focus on how statistical learning algorithms are applied to real-world applications. Students in the course will implement several learning algorithms in homeworks and demonstrate their understanding of machine learning in a final project.
Goals
The course has several goals:
- Students will learn the fundamentals of machine learning
- Students will learn to implement machine learning algorithms
- Students will learn to evaluate how to apply machine learning to different settings
Requirements
Students are expected to have:
- Strong programming skills in Python. There will be considerable programming required for the homeworks.
- Comfort with relevant mathematical topics (linear algebra, multivariate calculus, probability). The course prerequisites are multivariable calculus (AS.110.202), probability (EN.550.310/EN.550.420), and linear algebra (AS.110.201/AS.110.212). The official prerequisites are listed in SIS.
Grading
- Homeworks: 40%
- Midterm: 20%
- Final: 20%
- Project: 20%
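As a rough illustration of how the weights above combine, here is a minimal Python sketch. The function name `course_grade`, the dictionary keys, and the 0-100 score scale are assumptions made for this example; the sketch does not describe how letter grades are assigned or whether scores are curved.

```python
# Weights in percent, taken from the grading breakdown above.
WEIGHTS = {"homeworks": 40, "midterm": 20, "final": 20, "project": 20}


def course_grade(scores):
    """Return the weighted course score on a 0-100 scale.

    `scores` maps each component name to a percentage between 0 and 100.
    This is an illustrative helper, not the official grade calculation.
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical student: the scores below are made up for illustration.
example = {"homeworks": 90, "midterm": 80, "final": 70, "project": 100}
print(course_grade(example))  # prints 86.0
```

Because homeworks carry twice the weight of any other component, a 10-point change in the homework average moves the total by 4 points, while the same change on an exam or the project moves it by 2.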
Lectures
This semester all lectures will be online via Zoom. Lectures will have two parts.
1) A pre-recorded session distributed ahead of time. These sessions will be approximately 20-30 minutes in length and will introduce the concepts covered in the lecture. Students are expected to watch these videos before attending the live lecture.
2) A live lecture at the scheduled course time that continues presenting the material from the pre-recorded session, answers questions, and includes breakout activities for students. We will record the live lectures, excluding the breakouts, for students unable to attend the live lecture.
Homework
Homeworks can contain both written problems and programming assignments. Programming portions of the homework will require students to implement machine learning algorithms. We use a fully automated grading system for programming assignments. More details will be given when the first assignment is distributed. Written portions will ask questions about fundamental concepts covered in the course.
Homeworks are to be turned in electronically. Written assignments will be turned in using Gradescope. Programming assignments will be turned in on the course website (cs475.org) unless otherwise directed. In general, homeworks will only cover material taught at least one week before the assignment is due. For example, if an assignment is handed out on Monday Sept 1 and due one week later on Monday Sept 8, all material in the homework will have been taught by the end of Monday Sept 1.
Students are expected to complete each homework on their own, unless another collaboration policy is stated in the homework.
Exams
Since the course will be fully online, students will take exams via Blackboard. Each exam will be open book and open notes. Exams will be similar to previous years but will be more focused on multiple-choice, true/false, and short-answer questions. The exams will include various cheating-prevention and detection measures to ensure each student completes their own work.
Projects
The goal of the project is for students to extend their ML knowledge by exploring a dataset and solving an applied problem. Your project must fall into one of the following areas we have studied in the course: supervised learning, unsupervised learning, graphical models, interpretability, or reproducibility. We will specify datasets and other guidance for getting started, but you are encouraged to work on original research that can lead to a published paper of your own, though the project is not expected to be of publishable quality. Projects will be done in small groups, with each partner receiving the same grade.
Recitation Sections
Recitation sections are optional class meetings led by the TAs that take place on Friday. Topics include covering additional background material, reviewing course material, exploring additional topics, and reviewing homework solutions.
In rare instances, a class lecture may be moved to the recitation section time slot on Friday. These will be announced in advance.
Late Policy
Late homework and project assignments will be accepted up to 24 hours past the due date for a 25% reduction in maximum possible grade. Exceptions will only be given in extreme cases. However, every student is permitted to hand in homeworks and project assignments late without penalty using a 72-hour grace period for the entire semester. This means that you can choose to hand in the first homework 70 hours late and the second homework 2 hours late, but then every other homework and project assignment must be on time for the rest of the semester. You may divide these 72 hours as you see fit, but once you have used up all of the time, you will be given no more. We will round up to the hour (minutes don't count).
It is your responsibility to track your homework late hours.
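To help with tracking, here is a minimal Python sketch of the grace-period bookkeeping described above. The function names `hours_late` and `use_grace` are made up for this example, and the rounding rule reflects our reading of "round up to the hour" (any partial hour counts as a full hour). The sketch only tracks grace hours; it does not model the 25% penalty or how the two policies interact.

```python
import math

GRACE_HOURS = 72  # semester-wide grace period stated in the late policy


def hours_late(minutes_late):
    """Round lateness up to whole hours (assumed reading of the policy)."""
    if minutes_late <= 0:
        return 0
    return math.ceil(minutes_late / 60)


def use_grace(minutes_late, grace_left):
    """Return the grace hours remaining after one late submission.

    Raises ValueError when the submission needs more grace hours than
    remain, since the policy gives no extra time once the hours are used up.
    """
    needed = hours_late(minutes_late)
    if needed > grace_left:
        raise ValueError(f"need {needed} hours, only {grace_left} left")
    return grace_left - needed


# The example from the policy: 70 hours late, then 2 hours late.
left = use_grace(70 * 60, GRACE_HOURS)
left = use_grace(2 * 60, left)
print(left)  # prints 0
```

Under this reading, a submission that is 61 minutes late uses 2 grace hours, so submitting a few minutes past an hour boundary costs a full extra hour.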
Missing a deadline can be stressful, and it is not always within your control. Issues, both academic and personal, can arise that cause you to fall behind. Students often blame themselves and believe that if they work harder they can catch up, only to fall further behind. We understand that difficult situations arise, and we want to help you manage them to ensure you can stay on track with the course. The key is to email the instructor as soon as possible when you think you may miss a deadline. Note that you don't have to email if you plan to use late hours, only if you believe the late hours will be insufficient, or if you have an emergency situation. If you contact the instructor ahead of time, we may be able to work together to ensure that you are not penalized for late submissions. However, if we find out after the deadline has passed, we are very limited in our ability to assist.
Textbook
In previous years, we used the Bishop book. Since this book first appeared there has been an explosion in new machine learning books. Many are focused on specific methods or applications.
The official book for this class will be: There is currently only one edition of the book, but there are multiple printings. As of September 2013, the latest printing is the fourth printing. New printings fix many errors found in earlier books. Page numbering can be different between printings, but section numbers (which we will be using) are the same. Online errata can be found here.
Many machine learning courses select readings from multiple sources without providing a single official book. We prefer to select a single book since it presents every topic using the same style, approach and notation. The consistency in presentation across multiple topics aids learning and provides a resource for exploring topics in more depth.
In addition to the official book, we will provide other readings on specific topics that offer a different (and hopefully better) presentation of the material. These readings will be available freely online.
Switching textbooks from Bishop to Murphy comes at a cost. We selected Murphy because it is more up to date, covers more topics, and uses notation common in the machine learning community. However, Bishop does have its own advantages. We think its presentation of some topics is superior to Murphy's. Also, since the class was originally designed to follow Bishop, the presentation of some topics in class will be flipped from Bishop. The reading in Murphy will jump around between chapters. If you encounter a topic in the reading with which you are not familiar, you may want to go back to an earlier chapter in the book where the topic is first explained.
Overall, we think changing to Murphy will be an improvement. However, for those who prefer to use Bishop, we will list the corresponding readings in Bishop as optional.
A few other relevant textbooks:
- Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference and Prediction. 2009
This book covers a large amount of material and is ideal for students with some prior experience with statistics.
- Tom Mitchell. Machine Learning. 1997
This book used to be the standard for machine learning, but is out of date. It covers in depth some topics that more recent books overlook. Its presentation is excellent.
- Ethem Alpaydin. Introduction to Machine Learning. 2004
Comparable in quality and coverage to the Bishop book, but not often used. A nice supplement for difficult topics.
Cheating
We take cheating very seriously. We expect every student to have read the Department of Computer Science Academic Integrity Code and will hold students accountable to it. So that course policies are clear, here is a review of relevant rules (in addition to the integrity code).
- Every exam, project, homework and any other work completed during this course must be entirely your own. Copying any material from other students or the web is expressly prohibited, unless otherwise permitted in the assignment write-up.
- Normally, all exams are closed book unless otherwise stated. This means that students may not reference any material during an exam that is not provided as part of the exam. However, for this semester (Fall 2020) exams are open book and open notes.
- Any collaboration between students during an exam will be considered cheating.
- If a student copies your work, even without your knowledge, you are cheating. It is your responsibility to ensure that no one has access to your work.
- There is no statute of limitations on punishing cheating. Even if we find on the last day of the semester that you had cheated on homeworks, you will be punished.
- Talking with other students to understand homework and course material is strongly encouraged. However, discussing an assignment and cheating are very different things. If you copy someone else's work you are cheating. If you let someone copy your work you are cheating. If someone tells you the answer you are cheating. Everything you hand in must be in your own words based on your understanding of the solution.
Cheating
- Copying any part of a homework from someone else.
- Verbally telling someone the answer to a homework question.
- Looking at someone else's code or solution.
- Obtaining any part of your solution or code from any online resource or software library.
Not-Cheating
- Explaining the homework question to someone else.
- Discussing the homework at a high level.
- Helping someone think through a problem.
- Directing someone to a section of the textbook, reading, or online resource that helps explain a concept.
We are aware that many of the programming assignments will ask students to implement algorithms already available online. We will try to avoid direct duplication when possible. However, you are not permitted to copy any part of your code from other libraries.
What happens when you cheat?
We will be carefully examining homeworks and exams for signs of cheating. If you cheat, at a minimum you will be given a 0 for the assignment or exam. More likely, you will have the total value of the homework or exam subtracted from your grade, i.e., if you cheat on an exam worth 15% of your grade, you will get a 0 on the exam and have an additional 15% of your grade deducted. In some cases, cheating will be reported to the appropriate university board, which can result in failing the class, suspension, or expulsion.
Remember:
DO help each other understand the lectures, readings and homeworks.
DO NOT complete each other's homework.