This course spans a broad variety of topics that all have one thing in common: the need to work with data. It is meant as an introduction to the challenges of processing and managing information and to the techniques used to address them. The course is offered at two levels:
The majority of the content is common to both levels. The programming assignments differ considerably. Students in the 3xx course will focus on creating a web service that provides data services to consumers, while students in the 2xx course will focus more on obtaining and processing data from various sources.
Some of the topics we will consider are:
There are many important related topics that we will not cover. For instance, there is a lot of work in data mining and machine learning on developing algorithms for extracting information from data. The question of how to visualize information is also a whole topic of its own. These aspects all deserve their own courses.
A large part of the material will be covered in my course notes on the website, which also link to many other resources. No single textbook could hope to cover the variety of topics in this course, so we will use a variety of linked resources, some freely available (e.g. the Python documentation) and some not.
I am asking all students to join the ACM (Association for Computing Machinery) instead of buying a textbook. The ACM delivers resources that advance computing as a science and a profession, and this includes the ACM Learning Center which provides you access to hundreds of online books and videos related to computing. The annual student fee for the ACM is around $20. I encourage you to continue your membership every year and to take advantage of the opportunities and resources it offers.
On the class schedule page you will find, for each class day, a list of links to reading assignments. Your homework will require you to have a solid understanding of the material covered there, so I strongly encourage you not to fall behind.
You are expected to attend every class meeting. You are only allowed to miss 3 classes without excuse. From that point on, every unexcused absence will result in a reduction of your final score by one percentage point, up to a total of 5 points. Excused absences should be arranged in advance, and backed by appropriate documentation. Emergencies will be dealt with on an individual basis. There are very few reasons that would qualify as an excuse for an absence.
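As a rough illustration of how the attendance rule works out arithmetically, here is a minimal Python sketch. The function name and constants are made up for this example, and it assumes the three "free" unexcused absences are counted first and the deduction is capped at 5 points, as described above.

```python
FREE_ABSENCES = 3   # unexcused absences allowed without penalty
MAX_PENALTY = 5     # the deduction never exceeds 5 percentage points


def attendance_penalty(unexcused_absences: int) -> int:
    """Return the percentage points deducted from the final score.

    Illustrative only: one point per unexcused absence beyond the
    first three, capped at five points in total.
    """
    extra = max(unexcused_absences - FREE_ABSENCES, 0)
    return min(extra, MAX_PENALTY)


if __name__ == "__main__":
    for absences in (0, 3, 4, 8, 12):
        print(absences, "unexcused ->", attendance_penalty(absences), "point(s) off")
```

Under this reading, the fourth unexcused absence costs the first point, and the eighth and any later ones cost nothing further.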
There will also be numerous in-class group activities that you will be expected to participate in.
There will be lab assignments roughly once a week. You are expected to work on these assignments on your own, but you are welcome to ask me questions, and you are welcome to discuss general topics related to the assignment with your classmates. We will typically start these assignments in class, but you will be expected to complete them outside of class.
There will be one midterm, on Friday, October 14th, and a final/2nd midterm during finals week. You have to be here for the exams. If you have conflicts with these dates, let me know as soon as possible. Do not plan your vacation before you are aware of the finals schedule. In terms of your final grade, the exam you did better on will carry more weight.
For a large part of the course you will be engaged in a collaborative project with a classmate (groups of 2 only please). The project differs depending on the class. In both instances you are expected to maintain your code in a GitHub repository owned by one of the people in the group.
In the 2xx course, the goal of the project is to demonstrate the ability to collect data from varying sources and formats, and successfully merge them together. In this project you will need to produce a script or set of scripts that:
In the 3xx course, the goal of the project is to demonstrate the ability to serve data to clients and respond to client requests in various forms. In this project you will need to produce a web service that:
Your final grade depends on participation, assignments, the project, and the two exams, as follows:
Component | Percent |
---|---|
Participation | 10% |
Assignments | 30% |
Project | 20% |
Worst Exam | 15% |
Best Exam | 25% |
Component | Percent |
---|---|
Participation | 10% |
Assignments | 25% |
Project | 30% |
Worst Exam | 15% |
Best Exam | 20% |
This gives a number up to 100, which is then converted to a letter grade based roughly on the following correspondence:
Letter grade | Percentage Range |
---|---|
A, A- | 90%-100% |
B+, B, B- | 80%-90% |
C+, C, C- | 70%-80% |
D+, D, D- | 60%-70% |
F | 0%-60% |
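
To make the weighting concrete, the following Python sketch computes a final score from the two tables above and maps it to a letter band. All names here are hypothetical. The text does not say which table applies to which course level, so the two weight sets are simply labelled by position. The sketch also assumes that a score exactly on a boundary (e.g. 90) falls in the higher band, which the "roughly" in the text leaves open, and it ignores any attendance deduction.

```python
# Weights in percent, copied from the two tables above.
# Which table belongs to which course level is NOT stated in the text.
WEIGHTS_FIRST_TABLE = {
    "participation": 10, "assignments": 30, "project": 20,
    "worst_exam": 15, "best_exam": 25,
}
WEIGHTS_SECOND_TABLE = {
    "participation": 10, "assignments": 25, "project": 30,
    "worst_exam": 15, "best_exam": 20,
}


def final_score(participation, assignments, project, exam1, exam2, weights):
    """Weighted average of component scores (each on a 0-100 scale)."""
    scores = {
        "participation": participation,
        "assignments": assignments,
        "project": project,
        "worst_exam": min(exam1, exam2),  # the weaker exam gets the smaller weight
        "best_exam": max(exam1, exam2),   # the stronger exam gets the larger weight
    }
    return sum(weights[name] * scores[name] for name in weights) / 100


def letter_band(score):
    """Map a 0-100 score to a letter band (plus/minus cutoffs are not specified)."""
    # Assumption: boundary values belong to the higher band.
    for cutoff, band in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return band
    return "F"


if __name__ == "__main__":
    score = final_score(90, 80, 70, 60, 100, WEIGHTS_FIRST_TABLE)
    print(score, letter_band(score))
```

Note that the order of the two exam scores does not matter: the function picks the better and worse one itself, which is what "the exam you did better on will carry more weight" amounts to.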