Applied Statistics focuses on the statistical study of data, its collection, description and inference based on sample data. Here are some of the kinds of questions we might ask:
In this course, we will develop the tools to be able to answer these kinds of questions. Simply put:
In Statistics our goal is to understand the uncertainty and variability inherent in every experiment/phenomenon/measurement, and to attempt to control that variability.
Broadly speaking, the course is divided into three parts. We will start with Descriptive Statistics, which deals with the various ways of presenting data, their summaries and inter-relationships, and the problems one might encounter when doing that, both from using bad graphing techniques and from relying too much on numerical summaries. You will be able to understand the pitfalls when people and the media quote average numbers and percentages, and you will be able to put those numbers into a proper perspective. You will familiarize yourselves with the various types of graphs, their strengths and weaknesses, as well as common steps for presenting the information in a graph more clearly.
The second, brief, part of the course deals with the design of experiments and sampling methods. It is an introduction to the methods used to collect data, and the problems that arise. As an example, during the Great Depression a popular magazine made an extremely wrong prediction about who would win the presidential election, which led to the downfall of that magazine. Their prediction was based on a massive survey that ended up getting answers from almost 2 million people, so it seems very surprising that they would get things wrong. We will investigate the mistakes that they made, and why having this enormous sample size didn’t necessarily help them. We will also touch briefly on some of the fundamental principles employed in designing a study or experiment.
The final part of the course deals with Inferential Statistics. Inferential Statistics concerns itself with making predictions about a population based on information from a small sample. For instance, when CNN reports that Obama and McCain both have 47% support, based on a sample of 1000 people, what does that really tell us about the voting preference of all Americans? And what is it that makes us certain that the 1000 people who were polled are sufficient to make a prediction? Would we have been able to make a better prediction with more people, or does it not matter how many we have after a while? Can we provide some range of values that we can be pretty sure the candidate’s actual percentage would be in? If a baseball player has a better on-base percentage than another player in a particular season, does this mean that they are truly better at getting on base? Or did they just happen to have a better season? At which point is it true skill and not just ‘luck’?
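As a preview of the kind of calculation we will build up to, here is a minimal Python sketch of the "range of values" question for the poll above. It assumes the 1000 people form a simple random sample and uses the usual normal approximation with a 95% confidence level (multiplier 1.96); the function name `margin_of_error` is just an illustrative choice, not something from the textbook.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    z=1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# The poll from the text: 47% support in a sample of 1000 people.
moe_1000 = margin_of_error(0.47, 1000)
# A hypothetical poll with four times as many people.
moe_4000 = margin_of_error(0.47, 4000)

print(f"n=1000: 47% +/- {100 * moe_1000:.1f} points")
print(f"n=4000: 47% +/- {100 * moe_4000:.1f} points")
```

Under these assumptions the margin is about 3 percentage points for 1000 people, and quadrupling the sample only cuts it in half, which hints at why adding more people helps less and less.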
In order to understand the mechanics behind Inferential Statistics, we will need to spend some time studying the basics of Probability Theory and the notion of a random variable. Probability Theory is the mathematical study of random phenomena and processes, and it will provide us with tools to deal with a wide range of situations, from sampling from a large population, to a basketball player shooting from the free-throw line, to the various tests for diseases and how reliable they are.
The course has the following main objectives:
On the website you will find a schedule with links to documents for each class day. In those documents you will find notes for the day’s lesson, a reading assignment, and a list of practice problems. You should work on those practice problems, and ask any questions you have about them. You do not have to turn the problems in.
You are expected to attend every class meeting, including labs. You are only allowed to miss 3 classes without excuse. From that point on, every unexcused absence will result in a reduction of your final score by one percentage point, up to a total of 5 points. Excused absences should be arranged in advance, and backed by appropriate documentation. Emergencies will be dealt with on an individual basis. There are very few reasons that would qualify as an excuse for an absence.
Once or twice a week, I will assign homework. Assignments will be collected and graded on a completion scale, depending on how much effort you have put in and how complete your work is. Questions on the quizzes and exams tend to be similar to the homework problems, so it is to your advantage to really understand the homework, and not merely “do it” or copy it just to get it turned in. Homework assignments are 5% of your final grade.
We will be using the Moodle platform for online quizzes. You will typically have one quiz each week. Each quiz has a two-hour time limit, and will have a deadline no more than a week after we cover that topic. You are allowed to take the quiz up to 2 times before that deadline, and you receive feedback after each attempt. The average of the two tries will be your final quiz score. You are expected to work on the quizzes on your own, and you are allowed to refer to the book and class notes while taking them. Your quiz score is 10% of your final grade.
There will be two midterms, on Friday, February 8th and Friday, March 15th, and a final/3rd midterm during finals week. You have to be here for the exams. If you have conflicts with these days, let me know as soon as possible. Do not plan your vacation before you are aware of the finals schedule. In terms of your final grade, the exams you do better on will carry more weight.
Throughout the semester, you will work in groups of three on a term project.
Your final grade depends on class attendance, homework, project, quizzes, midterms and the final, as follows:
Component | Percent |
---|---|
Attendance | 5% |
Homework | 5% |
Quizzes | 10% |
Project | 15% |
Worst Midterm | 15% |
Middle Midterm | 20% |
Best Midterm | 30% |
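
To make the weighting concrete, here is a small Python sketch of how the table could be applied. It is only an illustration under stated assumptions: every component except attendance is a score out of 100, the three exams are ranked from worst to best before weighting, and the 5 attendance points are the same 5 points that unexcused absences beyond the third can take away. The function names and the sample scores are made up for the example.

```python
def attendance_points(unexcused_absences):
    """Attendance points out of 5.

    Assumption: the first 3 unexcused absences are free, and each one
    after that costs 1 point, up to the full 5 points.
    """
    penalty = min(5, max(0, unexcused_absences - 3))
    return 5 - penalty

def final_score(unexcused_absences, homework, quizzes, project, exams):
    """Weighted final score out of 100.

    homework, quizzes, project: scores out of 100.
    exams: the three exam scores (two midterms and the final), out of 100.
    """
    worst, middle, best = sorted(exams)  # the better exams get larger weights
    return (attendance_points(unexcused_absences)
            + 0.05 * homework
            + 0.10 * quizzes
            + 0.15 * project
            + 0.15 * worst
            + 0.20 * middle
            + 0.30 * best)

# Hypothetical student: 2 unexcused absences, exams of 72, 91 and 80.
example = final_score(2, homework=90, quizzes=85, project=88, exams=[72, 91, 80])
print(round(example, 1))
```

For this made-up student the 91 counts for 30%, the 80 for 20% and the 72 for 15%, regardless of the order in which the exams were taken.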
This gives a number up to 100, which is then converted to a letter grade based roughly on the following correspondence:
Letter grade | Percentage Range |
---|---|
A, A- | 90%-100% |
B+, B, B- | 80%-90% |
C+, C, C- | 70%-80% |
D+, D, D- | 60%-70% |
F | 0%-60% |
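
The correspondence above can be sketched as a simple lookup in Python. Since the ranges in the table share their endpoints and the conversion is only rough, this sketch assumes that a score exactly on a boundary goes to the higher band; the plus and minus cutoffs within each band are not specified here, so the function only returns the band. The name `letter_band` is illustrative.

```python
def letter_band(score):
    """Return the letter-grade band for a final score out of 100.

    Assumption: the lower end of each range is inclusive, so a score
    of exactly 90 falls in the A band.
    """
    bands = [
        (90, "A, A-"),
        (80, "B+, B, B-"),
        (70, "C+, C, C-"),
        (60, "D+, D, D-"),
    ]
    for cutoff, letters in bands:
        if score >= cutoff:
            return letters
    return "F"

print(letter_band(85.3))
```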