Mean

The first statistic that I include here is probably the most common statistic you have ever worked with. You probably know it by the name "average," but in the field of statistics, you will find it referred to as the "mean," "arithmetic mean," or "arithmetic average." It probably doesn't need much of an explanation, as most students learn how to calculate averages very early on in school! It is a calculated measure of the center of a distribution of values, obtained by adding up all of the values and then dividing that sum by the number of values you added together. (It is important to be aware that there are different types of means in statistics: sample means and population means. I describe these in more detail in a separate post. For the sake of demonstration, consider the math in this post to describe samples instead of populations.)
There are a couple of important points to make about the notation involved in calculating means. The first concerns the mathematical symbol for the mean (because you don't want to always have to write down the word "mean" in your solutions!). The symbol is written as an x (or whatever variable you are using) with a small horizontal bar over it, like this:

$$\bar{x}$$
You say this symbol as "x bar." You will see this notation, and can use it yourself, wherever an arithmetic mean is used in statistical analysis and calculations. It is extraordinarily common, yet it can look confusing at first to a student who is new to statistics, because it looks like nothing they have dealt with before.
In addition to this, there is a second piece of notation that you will see, and it may need an explanation first. It is used to write out the arithmetic mean formula. I explained the concept and process of calculating a mean above, but here is one way in which you could write it down in your work:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
Mathematically, this simply says that the mean is equal to the sum of all your values ($x_1$ all the way up to $x_n$, the last one) divided by $n$, the total number of values that you are adding up. This average formula could also be represented in another way, like this:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
This formula for the mean is saying the same thing as the previous one. The $1/n$ part does the same job in both equations (in the first, dividing by $n$ is the same as multiplying by $1/n$). The fancy capital E-looking thing is the Greek capital letter sigma (which is not equivalent to E, but rather to S), and in math, it means to "sum up everything in the expression that follows." And the $x_i$ part stands for each of the values of x in turn. So the sigma would start with $x_1$, then add $x_2$, then add $x_3$, and so on, for all $n$ values of x. (I will do a separate post on sigma notation to perhaps explain this a bit better, with more examples.)
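If it helps to see the two ways of writing the formula side by side, here is a minimal Python sketch. The function names and the sample values are made up for illustration; the only point is that "add everything, then divide by n" and "1/n times the sigma sum" give the same answer.

```python
def mean_by_division(values):
    """First form: add up all the values, then divide by how many there are."""
    return sum(values) / len(values)


def mean_by_sigma(values):
    """Second form: multiply 1/n by the running sum that sigma describes."""
    n = len(values)
    total = 0.0
    for x_i in values:  # x_1, then x_2, then x_3, and so on
        total += x_i
    return (1 / n) * total


sample = [2, 4, 6, 8]  # hypothetical sample values
print(mean_by_division(sample))  # 5.0
print(mean_by_sigma(sample))     # 5.0
```

In practice you would not write the loop yourself (Python's built-in `sum` already does it), but spelling it out shows what the sigma is asking for.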
An important concept to understand about the mean is just what exactly it represents, and how it can be influenced by its dataset. For a collection of values that are similar, the mean will provide a fairly reasonable measure of the center of the data. However, if the data include any extreme values, the arithmetic average will be biased in their direction. The more extreme the outliers are, the greater their effect on the mean. Try it for yourself to see what I mean. Consider the dataset of values 1, 2, 3, 4, 5, and then consider the dataset of 1, 2, 3, 4, 20. The mean goes from 3 to 6, so you can see that it is pulled in the direction of the outlier. This is simply a result of how the mean is calculated, and is one of its flaws as a statistical tool. Similarly, if you have a distribution of values in your dataset that is "skewed" (that is, if you graph the values, you will see that the graph isn't symmetrical, and it has a tail on one end), the long tail will tend to bias the mean in its direction. Because of these characteristics, the mean is considered not to be a resistant measure (in that it can't resist being pulled by extreme data). However, despite these points, the mean is an incredibly useful tool for statistics, if for no other reason than that it is so simple to use, and provides a very quick evaluation of where the dataset is centered.
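Here is a small Python sketch of that experiment, in case you would rather run it than work it out by hand. The first two datasets are the ones from the paragraph above; the third, with 200 as the last value, is an extra one I have assumed just to show that a more extreme outlier pulls the mean even further.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


similar = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 20]
with_bigger_outlier = [1, 2, 3, 4, 200]  # assumed extra example

print(arithmetic_mean(similar))              # 3.0
print(arithmetic_mean(with_outlier))         # 6.0
print(arithmetic_mean(with_bigger_outlier))  # 42.0
```

Four of the five values never change, yet the mean ends up larger than all four of them once the outlier is in the data.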
There are a few differences to consider when comparing the mean and the median. Since the mean uses every actual data value in its calculation, it is influenced more by extreme or skewed data. The median depends only on the middle of the ordered data, so for skewed data or data with outliers it will usually represent a better estimate of the center of the distribution. In this sense, the median can be considered to be a more resistant measure than the mean. If you have a symmetric distribution of data, the mean and the median will be very similar. However, when you have a skewed distribution, the mean will be located more in the long tail of the distribution, further away from the median. Consider a set of prices: if you double the highest price, the median will be the same in both cases, though the doubled price will push the mean further toward that extreme end of the distribution. The mean and the median provide differing assessments of the central tendency of a distribution, but both measures are extremely useful in statistical analysis.
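The price example can be sketched in Python like this. The prices themselves are invented for illustration, and `double_highest` is a hypothetical helper of my own; `fmean` and `median` come from Python's standard `statistics` module.

```python
from statistics import fmean, median


def double_highest(values):
    """Return a sorted copy of the values with the largest one doubled."""
    ordered = sorted(values)
    ordered[-1] *= 2
    return ordered


prices = [10, 12, 15, 18, 40]      # made-up prices
doubled = double_highest(prices)   # [10, 12, 15, 18, 80]

print("median before:", median(prices), "after:", median(doubled))
print("mean before:", fmean(prices), "after:", fmean(doubled))
```

With these numbers the median stays at 15 in both cases, while the mean moves from 19 to 27, which is the resistance difference described above.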
I hope that this post has been informative and helpful for you!