Thursday, August 30, 2012

Sample Mean vs. Population Mean


I did not mention this in my last post, which outlined some basic statistical functions related to central tendency, but when studying stats, it is important to understand that you will be dealing with two kinds of means: a sample mean and a population mean.  Conceptually, they both do the same kind of thing, though their meanings are slightly different.  It is a very good idea to know when to use the sample mean vs. the population mean, and I will go over these concepts and their uses in this post.

Mathematically, I already explained how you determine a mean value.  It is what you have likely always known as an average value, and you can very easily find it using the mean formula.  Without actually writing out the equation, you already know that the mean is the sum of all your values, divided by the total number of your values.  This is straightforward and nothing new.  Here is where I am going to make a distinction that you need to be aware of.

In statistics, you deal with populations.  Populations are complete groups of people, of things, of measurements.  As an example, you likely know of the population of the planet Earth.  That refers to all of the people on the planet.  Or, you could have a population of bald eagles in a nesting ground, or a population of Ferrari sports cars manufactured in 2011.  Populations refer to the whole group of whatever you are talking about.  However, in many cases, you don't have access to data about the entire population.  You only have access to a subset of that population... a sample of the population.  So, a sample can be considered to be a small part of the population, ideally one chosen so that it is representative of that population as a whole.

A sample could also be looked at as the basis for an estimate of the larger population.  Samples are frequently sufficient to work with, since collecting data for an entire population could involve a very long and complicated set of data, and the closer your sample size is to your population size, the more accurate this estimate tends to become.  This is why people tend to question things that are only based upon a few observations... error tends to be higher when sample size is smaller.  More observations generally means less error.
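To see the "more observations means less error" idea in action, here is a rough Python sketch.  The population is entirely made up (2,000 random numbers), and the names `population`, `mu`, and `average_error` are just ones I picked for illustration.  It draws many random samples of each size and reports how far, on average, the sample mean lands from the population mean.

```python
import random
from statistics import fmean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 2,000 made-up measurements.
population = [rng.gauss(100, 15) for _ in range(2000)]
mu = fmean(population)  # the population mean

def average_error(sample_size, trials=200):
    """Average absolute gap between the sample mean and mu over many random samples."""
    gaps = []
    for _ in range(trials):
        # Draw a sample without replacement, then compare its mean to mu.
        sample = rng.sample(population, sample_size)
        gaps.append(abs(fmean(sample) - mu))
    return fmean(gaps)

errors = {n: average_error(n) for n in (10, 100, 1000, 2000)}
for n, err in errors.items():
    print(f"n = {n:>4}: average error = {err:.3f}")
```

With this setup the average error shrinks as the sample size grows, and it drops to essentially zero once the sample is the whole population.  Any single small sample can still get lucky and land close, which is why the sketch averages over many samples.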

So then, with those definitions in mind, you should hopefully be able to understand what is meant by population mean and sample mean.  Literally, a population mean is the average of the entire population, whereas the sample mean is the average of a sample (which represents a larger population).  Of course, since this is mathematics, we have different ways to write the notation for these two statistics concepts.

When we are talking about a population mean, where we have data about all of the subjects or measurements of a given population, we represent that mean by the Greek letter mu, which looks like a fancy lower-case u:

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$

This is calculated by summing all of the values in the entire population, and then dividing by the total number of values in that population, which is denoted by a capital N for a population.

On the other hand, when we are dealing with a sample mean (the mean of a subset that is representative of a whole population), we denote it by the symbol x-bar, an x with a bar over it:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

As before, we find this by summing all of the values in your set, and then dividing by the total number of values in your set, in this case, the number being denoted by a lower-case n for a sample.
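Here is a small Python sketch of the two calculations side by side.  The eagle weights are invented numbers for illustration, and the variable names `N`, `n`, `mu`, and `x_bar` simply mirror the notation above.  The sample is hand-picked (every other eagle) to keep the example repeatable; in practice you would want to draw it at random.

```python
# Hypothetical population: weights (kg) of every eagle at a nesting ground.
population = [4.1, 4.5, 5.0, 5.2, 5.6, 6.0, 6.3, 6.5]

N = len(population)          # population size, capital N
mu = sum(population) / N     # population mean

# Hypothetical sample: every other eagle from the list above.
sample = population[::2]

n = len(sample)              # sample size, lower-case n
x_bar = sum(sample) / n      # sample mean

print(f"N = {N}, mu = {mu:.2f}")
print(f"n = {n}, x_bar = {x_bar:.2f}")
```

The arithmetic is identical in both cases.  The only thing that changes is which group of values goes in, and therefore what the result can claim to describe.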

As I mentioned, calculating these values means essentially doing the same thing.  However, in stats, it is wise to pay attention to the group that you are analyzing.  Making a mistake at this point could lead to much larger errors in any further statistical analysis.  Keep in mind that a sample mean is an approximation of a population mean, and that approximation becomes more accurate as the size of your sample (n values) approaches the size of your whole population (N values).

