Calculating Measures of Central Tendency in Microsoft Excel
The mean, median, and mode are often referred to as measures of central tendency, meaning that they tell us information about the center of a distribution. In this hub, I will cover how to find these measures using Microsoft Excel. However, it is first useful to have a firm grasp of the mean, median, and mode before learning how to locate them using computer software. I will briefly overview these concepts. For more detailed information on what they are and how to calculate them by hand, check out this hub.
Finding the Mean: The "Center of Gravity" of a Distribution
The mean is simply the average of all values for a particular interval/ratio level variable. Using the data on age from the GSS 2008 subsample we could calculate the mean by adding together every value for age and dividing by the total number of cases (there are 20). The calculation would look like this:
(48+ 47+ 32+ 37+ 62 + 40 + 26+ 36 + 37 + 61 + 32 + 42 + 42 + 45 + 66 + 37 + 46 + 47 + 48 + 49) ÷ (20) = 43.95
With large data sets, a manual calculation of the mean is extremely tedious and with so many values there is a greater chance for error. Luckily, Microsoft Excel has a built in function designed to calculate the mean. To perform this computation in Excel, select all of the cells for which you would like to calculate the mean. In this example we would be selecting cells B2 to B21. Click and hold down your mouse in cell B2, then drag down the box that appears to cell B21. Once all data have been highlighted, click the downward arrow next to the “AutoSum” (∑) button and select “Average.” Once you click “Average,” the mean will appear directly below the selected data (in cell B22) by default. The computed average should also be 43.95.
Finding the Median: The Center Value of a Distribution
The median is the position that evenly divides the distribution in half. It is also the 50th percentile of a distribution. The median can be obtained for ordinal and interval-ratio level variables. If all data for a particular variable were arranged from highest to lowest, the median would be the value that falls directly in the center. If two values are at the center of a distribution (this often occurs in data sets with an even number of cases), simply take the average of the two values in the center.
To calculate the median of a variable in Excel, select the cell directly below the column containing the data you wish to analyze.
For example, let’s find the median for job satisfaction using the GSS 2008 subsample. Select the cell directly below the data column for the variable you are analyzing. In this case, select cell F22. Click on the drop-down menu arrow next to the “AutoSum” (∑) button on the “home” tab and select “more functions.” In the menu that appears change the category to “Statistical” and scroll to locate and select “median.” Press okay.
You will now be taken to a new window called “function arguments.” Excel has assumed by default that we wish to calculate the median for the values in the column directly about (cells F2 to F21). Once determining that range in “Number 1” is correct, click “OK.” If Excel does not make the correct assumption, you will have to enter the rage of cells manually before pressing “OK.” The median will now appear in the selected cell.
The median job satisfaction in the GSS 2008 subsample corresponds to the numerical value 3. Looking back to our coding scheme we can see that 1=very dissatisfied, 2= a little dissatisfied, 3= moderately satisfied, and 4=very satisfied. The median job satisfaction is therefore “moderately satisfied” (note: for categorical data that has been coded, the mean is always corresponds to category represented by a given numerical code, never to the numerical code itself).
- The median can also be obtained from an existing frequency distribution including cumulative percentages by locating the value that falls at the 50th percentile.
Finding the Mode: The Most Common Value in the Distribution
The mode is the most frequently occurring value for a particular variable. The mode can be calculated for nominal, ordinal, or interval-ratio level data. One method of finding the mode is to count up the frequencies of each value for a particular variable and determining which value occurs most often. If we have already created a frequency distribution for a given variable, finding the mode is fairly simple. Let’s take another look at the frequency distribution we created for “Highest Rank (Symbols)” using the College Rating Survey data (located on the right). This frequency distribution was created in this previous hub.
Looking at the frequency distribution on the left, we can see that there are two most frequently occurring values for “Highest Rank (Symbols). Both Harvard and Yale have a frequency of 3. “Highest Rank (Symbols)” is therefore bimodal. Harvard and Yale are the modes. (Note: The mode always corresponds to the category/categories with the highest frequency/frequencies, but never to the frequency itself.)
- The mode can also be obtained using built-in Excel functions. To do this, follow the same steps provided to calculate the median, but choose “mode” from the AutoSum drop-down selection “More Functions…” (make sure you have chosen the category “statistical.”
This article is accurate and true to the best of the author’s knowledge. Content is for informational or entertainment purposes only and does not substitute for personal counsel or professional advice in business, financial, legal, or technical matters.