Cross-Tabulating Variables: How to Create a Contingency Table in Microsoft Excel
Contingency tables (also called crosstabs) are useful as a rudimentary tool to analyze the relationship between two variables. In a contingency table, one variable is presented in the columns and the other in the rows. By looking at the distribution of one variable across categories of the other, we are able to gain preliminary insight into the association among variables. It is fairly easy to cross-tabulate variables using Microsoft Excel.
Contingency tables are most useful when variables have a limited number of response categories. To usefully analyze variables with multiple categories, it may be necessary to group them (but only if it makes sense to do so). For instance, a survey may ask respondents their income (in dollars or thousands of dollars, etc.). There will likely be a variety of unique response options if information is collected in this way. It may make sense to group data into a limited set of categories (ex. income ranges) for variables such as these.
Note: Data must already have been input into Microsoft Excel and be properly coded in order to create a contingency table. For assistance with this process, please refer to the hub: Analyzing Survey Data in Microsoft Excel: Coding, Inputting Data, and Creating Frequency Distributions
Let’s work through an example step-by-step. We can create a contingency table (crosstab) by using Microsoft Excel's PivotTable function. Data from the General Social Survey (GSS) 2008 will be used to illustrate this process.
Identify the two variables you wish to analyze in a contingency table. You should have a reason for wanting to analyze these two variables together. Identify which variable you would consider to be the independent variable and the dependent variable. The independent variable is generally thought to have an influence on the dependent variable. Sometimes the independent variable is seen as the “cause” and the dependent variable is seen as the “effect” (however, it is important to remember that variables can also be related to one another without one necessarily “causing” the other).
- Ex. I have chosen to analyze the variables “degree” and “income” from the GSS 2008. I wish to look at the relationship between these two variables because I wish to see if there is a relationship between “degree” and “income,” hypothesizing that one’s highest level of education may have an effect on one’s income. More specifically, I am really interested in determining whether those with a college degree have higher incomes than those without a college degree. I identify “degree” as the independent variable and “income” as the dependent variable.
*Because contingency tables are best suited to variables with a limited number of responses, I have collapsed “degree” into 2 categories (instead of the original five). I have coded respondents with a high school diploma or less as 1. I have coded respondents with any type of college degree as 2.
Click anywhere within your data on your spreadsheet (personally I prefer to click near the center). Create a contingency table using the “pivot table” feature in the “insert tab”).
- Ex. Click anywhere near the center of the data (I chose cell E35). Select “Pivot Table” from the insert tab. A window titled “Create PivotTable.” The default option in “Select a table or range” should be correct since I clicked in the center of the data before creating a pivot table. I choose to place the PivotTable in the existing worksheet. To select a cell where the table will begin, immediately after clicking “existing worksheet” scroll down and click on a cell where you would like the PivotTable to begin. The current Pivot table will appear in this spreadsheet beginning in the cell B46. Check the box next to “Add this data to the Data Model.” Then click “OK.”
Specify where your values will be placed in your PivotTable (Contingency Table)
- Ex. A menu will appear on the right side of the screen called “Pivot Table Fields.” In the top of this menu you will see you all of the variables in your data set. In the bottom part of the menu, there is a grid with four fields: filters, rows, columns, and values. Drag the independent variable (degree) to the columns. Drag the dependent variable (income) to the rows. Drag ID # to values (you will need to have a column for ID #s in order to make a PivotTable in Excel). If the ID# in the “values” field is not presented as a count (ex. it is presented as a sum) you will need to change it. Click the arrow next to “values” and select “value field settings.” When the “value field settings” window opens, choose “count” from the list of options. An image of this is presented on the left below. Staying in this window, select the tab “Show values as.” In the drop-down menu select the appropriate percentages. Since our independent variable is located in the columns of our contingency, we select “% of column total.” This is shown in the image below on the right. I would recommend following this format (independent variable in the columns, dependent variable in the rows, column percentages selected). Once appropriate percentages have been selected, click OK.
You have now created a contingency table (Pivot Table) displaying percentages in Excel. Certain features of this table may be modified, such as row and column headers. If values are presented in numerical form, it may be more useful to describe them in words in your table. You can edit various aspects of the table by clicking on cells and replacing the default text with your desired text. The default version of the contingency table is presented below on the left and the cleaned-up version of it is presented on the right.
Thank you for taking the time to read this hub. Feel free to leave comments and suggestions.
For more information on creating contingency tables (crosstabls) in Microsoft Excel using the PivotTable feature, check out the video on the right.