Analyzing Survey Data in Microsoft Excel: Coding, Inputting Data, and Creating Frequency Distributions

Updated on August 13, 2014

Overview

In the following hub I provide a brief tutorial on how to use Microsoft Excel to analyze survey data. I use two small data sets, the College Rating Survey and the General Social Survey 2008 Extract, to illustrate some basic techniques. The examples below should help you to familiarize yourself with some basic tools of analysis that can be applied when analyzing your own survey results.

Microsoft Excel: The Basics

Microsoft Excel is a spreadsheet program with many basic features that are useful for analyzing data. In the spreadsheet, each column corresponds to a letter and each row corresponds to a number. The intersection of a column and row is referred to as a cell. Above the column letters is the formula bar, which is used to create and execute computations in Excel. Many common computations can be performed using the AutoSum button, located on the right side of the screen.

Preparing Data

Coding

Before bringing your data into Microsoft Excel, I would strongly suggest assigning codes to all non-numerical (non-interval/ratio level) data. To perform certain functions in Excel, data must be numeric. This is also the case for nearly all more advanced statistical packages—so it’s useful to get into the practice of coding.

Most numerical (interval/ratio-level) data does not require coding. For example, data on age (in years) can be kept as is and need not be recoded. However, if data on age has been collected using categories/ranges (ex. less than 20, 20-29, 30-39), then these categories/ranges must also be given numeric codes.

Step 1: For each survey question (variable), where coding is deemed appropriate, determine the number of unique response options.

  • Ex. In the GSS 2008 subsample, respondents are asked to choose their “highest degree completed” from a set of 5 response options (less than high school, high school, junior college, bachelor’s degree, graduate degree). Therefore, we know we will need to create 5 unique codes.

Step 2: Determine whether response options have a logical ordering (this is almost always this case for ordinal-level variables but not necessarily for nominal-level variables). If so, assign values accordingly.

  • Ex. The GSS 2008 response options for “highest degree completed” can be logically ranked from lowest to highest level of education. Therefore, we decide to assign the following codes: 1=less than high school, 2=high school, 3=junior college 4=bachelor’s degree, 5=graduate degree.

*Missing Responses: If any of the questions on your survey were left unanswered, it is best to go ahead and code them anyway (although sometimes missing responses are simply left blank—Excel does not deal well with empty cells). Following from the GSS 2008 example above, if some respondents did not provide a selection for “highest degree” completed, we could create an additional code to represent (6=missing).

Inputting Data

Once a coding scheme has been created for all response options corresponding to all survey questions (variables) requiring them, it is time to enter the data into Microsoft Excel. Data from the College Rating Survey is used to illustrate how to input data. I have already identified all unique response options and have created a coding scheme for the five items to be analyzed in Excel. The items and coding scheme are presented below:

*Note in the table above that some codes refer to combinations of response options. Unfortunately, not all responses in the survey were mutually exclusive or mutually exhaustive (or were not answered in this way). Although not ideal, it was necessary to create and assign additional codes for all unique combinations.

Step 1: For each survey item, provide a name/description at the top of each column. If you choose to provide column headers that are not intrinsically meaningful (ex. QI, Q2), make sure to include a description of each survey item along with your findings. If respondents have been assigned study ID#s, list them in the first column of the spreadsheet. If respondents have not been assigned study ID#s I would strongly urge you to assign them (as they are needed to use Excel’s built-in contingency table function). For the College Ranking Survey data, I have chosen the following column headings and have listed Study ID#s in the rows:

Step #2: Now that the columns of the spreadsheet have been properly labeled, it is time to input the data. The data for each participant should be entered into the row corresponding to their Study ID#. Remember to implement your coding scheme when inputting data. In the first image below I have provided the survey data collected from one of the survey participants (ID #1). In the second image displaying the entire data set, data for this respondent is located in the row directly beneath the column headings (this row is highlighted in blue). I have also included the coding scheme for each survey item below its corresponding column on the spreadsheet.

Survey Data for Respondent #1

Image 2: Spreadsheet including all Data (Respondent #1’s Coded Responses Highlighted in Blue)

Analyzing the Data

Once all of the survey data has been entered into the spreadsheet we can begin our analysis. Techniques for analyzing and presenting survey data are presented throughout the remainder of this tutorial.

Frequency Distributions

Frequency distributions are a useful way to present and interpret data. A frequency distribution is a numerical display showing the number of cases, and usually the percentage of cases (the relative frequencies), corresponding to each value or group of values of a variable (definition from Schutt p. 454).

To illustrate, I will begin by creating a frequency distribution of the #1 ranked colleges/universities in the College Rating Survey Ranking Task #1. In this item, respondents are asked to rank colleges/universities by their profile characteristics. Symbols were given in lieu of names. I have chosen only to include respondents’ #1 selection.

Data on this variable is located in the Excel spreadsheet in Column D entitled “Highest Rank (Symbols).” We will work through frequency distribution creation step-by-step. Manual instructions (using hand calculations) and Excel-specific instructions (using formulas) are both provided (where applicable). The method you choose to analyze your data will likely be influenced by the size of your data set and your proficiency with Excel.

Step 1: Choose a location in your Microsoft Excel spreadsheet to create your frequency distribution. Give your frequency distribution a title. Include “Value,” “Frequency,” and “Percentage” in your column headings. Beneath “Value,” list all unique response options. Even if values have been re-assigned codes, present them in textual form in your frequency distribution. Include “Total” in the cell beneath your last value.

  • Ex. I have chosen to construct the frequency distribution on the same spreadsheet as the survey findings, beneath the input data and coding scheme. It will begin in cell B18. I provide the title “Frequency Distribution of Highest Rank (Symbols).” I include “Value,” “Frequency,” and “Percentage” in the column headings. Beneath “Value,” I list all four unique survey response options (Harvard, Yale, MIT, Brandeis) in textual form. I include a cell for “Total.”

Step 2 (manual calculation): Count the number of times each value occurs in your data spreadsheet and enter it into the “Frequency” column next to its corresponding value. Include the sum of the frequencies next to “Total.”

  • Ex. (manual calculation) Returning to Column D of the data spreadsheet (rows 2-9), I manually count that “Harvard” (coded as 1) was chosen as the highest rank 3 times. “Yale” (coded as 2) was chosen 3 times. “MIT” (coded as 3) was chosen once. “Brandeis” was chosen once. I enter these counts into the spreadsheet in the “frequency” column. I report the summed frequencies (they add up to 8) next to “Total.”

Alternative Step #2 (using formulas): For large data sets, manually counting the number of times a value has occurred may not be feasible. Luckily, Microsoft Excel has a function that will do it for you. The formula takes the form =COUNTIF(range, criteria). To use this function, your frequency distribution must be located on the same spreadsheet as your data.

  • Ex. (using formulas): Returning to the data spreadsheet, we see that data on highest rank is located in column D, rows 2-9. Our range can be specified as D2:D9 (this literally translates as cells D2 through D9, with the colon meaning “to”). Our criteria is the numeric code corresponding to a particular value. Criteria must always be specified in quotations. For Harvard, our criteria is “1” (its numerical code). Click in the frequency cell corresponding to Harvard (C20), enter the formula =COUNTIF(D2:D9, “1”) and press enter. Our frequency for Harvard (3) is calculated for us. Formulas for each additional value must be constructed and entered in its corresponding cell. Never forget the = sign (or the formula will not execute).

Once all the frequencies have been computed you can have Excel sum the frequences for you using “AutoSum” (∑). Highlight frequencies for all values (in this case there are four of them located in cells C20 through C23). To select all values simply click on cell C20, hold down the mouse, and drag the box that appears down to C23. Click on the drop-down arrow next to the “AutoSum” (∑) button. Once you click “sum,” your calculation will appear in the cell directly beneath your selection by default, in this case, next to “Total.”

Step #3 (manual calculation): Calculate a percentage corresponding to the frequency for each value. To do this, divide the frequency for each value by the frequency total and enter the percentage into the spreadsheet. Also sum the percentages (they should equal 100%).

  • Ex. (manual calculation): Harvard has a frequency of 3. If we divide the frequency for Harvard (3) by the frequency total (8), we obtain a percentage of 37.5% for Harvard. Yale: 3/8= 37.5%. MIT: 1/8 = 12.5%. Brandeis: 1/8 = 12.5%. The percentages sum to 100%. Calculations are noted in the table.

Alternative Step #3 (using formulas): Instead of computing percentages using a calculator, we are also able to make basic computations using formulas in Excel (similar to the illustration of the use of formulas in Alternative Step #2).

  • Ex. (using formulas): If we were to select cell D6 we could calculate the percentage for Harvard by entering =3/8 (/ is the symbol for division in Excel) in the formula bar and pressing enter. Percentages for Yale, MIT, and Brandeis would also be calculated by using formulas accordingly. Once all four percentages are obtained, they can be highlighted and summed using the AutoSum function, just like in Alternative Step #2.

Step #4 (manual calculation): In many frequency distributions it is using to include a column for “Cumulative percentage.” Cumulative percentage is simply a “running tally” of percentages. Although it often makes more sense for ordinal and interval-ratio level variables, we can calculate it for any frequency distributions if desired.

  • Ex. Add a column for “cumulative percentage” next to “percentage.” For the first value (in this case Harvard), simply re-enter it into the corresponding cell in the “cumulative percentage” column. To calculate the second cumulative percentage, we must add the percentage corresponding to the first value (Harvard – 37.5%) to the percentage corresponding to the second value (Yale – 37.5%) for a total of 75%. To calculate the third percentage, add the percentages corresponding to the first two values to the third value. Continue this process. Your final table should look like this:

*How to read cumulative percentage: 37.5% of respondents chose Harvard as their top ranking. 75% of respondents chose either Yale or Harvard as their top ranking. 87% of respondents chose MIT, Yale, or Harvard as their top ranking. 100% of respondents chose Brandeis, MIT, Yale, or Harvard as their top ranking. Although the cumulative percentage may not be particularly meaningful in this case—it is often useful to include for variables that include ranked response options.

Comments

    0 of 8192 characters used
    Post Comment

    No comments yet.

    working

    This website uses cookies

    As a user in the EEA, your approval is needed on a few things. To provide a better website experience, turbofuture.com uses cookies (and other similar technologies) and may collect, process, and share personal data. Please choose which areas of our service you consent to our doing so.

    For more information on managing or withdrawing consents and how we handle data, visit our Privacy Policy at: https://turbofuture.com/privacy-policy#gdpr

    Show Details
    Necessary
    HubPages Device IDThis is used to identify particular browsers or devices when the access the service, and is used for security reasons.
    LoginThis is necessary to sign in to the HubPages Service.
    Google RecaptchaThis is used to prevent bots and spam. (Privacy Policy)
    AkismetThis is used to detect comment spam. (Privacy Policy)
    HubPages Google AnalyticsThis is used to provide data on traffic to our website, all personally identifyable data is anonymized. (Privacy Policy)
    HubPages Traffic PixelThis is used to collect data on traffic to articles and other pages on our site. Unless you are signed in to a HubPages account, all personally identifiable information is anonymized.
    Amazon Web ServicesThis is a cloud services platform that we used to host our service. (Privacy Policy)
    CloudflareThis is a cloud CDN service that we use to efficiently deliver files required for our service to operate such as javascript, cascading style sheets, images, and videos. (Privacy Policy)
    Google Hosted LibrariesJavascript software libraries such as jQuery are loaded at endpoints on the googleapis.com or gstatic.com domains, for performance and efficiency reasons. (Privacy Policy)
    Features
    Google Custom SearchThis is feature allows you to search the site. (Privacy Policy)
    Google MapsSome articles have Google Maps embedded in them. (Privacy Policy)
    Google ChartsThis is used to display charts and graphs on articles and the author center. (Privacy Policy)
    Google AdSense Host APIThis service allows you to sign up for or associate a Google AdSense account with HubPages, so that you can earn money from ads on your articles. No data is shared unless you engage with this feature. (Privacy Policy)
    Google YouTubeSome articles have YouTube videos embedded in them. (Privacy Policy)
    VimeoSome articles have Vimeo videos embedded in them. (Privacy Policy)
    PaypalThis is used for a registered author who enrolls in the HubPages Earnings program and requests to be paid via PayPal. No data is shared with Paypal unless you engage with this feature. (Privacy Policy)
    Facebook LoginYou can use this to streamline signing up for, or signing in to your Hubpages account. No data is shared with Facebook unless you engage with this feature. (Privacy Policy)
    MavenThis supports the Maven widget and search functionality. (Privacy Policy)
    Marketing
    Google AdSenseThis is an ad network. (Privacy Policy)
    Google DoubleClickGoogle provides ad serving technology and runs an ad network. (Privacy Policy)
    Index ExchangeThis is an ad network. (Privacy Policy)
    SovrnThis is an ad network. (Privacy Policy)
    Facebook AdsThis is an ad network. (Privacy Policy)
    Amazon Unified Ad MarketplaceThis is an ad network. (Privacy Policy)
    AppNexusThis is an ad network. (Privacy Policy)
    OpenxThis is an ad network. (Privacy Policy)
    Rubicon ProjectThis is an ad network. (Privacy Policy)
    TripleLiftThis is an ad network. (Privacy Policy)
    Say MediaWe partner with Say Media to deliver ad campaigns on our sites. (Privacy Policy)
    Remarketing PixelsWe may use remarketing pixels from advertising networks such as Google AdWords, Bing Ads, and Facebook in order to advertise the HubPages Service to people that have visited our sites.
    Conversion Tracking PixelsWe may use conversion tracking pixels from advertising networks such as Google AdWords, Bing Ads, and Facebook in order to identify when an advertisement has successfully resulted in the desired action, such as signing up for the HubPages Service or publishing an article on the HubPages Service.
    Statistics
    Author Google AnalyticsThis is used to provide traffic data and reports to the authors of articles on the HubPages Service. (Privacy Policy)
    ComscoreComScore is a media measurement and analytics company providing marketing data and analytics to enterprises, media and advertising agencies, and publishers. Non-consent will result in ComScore only processing obfuscated personal data. (Privacy Policy)
    Amazon Tracking PixelSome articles display amazon products as part of the Amazon Affiliate program, this pixel provides traffic statistics for those products (Privacy Policy)