Robbie mostly writes about Skyrim but also occasionally sheds light on the oddities of Microsoft applications such as Excel and Outlook.
The ToolPak contains a number of very powerful statistical tools that you can use on your data to analyse it in greater detail.
Today, I will be looking at the Sampling tool within the ToolPak. Excel treats the data you give the tool to analyse as a population. This tool can be used for two main purposes:
- If your want to look at a smaller sample of your data to analyse further or perhaps to chart or graph, you can use Sampling to create a random sample
- Also, if you believe that there are variations in your data (for example daily variations), the tool can analyse your data for this
In today's example, I will be looking at whether or not the number of hits my websites receive goes down over the weekend. Once we have used the tool to analyse our data, we will end up with a table similar to the one below.
Using the Sampling Tool in Excel 2007 and Excel 2010
To begin, ensure that you have your data in columns with labels at the top if you need them. Ensure that your data is contiguous (is all in one unbroken section). If you are analysing your data for variations over a specific period, you may find it helpful to put labels in the column to the left of your data to allow you to more easily see the start and end of your specified periods.
Note: Normally, I discuss a dialogue box starting at the top and working my way down. In this case, I have deliberately moved the discussion on Input Range and Period to make it clearer how it all fits together!
OK, let’s begin to create our data samples.
- Click on the Data Analysis button
- Select Sampling
- The Sampling dialogue box will then open
- Under Sampling Method, either select an appropriate Period, or the number of samples you want if you want a random sample to be created under Random
- For Output Options, select either an Output Range (which is a cell to place the data in) or a New Worksheet Ply: or New Workbook depending on your preference
- For Period, I will choose seven as my period is a week and my data is daily
- Select the Input Range of your data
NOTE: This gets a little complicated. I am looking ultimately to determine if there is a difference in the number of hits on the weekend compared to the week.
To do this, I need to use the Sampling tool twice to create data for Saturday and Sunday. To calculate the range for Saturday,
Using the example shown in the figure below,
- The Input Range starts at C26
- Excel knows that the Period I am interested in is 7
- It then counts down the column starting at C26 as 1 and ends at C32 as 7
- It returns the value of 24 and then starts counting again
- Once Excel reaches the bottom of your data range it stops gathering data
To create Sunday’s data, I follow the same process as above, but I move the start of my Input Range and the Output Range.
As you can see from the figure below, the starting cell has moved to C27 and the Output is now N57.
Excel then counts cells as it did above
The results of my two sampling runs on my data are shown below:
I now have the data that I to show the rate of hits that I get over the weekend. If I repeat this process a further five times I will get the data for each day of the week. Repeating the process for Monday through Friday was very simple,
- I reduced the start of the Input Range by one each time and
- Then adjusted the Output Range to reflect the new column
I then used the AVERAGE function to give me the averages for each day of the week and end up with the below table.
The AVERAGE function is very simple to use, simply select the range that you want to average after typing =AVERAGE( into a cell and then close the bracket to give you something similar to
Using the Sampling tool from the Analysis ToolPak and the AVERAGE function in tandem, I now know that Saturday and Sundays are both much quieter than the week and that Monday is also a quiet day when compared to Tuesday through Thursday!
Using the Sampling tool to create a random sample in Excel 2007 and Excel 2010
The other use for the Sampling tool is to generate a random sample from your data. To illustrate this, I will turn to the lottery as an example.
Suppose I decide to run a lottery to give away prizes. I have thirty numbers and five prizes.
- First, I choose the Input Range which is my numbers from one to thirty
- This time under Sampling Method I choose Random and the Number of Samples as 5
- For Output Range, I select a nearby cell
- After I click OK, I receive my five random numbers
Note: This is truly random process, each time you run this you will get an entirely different set of numbers.
The Sampling Tool from Excel 2007 and Excel 2010’s Analysis ToolPak is a tool that allows you to perform two forms of analysis on big data sources. You can
- Create a smaller representative sample at random from your larger data “population”
- Examine your data for periodic variations (for example comparing traffic on Saturday and Sunday to the traffic received Monday through Friday)
I do hope that you have found this article informative and useful and that you will now be able to successfully use the Sampling tool on your large data sources in Excel 2007 or Excel 2010.
This article is accurate and true to the best of the author’s knowledge. Content is for informational or entertainment purposes only and does not substitute for personal counsel or professional advice in business, financial, legal, or technical matters.
© 2013 Robbie C Wilson