Kevin is a Software Developer with 20 years' experience designing and building business intelligence and system integration solutions.
Tricks for Working With Large Files and Sets of Data in Excel
In our world of data analytics and big data, there is no getting away from our beloved data crunching and wrangling tool: Excel. However, with large data sets, Excel starts to huff and puff. These ten tips will help you crunch your data for that last-minute meeting like a pro.
Tip #1: Use Manual Recalculation
If you are working with a lot of data and formulas, transformations, grouping, and visual tools such as split views, Automatic Calculation can consume a lot of memory, because Excel not only recalculates your data but also redraws the screen layout. With this option enabled, you are essentially running the Excel calculation engine every time you enter or exit a cell, which can hurt performance, trigger a "resources unavailable…" error, or even cause Excel to crash and restart.
To improve Excel workbook performance, select the Manual Calculation option. This greatly improves performance because you decide when your data is recalculated.
- To enable this option, open the Options page from the File tab (Excel 2010, 2013, 2016, and later) or from the Microsoft Office button in Excel 2007; the Options link is located at the bottom of the menu.
- Next, click on Formulas and locate “Calculation options.”
- Select “Manual,” with or without the “Recalculate workbook before saving” option.
Manually recalculating is easy and you have several options:
- Pressing F9 manually recalculates, in all open workbooks, every formula that has changed since the last calculation, along with its dependents.
- To recalculate only the changed formulas on the active worksheet, press Shift+F9. This gives the best performance, as only the formulas you are working on are calculated, not the whole workbook.
- Ctrl+Alt+F9 is another option. This key combination recalculates all formulas and their dependencies, regardless of whether they have changed. It can be expensive in memory and processing time.
- The final option rebuilds the dependency tree between formulas and then recalculates all of them, regardless of whether they have changed. To use it, press Ctrl+Shift+Alt+F9.
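The calculation modes above can be sketched conceptually. Below is a minimal, hypothetical Python model of a spreadsheet dependency graph (this is an illustration of the idea, not Excel's actual engine, which is far more sophisticated): dirty-only recalculation corresponds to Shift+F9, while a full pass corresponds to Ctrl+Alt+F9.

```python
# Toy model of a spreadsheet, illustrating the difference between
# recalculating only changed ("dirty") formulas and a full recalculation.
# All class and method names here are hypothetical.

class Sheet:
    def __init__(self):
        self.values = {}       # cell -> cached value
        self.formulas = {}     # cell -> function of the sheet
        self.dirty = set()     # cells whose inputs changed
        self.calc_count = 0    # how many formula evaluations we performed

    def set_value(self, cell, value):
        self.values[cell] = value
        # Mark every formula as dirty; a real engine would walk the
        # dependency tree and flag only true dependents.
        self.dirty.update(self.formulas)

    def set_formula(self, cell, func):
        self.formulas[cell] = func
        self.dirty.add(cell)

    def recalc_dirty(self):
        """Shift+F9-style: evaluate only formulas marked dirty."""
        for cell in list(self.dirty):
            self.values[cell] = self.formulas[cell](self)
            self.calc_count += 1
        self.dirty.clear()

    def recalc_full(self):
        """Ctrl+Alt+F9-style: evaluate every formula, changed or not."""
        for cell, func in self.formulas.items():
            self.values[cell] = func(self)
            self.calc_count += 1
        self.dirty.clear()

sheet = Sheet()
sheet.set_value("A1", 10)
sheet.set_formula("B1", lambda s: s.values["A1"] * 2)
sheet.recalc_dirty()          # evaluates B1 once
sheet.recalc_dirty()          # nothing dirty: zero evaluations
print(sheet.values["B1"], sheet.calc_count)   # 20 1
```

The point of the sketch: the fewer formulas a recalculation pass has to touch, the cheaper it is, which is exactly why Shift+F9 outperforms a full recalculation on large workbooks.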
Tip #2: Set Maximum Iteration for Precision
If you opt or need to keep Automatic Calculation on, you can limit the number of times the Excel calculation engine executes the formulas by setting “Maximum Iterations” to a number you think is sufficient for your immediate needs.
In order to use this option, you first need to enable the “Iterative calculation” feature by checking the checkbox and setting the maximum number of automatic iterations.
The “Maximum Change” setting determines the precision of your calculations: iteration stops once a result changes by less than this value, so a larger Maximum Change allows the calculation engine to iterate fewer times over a formula before the precision is met.
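The interplay between the two settings can be shown with a short Python sketch (a generic fixed-point iteration, not Excel code): iteration stops either when the result moves by less than the change threshold or when the iteration budget runs out, whichever comes first.

```python
# Sketch of what "Maximum Iterations" and "Maximum Change" control,
# using the classic fixed-point iteration x = cos(x).
import math

def iterate(max_iterations=100, max_change=0.001):
    x = 1.0
    for i in range(max_iterations):
        new_x = math.cos(x)
        if abs(new_x - x) < max_change:   # precision reached: stop early
            return new_x, i + 1
        x = new_x
    return x, max_iterations              # iteration budget exhausted

value, steps = iterate()
# A larger max_change (lower precision) stops after fewer iterations:
_, fewer_steps = iterate(max_change=0.01)
print(round(value, 3), steps, fewer_steps)
```

Relaxing `max_change` from 0.001 to 0.01 noticeably cuts the number of iterations, which is the performance lever this tip describes.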
Tip #3: Remove or Minimize the Use of Volatile Functions
Excel has several volatile functions that are resource (memory and CPU) hungry. Because a chain of dependent formulas is calculated on a single CPU thread, you need to make the most of this resource on large data sets. Excel considers the following functions volatile:
RAND(), NOW(), TODAY(), OFFSET(), CELL(), INDIRECT(), and INFO(). Functions often mislabelled as volatile, but which are not: INDEX(), ROWS(), COLUMNS(), and AREAS().
The above functions are considered volatile because they, and every cell that depends on them, are recalculated at every calculation pass, even when no change is apparent or needed.
Certain actions, such as sorting or hiding and un-hiding rows and columns, will trigger recalculation of volatile formulas. Changing range names or adding and removing worksheets can also increase the volatility of these formulas, so use them with care. Of course, using a volatile function is not inherently negative; you just need to be aware of its impact on large data sets.
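The cost of volatility can be illustrated outside Excel. The hypothetical Python sketch below contrasts a deterministic function (which, like INDEX(), can be computed once and cached) with a volatile one (which, like NOW(), must be recomputed on every pass):

```python
# Why volatility hurts: a deterministic function can be cached and
# evaluated once, while a volatile one is recomputed on every
# "recalculation pass". Function names here are invented.
import functools
import time

calls = {"stable": 0, "volatile": 0}

@functools.lru_cache(maxsize=None)
def stable_lookup(key):            # deterministic: same input, same output
    calls["stable"] += 1
    return key * 2

def volatile_now():                # volatile: result depends on the moment
    calls["volatile"] += 1
    return time.time()

for _ in range(1000):              # 1000 simulated recalculation passes
    stable_lookup(21)              # computed once, then served from cache
    volatile_now()                 # recomputed every single pass

print(calls)   # {'stable': 1, 'volatile': 1000}
```

One thousand passes cost one evaluation of the stable function but a thousand evaluations of the volatile one, which is why a handful of volatile cells at the root of a large dependency tree can dominate recalculation time.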
Tip #4: Don't Use Excel Data Tables
Data Tables are often confused with Excel tables; the latter are the more commonly used of the two and are covered in Tip #9 below. Data Tables are tucked away under the What-If Analysis menu on the Data tab.
Data Tables are great for simple financial calculations but can be a drag on large data sets. As a workaround, you can use a macro or the Scenario Manager. Another option, which I believe is the best for large data sets, is to use Power Pivot with DAX. You can access Power Pivot from the Data tab by selecting “Manage Data Model,” or from the Power Pivot tab by selecting “Manage” in the Data Model group.
By selecting your data in a worksheet and clicking on the Power Pivot tool, you will transfer the data into Power Pivot. You can then build a What-if Scenario calculator in Power Pivot and transfer the results back to Excel.
You can also analyse and compare data sets, especially large ones, using Power Query, for example.
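To make the what-if idea concrete, here is a small Python sketch of the kind of scenario table a Data Table produces, computed outside Excel. The loan figures are invented for illustration:

```python
# A what-if table computed in plain Python: the monthly payment on a
# loan across several interest-rate scenarios. Sample figures invented.

def monthly_payment(principal, annual_rate, months):
    # Standard amortized-loan payment formula.
    r = annual_rate / 12
    return principal * r / (1 - (1 + r) ** -months)

principal, months = 10_000, 36
scenarios = [0.04, 0.06, 0.08]     # candidate annual interest rates

table = {rate: round(monthly_payment(principal, rate, months), 2)
         for rate in scenarios}
for rate, payment in table.items():
    print(f"{rate:.0%} -> {payment:,.2f} per month")
```

Each row of `table` plays the role of one column in an Excel Data Table; the same scenario sweep can be expressed as a DAX measure over a parameter table in Power Pivot.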
Tip #5: Use Extra CPUs
Most modern computers (laptops, tablets, desktops, etc.) have multiple CPU cores: 4, 8, or 16 are common, but Excel's formula calculation often uses only one or two of them. You can increase the number of cores Excel uses to boost the performance of your data crunching.
From the Options page under the File tab (under the Office button in 2007), select the Advanced section and scroll down to the Formulas group on the right. First enable multi-threaded calculation if it is not already enabled, then either accept the default (use all processors) or select Manual and specify the number of threads to use. To find out how many cores your system has, check the manufacturer's site for your computer, or run msinfo32.exe (press Windows+R, or search from Cortana or the Start menu on pre-Windows 10 systems) and locate the Processor entry in the System Information panel.
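The underlying idea, splitting independent work across workers, can be sketched in Python. Note the caveat: pure-Python threads do not speed up CPU-bound work because of the interpreter lock, so this is only an illustration of the partitioning strategy Excel applies when it spreads independent formula chains across cores.

```python
# Sketch of multi-threaded calculation: split independent work into
# chunks and hand each chunk to a worker.
import os
from concurrent.futures import ThreadPoolExecutor

def heavy_formula(chunk):
    # Stand-in for an expensive calculation over a block of cells.
    return sum(x * x for x in chunk)

data = list(range(1_000_000))
workers = os.cpu_count() or 1       # like Excel's "use all processors"
size = len(data) // workers + 1
chunks = [data[i:i + size] for i in range(0, len(data), size)]

with ThreadPoolExecutor(max_workers=workers) as pool:
    total = sum(pool.map(heavy_formula, chunks))

print(workers, total)
```

Excel can only parallelize formula chains that do not depend on each other, which is another reason to minimize long dependency chains and volatile functions in large workbooks.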
Tip #6: Remove Workbook Links
Linking multiple workbooks together is a real performance killer, no matter the size of your data set. With large data sets, you can go get a coffee while you wait for your data to refresh. When you link workbooks, Excel must open the other workbook in memory and copy its data into the main workbook to do the calculation. For large data sets, this is a real drain on memory and processing.
Two alternative options are available that will greatly improve performance:
a) Instead of linking workbooks together, try copying the data into a separate worksheet in the main workbook. This at least saves Excel the trouble of managing the link in memory every time it must access the external data to perform a calculation.
b) Import the data into Power Pivot and access the data from the Data Model using a Pivot Table.
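The copy-versus-link trade-off can be sketched with a toy Python example. The "external workbook" is simulated by a function with an open counter; the names and data are invented:

```python
# Why copying beats linking: with a live link, the external workbook is
# reopened on every recalculation; copied data is read once.

opens = 0

def open_external_workbook():
    global opens
    opens += 1                      # in Excel: load the file into memory
    return {"Sheet1": [1, 2, 3, 4, 5]}

# Linked style: fetch the external data on every one of 100 recalculations.
for _ in range(100):
    linked_total = sum(open_external_workbook()["Sheet1"])
opens_linked = opens

# Copied style: read it once into the "main workbook", then reuse.
opens = 0
local_copy = open_external_workbook()["Sheet1"]
for _ in range(100):
    copied_total = sum(local_copy)

print(opens_linked, opens)   # 100 1
```

Both styles produce the same result, but the copied version pays the loading cost once instead of on every recalculation, which is the whole argument of this tip.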
Tip #7: Use Power Query
Power Pivot was introduced as a free add-in for Excel 2010. With Excel 2013 and 2016, Power Pivot is integrated into Excel.
Power Pivot uses technology optimized for large data sets, easily handling up to 100 million rows, compared with Excel's worksheet limit of roughly one million rows. Power Pivot was conceived as a tool for advanced data analysis, so you can create advanced data models using the built-in data modelling tools and build powerful transformations using the DAX and Power Query (M) languages.
Once you have finished, you can link the data set back to Excel and use it with a PivotTable. Let's take a quick look at loading data into Power Pivot. (The data-import side of this tooling, Power Query, is called Get & Transform in Excel 2016 and later.)
Let's look at using Power Pivot. You can open it either from the Data tab by selecting “Manage Data Model,” or with the “Manage” command in the Data Model group on the Power Pivot tab. Either way will open the Power Pivot window.
From the menu, you can select the “Get External Data” command, which opens a dialog box allowing you to select a data source type: a database, a text file, or an Excel workbook, for example. If you already have a connection defined in Excel, you can select the “Existing Connections” option instead.
Once you have selected the data source and loaded the data, you can transform it with powerful tools: add columns, bring in other data sources, and build a sophisticated data model. Building data models is too big a topic to cover in detail in this article. However, once the data model is ready, you simply transfer it into Excel using the PivotTable button, select a style or format, and the metadata is linked to the PivotTable in Excel.
The big gain is that the data stays in the Power Pivot data model, and Excel is linked to it via metadata. To the analyst, it appears as if the data is in Excel, but performance-wise it is faster than keeping the data in worksheets.
Using this technique, you can also create connections in Excel to your data in Power Pivot or in your external source. Check out the article I wrote on comparing data sets in Power Pivot and Excel. You can use this technique for all sorts of ad hoc analysis.
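The load/merge/transform pipeline described above can be sketched in plain Python as an analogue (this is not Power Query's M language; the sample data is invented): take rows from two sources, join them on a key, and add a computed column, then aggregate.

```python
# A Power Query-style load/transform/merge pipeline sketched in Python.

sales = [
    {"product_id": 1, "units": 30},
    {"product_id": 2, "units": 12},
    {"product_id": 1, "units": 8},
]
products = {1: {"name": "Lemonade", "price": 2.50},
            2: {"name": "Iced Tea", "price": 3.00}}

# Merge step: look up product details for each sales row.
merged = [{**row, **products[row["product_id"]]} for row in sales]

# Add a computed column, like a custom column in the Query editor.
for row in merged:
    row["revenue"] = row["units"] * row["price"]

total_revenue = sum(row["revenue"] for row in merged)
print(total_revenue)   # 30*2.5 + 12*3.0 + 8*2.5 = 131.0
```

In Power Query, each of these steps (merge, add column, aggregate) is a recorded transformation that re-runs automatically on refresh, with the heavy lifting done in the data model rather than in worksheet formulas.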
Tip #8: Use Pivot Tables
Using pivot tables, whether built from an Excel table (see the next tip), from a Power Pivot data model (previous tip), or directly from an external data source, allows you to build a cube (rows, columns, and aggregated data) over large data sets while maintaining performance. Still, it is good practice to apply some or all of the previous tips as well.
You can create a pivot table from the “Insert” menu tab. You can include the pivot table in the active worksheet or a new one.
Once the pivot table is created, you build the cube by adding fields to the rows or columns and aggregate values (values that can be summed, averaged, or counted) to the Values section. Pivot tables become very powerful when you include filters, sorting, and slicers.
Below is an example of a pivot table based on the Lemonade data set I previously used. The data model was created with Power Query and aggregated using the Pivot Table functionality.
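What a pivot table computes can be sketched in a few lines of Python: group rows by two keys (here month and product) and aggregate a value column. The lemonade-stand rows below are invented for illustration:

```python
# The core of a pivot table: group by row and column keys, then aggregate.
from collections import defaultdict

rows = [
    {"month": "Jun", "product": "Lemonade", "sales": 120},
    {"month": "Jun", "product": "Iced Tea", "sales": 45},
    {"month": "Jul", "product": "Lemonade", "sales": 200},
    {"month": "Jul", "product": "Lemonade", "sales": 30},
]

pivot = defaultdict(lambda: defaultdict(int))
for r in rows:
    pivot[r["month"]][r["product"]] += r["sales"]   # SUM aggregation

print(dict(pivot["Jul"]))          # {'Lemonade': 230}
print(pivot["Jun"]["Iced Tea"])    # 45
```

Excel's pivot engine does the same grouping and aggregation, but over the compressed data model rather than raw worksheet cells, which is why pivot tables stay responsive on data sets that would choke formula-based summaries.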
Tip #9: Use Excel Tables
If you need to work with data directly in a worksheet, it is better to define a table rather than simply adding data to the worksheet. The reason is that Excel will then know the boundaries of the data set. Excel tables (not the same as the Data Tables in Tip #4) let you create database-like tables and access them using standard SQL. You can also create data models from these tables.
You can create an Excel table either directly from an external source, data located in an existing worksheet or from Power Query.
For example, a data listing in Excel can be converted into a table using the Table command (which opens the Create Table dialog) under the “Insert” tab.
Excel will attempt to detect the table range automatically, but you can make changes as well as indicate if the first row contains headers.
A new tab called “Design” will be available once the table is created, allowing you to add formatting and slicers (filters), manage the data (for example, by removing duplicates), and summarize it. You can also rename the table.
Excel tables effectively create a database table directly in Excel. You can use standard SQL (through Visual Basic for Applications) to query the data or to upload it to another database such as MySQL or Microsoft SQL Server.
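The idea of querying tabular data with standard SQL can be shown in a self-contained way. The sketch below uses Python's built-in SQLite instead of VBA/ADO against a worksheet, but the SQL itself is the same kind you would send to an Excel table; the sample data is invented:

```python
# Standard SQL over a tabular data set, analogous to querying an Excel
# table through VBA/ADO. SQLite is used here so the example runs anywhere.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("Lemonade", 30), ("Iced Tea", 12), ("Lemonade", 8)])

# A standard SQL aggregate query over the table.
cur = conn.execute(
    "SELECT product, SUM(units) FROM sales GROUP BY product ORDER BY product")
result = cur.fetchall()
print(result)   # [('Iced Tea', 12), ('Lemonade', 38)]
conn.close()
```

In the VBA/ADO version, the `FROM` clause names a worksheet range (for example `[Sheet1$]`) instead of a database table, but the querying model is identical.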
Tip #10: Use Queries
This last tip builds on the previous ones: you need a table to create data models that can be queried using standard SQL. Using a query, you can build a multi-sheet or multi-workbook data model that is better optimized than linked worksheets or workbooks, because you are working with primary-key concepts and standard SQL to query the data.
You can create Queries from either a data set in Excel, an Access database or database servers like Oracle or MS SQL Server. You can also use an OLAP Cube (this is beyond the scope of the article) to build your query.
If you get an error message stating that there is “no visible table in data source,” click the “Options” button, select all the options, and click OK. The columns will then appear.
Next, select the columns to include in the data model, filtering on conditions just like a WHERE clause in SQL. Repeat for each table. You can then transfer the data to a new worksheet or open the Query editor to continue building your data model.
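The multi-table, primary-key model this tip describes can be sketched as SQL. Again using Python's built-in SQLite as a stand-in for the Excel query engine (table names and data are invented), two "worksheets" are loaded as tables and joined on a key instead of being linked cell by cell:

```python
# A multi-table query with a primary key, a WHERE filter, and a join,
# mirroring the Query Wizard steps described in the tip.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 250.0), (1, 100.0), (2, 75.0)])

cur = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    WHERE o.amount > 50          -- the filtering step, like the Wizard's
    GROUP BY c.name ORDER BY c.name
""")
result = cur.fetchall()
print(result)   # [('Acme', 350.0), ('Globex', 75.0)]
conn.close()
```

Because the join runs once over indexed tables instead of recalculating thousands of linked lookup formulas, this style of model scales far better than worksheet or workbook links.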
Other Techniques for Working With Large Data Sets in Excel
We have looked at some of the popular techniques for working with large data sets in Excel. The Power Query option is one of the most powerful, however, beyond these capabilities, you can use SQL Server Analysis Services (SSAS) to build OLAP or Tabular data models, which can then be used in Excel. You can also use Excel with Access to build and query data models using a more database-like environment.
This article is accurate and true to the best of the author’s knowledge. Content is for informational or entertainment purposes only and does not substitute for personal counsel or professional advice in business, financial, legal, or technical matters.
© 2019 Kevin Languedoc