10 Tips for Working With Large Excel Files

Updated on January 15, 2019
klanguedoc profile image

Kevin is a Software Developer with 20 years experience designing and building business intelligence and system integration solutions.

Use these tips when working with large Excel datasets.

Hyper-charge your Excel Data
Hyper-charge your Excel Data

Tip #1 – Manual Recalculation

If you are using a lot of data with formulas, transformation, grouping and other visual tools like splitting views, you can use a lot of memory with Automatic Calculations as Excel to not only recalculate your data; it also needs to recalculate the screen layout. With this option enabled, you are essentially running the Excel calculation engine every time you enter or exit a cell which can impact performance and even either get a “resources unavailable …” error or Excel may even crash and restart.

To improve Excel workbook performance, select the Manual Calculation option. This option will greatly improve performance as you will choose when you need to recalculate your data.

  1. To enable this option, open the Options page (Excel 2010, 2013, 2016 and later) under the File tab, or from the Microsoft button in Excel 2007 (it is located at the bottom of the page).
  2. Next, click on Formulas and locate “Calculation options.”
  3. Select “Manual” with or Without “Recalculate workbook before saving” option.

Formula Calculation Options
Formula Calculation Options | Source

Manually recalculating is easy and you have several options:

  • Press F9 will manually recalculate all the formulas in all your open workbooks and all depended formulas that have changed.
  • To only recalculate formulas that have changed since last calculation, use the Shift+F9 option. Best performance as you only calculate the immediate formulas that you are working on and not the whole workbook.
  • Ctrl+Alt+F9 is another option. Using this key combination, you will recalculate all formulas and their dependencies regardless if they have changed or not. This can be expensive on memory and performance.
  • Another option is to check the dependencies between formulas and recalculate them regardless if they have changed or not. To use this option, use the Ctrl+Shift+Alt+F9 key combination.


Tip #2 – Maximum Iteration and Precision

If you opt or need to keep the Automatic Calculations on, you can limit the number of times the Excel calculation engine will execute the formulas by setting the “Maximum Iterations” to a number you think if sufficient for your immediate needs.

In order to use this option, you first need to enable the “Iterative calculation” feature by checking the checkbox and setting the maximum number of automatic iterations.

The “Maximum change” option is to allow you to determine the level of precision in your calculations. Setting a lower level will allow the Excel calculation engine to iterate fewer times over a formula until the precision is met.

Set the Maximum Number Iterations
Set the Maximum Number Iterations

Tip #3 – Remove or Minimize the Use of Volatile Functions

Excel has several volatile functions that are resource (memory) hungry. Given the fact, the Excel calculation engine only uses 1 CPU on your computer; you need to maximise this resource on large data sets. Excel considers the following functions as volatile:

RAND(), NOW(), TODAY(), OFFSET(), CELL(), INDIRECT(), INFO(). Mislabelled functions as volatile: INDEX(), ROWS(), COLUMNS(), AREAS().

The above functions are considered volatile as they continuously recalculate values in cells and their precedents even when no change is apparent or needed.

Certain actions like sorting and hiding or un-hiding rows and columns will promote volatility. Changing range names or adding and removing worksheets can increase the volatility of these formulas, so use with care. Of course, using a volatile formula is not a negative, you just need to be aware of them and their impact on large data-sets.

Tip #4 - Don't Use Excel Data Tables

For Excel tables see below. Data Tables are often confused with Excel tables and the latter being most often used, thus are the #9 tip in the list. Data Tables are tuck away under the What-if scenario menu.

Data Tables are great for simple financial calculations but can be a drag on large datasets. As a workaround, you can use a macro or use the Scenario Manager. Another option, which I believe is the best for large datasets is to use Power Pivot with DAX. You can access to Power Pivot from the Data tab and select “Manage Data Model” or from the Power Pivot tab and select the “Manage” option from the Data Model group.

Using the Scenario Manager to reduce performance bottlenecks on Data Table calculations.
Using the Scenario Manager to reduce performance bottlenecks on Data Table calculations.
Access Power Pivot via the Data tab and select Manage Data Model in Data Tools group.
Access Power Pivot via the Data tab and select Manage Data Model in Data Tools group.
Open Data Modeler from the Power Pivot tab and selecting Manage in the Data Model group
Open Data Modeler from the Power Pivot tab and selecting Manage in the Data Model group
Either by selecting your data in the Excel worksheet or by importing external data, you can create a What if Scenario tool in Power Pivot
Either by selecting your data in the Excel worksheet or by importing external data, you can create a What if Scenario tool in Power Pivot

By selecting your data in a worksheet and clicking on the Power Pivot tool, you will transfer the data into Power Pivot. You can then build a What-if Scenario calculator in Power Pivot and transfer the results back to Excel.

Tip #5 - Use Extra CPUs

Most modern computers (laptops, tablets, desktop, etc.) have multiple CPUs (Central Processing Units). CPUs 4, 8 or 16 cores but Excel formulas often only use 1 or 2 cores or CPUs. You can increase the number of CPUs to boost the performance of your data crunching.

From the Options section in the backstage panel from under File (under the Home button on 2007), select the Advanced sub-section and scroll down to the Formulas on the right. Here you have the option first enable Multi-threading if it is not already enabled and to either accept the default, or switch to the Manual and specify the number of CPUs to use depending on your system (check the manufacturers site for your computer) or from the Run command in Windows (Windows button + R) or from Cortana or the Start menu on pre-Windows 10, enter msinfo32.exe. In the corresponding information panel, locate the Processor information.

Configuring more CPUs
Configuring more CPUs
Using msinfo32 to locate CPU information on your computer
Using msinfo32 to locate CPU information on your computer

Tip #6 – Remove Workbook Links

Linking multiple workbooks together is a real performance killer no matter the size of your data set. With large data sets, you can go get a coffee while you wait for your data to refresh. When you link workbooks together, Excel must open the other workbook in memory and copy the data into the main workbook to do the calculation. For large data sets, this is a real drain the memory and processing.

Two alternate options are available which will great improve performance:

a) Instead of linking workbooks together, try copying the data into a separate worksheet in the main workbook. This will at least save Excel the trouble of managing this operation in memory every time if must access the external data to perform a calculation.

b) Import the data into Power Pivot and access the data from the Data Model using a Pivot Table.

Get External Data
Get External Data
Importing the Data via the Import Wizard
Importing the Data via the Import Wizard
Select the worksheet and apply filtering if necessary
Select the worksheet and apply filtering if necessary

Tip #7 - Use Power Query

Power Pivot was introduced as a free addon with Excel 2010. Given the fact 2010 and 2007 are compatible, you can use the addon with both versions. With 2013 and now 2016, Power Pivot is integrated with Excel.

Power Pivot has optimized technology for large datasets, easily handling up to 100 million rows compared to the million rows with Excel. Power Pivot was conceived as a tool for advanced data analysis, so can create advanced data models using the built-in data modelling tools and create powerful transformations using the DAX and Power Query languages.

Once you have finished you can link the dataset and use it in Excel like with a Pivot Table. Let’s take a quick look at loading data into Power Pivot, which is called Get & Transform in Excel 2016 and later.

Let’s look at using Power Pivot. You can open the software either from the Data tab and selecting “Manage Data Model” or use the “Manage” menu command in the Data Model group from the Power Pivot Tab. Either way will open the Power Query tool.

Power Pivot Editor
Power Pivot Editor

From the menu, you can select the “Get External Data” menu command which open a dialog box allowing you to select a data source type, like a database, a text file or an Excel for example. If you already have a connection defined from Excel, you can opt to select the “Existing Connections” menu option.

Power Pivot External Source
Power Pivot External Source

Once you have selected the data source and loaded the data, you can transform the data using powerful data transformation tools, add columns, add other data sources and build a sophisticated data model. Going into detail on building is big to cover in this article. However, once the data model is ready, you simply need to transfer into Excel using the PivotTable button to select a style or format and to link the metadata to the PivotTable in Excel.

Power Pivot Editor
Power Pivot Editor

The big gain is that the data is always in the Power Query tool and Excel are linked via the metadata. For the analyst, it appears as the data is in Excel. Performance wise, it is faster than Excel.

Tip #8 – Use Pivot Tables

Using Pivot tables, either from an Excel table (see next), or from a Power Query data model (previous tip) or directly from an external data source will allow you to build a cube (rows, columns, and aggregated data) of large data sets while maintaining performance. But, it is good practice the use some of all the previous tips as well.

You can create a pivot table from the “Insert” menu tab. You can include the pivot table in the active worksheet or a new one.

Once the pivot table is created, you build the cube by adding fields to either the rows or columns and the aggregate values (values that can be summed, averaged or counted) to the Values section. Pivot tables can get very powerful by including filters, sorting, slicers.

Below is an example of a pivot table based on the Lemonade data set I previously used. The data model was created with Power Query and aggregated using the Pivot Table functionality.

Pivot Table Example
Pivot Table Example

Tip #9 – Use Excel Tables

If you need to work with data directly in a worksheet, it is better to define a table rather than simply adding data to the worksheet. The reason Excel will know what the boundaries of the data set are. Using Excel tables, which is not the same as Data Tables, allows you to create database like tables and to access these tables using standard SQL. As such to can also create data models from these tables.

You can create an Excel table either directly from an external source, data located in an existing worksheet or from Power Query.

For example, the following listing in Excel can be converted into a table using the Create Table function under the “Insert” tab.

Excel will attempt to detect the table range automatically, but you can make changes as well as indicate if the first row contains headers.

Create Excel Table
Create Excel Table

A new tab call “Design” will be available once the table is created allowing you add formatting, slicers (filters), manage the data like removing duplicates and summarizing the data. You can also rename the table.

Excel Table Designer
Excel Table Designer

Excel tables create a database table directly in Excel which can use standard SQL (through Visual Basic for Applications) to query the data, upload the data to another database like MySQL or Microsoft SQL Server.

Tip #10 – Use Queries

This last tip is directly related to the previous tips in that you need a table to create data models than be queried using standard SQL. Using a query, you can build a multi-sheet data model or a multi-workbook data model that will more optimized than using linked worksheets or workbooks because you are primary keys concepts and using standard SQL to query the data.

You can create Queries from either a data set in Excel, an Access database or database servers like Oracle or MS SQL Server. You can also use an OLAP Cube (this is beyond the scope of the article) to build your query.

Use MS Query
Use MS Query

If you get an error message stating that there is “no visible table in data source,” select the “Options” button and select all options and click ok. The columns will appear afterwards.

Next, select the columns to include in the data model based on certain conditions just like a WHERE clause in SQL. Repeat for each table. You can opt to transfer the data to a new worksheet or open the Query editor to continue building your data model.

Microsoft Query Editor
Microsoft Query Editor

Conclusion

We have looked at some of the popular techniques for working with large data sets in Excel. The Power Query option is one of the most powerful, however, beyond these capabilities, you can use SQL Server Analysis Services (SSAS) to build OLAP or Tabular data models, which can then be used in Excel. You can also use Excel with Access to build and query data models using a more database-like environment.

This article is accurate and true to the best of the author’s knowledge. Content is for informational or entertainment purposes only and does not substitute for personal counsel or professional advice in business, financial, legal, or technical matters.

© 2019 Kevin Languedoc

Comments

    0 of 8192 characters used
    Post Comment

    No comments yet.

    working

    This website uses cookies

    As a user in the EEA, your approval is needed on a few things. To provide a better website experience, turbofuture.com uses cookies (and other similar technologies) and may collect, process, and share personal data. Please choose which areas of our service you consent to our doing so.

    For more information on managing or withdrawing consents and how we handle data, visit our Privacy Policy at: https://turbofuture.com/privacy-policy#gdpr

    Show Details
    Necessary
    HubPages Device IDThis is used to identify particular browsers or devices when the access the service, and is used for security reasons.
    LoginThis is necessary to sign in to the HubPages Service.
    Google RecaptchaThis is used to prevent bots and spam. (Privacy Policy)
    AkismetThis is used to detect comment spam. (Privacy Policy)
    HubPages Google AnalyticsThis is used to provide data on traffic to our website, all personally identifyable data is anonymized. (Privacy Policy)
    HubPages Traffic PixelThis is used to collect data on traffic to articles and other pages on our site. Unless you are signed in to a HubPages account, all personally identifiable information is anonymized.
    Amazon Web ServicesThis is a cloud services platform that we used to host our service. (Privacy Policy)
    CloudflareThis is a cloud CDN service that we use to efficiently deliver files required for our service to operate such as javascript, cascading style sheets, images, and videos. (Privacy Policy)
    Google Hosted LibrariesJavascript software libraries such as jQuery are loaded at endpoints on the googleapis.com or gstatic.com domains, for performance and efficiency reasons. (Privacy Policy)
    Features
    Google Custom SearchThis is feature allows you to search the site. (Privacy Policy)
    Google MapsSome articles have Google Maps embedded in them. (Privacy Policy)
    Google ChartsThis is used to display charts and graphs on articles and the author center. (Privacy Policy)
    Google AdSense Host APIThis service allows you to sign up for or associate a Google AdSense account with HubPages, so that you can earn money from ads on your articles. No data is shared unless you engage with this feature. (Privacy Policy)
    Google YouTubeSome articles have YouTube videos embedded in them. (Privacy Policy)
    VimeoSome articles have Vimeo videos embedded in them. (Privacy Policy)
    PaypalThis is used for a registered author who enrolls in the HubPages Earnings program and requests to be paid via PayPal. No data is shared with Paypal unless you engage with this feature. (Privacy Policy)
    Facebook LoginYou can use this to streamline signing up for, or signing in to your Hubpages account. No data is shared with Facebook unless you engage with this feature. (Privacy Policy)
    MavenThis supports the Maven widget and search functionality. (Privacy Policy)
    Marketing
    Google AdSenseThis is an ad network. (Privacy Policy)
    Google DoubleClickGoogle provides ad serving technology and runs an ad network. (Privacy Policy)
    Index ExchangeThis is an ad network. (Privacy Policy)
    SovrnThis is an ad network. (Privacy Policy)
    Facebook AdsThis is an ad network. (Privacy Policy)
    Amazon Unified Ad MarketplaceThis is an ad network. (Privacy Policy)
    AppNexusThis is an ad network. (Privacy Policy)
    OpenxThis is an ad network. (Privacy Policy)
    Rubicon ProjectThis is an ad network. (Privacy Policy)
    TripleLiftThis is an ad network. (Privacy Policy)
    Say MediaWe partner with Say Media to deliver ad campaigns on our sites. (Privacy Policy)
    Remarketing PixelsWe may use remarketing pixels from advertising networks such as Google AdWords, Bing Ads, and Facebook in order to advertise the HubPages Service to people that have visited our sites.
    Conversion Tracking PixelsWe may use conversion tracking pixels from advertising networks such as Google AdWords, Bing Ads, and Facebook in order to identify when an advertisement has successfully resulted in the desired action, such as signing up for the HubPages Service or publishing an article on the HubPages Service.
    Statistics
    Author Google AnalyticsThis is used to provide traffic data and reports to the authors of articles on the HubPages Service. (Privacy Policy)
    ComscoreComScore is a media measurement and analytics company providing marketing data and analytics to enterprises, media and advertising agencies, and publishers. Non-consent will result in ComScore only processing obfuscated personal data. (Privacy Policy)
    Amazon Tracking PixelSome articles display amazon products as part of the Amazon Affiliate program, this pixel provides traffic statistics for those products (Privacy Policy)
    ClickscoThis is a data management platform studying reader behavior (Privacy Policy)