4 Ways to Compare Datasets in Excel Using Power Query

Updated on April 27, 2020
klanguedoc profile image

Kevin is data science and data engineer. He works for a large consulting company in Montreal, Canada. He has over 20 years experience.

Comparing two data sets in Excel is quite easy using Power Query and the Merge operation. The Merge operation can perform inner and outer join operations on your data. This article will explain the different types of operations.

I will use the list of countries from Wikipedia. the link to the exact page is provided below. I will show how to import this data using Excel tools. A second list of countries will be used from the WorldMeter site which we will also download using Excel using the link below.

Get the Data

Let’s start with the Wikipedia list. Follow these steps to retrieve the list from the Wikipedia page

  1. From Excel Data tab, select “From Web” in the “Get & Transform Data” group
  2. An import panel will display on screen with two options “Basic” and “Advanced” (figure 1). Leave it at “Basic” and paste the following Wikipedia URL in the field and click “OK”: https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population
  3. Next, you will need to select the appropriate access for this site. Since it is Wikipedia and a public website, we can "Anonymous” web access (figure 2) and from the dropdown field, select the top level, www.wikipedia.org and connect to the website
  4. In the next panel (or page), Excel will present the different sections of the web page. For our purposes, we need to select “Sovereign States and Dependencies by Population” which is the table title (figure 3). Here, you have two options: either “Load” or “Transform”. If you want to load the data as is into Excel, select the first option, otherwise, select “Transform”, which we need to do as we need to delete columns that are not needed.
  5. Once the data is loaded in the Power Query editor, select all the columns (ctrl+click) except the country name column. Then click “Remove Columns” and finally “Close and Load To” (figure 4) to transfer the focus back to Excel. In Excel, a panel will offer the option to create a table, pivot table, a connection as well as where to display the results. Opt to create a connection only. This will create a connection with the source, which is the Wikipedia page in this example. What is great about this option is that you can continuously refresh the data from the source as needed. Of course, in this example, it is not really needed as the world is not actively creating new countries every other week. Once the connection is created, it will appear in the side panel on the right. You can rename it as needed. It won't affect the connection with the source.

figure 1
figure 1 | Source
Figure 2
Figure 2 | Source
figure 3
figure 3 | Source
Figure 4
Figure 4 | Source
Figure 5
Figure 5

Now repeat the same operation for the WorldMeter country list: https://www.worldometers.info/geography/alphabetical-list-of-countries/. To import only the list of countries, use the “Table 0” selection and select “Transform” as per the previous. Remove all the necessary columns, keeping only the column with the country names.

Select the "Close and Load To" option to transfer the data set back to Excel, creating a Connection as before. Once both lists of countries have been imported into Power Query and the connections have been created, we will perform a series of Merge operations. The objective is to list the names of the countries according to the Merge operation we use. There are six types of merge operations that can be performed on the data sets.

Merge Operations

  • Left Outer (All from first, matching from second)
  • Right Outer (All from second, matching from first)
  • Full Outer (All matching from both)
  • Inner (only matching rows)
  • Left Anti (Rows only from first)
  • Right Anti (Rows only from second)

I will compare the data sets using each of the previous Merge operations, loading the results in a separate Excel Worksheet.

Left Outer (All from first, matching from second)

This type will keep all the items in the first list (left side) and those items in the second list that are found in the first. To perform the Merge operation follow these instructions (figure 6):

  • Right-click on the Wikipedia connection in the right panel and select Merge
  • A panel will appear with the Wikipedia data set already selected
  • Select the WorldMeter data set from the second list
  • Select the Left Outer from the Join Kind field
  • Optional select the "Use fuzzy matching...." when comparing. If you opt for this option, you will need to expand the Fuzzy search option section.
  • At the bottom a preview of the merge operation list the number of matches.
  • Click "Ok" to perform the operation

Figure 6
Figure 6
Figure 7
Figure 7
Figure 8
Figure 8
Figure 9
Figure 9

Once the merge is completed, the Power Query will appear with the data from both list. On the left is the Wikipedia country list. On the right is a column with the value of "Table" (figure 7 above). This is because the resulting will contain the collection of columns from the second table that can be expanded into the first table. Click on the icon with two arrays next to the column header (figure 8) which will reveal all the columns in the table (figure 7). In our example, there is only one column "Country" (figure 8). Once expanded, the list countries from the WorldMeter will be displayed with the missing values (figure 9).

Select "Close and Load" to transfer the data back to Excel.

Right Outer (All from second, matching from first)

The Compare operations will select all rows from the second list and those items matching in the first list. Performing this merge is the same as the previous; right-click the Wikipedia connection and select Merge from the shortcut menu. The Wikipedia list will already be selected in the Merge panel. Select the WorldMeter as the second list and the Right Outer join type. The result count will be displayed at the bottom. Click ok to perform the operation (figure 10).

Figure 10
Figure 10
Figure 11
Figure 11
Figure 12
Figure 12

As I mentioned before, the second column of data will be displayed as "Table". You need to expand the selection and select the available column to display. In our case, "Countries" is the only choice. You will notice the dataset is different from the last with the data on the right showing the complete list and the only the ones in the first column that match the outer column.

Click "Close and Load to" to transfer the merged data back to Excel. This time, select to copy the resulting data set into an Excel Worksheet

Full Outer (All matching from both)

This operation will select all rows from both list. This is like a Cross Join in SQL. All countries from both list will be merged together. To perform this merge operation, select the Merge command from the context menu. In the Merge interface select both data sets like before. In the Join Type field, select the Full Outer type. Perform the Merge operation (figure 13).

Figure 13
Figure 13

Inner (only matching rows)

To select only rows unique to both lists, use this option. This merge operation will compare data in both list and return only the values that are present in both list. Perform in the like manner as the others, except choose the Inner (only matching rows).

Once done, return the filtered data from both list to a new worksheet in excel.

Figure 14
Figure 14
Figure 15
Figure 15
Figure 16
Figure 16

Left Anti (Rows only from first)

This Merge option will return a list of countries that are unique to the first list. In the sample data, this would be the list of countries from the Wikipedia list only. The Merge operation is identical to the others, except that you must choose the "Left Anti (Rows only from first)" option.

Figure 17
Figure 17

Right Anti (Rows only from second)

Right Anti (Rows only from second) is the reverse of the Left Anti, meaning this Merge option will return a list of countries that are unique to the selected outer list or second list. To perform this operation, select the "Right Anti" in the Join Type field after you launch the Merge operation from the context menu of a selected connection.

Figure 18
Figure 18

Conclusion

All possible compare operations are provided as demonstrated. The Merge can be performed on multiple columns as needed. Also, you can transform the data as needed like any other data set .

The Excel file containing the Merge results can be downloaded from GitHub at this address https://github.com/kevlangdo/excel_compare_lists . To clone the GitHub repo from command line or bash git clone https://github.com/kevlangdo/excel_compare_lists.git

This article is accurate and true to the best of the author’s knowledge. Content is for informational or entertainment purposes only and does not substitute for personal counsel or professional advice in business, financial, legal, or technical matters.

© 2020 Kevin Languedoc

Comments

    0 of 8192 characters used
    Post Comment

    No comments yet.

    working

    This website uses cookies

    As a user in the EEA, your approval is needed on a few things. To provide a better website experience, turbofuture.com uses cookies (and other similar technologies) and may collect, process, and share personal data. Please choose which areas of our service you consent to our doing so.

    For more information on managing or withdrawing consents and how we handle data, visit our Privacy Policy at: https://maven.io/company/pages/privacy

    Show Details
    Necessary
    HubPages Device IDThis is used to identify particular browsers or devices when the access the service, and is used for security reasons.
    LoginThis is necessary to sign in to the HubPages Service.
    Google RecaptchaThis is used to prevent bots and spam. (Privacy Policy)
    AkismetThis is used to detect comment spam. (Privacy Policy)
    HubPages Google AnalyticsThis is used to provide data on traffic to our website, all personally identifyable data is anonymized. (Privacy Policy)
    HubPages Traffic PixelThis is used to collect data on traffic to articles and other pages on our site. Unless you are signed in to a HubPages account, all personally identifiable information is anonymized.
    Amazon Web ServicesThis is a cloud services platform that we used to host our service. (Privacy Policy)
    CloudflareThis is a cloud CDN service that we use to efficiently deliver files required for our service to operate such as javascript, cascading style sheets, images, and videos. (Privacy Policy)
    Google Hosted LibrariesJavascript software libraries such as jQuery are loaded at endpoints on the googleapis.com or gstatic.com domains, for performance and efficiency reasons. (Privacy Policy)
    Features
    Google Custom SearchThis is feature allows you to search the site. (Privacy Policy)
    Google MapsSome articles have Google Maps embedded in them. (Privacy Policy)
    Google ChartsThis is used to display charts and graphs on articles and the author center. (Privacy Policy)
    Google AdSense Host APIThis service allows you to sign up for or associate a Google AdSense account with HubPages, so that you can earn money from ads on your articles. No data is shared unless you engage with this feature. (Privacy Policy)
    Google YouTubeSome articles have YouTube videos embedded in them. (Privacy Policy)
    VimeoSome articles have Vimeo videos embedded in them. (Privacy Policy)
    PaypalThis is used for a registered author who enrolls in the HubPages Earnings program and requests to be paid via PayPal. No data is shared with Paypal unless you engage with this feature. (Privacy Policy)
    Facebook LoginYou can use this to streamline signing up for, or signing in to your Hubpages account. No data is shared with Facebook unless you engage with this feature. (Privacy Policy)
    MavenThis supports the Maven widget and search functionality. (Privacy Policy)
    Marketing
    Google AdSenseThis is an ad network. (Privacy Policy)
    Google DoubleClickGoogle provides ad serving technology and runs an ad network. (Privacy Policy)
    Index ExchangeThis is an ad network. (Privacy Policy)
    SovrnThis is an ad network. (Privacy Policy)
    Facebook AdsThis is an ad network. (Privacy Policy)
    Amazon Unified Ad MarketplaceThis is an ad network. (Privacy Policy)
    AppNexusThis is an ad network. (Privacy Policy)
    OpenxThis is an ad network. (Privacy Policy)
    Rubicon ProjectThis is an ad network. (Privacy Policy)
    TripleLiftThis is an ad network. (Privacy Policy)
    Say MediaWe partner with Say Media to deliver ad campaigns on our sites. (Privacy Policy)
    Remarketing PixelsWe may use remarketing pixels from advertising networks such as Google AdWords, Bing Ads, and Facebook in order to advertise the HubPages Service to people that have visited our sites.
    Conversion Tracking PixelsWe may use conversion tracking pixels from advertising networks such as Google AdWords, Bing Ads, and Facebook in order to identify when an advertisement has successfully resulted in the desired action, such as signing up for the HubPages Service or publishing an article on the HubPages Service.
    Statistics
    Author Google AnalyticsThis is used to provide traffic data and reports to the authors of articles on the HubPages Service. (Privacy Policy)
    ComscoreComScore is a media measurement and analytics company providing marketing data and analytics to enterprises, media and advertising agencies, and publishers. Non-consent will result in ComScore only processing obfuscated personal data. (Privacy Policy)
    Amazon Tracking PixelSome articles display amazon products as part of the Amazon Affiliate program, this pixel provides traffic statistics for those products (Privacy Policy)
    ClickscoThis is a data management platform studying reader behavior (Privacy Policy)