Importing Data Into Excel 2007 and Using the TRIM, SUBSTITUTE and CLEAN Functions to Remove Non-Printable Characters
Hi and welcome to my latest article on Excel 2007. Today’s article is about importing data from external files into Excel and then ensuring that the data you import is in a consistent and clean state.
Typically, when importing data from a text file into Excel, the data is usually stored either in a *.txt (plain text file) file or a *.csv (data values are separated by commas or sometimes a tab) file. Excel 2007 also accepts data in a *.prn file format which is a file format that has instructions for a printer contained within the file as well as the data itself.
Depending on the data source, the imported data may consist of trailing or leading spaces as well as junk characters. The TRIM function can remove un-needed spaces from text (it ignores spaces between words) and the SUBSTITUTE and CLEAN functions will help removing junk or un-printable characters.
To import data into Excel 2007,
- Navigate to the Data tab, then to the Get External Data group and select the From Text button
- Browse to your file and click Import
- The Text Import Wizard will open (as shown below)
NOTE: Excel 2007 shows a preview of your data. You can see from my example that my data is separated with commas (it was a *.csv) file which becomes important in a second
- Select Delimited and click Next
- Now we have to decide which delimiter (separator) to use. In my case as my file was a *.csv I choose Comma
- Once again, Excel 2007 will show you a preview of your data in the Data preview pane. If the data is not cleanly separated into columns, choose a different delimiter (or combination of delimiters) until it displays exactly as you want it to
You can see from my picture above that Excel 2007 has nicely separated my data into columns, so I now click Finish to import the data and return to Excel.
Using the TRIM function
When receiving data from an external source, it often contains characters that you don’t want particularly leading or trailing spaces. The TRIM function will remove both leading and trailing spaces as well as also double spaces between words. There are two ways of using the TRIM function.
The first is to remove extra spaces from the contents of a cell using
Suppose I have a list of items that I have in stock. As you can see on the left, in my picture below my list is messed up as each item has a different number of leading spaces.
Using the TRIM function, those spaces are automatically removed which results in the data being much tidier and more importantly consistent. Should I wish to do more manipulation with this data, I will get far better results with all the data being consistent.
The other method is useful when you are copying and pasting text from an external source (for example Word 2007) and you want Excel to tidy the text up, you can use TRIM to perform this for you. I have the following text from Word 2007 (I have used quotes to illustrate the spaces)
" Too many Spaces "
I use the following formula:
=TRIM(" Too many Spaces ")
This gives me “Too many Spaces” (once again the quotes are used to show that the spaces have been removed)
Removing non-printable characters
The first step is to find the ASCI code for the character that you want to remove. To do this, use the CODE function
=CODE(“<Copy character you want to remove here”)
=CODE(“<Insert cell containing just that character here>”)
=CODE(“S”) returns 83 which is the ASCI code for S
This is where it gets a little complicated.
- For characters with an ASCI code of 0 – 31 we can use the CLEAN function on its own.
- For characters with an ASCI code of 32 or higher, we use a combination of TRIM and SUBSTITUTE to remove the character
CLEAN simply removes characters with an ASCI code of 0 - 31. The function is simple to use, for example I have a cell (B1) with CHAR(7) embedded in it. I simply use the following formula and Excel 2007 will remove CHAR(7) from the cell.
TRIM and SUBSTITUTE
If you want to remove a character with an ASCI code of 32 or higher, you need to use the TRIM and SUBSTITUTE functions together.
- The SUBSTITUTE function is used to convert the non-printing character to a space
- The TRIM function is used to remove that space
As an example, I have the following in cell K19
Using the formula
I end up with "Trousers" in cell K19.
Importing data from *.csv files and *.txt files is common method of accessing external data for use in Excel 2007. This article walked through importing data from text files and also the correct use of delimiters to ensure that the import is done with the data neatly arranged in columns.
Once the data is cleanly imported into Excel, there are often characters such as spaces or junk (non-printable) characters that you would like to remove to ensure that your data is consistent. The TRIM function is used to remove un-wanted spaces. To remove unwanted characters, the CLEAN function can be used in many cases and when the CLEAN function fails, the TRIM and SUBSTITUTE functions can be used together to convert the character to a space and then remove that space.
Thanks for reading my article; I hope that you enjoyed reading it as much as I enjoyed writing it. I do hope that you have found it useful and informative.
I also have a number of other articles on aspects of Excel 2007, covering everything from Conditional Formatting to creating charts and graphs. I have an Index article which also covers how I successfully transitioned from Excel 2003 to 2007 as well as outlining my other Excel 2007 articles which can be found here.