Our definitive guide will show you 15 ways to clean data in Excel. Learn more about data cleansing through useful examples.
Clean data and string manipulations in Excel are critical! Besides the well-known techniques, we will introduce special VBA codes to make the process quicker and more effective.
Table of contents:
- Sorting data to check extremes and peaks
- Remove duplicates to clean data
- Use the find and replace function
- Check the type of data in a cell
- Convert numbers stored as text into numbers
- Eliminate blank cells in a list or range
- Clean data using split the text into columns
- Concatenate text using the TEXTJOIN function
- Change text to lower – upper – proper case
- Remove non-printable characters using the CLEAN formula
- Remove unnecessary characters from text automatically
- Clean extra spaces using the TRIM function
- Insert text after specific nth characters
- Delete the text after the nth character
- Remove special characters from a text
Effective ways to clean data in Excel
Let’s see the basics first! Sometimes the best method is the “back to the roots” and starts the data cleansing project with simple but effective solutions.
Before we take a deep dive into this topic, allow us to tell a personal story.
After college studies, I worked at a small company with limited HR resources. As a newbie, they didn’t dish out the easy tasks for me. Okay, let’s clean data in Excel. I got countless boring tasks. The main issue was how to create clean data just in time? I had never heard of the Excel string manipulation functions, so the solution wasn’t simple. Many night shifts; good, old days.
After that, we formed our company. Finally, we had enough resources to develop a powerful Excel add-in for data cleansing purposes.
Our goal is to make text manipulation and Excel data processing a child’s play for you. So today, we publish a universal Excel add-in tool. From now, string manipulation and data cleansing will be easy! Read more about the utility.
Don’t forget: Dashboards are only successful if the source data has a clean and organized structure.
#1 – Sorting data to check extremes and peaks
Is it possible to find errors in a list using simple sorting? Yes! We have nothing else to do but arrange the data, either in growing or decreasing order. Right-click on the cell and choose the ‘Sort Largest To Smallest’ option from the menu. Seldom can there be found extremely small or large, maybe peeking data?
Look at the picture below:
Who would ever think you could find these mistakes in several million records? Alternatively, you can find errors and blanks in a range using conditional formatting.
#2 – Remove duplicates to clean data
Excel supports many options to eliminate duplicates. For example, two tightly joined operations filter unique values and remove duplicated data. The result is the same in both cases: a list of unique values.
To filter for unique values (filtered list), use the Advanced command in the Sort & Filter group on the Data tab.
Select the range of cells! Then go to the Data tab! In the Sort & Filter group, click Advanced. Next, choose the Unique records only check box and click OK.
Explanation: However, there is one significant difference that we cannot ignore. In the process of filtering unique values, it temporarily hides the duplicated values. You can back your original list to undo this operation.
If you use the ‘Remove Duplicates’ tool on the ribbon, you’ll permanently delete duplicates! Pay attention to this! The removed, deleted items will last after saving the Workbook. So be very careful.
Let us see how the duplicate removal values tool works! First, select the range. Then, go to the Data tab! Then, in the Data Tools group, click Remove Duplicates.
Select one or more columns, then click OK. A message box will appear and shows how many duplicated values are removed and how many unique ones remain.
#3 – Use the find and replace function
The advantage of the Find & Replace function is that we can work with it relatively fast in any size data table. First, let’s see how to clean data using this function. To find and replace data in a worksheet, follow these steps:
Press the Ctrl+H keyboard shortcut, and the Find and Replace dialog box appears.
In the Find What box, enter the data you want to locate. Then, enter the Replace With box the data you want to replace.
If you want to replace all occurrences simultaneously, click Replace All.
#4 – Check the Type of Data in a Cell
Please wait a minute before we make any changes to the raw data.
We have to check the type of data in the cell. We can do this by using the TYPE function. The function is one of a group of information functions. Numeric or text data? In the picture below, you can see what kind of results the formula brings in the case of different types of data.
If the function returns 1, the data type is numeric. If it’s returning with 2, the data type is text.
#5 – Convert Numbers Stored as Text into Numbers
If we import data from a text file or an external data source into Excel, Excel often stores numbers as text. It can be the source of many problems, but there are ways to avoid mistakes.
Create a helper column and type 1. Next, apply a simple multiply using this formula: number (stored as text) * 1. A result is a number.
Read more about how to convert text to numbers.
#6- Eliminate blank cells in a list or range
Now we’ll highlight the whole range that also contains the empty lines. Press the F5 key, and from the popup window, choose the ‘Special’ option.
All the cells in the selected range that are not blank are deselected, leaving only the blank cells selected.
Click ‘Delete‘ in the ‘Cells’ section of the ‘Home‘ tab.
Finally, select ‘Delete Sheet Rows‘ from the drop-down menu.
#7 – Split text into columns
It’s easy to split a text into multiple cells. For example, suppose that our data set contains our clients’ first and family names in one cell. We want to split the names that the first names and family names be in different columns.
Start the ‘Convert Text to Columns Wizard’ and follow the steps below:
First, select the range which contains names. Next, click Text to Columns in the Data Tools group.
Select the Delimiters checkbox and click Next.
Leave all the checkboxes empty under the Delimiters section. Now check the Space check box (in our example). Click Finish!
#8 – Concatenate text using the TEXTJOIN function
Let’s see the opposite of the previous section. Now we would like to join the text from several cells into one. How can we solve this problem quickly?
The TEXTJOIN function joins (what a surprise) values using a selected delimiter. It provides better functionality than CONCAT. This function allows you to supply a range of cells and has a setting to ignore empty values.
Take a look at the formula bar! You can use space as a delimiter. The second parameter is TRUE because we would like to ignore the empty cells (if it is available).
If you are using an earlier version of Excel, you should install 3rd party add-in.
#9 – Change Text to Lower – Upper – Proper Case
Let’s see what kind of Excel text transformation formulas we can use to transform the text. It is worth using them during data cleansing.
- LOWER() transforms the given text into lower case letters. Example: LOWER(‘EXCEL’), result: ‘excel’
- UPPER() changes the given text into upper case letters. Example: UPPER(‘excel’), result: ‘EXCEL’
- PROPER() transforms the first letter of the character string and the letters standing after none-letters into upper case letters.
=PROPER(‘Information is BEAUTIFUL’)
The result of the conversion is: ‘Information Is Beautiful. Not too often, but we might use these kinds of transformations also.
#10 – Remove non-printable characters – The CLEAN formula
Removing non-printable characters is one of the most advanced parts of data cleansing! We can say that work is relatively complex with special text formulas. We have made some short VBA codes for several possibilities.
The non-printable Unicode characters are 129, 141, 143, 144, and 157.
Let’s see how it works with the help of the following example. The text in cell B3 contains non-printable characters. So, first, you can check the result using LEN() formula. Then, to remove the unnecessary text parts, use the CLEAN function.
#11 – Remove unnecessary characters from text automatically
If we only want to delete special characters, the following tool can help. Download our add-in to remove alpha, numeric, or non-numeric characters in seconds.
#12 – Clean extra spaces using the TRIM function
Removing extra spaces from the text needs more attention.
You can find spaces at the beginning, in the middle, or at the end of the text. The hardest to recognize are the letter ones because they are invisible to the naked eye.
We’ll explain the two reasons they have to be removed. First that they can cause mistakes in the use of formulas. Imagine, for example, that the result of the VLOOKUP() formula is incorrect. It may sound funny, but it is not. Some formula mistakes may influence the outcome of the final calculation.
The other reason is the size of the file. However, indeed, the one-character surplus is not baneful. Problems occur when the Excel database contains several million records.
Yes, there are some like this. Just think about the external data sources and Power Query output. In this case, the size increase can even be 5% – 10%. We don’t have to depict the harmful effects of this. Use the TRIM() formula and guarantee that the list will be clean.
#13 – Insert text after specific nth characters
Using some Excel functions, we can resolve that we insert some characters in front of or behind a given string. The problem starts when we perform this operation on an Excel database.
By choosing the ‘Text Tools’ option, we can manipulate massive data sets in seconds! However, executing this task manually is the real challenge if you use simple Excel formulas.
From the Excel string manipulation formulas we have to emphasize the SEARCH(), FIND(), MID(), LEFT(), RIGHT() functions.
Now imagine that comfortable situation where you don’t have to create endless formulas ever again! Instead, start the functions that you can see in the picture.
#14 – Delete the text after the nth character
The following data cleansing task is a real threat to Excel users. Let’s stay a little more with the string manipulation functions. Opposite to the previous paragraph here, we don’t insert the characters but remove them.
At this point, all people who are good at Excel would say the following sentence: Let’s cut the characters from the front and back. What if we have to do this from the middle of the text or starting from the fifth character?
Probably this is the moment when the average user would throw the keyboard from the desk.
Please try the remove by position function; it is a very effective working tool. Then, look through the list. The result will quickly restore your blood pressure.
#15 – Remove special characters from a text
Finally, let’s see the special characters. We distinguish three different types. The first is the non-printable characters, the second is the alpha characters, and the third is the numerical data. The tools seen in the picture can remove only one type of character from the text.
Imagine how useful can be this Excel application if we have to cleanse a database containing phone numbers. The cells have numbers and other characters also.
Let’s say we need data that only can contain numbers. And the list of phone numbers has five thousand rows. What do you think about how long would it take to remove the hyphens and other non-numerical characters? It takes too long, only thinking about it! Use Excel and automated data cleaning functions! The execution time narrows down to seconds. It looks great!
Final Thoughts: Use Effective methods to Clean Data in Excel
Let’s summarize what we have learned today! We have introduced the most important text transformations and their field of use. The free data cleansing add-in will take a lot of weight off your shoulders. If you would like to increase your knowledge of Excel, visit the well-known Excel forums.
There is no shame in learning from professionals. That’s what we have done in the early days. However, excel string manipulation functions are a huge subject that needs time and attention. As we can see from the article, Excel offers many possibilities to improve data quality.
Some functions we made with VBA macros, you should check these out closely. We hope that you will create a flawless analysis with the help of the introduced data cleansing techniques.