15 Ways to Clean Data in Excel

Our definitive guide will show you 15 ways to clean data in Excel. Learn more about data cleansing through useful examples.

Clean data and string manipulations in Excel are critical! Besides the well-known techniques, we will introduce special VBA codes to make the process quicker and more effective.

Table of contents:

  1. Sorting data to check extremes and peaks
  2. Remove duplicates to clean data
  3. Use the find and replace function
  4. Check the type of data in a cell
  5. Convert numbers stored as text into numbers
  6. Eliminate blank cells in a list or range
  7. Clean data using split the text into columns
  8. Concatenate text using the TEXTJOIN function
  9. Change text to lower – upper – proper case
  10. Remove non-printable characters using the CLEAN formula
  11. Remove unnecessary characters from text automatically
  12. Clean extra spaces using the TRIM function
  13. Insert text after specific nth characters
  14. Delete the text after the nth character
  15. Remove special characters from a text

Effective ways to clean data in Excel

Let’s see the basics first! Sometimes the best method is the “back to the roots” and starts the data cleansing project with simple but effective solutions.

Prologue

Before we take a deep dive into this topic, allow us to tell a personal story.

After college studies, I worked at a small company with limited HR resources. As a newbie, they didn’t dish out the easy tasks for me. Okay, let’s clean data in Excel. I got countless boring tasks. The main issue was how to create clean data just in time? I had never heard of the Excel string manipulation functions, so the solution wasn’t simple. Many night shifts; good, old days.

After that, we formed our company. Finally, we had enough resources to develop a powerful Excel add-in for data cleansing purposes.

clean data in excel add-in

Our goal is to make text manipulation and Excel data processing a child’s play for you. So today, we publish a universal Excel add-in tool. From now, string manipulation and data cleansing will be easy! Read more about the utility.

Don’t forget: Dashboards are only successful if the source data has a clean and organized structure.

#1 – Sorting data to check extremes and peaks

Is it possible to find errors in a list using simple sorting? Yes! We have nothing else to do but arrange the data, either in growing or decreasing order. Right-click on the cell and choose the ‘Sort Largest To Smallest’ option from the menu. Seldom can there be found extremely small or large, maybe peeking data?

Look at the picture below:

clean data example 1 peaks

Who would ever think you could find these mistakes in several million records? Alternatively, you can find errors and blanks in a range using conditional formatting.

#2 – Remove duplicates to clean data

Excel supports many options to eliminate duplicates. For example, two tightly joined operations filter unique values and remove duplicated data. The result is the same in both cases: a list of unique values.

To filter for unique values (filtered list), use the Advanced command in the Sort & Filter group on the Data tab.

sort and filter group data

Select the range of cells! Then go to the Data tab! In the Sort & Filter group, click Advanced. Next, choose the Unique records only check box and click OK.

clean data unique records only

Explanation: However, there is one significant difference that we cannot ignore. In the process of filtering unique values, it temporarily hides the duplicated values. You can back your original list to undo this operation.

If you use the ‘Remove Duplicates’ tool on the ribbon, you’ll permanently delete duplicates! Pay attention to this! The removed, deleted items will last after saving the Workbook. So be very careful.

Let us see how the duplicate removal values tool works! First, select the range. Then, go to the Data tab! Then, in the Data Tools group, click Remove Duplicates.

remove duplicates quickly

Select one or more columns, then click OK. A message box will appear and shows how many duplicated values are removed and how many unique ones remain.

removed values popup

Finally, read this definitive guide to learn more about removing duplicates.

#3 – Use the find and replace function

The advantage of the Find & Replace function is that we can work with it relatively fast in any size data table. First, let’s see how to clean data using this function. To find and replace data in a worksheet, follow these steps:

Press the Ctrl+H keyboard shortcut, and the Find and Replace dialog box appears.

In the Find What box, enter the data you want to locate. Then, enter the Replace With box the data you want to replace.

Use the find and replace function

If you want to replace all occurrences simultaneously, click Replace All.

#4 – Check the Type of Data in a Cell

Please wait a minute before we make any changes to the raw data.

We have to check the type of data in the cell. We can do this by using the TYPE function. The function is one of a group of information functions. Numeric or text data? In the picture below, you can see what kind of results the formula brings in the case of different types of data.

If the function returns 1, the data type is numeric. If it’s returning with 2, the data type is text.

Check the Type of Data in a Cell to clean data

#5 – Convert Numbers Stored as Text into Numbers

If we import data from a text file or an external data source into Excel, Excel often stores numbers as text. It can be the source of many problems, but there are ways to avoid mistakes.

How to convert numbers stored as text into numbers?

Create a helper column and type 1. Next, apply a simple multiply using this formula: number (stored as text) * 1. A result is a number.

Convert Numbers Stored as Text into Numbers

Read more about how to convert text to numbers.

#6- Eliminate blank cells in a list or range

Now we’ll highlight the whole range that also contains the empty lines. Press the F5 key, and from the popup window, choose the ‘Special’ option.

All the cells in the selected range that are not blank are deselected, leaving only the blank cells selected.

Eliminate blank cells in a list or range

Click ‘Delete‘ in the ‘Cells’ section of the ‘Home‘ tab.

Finally, select ‘Delete Sheet Rows‘ from the drop-down menu.

#7 – Split text into columns

It’s easy to split a text into multiple cells. For example, suppose that our data set contains our clients’ first and family names in one cell. We want to split the names that the first names and family names be in different columns.

Start the ‘Convert Text to Columns Wizard’ and follow the steps below:

First, select the range which contains names. Next, click Text to Columns in the Data Tools group.

text to columns

Select the Delimiters checkbox and click Next.

click next

Leave all the checkboxes empty under the Delimiters section. Now check the Space check box (in our example). Click Finish!

split names final step

#8 – Concatenate text using the TEXTJOIN function

Let’s see the opposite of the previous section. Now we would like to join the text from several cells into one. How can we solve this problem quickly?

The TEXTJOIN function joins (what a surprise) values using a selected delimiter. It provides better functionality than CONCAT. This function allows you to supply a range of cells and has a setting to ignore empty values.

Take a look at the formula bar! You can use space as a delimiter. The second parameter is TRUE because we would like to ignore the empty cells (if it is available).

Concatenate text using the TEXTJOIN function

If you are using an earlier version of Excel, you should install 3rd party add-in.

#9 – Change Text to Lower – Upper – Proper Case

Let’s see what kind of Excel text transformation formulas we can use to transform the text. It is worth using them during data cleansing.

  • LOWER() transforms the given text into lower case letters. Example: LOWER(‘EXCEL’), result: ‘excel’
  • UPPER() changes the given text into upper case letters. Example: UPPER(‘excel’), result: ‘EXCEL’
  • PROPER() transforms the first letter of the character string and the letters standing after none-letters into upper case letters.

Example:

=PROPER(‘Information is BEAUTIFUL’)

clean data to change text into proper case

The result of the conversion is: ‘Information Is Beautiful. Not too often, but we might use these kinds of transformations also.

#10 – Remove non-printable characters – The CLEAN formula

Removing non-printable characters is one of the most advanced parts of data cleansing! We can say that work is relatively complex with special text formulas. We have made some short VBA codes for several possibilities.

The non-printable Unicode characters are 129, 141, 143, 144, and 157.

Let’s see how it works with the help of the following example. The text in cell B3 contains non-printable characters. So, first, you can check the result using LEN() formula. Then, to remove the unnecessary text parts, use the CLEAN function.

remove non-printable characters

#11 – Remove unnecessary characters from text automatically

If we only want to delete special characters, the following tool can help. Download our add-in to remove alpha, numeric, or non-numeric characters in seconds.

Remove unnecessary characters from text using macros

#12 – Clean extra spaces using the TRIM function

Removing extra spaces from the text needs more attention.

You can find spaces at the beginning, in the middle, or at the end of the text. The hardest to recognize are the letter ones because they are invisible to the naked eye.

We’ll explain the two reasons they have to be removed. First that they can cause mistakes in the use of formulas. Imagine, for example, that the result of the VLOOKUP() formula is incorrect. It may sound funny, but it is not. Some formula mistakes may influence the outcome of the final calculation.

Clean extra spaces using the TRIM function

The other reason is the size of the file. However, indeed, the one-character surplus is not baneful. Problems occur when the Excel database contains several million records.

Yes, there are some like this. Just think about the external data sources and Power Query output. In this case, the size increase can even be 5% – 10%. We don’t have to depict the harmful effects of this. Use the TRIM() formula and guarantee that the list will be clean.

#13 – Insert text after specific nth characters

Using some Excel functions, we can resolve that we insert some characters in front of or behind a given string. The problem starts when we perform this operation on an Excel database.

By choosing the ‘Text Tools’ option, we can manipulate massive data sets in seconds! However, executing this task manually is the real challenge if you use simple Excel formulas.

Insert text after specific nth characters

From the Excel string manipulation formulas we have to emphasize the SEARCH(), FIND(), MID(), LEFT(), RIGHT() functions.

Now imagine that comfortable situation where you don’t have to create endless formulas ever again! Instead, start the functions that you can see in the picture.

#14 – Delete the text after the nth character

The following data cleansing task is a real threat to Excel users. Let’s stay a little more with the string manipulation functions. Opposite to the previous paragraph here, we don’t insert the characters but remove them.

At this point, all people who are good at Excel would say the following sentence: Let’s cut the characters from the front and back. What if we have to do this from the middle of the text or starting from the fifth character?

Probably this is the moment when the average user would throw the keyboard from the desk.

Delete the text after the nth character

Please try the remove by position function; it is a very effective working tool. Then, look through the list. The result will quickly restore your blood pressure.

#15 – Remove special characters from a text

Finally, let’s see the special characters. We distinguish three different types. The first is the non-printable characters, the second is the alpha characters, and the third is the numerical data. The tools seen in the picture can remove only one type of character from the text.

Imagine how useful can be this Excel application if we have to cleanse a database containing phone numbers. The cells have numbers and other characters also.

Remove special characters from a text

Let’s say we need data that only can contain numbers. And the list of phone numbers has five thousand rows. What do you think about how long would it take to remove the hyphens and other non-numerical characters? It takes too long, only thinking about it! Use Excel and automated data cleaning functions! The execution time narrows down to seconds. It looks great!

Final Thoughts: Use Effective methods to Clean Data in Excel

Let’s summarize what we have learned today! We have introduced the most important text transformations and their field of use. The free data cleansing add-in will take a lot of weight off your shoulders. If you would like to increase your knowledge of Excel, visit the well-known Excel forums.

There is no shame in learning from professionals. That’s what we have done in the early days. However, excel string manipulation functions are a huge subject that needs time and attention. As we can see from the article, Excel offers many possibilities to improve data quality.

Some functions we made with VBA macros, you should check these out closely. We hope that you will create a flawless analysis with the help of the introduced data cleansing techniques.

Additional resources:

Istvan Vozar

Istvan is the co-founder of Excelkid. He writes blog posts and helps people to reach the top in Excel.