Remove duplicates from a list in Python

Lists are one of the most common data types in Python and are used to store data sequentially. We’ll often need to remove duplicate entries to ensure that we are working with a list of unique elements.

In this tutorial, we’ll look at the different options that we have at our disposal to perform this duplicate removal.

Use a for loop to remove duplicates from a list

One obvious way to remove duplicates from a list is to use a for loop. As we’ll see below, there are more efficient ways to achieve our objective, but this one has the advantage of being easy to understand. We create a new list from our original list, adding, through a for loop, only the elements that are not already included in this new list.

# our original list
original_list = [1, 2, 3, 4, 5, 5, 6, 7, 7]
# new list with unique items
unique_list = []
# we start our loop
for element in original_list:
    # we check if the element is already in our unique list
    if element not in unique_list:
        # we add it
        unique_list.append(element)
print(unique_list)

Note that our code uses the in operator, which is commonly used to check whether an element is in a list.

Output:

[1, 2, 3, 4, 5, 6, 7]

Some notes on this approach:

  • Not advisable for large lists. Each in check scans the new list from the start, so the running time grows roughly quadratically with the number of elements.
  • The order of the original list is kept.
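  • You can shorten the syntax with a list comprehension, although it is used here only for its side effect (the comprehension itself builds a throwaway list of None values):
# our original list
original_list = [1, 2, 3, 4, 5, 5, 6, 7, 7]
# new list with unique items
new_list = []
# list comprehension, used only for the side effect of append()
[new_list.append(element) for element in original_list if element not in new_list]
print(new_list)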
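For larger lists, the slowdown mentioned in the notes above can be avoided while still keeping the original order. The sketch below is a variation on the loop, not the code shown earlier: it tracks already-seen elements in a set, whose membership test is much cheaper than scanning a list. The function name dedupe_keep_order is our own, and the sketch assumes every element is hashable.

```python
def dedupe_keep_order(items):
    """Return a new list without duplicates, keeping first occurrences in order."""
    seen = set()       # elements already met (assumes hashable elements)
    unique_list = []
    for element in items:
        if element not in seen:
            seen.add(element)
            unique_list.append(element)
    return unique_list


original_list = [1, 2, 3, 4, 5, 5, 6, 7, 7]
print(dedupe_keep_order(original_list))  # [1, 2, 3, 4, 5, 6, 7]
```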

Use a set() to remove duplicates from a list

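This is probably the fastest and most common way to achieve our objective. If you’re not familiar with sets, this data type has one characteristic that is useful in our case: duplicates are not allowed. By converting a list to a set and back to a list, we remove the duplicates. Keep in mind that sets are unordered, so the order of the original list is not guaranteed to be kept, and that every element must be hashable.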

# our original list
original_list = [1, 2, 3, 4, 5, 5, 6, 7, 7]
# new list with unique items
new_list = list(set(original_list))
print(new_list)

Output:

[1, 2, 3, 4, 5, 6, 7]
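The output above happens to come back in ascending order, but a set makes no promise about ordering. As a small illustration (the fruit names are made up for this example), sorting the set is one way to get a predictable result when the original order does not matter:

```python
original_list = ["pear", "apple", "pear", "banana", "apple"]

# the order of this list may differ between runs
unique_unordered = list(set(original_list))

# sorted() accepts any iterable and returns a new, ordered list
unique_sorted = sorted(set(original_list))
print(unique_sorted)  # ['apple', 'banana', 'pear']
```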

Use a dictionary to remove duplicates from a list

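This approach is very similar to the previous one in that it relies on the fact that a dictionary can only have unique keys. We create a dictionary from our list with dict.fromkeys() and then convert it back to a list. Since dictionaries keep insertion order (Python 3.7 and later), the order of the original list is kept.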

# our original list
original_list = [1, 2, 3, 4, 5, 5, 6, 7, 7]
# new list with unique items
new_list = list(dict.fromkeys(original_list))
print(new_list)
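To see that this approach keeps the order of first appearance, here is a short sketch with a list that is not already sorted. The values are chosen only for illustration, and the sketch assumes Python 3.7 or later, where dictionaries preserve insertion order:

```python
original_list = [3, 1, 3, 2, 1]

# dict.fromkeys() builds {3: None, 1: None, 2: None}; repeated keys are ignored
new_list = list(dict.fromkeys(original_list))
print(new_list)  # [3, 1, 2]
```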

Use Pandas to remove duplicates from a list

Lastly, you can use the Pandas library to remove duplicates. I recommend doing it this way only if you are already working with a DataFrame. It comes with the drop_duplicates() method, whose function is to … drop duplicates!

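import pandas as pd
# our original data, stored in a dictionary
data = {'numbers': [1, 2, 3, 4, 5, 5, 6, 7, 7]}
# build the DataFrame and drop the duplicated rows
df = pd.DataFrame(data, columns=['numbers']).drop_duplicates()
print(df)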

Output:

   numbers
0        1
1        2
2        3
3        4
4        5
6        6
7        7
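If you start from a plain list and want a plain list back, a pandas Series may be enough. This is a minimal sketch, assuming pandas is installed; it relies on Series.drop_duplicates() and Series.tolist():

```python
import pandas as pd

original_list = [1, 2, 3, 4, 5, 5, 6, 7, 7]

# drop_duplicates() keeps the first occurrence of each value by default
new_list = pd.Series(original_list).drop_duplicates().tolist()
print(new_list)  # [1, 2, 3, 4, 5, 6, 7]
```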
