Python Remove Duplicates from a List


A list in Python is a built-in data structure used to store an ordered collection of items. The items can be strings, integers, floats, booleans, or other objects, and are separated by commas. A list plays a role similar to that of an array in languages like C or Java, although a Python list can grow or shrink and can hold items of different types. In this tutorial, you will learn several methods for removing duplicates from a list in Python.

Specifically, here’s what you will learn by the end of this tutorial. 

  • How to remove duplicates from a list by typecasting it to a set
  • How to remove duplicates from a list by creating a conditional loop
  • How to remove duplicates from a list by using the OrderedDict class
  • How to remove duplicates from a list by using Pandas

Without further ado, let’s jump into it. 

Typecasting the List to a Set

A set is another built-in data structure in Python that is used to store a collection of data. What differentiates a set from a list, however, is that while a list can have duplicate values, the values in a set are unique. 

With this in mind, a list with duplicate values can be typecast (converted) to a set. Typecasting the list to a set automatically removes duplicate values since a set does not allow duplicates. Thereafter, the set can be typecast back to a list. Note that a set is unordered, so the order of the resulting list is not guaranteed to match the original. This process is shown in the code snippet below. Let’s say we want to remove the duplicates in the list: [1, 2, 2, 2, 3, 4, 1, 2, 4, 5, 6, 3, 5, 3, 4, 2, 2, 1]

#define a list
dup_list =  [1, 2, 2, 2, 3, 4, 1, 2, 4, 5, 6, 3, 5, 3, 4, 2, 2, 1]
 
#typecast list to set
the_set = set(dup_list)
 
#print unique list
print(list(the_set))
Output: [1, 2, 3, 4, 5, 6]
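
Because a set is unordered, the round trip through a set may not hand the elements back in a predictable order. The minimal sketch below uses a made-up list of strings named fruits (not part of the example above) and sorts the result so the output is predictable. It assumes the elements can be compared with each other and that sorted order is acceptable.

```python
# hypothetical example data, chosen only for illustration
fruits = ["pear", "apple", "pear", "kiwi", "apple"]

# typecast to a set to drop duplicates, then sort for a predictable order
unique_fruits = sorted(set(fruits))

print(unique_fruits)
# Output: ['apple', 'kiwi', 'pear']
```

If the original order of the list matters, one of the order-preserving approaches in the following sections is likely a better fit.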

Removing Duplicates By Creating a Temporary List

Another approach is to create an empty list and then loop through each element of the list that contains duplicates. Inside the loop, an if statement checks whether the element already exists in the new list.

If it does not exist, the element is added. If it already exists, the element is skipped. When the loop finishes, the new list contains only unique elements, in the order they first appeared. See the code snippet below.

#define a list
dup_list =  [1, 2, 2, 2, 3, 4, 1, 2, 4, 5, 6, 3, 5, 3, 4, 2, 2, 1]
 
#create an empty list
unique_list = []
 
#add each element only if it's not already in the unique list
for each in dup_list:
    if each not in unique_list:
        unique_list.append(each)
 
#print the unique list
print(unique_list)

Output:
[1, 2, 3, 4, 5, 6]
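
The check `each not in unique_list` scans the temporary list on every pass, which can get slow for long lists. A common variation, sketched below, tracks the values already seen in a set alongside the list. The function name remove_duplicates is made up for this illustration, and the sketch assumes the elements are hashable.

```python
def remove_duplicates(items):
    """Return a new list without duplicates, keeping first occurrences in order."""
    seen = set()       # values encountered so far (fast membership checks)
    unique_list = []   # result, in the original order
    for each in items:
        if each not in seen:
            seen.add(each)
            unique_list.append(each)
    return unique_list


dup_list = [1, 2, 2, 2, 3, 4, 1, 2, 4, 5, 6, 3, 5, 3, 4, 2, 2, 1]
print(remove_duplicates(dup_list))
# Output: [1, 2, 3, 4, 5, 6]
```

The result is the same as the plain loop above; the extra set only changes how the membership test is performed.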

Removing Duplicates by Using the OrderedDict Class

From Python 2.7 onward, the OrderedDict class can be used to remove duplicates from a list. You begin by importing the class from the collections module. 

The fromkeys() method of the OrderedDict class returns an OrderedDict object whose keys are the unique elements of the list, in the order they first appear. Each key is paired with the value None by default. To get only the unique values, typecast the OrderedDict to a list, which yields its keys. An example is shown below.

#import the necessary library
from collections import OrderedDict
 
#define a list
dup_list =  [1, 2, 2, 2, 3, 4, 1, 2, 4, 5, 6, 3, 5, 3, 4, 2, 2, 1]
 
#convert list to OrderedDict object
ordered_dict = OrderedDict.fromkeys(dup_list)
 
#print the list with unique values
print(list(ordered_dict))
Output:
[1, 2, 3, 4, 5, 6]
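
To see what fromkeys() builds, it can help to look at the key-value pairs before converting to a list. In this small sketch, the list letters is a made-up example. Each unique element becomes a key, every value is None, and the order of first appearance is kept.

```python
from collections import OrderedDict

# hypothetical example data, chosen only for illustration
letters = ["b", "a", "b", "c", "a"]

# build an OrderedDict whose keys are the unique elements
ordered_dict = OrderedDict.fromkeys(letters)

# inspect the (key, value) pairs
pairs = list(ordered_dict.items())
print(pairs)
# Output: [('b', None), ('a', None), ('c', None)]

# typecasting to a list keeps only the keys
unique_letters = list(ordered_dict)
print(unique_letters)
# Output: ['b', 'a', 'c']
```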

If you are using Python 3.7 or above, the built-in dict also preserves insertion order, so you can replicate the above step in a shorter way by converting the list into a dictionary using the dict.fromkeys() method. Then typecast the dict to a list. See an example below.

#define a list
dup_list =  [1, 2, 2, 2, 3, 4, 1, 2, 4, 5, 6, 3, 5, 3, 4, 2, 2, 1]
 
#convert list to a dictionary
dict_keys = dict.fromkeys(dup_list)
 
#print the list with unique values
print(list(dict_keys))
Output:
[1, 2, 3, 4, 5, 6]

Removing Duplicates Using the unique() Function in Pandas

Pandas provides a unique() function that returns the unique values in the order they first appear. The result is a NumPy array rather than a list, which is why the output is printed without commas. An example is shown below. 

#import the necessary library
import pandas as pd
 
#define a list
dup_list =  [1, 2, 2, 2, 3, 4, 1, 2, 4, 5, 6, 3, 5, 3, 4, 2, 2, 1]
 
#print the unique values (the list is wrapped in a Series, an input type pd.unique() accepts)
print(pd.unique(pd.Series(dup_list)))
Output:
[1 2 3 4 5 6]

In conclusion, you have seen four different methods of removing duplicates from a list. You learned that typecasting a list to a set removes the duplicates automatically. You also learned how to write a loop with a condition that adds only unique values to a new list. We then discussed how to use the fromkeys() method of the OrderedDict class and return its keys as a list. Finally, you saw how to use unique() in pandas to return the unique values in a list. 

If you have any questions, feel free to leave them in the comment section and I’ll do my best to answer them. 
