A list in Python is a built-in data structure used to store an ordered collection of items. The items can be of any type, such as strings, integers, floats, or booleans, and are separated by commas when the list is written out. The closest counterpart in languages like C or Java is the array, although Python lists can grow and shrink and can hold items of different types. In this tutorial, you will learn several methods for removing duplicates from a list in Python.
Specifically, here's what you will learn by the end of this tutorial:
- How to remove duplicates from a list by typecasting it to a set
- How to remove duplicates from a list by building a temporary list with a conditional loop
- How to remove duplicates from a list by using the OrderedDict class
- How to remove duplicates from a list by using pandas
Without further ado, let’s jump into it.
Typecasting the List to a Set
A set is another built-in Python data structure used to store a collection of data. What differentiates a set from a list, however, is that a list can contain duplicate values, whereas every value in a set is unique. Sets are also unordered, so they do not remember the order in which items were added.
With this in mind, a list with duplicate values can be typecast (converted) to a set. The conversion removes the duplicates automatically, since a set cannot hold them. The set can then be typecast back to a list. Suppose we want to remove the duplicates from the list [1, 2, 2, 2, 3, 4, 1, 2, 4, 5, 6, 3, 5, 3, 4, 2, 2, 1]. The code snippet below shows the process. Keep in mind that because sets are unordered, this method does not preserve the original order of the list; the ascending output below is a CPython implementation detail for small integers, not a guarantee.
```python
# define a list
dup_list = [1, 2, 2, 2, 3, 4, 1, 2, 4, 5, 6, 3, 5, 3, 4, 2, 2, 1]
# typecast the list to a set
the_set = set(dup_list)
# print the unique list
print(list(the_set))
```
Output:
[1, 2, 3, 4, 5, 6]
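Because a set is unordered, list(set(...)) is guaranteed to follow neither the original order nor ascending order. If you specifically want ascending output, one option is to sort the set. The sketch below does this. The helper name unique_sorted is hypothetical, and the sketch assumes the elements are hashable (required by set) and comparable with each other (required by sorted).

```python
def unique_sorted(items):
    """Return the unique elements of items in ascending order.

    Assumes every element is hashable (needed for set) and that the
    elements can be compared with each other (needed for sorted).
    """
    return sorted(set(items))


dup_list = [1, 2, 2, 2, 3, 4, 1, 2, 4, 5, 6, 3, 5, 3, 4, 2, 2, 1]
print(unique_sorted(dup_list))                          # [1, 2, 3, 4, 5, 6]
print(unique_sorted(["pear", "apple", "pear", "fig"]))  # ['apple', 'fig', 'pear']
```

Since sorted() always returns a new list, no separate conversion back to a list is needed.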
Removing Duplicates by Creating a Temporary List
Another approach to removing duplicates is to start with an empty list and loop through each element of the list that contains duplicates. Inside the loop, an if statement checks whether the current element is already in the new list.
If it is not, the element is appended; if it is, the element is skipped. When the loop finishes, the new list holds each value exactly once, in the order in which it first appeared. Because the in operator scans the new list element by element on every check, this approach becomes noticeably slower on very long lists. See the code snippet below.
```python
# define a list
dup_list = [1, 2, 2, 2, 3, 4, 1, 2, 4, 5, 6, 3, 5, 3, 4, 2, 2, 1]
# create an empty list
unique_list = []
# add each element only if it is not already in the unique list
for each in dup_list:
    if each not in unique_list:
        unique_list.append(each)
# print the unique list
print(unique_list)
```
Output:
[1, 2, 3, 4, 5, 6]
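A common variant of this loop, sketched below, keeps a helper set of values already seen. Membership tests on a set take constant time on average, so the loop stays fast on long lists, while appending to a list still preserves first-occurrence order. This is a swapped-in optimization rather than the method shown above; the names dedupe_preserving_order and seen are illustrative, and the sketch assumes every element is hashable.

```python
def dedupe_preserving_order(items):
    """Return items with duplicates removed, keeping first occurrences in order.

    Assumes every element is hashable so it can be stored in a set.
    """
    seen = set()        # values already added to the result
    unique_list = []    # result, in first-occurrence order
    for each in items:
        if each not in seen:    # average O(1) lookup in a set
            seen.add(each)
            unique_list.append(each)
    return unique_list


dup_list = [1, 2, 2, 2, 3, 4, 1, 2, 4, 5, 6, 3, 5, 3, 4, 2, 2, 1]
print(dedupe_preserving_order(dup_list))         # [1, 2, 3, 4, 5, 6]
print(dedupe_preserving_order([3, 1, 3, 2, 1]))  # [3, 1, 2]
```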
Removing Duplicates by Using the OrderedDict Class
In Python 2.7 and above, the OrderedDict class can be used to remove duplicates from a list. You begin by importing the class from the collections module.
Calling the class method OrderedDict.fromkeys() with the list returns an OrderedDict whose keys are the elements of the list, each mapped to the value None. Since a dictionary cannot contain duplicate keys, each value is kept only once, and the OrderedDict remembers the order in which the keys were first inserted. To get the unique values, typecast the OrderedDict to a list, which returns its keys. An example is shown below.
```python
# import the necessary class
from collections import OrderedDict
# define a list
dup_list = [1, 2, 2, 2, 3, 4, 1, 2, 4, 5, 6, 3, 5, 3, 4, 2, 2, 1]
# convert the list to an OrderedDict object
ordered_dict = OrderedDict.fromkeys(dup_list)
# print the list with unique values
print(list(ordered_dict))
```
Output:
[1, 2, 3, 4, 5, 6]
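To see why this works, it can help to inspect the OrderedDict before converting it. The sketch below uses a short list of strings chosen only for illustration. Every element becomes a key with the value None, and a repeated element simply lands on a key that already exists, so it appears once, in the position where it was first seen.

```python
from collections import OrderedDict

# illustrative input with repeated values
colors = ["red", "blue", "red", "green", "blue"]

ordered_dict = OrderedDict.fromkeys(colors)

# each unique element is a key; fromkeys() sets every value to None by default
print(list(ordered_dict.items()))  # [('red', None), ('blue', None), ('green', None)]

# converting to a list keeps only the keys, in first-insertion order
print(list(ordered_dict))          # ['red', 'blue', 'green']
```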
If you are using Python 3.7 or above, you can do the same thing more concisely with a regular dictionary, because dictionaries are guaranteed to preserve insertion order from that version onward. Convert the list into a dictionary with the dict.fromkeys() method, then typecast the dictionary to a list. See an example below.
```python
# define a list
dup_list = [1, 2, 2, 2, 3, 4, 1, 2, 4, 5, 6, 3, 5, 3, 4, 2, 2, 1]
# convert the list to a dictionary
dict_keys = dict.fromkeys(dup_list)
# print the list with unique values
print(list(dict_keys))
```
Output:
[1, 2, 3, 4, 5, 6]
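Both the set and the dict.fromkeys() approaches require list elements to be hashable, so they raise a TypeError for a list of lists, for example. The sketch below combines the methods from this tutorial: it tries dict.fromkeys() first and falls back to the temporary-list loop, which only needs equality comparisons. The function name dedupe is hypothetical, and the sketch assumes the input is a list (or another sequence that can be iterated more than once).

```python
def dedupe(items):
    """Remove duplicates while keeping first-occurrence order.

    Uses dict.fromkeys() when every element is hashable; otherwise falls
    back to the slower list-membership loop, which only needs ==.
    Assumes items can be iterated more than once (e.g. a list).
    """
    try:
        return list(dict.fromkeys(items))
    except TypeError:
        # unhashable elements, such as lists, cannot be dictionary keys
        unique_list = []
        for each in items:
            if each not in unique_list:
                unique_list.append(each)
        return unique_list


print(dedupe([1, 2, 2, 3, 1]))        # [1, 2, 3]
print(dedupe([[1, 2], [3], [1, 2]]))  # [[1, 2], [3]]
```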
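Removing Duplicates Using unique() in Pandas
Pandas provides unique(), available both as the top-level pd.unique() function and as the Series.unique() method, which returns the unique values in the order they first appear. The result is a NumPy array rather than a Python list, which is why the output below is printed without commas. This is a quick and easy method if pandas is already part of your project. An example is shown below.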
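```python
# import the necessary library
import pandas as pd
# define a list
dup_list = [1, 2, 2, 2, 3, 4, 1, 2, 4, 5, 6, 3, 5, 3, 4, 2, 2, 1]
# wrap the list in a Series (recent pandas versions warn when pd.unique()
# is given a plain list) and print its unique values
print(pd.unique(pd.Series(dup_list)))
```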
Output:
[1 2 3 4 5 6]
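If you need an ordinary Python list rather than a NumPy array, the result can be converted with the array's tolist() method. The sketch below uses a short, unsorted input chosen only for illustration, which also makes it visible that unique() keeps the order of first appearance instead of sorting the values.

```python
import pandas as pd

# illustrative input whose first-appearance order is not ascending
dup_list = [3, 1, 3, 2, 1, 2]

# Series.unique() returns a NumPy array in order of first appearance
unique_values = pd.Series(dup_list).unique()
print(unique_values)  # [3 1 2]

# tolist() converts the NumPy array back to a regular Python list
unique_list = unique_values.tolist()
print(unique_list)    # [3, 1, 2]
```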
In conclusion, you have seen four different methods of removing duplicates from a list. You learned that typecasting a list to a set removes the duplicates automatically, although it does not preserve the original order. You also learned how to build a new list with a conditional loop that only appends values not already present. We then discussed how to use the fromkeys() method of the OrderedDict class (or of a plain dict in Python 3.7 and above) and return its keys as a list. Finally, you saw how to use unique() in pandas to return the unique values in a list.
If you have any questions, feel free to leave them in the comment section and I'll do my best to answer them.