Working with CSV files
CSV files are the most common format of exchanging files in Data Science. Most online repositories would rather store files for download in CSV format than an xlsx format. Although they are both files in spreadsheet format, a CSV file is a simplified and lightweight spreadsheet that stores data as texts.
As a data scientist, it is extremely vital to learn how to work with CSV files. That’s the aim of this tutorial. By the end of this article, you will learn
- What CSV files are
- What a CSV file looks like
- How to read and write data to a CSV file using the csv library
- How to read and write data to a CSV file using the pandas library
Let’s get started.
What is a CSV file?
A CSV file simply stands for comma-separated values. It is a file format for storing and extracting texts in a spreadsheet. Each cell in the spreadsheet is separated by commas, hence the name.
The texts are separated by commas which form the individual column. Conversely, each line of text is seen as a row and a new line indicates a new row. This is how a CSV file structures texts into a tabular format.
When working with CSV files, you do not need to reinvent the wheel by building systems to parse and write texts in CSV files. There are modules that do these tasks. We will use the CSV module as well as the pandas module in this tutorial. Before we jump into it, let’s understand a CSV file really looks like.
What a CSV file looks like.
As mentioned above, texts separated in columns are converted into a tabular format. Let’s see an example.
Login email | Identifier | First name | Last name |
[email protected] | 2070 | Laura | Grey |
[email protected] | 4081 | Craig | Johnson |
[email protected] | 9346 | Mary | Jenkins |
[email protected] | 5079 | Jamie | Smith |
The CSV file for the table above is shown below.
Login email,Identifier,First name,Last name
[email protected],2070,Laura,Grey [email protected],4081,Craig,Johnson [email protected],9346,Mary,Jenkins [email protected],5079,Jamie,Smith
As seen, the columns are separated by commas while a new line is depicted as a new row.
Working with CSV files.
In Python, there exists an in-built module for working with CSV files, the csv module. The module allows you to pull data, write, modify and store data to a CSV file. Let’s list the functions available with the csv module in python
- csv.writer: This is used to write data to a CSV file
- csv.reader: This function is used to read a CSV file
- csv.register_dialect: This function is used to map string names to a dialect class
- csv.unregister_dialect: This function is used to delete the string name connected to a dialect class
- csv.list_dialects: This function is used to return the list of dialect names
- csv.get_dialect: This function is used to return instances of dialect connected to name
- csv.field_size_limit: The function is used to set an upper limit on the parser
Here, we will focus our attention on the csv.writer and csv.reader function as those are the most functions used to do the most common tasks, reading and writing to a CSV file
Reading a CSV file
This function is used to read the data in a CSV file. The csv.reader method parses the data in the CSV file and returns the rows. A for loop can then be used to parse and print all the rows.Â
#open the file in text mode with open('email.csv', 'rt') as file:     #store it in a variable     data = csv.reader(file)     #parse each row and print the result     for each_row in data:         print(each_row)
Output:
['Login email', 'Identifier', 'First name', 'Last name'] ['[email protected]', '2070', 'Laura', 'Grey'] ['[email protected]', '4081', 'Craig', 'Johnson'] ['[email protected]', '9346', 'Mary', 'Jenkins'] ['[email protected]', '5079', 'Jamie', 'Smith']
Let’s unpack what we did in the code above. We began by opening the file with the ‘with open’ statement so that we don’t have to close it manually. Thereafter, we read the file using the reader() method. We then parse each row of the file using a for loop.
Notice that the column names were the first lists printed. This does not look too great. You can read the list in a nicer way by reading the CSV file as a dictionary. Let’s see how to do it.
How to Read a CSV File as a Dictionary.
To read a CSV file as a dictionary, you use the DictReader() method rather than the reader() method. Let’s see an example.Â
#open the file in text mode with open('email.csv', 'rt') as file:     #store it in a variable     data = csv.DictReader(file)     #parse each row and print the result     for each_row in data:         print(each_row)
Output:
OrderedDict([('Login email', '[email protected]'), ('Identifier', '2070'), ('First name', 'Laura'), ('Last name', 'Grey')]) OrderedDict([('Login email', '[email protected]'), ('Identifier', '4081'), ('First name', 'Craig'), ('Last name', 'Johnson')]) OrderedDict([('Login email', '[email protected]'), ('Identifier', '9346'), ('First name', 'Mary'), ('Last name', 'Jenkins')]) OrderedDict([('Login email', '[email protected]'), ('Identifier', '5079'), ('First name', 'Jamie'), ('Last name', 'Smith')])
As seen, this time, the result is a list of tuples containing the column name and the values for each entry.
Writing Data to a CSV file
Just as it is possible to read data in a CSV, you can convert write data to a CSV file. To do this, you use the writer() method. The writerow() method is then used to add data to a row.
To see how this is done, see the example below.Â
#import necessary libraries import csv  #open the CSV file with open('writen_data.csv', mode='w') as file:     data = csv.writer(file)      #write data to the file, row by row     data.writerow(['Login email', 'Identifier', 'First name', 'Last name'])     data.writerow(['[email protected]', '2070', 'Laura', 'Grey'])     data.writerow(['[email protected]', '4081', 'Craig', 'Johnson'])     data.writerow(['[email protected]', '9346', 'Mary', 'Jenkins'])     data.writerow(['[email protected]', '5079', 'Jamie', 'Smith'])
Output:
Here’s what the file looks like
Working with CSV files using Pandas
Most often than not, you’d be using the Pandas library to work with CSV files. This is because it is easier to read CSV files on the terminal and do other operations using pandas. In this section, we will discuss how to work with CSV files using pandas.
To read CSV with pandas, we use the read_csv() method while to write to a CSV in pandas, we use the to_csv() method. Let’s illustrate each of these tasks using examples.
Reading a CSV file using Pandas
As mentioned earlier, the read_csv() method is used to read a CSV file using pandas
#import the necessary libraries import pandas as pd #read the CSV file data = pd.read_csv('email.csv') #print the result print(data)
Output:
Login email Identifier First name Last name 0 [email protected]    2070   Laura   Grey 1 [email protected]    4081   Craig  Johnson 2  [email protected]    9346    Mary  Jenkins 3 [email protected]    5079   Jamie   Smith
Easy as that!
Writing to a CSV file using Pandas
To write data into a CSV file, the to_csv() method is used. However, it is important to point out that the data is preferably stored as a dictionary. The dictionary is converted into a dataframe. Thereafter, the dataframe is converted to a CSV file. The example below illustrates this process.
#import the necessary libraries import pandas as pd #create a data in dictionary form data = {'Login email': ['[email protected]', '2070', 'Laura', 'Grey'], Â Â Â Â Â Â Â 'Identifier': ['[email protected]', '4081', 'Craig', 'Johnson'], Â Â Â Â Â Â Â Â 'First name': ['[email protected]', '9346', 'Mary', 'Jenkins'], Â Â Â Â Â Â Â 'Last name' : ['[email protected]', '5079', 'Jamie', 'Smith'] } #convert the dictionary to a pandas dataframe df = pd.DataFrame(data) #save the dataframe as a CSV file df.to_csv('second_written_data.csv')
Output:
The CSV file looks like this:
That is all there is to it.
In this tutorial, you have learned how to write and read CSV files using the csv module. In addition, you learned how to do the same using the pandas library. It is worthy of mention to recall these methods.
- csv.reader()
- csv.DictReader()
- csv.writer()
- pandas.read_csv()
- pandas.to_csv()
If you’ve got any questions, feel free to leave them in the comment section and I’d do my best to answer them.