Introduction to Seaborn

There is just something extraordinary about a well-designed visualization. The colors stand out, the layers blend nicely together, the contours flow throughout, and the overall package not only has a pleasing aesthetic quality but also gives us meaningful insights. This matters in data science, where we often work with a lot of messy data, so the ability to visualize it is critical for a data scientist. Our stakeholders or clients will more often than not rely on visual cues rather than on the intricacies of a machine learning model. There are plenty of excellent Python visualization libraries available, including the widely used Matplotlib. Matplotlib has proven to be an incredibly useful and popular visualization tool, but even avid users will admit it often leaves much to be desired. One answer to these problems is Seaborn.

What is Seaborn?  

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Seaborn provides simple high-level functions for common statistical plot types and integrates closely with Pandas DataFrames.

Why should we use Seaborn over matplotlib?  

  • Matplotlib functions are not built around DataFrames, whereas Seaborn functions accept a DataFrame and column names directly.
  • Seaborn comes with high-level interfaces and built-in themes that Matplotlib lacks; with Matplotlib alone, it is not easy to figure out the settings that make plots attractive.
  • Matplotlib's API is relatively low-level. Sophisticated statistical visualization is possible, but it often requires a lot of boilerplate code.
  • Matplotlib predates Pandas by several years and was not designed for use with Pandas DataFrames. To visualize data from a Pandas DataFrame, you often have to extract each Series and combine them in the right format yourself. It is nicer to have a plotting library that can intelligently use the DataFrame labels in a plot.
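To make the last point concrete, here is a minimal sketch comparing the two styles on the same data. It uses the first five rows of the tips dataset, typed in by hand so nothing has to be downloaded, and assumes Seaborn 0.11 or later; the variable names are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# The first five rows of the tips dataset, typed in by hand
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
})

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(10, 4))

# Matplotlib: split the rows by group and label the axes and legend ourselves
for sex, group in df.groupby("sex"):
    ax_mpl.scatter(group["total_bill"], group["tip"], label=sex)
ax_mpl.set_xlabel("total_bill")
ax_mpl.set_ylabel("tip")
ax_mpl.legend(title="sex")

# Seaborn: pass the DataFrame and column names; grouping, colors,
# axis labels and the legend are derived from the DataFrame itself
sns.scatterplot(data=df, x="total_bill", y="tip", hue="sex", ax=ax_sns)
```

Both panels show the same points, but the Seaborn call needs no manual grouping or labeling: the axis labels and the legend come straight from the column names.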

How to install Seaborn  

Seaborn relies on several other libraries. The following are the four mandatory dependencies:

  • NumPy (version >= 1.9.3)
  • SciPy (version >= 0.14.0)
  • matplotlib (version >= 1.4.3)
  • Pandas (version >= 0.15.2)

These minimum versions come from older Seaborn releases. Recent releases require newer versions of NumPy, Pandas and matplotlib and treat SciPy as optional, so check the installation guide for the release you install.

Some optional dependencies:

  • Statsmodels, for advanced regression plots
  • Fastcluster, for clustering large matrices

Once the dependencies are in place, we are all set to install Seaborn and enjoy its plots. To install the latest release of Seaborn, you can use pip, which also installs any missing mandatory dependencies automatically:

pip install seaborn  

You can also use conda to install the latest version of Seaborn:

conda install seaborn

To import the dependencies and Seaborn itself in your code, you can use the following code:

import pandas as pd  
import numpy as np  
import matplotlib.pyplot as plt  
import seaborn as sns  
from scipy import stats 
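To confirm that everything imported correctly, and to compare the installed versions with the minimums listed earlier, you can print each library's __version__ attribute. This small sketch assumes SciPy is installed, as the import block above does.

```python
import matplotlib
import numpy as np
import pandas as pd
import scipy
import seaborn as sns

# Collect the installed version string of each library
versions = {
    "NumPy": np.__version__,
    "SciPy": scipy.__version__,
    "matplotlib": matplotlib.__version__,
    "Pandas": pd.__version__,
    "Seaborn": sns.__version__,
}

# Print them as an aligned two-column table
for name, version in versions.items():
    print(f"{name:<10} {version}")
```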

Data Visualization using Seaborn 

The following are the types of plots we can create with Seaborn:

  • Relational plots
  • Distribution plots
  • Categorical plots
  • Regression plots
  • Matrix plots
  • Pair plots & Joint plots

Relational plots  

Statistical analysis is the process of understanding how variables in a dataset relate to each other and how those relationships depend on other variables. Visualization can be a core component of this process because, when data are visualized properly, the human visual system can see trends and patterns that indicate a relationship. The function we will use most is relplot(). This is a figure-level function for visualizing statistical relationships using two common approaches: scatter plots and line plots. It draws them with the following axes-level functions:

  • scatterplot() ## Draw a scatter plot with the possibility of several semantic groupings. 
  • lineplot() ## Draw a line plot with the possibility of several semantic groupings. 

Distribution plots  

Whenever we are dealing with a dataset, we want to know how the data or the variables are being distributed. 

displot() is the figure-level function for visualizing distributions. The distribution of data can tell us a lot about its nature. There are two types: univariate distributions, which describe a single variable, and bivariate distributions, which describe two variables jointly.

Univariate Distributions

  • distplot() ## Flexibly plot a univariate distribution of observations. Deprecated since Seaborn 0.11; use displot() or histplot() instead.
  • ecdfplot() ## Plot empirical cumulative distribution functions.
  • rugplot() ## Plot marginal distributions by drawing ticks along the x and y axes.

Univariate or Bivariate Distributions

  • histplot() ## Plot univariate or bivariate histograms to show distributions of datasets.
  • kdeplot() ## Plot univariate or bivariate distributions using kernel density estimation.

Categorical plots  

In this section, we'll look at the relationship between two variables, one of which is divided into categories. catplot() is the figure-level function in Seaborn for drawing plots of categorical data; its kind parameter selects among the following axes-level plots:

  • barplot() ## Show point estimates and confidence intervals as rectangular bars. 
  • countplot() ## Show the counts of observations in each categorical bin using bars. 
  • boxplot() ## Draw a box plot to show distributions with respect to categories.
  • violinplot() ## Draw a combination of a box plot and a kernel density estimate.
  • pointplot() ## Show point estimates and confidence intervals using scatter plot glyphs.
  • swarmplot() ## Draw a categorical scatterplot with non-overlapping points.
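Because catplot() chooses the plot type through its kind parameter, one function covers the list above. A sketch with hypothetical bill amounts for four days (generated data, not the real tips dataset), assuming Seaborn 0.11 or later:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: 25 bill amounts for each of four days
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], 25),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=100),
})

# catplot selects the axes-level function through kind
g_box = sns.catplot(data=df, x="day", y="total_bill", kind="box")        # boxplot()
g_violin = sns.catplot(data=df, x="day", y="total_bill", kind="violin")  # violinplot()
g_count = sns.catplot(data=df, x="day", kind="count")                    # countplot()
```

The same figures can be drawn with the axes-level functions directly, for example sns.boxplot(data=df, x="day", y="total_bill").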

Regression Plots  

The regression plots in Seaborn are primarily intended to add a visual guide that helps to emphasize patterns in a dataset during exploratory data analysis. As the name suggests, regression plots fit a regression line between two variables and help visualize their linear relationship.

  • regplot() ## Plot data and a linear regression model fit.
  • residplot() ## Plot the residuals of linear regression.
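A minimal sketch of both functions on synthetic data generated from the line y = 2x + 1 plus noise (the data and coefficients are made up for illustration). regplot() overlays the fitted line and a 95% confidence band on the scatter plot, and residplot() shows what is left over after that fit.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data scattered around y = 2x + 1
rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 60)
df = pd.DataFrame({"x": x, "y": 2.0 * x + 1.0 + rng.normal(0, 1.5, 60)})

fig, (ax_fit, ax_resid) = plt.subplots(1, 2, figsize=(10, 4))
sns.regplot(data=df, x="x", y="y", ax=ax_fit)      # scatter + fitted line + confidence band
sns.residplot(data=df, x="x", y="y", ax=ax_resid)  # residuals of the same linear fit
```

If the residuals form a curve instead of a random scatter around zero, a straight line is probably the wrong model for the data.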

Matrix Plots  

A matrix plot displays rectangular (matrix) data as a color-coded grid: the rows and columns index the data, and the color of each cell encodes its value.

  • heatmap() ## Plot rectangular data as a color-encoded matrix. 
  • clustermap() ## Plot a matrix dataset as a hierarchically clustered heatmap. 
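A common use of heatmap() is to display a correlation matrix. The sketch below builds a hypothetical four-column DataFrame in which column b is deliberately correlated with column a, then annotates each cell with its correlation coefficient.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: four random columns, with b made to depend on a
rng = np.random.default_rng(3)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=["a", "b", "c", "d"])
df["b"] += df["a"]

# 4 x 4 matrix of pairwise Pearson correlations
corr = df.corr()

# Fix the color scale to [-1, 1] so that 0 sits in the middle of the colormap
ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
                 vmin=-1, vmax=1, square=True)
```

clustermap() accepts the same matrix and reorders its rows and columns by hierarchical clustering; it requires SciPy.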

Pair plots & Joint plots  

We can also plot multiple bivariate distributions in a dataset with Seaborn. A pair plot shows the relationship between each pair of numeric columns in the dataset and draws the univariate distribution of each variable on the diagonal axes.

  • pairplot() ## Plot pairwise relationships in a dataset. 
  • jointplot() ## Draw a plot of two variables with bivariate and univariate graphs.

Using all these plots, we can perform data analysis and draw meaningful conclusions. We can also quickly spot trends and outliers; when we can see something, we internalize it quickly.

Importing Datasets  

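Seaborn comes with a few built-in example datasets. They are not bundled with the installation: load_dataset() downloads each dataset from the seaborn-data repository on GitHub the first time you request it (so it needs an internet connection) and caches it locally. You can use any of these datasets for your learning. With the help of the following function, you can load the required dataset: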

import seaborn as sns

# Pass the name of the dataset as a string
df = sns.load_dataset("tips")

The full signature is load_dataset(name, cache=True, data_home=None, **kws), where:

name : str
    Name of the dataset (name.csv on https://github.com/mwaskom/seaborn-data).

cache : bool, optional
    If True, cache the data locally and use the cache on subsequent calls.

data_home : str, optional
    The directory in which to cache the data.

kws : keys and values, optional
    Additional keyword arguments passed to pandas.read_csv.

To view all the available datasets in the Seaborn library, you can use the get_dataset_names() function as shown below:

sns.get_dataset_names()

  Output:  

['anagrams', 'anscombe', 'attention', 'brain_networks', 'car_crashes', 'diamonds', 'dots', 'exercise', 'flights', 'fmri', 'gammas', 'geyser', 'iris', 'mpg', 'penguins', 'planets', 'tips', 'titanic']

This is the list of datasets that was available when this article was written. Because get_dataset_names() reads the list from the seaborn-data repository, newer runs may return additional names. We can use these datasets for practice.

Now we will learn how to import these datasets.  

Importing Data as Pandas DataFrame  

Now, we will import a dataset. load_dataset() returns a Pandas DataFrame, so any Pandas DataFrame method works on it.

import seaborn as sns  

df = sns.load_dataset('tips')
df.head()  

This imports the built-in tips dataset from Seaborn. The code above generates the following output:

Output:

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
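Since load_dataset() returns an ordinary DataFrame, any Pandas operation applies to it. This sketch rebuilds the five rows shown above by hand so it runs offline (the real call, sns.load_dataset("tips"), returns all 244 rows), then derives a tip percentage and the mean tip per sex. The tip_pct column name is invented for the example.

```python
import pandas as pd

# The five rows shown above, typed in by hand so the sketch needs no download
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "smoker": ["No"] * 5,
    "day": ["Sun"] * 5,
    "time": ["Dinner"] * 5,
    "size": [2, 3, 3, 2, 4],
})

# Ordinary pandas operations: a derived column and a grouped aggregate
df["tip_pct"] = df["tip"] / df["total_bill"] * 100  # tip as a percentage of the bill
mean_tip_by_sex = df.groupby("sex")["tip"].mean()
print(mean_tip_by_sex)
```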

In the next article, we will learn how to visualize all the seaborn plots. 
