The FacetGrid class is useful when you want to visualize the distribution of a variable or the relationship between multiple variables separately within subsets of your dataset. A FacetGrid can be drawn with up to three dimensions: row, col, and hue. The first two have obvious correspondence with the resulting array of axes.
It can also represent levels of a third variable with the hue parameter, which plots different subsets of data in different colors. This uses color to resolve elements on a third dimension, but it only draws the subsets on top of each other and will not tailor the hue parameter to the specific visualization in the way that axes-level functions that accept hue will.
This class maps a dataset into multiple axes arrayed in a grid of rows and columns that correspond to levels of variables in the dataset. The plots it produces are often called “lattice”, “trellis”, or “small-multiple” graphics.
The basic workflow is to initialize the FacetGrid object with the dataset and the variables that are used to structure the grid. Then one or more plotting functions can be applied to each subset by calling FacetGrid.map() or FacetGrid.map_dataframe().
Finally, the plot can be tweaked with other methods to do things like change the axis labels, use different ticks, or add a legend. See the detailed code examples below for more information. We will use the tips dataset for these examples.
import seaborn as sns
tips = sns.load_dataset("tips")
tips.head()
Output:
| total_bill | tip | sex | smoker | day | time | size |
0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
sns.FacetGrid(tips)
Output:
To draw a plot on every facet, pass a function and the name of one or more columns in the dataframe to FacetGrid.map():
g = sns.FacetGrid(tips, col="time", row="sex")
g.map(sns.scatterplot, "total_bill", "tip")
The variable specification in FacetGrid.map() requires a positional argument mapping, but if the function has a data parameter and accepts named variable assignments, you can also use FacetGrid.map_dataframe().
One difference between the two methods is that FacetGrid.map_dataframe() does not add axis labels.
g = sns.FacetGrid(tips, col="time", row="sex")
g.map_dataframe(sns.histplot, x="total_bill")
The FacetGrid constructor accepts a hue parameter. Setting this will condition the data on another variable and draw the subsets in different colors. Where possible, label information is tracked so that a single legend can be drawn:
g = sns.FacetGrid(tips, col="time", hue="sex")
g.map_dataframe(sns.scatterplot, x="total_bill", y="tip")
g.add_legend()  # draw the single shared legend for the hue levels
The size and shape of the plot are specified at the level of each subplot using the height and aspect parameters. For example, to change the height and aspect ratio of each facet:
g = sns.FacetGrid(tips, col="day", height=3.5, aspect=.65)
g.map(sns.histplot, "total_bill")
Note that the margin_titles option, which places the titles for the row variable in the right margin of the grid, isn’t formally supported by the matplotlib API, and may not work well in all cases. In particular, it currently can’t be used with a legend that lies outside of the plot.
The size of the figure is set by providing the height of each facet, along with the aspect ratio:
g = sns.FacetGrid(tips, col="day", height=4, aspect=.5)
g.map(sns.barplot, "sex", "total_bill", order=["Male", "Female"])
The default ordering of the facets is derived from the information in the DataFrame. If the variable used to define facets has a categorical type, then the order of the categories is used.
Otherwise, the facets will be in the order of appearance of the category levels. It is possible, however, to specify an ordering of any facet dimension with the appropriate *_order parameter:
ordered_sexes = tips.sex.value_counts().index
g = sns.FacetGrid(tips, row="sex", row_order=ordered_sexes, height=1.7, aspect=4)
g.map(sns.kdeplot, "total_bill")
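The three ordering rules described above (order of appearance, categorical order, explicit *_order) can be compared side by side. This is a minimal sketch on a hypothetical dataframe with made-up values; it only inspects the col_names attribute of each grid and does not draw any data.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import pandas as pd
import seaborn as sns

# Hypothetical data: "day" is a plain string column here
toy = pd.DataFrame({
    "total_bill": [10.0, 20.0, 15.0, 30.0, 12.0, 25.0],
    "day": ["Sun", "Thur", "Sat", "Sun", "Thur", "Sun"],
})

# 1. Plain strings: facets follow the order of appearance
by_appearance = sns.FacetGrid(toy, col="day").col_names

# 2. Categorical dtype: facets follow the category order
toy_cat = toy.assign(
    day=pd.Categorical(toy["day"], categories=["Thur", "Sat", "Sun"])
)
by_category = sns.FacetGrid(toy_cat, col="day").col_names

# 3. Explicit col_order overrides both
explicit = sns.FacetGrid(toy, col="day", col_order=["Sat", "Sun", "Thur"]).col_names

print(list(by_appearance))
print(list(by_category))
print(list(explicit))
```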
If you have many levels of one variable, you can plot it along the columns but “wrap” them so that they span multiple rows. When doing this, you cannot use a row variable.
attend = sns.load_dataset("attention").query("subject <= 12")
| Unnamed: 0 | subject | attention | solutions | score |
0 | 0 | 1 | divided | 1 | 2.0 |
1 | 1 | 2 | divided | 1 | 3.0 |
2 | 2 | 3 | divided | 1 | 3.0 |
3 | 3 | 4 | divided | 1 | 5.0 |
4 | 4 | 5 | divided | 1 | 4.0 |
g = sns.FacetGrid(attend, col="subject", col_wrap=4, height=2, ylim=(0, 10))
# ci=None turns off the confidence intervals (seaborn 0.12+ prefers errorbar=None)
g.map(sns.pointplot, "solutions", "score", order=[1, 2, 3], color=".3", ci=None)
Using custom functions
You’re not limited to existing matplotlib and seaborn functions when using FacetGrid. However, to work properly, any function you use must follow a few rules:
1. It must plot onto the “currently active” matplotlib Axes. This will be true of functions in the matplotlib.pyplot namespace, and you can call matplotlib.pyplot.gca() to get a reference to the current Axes if you want to work directly with its methods.
2. It must accept the data that it plots in positional arguments. Internally, FacetGrid will pass a series of data for each of the named positional arguments passed to FacetGrid.map().
3. It must be able to accept color and label keyword arguments, and, ideally, it will do something useful with them. In most cases, it’s easiest to catch a generic dictionary of **kwargs and pass it along to the underlying plotting function.
Let’s look at a minimal example of a function you can plot with. This function will just take a single vector of data for each facet:
from scipy import stats
import matplotlib.pyplot as plt

def quantile_plot(x, **kwargs):
    # Compare the sample against theoretical normal quantiles
    quantiles, xr = stats.probplot(x, fit=False)
    plt.scatter(xr, quantiles, **kwargs)

g = sns.FacetGrid(tips, col="sex", height=4)
g.map(quantile_plot, "total_bill")
Plotting pairwise data relationships
PairGrid also allows you to quickly draw a grid of small subplots using the same plot type to visualize data in each. In a PairGrid, each row and column is assigned to a different variable, so the resulting plot shows each pairwise relationship in the dataset. This style of plot is sometimes called a “scatterplot matrix”, as this is the most common way to show each relationship, but PairGrid is not limited to scatterplots.
It’s important to understand the differences between a FacetGrid and a PairGrid. In the former, each facet shows the same relationship conditioned on different levels of other variables. In the latter, each plot shows a different relationship (although the upper and lower triangles will have mirrored plots). Using PairGrid can give you a very quick, very high-level summary of interesting relationships in your dataset.
The basic usage of the class is very similar to FacetGrid. First, you initialize the grid, then you pass the plotting function to a map method and it will be called on each subplot. There is also a companion function, pairplot(), that trades off some flexibility for faster plotting.
We will use the iris dataset for this example.
iris = sns.load_dataset("iris")
| sepal_length | sepal_width | petal_length | petal_width | species |
0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
Now we draw a PairGrid for the iris dataset:
g = sns.PairGrid(iris)
g.map(sns.scatterplot)
By default every numeric column in the dataset is used, but you can focus on particular relationships if you want.
g = sns.PairGrid(iris, vars=["sepal_length", "sepal_width"], hue="species")
g.map(sns.scatterplot)
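The effect of the default variable selection and of vars can be checked by looking at the shape of the axes array. The sketch below uses a hypothetical iris-like dataframe with randomly generated values (three numeric columns plus a species label), so the numbers themselves carry no meaning.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical iris-like data: random values, three numeric columns
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "sepal_length": rng.normal(5.8, 0.8, 30),
    "sepal_width": rng.normal(3.0, 0.4, 30),
    "petal_length": rng.normal(3.7, 1.7, 30),
    "species": np.repeat(["setosa", "versicolor", "virginica"], 10),
})

# Default: every numeric column gets a row and a column ("species" is skipped)
g_all = sns.PairGrid(toy)
g_all.map(sns.scatterplot)

# vars restricts the grid to the listed columns
g_sub = sns.PairGrid(toy, vars=["sepal_length", "sepal_width"], hue="species")
g_sub.map(sns.scatterplot)

print(g_all.x_vars, g_all.axes.shape)
print(g_sub.x_vars, g_sub.axes.shape)
```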
The square grid with identity relationships on the diagonal is actually just a special case, and you can plot with different variables in the rows and columns.
g = sns.PairGrid(tips, y_vars=["tip"], x_vars=["total_bill", "size"], height=4)
g.map(sns.regplot, color=".3")
g.set(ylim=(-1, 11), yticks=[0, 5, 10])