
Summary of Seaborn Visualizations

Last updated: June 14th, 2019

rmotr



Far from being a visualization package built completely from scratch, seaborn has always aimed to offer simpler access to creating graphics on top of matplotlib. In fact, as its documentation states, seaborn "should be seen as a complement to matplotlib and not a replacement of it."

seaborn relies on matplotlib to actually draw the graphics and only implements a higher-level abstraction layer, which offers:

  • Mapping of visual elements directly from a full DataFrame used as input.
  • Automation of certain statistical graphics.
  • Themes and styles that are more visually attractive than matplotlib's (at least compared with its older versions).
  • Easy access to color palettes.
  • Easier creation of "multigraphic" (multi-panel) figures.
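The first point can be made concrete with a minimal sketch: with plain matplotlib each group is looped over and labeled by hand, while seaborn's `scatterplot` maps DataFrame columns to position and color in a single call. The small DataFrame and its column names (`x`, `y`, `group`) are hypothetical, and `scatterplot` assumes seaborn 0.9 or later.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: two numeric columns and one grouping column
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 3, 5, 7, 6],
    "group": ["a", "a", "b", "b", "c", "c"],
})

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(8, 3))

# matplotlib: one scatter call per group; labels and legend are handled by hand
for name, subset in df.groupby("group"):
    ax_mpl.scatter(subset["x"], subset["y"], label=name)
ax_mpl.set_xlabel("x")
ax_mpl.set_ylabel("y")
ax_mpl.legend(title="group")

# seaborn: column names are mapped to position and color in a single call,
# and the axis labels and legend are derived from the DataFrame
sns.scatterplot(data=df, x="x", y="y", hue="group", ax=ax_sns)
```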


Hands on!

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
In [2]:
iris = sns.load_dataset('iris')
In [3]:
plt.scatter(iris['sepal_length'], iris['sepal_width'])
Out[3]:
<matplotlib.collections.PathCollection at 0x7f80781b7c50>


Some graphics with seaborn

Since seaborn is a complement to matplotlib, the graphics it offers are precisely those that matplotlib lacks (or makes cumbersome to create). However, matplotlib continues to evolve: some of these graphics are already part of the library, and more are expected to be incorporated over time.

One-variable distribution

As we have seen, one of the most noticeable shortcomings of matplotlib is its limited support for showing the distribution of variables (although it does offer histograms and, more recently, violin plots).

Seaborn offers multiple functions that fill this gap for univariate distributions:

In [4]:
sns.distplot(iris['sepal_length'],
             hist=True,
             kde=True,
             rug=True)
Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f807611ca58>
In [5]:
sns.kdeplot(iris['sepal_length'])
Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f80760a1eb8>
In [6]:
sns.rugplot(iris['sepal_length'])
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f807602e400>


Two-variable distribution

Also for bivariate distributions:

In [7]:
sns.jointplot(x='sepal_length', y='sepal_width', data=iris,
              kind='scatter')
Out[7]:
<seaborn.axisgrid.JointGrid at 0x7f8075fedeb8>
In [8]:
sns.jointplot(x='sepal_length', y='sepal_width', data=iris,
              kind='hex')
Out[8]:
<seaborn.axisgrid.JointGrid at 0x7f8075ecd320>
In [9]:
sns.jointplot(x='sepal_length', y='sepal_width', data=iris,
              kind='kde')
Out[9]:
<seaborn.axisgrid.JointGrid at 0x7f8075dfdfd0>


Pairplots: relating two or more variables

These plots show the pairwise relationships between all the numeric variables of a dataset in a single visualization.

In [10]:
sns.pairplot(iris)
Out[10]:
<seaborn.axisgrid.PairGrid at 0x7f80b8052eb8>
In [11]:
sns.pairplot(iris, vars=['sepal_width', 'petal_length'])
Out[11]:
<seaborn.axisgrid.PairGrid at 0x7f8075418a20>
In [12]:
sns.pairplot(iris, diag_kind='kde')
Out[12]:
<seaborn.axisgrid.PairGrid at 0x7f80740d8940>


Regression Plots

Another weak point of matplotlib is that it does not directly include "statistics" that help to understand the content of a visualization and its underlying dataset (although they can always be calculated separately and added as an additional layer to any graphic).

Seaborn includes some of this functionality; for example, regplot fits a regression line and draws it together with a confidence interval.

In [13]:
sns.regplot(x='sepal_length', y='sepal_width', data=iris)
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f80739237b8>
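By default, `regplot` fits an ordinary least-squares line and shades a bootstrapped 95% confidence interval around it. The line itself can be reproduced with NumPy; a minimal sketch (the helper name `ols_line` and the sample values are hypothetical, and the confidence band is not reproduced):

```python
import numpy as np

def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones for the intercept, then x
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

# Points lying exactly on y = 1 + 2x
intercept, slope = ols_line([1, 2, 3, 4], [3, 5, 7, 9])
print(round(intercept, 6), round(slope, 6))  # 1.0 2.0
```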

That doesn't look too informative: across the whole dataset, sepal length and sepal width are only weakly correlated. Regressions are used to analyze the relationship between variables; if there's a strong correlation, the points will line up along a diagonal.
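The strength of a linear relationship can also be quantified with Pearson's correlation coefficient r, which ranges from -1 to 1; values near ±1 mean the points lie close to a straight line. A minimal pandas sketch with hypothetical columns; on the iris data the same call would be, for example, `iris['sepal_length'].corr(iris['sepal_width'])`.

```python
import pandas as pd

# Hypothetical data: 'b' is an exact linear function of 'a', 'c' is not
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [2.0, 1.0, 4.0, 3.0, 5.0],
})

r_ab = df["a"].corr(df["b"])  # Series.corr uses Pearson's r by default
r_ac = df["a"].corr(df["c"])
print(round(r_ab, 3), round(r_ac, 3))  # 1.0 0.8
```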

Let's create a new column that has a clearer relationship with sepal length:

In [14]:
iris['Sepal Area'] = iris['sepal_length'] * iris['sepal_width']
In [15]:
sns.regplot(x='sepal_length', y='Sepal Area', data=iris)
Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f807368a828>

There's also lmplot, which, among other options, lets you fit "polynomial regressions" through its order parameter (this will make more sense when we start talking about Machine Learning):

In [16]:
sns.lmplot(data=iris, x='sepal_length', y='Sepal Area',
           order=3)
Out[16]:
<seaborn.axisgrid.FacetGrid at 0x7f8075c9ac50>
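According to the seaborn documentation, when `order` is greater than 1 the fit is estimated with `numpy.polyfit`, so `order=3` fits a cubic polynomial. A minimal sketch of that fit on hypothetical, noise-free data generated from a known cubic:

```python
import numpy as np

# Hypothetical data from y = 0.5x^3 - 2x^2 + x + 3 (no noise)
x = np.linspace(-3, 3, 20)
y = 0.5 * x**3 - 2 * x**2 + x + 3

# Fit a degree-3 polynomial; coefficients come back highest power first
coefs = np.polyfit(x, y, deg=3)

# Evaluate the fitted polynomial on a fine grid, as a regression curve would be drawn
grid = np.linspace(x.min(), x.max(), 100)
curve = np.polyval(coefs, grid)
```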
In [17]:
sns.lmplot(data=iris, x='sepal_length', y='Sepal Area',
           hue='species',
           markers=["+", "o", "^"])
Out[17]:
<seaborn.axisgrid.FacetGrid at 0x7f80735df438>
In [18]:
del iris['Sepal Area']
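Adding a column and deleting it afterwards mutates `iris` in place. An alternative sketch uses pandas' `DataFrame.assign`, which returns a new DataFrame with the derived column, so the original never changes and no `del` is needed; the small DataFrame here is a hypothetical stand-in for `iris`.

```python
import pandas as pd

# Hypothetical stand-in for the two iris columns used above
iris_small = pd.DataFrame({"sepal_length": [5.1, 4.9, 6.3],
                           "sepal_width": [3.5, 3.0, 3.3]})

# assign() returns a new DataFrame; iris_small itself is left untouched.
# The ** form is needed because the column name contains a space.
with_area = iris_small.assign(
    **{"Sepal Area": iris_small["sepal_length"] * iris_small["sepal_width"]})

print(list(with_area.columns))   # ['sepal_length', 'sepal_width', 'Sepal Area']
print(list(iris_small.columns))  # ['sepal_length', 'sepal_width']
```

The result can then be passed directly to a plotting call, e.g. `sns.regplot(x='sepal_length', y='Sepal Area', data=with_area)`.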


Charts based on categorical variables

We have seen that matplotlib treats ALL the variables included in a chart as numeric (recent versions accept lists of strings, but simply place each category at an integer position). To build visualizations based on categorical variables, you need to encode them as numbers and "trick" the axes into giving the impression of working with categories.

Seaborn accepts categorical variables directly and treats them correctly as categories.
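For comparison, a minimal sketch of the matplotlib "trick" described above: categories are mapped to integer positions by hand, and the numeric tick labels are then replaced with the category names. The DataFrame is a hypothetical stand-in for the iris columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data with one categorical and one numeric column
df = pd.DataFrame({
    "species": ["setosa", "versicolor", "virginica", "setosa", "virginica"],
    "sepal_length": [5.1, 6.4, 6.3, 4.9, 7.1],
})

# Map each category to an integer x position
categories = sorted(df["species"].unique())
positions = {name: i for i, name in enumerate(categories)}

fig, ax = plt.subplots()
ax.scatter(df["species"].map(positions), df["sepal_length"])

# Relabel the numeric ticks so the axis reads as categorical
ax.set_xticks(range(len(categories)))
ax.set_xticklabels(categories)
```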

In [19]:
sns.stripplot(data=iris, x='species', y='sepal_length')
Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f80735efdd8>
In [20]:
sns.stripplot(data=iris, x='sepal_length', y='species',
              jitter=False)
Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8073522d30>
In [21]:
sns.stripplot(data=iris, x='sepal_length', y='sepal_width',
              hue='species')
Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8073493b00>
In [22]:
sns.swarmplot(data=iris, x='species', y='sepal_length')
Out[22]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8073383eb8>
In [23]:
sns.boxplot(data=iris, x='species', y='sepal_length')
Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8073365f98>
In [24]:
sns.violinplot(data=iris, x='species', y='sepal_length')
Out[24]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f80732ea7f0>
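Each box in a box plot summarizes the quartiles of one group: the box spans the first to the third quartile (the interquartile range, IQR), the line inside marks the median, and by default the whiskers extend up to 1.5 × IQR. Those numbers can be checked directly with pandas; a minimal sketch with hypothetical values (on the iris data it would be `iris.groupby('species')['sepal_length'].quantile([0.25, 0.5, 0.75])`):

```python
import pandas as pd

# Hypothetical data: two groups with easy-to-check values
df = pd.DataFrame({
    "species": ["a"] * 5 + ["b"] * 5,
    "sepal_length": [1, 2, 3, 4, 5, 10, 20, 30, 40, 50],
})

# First quartile, median and third quartile per group, one row per species
summary = (df.groupby("species")["sepal_length"]
             .quantile([0.25, 0.5, 0.75])
             .unstack())
summary["IQR"] = summary[0.75] - summary[0.25]
print(summary)
```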


Multigraphic figures with seaborn

Although matplotlib allows the creation of multigraphic figures, we have also discussed its limitations: the panels are based solely on positioning, and it is up to the user to adjust the scales, draw each graphic in its corresponding position, and so on. Seaborn's FacetGrid takes care of this by splitting the data into one panel per level of the chosen column(s).
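To see what that responsibility looks like, here is a minimal sketch of a per-species scatter grid built by hand with matplotlib: one subplot per group, shared axes so the scales match, and a title per panel. The DataFrame is a hypothetical stand-in for the iris columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the iris columns
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "species": np.repeat(["setosa", "versicolor", "virginica"], 10),
    "sepal_length": rng.normal(6, 0.5, 30),
    "sepal_width": rng.normal(3, 0.3, 30),
})

groups = list(df.groupby("species"))
# sharex/sharey keep the scales comparable across panels
fig, axes = plt.subplots(1, len(groups), sharex=True, sharey=True, figsize=(9, 3))
for ax, (name, subset) in zip(axes, groups):
    # Each panel is drawn and titled manually
    ax.scatter(subset["sepal_length"], subset["sepal_width"])
    ax.set_title(f"species = {name}")
    ax.set_xlabel("sepal_length")
axes[0].set_ylabel("sepal_width")
```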

In [25]:
grid = sns.FacetGrid(iris, col='species', hue='species')

grid.map(plt.scatter,'sepal_length', 'sepal_width')
Out[25]:
<seaborn.axisgrid.FacetGrid at 0x7f80732dcef0>

A legend can also be added to the grid with add_legend():

In [26]:
grid = sns.FacetGrid(iris, col='species', hue='species')

grid.map(plt.scatter,'sepal_length', 'sepal_width')

grid.add_legend()
Out[26]:
<seaborn.axisgrid.FacetGrid at 0x7f80732cc940>
In [27]:
diamonds = pd.read_csv('data/diamonds.txt')

grid = sns.FacetGrid(diamonds, col='cut', row='color')

grid.map(plt.hist, 'price')
Out[27]:
<seaborn.axisgrid.FacetGrid at 0x7f80730f2898>


Styles in seaborn

When seaborn was created, matplotlib's style management was very limited and essentially "forced" styles to be defined up front in matplotlibrc files, which was costly, uncomfortable and inflexible.

Matplotlib has since improved a lot in this respect (for example, with its style sheets), so the "advantages" offered by seaborn here are less clear.
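Besides `set_style`, which changes the style globally (as in the cells below), `sns.axes_style` can also be used as a context manager to apply a style temporarily; seaborn restores the previous matplotlib settings when the block ends. A minimal sketch:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import seaborn as sns

before = plt.rcParams["axes.grid"]

# The "darkgrid" style turns the background grid on, but only inside this block
with sns.axes_style("darkgrid"):
    inside = plt.rcParams["axes.grid"]
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [1, 4, 9])

# Outside the block, the original rcParams value is back
after = plt.rcParams["axes.grid"]
print(before, inside, after)
```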

In [28]:
sns.axes_style()
Out[28]:
{'axes.facecolor': 'w',
 'axes.edgecolor': 'k',
 'axes.grid': False,
 'axes.axisbelow': 'line',
 'axes.labelcolor': 'k',
 'figure.facecolor': (1, 1, 1, 0),
 'grid.color': '#b0b0b0',
 'grid.linestyle': '-',
 'text.color': 'k',
 'xtick.color': 'k',
 'ytick.color': 'k',
 'xtick.direction': 'out',
 'ytick.direction': 'out',
 'lines.solid_capstyle': 'projecting',
 'patch.edgecolor': 'k',
 'image.cmap': 'viridis',
 'font.family': ['sans-serif'],
 'font.sans-serif': ['DejaVu Sans',
  'Bitstream Vera Sans',
  'Computer Modern Sans Serif',
  'Lucida Grande',
  'Verdana',
  'Geneva',
  'Lucid',
  'Arial',
  'Helvetica',
  'Avant Garde',
  'sans-serif'],
 'patch.force_edgecolor': False,
 'xtick.bottom': True,
 'xtick.top': False,
 'ytick.left': True,
 'ytick.right': False,
 'axes.spines.left': True,
 'axes.spines.bottom': True,
 'axes.spines.right': True,
 'axes.spines.top': True}
In [29]:
sns.lmplot(data=iris, x='sepal_length', y='sepal_width')
Out[29]:
<seaborn.axisgrid.FacetGrid at 0x7f80730abe80>
In [30]:
sns.set_style('white')

sns.lmplot(data=iris, x='sepal_length', y='sepal_width')
Out[30]:
<seaborn.axisgrid.FacetGrid at 0x7f8071f8a240>
In [31]:
sns.set_style('darkgrid', {
    "grid.color": "#cccccc",
    "grid.linestyle": "--",
    "figure.facecolor": "#eeeeee"})

sns.lmplot(data=iris, x='sepal_length', y='sepal_width')
Out[31]:
<seaborn.axisgrid.FacetGrid at 0x7f807148a7b8>

The top and right borders (spines) can be removed using the despine() function.

In [32]:
sns.set_style("white")

sns.distplot(iris['sepal_length'])

sns.despine()
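`despine` works on matplotlib's spine objects: by default it hides the top and right spines (other sides can be targeted with `left=True` or `bottom=True`), and `ax=` restricts it to one Axes. A minimal sketch verifying the effect on a plain matplotlib Axes:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Remove the top and right borders of this specific Axes
sns.despine(ax=ax)

visible = {side: ax.spines[side].get_visible()
           for side in ["top", "right", "left", "bottom"]}
print(visible)  # {'top': False, 'right': False, 'left': True, 'bottom': True}
```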


Adding titles and labels

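You can also modify the title and axis labels of seaborn plots. Axes-level functions such as distplot return a matplotlib Axes, so its set() method can be used: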

In [33]:
sns.distplot(iris['sepal_length'])
Out[33]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8071bc1a90>
In [34]:
plot = sns.distplot(iris['sepal_length'])

plot.set(title='Sepal length',
         xlabel='Value',
         ylabel='Frequency')
Out[34]:
[Text(0,0.5,'Frequency'), Text(0.5,0,'Value'), Text(0.5,1,'Sepal length')]
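`plot.set(...)` works here because `distplot` returns a matplotlib Axes. Figure-level functions such as `lmplot` return a `FacetGrid` instead (see the `Out[]` lines in the regression section), which has its own helpers for axis labels and panel titles. A minimal sketch with hypothetical data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data with two groups
rng = np.random.default_rng(5)
df = pd.DataFrame({
    "x": rng.normal(size=40),
    "y": rng.normal(size=40),
    "group": np.repeat(["a", "b"], 20),
})

g = sns.lmplot(data=df, x="x", y="y", col="group")
# Axis labels for the whole grid (x on the bottom row, y on the left column)
g.set_axis_labels("Value", "Frequency")
# One title per panel, built from the column level only
g.set_titles(col_template="{col_name}")
```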

