# zava

zava rocks!

Massive and high-dimensional numerical (or continuous) data may be visualized using parallel coordinates. For a technical discussion of parallel coordinates, see [Weg90]. In parallel coordinates, the axes are drawn parallel to one another (as opposed to orthogonal to one another). A vector (or row) of data, $$(x_1, x_2, \ldots, x_n)$$, is plotted by drawing $$x_1$$ on axis 1, $$x_2$$ on axis 2, and so on through $$x_n$$ on axis $$n$$. The plotted points are joined by a broken line. Visualizing massive and high-dimensional data with parallel coordinates is often a first step in exploratory data analysis (EDA), where one may wish to visually identify patterns, clusters, or outliers. For EDA, a generalized rotation of the coordinate axes in high-dimensional space, referred to as the Grand Tour [EJW02], may be used in combination with hue and saturation brushing techniques [EJW96].
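The following is a minimal, library-independent sketch of that construction in plain Python. It is not zava's implementation, and the helper names `polyline_vertices` and `polyline_segments` are hypothetical. It maps each value $$x_i$$ of a row to the point $$(i, x_i)$$ and then joins consecutive points into the segments of the broken line.

```python
def polyline_vertices(row):
    """Return the (axis, value) vertices of one data row.

    In parallel coordinates x_1 is drawn on axis 1, x_2 on axis 2, and so on,
    so the i-th value becomes the point (i, x_i).
    """
    return [(axis, value) for axis, value in enumerate(row, start=1)]


def polyline_segments(row):
    """Return the line segments that join consecutive vertices of one row."""
    points = polyline_vertices(row)
    return list(zip(points[:-1], points[1:]))


row = (2, 2, 1, 1)
print(polyline_vertices(row))       # [(1, 2), (2, 2), (3, 1), (4, 1)]
print(len(polyline_segments(row)))  # 3
```

A row with $$n$$ values always yields $$n - 1$$ segments. Drawing these segments for every row, each on the same set of vertical axes, produces the parallel coordinates plot.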

## Quickstart

### Installation

To install zava from PyPI, use pip.

```bash
pip install zava
```


### Basic Usage

#### Data

Everything in zava starts with data. Your data should be either a 2-dimensional NumPy array (ndarray) or a pandas DataFrame. If you use a pandas DataFrame, the axes will be labeled according to the DataFrame's column names; otherwise, you get generic axis names.

```python
import numpy as np
import pandas as pd

# you can use this numpy array
M = np.array([
    [1, 1, 1, 1],
    [2, 2, 2, 1],
    [3, 3, 3, 3],
    [1, 2, 3, 4],
    [2, 2, 1, 1],
    [1, 1, 3, 3]
])

# or you can convert the array to a pandas dataframe
columns = ['v0', 'v1', 'v2', 'v3']
M = pd.DataFrame(M, columns=columns)
```

#### Grand Tour

You can then create a GrandTour instance by passing in the data. Note the parameters c and d, which control the scaling of your data: c is the minimum and d is the maximum of the target range. Since the variables in your data may be on different scales, this normalization is required to bring all of them into the same range for plotting with parallel coordinates.

```python
from zava.core import GrandTour

c = 0.0
d = 100.0
grand_tour = GrandTour(M, c, d)
```
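This page does not spell out the exact formula GrandTour uses for c and d. A common way to bring every variable into the range [c, d] is column-wise min-max scaling, $$c + \frac{x - \min}{\max - \min}(d - c)$$. The sketch below assumes that approach so you can see what the target range means for the running example. The `scale_columns` helper is hypothetical and not part of zava.

```python
import numpy as np


def scale_columns(M, c=0.0, d=100.0):
    """Linearly rescale every column of M into the range [c, d].

    Hypothetical helper (not zava's code): min-max scaling is assumed to be
    what GrandTour's c and d parameters are for.
    """
    X = np.asarray(M, dtype=float)
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    # constant columns would divide by zero; map them to c instead
    span = np.where(hi > lo, hi - lo, 1.0)
    return c + (X - lo) / span * (d - c)


X_demo = np.array([
    [1, 1, 1, 1],
    [2, 2, 2, 1],
    [3, 3, 3, 3],
    [1, 2, 3, 4],
    [2, 2, 1, 1],
    [1, 1, 3, 3]
])
scaled = scale_columns(X_demo, c=0.0, d=100.0)
print(scaled[:, 0])  # column v0: values 1, 2, 3 map to 0, 50, 100
```

After scaling, every column spans exactly [c, d], so no single variable dominates the vertical extent of the plot.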

#### Rotations

With the GrandTour instance, you can invoke the rotate() method to get the rotated data. If your data is huge, you most likely do NOT want to perform the operation shown below, because it stores 361 matrices in memory (one for each degree from 0 through 360). It is shown here only to illustrate how to get the rotated data.

```python
R = [grand_tour.rotate(degree) for degree in range(361)]
```
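If you do need many rotations, a generator keeps only one rotated copy in memory at a time. The sketch below is hypothetical. `iter_rotations` is not part of zava. `givens_rotate` rotates the data in the plane spanned by two axes, which is one common building block of generalized rotations. It only stands in for `grand_tour.rotate`, whose internals are not described here.

```python
import math


def givens_rotate(X, degree, i=0, j=1):
    """Rotate every row of X by `degree` in the plane of axes i and j.

    Stand-in for a real Grand Tour rotation; not zava's rotate().
    """
    theta = math.radians(degree)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = []
    for row in X:
        r = list(row)
        xi, xj = r[i], r[j]
        r[i] = cos_t * xi - sin_t * xj
        r[j] = sin_t * xi + cos_t * xj
        out.append(r)
    return out


def iter_rotations(rotate, degrees):
    """Lazily yield (degree, rotated data) pairs instead of building a list."""
    for degree in degrees:
        yield degree, rotate(degree)


rows = [[1, 1, 1, 1], [2, 2, 2, 1], [3, 3, 3, 3]]
for degree, rotated in iter_rotations(lambda deg: givens_rotate(rows, deg), range(0, 361, 90)):
    # draw or save `rotated` here; it is discarded before the next one is made
    print(degree, [round(v, 6) for v in rotated[0]])
```

With zava, the equivalent loop would presumably be `for degree, S in iter_rotations(grand_tour.rotate, range(361)): ...`. Each `S` is drawn or saved before the next one is computed.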

#### Visualization

Most likely, you will want to rotate your data and visualize one transformation at a time. Below is a simple example of what you can do with matplotlib, visualizing just one rotation.

```python
import numpy as np
import matplotlib.pyplot as plt

# rotate the data by 1 degree
S = grand_tour.rotate(1)

# start setting up the plot with matplotlib
fig, ax = plt.subplots(figsize=(15, 3))

# note that S is a pandas dataframe;
# we can use S to make line plots that mimic parallel coordinates
params = {
    'kind': 'line',
    'ax': ax,
    'color': 'r',
    'marker': 'h',
    'markeredgewidth': 1,
    'markersize': 5,
    'linewidth': 0.8
}
_ = S.plot(**params)

# some additional plotting configurations/manipulations
_ = ax.get_legend().remove()
_ = ax.set_xticks(np.arange(len(S.index)))  # one tick per parallel axis
_ = ax.set_xticklabels(S.index)
_ = ax.get_yaxis().set_ticks([])
_ = ax.set_title('Grand Tour')
```
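A plain line plot looks like parallel coordinates because `DataFrame.plot(kind='line')` draws one line per column against the index. The Jupyter example later on this page labels the x ticks with `S.index`, so the rotated frame appears to keep the variables in its index and one column per data row. The hypothetical helper below (plain pandas, not zava) builds the same layout from unrotated data by transposing it.

```python
import pandas as pd


def to_parallel_frame(df):
    """Reshape data so DataFrame.plot(kind='line') draws parallel coordinates.

    After the transpose, the variables (v0, v1, ...) form the index, i.e. the
    x positions, and every original data row becomes one column, i.e. one line.
    """
    P = df.T
    P.columns = [f'row{k}' for k in range(P.shape[1])]
    return P


df = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 1], [3, 3, 3, 3]],
    columns=['v0', 'v1', 'v2', 'v3'],
)
P = to_parallel_frame(df)
print(P)
```

With matplotlib installed, `P.plot(kind='line', legend=False)` then draws the three rows as broken lines across the four axes. The `row0`, `row1`, ... column names are arbitrary.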

Later, we will look at how to use zava in a Jupyter notebook.

#### Animations

Below is a full example of how to use zava to create and save an animation. For this example to work, ffmpeg must be installed and on your PATH, since matplotlib relies on ffmpeg to create the video.
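Before running the full example, you can check whether ffmpeg is reachable. By default, matplotlib looks for the executable named in `rcParams['animation.ffmpeg_path']`, which is `'ffmpeg'`. A PATH lookup with the standard library is therefore a reasonable first check. The following is a sketch; `ffmpeg_path` is a hypothetical helper name.

```python
import shutil


def ffmpeg_path():
    """Return the full path of the ffmpeg executable on PATH, or None."""
    return shutil.which('ffmpeg')


path = ffmpeg_path()
if path is None:
    print('ffmpeg not found on PATH; saving a video with matplotlib will fail')
else:
    print(f'ffmpeg found at {path}')
```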

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import animation
from zava.core import GrandTour
from zava.plot import SinglePlotter, MultiPlotter

# 1. create or get your data
columns = ['v0', 'v1', 'v2', 'v3']

M1 = np.array([
    [1, 1, 1, 1],
    [2, 2, 2, 1],
    [3, 3, 3, 3]
])
M2 = np.array([
    [1, 2, 3, 4],
    [2, 2, 1, 1],
    [1, 1, 3, 3]
])

M1 = pd.DataFrame(M1, columns=columns)
M2 = pd.DataFrame(M2, columns=columns)

# 2. create your GrandTour instances
c = 0.0
d = 100.0

gt1 = GrandTour(M1, c, d)
gt2 = GrandTour(M2, c, d)

# 3. create your plotters for each GrandTour instance
# Note how the first dataset will have red lines
# and the second dataset will have green lines.
# The parameters passed in go into drawing the lines
# and will help create powerful parallel coordinate with
# grand tour visuals with hue and saturation effects.
sp1 = SinglePlotter(gt1, params={'color': 'r'})
sp2 = SinglePlotter(gt2, params={'color': 'g'})

# 4. setup plotting and animation
# don't forget to disable warnings and set the style
plt.rcParams.update({'figure.max_open_warning': 0})
plt.style.use('ggplot')

# Note how we use MultiPlotter to plot both datasets?
fig, ax = plt.subplots(figsize=(15, 3))
mp = MultiPlotter([sp1, sp2], ax=ax)

# matplotlib.animation will help us create animations
params = {
    'fig': fig,
    'func': mp,
    'frames': np.linspace(0, 360, 360),
    'interval': 20,
    'init_func': mp.init
}
anim = animation.FuncAnimation(**params)
plt.close(fig)

# 5. save the animation
params = {
    'filename': 'test.mov',
    'dpi': 500,
    'progress_callback': lambda i, n: print(f'Saving frame {i} of {n}'),
    'metadata': {
        'title': 'Parallel Coordinates with Grand Tour',
        'artist': 'Clint Eastwood',
        'genre': 'Action',
        'subject': 'Exploratory data visualization',
        'copyright': '2020',
        'comment': 'One-Off Coder'
    }
}
anim.save(**params)
```

Your animation video will look like the following. If you have great tips on how to customize animations with matplotlib, please let us know!

There are many known issues with ffmpeg and matplotlib. As an alternative, you could try saving the visualization as an animated GIF.
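Besides zava's own `save_gif()`, matplotlib ships `animation.PillowWriter`, which writes GIFs through Pillow (a required matplotlib dependency in recent versions) without ffmpeg. The sketch below saves a toy line animation to a temporary file; the toy animation stands in for a MultiPlotter. With the `anim` object from the full example, the analogous call would be `anim.save('test.gif', writer=animation.PillowWriter(fps=50))`, where 50 fps matches the 20 ms interval. The `save_toy_gif` function is a hypothetical helper.

```python
import math
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # headless backend for running as a script; omit in Jupyter
import matplotlib.pyplot as plt
from matplotlib import animation


def save_toy_gif(path, n_frames=12, fps=12):
    """Save a small toy line animation as a GIF with matplotlib's PillowWriter."""
    fig, ax = plt.subplots(figsize=(3, 2))
    xs = [0, 1, 2, 3]
    (line,) = ax.plot(xs, [0.0] * len(xs), color='r', marker='h')
    ax.set_ylim(-1.1, 1.1)

    def update(frame):
        # shift a sine pattern across the four "axes" on every frame
        phase = 2 * math.pi * frame / n_frames
        line.set_ydata([math.sin(x + phase) for x in xs])
        return (line,)

    anim = animation.FuncAnimation(fig, update, frames=n_frames, interval=1000 / fps)
    anim.save(path, writer=animation.PillowWriter(fps=fps))
    plt.close(fig)
    return path


out = save_toy_gif(os.path.join(tempfile.mkdtemp(), 'demo.gif'))
print(out, os.path.getsize(out), 'bytes')
```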

```python
# set up your MultiPlotter as before
plt.rcParams.update({'figure.max_open_warning': 0})
plt.style.use('ggplot')

# note how we do not pass in an axis?
mp = MultiPlotter([sp1, sp2], ax=None)

# save
# you have to play around with the duration parameter to get smoothness
mp.save_gif('test.gif', duration=0.0001, start=0, stop=180)
```

#### Considerations

It might not be a good idea to plot ALL your data, due to computation and memory limitations; you might want to sample your data and plot that subset instead. Even with the small, made-up data in this running example, creating a whole animation was computationally intensive (the laptop fans started to crank up).
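The following is a minimal sketch of that sampling step using pandas' `DataFrame.sample`. It fixes the seed so the subset is reproducible. The `sample_rows` helper and the synthetic 100,000-row frame are illustrative only. The returned subset can be passed to `GrandTour` in place of the full data.

```python
import numpy as np
import pandas as pd


def sample_rows(df, n, seed=0):
    """Return a random subset of n rows (all rows if df has n or fewer)."""
    if len(df) <= n:
        return df
    # sampling is without replacement by default, so no row appears twice
    return df.sample(n=n, random_state=seed)


rng = np.random.default_rng(0)
big = pd.DataFrame(rng.normal(size=(100_000, 4)), columns=['v0', 'v1', 'v2', 'v3'])
small = sample_rows(big, 1_000)
print(small.shape)  # (1000, 4)
```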

## Jupyter

To get these examples to work in Jupyter, you will need to install the following.

### Widgets

Let’s see how we can use zava with ipywidgets. First, we need some data.

```python
import numpy as np
import pandas as pd

M = np.array([
    [1, 1, 1, 1],
    [2, 2, 2, 1],
    [3, 3, 3, 3],
    [1, 2, 3, 4],
    [2, 2, 1, 1],
    [1, 1, 3, 3]
])
```


Now we create an instance of GrandTour with the data, also specifying the minimum c and maximum d values for scaling.

```python
from zava.core import GrandTour

c = 0
d = 1
grand_tour = GrandTour(M, c, d)
```


Finally, we use a function f decorated with @interact to create an interactive visualization with parallel coordinates and the Grand Tour.

```python
import matplotlib.pyplot as plt
from ipywidgets import interact

@interact(degree=(0, 360 * 4, 0.5))
def f(degree=0):
    S = grand_tour.rotate(degree)

    fig, ax = plt.subplots(figsize=(15, 3))

    params = {
        'kind': 'line',
        'ax': ax,
        'color': 'r',
        'marker': 'h',
        'markeredgewidth': 1,
        'markersize': 5,
        'linewidth': 0.8
    }

    _ = S.plot(**params)
    _ = ax.get_legend().remove()
    _ = ax.set_xticks(np.arange(len(S.index)))
    _ = ax.set_xticklabels(S.index)
    _ = ax.get_yaxis().set_ticks([])
    _ = ax.set_title('Grand Tour')
```


### Animations

Now let’s see how we can create HTML5 animations in a notebook using matplotlib.animation. Again, start with some data.

```python
import numpy as np
import pandas as pd

M = np.array([
    [1, 1, 1, 1],
    [2, 2, 2, 1],
    [3, 3, 3, 3],
    [1, 2, 3, 4],
    [2, 2, 1, 1],
    [1, 1, 3, 3]
])
```


Create a GrandTour instance with the data.

```python
from zava.core import GrandTour

c = 0
d = 1
grand_tour = GrandTour(M, c, d)
```


We have to wrap the GrandTour instance in a SinglePlotter. A SinglePlotter plots only a single set of data on an axis and does not concern itself with the overall plot (e.g., the title). The params argument is a dictionary that you can override to change how the lines are drawn.

```python
from zava.plot import SinglePlotter

single_plotter = SinglePlotter(grand_tour, params={'color': 'r'})
```


The MultiPlotter controls all the plots and takes a list of SinglePlotter instances as well as an axis. You can then pass an instance of this object to animation.FuncAnimation() as usual to produce an animation.
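`FuncAnimation` calls `init_func` (if given) to draw a clear frame, and then calls `func(frame)` for each value in `frames`. A single callable object can therefore fan every frame out to several datasets. The classes below are hypothetical toy stand-ins that illustrate this composite pattern. They are not zava's SinglePlotter and MultiPlotter, whose internals are not documented here.

```python
class ToySinglePlotter:
    """Hypothetical stand-in for a per-dataset plotter (not zava's class)."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def init(self):
        self.calls.append('init')

    def update(self, degree):
        # a real plotter would rotate its data and redraw its lines here
        self.calls.append(degree)


class ToyMultiPlotter:
    """Hypothetical composite: one callable that drives several plotters."""

    def __init__(self, plotters):
        self.plotters = plotters

    def init(self):
        # used as init_func: let every child draw its initial state
        for p in self.plotters:
            p.init()
        return []

    def __call__(self, degree):
        # used as func: forward each frame value to every child
        for p in self.plotters:
            p.update(degree)
        return []


toy_red, toy_green = ToySinglePlotter('red'), ToySinglePlotter('green')
toy_multi = ToyMultiPlotter([toy_red, toy_green])
toy_multi.init()
for degree in (0.0, 1.0, 2.0):  # FuncAnimation would supply these from `frames`
    toy_multi(degree)
print(toy_red.calls)  # ['init', 0.0, 1.0, 2.0]
```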

```python
import matplotlib.pyplot as plt
from matplotlib import animation
from zava.plot import MultiPlotter

fig, ax = plt.subplots(figsize=(5, 3))

multi_plotter = MultiPlotter([single_plotter], ax=ax)

params = {
    'fig': fig,
    'func': multi_plotter,
    'frames': np.linspace(0, 360, 360),
    'interval': 20,
    'init_func': multi_plotter.init
}
anim = animation.FuncAnimation(**params)

plt.close(fig)
```
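Consider the `frames` and `interval` values above. `np.linspace(0, 360, 360)` includes both endpoints, so consecutive frames are 360/359 (about 1.0028) degrees apart rather than exactly 1 degree. At 20 ms per frame, one pass over the 360 frames lasts 7.2 seconds. If you prefer whole-degree steps, `endpoint=False` (or `np.arange(360)`) gives them. Either choice works; it is a matter of taste rather than a requirement of zava.

```python
import numpy as np

frames = np.linspace(0, 360, 360)              # as in the example above
step = frames[1] - frames[0]                   # 360 / 359 degrees per frame
print(len(frames), round(step, 4))             # 360 1.0028

exact = np.linspace(0, 360, 360, endpoint=False)  # 0, 1, ..., 359
print(exact[1] - exact[0], exact[-1])          # 1.0 359.0

interval_ms = 20                               # delay between frames
print(len(frames) * interval_ms / 1000, 's per loop')  # 7.2 s per loop
```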


Finally, render the video.

```python
%%time

from IPython.display import HTML

HTML(anim.to_html5_video())
```

```
CPU times: user 21.4 s, sys: 574 ms, total: 22 s
Wall time: 22.1 s
```
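`to_html5_video()` encodes the frames with the writer named in `rcParams['animation.writer']` (ffmpeg by default), so it has the same ffmpeg requirement as saving a video file. matplotlib's `FuncAnimation` also provides `to_jshtml()`, which embeds the frames as images in a small JavaScript player and does not need ffmpeg. The sketch below uses a toy animation; the `toy_animation` helper is hypothetical. In the notebook above, the equivalent call would be `HTML(anim.to_jshtml())`.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for running as a script; omit in Jupyter
import matplotlib.pyplot as plt
from matplotlib import animation


def toy_animation(n_frames=5):
    """A tiny stand-in animation (not zava) for trying out HTML export."""
    fig, ax = plt.subplots(figsize=(3, 2))
    (line,) = ax.plot([0, 1, 2, 3], [0, 1, 0, 1], color='r')

    def update(frame):
        # alternate the zigzag pattern on every frame
        line.set_ydata([(frame + k) % 2 for k in range(4)])
        return (line,)

    anim = animation.FuncAnimation(fig, update, frames=n_frames, interval=20)
    plt.close(fig)
    return anim


html = toy_animation().to_jshtml()  # no ffmpeg needed, unlike to_html5_video()
print(len(html) > 0)
```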



### Animation, colors

You might find yourself doing cluster analysis of high-dimensional data. If you recover some clusters, you can break the data apart according to the clusters and visualize them with different colors. Here’s a full working example (without the clustering) of how to visualize two datasets.
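The example leaves out the splitting step. The following is a minimal sketch of it using pandas `groupby`. It assumes you already have one cluster label per row; the labels below are made up, as if produced by k-means. It produces one DataFrame per cluster, and each can be wrapped in its own GrandTour and SinglePlotter.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 1], [3, 3, 3, 3],
     [1, 2, 3, 4], [2, 2, 1, 1], [1, 1, 3, 3]],
    columns=['v0', 'v1', 'v2', 'v3'],
)
labels = [0, 0, 0, 1, 1, 1]  # hypothetical cluster assignments, one per row

# group by an external label Series so only the variable columns remain
parts = {label: group for label, group in df.groupby(pd.Series(labels, index=df.index))}
colors = {0: 'r', 1: 'g'}  # one color per cluster
for label, part in parts.items():
    print(label, colors[label], part.shape)
```

Each `part` could then become a plotter, for example with `SinglePlotter(GrandTour(part, c, d), params={'color': colors[label]})`. This mirrors step 3 of the two-dataset example.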

```python
%%time

# 1. here are your two datasets, M1 and M2

columns = ['v0', 'v1', 'v2', 'v3']

M1 = np.array([
    [1, 1, 1, 1],
    [2, 2, 2, 1],
    [3, 3, 3, 3]
])
M2 = np.array([
    [1, 2, 3, 4],
    [2, 2, 1, 1],
    [1, 1, 3, 3]
])

M1 = pd.DataFrame(M1, columns=columns)
M2 = pd.DataFrame(M2, columns=columns)

# 2. create your GrandTour instances

c = 0.0
d = 100.0

gt1 = GrandTour(M1, c, d)
gt2 = GrandTour(M2, c, d)

# 3. create corresponding SinglePlotters

sp1 = SinglePlotter(gt1, params={'color': 'r'})
sp2 = SinglePlotter(gt2, params={'color': 'g'})

# 4. create a MultiPlotter from the SinglePlotters
fig, ax = plt.subplots(figsize=(5, 3))
mp = MultiPlotter([sp1, sp2], ax=ax)

params = {
    'fig': fig,
    'func': mp,
    'frames': np.linspace(0, 360, 360),
    'interval': 20,
    'init_func': mp.init
}
anim = animation.FuncAnimation(**params)

plt.close(fig)

# 5. display the animation
HTML(anim.to_html5_video())
```

```
CPU times: user 25 s, sys: 608 ms, total: 25.6 s
Wall time: 25.7 s
```
