Quickstart

Installation

To install zava from PyPI, use pip.

pip install zava

Basic Usage

Data

Everything in zava starts with data. Your data should be either a 2-dimensional NumPy array (ndarray) or a pandas DataFrame. If you use a DataFrame, the axes are labeled with its column names; otherwise, you get generic axis names.

import numpy as np
import pandas as pd

# you can use this numpy array
M = np.array([
    [1, 1, 1, 1],
    [2, 2, 2, 1],
    [3, 3, 3, 3],
    [1, 2, 3, 4],
    [2, 2, 1, 1],
    [1, 1, 3, 3]
])

# or you can convert the array to a pandas dataframe
columns = ['v0', 'v1', 'v2', 'v3']
M = pd.DataFrame(M, columns=columns)
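
As a quick sanity check before handing data to zava, the sketch below confirms that an input is 2-dimensional and reports the axis names one might expect. The helper describe_input and the generic x0, x1, ... names are hypothetical and only illustrate the idea; zava's own generic axis names may differ.

```python
import numpy as np
import pandas as pd


def describe_input(data):
    """Return (n_rows, n_cols, names) for a 2-D ndarray or DataFrame.

    Hypothetical helper for illustration; it is not part of zava.
    """
    if isinstance(data, pd.DataFrame):
        n_rows, n_cols = data.shape
        names = [str(col) for col in data.columns]
    elif isinstance(data, np.ndarray):
        if data.ndim != 2:
            raise ValueError(f'expected a 2-D array, got {data.ndim}-D')
        n_rows, n_cols = data.shape
        # placeholder names; zava's generic names may be different
        names = [f'x{i}' for i in range(n_cols)]
    else:
        raise TypeError('expected a numpy ndarray or a pandas DataFrame')
    return n_rows, n_cols, names


A = np.array([[1, 1, 1, 1], [2, 2, 2, 1], [3, 3, 3, 3]])
print(describe_input(A))
print(describe_input(pd.DataFrame(A, columns=['v0', 'v1', 'v2', 'v3'])))
```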

Grand Tour

You can then create a GrandTour instance by passing in the data. Note the parameters c and d, which control the scaling of your data. Since the variables in your data may be on different scales, this normalization is required to bring all of them into the same range for plotting with parallel coordinates.

from zava.core import GrandTour

c = 0.0
d = 100.0

grand_tour = GrandTour(M, c, d)
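
To make the role of c and d more concrete, here is a minimal sketch of column-wise min-max scaling into the range [c, d]. This is an assumption about the kind of normalization involved, not a copy of zava's implementation, and the function name scale_columns is hypothetical.

```python
import numpy as np


def scale_columns(M, c=0.0, d=100.0):
    """Min-max scale each column of M into [c, d] (illustrative only)."""
    M = np.asarray(M, dtype=float)
    lo = M.min(axis=0)
    hi = M.max(axis=0)
    span = hi - lo
    # constant columns have zero span; map them to c instead of dividing by zero
    safe_span = np.where(span == 0, 1.0, span)
    return c + (M - lo) / safe_span * (d - c)


M = np.array([
    [1, 1, 1, 1],
    [2, 2, 2, 1],
    [3, 3, 3, 3],
])
print(scale_columns(M, 0.0, 100.0))
```

After scaling, every column spans the same range, so no single variable dominates the vertical axis of a parallel coordinates plot.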

Rotations

With the GrandTour instance, you can invoke the rotate() method to get the rotated data. The example below only illustrates how to get the rotated data: it stores 361 matrices (one per degree from 0 to 360), so you most likely do NOT want to do this if your data is huge.

R = [grand_tour.rotate(degree) for degree in range(361)]
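
If the goal is to process each rotation once rather than keep them all, a generator avoids holding every matrix in memory at the same time. The sketch below is generic: iter_rotations is a hypothetical helper that accepts any callable taking a degree, and fake_rotate stands in for grand_tour.rotate so that the snippet runs without zava installed.

```python
def iter_rotations(rotate, degrees):
    """Yield (degree, rotated_data) pairs one at a time."""
    for degree in degrees:
        yield degree, rotate(degree)


def fake_rotate(degree):
    # stand-in for grand_tour.rotate(degree); returns a tiny placeholder
    return [[degree, degree + 1]]


# each rotated result is produced on demand and can be discarded afterwards
total = 0
for degree, S in iter_rotations(fake_rotate, range(361)):
    total += S[0][0]
print(total)
```

With zava, one would pass grand_tour.rotate in place of fake_rotate.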

Visualization

Most likely, you will want to rotate your data and visualize one transformation at a time. Below is a simple matplotlib example that visualizes a single rotation.

import matplotlib.pyplot as plt

# rotates the data by 1 degree
S = grand_tour.rotate(1)

# start setting up plot with matplotlib
fig, ax = plt.subplots(figsize=(15, 3))

# note that S is a pandas dataframe
# we can use S to make line plots that mimic parallel coordinates
params = {
    'kind': 'line',
    'ax': ax,
    'color': 'r',
    'marker': 'h',
    'markeredgewidth': 1,
    'markersize': 5,
    'linewidth': 0.8
}
_ = S.plot(**params)

# some additional plotting configurations/manipulations
_ = ax.get_legend().remove()
_ = ax.xaxis.set_major_locator(plt.MaxNLocator(S.shape[0]))
_ = ax.get_yaxis().set_ticks([])
_ = ax.set_title('Grand Tour')

Later, we will look at how to use zava in a Jupyter notebook.

Animations

Below is a full example of how to use zava to create and save an animation. You need ffmpeg installed and on your PATH for this example to work, since matplotlib relies on ffmpeg to create the video.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import animation

from zava.core import GrandTour
from zava.plot import SinglePlotter, MultiPlotter

# 1. create or get your data
columns = ['v0', 'v1', 'v2', 'v3']

M1 = np.array([
    [1, 1, 1, 1],
    [2, 2, 2, 1],
    [3, 3, 3, 3]
])
M2 = np.array([
    [1, 2, 3, 4],
    [2, 2, 1, 1],
    [1, 1, 3, 3]
])

M1 = pd.DataFrame(M1, columns=columns)
M2 = pd.DataFrame(M2, columns=columns)

# 2. create your GrandTour instances
c = 0.0
d = 100.0

gt1 = GrandTour(M1, c, d)
gt2 = GrandTour(M2, c, d)

# 3. create your plotters for each GrandTour instance
# Note how the first dataset will have red lines
# and the second dataset will have green lines.
# The parameters passed in are used when drawing the lines
# and help create parallel coordinates with grand tour
# visuals that have hue and saturation effects.
sp1 = SinglePlotter(gt1, params={'color': 'r'})
sp2 = SinglePlotter(gt2, params={'color': 'g'})

# 4. setup plotting and animation

# don't forget to disable warnings and set the style
plt.rcParams.update({'figure.max_open_warning': 0})
plt.style.use('ggplot')

# Note how we use MultiPlotter to plot both datasets?
fig, ax = plt.subplots(figsize=(15, 3))
mp = MultiPlotter([sp1, sp2], ax=ax)

# matplotlib.animation will help us create animations
params = {
    'fig': fig,
    'func': mp,
    'frames': np.linspace(0, 360, 360),
    'interval': 20,
    'init_func': mp.init
}
anim = animation.FuncAnimation(**params)

plt.close(fig)

# 5. save the animation
params = {
    'filename': 'test.mov',
    'dpi': 500,
    'progress_callback': lambda i, n: print(f'Saving frame {i} of {n}'),
    'metadata': {
        'title': 'Parallel Coordinates with Grand Tour',
        'artist': 'Clint Eastwood',
        'genre': 'Action',
        'subject': 'Exploratory data visualization',
        'copyright': '2020',
        'comment': 'One-Off Coder'
    }
}
anim.save(**params)
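
Since saving the video depends on ffmpeg being discoverable, it can help to check for it before starting a long render. The sketch below uses only the Python standard library; the helper name ffmpeg_available is hypothetical and is not part of zava or matplotlib.

```python
import shutil


def ffmpeg_available(executable='ffmpeg'):
    """Return True if the executable can be found on the PATH."""
    return shutil.which(executable) is not None


if ffmpeg_available():
    print('ffmpeg found; anim.save() can attempt to write a video')
else:
    print('ffmpeg not found; install it or add it to your PATH')
```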

Your animation video will look like the following. If you have great tips on how to customize animations with matplotlib, please let us know!

There are many known issues with ffmpeg and matplotlib. You could also try saving the visualization as an animated gif.

# set up your MultiPlotter as before
plt.rcParams.update({'figure.max_open_warning': 0})
plt.style.use('ggplot')

# note how we do not pass in an axis?
mp = MultiPlotter([sp1, sp2], ax=None)

# save
# you have to play around with the duration parameter to get smoothness
mp.save_gif('test.gif', duration=0.0001, start=0, stop=180)

[Animation: test grand tour and parallel coordinate animation]

Considerations

It might not be a good idea to plot ALL your data due to computation and memory limitations. You might want to sample your data and plot that subset instead. Even with the simple, made-up data in this running example, creating a whole animation was computationally intensive (enough to make laptop fans spin up).
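
One way to take such a subset is uniform random sampling of rows without replacement. The sketch below uses NumPy only; sample_rows is a hypothetical helper, and the sample size of 500 is an arbitrary choice for illustration rather than a recommendation from zava.

```python
import numpy as np


def sample_rows(M, n, seed=0):
    """Return n rows of M chosen uniformly at random without replacement."""
    M = np.asarray(M)
    n = min(n, M.shape[0])
    rng = np.random.default_rng(seed)
    idx = rng.choice(M.shape[0], size=n, replace=False)
    # sort so the sampled rows keep their original relative order
    return M[np.sort(idx)]


big = np.arange(40_000).reshape(10_000, 4)
small = sample_rows(big, 500)
print(small.shape)
```

For a pandas DataFrame, M.sample(n=500, random_state=0) does the same job and keeps the column names.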