


How To Create Interactive Data Visualizations

Interactive Visualizations In Jupyter Notebook

5agado

This entry is a non-exhaustive introduction to creating interactive content directly from your Jupyter notebook. The content mostly refers to data visualization artifacts, but we'll see that we can easily go beyond the usual plots and graphs, providing worthwhile interactive bits for all kinds of scenarios, from data exploration to animations.

I am going to start with a brief introduction to data visualization and better define the scope and meaning of interactivity as intended in this article.
I will then provide a quick overview of the tools involved (Plotly and ipywidgets) plus some generic suggestions around the Jupyter ecosystem.
Finally, I will showcase some concrete examples of everything I have blabbered about, mostly drawn from personal projects of mine, and the improvements I obtained by relying on these interactive bits. This final part is meant to demonstrate the capabilities of such tools on top of an already impressive framework like Jupyter. It is all about pushing you to try them out on your own projects and to spread the word.

Intro

Data visualization is one of the core skills required to be a good data scientist, or to fill any other role involving data for that matter. It is as much about helping you (or other people on the team) better understand the nature of a dataset as it is about conveying the proper message to an external audience (technical and non-technical).

One of the most commonly suggested programming libraries when searching for "data visualization" is D3.js, but some argue that diving into such a tool is worth it mainly when you are after a very personalized, customized approach and results. If your goal is instead a more immediate and "standard" visualization, the packages already available for your language of choice might be a better way to go.

For the Python ecosystem, one inevitably starts with the foundational block of Matplotlib, and then possibly expands to one of the higher-level alternatives (e.g. Seaborn, Bokeh). An additional, increasingly standard choice, especially for data science figures, is the Jupyter notebook.
I believe one of the main strengths of Jupyter, and a key reason behind its rise, is how well it packages different media in one simple solution: you code, you write and you visualize. Not only does this make for a smooth and enjoyable process while you work, but it also greatly simplifies sharing that work, both for educational and collaborative purposes.

Personally, combined with the support of Jupyter, I have found that the Matplotlib+Seaborn combination works great for my visualization needs, even more so with the embedded plotting capabilities of Pandas. When I need animation functionality, the simple animation framework that has been part of Matplotlib since version 1.1 provides a great compromise between usability and results.
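As a rough illustration of that Matplotlib animation framework, here is a minimal sketch using `FuncAnimation`; the sine-wave content is just a placeholder. It renders the animation as an embeddable JavaScript player via `to_jshtml()`, which assumes Matplotlib 2.1 or later rather than 1.1:

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots()
(line,) = ax.plot(x, np.sin(x))


def update(frame):
    # Shift the wave's phase on every frame and reuse the same Line2D artist.
    line.set_ydata(np.sin(x + frame / 10))
    return (line,)


# 20 frames, 50 ms apart; the content is a placeholder for your own data.
anim = FuncAnimation(fig, update, frames=20, interval=50)

# Render to a self-contained HTML/JavaScript player (no ffmpeg needed).
html = anim.to_jshtml()
plt.close(fig)  # avoid also showing the static figure in a notebook

# In Jupyter: from IPython.display import HTML; HTML(html)
```

Updating an existing artist instead of redrawing the whole axes keeps each frame cheap to render.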

But at some point one feels the need for something more. This is not about moving entirely to new tools, just relying on them when necessity calls. In this regard I personally see two types of interactivity:

  • interactive plots: real-time info about the specific points or areas currently explored, plus the possibility of highlighting/hiding specific content.
  • widgets for content interaction: this extends beyond graphical plots to additional types of media and content, and generally requires one or more complex UI elements for collecting external input.

For the former, I currently find great satisfaction in using Plotly. For the latter, I specifically searched for solutions that can easily be embedded in Jupyter, so that I can turn practically any of my notebooks into an interactive dashboard. For this purpose we are going to explore ipywidgets.

The Tools

Before going into the details of the tools mentioned above, here are some general personal suggestions around the Python and Jupyter ecosystem that I have always found worth sharing.

First suggestion: use Anaconda (a Python distribution and much more).
Second suggestion: use virtual environments (or their Anaconda equivalent, conda environments). They are all about managing multiple isolated Python environments.
Third suggestion: check out nb_conda_kernels to manage multiple environments/kernels from your Jupyter notebook.
Fourth suggestion: customize the **** out of your Jupyter with Jupyter Extensions.

Plotly

Plotly makes it really easy to create and share interactive plots.
One of the great aspects of this library is that it provides seamless web hosting capabilities for your graphs. As is often the case, you can get a free account, at the price that everything you host will be public. But contrary to what some say, you can definitely use Plotly entirely offline if needed, rendering interactive plots directly inside a notebook and exporting them to (still interactive) HTML files.

If this were not enough, two more features make the package even more inviting: plot_mpl and cufflinks.
The former performs a direct conversion of a pure Matplotlib figure into an interactive Plotly one (an operation that is not perfect yet, but is quickly getting better).

Cufflinks, instead, is an automatic binding between Pandas dataframes and Plotly. Thanks to this binding, again with just one call, you can get a high-quality visualization of your dataframe content, with gems like automatic parsing and formatting of datetime values as well as inferred hues and label management for your columns.

Getting started with Plotly is (in the average case) as easy as running

    pip install plotly
    pip install cufflinks

You will then have to opt for online or offline plotting in each of your notebooks, keeping in mind that .iplot() is the magic syntax to display the content inside Jupyter.

ipywidgets

ipywidgets is about easily building interactive GUIs in your notebook. Again, it is a tool that delivers a great compromise between flexibility and usability.

After the installation steps, it is really just about getting creative and understanding how to make the best use of the new capabilities in your projects. The idea is to have widgets as interactive interfaces to your content, such as sliders, checkboxes, buttons, text fields, etc.

The most impressive bit when you get started is the interact function, which autogenerates your widgets (possibly with multiple components) based on the parameters you pass. For example, passing a parameter with a boolean value automatically generates a corresponding checkbox, while a list of values is transformed into a dropdown widget.
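A minimal sketch of that type inference, using `interactive` (the non-displaying sibling of `interact`) so the generated widgets can be inspected; the function `describe` and its parameters are placeholders:

```python
from ipywidgets import interactive


def describe(show_details, colour, size, label):
    # Placeholder for whatever function you want to drive from the UI.
    print(f"details={show_details} colour={colour} size={size} label={label!r}")


# interactive() infers one widget per keyword argument from its value:
#   bool -> Checkbox, list -> Dropdown,
#   (min, max, step) tuple of ints -> IntSlider, str -> Text
ui = interactive(
    describe,
    show_details=True,
    colour=["red", "green", "blue"],
    size=(0, 10, 1),
    label="",
)

# In a notebook, `display(ui)` shows the controls plus the function's output;
# `interact(describe, ...)` builds and displays the same thing in one call.
```

Every change to a control calls `describe` again with the current values, which is all the examples below rely on.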
All mechanisms are pretty straightforward and well explained in the official documentation, so let's move on to the showcase section.

Showcase

Now, to demonstrate what I have explained so far, plus a bit of shameless self-advertising, here is a list of examples from personal projects.

The fact that these examples span different, more complex projects is actually why I didn't simply write a notebook for this entry, and opted instead for an old-fashioned article. However, you can find all projects in my GitHub repo, some as simple, self-contained and reproducible notebooks.

Example 1: Fitbit Dashboard

This example might be of particular interest to some of my fellow Quantified Selfers, or in general to anyone who wants to easily build a dashboard of personal data for the sake of exploration and gaining insight, without too much refined work (building a complete dashboard app) but still with the flexibility to adapt to their own, possibly fickle, needs (yes, that implies coding).

My Fitbit sleep data is one example where I have different statistics to visualize (e.g. sleep efficiency, sleep values count, first minute asleep) at possibly different levels of granularity (e.g. day, weekday, year). One option is to use a Seaborn factorplot and visualize a subset of target stats in one go.

Static Seaborn Factorplot for summary stats

This works for simple cases, but when the content becomes more chaotic the plots simply lose their effectiveness. For example, moving to a visualization of weekday stats by month, we can end up with something like the following:

Static weekday stat by month

That is already too much info, and if you are as bad with colors as I am, it is not easy to find your way around the data. If this were a printed plot, it would surely not comply with good communication and plotting guidelines. Moving to Plotly provides a solution for this situation, and one can proceed in three ways:

  • passing your Matplotlib figure directly to the iplot_mpl method
  • creating your plot from scratch using Plotly syntax
  • using cufflinks to plot directly from your Pandas dataframe

In this case the last option is the most immediate and precise one, because starting from the original data format it is simply a matter of pivoting weekdays as rows and months as columns for the target stat, and there it is: my weekday sleep efficiency by month.
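The pivot itself is short in Pandas. Here is a minimal sketch that assumes a daily log with hypothetical `date` and `sleep_efficiency` columns (the random values stand in for real Fitbit data); the resulting weekday-by-month table is exactly the shape cufflinks can plot in one call:

```python
import numpy as np
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def weekday_by_month(df: pd.DataFrame, stat: str) -> pd.DataFrame:
    """Average a daily stat into a table with weekdays as rows and months as columns."""
    data = df.assign(weekday=df["date"].dt.day_name(), month=df["date"].dt.month)
    table = data.pivot_table(index="weekday", columns="month", values=stat, aggfunc="mean")
    # pivot_table sorts weekday names alphabetically; restore calendar order.
    return table.reindex(WEEKDAYS)


# Hypothetical daily log: random numbers standing in for real sleep data.
rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-01", "2017-06-30", freq="D")
sleep = pd.DataFrame({"date": dates, "sleep_efficiency": rng.uniform(85, 98, len(dates))})

table = weekday_by_month(sleep, "sleep_efficiency")

# With cufflinks set up (cf.go_offline()), one call gives the interactive chart:
# table.iplot(kind="bar", title="Weekday sleep efficiency by month")
```

Each month becomes one series, so in the interactive chart you can toggle months on and off from the legend instead of squinting at a dozen overlapping colors.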

Combining all together, I end up with what I believe is a pretty decent dashboard for my sleep data.

Demo of Jupyter as dashboard for Fitbit sleep data

Example 2: Nutritional Database

You might want a quick interface to some structured content. I did this, for example, while exploring nutritional data. In particular, I relied on the USDA National Nutrient Database. It is a pretty rich and relatively complex dataset, but it can easily be sorted out with Python and Pandas.

One might argue that this simply falls within the database scope, and that a proper SQL tool would be much better suited for the job. This might be true, but I nevertheless wanted to showcase how easily this can be achieved in Jupyter via ipywidgets, and how for some this can turn out to be a better temporary choice for the task than a separate database system, especially during data exploration.

Here you can see that I again simply rely on the interact function, passing my original (parameterized) Python function plus the parameters that will automatically be mapped to widgets:

  • food: an empty string, which is converted to a text box
  • nutrient: list of unique nutrients present in the db, which is converted to a dropdown widget
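A sketch of that pattern, assuming the database has already been flattened into a long table with hypothetical `food`, `nutrient`, `value` and `unit` columns (the real USDA files need more wrangling, and the rows below are placeholders, not USDA figures). The query logic is kept as a plain function so it can be tested on its own, while `interact` only wires it to the widgets:

```python
import pandas as pd


def nutrient_lookup(db: pd.DataFrame, food: str, nutrient: str) -> pd.DataFrame:
    """Rows whose food description contains `food` (case-insensitive), for one nutrient."""
    matches_food = db["food"].str.contains(food, case=False, regex=False)
    mask = matches_food & (db["nutrient"] == nutrient)
    return db.loc[mask, ["food", "value", "unit"]].sort_values("value", ascending=False)


def build_explorer(db: pd.DataFrame):
    """Hypothetical helper: '' becomes a Text box, a list becomes a Dropdown."""
    from ipywidgets import interact  # only needed when running inside Jupyter

    interact(
        lambda food, nutrient: nutrient_lookup(db, food, nutrient),
        food="",
        nutrient=sorted(db["nutrient"].unique()),
    )


# Placeholder rows, not real USDA values.
db = pd.DataFrame(
    {
        "food": ["Egg, whole, raw", "Egg, white, raw", "Butter, salted", "Egg, yolk, raw"],
        "nutrient": ["Protein", "Protein", "Protein", "Fat"],
        "value": [1.0, 2.0, 0.5, 3.0],
        "unit": ["g", "g", "g", "g"],
    }
)
# In a notebook: build_explorer(db)
```

Because the lambda returns a DataFrame, interact displays it as a table below the controls and refreshes it on every keystroke or dropdown change.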

Example 3: Animations

As mentioned at the beginning, interactivity comes in handy in all kinds of scenarios once you have good tools at hand, and animation is surely one of them.

A simple IntSlider (automatically generated by interact when passing a (min, max, step) tuple) can greatly help in exploring animations at different resolutions or levels of complexity, as shown in this simulation of Conway's Game of Life.
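A minimal sketch of that idea, not the code behind the original simulation: a NumPy Game of Life step on a wrap-around grid, plus a hypothetical `show_life` helper where the `(0, max_steps, 1)` tuple passed to `interact` becomes an IntSlider selecting the generation to draw:

```python
import numpy as np


def life_step(grid: np.ndarray) -> np.ndarray:
    """Advance Conway's Game of Life by one generation on a toroidal (wrap-around) grid."""
    # Count the 8 neighbours of every cell by summing shifted copies of the grid.
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A cell is born with exactly 3 neighbours and survives with 2 or 3.
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(np.uint8)


def generation(grid: np.ndarray, n: int) -> np.ndarray:
    """Return the grid after n steps."""
    for _ in range(n):
        grid = life_step(grid)
    return grid


def show_life(grid: np.ndarray, max_steps: int = 100):
    """Hypothetical helper: an IntSlider picks which generation to draw."""
    import matplotlib.pyplot as plt
    from ipywidgets import interact

    def draw(n):
        plt.imshow(generation(grid, n), cmap="binary")
        plt.axis("off")
        plt.show()

    interact(draw, n=(0, max_steps, 1))
```

This recomputes from generation 0 on every slider move, which is fine for small grids; for larger ones, precomputing the frames into a list and indexing it would keep the slider responsive.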

Conway's Game of Life simulation

Additionally, you can build custom interfaces to quickly explore your data and related analytical results, as in this example where I visualize the output of an internal layer of a CNN trained on the cats and dogs dataset.
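A hedged sketch of such a custom interface: it assumes you already have the activations of one convolutional layer for one image as a NumPy array of shape (height, width, channels) (with Keras, for instance, by building a second model whose output is the layer of interest), and uses a slider to browse the channels. `channel_map` and `explore_layer` are hypothetical helper names:

```python
import numpy as np


def channel_map(activations: np.ndarray, channel: int) -> np.ndarray:
    """Min-max normalise one channel of an (H, W, C) feature map to [0, 1] for display."""
    fmap = activations[..., channel].astype(float)
    span = fmap.max() - fmap.min()
    # A constant channel carries no contrast; show it as all zeros.
    return (fmap - fmap.min()) / span if span > 0 else np.zeros_like(fmap)


def explore_layer(activations: np.ndarray):
    """Hypothetical helper: an IntSlider over channel indices drives the heatmap."""
    import matplotlib.pyplot as plt
    from ipywidgets import interact

    def draw(channel):
        plt.imshow(channel_map(activations, channel), cmap="viridis")
        plt.colorbar()
        plt.title(f"channel {channel}")
        plt.show()

    interact(draw, channel=(0, activations.shape[-1] - 1, 1))
```

Normalizing each channel separately makes weak but structured activations visible, at the cost of hiding how strong one channel is compared to another.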

CNN Layer Output Heatmap Demo

Many frameworks and tools for visualization are available out there, especially for machine learning tasks, but sometimes a quick and dirty solution like the one above can save a lot of time while providing all the info and functionality actually needed.

Conclusions

All showcases were obtained using only the most basic functionality offered by the tools listed. For example, all the ipywidgets-related examples simply used interact and type inference for the generated widgets. Much more can be achieved by defining and combining custom widgets, as nicely explained in the official documentation.

With this entry, I wanted precisely to spread the word and quickly show the great value of these tools in terms of simplicity and customization. At the same time, I am also interested in general feedback regarding data visualization tools, especially frameworks that could encompass more borderline areas like 3D modeling and animation.

As a final freebie, I highly recommend this repository for alternative Jupyter themes and additional visual customization.


Source: https://towardsdatascience.com/interactive-visualizations-in-jupyter-notebook-3be02ab2b8cd

Posted by: ruddmyris1978.blogspot.com
