Show all posts
11 months ago

Viz as a Query Language

Bad programmers ignore details. Bad designers get lost in details (Nate Kirby)

I believe the modern “big data” world is moving in the direction of faster and more visual interaction with data. A good chart tool must be designed for more than just visualization – it should be designed for communicating information.

But how do we achieve this?

When 2 + 2 > 4

The visualization process consists of two main parts:

Minimizing the gap between these two parts can potentially bring a powerful expressiveness – combining visualization grammar and data query language results in the curious effect of visualization seamlessly reflecting queries and vice versa.

Let’s consider this idea with some examples.

Imagine a conceptual language which combines possibilities for querying and visualizing data. Actually we can use the Grammar of Graphics notation, but for simplicity, let’s take a familiar SQL-like syntax and add a self describing Viz grammar to it. Here is an example:

SELECT a, b FROM data WHERE a > 0 AND b > 0
DRAW line
WITH a AS x, b AS y

NOTE: this language does not exist – it’s purpose to demonstrate a principles. Concrete implementation can use as a query language SQL, REST or custom query mechanism. For samples in this article I use alaSQL in conjunction with Taucharts. This is where the declarative visualization interface of the library unfolds its power.

First steps

Let’s query and plot something simple to start with.

Show correlation between imdb and tomato ratings for Oscar nominees.

SELECT imdbRating, tomatoRating
FROM oscar_nominees
DRAW scatterplot
WITH imdbRating AS x, tomatoRating AS y
SHOW trends

NOTE: the chart is described using metadata only. The size of data or specific values can vary while the chart’s specification remains intact. We can think of it as a visualization pattern for the specific schema and reasoning behind data at a high level.

Aggregate

Plain data rarely gives insights. Let’s investigate it in-depth.

Show distribution of imdbRating values.

SELECT imdbRating, COUNT(imdbID) as FilmsCount
FROM oscar_nominees
GROUP BY imdbRating
DRAW bar
WITH imdbRating AS x, FilmsCount AS y

Transitioning to a new point of view on the data only required us changing a few lines of code. It is a main benefit of using metadata instead of a particular series of data.

Taucharts supports facet plots so several data segments can be presented on a single chart.

Show distribution of imdbRating values splitted by “is Oscar winner” feature separately for each Genre.

SELECT imdbRating, Genre, isWinner, COUNT(imdbID) as FilmsCount
FROM oscar_nominees
WHERE Genre = 'Drama' OR Genre = 'Biography' OR Genre = 'Comedy'
GROUP BY imdbRating, Genre, isWinner
DRAW bar
WITH imdbRating AS x,
Genre, FilmsCount AS y,
isWinner AS color
SHOW Legend

Reveal

Let’s recall that a good viz tool should be designed for communicating information, rather than just visualization. Let’s allow for a jump from a visual element to the subset of data that the element is built upon.

Show subset of data that contributes to a separate distribution bar.

Given that the bar we are looking at is a following tuple:

imdbRating = 8
count = 39

NOTE: this data is available from the chart’s API e.g. on a “click” event.

Let’s take this tuple and the original query to reveal an aggregation. Use the properties enumerated in the original GROUP BY statement in the WHERE clause:

SELECT imdbID, Title, Genre
FROM oscar_nominees
WHERE imdbRating = 8

The result of this query is a subset of rows that contributed to the specific bar value.

The main point here – a possibility to seamlessly switch between big picture and particular data the picture is build on.

As well you can transform a revealed data to a some visual summary using queries chain.

Show distribution of the revealed data by genre.

(SELECT imdbID, Title, Genre FROM oscar_nominees WHERE imdbRating = 8) AS revealed

SELECT Genre, COUNT(imdbID) AS FilmsCount
FROM revealed
GROUP BY Genre
ORDER BY FilmsCount
DRAW horizontal-bar
WITH Genre AS y, FilmsCount AS x

Click a bar to reveal background data and see a summary by genre.


You might also want to check out our blogpost on “How to reveal aggregations with Taucharts”

Dashboards. Gathering all together.

As a rule, one chart is not enough to describe data. There are several patterns and several points of view which reflect the nature of the data and its underlying model. Dashboards are a quite popular solution in this case. They can be a kind of visual model with some input parameters. We can explore the model by manipulating the parameters.

Show dashboard that depends on “isWinner” and “Year” features.

var IS_WINNER;
var START;
var END;

SELECT imdbRating, imdbVotes, Year, Title
FROM oscar_nominees
WHERE isWinner=%IS_WINNER% AND Year > %START% AND Year < %END%
DRAW scatterplot
WITH imdbRating AS x,
imdbVotes AS y,
Year AS color
...

SELECT imdbRating, imdbVotes, Genre, Title
FROM oscar_nominees
WHERE isWinner=%IS_WINNER% AND Year > %START% AND Year < %END%
DRAW scatterplot
WITH Genre AS X, imdbRating AS y
SHOW BoxWhiskers
...

Summary

Combining visualization and query languages in a single framework provides powerful capabilities for reporting, analysis, and communication of data.

Taking this idea further leads us to high-level interactive visualization tools (e.g. [1], [2], [3]). They operate with visualization on the highest level, allowing users to interact with data in a natural way, transforming user actions seamlessly into a visual representation of the data.

vizydrop-screen

Implementing such an approach in reporting / analytics systems can help us to create a more deep and engageable user experience in comparison to a set of read-only / pre-defined reports.

I see it as a golden ratio that helps us to not get lost in the details, but stops us from ignoring them at the same time.

  • boedah

    Hi Vladimir!

    Thanks for the post. Unfortunately, the examples do not work anymore.
    Probably because of new taucharts beta versions?
    I tried various versions on jsfiddle but could not get it to work.
    Any suggestions?

    • Vladimir Petriko

      Hi @boedah:disqus !
      Huge thanks for the notice! I’ve fixed examples. Hope you find the lib useful.

      The problem is taucharts doesn’t compatible with D3.v4 yet, while //cdn.jsdelivr.net/d3js/latest/d3.min.js points to the latest D3.v4 version now.

      • boedah

        Thanks!

  • PatrickMartin

    I love your post. I’ve had similar thoughts around a query/visualization language though not as well articulated as you have here. I believe that supporting technology is beginning to align to make this task much easier. ANTLR now has a Javascript target and there are many mature visualization packages available.

    SQL is a great base for extracting information from structured tabular data but is that enough? What about nested data like XML/JSON and unstructured data?

    • Vladimir Petriko

      Thank you for kind words and sharing you thoughts here as well.
      In theory any unstructured data can be expressed (or transformed) to relational model. As well we have clear and well-known theory behind it. This allows SQL to be declarative and well-composed by nature.

      Another interesting aspect of that question is that we don’t have good general-purpose models for visualizing unstructured data or deeply nested data (maps are too specific). So the clear approach at the moment is to plain a data to a “simple” relational model and then visualize it.