An intro to data visualization
This is the first in an occasional series on data visualization.
Do you remember learning how to communicate effectively in writing? First you learned the alphabet, then words, then their categories (noun, adjective, etc.), followed by sentence structure. Once you graduated to essays, you learned to develop a thesis, separate thoughts into paragraphs, form conclusions… You get the picture. By contrast, what was your training in visual communication?
In business we rely on visuals to help to explain our ideas. Sometimes we use a compelling image with a few facts to generate interest around a topic, which we would call an “infographic”. More often, perhaps, we use visuals to explain data, something we call “data visualizations”.
Excel has enabled us to generate all kinds of pie, line, bar and column charts. We can select a colour palette, use 3D representations and apply drop shadows for a little “glam”. Next, we can pop our visuals into PowerPoint and string them together to create presentations supported by lively animations. Our audiences may leave with an appreciation of the quantity of data now at our fingertips, but have we deepened their understanding and enabled them to make informed decisions?
If we evaluate our data visualizations and presentations using our knowledge of best writing practices, would we find that we have an overabundance of adjectives, unfinished sentences, or poorly argued thoughts? Visual communication, like any form of communication, is about making ideas clear and easy to grasp. Data visualizations also strive to communicate a vast amount of data in the most digestible form. To quote the book Designing Data Visualizations: Representing Informational Relationships by Steele and Iliinsky, we can exploit the “capabilities and bandwidth of the visual system to move a huge amount of information into the brain very quickly”.
Visual communication also comes with its own set of rules and best practices. We can leverage colour and cultural meanings to enhance understanding and we can layer content by using visual encodings to represent multiple data dimensions.
But before we get to design, we must first consider the context of the data visualization. Who is our audience, why do they require this information and how will it be used? At Systemscope we aim to use data to explain rather than merely to explore. To do so, we follow a process for creating effective data visualizations.
Here’s a quick introduction to our steps for creating a data visualization.
Step 1: Define the goal and audience of your data visualization
The purpose of the data visualization is to communicate a large quantity of data quickly, to help to identify patterns or relationships and to inspire new questions and/or identify problems. With this in mind, who is your audience and what goal are you helping them to achieve?
Step 2: Assess the data
Become familiar with your data. What are the types of data you plan to use? Would you classify the data as categorical, linear or multi-variable, and how many dimensions exist?
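As a rough illustration of this step, the Python sketch below takes a small table of records and reports how many dimensions it has and whether each one looks categorical or numeric. The sample records, the column names and the `classify_column` and `assess` helpers are all hypothetical, and the two-way split is a simplification of the data types mentioned above.

```python
from numbers import Number


def classify_column(values):
    """Return 'numeric' if every value is a number, otherwise 'categorical'."""
    def is_number(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, Number) and not isinstance(value, bool)

    return "numeric" if all(is_number(v) for v in values) else "categorical"


def assess(rows):
    """Map each column name in a list of dict records to its data type."""
    return {name: classify_column([row[name] for row in rows]) for name in rows[0]}


# Made-up sample records, for illustration only.
sample_rows = [
    {"region": "East", "year": 2012, "visits": 1200},
    {"region": "West", "year": 2012, "visits": 950},
]

summary = assess(sample_rows)
print(len(summary), "dimensions:", summary)
```

A check like this is only a starting point. A column such as a year is stored as a number but is often treated as a category when charting, so the designer's judgement still applies.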
Step 3: Prioritize your data
From here we consider the relationships that exist within the data. What is the most important dimension and what is secondary? This allows us to decide what belongs on the axes, for example, and what can be encoded using other design devices.
Step 4: Select the type of chart
Charts are not created equal. While scatter diagrams depict relationships, bar charts allow for comparison between factors and pie diagrams enable comparisons of factors against the whole. A thoughtful data visualization starts from the user and the purpose, then selects the chart type accordingly.
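The pairing of purpose and chart type described above can be written down as a simple lookup. This is a minimal sketch only: the goal names and the `suggest_chart` helper are hypothetical labels, and a real selection would also weigh the audience.

```python
# Chart suggestions taken from the guidance above; the goal names are hypothetical.
CHART_FOR_GOAL = {
    "relationship": "scatter diagram",
    "comparison": "bar chart",
    "part-to-whole": "pie diagram",
}


def suggest_chart(goal):
    """Return the suggested chart for a goal, or None if the goal is not covered."""
    return CHART_FOR_GOAL.get(goal)


print(suggest_chart("comparison"))
```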
For example, two approaches to presenting the surface area of Canada’s provinces and territories are provided below.
The table may be the best approach if the viewer needs to know precise values. However, the table format does not leverage visual perception to communicate the data. Instead, the viewer must process the variables in their head to understand the relative size differences. The table is also flawed because it lacks a heading and presents extraneous data, like the flag images, that distract from the core purpose of the table.
By contrast, the pie chart leverages visual perception. At a glance, the viewer can quickly see the relative size of the provinces and territories compared with the whole of Canada. From a design perspective, the challenge with the pie chart is clearly labelling the wedges. The small slices demand abbreviated labels, which in turn require more processing on the part of the viewer. The pie chart is also a poor choice if you want to compare provinces and territories to one another. Try comparing Nova Scotia and PEI. To do this, your brain needs to process the area of each wedge. Now imagine performing the same task with a bar chart. Your eyes could quickly see the difference by comparing the length of the bars.
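To make the Nova Scotia and PEI comparison concrete, the Python sketch below works out how wide each pie wedge would be and draws a simple text bar for each province. The areas are rounded, approximate figures assumed for illustration and should be checked against an official source; the `wedge_angle` and `bar` helpers are hypothetical.

```python
# Rounded, approximate total areas in square kilometres (assumed for illustration).
CANADA_KM2 = 9_985_000
areas_km2 = {"Nova Scotia": 55_000, "Prince Edward Island": 5_700}


def wedge_angle(area, total):
    """Angle in degrees that a pie wedge for `area` would occupy."""
    return 360 * area / total


def bar(area, longest, width=40):
    """Text bar whose length is proportional to `area` (at least one character)."""
    return "#" * max(1, round(width * area / longest))


longest = max(areas_km2.values())
for name, area in areas_km2.items():
    angle = wedge_angle(area, CANADA_KM2)
    print(f"{name:<22} {angle:5.2f} deg  {bar(area, longest)}")
```

With these rounded figures both wedges span less than two degrees of the circle, which is hard to judge by eye, whereas the two bars differ in length by roughly a factor of ten.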
Step 5: Plan the visual encodings and layout
Next, we encode the data that has not been addressed by the choice of chart. We aren’t seeking to encode all the data available, only that which adds to the explanation. In a scatter diagram we may layer information using colour, size, texture and borders. The balance we are trying to strike is conveying a depth of information without overwhelming the user. We add plain language labels and ensure the legend is clear.
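One way to plan this step is to list the dimensions in priority order and pair them with encoding channels, stopping before the chart becomes crowded. The sketch below assumes a scatter diagram; the dimension names, the `plan_encodings` helper and the default limit of four encoded dimensions are hypothetical choices, not a fixed rule.

```python
# Encoding channels in priority order: axes first, then the devices named above.
CHANNELS = ["x-axis", "y-axis", "colour", "size", "texture", "border"]


def plan_encodings(dimensions, max_encoded=4):
    """Pair prioritized dimensions with channels; return (plan, left_out)."""
    limit = min(max_encoded, len(CHANNELS))
    plan = dict(zip(CHANNELS, dimensions[:limit]))
    return plan, dimensions[limit:]


# Hypothetical dimensions, most important first.
dimensions = ["population", "area", "region", "income", "language"]
plan, left_out = plan_encodings(dimensions)
print(plan)
print("Not encoded:", left_out)
```

Anything returned in `left_out` is a candidate for a separate visualization or for leaving out altogether, in keeping with the aim of encoding only what adds to the explanation.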
Producing an effective data visualization relies on the designer’s experience and judgement, but we also encourage user testing to be sure that the meaning is being conveyed as intended. It is also important to consider how multiple data visualizations can work together to tell a complete story and support decision-making, but that is a conversation for an upcoming blog!