Before I keep going, I wanted to reflect on some of the lessons that I’ve already learned launching and working on these first two case studies (How India Eats and The Bitcoin Economy), in an effort to improve and refine my approach for future case studies.
Keep it Simple
After writing 7300 words on this site so far, the single most important lesson I’ve learned is to keep it simple. And “it” refers to everything. Keep the design, the colors, the fonts, and the size of items as simple as possible while still conveying the information that needs to be conveyed. To be slightly hyperbolic, there are unlimited options for how you can display that information. The issue I (and likely most others) struggle with is that achieving simplicity is not that easy.
Simple is Hard
There’s a story I heard once about Pablo Picasso. He’s sitting in a bar, enjoying a glass of wine. A man recognizes him and asks Picasso if he can draw him something. Picasso obliges, grabs a napkin, and sketches a wonderfully elegant drawing of a bird in about a minute. The man is ecstatic at the work! When Picasso hands it to him, he says, “That’ll be 1000 francs.” To which the man says, “But it only took you a minute to do that.” Picasso replies, “No, it took me a lifetime to be able to do that.”
When I started this, I definitely did not think it would be an easy task to improve visualizations. However, I did not realize it would be this difficult. In The Visual Display of Quantitative Information, Edward Tufte makes it look easy as he removes unnecessary feature after unnecessary feature. Just as Picasso’s ability to capture the elegance of a bird was built up over a lifetime of painting and drawing, Tufte’s ability to simplify visualizations comes from years of experience and practice. The only way I’m going to get better at visualizations is by continuing to make visualizations. This is likely true for most budding data visualizers.
No one tries to make bad visualizations
One of my “aha!” moments is that many of the visualizations that get circulated or critiqued as “bad visualizations” end up that way for a variety of reasons. The main one is that these visualizations get published alongside text, and the visualization is there to draw readers to the article or to simplify the article into a pretty graphic. It should be the most exciting thing on the page! Whereas data purists might think that a bar chart or scatter plot is the most effective way to convey the information, the editor of the page wants something that will draw your eye, and a line chart will just not do.
So, data visualizers have to make trade-offs. They add extra color and extra graphics because those items make the visualization pop off the screen. In specialized publications, data is visualized for specialists who will want to glean every piece of relevant information, so those extra design features are not needed. For mass media, if the visualization is not attractive, it will not even be read. Understanding who will consume your visualization is an important factor in deciding on your overall aesthetic.
Design is relative
Since the aesthetic will vary depending on who we are trying to communicate with, design becomes relative. What I think is a good design might not be one for others. When I published Case Study: The Bitcoin Economy, I received some on-point critiques from Professor Y.Y. Ahn regarding the decisions I made in my redo.
- The use of a logarithmic scale was misleading; it made Bitcoin look larger. He suggested using an inset to show a zoomed-in region.
- Getting rid of the graphics that the original one had for each data point made it less interesting.
- The font size on the labels was too small.
I agree with all of these comments. One of the struggles with that visualization is that it spanned over five orders of magnitude. While a log scale is the best way to show that from a pure data point of view, it’s not the most intuitive thing. An audience of scientists and engineers would have no problem understanding it. A mass media audience, perhaps not. His suggestion of plotting the data on a linear scale, with an inset to show the smaller values, is something people are more familiar with, as we see that more often than we see log scales. Adding that inset does make the graphic a little more complicated, but it probably increases the likelihood it will be read and understood, which is our actual main goal in creating any visualization.
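To make that trade-off concrete, here is a minimal Python sketch of where values would land on each kind of axis. The values are made-up placeholders, not the data from the Bitcoin case study, and the function names are hypothetical, chosen only for illustration. The sketch assumes all values are positive and not all equal.

```python
import math

# Hypothetical placeholder values spanning five orders of magnitude.
# These are NOT the figures from the Bitcoin case study.
values = {"A": 1.0, "B": 50.0, "C": 2_000.0, "D": 100_000.0}


def linear_positions(data):
    """Position of each value along a linear axis, as a fraction of the largest value."""
    top = max(data.values())
    return {name: v / top for name, v in data.items()}


def log_positions(data):
    """Position of each value along a log10 axis running from the smallest to the largest value."""
    logs = {name: math.log10(v) for name, v in data.items()}
    lo, hi = min(logs.values()), max(logs.values())
    return {name: (x - lo) / (hi - lo) for name, x in logs.items()}


def inset_positions(data, cutoff):
    """Linear positions for only the values below `cutoff`, rescaled to their own maximum."""
    small = {name: v for name, v in data.items() if v < cutoff}
    return linear_positions(small)


for label, positions in [
    ("linear", linear_positions(values)),
    ("log", log_positions(values)),
    ("inset (< 100)", inset_positions(values, cutoff=100)),
]:
    print(label)
    for name, p in positions.items():
        # Draw a crude 40-character axis with a marker at the value's position.
        print(f"  {name} |{' ' * round(p * 40)}*")
```

On the linear axis, the three smaller values crowd against the left edge and become nearly indistinguishable. The log axis spreads them out, at the cost of asking the reader to think in powers of ten. The inset gives the small values their own linear axis, which is roughly what the suggestion amounts to.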
Labeling the data is another struggle. The labels were longer than I liked. I didn’t like the graphics representing the data points because they all needed explanation, so I thought I might as well not use them. Professor Ahn’s critique gets us back to the main point of a visualization, which is for people to understand it. So, despite the fact that the graphics need to be explained, they are likely to make the overall visualization more interesting, which will make more people read it, which leads to a higher overall understanding of the visualization.
It’s all about understanding
Ultimately, as creators of data visualizations, we should have one goal in mind: how do we create a graphic that is most likely to be understood? We increase the likelihood that our graphics are understood by making them simple and interesting. Getting to that point takes time and practice.
As I publish the next group of case studies, I will keep this main point in mind. To refine my process for evaluating these visualizations, I have also added a category to my rubric: “Is it interesting?”
Thank you for your time, and please leave any comments below. I would love to hear from you.
Featured image “Refine” used courtesy of Max Pixel under a free license.