Case Study: U.S. Tourism

Tourism in the United States is a $1.5 trillion industry.  I came across this infographic from The Traveler Zone in 2009 that attempted to visualize  where U.S. tourism was coming from.

What is going on here?  Source:

The top half of the graphic hurts my head.  There appears to be some scale used for the larger countries, but the smaller countries all seem to be about the same scale.  I’m not sure exactly what the design is.  Actually, it reminds me of Candyland.

Yep, that’s probably what they were going for. Source: FanPop

I guess that means the U.S. is the castle at the end?  The top half of the infographic needs a major re-do.

The bottom half of the graphic is not that bad, with one exception: the first bar in the bar chart is truncated.


Travel from Canada is over 9 million, but they stopped the bar at 7 million. This probably happened because they ran out of space and/or didn’t want to make the rest of the bars look too short. Otherwise, it’s not that bad of a bar graph. I wouldn’t classify it as troubled, just slightly lost. For the purposes of this case study, I’m going to focus on just the top half of the infographic, which is in dire need of help.

Let’s give it some TLC

Alright, on to the analysis and rehabilitation.


The data visualization is an infographic, designed to draw attention to the website. It shows the Top 20 “Tourist Generating” countries for travel to the United States of America. The dataset consists of twenty countries, with one value for each: the number of people from that country who visited the United States. The data is shown in two different ways. One is a path reminiscent of Candyland that includes each of the twenty countries along with its number of visitors. Around the path, the same data is presented again as text, with each country’s name, its number of visitors, and its flag.

Explanation of data

The data set is generated from visitor travel data made publicly available by the U.S. Department of Commerce. The data is based on the immigration form that every visitor completes when entering the country. The original data set is a credible source. It is one-dimensional data: a country and its associated number of visitors. Fairly straightforward.

There is an issue with how the data was then used to create the visualization. The visualization does not indicate the time frame. Is this a monthly total? A yearly total? I bet you never would have guessed that it is the total from January 2009 through September 2009. That is an odd time frame, so if you are going to use it, it needs to be indicated.

Recommendation 1: Indicate the time-frame used, or if possible, use a more intuitive timeframe.

Explanation of visualization techniques

I love board games. Candyland was a great one when I was a kid. One thing to note about the Candyland board is that every square is more or less the same size. On this visualization, the squares (err, rectangles) are not all the same size.

Scale? We don’t need no stinking scale!

The spaces for Canada and Mexico are larger than most, and the United Kingdom, on the turn, appears to be the largest of them all. Germany, Japan, and France all appear to be the same size, despite very different values. This just leads to confusion: the large rectangles suggest a scale, while the rest are all about the same size.
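To put a number on that mismatch, here is a minimal Python sketch comparing the actual ratios for the three countries whose rectangles look alike. It uses the January through September 2009 values from the “Old Data” column of the table later in this post, and it assumes (as described above) that the three rectangles are drawn at roughly equal size, so the ratio the graphic implies between any two of them is 1.

```python
# Jan-Sep 2009 arrivals, from the "Old Data" column of the comparison table in this post.
ARRIVALS = {
    "JAPAN": 2_169_716,
    "GERMANY": 1_263_344,
    "FRANCE": 930_265,
}

def ratio_to_smallest(values):
    """Return each value divided by the smallest value in the group."""
    smallest = min(values.values())
    return {name: value / smallest for name, value in values.items()}

# Assumption: the three rectangles are drawn (roughly) the same size, so the graphic
# implies a ratio of 1.00 for every country. Compare that with the data.
for name, ratio in ratio_to_smallest(ARRIVALS).items():
    print(f"{name:8s} data ratio {ratio:.2f}  vs. drawn ratio 1.00")
```

Japan had roughly 2.3 times as many visitors as France, and Germany about 1.4 times as many, yet the three boxes read as equals.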

Recommendation 2: Use a scale that makes sense for representing the number of arrivals for each country.

It also fails to say what the values in the boxes represent. The title does say “Tourist Generating,” and the text next to the flags says “arrivals.” But the part readers are most likely to read, the Candyland path, has no label. Perhaps it’s just my reading, but I think this could cause confusion, because we usually describe things as “income generating,” not “tourist generating.” Could someone think those are revenue numbers, and that revenue is the metric of interest? Possibly.

Recommendation 3: If the title is not clear, properly label the values.

Lastly, the graphic surrounds the path with a rather large data table of names. If you’ve already put the names and visitor values on the Candyland path, why do you need the table? The flags make for nice eye candy, but they do nothing to actually enhance the visualization.

Recommendation 4: State the data once, and one time only.

Another thing to note with the unnecessary flags is that the creator did not pay close attention to which flags to use.  First, the flag for Japan was:

Flag of the Japan Maritime Self-Defense Force, not of the Country of Japan

This appears to be the flag of the Japan Maritime Self-Defense Force:

And not the flag of the country of Japan:

Then I noticed that the visualization portrayed the data as coming only from Hong Kong:

Data was not from only Hong Kong

As it turns out, upon reviewing the actual data set, the value was for Hong Kong and the People’s Republic of China (PRC) combined. Since Hong Kong has been part of China since 1997, and the value covers both, the PRC flag would be more appropriate:

Overall, those are unforced errors that the visualization creator could have avoided.

Effectiveness of the visualization

If the goal of the visualization is to let the reader know that Canada and Mexico are the countries that provide the most visitors to the United States, then yes, it achieves that. Otherwise, it is not a terribly effective visualization. The Candyland path is a bit difficult to follow, and the reader loses interest while moving along it. The data table is legible, which is a plus, but it’s unnecessary. Overall, this visualization needs some simplification.

Recommendation 5:  Simplify the overall design.

While simplicity is the objective for most of our work, it needs particular emphasis here. This is non-intuitive information that needs to become easily digestible for the reader.

Integrity of the visualization

The data is definitely distorted, as there is not a consistent scale across the Candyland path. This could lead people to think that the level of tourism is more or less the same for the top twenty countries, except for the couple at the top. This is probably the highest priority for a revised version of this data visualization. Just about any properly applied data visualization technique should be able to correct for this distortion.


The design is somewhat engaging, because the Candyland path makes the reader wonder what is going on. Once it has the reader’s interest, though, it loses it rather quickly because of the color scheme. The dark background makes it difficult to read all of the numbers presented in the table. Further, the bright colors on the dark background are a little harsh on the eyes. I think a softer palette on a light background would do much better.

Recommendation 6: Use a lighter background and softer palette of colors.


This data visualization is clearly meant for mass media, as it is not focused on any particular part of the tourism industry. Tourism is an interesting topic, since most people have traveled somewhere in their lives. It falls in the “fun facts about travel” category, the kind of thing meant to drive traffic to The Traveler Zone. Overall, the graphic will draw interest, but it will lose that interest because of its confusing nature.

On the rehabilitated graphic, I will definitely need to focus on keeping the visualization interesting and playful for the mass media.

U.S. Tourism – Fixed

Let’s review all of the recommendations:

  1. Indicate the time-frame used, or if possible, use a more intuitive timeframe.
  2. Use a scale that makes sense for representing the number of arrivals for each country.
  3. If the title is not clear, properly label the values.
  4. State the data once, and one time only.
  5. Simplify the overall design.
  6. Use a lighter background and softer palette of colors.

One of the first decisions I made was to use the data for all of 2009 instead of only the first nine months. There’s no reason to cut the data off after nine months. It does change our data set a bit:

Old Data: Jan–Sep 2009 | Arrivals | New Data: All of 2009 | Arrivals
CANADA | 14,043,658 | CANADA | 17,958,000
MEXICO | 4,295,124 | MEXICO | 6,023,225
JAPAN | 2,169,716 | JAPAN | 2,918,268
GERMANY | 1,263,344 | GERMANY | 1,686,825
FRANCE | 930,265 | FRANCE | 1,204,490
BRAZIL | 613,347 | BRAZIL | 892,611
KOREA, SOUTH | 560,405 | ITALY | 753,310
ITALY | 558,594 | KOREA, SOUTH | 743,846
PRC & HONG KONG | 488,484 | PRC & HONG KONG | 640,840
SPAIN | 451,530 | SPAIN | 596,766
INDIA | 447,079 | INDIA | 549,474
IRELAND | 296,771 | COLOMBIA | 424,526
COLOMBIA | 288,439 | IRELAND | 411,203
ISRAEL | 237,818 | SWEDEN | 324,417
Total Top 20 | 31,372,928 | Total Top 20 | 41,517,674
All Worldwide Arrivals | 35,990,071 | All Worldwide Arrivals | 47,737,409

Overall, it does not make that much of a difference in terms of rank or relative value.  A handful of countries swapped places, and Israel was replaced by Sweden as the 20th country.  The most important part is that we now have a consistent, logical data set.
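To check that switching to full-year data barely matters, here is a minimal Python sketch that diffs the two columns. It uses only the rows listed in the table above, so the positions it reports are row positions within that table, not official top-20 ranks. The top-20 share uses the table’s own totals.

```python
# Rows exactly as listed in the comparison table. Positions are row order within
# these listed rows only, not official top-20 ranks.
OLD_ROWS = [  # Jan-Sep 2009
    ("CANADA", 14_043_658), ("MEXICO", 4_295_124), ("JAPAN", 2_169_716),
    ("GERMANY", 1_263_344), ("FRANCE", 930_265), ("BRAZIL", 613_347),
    ("KOREA, SOUTH", 560_405), ("ITALY", 558_594), ("PRC & HONG KONG", 488_484),
    ("SPAIN", 451_530), ("INDIA", 447_079), ("IRELAND", 296_771),
    ("COLOMBIA", 288_439), ("ISRAEL", 237_818),
]
NEW_ROWS = [  # All of 2009
    ("CANADA", 17_958_000), ("MEXICO", 6_023_225), ("JAPAN", 2_918_268),
    ("GERMANY", 1_686_825), ("FRANCE", 1_204_490), ("BRAZIL", 892_611),
    ("ITALY", 753_310), ("KOREA, SOUTH", 743_846), ("PRC & HONG KONG", 640_840),
    ("SPAIN", 596_766), ("INDIA", 549_474), ("COLOMBIA", 424_526),
    ("IRELAND", 411_203), ("SWEDEN", 324_417),
]

def position_changes(old_rows, new_rows):
    """Map each country whose row position changed to (old_position, new_position)."""
    old_pos = {name: i for i, (name, _) in enumerate(old_rows, start=1)}
    new_pos = {name: i for i, (name, _) in enumerate(new_rows, start=1)}
    return {name: (old_pos[name], new_pos[name])
            for name in old_pos
            if name in new_pos and old_pos[name] != new_pos[name]}

def dropped_and_added(old_rows, new_rows):
    """Return (countries only in the old list, countries only in the new list)."""
    old_names = {name for name, _ in old_rows}
    new_names = {name for name, _ in new_rows}
    return sorted(old_names - new_names), sorted(new_names - old_names)

# Share of all worldwide arrivals captured by the top 20, from the table's totals.
old_share = 31_372_928 / 35_990_071
new_share = 41_517_674 / 47_737_409

print(position_changes(OLD_ROWS, NEW_ROWS))
print(dropped_and_added(OLD_ROWS, NEW_ROWS))
print(f"Top-20 share: {old_share:.1%} (Jan-Sep) vs. {new_share:.1%} (full year)")
```

Only South Korea/Italy and Ireland/Colombia trade places, Israel drops out for Sweden, and the top 20 account for roughly 87% of all arrivals either way (87.2% for January through September, 87.0% for the full year).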

Next, I played around with the data to get a feel for it. So I plotted a trusty bar chart, because bar charts don’t lie:

Sketch 1 – Bar chart goodness

The first thing that strikes me in the bar chart in Sketch 1 is that Canada is by far in first, followed by Mexico and the United Kingdom in a distant second and third. This range will make it difficult to show the smaller values in detail while still fitting the larger values.
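To see why, here is a minimal Python sketch of the linear scaling a bar chart uses. The 500-pixel maximum bar length is a hypothetical chart width chosen for illustration; the values are the full-year 2009 figures for Canada, Mexico, and Sweden (the smallest row) from the table above.

```python
# Full-year 2009 arrivals for the largest, second-largest, and smallest listed rows.
ARRIVALS_2009 = {"CANADA": 17_958_000, "MEXICO": 6_023_225, "SWEDEN": 324_417}

def bar_lengths(values, max_px=500):
    """Scale values linearly so the largest value gets a bar of max_px pixels.

    max_px=500 is an arbitrary, hypothetical chart width chosen for illustration.
    """
    largest = max(values.values())
    return {name: max_px * v / largest for name, v in values.items()}

for name, px in bar_lengths(ARRIVALS_2009).items():
    print(f"{name:7s} {px:6.1f} px")
```

With Canada at 500 pixels, Mexico gets about 168 pixels and Sweden only about 9, which is too short to compare meaningfully with its neighbors.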

Next, I thought a waffle chart with a total of 1,000 squares would work. This attempt failed miserably. It was so bad that I stopped in the middle of making it, because I could see it would get me absolutely nowhere.

Sketch 2 – This was just headed for disaster, so I stopped.

Aside from my terrible choice of colors, I did not see this leading to anything clear or interesting. I got confused making it! Time to rethink this visualization.
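The mechanical part of a waffle chart, turning values into a whole number of squares that still adds up to 1,000, is easy; readability is the hard part. Here is a minimal Python sketch of the mechanical step using the largest-remainder method (my choice for illustration, not necessarily what any charting tool does). The three-category split is a simplification built from the full-year 2009 figures in the table above.

```python
def waffle_squares(values, total_squares=1000):
    """Allocate total_squares among categories in proportion to their values.

    Largest-remainder method: each category first gets the floor of its exact share,
    and the leftover squares go to the categories with the largest remainders.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    grand_total = sum(values.values())
    floors, remainders = {}, {}
    for name, value in values.items():
        floors[name], remainders[name] = divmod(value * total_squares, grand_total)
    leftover = total_squares - sum(floors.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        floors[name] += 1
    return floors

# Canada and Mexico (full-year 2009) against everyone else, using the worldwide total
# from the table: 47,737,409 - 17,958,000 - 6,023,225 = 23,756,184.
print(waffle_squares({"CANADA": 17_958_000, "MEXICO": 6_023_225,
                      "EVERYONE ELSE": 23_756_184}))
```

This gives Canada 376 squares, Mexico 126, and everyone else 498. If you split “everyone else” into individual countries, a country the size of Sweden (324,417 arrivals) gets only about 7 of the 1,000 squares, which makes the smaller countries hard to tell apart.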

Then, I thought back to some of my favorite visualizations that I’ve seen.  I began to wonder if I could do something like the Avengers network graph.

Awesome looking Avengers network graph.

My idea was to group the countries by region and then have each region flow into the United States. I thought I might be able to modify the Avengers code, or at least use it as a guide for writing my own. When I looked at the underlying code, I realized it was too many steps beyond my current skill level. It uses d3, along with Bootstrap and some custom JavaScript. This is something I aspire to be able to create in the future, but right now, it’s just a dream.

I point out that failed attempt because it gave me the idea to break the data into geographic regions. First, I sketched it out with old-fashioned pen and paper.
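Grouping by region is a simple aggregation. Here is a minimal Python sketch using the full-year 2009 rows listed in the table above. The region assignments are a hypothetical grouping for illustration; the regions used in the final graphic may differ.

```python
from collections import defaultdict

# Full-year 2009 arrivals for the countries listed in the comparison table.
ARRIVALS_2009 = {
    "CANADA": 17_958_000, "MEXICO": 6_023_225, "JAPAN": 2_918_268,
    "GERMANY": 1_686_825, "FRANCE": 1_204_490, "BRAZIL": 892_611,
    "ITALY": 753_310, "KOREA, SOUTH": 743_846, "PRC & HONG KONG": 640_840,
    "SPAIN": 596_766, "INDIA": 549_474, "COLOMBIA": 424_526,
    "IRELAND": 411_203, "SWEDEN": 324_417,
}

# Hypothetical region assignment, for illustration only.
REGION = {
    "CANADA": "North America", "MEXICO": "North America",
    "BRAZIL": "South America", "COLOMBIA": "South America",
    "GERMANY": "Europe", "FRANCE": "Europe", "ITALY": "Europe",
    "SPAIN": "Europe", "IRELAND": "Europe", "SWEDEN": "Europe",
    "JAPAN": "Asia", "KOREA, SOUTH": "Asia", "PRC & HONG KONG": "Asia", "INDIA": "Asia",
}

def totals_by_region(arrivals, region_of):
    """Sum arrivals per region; countries missing from region_of go to 'Unassigned'."""
    totals = defaultdict(int)
    for country, count in arrivals.items():
        totals[region_of.get(country, "Unassigned")] += count
    return dict(totals)

# Print regions from largest to smallest total.
for region, total in sorted(totals_by_region(ARRIVALS_2009, REGION).items(),
                            key=lambda item: item[1], reverse=True):
    print(f"{region:14s} {total:>12,}")
```

With these assumed groupings, North America (23,981,225) dwarfs Europe (4,977,011), Asia (4,852,428), and South America (1,317,137), the same imbalance any regional layout has to handle.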

Sketch 2 – Sometimes, you need to do it by hand.

This gave me a feel for what it might look like.  It’s not pretty, but it was enough for me to commit to using this approach.  Next, using Gephi, I created a network diagram of the top 20, and I added supplemental data points to account for the rest of the world.

Sketch 3 – Country of Origin for Visitors to the United States in 2009

I like this. The countries are placed in familiar locations based on a Mercator map, but there is no actual map. The thickness of each line indicates the relative number of tourists. Canada, which is by far Number 1, has a thick line, while “Rest of Oceania” is fairly faint. It needs a little more description (like a title and legend), but I think this is heading in the right direction.
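For anyone who wants to try a similar approach, my understanding is that Gephi’s spreadsheet (CSV) import accepts an edges table with Source and Target columns plus an optional Weight column, and edge thickness can then follow the weight. The minimal Python sketch below writes such a table for a few rows of the full-year 2009 data. The single “Rest of World” node is my simplification (the graphic splits the remainder into regional nodes such as “Rest of Oceania,” whose values aren’t listed here); its weight comes from the table’s totals: 47,737,409 − 41,517,674 = 6,219,735.

```python
import csv
import io

# Full-year 2009 arrivals (a few rows from the comparison table, for brevity).
ARRIVALS_2009 = {"CANADA": 17_958_000, "MEXICO": 6_023_225,
                 "JAPAN": 2_918_268, "SWEDEN": 324_417}

TOP_20_TOTAL = 41_517_674
WORLDWIDE_TOTAL = 47_737_409

def gephi_edge_table(arrivals, destination="United States"):
    """Return CSV text with one directed edge per origin, weighted by arrivals.

    A single hypothetical 'Rest of World' node stands in for all arrivals
    from outside the top 20.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for origin, count in arrivals.items():
        writer.writerow([origin.title(), destination, count])
    writer.writerow(["Rest of World", destination, WORLDWIDE_TOTAL - TOP_20_TOTAL])
    return buffer.getvalue()

print(gephi_edge_table(ARRIVALS_2009))
```

Saving the printed text as a .csv file gives an edge list with one weighted edge per origin, all pointing at the United States node.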

I prefer to keep my text to a minimum, and let legends and titles give a lot of the context.  With that in mind, here is the final version of the network graph.

Sketch 4 – Final – Titles and a Legend Help

The differences are subtle: namely, a title that gives the worldwide number and makes it clear we are talking about where people came from. I also added a legend showing the lines for the most and the fewest visitors. This creates mental bounds that the reader can use to judge the relative flow of people. Ultimately, I decided this was best because the reader does not care about the exact numbers, but about the relative scale. At a glance, the reader can see that Canada’s line is the thickest, with Mexico, the UK, and Japan as the next biggest contributors.

I knew it would be a challenge to find a visualization technique that could show the great range of values in the data set. In my handmade sketch, I had the area of each node represent the volume. For the final graph in Gephi, one of the decisions I had to make was whether the nodes would be scaled to the number of visitors from each country, or whether that information would be encoded in the edges connecting the nodes. Ultimately, I chose the edges, because proportional nodes would have resulted in barely visible nodes for the low-value countries.
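To make the trade-off concrete, here is a minimal Python sketch comparing node sizing with edge widths, using Canada and Sweden (the largest and smallest rows in the table above). The 40-pixel maximum node diameter and the 1 to 20 pixel edge range are hypothetical values chosen for illustration, not Gephi defaults.

```python
import math

LARGEST, SMALLEST = 17_958_000, 324_417  # Canada and Sweden, full-year 2009
MAX_NODE_PX = 40.0  # hypothetical diameter for the largest node

def node_diameter(value, largest, max_px=MAX_NODE_PX, area_proportional=True):
    """Diameter of a node sized from value, relative to the largest node.

    area_proportional=True makes circle *area* proportional to value (diameter ~ sqrt);
    False makes the diameter itself proportional to value.
    """
    ratio = value / largest
    return max_px * (math.sqrt(ratio) if area_proportional else ratio)

def edge_width(value, smallest, largest, min_px=1.0, max_px=20.0):
    """Linearly map value onto [min_px, max_px] (a hypothetical 1-20 px range).

    The min_px floor keeps the smallest flow visible, and the two end points are
    exactly what a 'most / fewest visitors' legend needs to show.
    """
    fraction = (value - smallest) / (largest - smallest)
    return min_px + fraction * (max_px - min_px)

print(f"Sweden node, area-proportional:     {node_diameter(SMALLEST, LARGEST):.1f} px")
print(f"Sweden node, diameter-proportional: "
      f"{node_diameter(SMALLEST, LARGEST, area_proportional=False):.1f} px")
print(f"Edge widths: {edge_width(SMALLEST, SMALLEST, LARGEST):.1f} to "
      f"{edge_width(LARGEST, SMALLEST, LARGEST):.1f} px")
```

With diameter proportional to value, Sweden’s node would be under a pixel wide; even area-proportional circles shrink it to about 5 pixels against Canada’s 40. The supplemental nodes like “Rest of Oceania” appear to be smaller still. A linear edge scale with a minimum width keeps every flow visible.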


Now, let’s see how the final image did against the recommendations:

  1. Indicate the time-frame used, or if possible, use a more intuitive timeframe.
    • Achieved! I used the data for all of 2009, and clearly indicated it in the title.  It should not be an issue for anyone to understand the timeframe.
  2. Use a scale that makes sense for representing the number of arrivals for each country.
    • Achieved! (Barely!) This was the hardest part. There were three orders of magnitude that needed to be properly visualized in this one-dimensional data set. I think the thick edge lines work, but only barely. If this were an interactive graphic, highlighting each path would be a clear improvement.
  3. If the title is not clear, properly label the values.
    • Achieved! The title makes it clear we are talking about where the people come from. Further, with proportional edge lines, I only really needed to label the minimum and maximum cases.
  4. State the data once, and one time only.
    • Achieved! I got rid of the flags, text, and values that reiterated information that people were likely not that concerned with.
  5. Simplify the overall design.
    • Achieved! There is only one visualization of the data, instead of two, and there is a minimal amount of text.  The title gives all of the context that is needed to understand what is going on in the data visualization.
  6. Use a lighter background and softer palette of colors.
    • Achieved! The white background was easy.  The colors are not actually that much different than the ones used in the original.  Because of the white background, they are much easier on the eye.

The thing I spent the most time on was trying to make something interesting and playful. I like to think this data visualization is interesting because it seems familiar to the reader (a map they have seen thousands of times), has alluring colors, and appears to be active. It’s an abstract map, if you will.

Another thing worth noting is that dark backgrounds should be used sparingly. A dark background worked really well for the Solar Eclipse Case Study, but that visualization had only two non-gray colors. The original visualization here had 12 different colors, and on a black background they look terrible. On mine, I used 11 colors, and they look much better on a white background.

Overall, this one was a challenge.  What I wanted to create was limited by my technical skills.  Ultimately, I went back to my skill set, and figured out how to make something interesting and informative.

Thank you for your time, and please leave any comments below. I would love to hear from you.