Learning path

This week I completed the last two courses of this autumn semester. Although I have always promoted “lifelong learning”, I had not really taken any courses in recent years, apart from some hands-on training in my field. After finishing my studies I started teaching, and preparing those courses kept me up to date. This autumn, however, I started taking classes again.

MOOCs

This autumn I discovered the Massive Open Online Course (MOOC). In total I finished four courses: three at Coursera and one at the Knight Center for Journalism in the Americas. At the start it felt a bit strange. Would I still be able to study, do homework on a regular basis, and pass the quizzes? After two weeks that fear had completely disappeared. Lectures delivered as a series of videos of about 12 to 15 minutes each work well (about one hour per course each week). The weekly assignments were sometimes multiple choice on the course material, but they also involved a lot of maths, drawing, building, and designing. I even managed to build a prototype of a juicer!

design assignment

In the end I completed four courses. These are the courses I took:

  • Model Thinking, coursera, Scott E. Page, University of Michigan
    “Why do models make us better thinkers? Models help us to better organize information – to make sense of that fire hose or hairball of data (choose your metaphor) available on the Internet. Models improve our abilities to make accurate forecasts. They help us make better decisions and adopt more effective strategies. They even can improve our ability to design institutions and procedures.”
  • An Introduction to Operations Management, coursera, Christian Terwiesch, University of Pennsylvania
    “In short, you will learn how to analyze business processes and how to improve them.”
  • Design: Creation of Artifacts in Society, coursera, Karl T. Ulrich, University of Pennsylvania
    “The course marries theory and practice, as both are valuable in improving design performance. Lectures and readings will lay out the fundamental concepts that underpin design as a human activity.”
  • Introduction to Infographics and Data Visualization, Knight Center, Alberto Cairo, University of Miami’s School of Communication
    “How to work with graphics to communicate and analyze data.”

Although the courses look very different, together they formed a coherent learning path. There were moments when ideas from one course came up while I was struggling with another. For example, when I had to make a presentation on business processes for the Operations Management course, I chose to create an infographic to present the data and the outcomes. The Model Thinking course, on the other hand, helped a lot with organizing data and with exploring new ways of thinking.

I can use all four courses in my professional work, although that was not the original goal. I also started using a number of new tools to make and organize my course notes. Among these are Evernote and Tableau Public, tools I had not used before and that have proven very valuable. For tools I have used for a long time, like Freemind and Inkscape, I found new ways of applying them.

Plans for 2013

For 2013 I have subscribed to new courses: Computing for Data Analysis, Game Theory, and Creative Programming for Digital Media & Mobile Apps. I have also started working on teaching materials for my own online course, an introduction to GIS and geospatial data. For this course I have started on Udemy, a platform for online courses on a wide range of subjects.

So… online courses and lifelong learning: it will continue.


Final Assignment: Baller Gets 8 Years Extra

My final assignment for the MOOC on Infographics and Data Visualization is finished. Last week I collected all the data; this week was about filling in the details and the design. And it proved to be hard. Collecting data and presenting it is one thing, but an infographic needs that little extra, plus the storytelling part. All in all, I struggled.

The general story, as I wrote in last week’s post, is that Statistics Netherlands published a new dataset on life expectancy and income classes: a nice subject with a number of good, juicy details. The data is from last year, but the institute has been collecting this kind of data for a long time. This allowed me to do something extra with it: to show how life expectancy has changed since the late nineteenth century.

In the end my infographic has four images:

  • Life Expectancy for Men, for the year 2011 in the age range of 0 to 80.
  • Life Expectancy for Women, same as above
  • A detail showing one of the findings based on the analysis of the dataset
  • A historical overview of Life Expectancy in 25 year blocks from 1875 to 2000

I first prepared my datasets in Gnumeric. The data needed some cleaning: not all columns were needed, and I wanted to separate the data for males and females. I also struggled for a while with the income classes. The dataset divides the population into five classes: lower, lower middle, middle, upper middle, and upper. All other Statistics Netherlands datasets use ten income classes, and these are not divided linearly. It took quite some work to make them match. In the end I used a histogram function based on the standard deviation and a normal distribution. The book “Making History Count” by Feinstein and Thomas, which I once used in my teaching, was a great help here.
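The post does not spell out the exact calculation, so the following is only a minimal Python sketch of one way to reconcile income groups of unequal size with five classes. It assumes the five classes are cut at −1.5, −0.5, 0.5 and 1.5 standard deviations of a normal distribution, and that each income group is described by the cumulative population share at its top; both the cut-offs and that input format are assumptions for illustration. Each group is assigned to the class that contains the midpoint of its population share.

```python
from statistics import NormalDist

# The five classes named in the dataset.
CLASSES = ["lower", "lower middle", "middle", "upper middle", "upper"]

# Assumed class boundaries, in standard deviations from the mean.
# These cut-offs are illustrative, not taken from the post.
Z_CUTS = [-1.5, -0.5, 0.5, 1.5]


def class_shares():
    """Cumulative population share at each cut-off under a standard normal."""
    nd = NormalDist()  # mean 0, standard deviation 1
    return [nd.cdf(z) for z in Z_CUTS]


def map_groups_to_classes(upper_bounds):
    """Map income groups (numbered from 1) to the five classes.

    upper_bounds: cumulative population share at the top of each group,
    in increasing order and ending at 1.0 (an assumed input format).
    Groups need not be equal in size.
    """
    cuts = class_shares()
    mapping = {}
    lower = 0.0
    for number, upper in enumerate(upper_bounds, start=1):
        midpoint = (lower + upper) / 2
        # The class index is the number of cut-offs below the midpoint.
        index = sum(1 for c in cuts if midpoint >= c)
        mapping[number] = CLASSES[index]
        lower = upper
    return mapping
```

For ten equal-sized deciles, `map_groups_to_classes([i / 10 for i in range(1, 11)])` puts the bottom decile in the lower class, the top decile in the upper class, and deciles 4 to 7 in the middle class. Unequal groups only change the `upper_bounds` list.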

After importing the data into Tableau Public I started thinking about the presentation. I wanted to show the whole range, from the life expectancy of a newborn child to that of an 80-year-old, and I wanted to show the five different classes. After first making a line diagram, I ended up changing my data points into rings, hoping that this would make the overlapping data for the classes more visible. I partly succeeded in this.

[Image: Male]

What is not immediately visible is that for a newborn it is best to be born into an upper-class family, but from the age of 55 the middle class and upper middle class do better. I therefore included a detail showing this finding.
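A crossover like this can also be checked in the data rather than by eye. A minimal Python sketch, assuming the figures for two income classes are available as parallel lists of ages and remaining life expectancy (the function name and data layout are hypothetical). It returns the earliest age from which the second class stays ahead until the end of the series:

```python
def age_from_which_ahead(ages, upper, other):
    """Return the earliest age from which `other` stays above `upper`.

    ages, upper, other: parallel lists (age, and the life expectancy of
    each class at that age). Returns None if `other` is not ahead at
    the last age in the series.
    """
    result = None
    # Walk backwards from the oldest age while `other` is still ahead.
    for age, u, o in reversed(list(zip(ages, upper, other))):
        if o > u:
            result = age
        else:
            break
    return result
```

Walking backwards means a brief lead at a young age is not mistaken for the lasting crossover that the detail image is meant to show.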

[Image: Detail 1]

Finally, I created six time series showing similar data ranges, this time not for the different income groups but for males and females. I chose six moments in time, 25 years apart, from 1875 to 2000.
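The six snapshot years can be generated rather than typed. A small Python sketch, assuming each historical series is stored as a `{year: value}` mapping (a hypothetical layout, not the format of the Statistics Netherlands files):

```python
# 1875, 1900, 1925, 1950, 1975, 2000: six moments, 25 years apart.
SNAPSHOT_YEARS = list(range(1875, 2001, 25))


def select_snapshots(series, years=SNAPSHOT_YEARS):
    """Keep only the snapshot years present in a {year: value} mapping."""
    return {year: series[year] for year in years if year in series}
```

Years missing from a series are simply skipped, so a gap in the historical data shows up as a missing panel rather than an error.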

[Image: Over the Ages]

I exported all the graphs to PDF, and then the real work on the presentation started. For this I used Inkscape, which is comparable to Illustrator but open source. One of the tricks here is to work with layers, separating the different groups such as graphs, images, text, and background, and to set guides before laying out the blocks. Since my first version was still dull, I decided to make a second version with some images in the historical series.

And the result? I am happy with it, but there is certainly room for a lot of improvement. My goal was to learn more about infographics in general. I succeeded in that: I can now at least see what needs to be done.

[Image: Final Assignment EK2012]

http://www.elwink.nl/infographics/FinalAssignment.pdf

For all of you reading this blog and wondering… Alberto Cairo and the Knight Center for Journalism in the Americas are starting a new course in January 2013!

Thinking about data visualization

Earlier this week we received the final assignment for the MOOC on infographics and data visualization. Alberto does not spare his students, writing: “This time, I am giving you the freedom to do whatever you want.” My first reaction was slight jubilation: everything is possible. Then come seven steps, starting with writing a headline and gathering the data, and ending with getting the results back. While commuting to work I saw a small headline in one of the free newspapers: “Well-off people live longer in good health”.

Statistics Netherlands (CBS in Dutch) collects and processes data “in order to publish statistics to be used in practice”. Their website has a nice series of interactive infographics, and years ago they were among the first to add a web-mapping interface to their StatLine website. I have used their data in many of my GIS classes, so it is a very useful source of wonderful data. But let me return to the assignment and its first step: finding a headline.

life expectancy based on data from cbs.nl

A Simple Headline … but Tease the Information

This week someone on one of my Twitter lists posted a link to a webinar: How to Write Headlines for the Web. After watching it I understood that my headline could make or break my great story. On the web, a good story without a catchy headline will be read less. A headline should contain big numbers, and they should be easily digestible. Wow… for a non-journalist this is quite a challenge. And according to Alberto Cairo I need to “Try to find a focus, a headline.” The webinar uses one of my favorite techniques: free association, keeping the main question in the back of my mind: what is the story about? Apart from a blank sheet of paper, I often use a mind-mapping tool (in my case the fabulous Freemind) for this process.

The general scope given by Statistics Netherlands is: “Men and women from high-income households on average live about 8 and 7 years longer respectively than their counterparts in low-income households.” Far too long for a headline. In Dutch we have a proverb, “Riches alone make no man happy” (or “Money isn’t everything”), and this is where my associations led. That left me with a number of keywords: riches, happiness, long life, income, and 7 and 8 years. What about “How to earn an 8 years longer life?” Is it simple enough? There is another great tool that I love to browse: the Urban Dictionary. “Wealthy” gets many hits there: moneybags, ballers. So… “Baller gets 8 years extra”?

Gather the data … Combinations and Context

The next step is to think about the story I want to tell. In this case I will focus on the Dutch data first. My mind wanders on: it would be nice to get data for another country, and there must also be historical data on this subject. The Dutch Economic-Historical Archive (NEHA) has this kind of data, and Statistics Netherlands has data going back to 1899 in its historical series. While we were talking about the subject over dinner, my son brought up a fact from his history class: the average life expectancy of a worker in Manchester during the Industrial Revolution was very low (an average of 17 years).

Another idea is to link the data to lifestyle. There is a lot of data on that subject too, but it is more difficult, and the context may be much harder to provide. Statistics Netherlands also mentions good health and good mental health. These may be subjects to include, but not as the main subject of this assignment.

So the plan…

What is the story I want to tell? There is enough historical data available. I want to tell the story of the rich, the poor, and the middle class at different moments in time: the turn of the 20th century, the crisis of the 1930s, the postwar period, the late 1970s when many patterns changed, and now the 21st century. This approach will tell many stories: about prosperity, the working class, history, and many, many of the social elements that make a culture.

My story will be about culture and people, based on historical statistics. Now the next step is to think about the form.

This week’s assignment: data manipulation

This week’s assignment in the MOOC on infographics and data visualization by Alberto Cairo is about maps. From his forthcoming book we have to read the fourth chapter, on cartography for journalists, or as the chapter title reads: Thematic Maps, Statistics and Cartography Meet. Like his earlier book, The Functional Art, this chapter is well written, with many great visuals.[1] Alberto Cairo describes thematic maps as “the purest and most successful form of information graphics”, and I certainly do not disagree. And the assignment? It is to use data from the US Bureau of Labor Statistics to show unemployment in the US, in a similar way to the story on US unemployment published by The Guardian’s Data Blog: http://www.guardian.co.uk/news/datablog/interactive/2011/sep/08/us-unemployment-obama-jobs-speech-state-map, but with more functionality and depth.

With Tableau, TileMill, or even ESRI ArcGIS Online, this is not a big task: the data is easily accessible and well organized. But this is exactly where standard mapping differs from making infographics, and I am quite sure that our teacher does not want a standard map. For the first assignment I created a map, and Alberto commented: “However, it doesn’t improve the original as much as it could. The reason is that you are forcing me to click on each country to get the data, rather than giving me the opportunity to explore the data in different ways, such as creating rankings, comparisons between countries, etc.”. So no standard map this time, but something that focuses on exploration.

I decided to focus on two of the questions that are raised in the assignment:

  • What kind of graphs or maps would you need to tell a compelling story based on this?
  • How would you give context to the data?

When it comes to unemployment during Obama’s first term, some nice infographics on this subject were already published in the run-up to the elections.

My approach: Geo Tagging

The latest data available is from September 2012, and looking at it you immediately see big differences between states. Montana, Wyoming, North and South Dakota, and Nebraska stand out with low values. From my time there I know the views of the vast plains, the emptiness; the number of people per square kilometre must be very low. On the other hand, there is the giant peak in California, and there are higher (but not peak) values for states along the East Coast, for states with big cities like Chicago and Detroit, and for states like Texas and Florida. My first impression of the dataset is that it is not normalized for population density.

To map the data and show the relation between population and unemployment, the data must be geotagged. Since the data is ordered by state and the state codes are given, this is not a difficult task. MaxMind offers a nice table of states with their latitude and longitude. By combining this table with the given dataset I now at least have georeferenced point data, so each state can be mapped as a centroid.
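The join itself is a simple lookup on the two-letter state code. A minimal Python sketch, assuming the BLS rows have been read into dicts with a `state` key and the centroid table into a `{code: (lat, lon)}` mapping; these field names and layouts are hypothetical, not those of the actual MaxMind or BLS files:

```python
def geotag(rows, centroids):
    """Attach latitude/longitude to each row via its two-letter state code.

    rows: list of dicts with a 'state' key (e.g. rows from the BLS table).
    centroids: {state_code: (lat, lon)}, e.g. loaded from a state table.
    Returns (tagged, missing): rows with 'lat'/'lon' added, and rows whose
    code had no centroid, so unmatched codes can be checked by hand.
    """
    tagged, missing = [], []
    for row in rows:
        coords = centroids.get(row["state"])
        if coords is None:
            missing.append(row)
        else:
            lat, lon = coords
            # Copy the row so the original data is left untouched.
            tagged.append({**row, "lat": lat, "lon": lon})
    return tagged, missing
```

Returning the unmatched rows separately makes it easy to spot codes such as territories or typos that the centroid table does not cover.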

Then I started to play with Tableau Public. Its map option comes with several settings and preloaded datasets, such as population, population by race, and occupations.

What strikes me is that population density and racial mix both seem to affect the figures. States with many big cities appear to have higher unemployment. So a first step would be to map the data against population density.

The Census Bureau Data

The United States Census Bureau has a nice dataset from the census of April 2010. Although the census figures cover the full population, including people who are too young to work or who are retired, the picture already changes with the first quick layout. Rhode Island, which at first was a small dot, suddenly becomes one of the largest. California, on the other hand, which seemed to have an incredibly high unemployment figure, becomes an average player. All in all, the differences between states become smaller when unemployment is expressed as a percentage of the total population.
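The normalization behind this change is a single division per state. A minimal Python sketch, assuming unemployed counts and census populations are held in `{state_code: count}` mappings (a hypothetical layout). The ranking helper reflects the kind of exploration Alberto asked for in his earlier comment:

```python
def share_of_population(unemployed, population):
    """Unemployed persons as a percentage of the total state population.

    Both arguments are {state_code: count} mappings. The census total
    includes children and retirees, so this is not the official
    unemployment rate; it only makes states comparable with each other.
    States without a (positive) population figure are skipped.
    """
    return {
        state: 100.0 * unemployed[state] / population[state]
        for state in unemployed
        if population.get(state, 0) > 0
    }


def rank(values):
    """State codes sorted from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)
```

A state with a large absolute count but an even larger population drops down the ranking, which is the effect described above for California.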

Conclusion and the final result

So, “to tell a compelling story”, I made an interactive infographic with the map as the main element on the page. On the map you can click a state to see its unemployment figures for 10 periods in time: at the start of Obama’s first term, and then each year, until just before his re-election. The context is that data is open to multiple interpretations, even when everyone knows the facts. By leaving out specific details, “data manipulation” becomes a term with a double meaning.

PS: Did I tell you that registration for the second course, starting in January, is open? You can subscribe here: Knight Center for Journalism in the Americas’ Distance Learning program.

[1] In the first version this paragraph read: “From his book we have to read the fourth chapter on Cartography for Journalists, or as the chapter title reads: Thematic Maps, Statistics and Cartography Meet. The book is well written with many great visuals and the same is true in this chapter.”

Data, Information, Knowledge … and Wisdom

Last week the MOOC on Infographics and Data Visualization at the Knight Center for Journalism in the Americas started, and I am one of the 2000 lucky students. About 8 years ago my former employer dropped a book on my desk. It was mouth-watering, and I read it in one go. It dealt with data processing and information visualization in a way that I, as a computer scientist and art historian, could understand very well. That book by Edward Tufte has been a source of inspiration for many lectures and thoughts while working with, in my case, the presentation of geographical data.

One of my main reasons for taking the course is that I see the importance of data visualization. I am neither a journalist nor a professional designer, yet I want to visualize my data analyses: for example, data about assets in a geographic information system, presented the way Stephen Few describes it, as meaningfully decoded data in which both the nature of the data and the relationships between the different objects are clear. I want to present this data in a way that people without a geoinformatics background can understand.

Data information knowledge model

In his first lecture on information visualization last week, Alberto Cairo introduced us to the data-information-knowledge model and to ways of analyzing the ongoing stream of infographics being produced. One of his references is a chapter on data visualization by Stephen Few. Few writes: “The goal is to translate abstract information into visual representations that can be easily, efficiently, accurately, and meaningfully decoded.” In information processing for Geographical Information Systems we often pursue the same goals.

One of the future directions Few mentions in his text is: “The integration of geo-spatial and network displays (such as node and link diagrams) with other forms of display for seamless interaction and simultaneous use.” And that is exactly why that book landed on my desk 10 years ago. I believe this integration is a very obvious one, but we should be careful. Geo data looks very “sexy”, and many infographic designers tend to use maps as a background or, whenever a location is given, to map the data to that location. That brings me to one of Stephen Few’s questions: “Is it obvious how people should use the information?”

In last week’s discussion of a map presented by Alberto Cairo, the course instructor, the use of the map as a background played an important role in many people’s responses. The map presented information on Internet use for several countries. Some responses argued that everyone knows where specific countries are on the map, so why tie a chart to a location? At the level of countries or continents I can understand that argument, but in many cases we work with data on a smaller scale. When it comes to statistics on your assets, the map is an excellent carrier of information.

Location data

Probably more than 90% of the data in a geodatabase has nothing to do with the map directly. It does not contain X, Y, or Z coordinates itself, but it is linked to other tables that do have a location. For example, a postal code can link customer data to a specific area, and in this way we can enrich the data. In the last few years a number of companies have shown us wonderful examples of how to do this. As a result, much of the data available in information systems can now be linked in one way or another to a specific location.
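The postal-code link works like a table join: the customer record itself has no coordinates, but its postal code points to a row that does. A minimal Python sketch, assuming customers are dicts with a `postcode` key and that a separate `{postcode: area}` table exists (both hypothetical; in a real geodatabase the area would carry the geometry):

```python
def enrich_with_area(customers, postcode_areas):
    """Link customer records to an area through their postal code.

    customers: list of dicts with a 'postcode' key (no coordinates).
    postcode_areas: {postcode: area}, the table that holds the location.
    Customers whose postal code is unknown get area None.
    """
    enriched = []
    for customer in customers:
        area = postcode_areas.get(customer["postcode"])
        # Copy the record and add the location link.
        enriched.append({**customer, "area": area})
    return enriched
```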

If you have a loyalty card from your local supermarket, data is collected on the products you buy. Different queries can be run on this raw data, for example on price ranges or product types. None of this data has a location component: it is products, prices, and quantities. Based on this data we can run wonderful statistics. We add extra value to the dataset when we combine these statistics with the consumer’s postal code. Suddenly we start to see patterns, for example in age categories within a certain area of town.
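Combining purchase statistics with the postal code amounts to grouping by location. A minimal Python sketch, assuming purchases are dicts with `customer_id`, `price` and `quantity`, and that a customer table gives each customer’s postal code and age group (all field names hypothetical):

```python
from collections import defaultdict


def spend_by_area_and_age(purchases, customers):
    """Total spend per (postal code, age group).

    purchases: list of dicts with 'customer_id', 'price', 'quantity'.
    customers: {customer_id: {'postcode': ..., 'age_group': ...}}.
    """
    totals = defaultdict(float)
    for p in purchases:
        c = customers[p["customer_id"]]
        # The purchase itself has no location; the customer link adds it.
        totals[(c["postcode"], c["age_group"])] += p["price"] * p["quantity"]
    return dict(totals)
```

The resulting totals per postal code and age group are exactly the kind of table that can then be mapped to reveal patterns within a town.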

The next step, and here I refer back to Few’s future direction, is to turn the information from the different queries into knowledge. Visualizations based on the example above may add value, but only when the data is easily and efficiently available, and easy to read and interpret.

This is exactly where we can learn from the designers who work in newspaper offices. Besides the fun, this is a reason for me to take the course on infographics and data visualization.