Learning path

This week I concluded the last two courses of this autumn semester. Although I have always promoted “lifelong learning”, I had not really taken any courses in recent years, apart from some hands-on training in my field. After finishing my studies I started to teach, and preparing those courses always kept me up to date. But this autumn I started taking classes again.

MOOCs

This autumn I discovered the Massive Open Online Course. In total I have finished four courses, three at Coursera and one at the Knight Center for Journalism in the Americas. At the start it felt a bit strange. Would I still be able to study, do homework on a regular basis, and pass the quizzes? After two weeks this fear was completely gone. Classes delivered as a series of video lectures of about 12 to 15 minutes each work well (in total about 1 hour per course each week). The weekly assignments were sometimes multiple-choice quizzes on the course material, but there were also assignments with a lot of maths, drawing, building, and designing. I even managed to build a prototype of a juicer!

design assignment

In the end I took four courses, and although the subjects were all different I see a very nice learning path. Let me tell you which courses I took:

  • Model Thinking, Coursera, Scott E. Page, University of Michigan
    “Why do models make us better thinkers? Models help us to better organize information – to make sense of that fire hose or hairball of data (choose your metaphor) available on the Internet. Models improve our abilities to make accurate forecasts. They help us make better decisions and adopt more effective strategies. They even can improve our ability to design institutions and procedures.”
  • An Introduction to Operations Management, Coursera, Christian Terwiesch, University of Pennsylvania
    “In short, you will learn how to analyze business processes and how to improve them.”
  • Design: Creation of Artifacts in Society, Coursera, Karl T. Ulrich, University of Pennsylvania
    “The course marries theory and practice, as both are valuable in improving design performance. Lectures and readings will lay out the fundamental concepts that underpin design as a human activity.”
  • Introduction to Infographics and Data Visualization, Knight Center, Alberto Cairo, University of Miami’s School of Communication
    “How to work with graphics to communicate and analyze data.”

Although the courses all look very different, I found a very nice learning path in them. There were moments when ideas from one course came up while I was struggling with another. For example, when I needed to make a presentation on business processes for the Operations Management course, I chose to create an infographic to present the data and the outcomes. The Model Thinking course, on the other hand, helped a lot with organizing data and with exploring a way of thinking.

In my professional work I can make use of all four courses, although that was not the goal at first. I have also used a number of new tools to make and organize my course notes. Among these are Evernote and Tableau Public, tools that I had not used before and that have proven very valuable. For other tools that I have been using for a long time, like Freemind and Inkscape, I have found new ways of applying them.

Plans for 2013

For 2013 I have signed up for new courses: Computing for Data Analysis, Game Theory, and Creative Programming for Digital Media & Mobile Apps. In addition, I have started to work on my teaching materials to create my own on-line course, an introduction to GIS and geospatial data. For this course I have started on Udemy, a platform for on-line courses in a wide range of subjects.

So… On-line courses and lifelong learning, it will continue.

Final Assignment: Baller Gets 8 Years Extra

My final assignment for the MOOC on Infographics and Data Visualization is finished. Last week I collected all the data; this week was about filling in the details and the design. And it proved to be hard. Collecting data and presenting it is one thing, but an infographic should have that little bit extra. Plus it should have a storytelling element. All in all I struggled.

The general story, as I wrote in my blog last week, is that Statistics Netherlands posted a new dataset on life expectancy and income classes. A nice subject with a number of good and juicy details. The data is from last year, but the institute has been collecting this kind of data for a long time. This allowed me to do something extra with the data, namely to show how life expectancy has changed over the last centuries.

In the end my infographic has 4 images.

  • Life Expectancy for Men, for the year 2011 in the age range of 0 to 80.
  • Life Expectancy for Women, same as above
  • A detail showing one of the findings based on the analysis of the dataset
  • A historical overview of Life Expectancy in 25 year blocks from 1875 to 2000

I first prepared my datasets in gnumeric. The data needed some cleaning: not all columns were needed, and I wanted to separate the data for male and female. I also struggled for a while with the income classes. The dataset has a subdivision into 5 classes: lower, lower middle, middle, upper middle, and upper class. All other datasets from Statistics Netherlands have ten different classes, based on income, but they are not linearly divided. It was quite some work to get them to match. In the end I used a histogram function based on standard deviation and a normal distribution. The book “Making History Count” by Feinstein and Thomas, which I once used in my teaching, was a great help in this.
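To give an idea of what binning by standard deviation can look like, here is a minimal sketch. It is not the exact procedure I followed in gnumeric: the income values are invented, and the break points at 0.5 and 1.5 standard deviations around the mean are an assumption chosen only to end up with five classes.

```python
import bisect
import statistics

# The five class labels used in the dataset
LABELS = ["lower", "lower middle", "middle", "upper middle", "upper"]

def sd_class_breaks(values):
    """Return four break points at -1.5, -0.5, +0.5 and +1.5 standard
    deviations around the mean (an assumed choice of breaks)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [mean + k * sd for k in (-1.5, -0.5, 0.5, 1.5)]

def classify(value, breaks):
    """Map a value to one of the five labels using the sorted break points."""
    return LABELS[bisect.bisect_right(breaks, value)]

# Hypothetical representative incomes for the ten source classes
incomes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
breaks = sd_class_breaks(incomes)
for income in incomes:
    print(income, classify(income, breaks))
```

With breaks like these the outer classes stay small and most values land in the middle three, which is the behaviour you would expect when the underlying distribution is roughly normal.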

After importing the data into Tableau Public I started thinking about the presentation. I wanted to show the whole range, from the life expectancy of a newborn child to that of an 80-year-old. And I wanted to show the five different classes. After first making a line diagram I ended up changing my data points into rings, hoping that this would make the overlapping data for the classes more visible. I have partly succeeded in this.

Male

What is not immediately visible is that for a newborn child it is good to be born into an upper-class family, but after the age of 55 we see that the middle class and upper middle class do better. Therefore I have included a detail showing this conclusion.

Detail_1

Finally I have created 6 time series showing similar data ranges. This time not for the different income groups, but for male and female. I have chosen 6 different moments in time, each with a difference of 25 years, from 1875 until 2000.

Over_the_Ages

I exported all the graphs to PDF, and then the real play with the presentation started. For this I used Inkscape, which is comparable to Illustrator but open source. One of the tricks here is to work with layers, separating the different groups such as graphs, images, text, and background. I set guides to work with and then prepared blocks. Since my first version was still dull, I decided to make a second version with some images in the historical series.

And the result? I am happy with it, but there is certainly room for much improvement. My goal was to learn more about infographics in general. I succeeded in that; I can now at least see what needs to be done.

FinalAssignment_EK2012

http://www.elwink.nl/infographics/FinalAssignment.pdf

For all those reading this blog, and wondering… Alberto Cairo and the Knight Center for Journalism in the Americas start a new course in January 2013!

Thinking about data visualization

Earlier this week we received the final assignment for the MOOC on infographics and data visualization. Alberto does not spare his students, writing: “This time, I am giving you the freedom to do whatever you want.” My first reaction was a slight jubilation: everything is possible. Then we got 7 steps, starting with making a headline and gathering the data, and ending with getting the results back. While commuting to work I saw a small headline in one of the free newspapers: “Well-off people live longer in good health”.

Statistics Netherlands (in Dutch CBS) collects and processes data “in order to publish statistics to be used in practice”. Their website has a nice series of interactive infographics, and years ago they were among the first to introduce a web mapping interface to their statline website. For many of my GIS classes I have used their data. So it is a very useful source of wonderful data. But let me return to the assignment, and the first step: getting a headline.

life expectancy based on data from cbs.nl

A Simple Headline … but Tease the Information

This week someone in my tweet-lists posted a tweet on a webinar: How to Write Headlines for the Web. After watching the webinar I understood that my headline could make or break my great story. A good story without a catchy headline will be read less on the web. It should contain big numbers and they should be easily digestible. Wow… as a non-journalist this is quite a challenge. And according to Alberto Cairo I need to “Try to find a focus, a headline.” In the webinar one of my favorite techniques is used: free association, with the main question in the back of my mind: what is the story about? If I do not just start with a blank sheet of paper, I often use a mind mapping tool (in my case the fabulous Freemind) for this process.

The general scope given by Statistics Netherlands is: “Men and women from high-income households on average live about 8 and 7 years longer respectively than their counterparts in low-income households.” Far too long for a headline. In Dutch we have a proverb “Riches alone make no man happy” (or “Money isn’t everything”). This is what my associations led to, leaving me with a number of keywords: riches, happiness, long life, income, and 7 and 8 years. What about “How to earn an 8 years longer life?” Is it simple enough? There is another great tool that I love to browse: the urban dictionary. Wealthy has many hits: moneybags, ballers. So… “Baller gets 8 years extra”?

Gather the data … Combinations and Context

The next step is to think about the story I want to tell. In this case I will focus on the Dutch data first. My mind wanders on: it would be nice to get data for another country. There must also be some historical data on this subject. The Dutch Economic-Historical Archive (NEHA) has this kind of data, and Statistics Netherlands also has data going back to 1899 in its historical series. While we were talking about the subject over dinner, my son came up with a fact from his history class: the average life expectancy of a worker in Manchester during the industrial revolution was very low (an average of 17).

Another idea may be to link the data to lifestyle. There is a lot of data on that subject too. On the other hand it is more difficult, and the context may be a lot harder to give. Statistics Netherlands also mentions good health and good mental health. These may be subjects to include, but not as a main subject for the assignment for now.

So the plan…

What is the story I want to tell? There is enough historical data available. I want to tell the story of the rich, the poor, and the middle class at different moments in time: the turn of the 20th century, the crisis of the 1930s, the post-war period, the late 1970s when many patterns changed, and now the 21st century. This approach will tell many stories. It will tell about prosperity, the working class, history, and the many, many social elements that make a culture.

My story will be about culture and people, based on historical statistics. Now the next step is to think about the form.

This week’s assignment: data manipulation

This week’s assignment in the MOOC on infographics and data visualization by Alberto Cairo is about maps. From his forthcoming book we have to read the fourth chapter on Cartography for Journalists, or as the chapter title reads: Thematic Maps, Statistics and Cartography Meet. Like his earlier book, The Functional Art, this chapter is a well-written piece with many great visuals.1 Alberto Cairo describes thematic maps as “the purest and most successful form of information graphics”, and I certainly do not disagree. And the assignment? That is to use data from the US Bureau of Labor Statistics and show unemployment in the US, much as The Guardian’s Data Blog did in a story about unemployment in the US: http://www.guardian.co.uk/news/datablog/interactive/2011/sep/08/us-unemployment-obama-jobs-speech-state-map. But with more functionality and depth.

With Tableau, TileMill, or even with ESRI ArcGIS online this task is not that big. The data is easily accessible and well-organized. But this is exactly where standard mapping differs from making infographics. I am quite sure that our teacher does not want the standard map. In the first assignment I created a map and Alberto commented: “However, it doesn’t improve the original as much as it could. The reason is that you are forcing me to click on each country to get the data, rather than giving me the opportunity to explore the data in different ways, such as creating rankings, comparisons between countries, etc.”. So no standard map this time, but something that will focus on the exploration.

I decided to focus on two of the questions that are raised in the assignment:

  • What kind of graphs or maps would you need to tell a compelling story based on this?
  • How would you give context to the data?

When it comes to unemployment during Obama’s first term, a number of nice infographics on this subject already appeared in the run-up to the elections.

My approach: Geo Tagging

The latest data available is from September 2012, and looking at that data you immediately see the big differences between states. Take Montana, Wyoming, North and South Dakota, and Nebraska. From my time there I know the views of large (or even more than large) plains, the emptiness. The number of people per square kilometer must be very low. On the other hand there is the giant peak in California, and there are the higher (but not peak) values for states along the east coast, for states with big cities like Chicago and Detroit, and for states like Texas and Florida. My first impression of the dataset is that it is not normalized for population.

In order to map the data and to show the relation between population and unemployment, the data must be geo-tagged. Since the data is ordered by state and the state codes are given, this is not a difficult task. MaxMind offers a nice table of states with their longitude and latitude. By combining this table with the given dataset I now at least have point data that is georeferenced, so each state can be mapped at its centroid.
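The join itself is simple. A minimal sketch, assuming both tables are keyed by the two-letter state code: the field names, the rounded coordinates, and the unemployment figures below are placeholders for illustration only, not values taken from the MaxMind table or the Bureau of Labor Statistics.

```python
# Illustrative centroid table keyed by state code: (latitude, longitude).
# Coordinates are rough placeholders, not the MaxMind values.
CENTROIDS = {
    "MT": (46.9, -110.4),
    "CA": (36.8, -119.4),
    "RI": (41.6, -71.5),
}

def geotag(rows, centroids):
    """Attach lat/lon to every row whose state code is in the centroid table.
    Rows with an unknown state code are skipped."""
    tagged = []
    for row in rows:
        coords = centroids.get(row["state"])
        if coords is None:
            continue
        lat, lon = coords
        tagged.append({**row, "lat": lat, "lon": lon})
    return tagged

# Hypothetical unemployment rows (numbers invented for the example)
rows = [
    {"state": "MT", "unemployed": 30000},
    {"state": "CA", "unemployed": 1900000},
    {"state": "XX", "unemployed": 1},  # unknown code, will be dropped
]
points = geotag(rows, CENTROIDS)
for point in points:
    print(point["state"], point["lat"], point["lon"], point["unemployed"])
```

Skipping unknown codes silently is a design choice for the sketch; in a real workflow it is worth listing them, because a typo in a state code would otherwise disappear from the map without notice.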

Then I started to play with Tableau Public. Within the map option of the software there are several settings and preloaded datasets: population, population by race, and occupations.

What strikes me is that population density and mixture of race both seem to have an effect on the figures. States with many big cities seem to have a higher unemployment rate. So a first step would be to map the data against the population density.

The Census Bureau Data

The United States Census Bureau has a nice dataset for the census of April 2010. Although the census data covers the full population, including people who are too young to work and people who are retired, the figures already change with the first quick layout. Rhode Island, which at first was a small dot, suddenly becomes one of the largest. California, on the other hand, which seemed to be the state with an incredibly high unemployment rate, has now become an average player. All in all we see how the differences become smaller when we look at the percentage of the total population that is unemployed.
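The effect described here, a state changing rank once the counts are divided by population, can be shown with a small sketch. The state names and all numbers are invented for the example; only the calculation (unemployed as a percentage of total population) follows the text.

```python
def unemployed_share(unemployed, population):
    """Unemployed people as a percentage of the total population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * unemployed / population

def rank_by_share(states):
    """Return state names ordered from highest to lowest unemployed share."""
    return sorted(
        states,
        key=lambda name: unemployed_share(*states[name]),
        reverse=True,
    )

# Hypothetical states: name -> (unemployed, total population)
states = {
    "Large state": (2000, 40000),  # big absolute count, 5 percent
    "Small state": (100, 1000),    # small absolute count, 10 percent
}

print(rank_by_share(states))
```

In absolute numbers the large state dominates the map, but as a share of population the small state comes first, which is the kind of shift I saw for Rhode Island and California.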

Conclusion and the final result

So “to tell a compelling story” I made an interactive infographic in which the map is a main element on the page. From this map you can click on a state and see the unemployment figures for 10 periods in time: at the start of Obama’s first term, and then each year, until just before his re-election. The context is that data is open to multiple interpretations, even though everyone knows the facts. By leaving out specific details, data manipulation becomes a term with a double meaning.

P.S. Did I tell you that registration for the second course, starting in January, is open? You can sign up here: Knight Center for Journalism in the Americas’s Distance Learning program.

1 In the first version this paragraph read: “From his book we have to read the fourth chapter on Cartography for Journalists, or as the chapter title reads: Thematic Maps, Statistics and Cartography Meet. The book is well written with many great visuals and the same is true in this chapter.”

Happy GISDay

A week ago (on Thursday) someone tweeted “happy postgisday”, and yes, it was the day after GISDay. This yearly event is, as is stated on the GISDay website: “The annual salute to geospatial technology and its power to transform and better our lives”. Looking at the event map as published on the website I was amazed by two things, firstly the wide spread of events all over the globe, secondly that there was no event planned in The Netherlands, although there exists a good and well organized GIS community. And I did not organize any event either.

My thought was: What could I have organized to bring GIS to a wider audience? The following themes come to my mind:

The application of GIS in areas where you do not expect it.

Not so long ago, about a decade or so, GIS mainly took place in the drawing room. Networks were no longer designed and maintained on large drawing boards with pen and paper. In the GIS era these drawing boards were replaced by digitizer boards and large monitors, and the blueprints were replaced by bits and bytes. With all this development, answering questions about the assets became easier. Examples of questions you can answer in this context are: What is the current state of our network? What type of asset was most sensitive to interference over the last period? Which customers should be informed about the upcoming repair work?

As I said in an earlier blog post on this subject, the main shift occurred when navigation systems became more and more of a commodity. Nowadays GIS is no longer limited to the drawing room. We see GIS in many different contexts and industries, in places where you would not expect it. Telling this story may be my first presentation.

Your safety monitored with GIS

The second story is about geo-data and boundaries. In the European context Inspire is becoming more and more grown up. Inspire is the initiative that should create an infrastructure to make geo-information and spatial data more accessible. When you cross a border (and in this case I am not even talking about country borders), it may well be that the data you find on the other side is not directly usable. This can cause problems, for example when a river gets polluted and we want to take steps to prevent the pollution from getting into the drinking water supply chain. It is best to have data that can easily be exchanged between different organizations.

Different local governments store their data in different ways, for example because of the GIS software they use. The main result is that if we want a full overview of the available data, we should first create a common language. Not only should we store the spatial data in a common way, the data must also be findable across the different borders. So the labels on the data and the datasets, the metadata, must be standardized too. In the last few years we have seen a fast growth of so-called geo-portals; in the future these will be the entrance to the European data. They are a wonderful way to show a larger audience how spatial data, and the systems that store and analyze this data, work together in monitoring safety.

The past analyzed with GIS

A growing theme in historical studies is the application of GIS to study spatio-temporal processes: mapping differences between two or more time periods, and showing where changes occurred. In the last decades I have published a number of these studies, for example on detecting changes in the urban landscape (how a city developed). But there is so much more that can be done on this subject. In the book “Past Time, Past Place” Anne Knowles collected a number of very good examples of how GIS can be applied in history. This book was published in 2002, and since then there have been many new developments. For example, GIS has become more accessible and more of a commodity in the historical sciences.

If we apply GIS to history we also come to the subject of storytelling. With the historical datasets that we have available we can tell a story that may have been hidden before. This story can make the past more interactive, however odd this may sound. We can show the development of a town, starting from a little village on a sand ridge, and how, based on the written deeds we find in the archives, the village grew over time. For example, we can show the map and how more and more streets and houses appear. In addition, we can attach the deeds on which we base our findings to the different plots on this map.

Next year… GISDay

Next year on GISDay (Wednesday, November 20, 2013) I would like to show small projects on these three examples, mainly to introduce GIS to a wider audience. In the meantime I will post examples here.

Data, Information, Knowledge … and Wisdom

Last week the MOOC on Infographics and Data Visualization at the Knight Center for Journalism in the Americas started, and I am one of the 2000 lucky students. About 8 years ago my former employer dropped a book on my desk. My mouth watered, and I finished it in one go. The book dealt with data processing and information visualization in a way that I, as a computer scientist and art historian, could understand so well. The book by Edward Tufte has been a source of inspiration for many lectures and thoughts while working on, in my case, the presentation of geographical data.

One of the main reasons for me to start the course is that I see the importance of data visualization. I am neither a journalist nor a professional designer, yet I want to visualize my data analysis. For example the data about assets in a geo-information system, in the way Stephen Few describes it: meaningfully decoded data in which the nature of the data as well as the relationships between the different objects are clear. I want to be able to present this data in a way that is understandable to people outside geoinformatics.

Data information knowledge model

Last week, in his first lecture on information visualization, Alberto Cairo introduced us to the concepts of the data-information-knowledge model and to how to analyze the ongoing stream of infographics that are being produced. One of the references he makes is to a chapter on data visualization by Stephen Few. Few writes: “The goal is to translate abstract information into visual representations that can be easily, efficiently, accurately, and meaningfully decoded.” In information processing for Geographical Information Systems we often meet the same goals.

One of the future directions that Few mentions in his text is: “The integration of geo-spatial and network displays (such as node and link diagrams) with other forms of display for seamless interaction and simultaneous use.” And that is exactly why 10 years ago that book landed on my desk. I believe that the integration mentioned above is a very obvious one, but we should be careful. Geo data looks very “sexy”, and we see that many designers of infographics tend to use maps as a background or, when a location is given, to map the data to that location. That brings me to one of Stephen Few’s questions: “Is it obvious how people should use the information”.

In last week’s discussion of a map provided by Alberto Cairo, the instructor of the course, the map as a background played an important role in many people’s responses. Information on Internet use for several countries had to be presented. Some of the responses argued that everyone knows where specific countries are on the map, so why try to map a chart to a location? At the level of countries or continents I can understand that argument, but in many cases we work with data on a smaller scale. When it comes to statistics on your assets, the map is an excellent carrier of information.

Location data

Probably more than 90% of the data in a geo-database has nothing to do with the map in a direct way. It does not contain X, Y, or Z coordinates itself, but the data is linked to other data tables that do have the location connection. For example, we can have a postal code that links customer data to a specific area. In this way we can enrich the data. Over the last few years a number of companies have shown us wonderful examples of how to do this. The result is that much of the data that is available in information systems can now be linked in one way or another to a specific location.

If you have a shopping card from the local supermarket, data is collected on the products you buy. Different queries can be run on this raw data, for example on price ranges or on the type of products. All this data has no location component; it is products, prices, and quantities. Based on this data we can run wonderful statistics. We can add extra value to this dataset when we combine these statistics with the postal code of the consumer. Suddenly we start to see patterns, for example when it comes to age categories in a certain area of town.
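As a minimal sketch of this enrichment step: purchase records without any location are linked through the customer to a postal code and then totalled per area. The record layout, customer ids, and postal codes are hypothetical and only serve the illustration.

```python
from collections import defaultdict

def spend_per_area(purchases, customer_postcodes):
    """Total spending per postal code, linking purchases to areas
    through the customer id. Purchases of unknown customers are skipped."""
    totals = defaultdict(float)
    for purchase in purchases:
        postcode = customer_postcodes.get(purchase["customer"])
        if postcode is None:
            continue
        totals[postcode] += purchase["price"] * purchase["quantity"]
    return dict(totals)

# Hypothetical raw data: products, prices, quantities, no location at all
purchases = [
    {"customer": "c1", "product": "milk", "price": 2.0, "quantity": 3},
    {"customer": "c2", "product": "bread", "price": 1.5, "quantity": 2},
    {"customer": "c1", "product": "apple", "price": 1.0, "quantity": 1},
]

# Hypothetical link table: customer id -> postal code
customer_postcodes = {"c1": "1011", "c2": "3511"}

totals = spend_per_area(purchases, customer_postcodes)
print(totals)
```

The purchase table never needs coordinates of its own. The postal code is the only link to the map, and a centroid or polygon per postal code does the rest.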

The next step, and here I refer back to the future direction, is to change the information that we get from the different queries into knowledge. Visualizations based on the above example may have added value. But this added value can only be achieved when the data is easily and efficiently available, plus easy to read and interpret.

This is exactly where we can learn from the designers that work in the newspaper offices. This is, besides the fun, a reason for me to take the course on infographics and data visualization.

Location everywhere

For some time I have wanted to write about indoor positioning, since indoor positioning is going to be a future direction for a number of fields, including GIS. I had collected articles and done my research. And then yesterday, by following another post (on big data, food, and visualization), I ended up at a (Dutch) post from Numrush: “Indoor navigation system Wifarer announces first customer” [My translations, EK]. Author Johan Voets states in his post exactly what I wanted to tell you in my blog: “Indoor navigation. It sounds a bit unnatural, but it is definitely a fast growing market.”

In my earlier blogs on the fast-developing GIS market I already indicated that mobile devices, such as smartphones, offer great possibilities. In a post from envisioningtech, location awareness is mentioned in the context of new sensors. The article uses the term “planned spontaneity”, in which your system, based on earlier experiences, makes decisions that fit a certain context. And yes, this context includes location too.

The four elements

In another recent study Latitude mentions the 4 I’s: “four elements—the ‘4 I’s’—that will continue to play a significant role in our experiences with narrative-based media”. Immersion, interactivity, integration, and impact. To cite their report even further: “Immersion and interactivity primarily help an audience to go deeper into a story, while integration and impact are about bringing a story of out of the screen, into our actual lives.”

Location-based services can play a major role in experiencing the 4 I’s. What if we can offer an extra experience based on the current location? From my background as an art historian and travel guide I think I can say something about this storytelling effect too. People want to walk around a town or a museum, and as a guide you need to point out the particularities of a certain object or view. Applications that do so already exist in “open air” situations. Many museums also offer you the possibility of an online guided tour. Over the last years I have seen many of these wonderful initiatives.

But in museums we still see people typing coded numbers into devices in order to receive the stories and the context. The Indoor Geo Database will include many Points Of Interest, and our smart algorithm will select the right combination of these POIs for the current context. There are many stories to tell, and based on your interests I can show you the same museum in a number of different ways.

“will people even indoors use the smartphone to navigate?”

Voets ends his post with “The question is: will people even indoors use the smart phone to navigate?”. My answer is clear: yes, they will. And indoor positioning is not only to be used in a museum or cultural context. What if I could walk through a department store where my smartphone shows me the latest gadgets and offers, based on my recent online searches? Or maybe the system could combine earlier experiences and show me, in real life, something that I was looking for a month ago.

Does this sound scary, or do you see the new possibilities? As I said before, I see new and serious applications of this technology in many different fields.