Tweet #200

On October 30th I posted my 100th tweet; now, two months later on December 26th, I have reached my 200th. When I reached tweet number 100 I posted a Wordle diagram with the subjects of my last 100 posts, and now at number 200 I will do the same.

What have I done in the last months, and what have been my areas of interest? I believe that Twitter is a good way to find that out. So I created the Wordle again, with the 150 most used words.
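For those who want to reproduce the counting behind such a word cloud, here is a minimal sketch in Python. It is not how Wordle works internally. The sample tweets and the small stop-word list are made up for illustration, and skipping retweets (RT) and modified tweets (MT) is an assumption based on the name of the image below.

```python
import re
from collections import Counter

# Hypothetical sample; in practice this would be the text of the 200 tweets.
tweets = [
    "RT @someone: Great dataviz on GIS",
    "Working on my MOOC assignment: GIS and infographics",
    "MT @other: infographics everywhere",
    "New GIS blog post on dataviz and infographics",
]

# Assumed, very small stop-word list, only for illustration.
STOP_WORDS = {"on", "my", "and", "the", "a", "of", "new"}


def top_words(tweets, n=150):
    """Return the n most used words as (word, count) pairs."""
    counts = Counter()
    for tweet in tweets:
        # Assumption: retweets and modified tweets are left out.
        if tweet.startswith(("RT ", "MT ")):
            continue
        words = re.findall(r"[a-z]+", tweet.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


print(top_words(tweets, n=2))
```

The resulting word counts can then be fed to any word cloud tool; the cut-off of 150 is simply the `n` parameter.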

tweet1-200_noRTMT

It is clear from the diagram that GIS, MOOC, dataviz, and infographics have been the main subjects. The teacher of the MOOC, Alberto Cairo, whom I have retweeted a lot, is also very prominent. A term that is missing is web design; also the pizza from my food blogs, which was more prominent last time, does not make it into the 150 most used words. On the other hand, a term that is special for this time of year and did get listed is “Santa”, thanks to the wonderful NORAD Santa tracker and the Google Maps interface.

All in all, a good overview, and now on my way to tweet #300.

Geo vs GIS, a long lasting theme

Is there a difference between Geo and GIS? Many people ask me this question, and when they do not ask I tend to tell them that there is a clear difference. Last week I had such a discussion again, and it kept me thinking. The main reason for this blog post is to put my ideas on virtual paper, so that I can refer to it in the future. Of course I know that this kind of subject can also start a discussion. If that is the case, please feel free to add your comments.

The Definition

Let's start at the beginning, with the definition of GIS, since that is the source of much confusion. I believe that for decades all was clear: we had GIS software that processed spatial data or geo data. But then came online mapping tools and navigation systems. These systems work with geo information, so by the above description they would be GIS. But are they really?

All this depends on our definition of what Geographical Information Systems are. If we look at it from an information system perspective, GIS covers a wide range of techniques. For example, when we take a definition of Information Systems (IS) as found in dictionaries on the web, we read:

information system: The entire infrastructure, organization, personnel, and components that collect, process, store, transmit, display, disseminate, and act on information. [from The Free Dictionary]

Other descriptions are similar; they all mention a mix of technical and human resources that in combination are able to process data. In this way we can talk about “the GIS department”, “the GIS software”, and even “the GIS data”. In this definition our navigation systems, departments and teams, and online mapping tools like Bing and Google Maps are GIS. As I stated before, this definition is too wide in my opinion.

In the strictest definition one can say: GIS is the software, the toolbox. Geo is the information that the GIS needs. Geo is the model and GIS works with it. I do realize that this definition leaves the human resource side and the special hardware of our field out of the discussion. And this is exactly the point where the complexity starts.

Let me briefly summarize the different definitions of GIS.

GIS is:

  1. The entire infrastructure, organization, personnel, and components that work with spatial data.
  2. The software and the spatial data.
  3. The software.

Much of the discussion depends on what you choose as a definition.

The Discussion

It should be mentioned that a GIS is nothing without the spatial data that can be processed with it. But is that data part of the GIS? I often ask a question like: “Does the word processor also contain the texts you are about to write when you unpack the box?” These texts are to the word processor what the geo data is to the GIS.

So I make a clear distinction between Geo and GIS. The consequence of this distinction is that for me the only possible definition is the third one. And I realize that this conflicts with the general definition of Information Systems.

Let me even go one step further (especially if we would like to keep the information system definition) and propose to make more use of the term Spatial Information System when it comes to the infrastructure, organization, personnel, and components. In that way we can reserve the term GIS for the software and Geo (or Spatial) for the data. Combinations with other fields like Spatial Intelligence, but also the place of Remote Sensing, may become more usable in this way.

I wonder what others are thinking…

Learning path

This week I concluded the last two courses for this autumn semester. Although I have always promoted “lifelong learning”, I had not really taken any courses in recent years besides some hands-on training in my field. After finishing my studies I started to teach, and preparing those courses always kept me up to date. But this autumn I started taking classes again.

MOOCs

This autumn I discovered the Massive Open Online Course. In total I have finished four courses, three at Coursera and one at the Knight Center for Journalism in the Americas. At the start it was a bit strange. Would I still be able to study, do homework on a regular basis, pass the quizzes? After two weeks this fear was completely gone. The classes consist of a number of videos of about 12 to 15 minutes each, which works well (in total about 1 hour per course each week). Then there are the weekly assignments: sometimes multiple choice on the course material, but also assignments with a lot of maths, drawing, building, and designing. I even managed to build a prototype of a juicer!

design assignment

In the end I have done 4 courses, and although the subjects were all different I see a very nice learning path. Let me tell you which courses I took:

  • Model Thinking, Coursera, Scott E. Page, University of Michigan
    “Why do models make us better thinkers? Models help us to better organize information – to make sense of that fire hose or hairball of data (choose your metaphor) available on the Internet. Models improve our abilities to make accurate forecasts. They help us make better decisions and adopt more effective strategies. They even can improve our ability to design institutions and procedures.”
  • An Introduction to Operations Management, Coursera, Christian Terwiesch, University of Pennsylvania
    “In short, you will learn how to analyze business processes and how to improve them.”
  • Design: Creation of Artifacts in Society, Coursera, Karl T. Ulrich, University of Pennsylvania
    “The course marries theory and practice, as both are valuable in improving design performance. Lectures and readings will lay out the fundamental concepts that underpin design as a human activity.”
  • Introduction to Infographics and Data Visualization, Knight Center, Alberto Cairo, University of Miami’s School of Communication
    “How to work with graphics to communicate and analyze data.”

Although the courses all look very different, I found a very nice learning path in them. There have been moments where ideas from one course came up while I was struggling with another. For example, when I needed to make a presentation on business processes in the course on Operations Management, I chose to create an infographic to present the data and the outcomes. The Model Thinking course, on the other hand, helped a lot with organizing data and with further exploring a way of thinking.

In my professional work I can make use of all four courses, although that was not the goal at first. Another thing is that I have used a number of new tools to make and organize my course notes. Among these tools are Evernote and Tableau Public, which I had not used before and which have proven very valuable. For other tools that I have already used for a long time, like Freemind and Inkscape, I have found new ways of applying them.

Plans for 2013

For 2013 I have signed up for new courses: Computing for Data Analysis, Game Theory, and Creative Programming for Digital Media & Mobile Apps. Besides that, I have started to work on my teaching materials to create my own on-line course, an introduction to GIS and geospatial data. For this course I have started on Udemy, a platform for on-line courses in a wide range of subjects.

So… On-line courses and lifelong learning, it will continue.

Final Assignment: Baller Gets 8 Years Extra

My final assignment for the MOOC on Infographics and Data Visualization is finished. Last week I collected all the data; this week was about filling in the details and the design. And it has proven to be hard. Collecting data and presenting it is one thing, but an infographic should have just that bit extra. Plus it should have the storytelling part. All in all I struggled.

The general story, as I wrote in my blog last week, is that Statistics Netherlands posted a new dataset on life expectancy and income classes. A nice subject with a number of good and juicy details. It is the data from last year, but this institute has been collecting this kind of data for a long time. This allowed me to do something extra with the data, which is to show how life expectancy has changed over the last centuries.

In the end my infographic has 4 images.

  • Life Expectancy for Men, for the year 2011 in the age range of 0 to 80.
  • Life Expectancy for Women, same as above
  • A detail showing one of the findings based on the analysis of the dataset
  • A historical overview of Life Expectancy in 25 year blocks from 1875 to 2000

I first prepared my data sets in Gnumeric. The data needed some cleaning: not all columns were needed and I wanted to separate the data for male and female. I also struggled for a while with the income classes. The dataset has a subdivision into 5 classes: lower, lower middle, middle, upper middle, and upper class. All other datasets from Statistics Netherlands have ten different classes, based on income, but they are not linearly divided. It was quite some work to get them to match. Finally I used a histogram function based on standard deviation and a normal distribution. The book “Making History Count” by Feinstein and Thomas, which I once used in my teaching, has been a great help in this.

After importing the data into Tableau Public I started thinking about the presentation. I wanted to show the whole range, from the life expectancy of a newborn child to that of an 80-year-old. And I wanted to show the five different classes. After first making a line diagram I ended up changing my data points into rings, hoping that in that way the overlapping data for the classes would be more visible. I have partly succeeded in this.

Male

What is not immediately visible is that as a newborn child it is good to be born into an upper-class family, but after the age of 55 we see that the middle class and upper middle class do better. Therefore I have included a detail showing this conclusion.

Detail_1

Finally I have created 6 time series showing similar data ranges. This time not for the different income groups, but for male and female. I have chosen 6 different moments in time, each with a difference of 25 years, from 1875 until 2000.

Over_the_Ages

I exported all the graphs to PDF and then the big play with the presentation started. For this I used Inkscape, comparable with Illustrator, but Open Source. One of the tricks here is to play with the layers, making subdivisions between the different groups like graphs, images, text, and background. I set guides to work with and then prepared blocks. Since my first version was still dull, I decided to make a second version with some images in the historical series.

And the result? I am happy with it, but there is certainly room for much improvement. My goal was to learn more about infographics in general. I succeeded in that; I can now at least see what needs to be done.

FinalAssignment_EK2012

http://www.elwink.nl/infographics/FinalAssignment.pdf

For all those reading this blog, and wondering… Alberto Cairo and the Knight Center for Journalism in the Americas start a new course in January 2013!

Thinking about data visualization

Earlier this week we received the final assignment for the MOOC on infographics and data visualization. Alberto does not spare his students, writing: “This time, I am giving you the freedom to do whatever you want.” My first reaction was a slight jubilation: everything is possible. Then we got 7 steps, starting with making a headline and gathering the data and ending with getting the results back. While commuting to work I saw a small headline in one of the free newspapers: “Well-off people live longer in good health”.

Statistics Netherlands (in Dutch CBS) collects and processes data “in order to publish statistics to be used in practice”. Their website has a nice series of interactive infographics, and years ago they were among the first to introduce a web mapping interface on their StatLine website. For many of my GIS classes I have used their data. So a very useful source of wonderful data. But let me return to the assignment, the first step: getting a headline.

life expectancy based on data from cbs.nl

A Simple Headline … but Tease the Information

This week someone in my tweet-lists posted a tweet on a webinar: How to Write Headlines for the Web. After watching the webinar I understood that my headline could make or break my great story. A good story without a catchy headline will be read less on the web. It should contain big numbers and they should be easily digestible. Wow… as a non-journalist this is quite a challenge. And according to Alberto Cairo I need to “Try to find a focus, a headline.” In the webinar one of my favorite techniques is used: free association, with the main question in the back of my mind: what is the story about? If not just with a blank sheet of paper, I often use a mind mapping tool (in my case the fabulous Freemind) for this process.

The general scope given by Statistics Netherlands is: “Men and women from high-income households on average live about 8 and 7 years longer respectively than their counterparts in low-income households.” Far too long for a headline. In Dutch we have a proverb “Riches alone make no man happy” (or “Money isn’t everything”). This is what my associations led to, leaving me with a number of keywords: riches, happiness, long life, income, and 7 and 8 years. What about “How to earn an 8 years longer life?” Is it simple enough? There is another great tool that I love to browse: the Urban Dictionary. Wealthy has many hits: moneybags, ballers. So… “Baller gets 8 years extra”?

Gather the data … Combinations and Context

The next step is to think about the story I want to tell. In this case I will focus on the Dutch data first. My mind wanders on: it would be nice to get data for another country. There must also be some historical data on this subject. The Dutch Economic-Historical Archive (NEHA) has this kind of data, and Statistics Netherlands also has data back to 1899 in its historical series. While we were talking about the subject over dinner, my son came up with a fact from his history class: the average life expectancy of a worker in Manchester during the industrial revolution was very low (an average of 17).

Another idea may be to link the data to lifestyle. There is a lot of data on that subject too. On the other hand it is more difficult, and the context may be a lot harder to give. Statistics Netherlands also mentions good health and good mental health. These may be subjects to include, but not as a main subject for the assignment for now.

So the plan…

What is the story I want to tell? There is enough historical data available. I want to tell the story of the rich, the poor, and the middle class at different moments in time: the turn of the 20th century, the crisis of the 1930s, the post-war period, the late 1970s when many patterns changed, and now the 21st century. This approach will tell many stories. It will tell about prosperity, the working class, history, and many, many social elements that make a culture.

My story will be about culture and people, based on historical statistics. Now the next step is to think about the form.

This week's assignment: data manipulation

This week's assignment in the MOOC on infographics and data visualization by Alberto Cairo is about maps. From his forthcoming book we have to read the fourth chapter on Cartography for Journalists, or as the chapter title reads: Thematic Maps, Statistics and Cartography Meet. Like his earlier book, The Functional Art, this chapter is a well written piece with many great visuals.1 Alberto Cairo describes thematic maps as “the purest and most successful form of information graphics”, and I certainly do not disagree. And the assignment? That is to use data from the US Bureau of Labor Statistics to show unemployment in the US, in a way similar to the story about unemployment in the US published by The Guardian’s Data Blog: http://www.guardian.co.uk/news/datablog/interactive/2011/sep/08/us-unemployment-obama-jobs-speech-state-map. But with more functionality and depth.

With Tableau, TileMill, or even with ESRI ArcGIS online this task is not that big. The data is easily accessible and well-organized. But this is exactly where standard mapping differs from making infographics. I am quite sure that our teacher does not want the standard map. In the first assignment I created a map and Alberto commented: “However, it doesn’t improve the original as much as it could. The reason is that you are forcing me to click on each country to get the data, rather than giving me the opportunity to explore the data in different ways, such as creating rankings, comparisons between countries, etc.”. So no standard map this time, but something that will focus on the exploration.

I decided to focus on two of the questions that are raised in the assignment:

  • What kind of graphs or maps would you need to tell a compelling story based on this?
  • How would you give context to the data?

If we are talking about unemployment during Obama's first term, some nice infographics on this subject already appeared in the run-up to the elections.

My approach: Geo Tagging

The latest data available is from September 2012, and looking at that data you immediately see the big differences between states. Take Montana, Wyoming, North and South Dakota, and Nebraska. From my times being there I know the views of large (or even more than large) plains, the emptiness. The number of people per square kilometer must be very low. On the other hand there is the giant peak in California, and the higher (but not peak) values in the dataset for states along the east coast, for states with big cities like Chicago and Detroit, and for states like Texas and Florida. My first impression of the dataset is that it is not normalized for population density.

In order to map the data, and to show the relation between population and unemployment, the data must be geo-tagged. Since the data is ordered by state, and the state codes are given, this is not a difficult task. MaxMind offers a nice table of states and their longitude and latitude. By combining this data with the given data set I now at least have point data that is geo-referenced, so each state can be mapped as a point at its centroid.

Then I started to play with Tableau Public. Within the map option of the software there are several settings and datasets preloaded: population, population by race, occupations.

What strikes me is that population density and mixture of race both have effects on the figures. States with many big cities seem to have a higher unemployment rate. So a first step would be to map the data against the population density.
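The geo-tagging step described above is essentially a join on the state code. A minimal sketch in Python, assuming the unemployment rows and the coordinate table have already been loaded into memory; the numbers are placeholders and the coordinates are rough illustrative values, not the actual BLS or MaxMind data.

```python
# Hypothetical unemployment rows keyed by state code (placeholder values).
unemployment = [
    {"state": "MT", "unemployed": 100},
    {"state": "CA", "unemployed": 900},
    {"state": "XX", "unemployed": 5},  # unknown code, to show what happens
]

# Rough, illustrative (latitude, longitude) pairs per state code.
centroids = {
    "MT": (46.9, -110.5),
    "CA": (36.1, -119.7),
}


def geotag(rows, centroids):
    """Attach lat/lon to each row; return (tagged rows, unmatched codes)."""
    tagged, missing = [], []
    for row in rows:
        coords = centroids.get(row["state"])
        if coords is None:
            # Keep track of codes that have no coordinates.
            missing.append(row["state"])
            continue
        lat, lon = coords
        tagged.append({**row, "lat": lat, "lon": lon})
    return tagged, missing


tagged, missing = geotag(unemployment, centroids)
print(tagged)
print("No coordinates for:", missing)
```

Reporting the unmatched codes separately makes it easy to spot rows (territories, typos) that would otherwise silently disappear from the map.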

The Census Bureau Data

The United States Census Bureau has a nice dataset for the census of April 2010. Although the census data is for the full population, including those people that are too young to work or those that are retired, the figures already change with the first quick layout. Rhode Island, which at first was a small dot, now suddenly becomes one of the largest. On the other hand California, which seemed to be the state with an incredibly high unemployment rate, has now become an average player. All in all we see how the differences have become smaller when it comes to the percentage of the total population that is unemployed.
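The effect described here, where a large state drops back once the figures are related to population, can be sketched in a few lines of Python. The state names and numbers below are made up purely to show the arithmetic; they are not the BLS or Census Bureau figures.

```python
# Made-up numbers, only to illustrate the calculation.
unemployed = {"BigState": 2_000_000, "SmallState": 60_000}
population = {"BigState": 40_000_000, "SmallState": 1_000_000}


def unemployed_share(unemployed, population):
    """Unemployed persons as a percentage of the total population, per state."""
    return {
        state: 100.0 * count / population[state]
        for state, count in unemployed.items()
        if population.get(state)  # skip states without a population figure
    }


share = unemployed_share(unemployed, population)

# Ranking by absolute numbers versus ranking by share of the population.
by_count = max(unemployed, key=unemployed.get)
by_share = max(share, key=share.get)
print(share)
print("Highest count:", by_count, "/ highest share:", by_share)
```

With these example values the state with the most unemployed people is not the one with the highest share, which is the same kind of shift the map shows.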

Conclusion and the final result

So “to tell a compelling story” I made an interactive infographic where the map is a main element on the page. From this map you can click on a state and see the unemployment figures for 10 periods in time: at the start of Obama's first term, and then each year, until just before his re-election. The context is that data is open to multiple interpretations, even though everyone knows the facts. By leaving out specific details, data manipulation becomes a word with a double meaning.

ps. Did I tell you that subscription to the second course, starting in January, is open? You can subscribe here: Knight Center for Journalism in the Americas’s Distance Learning program.

1 In the first version this paragraph read: “From his book we have to read the fourth chapter on Cartography for Journalists, or as the chapter title reads: Thematic Maps, Statistics and Cartography Meet. The book is well written with many great visuals and the same is true in this chapter.”

Happy GISDay

A week ago (on Thursday) someone tweeted “happy postgisday”, and yes, it was the day after GISDay. This yearly event is, as is stated on the GISDay website: “The annual salute to geospatial technology and its power to transform and better our lives”. Looking at the event map as published on the website I was amazed by two things, firstly the wide spread of events all over the globe, secondly that there was no event planned in The Netherlands, although there exists a good and well organized GIS community. And I did not organize any event either.

My thought was: What could I have organized to bring GIS to a wider audience? The following themes come to my mind:

The application of GIS in areas where you do not expect it.

Not so long ago, about a decade or so, GIS mainly took place in the drawing room. Networks were no longer designed and maintained on large drawing boards with pen and paper. In the GIS era these drawing boards were replaced by digitizer boards and large monitors, and the blueprints were replaced by bits and bytes. With all this development, answering questions about the assets became easier. Examples of questions you can answer in this context are: What is the current state of our network? What type of asset had the highest sensitivity to interference over the last period? Which customers should be informed about the upcoming repair work?

As said in an earlier blog post on this subject, the main shift appeared when navigation systems became more and more a commodity. Nowadays GIS is no longer limited to the drawing room. We see GIS in many different contexts and different industries, in places where you would not expect it. To tell this story may be my first presentation.

Your safety monitored with GIS

The second story is about geo-data and boundaries. In the European context INSPIRE is becoming more and more grown up. INSPIRE is the initiative that should create an infrastructure to make geo information and spatial data more accessible. When you cross a border (and in this case I am not even talking about country borders), it may well be that the data that you find on the other side of the border is not directly usable. This can cause problems, for example when a river gets polluted and we want to take steps to prevent the pollution from getting into the drinking water supply chain. It is best to have data that can be easily exchanged between different organizations.

Different local governments store their data in different ways, due for example to the GIS software they use. The main result of this is that if we want to get a full overview of the data available we should first create a common language. But not only should we store the spatial data in a common way, it must also be findable across the different borders. So the labels on the data and the datasets, the metadata, must be standardized too. In the last years we have seen a fast growth of the so-called geo-portals; in the future these will be the entrance to the European data. They are a wonderful way to tell a larger audience how spatial data, and the systems storing and analyzing this data, work together on monitoring safety.

The past analyzed with GIS

A growing theme in historical studies is the application of GIS to study spatio-temporal processes: mapping differences between two or more time periods, and showing where changes occurred. In the last decades I have published a number of these studies, for example on detecting changes in the urban landscape (how a city developed). But there is so much more that can be done on this subject. In the book “Past Time, Past Place” Anne Knowles collected a number of very good examples of how GIS can be applied in history. This book was published in 2002 and since then there has been a lot of new development. For example, GIS has become more accessible and more of a commodity in the historical sciences.

If we apply GIS to history we also come to the subject of storytelling. With the historical datasets that we have available we can tell a story that may have been hidden before. This story can make the past more interactive, however odd this may sound. We can show the development of a town, starting from a little village on a sand ridge, and how, based on the written deeds we find in the archives, the village grew over time. For example we can show the map, and how more and more streets and houses appear. In addition to this map we can add the deeds on which we base our findings to the different plots.
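The idea of showing a town grow from its deeds can be sketched without any GIS software at all. A minimal sketch in Python, assuming each deed has been reduced to a year, a plot identifier, and a street name; the records below are invented for illustration and do not come from a real archive.

```python
from collections import defaultdict

# Hypothetical deed records: (year of the deed, plot identifier, street name).
deeds = [
    (1650, "plot-1", "Church Street"),
    (1650, "plot-2", "Church Street"),
    (1720, "plot-3", "Mill Lane"),
    (1810, "plot-4", "Market Square"),
]


def plots_by_year(deeds, year):
    """Return street -> sorted plots documented in deeds up to and including `year`."""
    result = defaultdict(list)
    for deed_year, plot, street in deeds:
        if deed_year <= year:
            result[street].append(plot)
    return {street: sorted(plots) for street, plots in result.items()}


# One "snapshot" per moment in time; each could become one map frame.
for snapshot in (1700, 1750, 1850):
    print(snapshot, plots_by_year(deeds, snapshot))
```

Each snapshot lists the streets and plots known at that moment, so a series of them gives the frames for an animated or stepwise map, with the deeds as the source behind every plot.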

Next year… GISDay

Next year on GISDay (Wednesday, November 20, 2013) I would like to show small projects on these three examples, mainly to introduce GIS to a wider audience. In the meantime I will post examples here.