INST 301
Introduction to Information Science
Spring 2016
Assignment H8
Open Data


This assignment is due before class on the date indicated on the schedule.

The goal of this assignment is to gain some appreciation for the potential benefits and the potential challenges of working with open data. You will do this by using one or more open data repositories to find two different datasets that come from two different agencies and that can be used to make useful comparisons.

The first thing to do is to select one or more open data repositories. The one we used in class, Data.gov, would be one obvious choice, and you could do this entire assignment using that one repository if you like. But other governments also run open data repositories, and there are also open data repositories that are run by non-government sources (see, for example, a list from Simmons College). And a Web search for open data will find you many more.

The next thing to do is to figure out what sort of questions you might want to explore. For example, I might be interested in how the NASA budget compares to the global spending on chewing gum in each of the years between 1958 and 2016. To answer that question I might need to find the NASA budget for every year, and the estimated spending on chewing gum in each country for each year. Well, that's too big a question (since there are lots of countries), but I could perhaps find the chewing gum spending data for one country (let's say, France). Of course, that would initially be in French Francs, and later in Euros, so I would also need to know the exchange rates between Francs or Euros and US Dollars every year, and for that I would need to decide on a date on which to make the conversion, because the exchange rate varies every day (and indeed it changes over the course of a day). Answering the question would thus require that I identify sources of the data that I want to compare, and also figure out how to put that data into a comparable form, and that might in turn require that I find some additional data.
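If you happen to prefer a short script to a spreadsheet, the conversion step described above might look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the names local_spending, usd_per_unit, and to_usd are hypothetical, the numbers are made-up placeholders rather than real statistics, and it assumes you have already settled on one exchange rate per year (for example, the rate on a date you chose).

```python
# A minimal sketch of converting yearly spending into US dollars.
# All figures are made-up placeholders, NOT real statistics.

# Spending in the local currency of the time: year -> (currency, amount).
local_spending = {
    1998: ("FRF", 1000.0),  # French Francs
    2002: ("EUR", 200.0),   # Euros
}

# US dollars per one unit of local currency, one chosen rate per year.
usd_per_unit = {
    ("FRF", 1998): 0.25,
    ("EUR", 2002): 1.5,
}


def to_usd(year, spending, rates):
    """Convert one year's spending to US dollars using that year's rate."""
    currency, amount = spending[year]
    return amount * rates[(currency, year)]


usd_spending = {
    year: to_usd(year, local_spending, usd_per_unit)
    for year in local_spending
}
print(usd_spending)
```

Keying the rate table by both currency and year keeps the switch from Francs to Euros explicit, so a missing rate shows up as an error instead of a silently wrong number.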

Now I don't actually recommend that you start with a question that specific, because you may find that the data you need is not yet available as open data. So instead I suggest that you start with one part of your question (e.g., I want to compare NASA's budget to something that will help people to develop a personal understanding of whether it is big or small) and then look for some data that would let you make that comparison. As another example, you might set out to find out which countries in Europe have the highest educational attainment, and then while looking into that you might find that there is data on the number of Masters degrees awarded each year by public universities in three European countries, and so you might decide instead to compare those figures to population statistics for the same countries. The bottom line is that you should find a question that you can answer with the available data. One way to do that is to start with a question and then, once you find some related data, sharpen or alter your question to one that you can answer using that data.

The next thing to do is to answer the question. To do this you should get the data, do whatever conversions are needed, and process it in whatever way you need to obtain an answer. You might use Excel if you are dealing with calculations, or Base (or Microsoft Access) if you are dealing with relational data, or you might just make tables in Word that illustrate the comparisons you want to make (which you might compute using a calculator, if conversions are needed).

You should turn in a document (e.g., written using Microsoft Word) in which you state the question you answered (not necessarily the one you started out with -- no need to recap your entire journey) and the sources of the data you found. Describe the sources with a few words, and then for each data file that you used provide the URL to the specific data file (or in some other way tell us precisely how to get it ourselves). Then tell us how you did any computations that were needed (e.g., maybe you had to add up monthly statistics for bumble bee deaths to get annual statistics for bumble bee deaths so that you could compare bumble bee deaths to annual global temperature increases). Finally, show us the answer to the question (which might be a table, or a graph, or some qualitative assessment; for example, if you wanted to know which country had the highest educational attainment, the answer might just be the name of that country).
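As an illustration of the kind of computation you might need to describe, here is a rough Python sketch of adding up monthly figures to get annual ones. The "YYYY-MM" key format, the names monthly_counts and annual_totals, and the counts themselves are all hypothetical placeholders, not real data; your files will probably be laid out differently.

```python
from collections import defaultdict

# Placeholder monthly counts keyed by "YYYY-MM"; NOT real data.
monthly_counts = {
    "2014-01": 10,
    "2014-02": 12,
    "2015-01": 7,
    "2015-02": 9,
}


def annual_totals(monthly):
    """Sum monthly values into one total per calendar year."""
    totals = defaultdict(int)
    for month_key, count in monthly.items():
        year = int(month_key.split("-")[0])  # "2014-02" -> 2014
        totals[year] += count
    return dict(totals)


print(annual_totals(monthly_counts))
```

The same result is easy to get in Excel with a pivot table or SUMIF; whichever tool you use, a sentence saying what you summed and over what period is the part we need to see.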

There's a real risk that this assignment might take you longer than it is worth if you start with too complicated a question. So don't ask questions like whether the number of neutrinos detected each year in Antarctica is a good predictor of fish catches anywhere in the world -- that might be a good question, but not for a homework assignment! Start with a simple part of a question, poke around to find some related data, then choose some question that you can answer with that data. You should be able to do a fine job on this assignment in 3 hours, so if an hour has passed and you're not yet making good progress it is probably time to simplify your question!

Note, however, the requirement that the data that you use must come from two different agencies. The reason for that is that data from the same agency is often designed in a way that makes the comparison straightforward. For example, if I obtain inflation data from the Bureau of Labor Statistics, I would find that all of the different measures of inflation are in tables that follow exactly the same format. Comparisons are simple in that case. But we want you to see the true complexity of working with open data, so we require you to also use data from some other source (for example, you could compare inflation rates to population growth data that is available from the Census Bureau).
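To make that complexity a little more concrete, here is a hedged Python sketch of lining up two differently formatted sources by year. The column names ("Year", "Value", "date", "count"), the function names, and every value are invented for illustration; they do not describe the actual layout of any agency's files.

```python
# Hypothetical rows, as they might look after reading two CSV files
# that follow different conventions. All values are placeholders.
agency_a_rows = [
    {"Year": "2010", "Value": "1.5"},
    {"Year": "2011", "Value": "3.0"},
]
agency_b_rows = [
    {"date": "2011-07-01", "count": "1,200"},
    {"date": "2012-07-01", "count": "1,300"},
]


def by_year_a(rows):
    """Source A: a plain year column and a decimal value."""
    return {int(r["Year"]): float(r["Value"]) for r in rows}


def by_year_b(rows):
    """Source B: a full date and a count with thousands separators."""
    return {int(r["date"][:4]): int(r["count"].replace(",", "")) for r in rows}


def align(a, b):
    """Keep only the years that appear in both sources."""
    shared_years = sorted(set(a) & set(b))
    return [(year, a[year], b[year]) for year in shared_years]


aligned = align(by_year_a(agency_a_rows), by_year_b(agency_b_rows))
print(aligned)
```

Note that only the overlapping year survives, which is a common surprise: two datasets that each cover several years may share far fewer once you put them side by side.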

One last requirement is that you may not use any of the exact questions that are offered here as examples. For example, you could compare the NASA budget to something, but not to chewing gum spending in France, and you could compare inflation statistics to something, but not to population growth data from the Census Bureau. We want you to be creative and to explore questions that are of interest to you.

Submit your assignment as a Word document on ELMS. You may also submit additional documents (e.g., a spreadsheet) if you wish, but that is not required.


Doug Oard
Last modified: Sun Mar 27 15:27:08 2016