
INST 301 - Introduction to Information Science
Spring 2016
Assignment H4 - Exploring Networks


This homework is due before the start of the class session indicated on the syllabus. Partial credit may be awarded.

For this assignment your task is to look for paths between things that are described by Wikipedia pages. For the links, we will use links actually found on those Wikipedia pages.

As an example of what we will be doing, imagine trying to get from the page for the University of Maryland to the page for the White House by clicking only on links that are found on each Wikipedia page. Here's one possible sequence of pages that you might visit:

  1. University of Maryland, College Park
  2. Maryland
  3. Martin O'Malley
  4. White House
That is a path with four nodes and three links, so we can say that the University of Maryland is no more than three degrees of separation from the White House in Wikipedia. Note the careful wording there: the University of Maryland may have more or fewer degrees of separation from the White House (i.e., there may be more or fewer links in the shortest path) in some other network. Note also that we don't know whether this is the shortest path, only that it is one possible path. That is why the claim is "no more than three degrees of separation" rather than "exactly three."

Indeed, a shorter path exists, one with only two links. Knowing that, we can now say that the University of Maryland is no more than two degrees of separation from the White House in Wikipedia. That means that if someone creates a Wikipedia page for you and mentions, with a link, that you are a student at the University of Maryland, you would have no more than three degrees of separation from the White House in Wikipedia. Read that again ... every word is important ... no more than ... in Wikipedia.

Note also that Wikipedia is a network, but it is not a social network: it does contain people, but it also contains things that are not people. So we are using Wikipedia as a convenient way of exploring networks, but we are not actually exploring a social network in this case.

The problem of finding the shortest path in a network is well studied in computer science, so there are many systems for doing this. One that works with Wikipedia pages, which you should use for this assignment, is http://beta.degreesofwikipedia.com/. To get a feel for how it works, try out the example above. You'll quickly discover that you have to type the title of each Wikipedia page exactly as it appears on the Wikipedia page itself, so you will also want Wikipedia open as you do this. Capitalization is sometimes important, so if a search is not working, make sure you have capitalized things in the same way the title of the Wikipedia page does. Some Wikipedia pages contain errors, so once the system finds a path, you should check any pages along it that seem like odd choices to see whether the Wikipedia page itself is correct. One more note: don't check either of the two checkboxes (side links and date pages), because they allow the system to use links that are not very interesting.
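
As a rough illustration of what a path-finding system might do internally, here is a minimal Python sketch of breadth-first search, one standard way to find a shortest path when every link counts the same. The tiny `LINKS` graph is a hand-made assumption containing only the example path from this assignment; it is not real Wikipedia link data, and nothing here is claimed to be how the degreesofwikipedia.com system is actually implemented.

```python
from collections import deque

# Toy link structure (an assumption for illustration): each page maps to
# the pages it links to. Real Wikipedia pages contain many more links.
LINKS = {
    "University of Maryland, College Park": ["Maryland"],
    "Maryland": ["Martin O'Malley"],
    "Martin O'Malley": ["White House"],
    "White House": [],
}

def shortest_path(links, start, goal):
    """Return a shortest list of pages from start to goal, or None if no path exists."""
    queue = deque([[start]])   # each queue entry is a path explored so far
    visited = {start}          # pages already reached, so we never revisit them
    while queue:
        path = queue.popleft()
        page = path[-1]
        if page == goal:
            return path
        for next_page in links.get(page, []):
            if next_page not in visited:
                visited.add(next_page)
                queue.append(path + [next_page])
    return None

path = shortest_path(LINKS, "University of Maryland, College Park", "White House")
print(path)
print("links in path:", len(path) - 1)
```

Because breadth-first search explores all pages one link away before any page two links away, the first path that reaches the goal has the fewest links among the paths in the graph it was given. Searching this toy graph in the opposite direction returns `None`, since none of the listed links point backward.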

Try out a few of your own paths -- just pick two things that should be closely related (e.g., peanut butter and chocolate) and two things you would expect not to be related (e.g., yoga and Tierra del Fuego). Try looking at the path between the same Wikipedia pages in the opposite direction -- the number of links in the path may not be the same, and the pages along the path will often be different even when the number of links is the same. We call the Wikipedia graph a "directed graph" because a link from one Web page to another is often not reciprocated by a back link.
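
To make the idea of direction concrete, the following small Python sketch checks which links in a toy graph have a back link. The page names A, B, and C and their links are invented purely for illustration; they do not correspond to any real Wikipedia pages.

```python
# Hypothetical directed graph: each page maps to the set of pages it links to.
LINKS = {
    "A": {"B", "C"},
    "B": {"A"},
    "C": set(),
}

def is_reciprocated(links, source, target):
    """True if source links to target and target also links back to source."""
    return target in links.get(source, set()) and source in links.get(target, set())

def one_way_links(links):
    """Return every (source, target) link that has no back link, sorted."""
    return sorted(
        (source, target)
        for source, targets in links.items()
        for target in targets
        if not is_reciprocated(links, source, target)
    )

print(is_reciprocated(LINKS, "A", "B"))  # A and B link to each other
print(one_way_links(LINKS))              # A links to C, but C does not link back
```

In this toy graph you can get from A to C in one click, but you cannot get from C to A at all, which is the kind of asymmetry that makes paths in opposite directions differ.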

OK -- now you are ready to do the actual assignment.

  1. Find (and write down in your answer) the shortest correct path between each of the following entities (in the direction shown):
    1. From Jennifer Golbeck to Kojo Nnamdi
    2. From Kojo Nnamdi to Jennifer Golbeck
    3. From Park Geun-hye to Sergey Brin
    4. From Robert Taylor (computer scientist) [the one who worked at ARPA and helped to invent the Internet] to Robert Saxton Taylor [the one who studied information seeking behavior]
  2. Describe at least three ways in which the design of the system that we have used in this homework could be improved. At least one of these should address some improvement to the user interface design, and at least one of these should address some improvement to the functions that can be performed using the system.
Note that when I did this, the path found between the two Robert Taylors was actually wrong because of an error in Wikipedia. If that is still true when you do your assignment, you will need to use the Skip Articles box to prevent the system from using the page that contains the error. Or you could simply fix the error on Wikipedia and make things easier for everyone else.
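
Conceptually, skipping an article amounts to searching for a shortest path while treating that page as if it were not in the network. The Python sketch below shows that idea under stated assumptions: the page names, the `LINKS` graph, and the `skip` parameter are all hypothetical, and this is not a description of how the Skip Articles box is actually implemented.

```python
from collections import deque

# Hypothetical graph: the two-link route passes through a page with a bad link.
LINKS = {
    "Start": ["Page with error", "Page X"],
    "Page with error": ["Goal"],
    "Page X": ["Page Y"],
    "Page Y": ["Goal"],
    "Goal": [],
}

def shortest_path(links, start, goal, skip=()):
    """Breadth-first search that never visits any page listed in skip."""
    blocked = set(skip)
    if start in blocked or goal in blocked:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        page = path[-1]
        if page == goal:
            return path
        for next_page in links.get(page, []):
            if next_page not in visited and next_page not in blocked:
                visited.add(next_page)
                queue.append(path + [next_page])
    return None

print(shortest_path(LINKS, "Start", "Goal"))
print(shortest_path(LINKS, "Start", "Goal", skip=["Page with error"]))
```

With the bad page skipped, the search falls back to the next-shortest route, which in this toy graph has three links instead of two.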

Submit your assignment using ELMS.