INST 301 - Introduction to Information Science
Spring 2016
Assignment H4 - Exploring Networks
This homework is due before the start of the class session indicated
on the syllabus. Partial credit may be awarded.
For this assignment your task is to look for paths between things that
are described by Wikipedia pages. For the links, we will use links
actually found on those Wikipedia pages.
As an example of what we will be doing, imagine trying to get from the
page for the University of Maryland to the page for the White House by
clicking only on links that are found on each Wikipedia page. Here's one
possible sequence of pages that you might visit:
- University of Maryland, College Park
- Maryland
- Martin O'Malley
- White House
That is a path with four nodes and three links, so we can say that the
University of Maryland is no more than three degrees of separation
from the White House in Wikipedia. Note the careful wording there --
the University of Maryland may have more or fewer degrees of
separation from the White House (i.e., there may be more or fewer
links in the shortest path) in some other network. Note also that we
don't know whether this is the shortest path, only that it is one
possible path. So we can claim that the University of Maryland has no
more than three degrees of separation from the White House, but a
shorter path may exist. Indeed, here's one:
- University of Maryland, College Park
- The New York Times
- White House
Knowing that, we can now say that the University of Maryland is no
more than two degrees of separation from the White House in Wikipedia.
This means that if someone creates a Wikipedia page for you that
mentions (and links to) the University of Maryland as the school you
attend, you would have no more than three degrees of separation from
the White House in Wikipedia: one link from your page to the
University of Maryland, plus the two links above. Read that again
... every word is important ... no more than ... in Wikipedia. Note
also that Wikipedia is a network, but it is not a social network:
although it does contain people, it also contains things that are not
people. So we are using Wikipedia as a convenient way of exploring
networks, but we are not actually exploring a social network in this
case.
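The counting rule used above (a path's number of links is its number of pages minus one, and any path you find only supports a "no more than" claim) can be written as a short Python sketch. The function names are hypothetical illustrations, not part of any tool used in this assignment; the two paths are the ones listed above.

```python
from __future__ import annotations


def links_in_path(path: list[str]) -> int:
    """Number of links (edges) in a path of pages (nodes): one fewer than the pages."""
    if not path:
        raise ValueError("a path must contain at least one page")
    return len(path) - 1


def degrees_upper_bound(paths: list[list[str]]) -> int:
    """Each path found shows the true distance is *no more than* its length,
    so the best claim we can make is the length of the shortest path found
    so far. This is an upper bound, not proof that no shorter path exists."""
    return min(links_in_path(p) for p in paths)


first_path = [
    "University of Maryland, College Park",
    "Maryland",
    "Martin O'Malley",
    "White House",
]
second_path = [
    "University of Maryland, College Park",
    "The New York Times",
    "White House",
]

print(links_in_path(first_path))                       # 3
bound = degrees_upper_bound([first_path, second_path])
print(bound)                                           # 2

# A (hypothetical) page about you that links to the university page
# adds exactly one link in front of the best path found so far.
print(bound + 1)                                       # 3
```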
The problem of finding the shortest path in a network is well studied
in computer science, so there are many systems for doing this. One
that works with Wikipedia pages, which you should use for this
assignment, is http://beta.degreesofwikipedia.com/.
To get a feel for how it works, try out the example above. You'll
quickly discover that you have to type the title of the Wikipedia page
exactly in the form that it appears on the Wikipedia page itself,
so you will also want Wikipedia open as you do this. Capitalization
is sometimes important, so if a search is not working, make sure you
have capitalized things in the same way the title of the Wikipedia page
does. Some Wikipedia pages contain errors, so once the system finds a
path you should probably check any Wikipedia pages that seem to you
like odd choices to see if the Wikipedia page itself is correct. One
more note: don't check either of the two checkboxes (side links and
date pages) because they allow the system to use links that are not
very interesting.
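We don't know exactly how the site searches, so the following is only a sketch of one standard technique for this problem: breadth-first search, which finds a path with the fewest links when every link counts the same. The link table is a hypothetical stand-in for Wikipedia that contains only the links from the two example paths above; the real pages have many more links.

```python
from __future__ import annotations

from collections import deque


def shortest_path(graph: dict[str, list[str]], start: str, goal: str) -> list[str] | None:
    """Breadth-first search over a directed link graph.

    Pages are explored in order of how many clicks away from `start` they are,
    so the first time `goal` is reached we have a path with the fewest links.
    Returns the pages on that path, or None if `goal` is unreachable.
    """
    if start == goal:
        return [start]
    came_from: dict[str, str | None] = {start: None}  # page -> page we clicked from
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for linked in graph.get(page, []):
            if linked in came_from:
                continue  # already reached by a path at least as short
            came_from[linked] = page
            if linked == goal:
                # Walk the came_from chain back to start, then reverse it.
                path = [goal]
                while came_from[path[-1]] is not None:
                    path.append(came_from[path[-1]])
                return path[::-1]
            queue.append(linked)
    return None


# Hypothetical toy data: only the links used in the two example paths.
links = {
    "University of Maryland, College Park": ["Maryland", "The New York Times"],
    "Maryland": ["Martin O'Malley"],
    "Martin O'Malley": ["White House"],
    "The New York Times": ["White House"],
}

print(shortest_path(links, "University of Maryland, College Park", "White House"))
# ['University of Maryland, College Park', 'The New York Times', 'White House']
```

Because the search stops the first time it reaches the goal, it returns one shortest path; if several paths tie, which one you get depends on the order in which the links are listed.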
Try out a few of your own paths -- just pick two things that should be
closely related (e.g., peanut butter and chocolate) and some things
you would expect not to be related (e.g., yoga and Tierra del Fuego).
Try looking at the path between the same Wikipedia pages in the
opposite direction -- the number of links in the path may not be the
same, and the pages along the path will often be different even when
the number of links is the same. We call the Wikipedia graph a
"directed graph" because each link points from one Web page to
another, and such links are often not reciprocated by a link back.
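A small Python sketch can make the directed-graph point concrete. The pages A, B, and C and their links are made up for illustration (not real Wikipedia data), and both functions are hypothetical helpers: one counts the fewest links from one page to another by breadth-first search, following links only in the direction they point, and the other lists the links that are not reciprocated.

```python
from __future__ import annotations

from collections import deque


def distance(graph: dict[str, list[str]], start: str, goal: str) -> int | None:
    """Fewest links from start to goal, following each link only in the
    direction it points (breadth-first search). None if goal is unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, steps = queue.popleft()
        if page == goal:
            return steps
        for linked in graph.get(page, []):
            if linked not in seen:
                seen.add(linked)
                queue.append((linked, steps + 1))
    return None


def unreciprocated(graph: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Every link (source, target) where the target page does not link back."""
    return [
        (source, target)
        for source, targets in graph.items()
        for target in targets
        if source not in graph.get(target, [])
    ]


# Hypothetical toy graph, not real Wikipedia data:
# A links straight to C, but C can only get back to A by way of B.
links = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": ["B"],
}

print(distance(links, "A", "C"))  # 1
print(distance(links, "C", "A"))  # 2
print(unreciprocated(links))      # [('A', 'C'), ('C', 'B')]
```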
OK -- now you are ready to do the actual assignment.
- Find (and write down in your answer) the shortest correct path between each of the following pairs of entities (in the direction shown):
  - From Jennifer Golbeck to Kojo Nnamdi
  - From Kojo Nnamdi to Jennifer Golbeck
  - From Park Geun-hye to Sergey Brin
  - From Robert Taylor (computer scientist) [the one who worked at ARPA and helped to invent the Internet] to Robert Saxton Taylor [the one who studied information-seeking behavior]
- Describe at least three ways in which the design of the
system that we have used in this homework could be improved. At
least one of these should address some improvement to the user
interface design, and at least one of these should address some
improvement to the functions that can be performed using the system.
Note that when I did this the path found between the two Robert
Taylors is actually wrong because of an error in Wikipedia, and if
that is still true when you do your assignment you would need to use
the Skip Articles box to prevent the system from using the page that
contains the error. Or you could simply fix the error on Wikipedia
and make things easier for everyone else.
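The effect of the Skip Articles box can be imitated by a search that simply refuses to pass through certain pages. This is a sketch under the assumption that skipping a page means the path may not visit it at all; we don't know how the site itself implements the box. The function name and page names are hypothetical placeholders, not the real Robert Taylor pages.

```python
from __future__ import annotations

from collections import deque


def shortest_path_skipping(
    graph: dict[str, list[str]],
    start: str,
    goal: str,
    skip: frozenset[str] = frozenset(),
) -> list[str] | None:
    """Breadth-first search that never passes through a page listed in `skip`,
    so a page with a bad link cannot appear anywhere in the returned path."""
    if start in skip or goal in skip:
        return None
    came_from: dict[str, str | None] = {start: None}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == goal:
            # Rebuild the path by following came_from back to start.
            path: list[str] = []
            node: str | None = page
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for linked in graph.get(page, []):
            if linked not in came_from and linked not in skip:
                came_from[linked] = page
                queue.append(linked)
    return None


# Hypothetical toy graph: the short route goes through a page with a bad link.
links = {
    "Start": ["Page with bad link", "Middle 1"],
    "Page with bad link": ["Goal"],
    "Middle 1": ["Middle 2"],
    "Middle 2": ["Goal"],
}

print(shortest_path_skipping(links, "Start", "Goal"))
# ['Start', 'Page with bad link', 'Goal']
print(shortest_path_skipping(links, "Start", "Goal", skip=frozenset({"Page with bad link"})))
# ['Start', 'Middle 1', 'Middle 2', 'Goal']
```

Skipping a page can only make the shortest path longer or leave it unchanged, because the search has fewer routes to choose from.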
Submit your assignment using ELMS.