edu.umd.cloud9.demo
Class DemoWordCondProbTuple

java.lang.Object
  extended by edu.umd.cloud9.demo.DemoWordCondProbTuple

public class DemoWordCondProbTuple
extends Object

Demonstrates how to compute conditional probabilities using tuples as intermediate keys. Input comes from the Bible+Shakespeare sample collection. See also DemoWordCondProbJSON. A sample of the final output:

 ...
 (admirable, *)   15.0
 (admirable, 0)   0.6
 (admirable, 1)   0.4
 (admiral, *)     6.0
 (admiral, 0)     0.33333334
 (admiral, 1)     0.6666667
 (admiration, *)  16.0
 (admiration, 0)  0.625
 (admiration, 1)  0.375
 (admire, *)      8.0
 (admire, 0)      0.625
 (admire, 1)      0.375
 (admired, *)     19.0
 (admired, 0)     0.6315789
 (admired, 1)     0.36842105
 ...
 

The first field of the key tuple contains a token. If the second field contains the special symbol '*', the value is the count of the token in the collection. Otherwise, the second field (0 or 1) identifies the parity of the line length, and the value is p(EvenOrOdd|Token): the probability that a line has that length parity (even or odd), given that the token occurs in it.
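
The calculation described above can be sketched in plain Java, without Hadoop. This is a minimal in-memory illustration, not the demo's actual implementation: the class name CondProbSketch, the compute method, whitespace tokenization, and the mapping of the second field to line.length() % 2 (so that 0 means even) are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: single-machine version of the (token, *) / (token, parity) output.
class CondProbSketch {

    // Returns entries keyed like "(token, *)", "(token, 0)", "(token, 1)".
    // Assumption: parity is line.length() % 2, so 0 means an even-length line.
    static Map<String, Float> compute(List<String> lines) {
        // Per token: counts[0] = occurrences in even-length lines, counts[1] = odd.
        Map<String, int[]> counts = new TreeMap<>();
        for (String line : lines) {
            int parity = line.length() % 2;
            // Assumption: tokens are separated by whitespace.
            for (String token : line.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                counts.computeIfAbsent(token, k -> new int[2])[parity]++;
            }
        }

        Map<String, Float> out = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            int[] c = e.getValue();
            float total = c[0] + c[1];
            // The marginal count is emitted first, under the special symbol '*'.
            out.put("(" + e.getKey() + ", *)", total);
            // Each conditional probability is the parity count divided by the marginal.
            for (int parity = 0; parity < 2; parity++) {
                if (c[parity] > 0) {
                    out.put("(" + e.getKey() + ", " + parity + ")", c[parity] / total);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Float> result = compute(List.of("a b", "a bb", "a"));
        for (Map.Entry<String, Float> e : result.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In the sketch, the '*' entry for a token is always produced before its parity entries, so the marginal count is already known when each probability is computed. In a MapReduce job the same effect would typically come from the sort order of the intermediate keys.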

Expected job counters:

 Map input records=156215
 Map output records=3468596
 Map input bytes=9068074
 Map output bytes=163645442
 Combine input records=3468596
 Combine output records=324085
 Reduce input groups=101013
 Reduce input records=3468596
 Reduce output records=101013
 


Method Summary
static void main(String[] args)
          Runs the demo.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

main

public static void main(String[] args)
                 throws IOException
Runs the demo.

Throws:
IOException