Hadoop Example Program in Java

This tutorial mirrors the Pythonic example of multifetch, but accomplishes the same task using the Hadoop Java API.

Prereqs

Same as for the Pythonic example.

What You Will Create

Again, same as the Pythonic example, except in Java.

Let's Get Right to the Code

View the source code for MultiFetch.java.
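As a rough illustration of the per-record work such a job might perform, here is a minimal sketch in plain Java with no Hadoop dependency. It assumes, judging only by the "titles" output directory used later in this tutorial, that each fetched page is reduced to the text of its title element. The class TitleSketch and the method extractTitle are hypothetical names chosen for this illustration; they are not taken from MultiFetch.java, which may do this differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of MultiFetch.java.
class TitleSketch {
    // Case-insensitive so <TITLE> matches; DOTALL so a title that spans
    // several lines is still captured.
    private static final Pattern TITLE = Pattern.compile(
        "<title[^>]*>(.*?)</title>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the page title with whitespace collapsed,
    // or null when the page has no title element.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        if (!m.find()) {
            return null;
        }
        return m.group(1).trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String page = "<html><head><TITLE>\n  Example   Domain </TITLE></head>"
                    + "<body></body></html>";
        System.out.println(extractTitle(page)); // prints "Example Domain"
    }
}
```

In a Hadoop job, logic of this kind would typically sit inside the map step, with the URL as the output key and the title as the output value. A regular expression is enough for a sketch but is not a robust way to parse HTML.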

Notes

How to Build

mkdir multifetch_classes
javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar \
      -d multifetch_classes MultiFetch.java
jar -cvf $HOME/proj/hadoop/MultiFetch.jar -C multifetch_classes/ .

How to Initialize Data

Load the input URLs into the DFS the same way as described in the Pythonic example.

How to Run

bin/hadoop jar $HOME/proj/hadoop/MultiFetch.jar        \
               edu.brandeis.cs147a.examples.MultiFetch \
               urls/*                                  \
               titles
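The hadoop jar command passes everything after the class name to that class's main method, so in the command above main receives two arguments: urls/* and titles. The sketch below shows one way a driver could validate them. It assumes the first argument is the input path (or glob) and the second is the output directory; JobArgsSketch and its field names are hypothetical and are not taken from MultiFetch.java.

```java
// Hypothetical holder for the two positional arguments; illustration only.
class JobArgsSketch {
    final String inputPattern; // e.g. "urls/*"
    final String outputDir;    // e.g. "titles"

    JobArgsSketch(String inputPattern, String outputDir) {
        this.inputPattern = inputPattern;
        this.outputDir = outputDir;
    }

    // Rejects anything other than exactly two arguments.
    static JobArgsSketch parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "usage: MultiFetch <input path or glob> <output dir>");
        }
        return new JobArgsSketch(args[0], args[1]);
    }

    public static void main(String[] args) {
        JobArgsSketch parsed = parse(new String[] {"urls/*", "titles"});
        System.out.println(parsed.inputPattern + " -> " + parsed.outputDir);
    }
}
```

Failing fast on a wrong argument count gives a clearer error than letting the job start with a missing input or output path.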

What's Next?

Set up a real Hadoop cluster, or go back to the Python version of the example.