Hadoop Single Node "Cluster" Setup

These instructions show you how to run Hadoop on a single machine, simulating a cluster by running multiple Java VMs. This setup is well suited to developing and testing Hadoop applications. The Hadoop website has an excellent tutorial on installing and setting up Hadoop on a single node; this document supplements that tutorial with some tips and gotchas. You will also write a small sample program that uses Hadoop to fetch titles from web pages.

Prereqs

SSH Client and Server

Java

Java 1.5.x is required. You may already have it installed; it should report version "1.5.x" (e.g., 1.5.0_14).

Once Java is installed, go to the command line and type the following command:

java -version
You should see output similar to this:
java version "1.5.0_13"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-241)
Java HotSpot(TM) Client VM (build 1.5.0_13-121, mixed mode, sharing)
If you see something to the effect of "command not found" then Java may not be on your PATH. If you see a version other than "1.5.x" then you do not have the correct version of Java installed.
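If Java is installed but not on your PATH, something along these lines usually fixes it (the install directory below is only an example; substitute the path where your JDK actually lives):

```shell
# Point JAVA_HOME at your JDK install directory (example path -- adjust for your system),
# then put its bin/ directory at the front of the PATH.
export JAVA_HOME=/usr/java/jdk1.5.0_14
export PATH="$JAVA_HOME/bin:$PATH"
```

Adding these two lines to your shell startup file (e.g., ~/.bashrc) makes the change permanent.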

If you have any trouble with Java installation or otherwise getting your computer set up for use with Hadoop, please ask the TA for help as soon as possible.

Installation

Notes and Gotchas

Interacting with HDFS

You can see the commands that HDFS allows by typing this at the command line:

bin/hadoop dfs
A non-exhaustive list of important commands to remember:
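For example, the following cover most day-to-day HDFS work (the paths shown are placeholders; these require a running Hadoop installation):

```shell
bin/hadoop dfs -ls                    # list your HDFS home directory
bin/hadoop dfs -mkdir input           # create a directory in HDFS
bin/hadoop dfs -put local.txt input   # copy a local file into HDFS
bin/hadoop dfs -get input/local.txt . # copy a file out of HDFS
bin/hadoop dfs -cat input/local.txt   # print a file's contents
bin/hadoop dfs -rm input/local.txt    # delete a file
bin/hadoop dfs -rmr input             # delete a directory recursively
```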

Shutting Down Hadoop

When you are done working with Hadoop, you should always shut it down. To do this, execute this command:

bin/stop-all.sh

What's Next?

Write and test a small Hadoop program.
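For the title-fetching program mentioned above, the heart of the job is extracting the title from each page's HTML. Here is a minimal sketch of that piece (the class and method names are hypothetical); in the full Hadoop program, your mapper would apply this to each fetched page and emit (url, title) pairs:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the title-fetching exercise: given the raw HTML
// of a page, pull out the contents of its <title> tag.
public class TitleExtractor {

    // Case-insensitive match for <title>...</title>; DOTALL lets the
    // title text span line breaks.
    private static final Pattern TITLE = Pattern.compile(
        "<title[^>]*>(.*?)</title>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the page title, or null if no <title> tag is found.
    public static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Hadoop Tutorial</title></head><body></body></html>";
        System.out.println(extractTitle(page)); // prints "Hadoop Tutorial"
    }
}
```

Keeping the extraction logic in a plain static method like this also lets you test it on its own, without starting Hadoop at all.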