Setting up a Single-Node Hadoop "Cluster" on Windows XP

This tutorial will help you set up Hadoop to run on your personal Windows computer.

Setting up a single-node Hadoop cluster on Windows XP is very similar to setting it up on a Linux machine, as described in the Hadoop Quickstart.

The only difference is that you have to simulate a Linux environment on your Windows machine. To do this, first install Cygwin and OpenSSH. You can find the Cygwin setup program here. Download the setup program and run it. Make sure the OpenSSH package is selected before the installer starts downloading packages: at the package selection step, click the "View" button until it says "Full", scroll down the list to find "openSSH", and select it for installation. (It may already be selected by default, but check to be sure it will be installed on your machine.) For the other packages, the defaults are fine.

Make sure you have JDK version 5 installed. If not, you can download it from here.

Suppose you have installed JDK 5 in c:\Program Files\java1.5. You need to tell Hadoop where it is by setting JAVA_HOME in conf\hadoop-env.sh to /cygdrive/c/"Program Files"/java1.5. Note that the shell does not allow spaces around the "=" sign:

export JAVA_HOME=/cygdrive/c/"Program Files"/java1.5

As another example, if your JDK installation path is d:\java1.5, set JAVA_HOME like this:

export JAVA_HOME=/cygdrive/d/java1.5
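
The mapping from a Windows path to the /cygdrive form used above can be sketched in a few lines of shell. This is only an illustration for paths that start with a drive letter; win_to_cygdrive is a hypothetical helper name, not part of Hadoop or Cygwin. Inside Cygwin itself, the cygpath utility (cygpath -u) performs this conversion for you.

```shell
# Sketch: convert a Windows path with a drive letter into /cygdrive form.
# win_to_cygdrive is a hypothetical helper; the parentheses run it in a
# subshell so its variables do not leak into the calling shell.
win_to_cygdrive() (
  win_path=$1
  drive=${win_path%%:*}   # text before the first colon (the drive letter)
  rest=${win_path#*:}     # text after the first colon (the rest of the path)
  drive=$(printf '%s' "$drive" | tr 'A-Z' 'a-z')   # lowercase the drive letter
  rest=$(printf '%s' "$rest" | tr '\\' '/')        # backslashes -> forward slashes
  printf '/cygdrive/%s%s\n' "$drive" "$rest"
)

win_to_cygdrive 'c:\Program Files\java1.5'
# prints: /cygdrive/c/Program Files/java1.5
```

The printed path contains a space, so it still has to be quoted when it is assigned to JAVA_HOME, as in the first example above.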

It is best to do this editing inside the Cygwin environment. If you edit hadoop-env.sh and save it with a Windows editor, the editor may convert each newline character "\n" to "\r\n", and the "\r\n" line endings will not be parsed correctly when Cygwin executes hadoop-env.sh.
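
If the file has already been saved with Windows line endings, something like the following sketch can detect and repair it from the Cygwin shell. The helper names has_crlf and strip_cr are made up for this example, and the script assumes you run it from the Hadoop installation directory so that conf/hadoop-env.sh resolves.

```shell
# has_crlf and strip_cr are hypothetical helper names for this sketch.

has_crlf() {
  # Succeeds if the file contains at least one carriage return character.
  cr=$(printf '\r')
  grep -q "$cr" "$1"
}

strip_cr() {
  # Rewrites the file with every carriage return removed,
  # going through a temporary copy next to the original.
  tr -d '\r' < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

f=conf/hadoop-env.sh   # assumed location, relative to the Hadoop directory
if [ -f "$f" ] && has_crlf "$f"; then
  strip_cr "$f"
  echo "Removed carriage returns from $f"
fi
```

Running the script a second time should do nothing, because the file no longer contains any "\r" characters.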

After setting JAVA_HOME in conf\hadoop-env.sh, you need to make sure of two things:

sshd is running in the background; and,
you can ssh to the localhost without inputting a passphrase.

If you have never run sshd in Cygwin before, you may need to generate a config file by running

ssh-host-config

While ssh-host-config runs, it may pause and prompt you to enter a value for "CYGWIN="; if it does, enter "ntsec tty" (without quotation marks). It will also pause and ask "Should privilege separation be used?"; to save a lot of trouble, I recommend answering "no". It may also ask whether you want to install sshd as a service; answer "yes".

Now you can run the following command to start sshd

/usr/sbin/sshd

To make sure sshd is running, check the process status and see if sshd is listed

ps | grep sshd

If sshd is running, then you can try to ssh to your localhost

ssh localhost

If you cannot ssh to localhost, turn off your antivirus software, or at least unblock port 22 for ssh, and try again.

If ssh asks for a passphrase when connecting to localhost, press Ctrl+C and type the following commands:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
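
If you run the cat command above more than once, the same key is appended to authorized_keys each time. The sketch below shows one way to make that step safe to repeat. The function name authorize_key is hypothetical, and the chmod assumes that sshd checks the file's permissions, which is its usual default.

```shell
# authorize_key is a hypothetical helper: it appends a public key to an
# authorized_keys file only if that exact line is not already present.
authorize_key() {
  pub_file=$1
  auth_file=$2
  mkdir -p "$(dirname "$auth_file")"   # create the target directory if needed
  touch "$auth_file"                   # make sure the file exists before grep
  # -x: match whole lines, -F: fixed strings, -f: read patterns from pub_file
  if ! grep -qxF -f "$pub_file" "$auth_file"; then
    cat "$pub_file" >> "$auth_file"
  fi
  chmod 600 "$auth_file"               # keep the file private to its owner
}
```

For the key generated above, the call would be authorize_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys. To check afterwards that no passphrase is needed, you can try ssh -o BatchMode=yes localhost true, which fails instead of prompting when key-based login does not work.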

At this point, you can follow the Quickstart tutorial for the remaining steps.

In our experience, Hadoop may report that there is no free space on the disk if less than 2GB is actually free.
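
A quick way to check the free space before starting Hadoop is sketched below. The helper name free_kb is made up for this example, and the 2GB threshold is simply the figure from our experience above, not a documented Hadoop limit. The sketch also assumes the filesystem name reported by df contains no spaces; otherwise the column positions shift.

```shell
# free_kb is a hypothetical helper: it prints the free space, in kilobytes,
# on the filesystem that holds the given directory.
free_kb() {
  # -P gives one line per filesystem, -k reports 1K blocks;
  # the fourth column of the second line is the available space.
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

min_kb=$((2 * 1024 * 1024))   # 2GB expressed in kilobytes
avail=$(free_kb .)
if [ "$avail" -lt "$min_kb" ]; then
  echo "Warning: only ${avail}KB free here; Hadoop may report a full disk"
fi
```

Run it from the directory where Hadoop will store its data, since other drives may have different amounts of free space.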

This tutorial has only been tested on Windows XP SP2. If you have another version of Windows, please let us know if this tutorial works for you.

Additional Documentation

You may want to look at the following documentation for OpenSSH in Cygwin:

/usr/share/doc/Cygwin/openssh.README
/usr/share/doc/openssh/README.privsep

This webpage also provides a good tutorial for setting up ssh and sshd on Cygwin.

Please contact the TA if you are having trouble running Hadoop.

What's Next?

Go back to the main Single-Node setup page if you also want to set up Hadoop on a Berry patch machine or other non-Windows box. Otherwise, continue on to write and test a small Hadoop program.