Following are some pitfalls and bugs that we have run into while running Hadoop. If you have a problem that isn't listed here, please let the TA know so that we can help you out and share the solution with the rest of the class.
Symptom | Possible Problem | Possible Solution |
---|---|---|
You get an error that your cluster is in "safe mode" | Your cluster enters safe mode when it hasn't been able to verify that all the data nodes necessary to replicate your data are up and responding. Check the documentation to learn more about safe mode. | The cluster normally leaves safe mode on its own once enough DataNodes have checked in; if it stays stuck, you can check and clear safe mode by hand (see the note on safe mode after this table). |
You get a NoRouteToHostException in your logs or in stderr output from a command. | One of your nodes cannot be reached correctly. This may be a firewall issue, so you should report it to me. | The only workaround is to pick a new node to replace the unreachable one. Currently, I think that creusa is unreachable, but all other Linux boxes should be okay. None of the Macs will currently work in a cluster. |
You get an error that "remote host identification has changed" when you try to ssh to localhost. | You have moved your single node cluster from one machine in the Berry Patch to another. The name localhost thus is pointing to a new machine, and your ssh client thinks that it might be a man-in-the-middle attack. |
You can ask your login to skip checking the validity of
localhost. You do this by
setting NoHostAuthenticationForLocalhost
to yes
in ~/.ssh/config. You can
accomplish this with the following command:
echo "NoHostAuthenticationForLocalhost yes" >>~/.ssh/config |
Your DataNode is started and you can create directories with `bin/hadoop dfs -mkdir`, but you get an error message when you try to put files into the HDFS (e.g., when you run a command like `bin/hadoop dfs -put`). | Creating directories is only a function of the NameNode, so your DataNode is not exercised until you actually want to put some bytes into a file. If you are sure that the DataNode is started, then it could be that your DataNodes are out of disk space. | Check how much disk space is free on your DataNode machines (see the note on checking DataNode disk space after this table). |
You try to run the grep example from the QuickStart but you get an error message like this: `java.io.IOException: Not a file: hdfs://localhost:9000/user/ross/input/conf` | You may have created a directory inside the input directory in the HDFS. For example, this might happen if you run `bin/hadoop dfs -put conf input` twice in a row (this would create a subdirectory in input... why?). | The easiest way to get the example to run is to just start over and make the input anew: `bin/hadoop dfs -rmr input` followed by `bin/hadoop dfs -put conf input` |
Your DataNodes won't start, and you see something like this in `logs/*datanode*`: `Incompatible namespaceIDs in /tmp/hadoop-ross/dfs/data` | Your Hadoop namespaceID became corrupted. Unfortunately the easiest thing to do is to reformat the HDFS. | You need to do something like this (be VERY careful with `rm -Rf`): `bin/stop-all.sh`, then `rm -Rf /tmp/hadoop-your-username/*`, then `bin/hadoop namenode -format` |
When you try the grep example in the QuickStart, you get an error like the following: `org.apache.hadoop.mapred.InvalidInputException: Input path doesnt exist : /user/ross/input` | You haven't created an input directory containing one or more text files. | `bin/hadoop dfs -put conf input` |
When you try the grep example in the QuickStart, you get an error like the following: `org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /user/ross/output already exists` | You might have already run the example once, creating an output directory. Hadoop doesn't like to overwrite files. | Remove the output directory before rerunning the example: `bin/hadoop dfs -rmr output`. Alternatively you can change the output directory of the grep example, something like this: `bin/hadoop jar hadoop-*-examples.jar grep input output2 'dfs[a-z.]+'` |
You can run Hadoop jobs written in Java (like the grep example), but your HadoopStreaming jobs (such as the Python example that fetches web page titles) won't work. | You might have given only a relative path to the mapper and reducer programs. The tutorial originally just specified relative paths, but absolute paths are required if you are running in a real cluster. | Use absolute paths like this from the tutorial: `bin/hadoop jar contrib/hadoop-0.15.2-streaming.jar -mapper $HOME/proj/hadoop/multifetch.py -reducer $HOME/proj/hadoop/reducer.py -input urls/* -output titles` |
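A note on safe mode (referenced in the first row above): the NameNode normally leaves safe mode on its own once enough DataNodes have reported their blocks, so waiting is usually enough. If it stays stuck, the sketch below shows how to check and clear safe mode from the command line, assuming your Hadoop release includes the `dfsadmin -safemode` option:

```
# Ask the NameNode whether it is currently in safe mode.
bin/hadoop dfsadmin -safemode get

# Block until the NameNode leaves safe mode on its own.
bin/hadoop dfsadmin -safemode wait

# Force the NameNode out of safe mode. Only do this if you are sure your
# DataNodes are all up and reachable; otherwise you risk working with
# under-replicated data.
bin/hadoop dfsadmin -safemode leave
```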
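A note on checking DataNode disk space (referenced in the `dfs -put` row above): one way to see whether HDFS thinks it is out of room, assuming the `dfsadmin -report` option is available in your Hadoop release, is to ask the NameNode for a capacity report and then check the local partition that holds the DFS data directory on each DataNode:

```
# Print overall and per-DataNode capacity, used space, and remaining space
# as seen by HDFS.
bin/hadoop dfsadmin -report

# On each DataNode machine, check how full the partition holding the DFS
# data directory is (in this setup the data lives under
# /tmp/hadoop-your-username by default).
df -h /tmp
```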