Hadoop cluster setup: Firewall issues
Expectations: This blog entry is not a step-by-step guide to setting up a Hadoop cluster; there are numerous articles on that already. The intent of this post is to provide solutions for a couple of issues I faced while setting up the cluster (unfortunately, I couldn't find a direct answer for these issues on Google, so I'm blogging them here).
Recently, I was tasked with creating a new Hadoop cluster on our new CentOS machines. The first time I built a cluster, everything worked on the first try, but with the new machines I ran into a few problems.
Issue 1: DataNode cannot connect to NameNode
Call to master/192.168.143.xxx:54310 failed on local exception: java.net.NoRouteToHostException: No route to host
Configured everything & started the NameNode & the DataNodes using:
# cd $HADOOP_HOME
# ./bin/start-dfs.sh
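(Side note: a quick way to see which daemons actually came up on a machine is the jps command that ships with the JDK. The PIDs below are just placeholders, & on a slave machine you would expect to see DataNode instead of NameNode.)
# jps
2481 NameNode
2710 Jps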
NameNode logs:
INFO org.apache.hadoop.hdfs.server.namenode.NameNode: NameNode up at: master/192.168.143.211:54310
2012-06-25 19:27:40,338 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310: starting
The NameNode started successfully.
DataNode logs:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException:
Call to master/192.168.xxx.xxx:54310 failed on local exception: java.net.NoRouteToHostException: No route to host
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1063)
at org.apache.hadoop.ipc.Client.call(Client.java:1031)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
... 13 more
Caused by: java.net.NoRouteToHostException: No route to host
This clearly says that the DataNode machines cannot connect to the NameNode. So I tried hitting the NameNode UI in the browser:
http://192.168.143.xxx:50070 (masked IP)
Result: failed to connect to the UI; the request timed out. It looked like something was wrong with the NameNode.
But when I did a telnet to that (NameNode) port:
# telnet 192.168.xxx.xxx 50070
Trying 192.168.xxx.xxx...
Connected to 192.168.xxx.xxx.
Escape character is '^]'.
So the NameNode is up & running, but it is not reachable from outside, which points to a firewall problem. So I tried disabling the firewall on my NameNode machine.
Log in as root to the NameNode machine & execute the following commands:
# service iptables save
# service iptables stop
# chkconfig iptables off
After disabling the firewall, I restarted DFS. Now my DataNodes can connect to my NameNode & the NameNode UI is also working fine.
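(If turning the firewall off completely is not an option for you, a less drastic alternative is to open just the ports Hadoop needs: 54310 (NameNode IPC), 50070 (NameNode web UI) & 50010 (DataNode data transfer). This is only a sketch, assuming the default port numbers used in this post; check your config if you have changed them.)
# iptables -I INPUT -p tcp --dport 54310 -j ACCEPT
# iptables -I INPUT -p tcp --dport 50070 -j ACCEPT
# iptables -I INPUT -p tcp --dport 50010 -j ACCEPT
# service iptables save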
Issue 2:
Error: java.io.IOException: File /tmp/hadoop-hadoop/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
The main cause of this problem is configuration (99% of the time), typically your conf/slaves file or your /etc/hosts entries, and there are many blog posts that address this. But the remaining 1% of the time it is due to the firewall on your DataNodes, so run the commands above to disable the firewall on your DataNode machines as well.
Restarted the MapReduce daemons (bin/stop-mapred.sh & bin/start-mapred.sh) & everything looks good now.
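(For reference, here is a minimal sketch of what the config side of this looks like. The master IP is the one from my logs; the slave hostnames & IPs are placeholders, so substitute your own.)
conf/slaves (on the master, one DataNode hostname per line):
slave1
slave2
/etc/hosts (on every machine, mapping hostnames to the real interface IPs, not to 127.0.0.1):
192.168.143.211   master
192.168.143.212   slave1
192.168.143.213   slave2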
Issue 3: If you see the following exception while pushing a file to HDFS, you also need to disable the firewall on the slave (DataNode) machines.
INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.NoRouteToHostException: No route to host
INFO hdfs.DFSClient: Abandoning block blk_3519823924710640125_1087
INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.NoRouteToHostException: No route to host
INFO hdfs.DFSClient: Abandoning block blk_3519823924710640125_1087
INFO hdfs.DFSClient: Excluding datanode 192.168.xxx.xxx:50010
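(If you have several slaves, a small loop run from the master saves some typing. This is just a sketch, assuming passwordless ssh as root to each slave & that conf/slaves lists one hostname per line.)
for slave in $(cat $HADOOP_HOME/conf/slaves); do
  ssh root@$slave "service iptables save; service iptables stop; chkconfig iptables off"
done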