Monday, June 25, 2012

Hadoop cluster setup : Firewall issues



Expectations: This blog entry is not a step-by-step guide to setting up a Hadoop cluster; there are numerous articles on that. The intent of this post is to provide solutions for a couple of issues that I faced while setting up a cluster. (Unfortunately, I could not find a direct answer to these issues on Google, so I am blogging them here.)


Recently, I was tasked with creating a new Hadoop cluster on our new CentOS machines. The first time I created a cluster, it went through successfully. With the new machines, however, I ran into a few problems.

Issue 1: DataNode cannot connect to NameNode
Call to master/192.168.143.xxx:54310 failed on local exception: java.net.NoRouteToHostException: No route to host

I configured everything and then started the NameNode and DataNodes using:
# cd $HADOOP_HOME
# ./bin/start-dfs.sh

NameNode logs:

 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: NameNode up at: master/192.168.143.211:54310
2012-06-25 19:27:40,338 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310: starting


The NameNode started successfully.


DataNode logs:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to master/192.168.xxx.xxx:54310 failed on local exception: java.net.NoRouteToHostException: No route to host
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1063)
        at org.apache.hadoop.ipc.Client.call(Client.java:1031)
        at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
        ... 13 more
Caused by: java.net.NoRouteToHostException: No route to host


This clearly shows that the DataNode machines cannot connect to the NameNode. So I tried opening the NameNode UI in the browser:

http://192.168.143.xxx:50070 (masked IP)

Result: the connection to the UI timed out, so something seemed to be wrong with the NameNode.

But when I ran telnet against that (NameNode) port, it connected:
# telnet 192.168.xxx.xxx 50070

Trying 192.168.xxx.xxx...
Connected to 192.168.xxx.xxx.
Escape character is '^]'.

This shows that the NameNode is up and running but is not reachable from outside, which points to the firewall. So I tried disabling the firewall on the NameNode machine.
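The same reachability test can be repeated from a DataNode without a browser or telnet. This is a minimal sketch, assuming bash (with its /dev/tcp redirection) and the coreutils `timeout` command are available; `check_port` is a hypothetical helper name, not part of Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: test whether a TCP port is reachable from the machine this runs on.
# check_port is a hypothetical helper; it relies on bash's /dev/tcp
# redirection and gives up after 3 seconds.
check_port() {
    local host="$1" port="$2"
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "open ${host}:${port}"
    else
        echo "unreachable ${host}:${port}"
        return 1
    fi
}

# Usage, run from a DataNode ("master" is the NameNode host name in the logs):
# check_port master 54310   # NameNode IPC port
# check_port master 50070   # NameNode web UI
```

If the ports are reported as open on the NameNode machine itself but unreachable from a DataNode, a firewall in between is a likely suspect.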

Log in as root on the NameNode machine and execute the following commands:
# service iptables save
# service iptables stop
# chkconfig iptables off

After disabling the firewall, I restarted DFS. Now my DataNodes can connect to the NameNode, and the NameNode UI is working fine as well.
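Turning the firewall off entirely is the quickest fix. A narrower alternative might be to open only the ports Hadoop needs. The sketch below only prints the iptables commands so they can be reviewed first; `print_open_port_rules` is a hypothetical helper, and the port list is an assumption based on the ports that appear in this post (a real cluster will likely need more, for example for the JobTracker).

```shell
#!/usr/bin/env bash
# Sketch: print iptables commands that would open specific TCP ports
# instead of disabling the firewall. Nothing is executed here; review
# the output and run it as root only if it fits your setup.
# print_open_port_rules is a hypothetical helper name.
print_open_port_rules() {
    local port
    for port in "$@"; do
        echo "iptables -I INPUT -p tcp --dport ${port} -j ACCEPT"
    done
    # Persist the rules across restarts (same command as used above)
    echo "service iptables save"
}

# Ports seen in this post: 54310 (NameNode IPC), 50070 (NameNode UI),
# 50010 (DataNode). This list is an assumption, not a complete set.
print_open_port_rules 54310 50070 50010
```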


Issue 2:
Error: java.io.IOException: File /tmp/hadoop-hadoop/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

The main cause of this problem (99% of the time) is configuration, usually your conf/slaves file or your /etc/hosts entries. There are many blogs addressing that case. The remaining 1% of the time, it is due to firewall issues on your DataNodes. In that case, run the commands above to disable the firewall on your DataNode machines.

I then restarted the MapReduce daemons, and everything looks good now.
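For the more common configuration cause, a quick consistency check between conf/slaves and /etc/hosts can rule out simple mismatches before looking at the firewall. This is a rough sketch, assuming the slaves file lists host names (not IP addresses); `missing_hosts` is a hypothetical helper that only does a textual lookup and ignores DNS.

```shell
#!/usr/bin/env bash
# Sketch: print every host name in a slaves file that has no entry in a
# hosts file. missing_hosts is a hypothetical helper; it compares text
# only and does not perform any name resolution.
missing_hosts() {
    local slaves_file="$1" hosts_file="$2" host
    while read -r host _ || [ -n "$host" ]; do
        # Skip blank lines and comment lines in the slaves file
        case "$host" in ''|'#'*) continue ;; esac
        # A hosts line is "IP name [alias...]": look for the name in any
        # field after the IP address, ignoring comment lines
        if ! awk -v h="$host" '
                /^[[:space:]]*#/ { next }
                { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
                END { exit !found }' "$hosts_file"; then
            echo "$host"
        fi
    done < "$slaves_file"
}

# Usage:
# missing_hosts "$HADOOP_HOME/conf/slaves" /etc/hosts
```

An empty result means every slave name has a hosts entry; it does not prove the entries point to the right addresses.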

Issue 3: If you see the following exception while pushing a file to HDFS, then you need to disable the firewall on the slave machine.

INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.NoRouteToHostException: No route to host
INFO hdfs.DFSClient: Abandoning block blk_3519823924710640125_1087
INFO hdfs.DFSClient: Excluding datanode 192.168.xxx.xxx:50010

