When I first installed Hadoop, I started with all the default settings, and everything made use of /tmp directories. The default locations are inadvisable and should be changed immediately for any real, practical use.
Here is a table listing the default locations and the suggested locations (assuming you have already created a hadoop user).
Directory | Description | Default location | Suggested location |
---|---|---|---|
HADOOP_LOG_DIR | Output location for log files from daemons | ${HADOOP_HOME}/logs | /var/log/hadoop |
hadoop.tmp.dir | A base for other temporary directories | /tmp/hadoop-${user.name} | /tmp/hadoop |
dfs.name.dir | Where the NameNode metadata should be stored | ${hadoop.tmp.dir}/dfs/name | /home/hadoop/dfs/name |
dfs.data.dir | Where DataNodes store their blocks | ${hadoop.tmp.dir}/dfs/data | /home/hadoop/dfs/data |
mapred.system.dir | The in-HDFS path to shared MapReduce system files | ${hadoop.tmp.dir}/mapred/system | /hadoop/mapred/system |
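The local directories from the "Suggested location" column need to exist and be owned by the hadoop user before the daemons start. A minimal sketch of that one-time setup is below; the `PREFIX` variable is my own addition so you can rehearse the layout in a scratch directory first (run it with `PREFIX=` empty, under sudo, to create the real paths):

```shell
# Hypothetical setup script: creates the suggested local directories.
# PREFIX defaults to a scratch dir so this can be tried without root;
# set PREFIX= (empty) and run via sudo to create the real layout.
PREFIX="${PREFIX:-$(mktemp -d)}"
mkdir -p "$PREFIX/var/log/hadoop" \
         "$PREFIX/tmp/hadoop" \
         "$PREFIX/home/hadoop/dfs/name" \
         "$PREFIX/home/hadoop/dfs/data"
# Hand ownership to the hadoop user (requires root on the real paths):
# chown -R hadoop:hadoop "$PREFIX/var/log/hadoop" "$PREFIX/home/hadoop/dfs"
echo "directories created under: $PREFIX"
```

mapred.system.dir is deliberately left out here: it is a path *inside* HDFS, not on the local filesystem, so there is nothing to mkdir.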
The majority of Hadoop settings reside in XML configuration files. Prior to Hadoop 0.20, everything lived in hadoop-default.xml and hadoop-site.xml. As the names suggest, hadoop-default.xml contains all the default settings, and hadoop-site.xml is the file to edit if you want to override anything.
If you are like me, running a later version (anything newer than 0.20; I am on Hadoop 1.x), hadoop-site.xml has been separated into three files:

- core-site.xml : common settings, such as the hostname and port of the NameNode
- hdfs-site.xml : HDFS settings, such as the replication factor
- mapred-site.xml : MapReduce settings, such as the hostname and port of the JobTracker
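As an illustration of what goes where, here are typical entries for a pseudo-distributed Hadoop 1.x setup. The hostnames and ports below are the usual single-machine examples, not values from this install, so adjust them for your cluster:

```xml
<!-- core-site.xml : where the NameNode listens -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

<!-- hdfs-site.xml : replication factor (1 for a single node) -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

<!-- mapred-site.xml : where the JobTracker listens -->
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
```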
So, we can add the NameNode and DataNode directories in hdfs-site.xml:
```xml
<property>
  <name>dfs.name.dir</name>
  <value>/home/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/hadoop/dfs/data</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop</value>
</property>
```
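One caveat: these `<property>` snippets must sit inside the file's single `<configuration>` root element, so a complete minimal hdfs-site.xml would look like:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/dfs/data</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop</value>
  </property>
</configuration>
```

Also note that after pointing dfs.name.dir at a new location, you will need to run `hadoop namenode -format` again (which wipes any existing HDFS metadata) and restart the daemons before HDFS will come up.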