When I first installed Hadoop, I started with all the default settings, and everything made use of /tmp directories. The default locations are inadvisable and should be changed immediately for any real, practical use.
Here is a table listing the default locations and the suggested locations (assuming you have already created a hadoop user).
Directory | Description | Default location | Suggested location |
---|---|---|---|
HADOOP_LOG_DIR | Output location for log files from daemons | ${HADOOP_HOME}/logs | /var/log/hadoop |
hadoop.tmp.dir | A base for other temporary directories | /tmp/hadoop-${user.name} | /tmp/hadoop |
dfs.name.dir | Where the NameNode metadata should be stored | ${hadoop.tmp.dir}/dfs/name | /home/hadoop/dfs/name |
dfs.data.dir | Where DataNodes store their blocks | ${hadoop.tmp.dir}/dfs/data | /home/hadoop/dfs/data |
mapred.system.dir | The in-HDFS path to shared MapReduce system files | ${hadoop.tmp.dir}/mapred/system | /hadoop/mapred/system |
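The local directories from the "Suggested location" column need to exist and be owned by the hadoop user before the daemons start. A minimal sketch of that one-time setup is below; the `PREFIX` variable is my own addition so you can rehearse the layout in a scratch directory first (run it with `PREFIX=` empty, under sudo, to create the real paths):

```shell
# Hypothetical setup script: creates the suggested local directories.
# PREFIX defaults to a scratch dir so this can be tried without root;
# set PREFIX= (empty) and run via sudo to create the real layout.
PREFIX="${PREFIX:-$(mktemp -d)}"
mkdir -p "$PREFIX/var/log/hadoop" \
         "$PREFIX/tmp/hadoop" \
         "$PREFIX/home/hadoop/dfs/name" \
         "$PREFIX/home/hadoop/dfs/data"
# Hand ownership to the hadoop user (requires root on the real paths):
# chown -R hadoop:hadoop "$PREFIX/var/log/hadoop" "$PREFIX/home/hadoop/dfs"
echo "directories created under: $PREFIX"
```

mapred.system.dir is deliberately left out here: it is a path *inside* HDFS, not on the local filesystem, so there is nothing to mkdir.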
The majority of Hadoop settings reside in XML configuration files. Prior to Hadoop 0.20, everything lived in hadoop-default.xml and hadoop-site.xml. As the names suggest, hadoop-default.xml contains all the default settings, and hadoop-site.xml is the file to edit if you want to override anything.
If you are like me, running a later version (anything newer than 0.20; I am on Hadoop 1.x), hadoop-site.xml has been separated into three files:

- core-site.xml : common settings, such as the hostname and port of the NameNode
- hdfs-site.xml : HDFS settings, such as the replication factor
- mapred-site.xml : MapReduce settings, such as the hostname and port of the JobTracker
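As an illustration of what goes where, here are typical entries for a pseudo-distributed Hadoop 1.x setup. The hostnames and ports below are the usual single-machine examples, not values from this install, so adjust them for your cluster:

```xml
<!-- core-site.xml : where the NameNode listens -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

<!-- hdfs-site.xml : replication factor (1 for a single node) -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

<!-- mapred-site.xml : where the JobTracker listens -->
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
```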
So, we can add the NameNode and DataNode directories in hdfs-site.xml:
```xml
<property>
  <name>dfs.name.dir</name>
  <value>/home/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/hadoop/dfs/data</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop</value>
</property>
```
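One caveat: these `<property>` snippets must sit inside the file's single `<configuration>` root element, so a complete minimal hdfs-site.xml would look like:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/dfs/data</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop</value>
  </property>
</configuration>
```

Also note that after pointing dfs.name.dir at a new location, you will need to run `hadoop namenode -format` again (which wipes any existing HDFS metadata) and restart the daemons before HDFS will come up.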