This guide was tested on hadoop-2.7.3.
An improved version of the steps in the Hadoop documentation: http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html
Step 1
Make sure Java is installed
Installation instructions: http://suhothayan.blogspot.com/2010/02/how-to-set-javahome-in-ubuntu.html
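To confirm that Java is installed and on the PATH, you can check the version:
$ java -version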
Step 2
Install the prerequisites
$ sudo apt-get install ssh
$ sudo apt-get install rsync
Step 3
Set up Hadoop
$ gedit hadoop-2.7.3/etc/hadoop/core-site.xml
Add the following (replace {user-name} with your system username, e.g. "foo" for /home/foo/):
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.proxyuser.{user-name}.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.{user-name}.hosts</name>
        <value>*</value>
    </property>
</configuration>
$ gedit hadoop-2.7.3/etc/hadoop/hdfs-site.xml
Add the following:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
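To double-check that the configuration is being picked up, you can read a value back with hdfs getconf (this assumes JAVA_HOME is already set in your environment; if not, complete Step 6 first):
$ ./hadoop-2.7.3/bin/hdfs getconf -confKey fs.defaultFS
# Should print hdfs://localhost:9000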
Step 4
Run
$ ssh localhost
If it asks for a password, run:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
Try ssh localhost again.
If it still asks for a password, run the following and try again:
$ ssh-keygen -t rsa
# Press Enter at each prompt
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod og-wx ~/.ssh/authorized_keys
Try ssh localhost again.
Step 5
Format the NameNode
$ ./hadoop-2.7.3/bin/hdfs namenode -format
Step 6 (not provided in the Hadoop documentation)
Replace ${JAVA_HOME} with a hard-coded path in hadoop-env.sh
$ gedit hadoop-2.7.3/etc/hadoop/hadoop-env.sh
Find the line "export JAVA_HOME=${JAVA_HOME}" (under the comment "# The java implementation to use.") and replace it with the actual JDK path, e.g.:
export JAVA_HOME={path}/jdk1.8.0_111
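If you are unsure of the exact JDK path, one way to find it (assuming java is already on your PATH) is:
$ readlink -f $(which java)
# Prints something like /usr/lib/jvm/jdk1.8.0_111/jre/bin/java;
# JAVA_HOME is that path without the trailing /jre/bin/java (or /bin/java)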
Step 7
Start Hadoop
$ ./hadoop-2.7.3/sbin/start-all.sh
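Note: start-all.sh is deprecated in Hadoop 2.x; if you prefer, you can start HDFS and YARN separately with:
$ ./hadoop-2.7.3/sbin/start-dfs.sh
$ ./hadoop-2.7.3/sbin/start-yarn.sh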
The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
Browse the web interface for the NameNode:
http://localhost:50070/
Step 8
Check which Hadoop processes are running:
$ jps
Expected output (the numbers are process IDs and will differ):
xxxxx NameNode
xxxxx ResourceManager
xxxxx DataNode
xxxxx NodeManager
xxxxx SecondaryNameNode
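If any of these daemons is missing, check its log under the logs directory; file names include the user name, daemon name, and host name, for example (adjust the path to your setup):
$ tail -n 50 hadoop-2.7.3/logs/hadoop-{user-name}-namenode-*.log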
Step 9
Make HDFS directories for MapReduce jobs:
$ ./hadoop-2.7.3/bin/hdfs dfs -mkdir /user
$ ./hadoop-2.7.3/bin/hdfs dfs -mkdir /user/{user-name}
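To verify the setup end to end, you can copy some files into HDFS and run one of the bundled MapReduce examples, as in the official documentation (the jar path below assumes the default hadoop-2.7.3 layout):
$ ./hadoop-2.7.3/bin/hdfs dfs -put hadoop-2.7.3/etc/hadoop input
$ ./hadoop-2.7.3/bin/hadoop jar hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
$ ./hadoop-2.7.3/bin/hdfs dfs -cat output/*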