Tuesday, November 8, 2016

Setting up Hadoop to run on a Single Node in Ubuntu 15.04

This was tested with hadoop-2.7.3.

This is an improvement on the Hadoop documentation: http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html

Step 1 

Make sure Java is installed

Installation instructions: http://suhothayan.blogspot.com/2010/02/how-to-set-javahome-in-ubuntu.html
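
To confirm the installation before moving on, a quick check (assuming java is on the PATH and JAVA_HOME has been set as in the instructions above):

$ java -version
$ echo $JAVA_HOME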

Step 2

Install the prerequisites

$ sudo apt-get install ssh
$ sudo apt-get install rsync
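
To verify that the SSH server is running before continuing (assuming the default Ubuntu service name), something like:

$ sudo service ssh status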

Step 3

Setup Hadoop

$ gedit hadoop-2.7.3/etc/hadoop/core-site.xml

Add the following (replace {user-name} with your system username, e.g. "foo" for /home/foo/):

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.proxyuser.{user-name}.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.{user-name}.hosts</name>
        <value>*</value>
    </property>
</configuration>
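
Here fs.defaultFS points clients at the local NameNode on port 9000, and the proxyuser entries allow {user-name} to impersonate other users (only needed if services such as Oozie or HttpFS will run on top of this node). A quick way to confirm the value was picked up:

$ ./hadoop-2.7.3/bin/hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000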

$ gedit hadoop-2.7.3/etc/hadoop/hdfs-site.xml 

Add the following:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
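
dfs.replication is set to 1 because there is only a single DataNode. Optionally (not part of the official guide), the NameNode and DataNode storage directories can also be pointed outside /tmp so HDFS data survives a reboot; a sketch, assuming /home/{user-name}/hadoop-data exists:

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/{user-name}/hadoop-data/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/{user-name}/hadoop-data/datanode</value>
    </property>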

Step 4

Run

$ ssh localhost 

If it asks for a password, run:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys

Try ssh localhost again. If it still asks for a password, run the following and try again:

$ ssh-keygen -t rsa
# Press Enter at each prompt
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod og-wx ~/.ssh/authorized_keys 

Step 5

Format the NameNode

$ ./hadoop-2.7.3/bin/hdfs namenode -format
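
The command output should end with a line saying the storage directory has been successfully formatted. With the default hadoop.tmp.dir (/tmp/hadoop-{user-name}), the fresh metadata can also be checked on disk:

$ ls /tmp/hadoop-{user-name}/dfs/name/current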

Step 6 (not provided in the Hadoop documentation)

Replace ${JAVA_HOME} with a hardcoded path in hadoop-env.sh

$ gedit hadoop-2.7.3/etc/hadoop/hadoop-env.sh

Edit the JAVA_HOME line as follows:

# The java implementation to use.
export JAVA_HOME={path}/jdk1.8.0_111
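
If the exact JDK path is not known, it can usually be recovered from the java binary on the PATH (JAVA_HOME is the directory above bin/ or jre/bin/):

$ readlink -f $(which java)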

Step 7

Start Hadoop 

$ ./hadoop-2.7.3/sbin/start-all.sh
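
Note that start-all.sh is deprecated in Hadoop 2.x; the equivalent used in the official guide is to start HDFS and YARN separately:

$ ./hadoop-2.7.3/sbin/start-dfs.sh
$ ./hadoop-2.7.3/sbin/start-yarn.sh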

The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

Browse the web interface for the NameNode:

http://localhost:50070/
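
A quick command-line check that the web UI is reachable (and, assuming the default YARN port 8088, the ResourceManager UI as well); both should print 200 if the daemons came up:

$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088/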

Step 8

Check which processes are running by running:

$ jps

Output: 

xxxxx NameNode
xxxxx ResourceManager
xxxxx DataNode
xxxxx NodeManager
xxxxx SecondaryNameNode
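
If any of these daemons is missing, its log file (under hadoop-2.7.3/logs by default) usually says why; for example, for the NameNode:

$ tail -n 50 hadoop-2.7.3/logs/hadoop-*-namenode-*.log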

Step 9

Make the HDFS directories required to run MapReduce jobs:

$ ./hadoop-2.7.3/bin/hdfs dfs -mkdir /user
$ ./hadoop-2.7.3/bin/hdfs dfs -mkdir /user/{user-name}
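
To confirm the whole setup works end to end, the example MapReduce job from the official guide can be run against the new user directory (paths assume the stock 2.7.3 distribution layout; "input" and "output" resolve under /user/{user-name}):

$ ./hadoop-2.7.3/bin/hdfs dfs -put hadoop-2.7.3/etc/hadoop input
$ ./hadoop-2.7.3/bin/hadoop jar hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
$ ./hadoop-2.7.3/bin/hdfs dfs -cat output/*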

