Tuesday, November 8, 2016

Set up Hive to run on Ubuntu 15.04

This is tested on hadoop-2.7.3 and apache-hive-2.1.0-bin.

An improvement on the Hive documentation: https://cwiki.apache.org/confluence/display/Hive/GettingStarted

Step 1

Make sure Java is installed

Installation instructions: http://suhothayan.blogspot.com/2010/02/how-to-set-javahome-in-ubuntu.html
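
To verify, the following should print a Java version and a non-empty JAVA_HOME (a quick sanity check, assuming Java was installed as per the link above):

$ java -version
$ echo $JAVA_HOME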

Step 2

Make sure Hadoop is installed and running

Instructions: http://suhothayan.blogspot.com/2016/11/setting-up-hadoop-to-run-on-single-node_8.html
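
A quick way to check that Hadoop is up is to list the running Java processes; the NameNode, DataNode, ResourceManager, NodeManager, and SecondaryNameNode daemons should all appear:

$ jps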

Step 3

Add the Hive and Hadoop home directories and paths to your environment

Run

$ gedit ~/.bashrc

Add the following at the end (replace {hadoop path} and {hive path} with the proper directory locations):

export HADOOP_HOME={hadoop path}/hadoop-2.7.3

export HIVE_HOME={hive path}/apache-hive-2.1.0-bin
export PATH=$HIVE_HOME/bin:$PATH

Run

$ source ~/.bashrc
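
To confirm the new variables are in effect, the following should print the two paths and the Hive version (hive now resolves from $HIVE_HOME/bin):

$ echo $HADOOP_HOME
$ echo $HIVE_HOME
$ hive --version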

Step 4

Create /tmp and the hive.metastore.warehouse.dir directory in HDFS and make them writable so that tables can be created in Hive (replace {user-name} with your system username):

$ hadoop-2.7.3/bin/hadoop fs -mkdir /tmp
$ hadoop-2.7.3/bin/hadoop fs -mkdir /user
$ hadoop-2.7.3/bin/hadoop fs -mkdir /user/{user-name}
$ hadoop-2.7.3/bin/hadoop fs -mkdir /user/{user-name}/warehouse
$ hadoop-2.7.3/bin/hadoop fs -chmod 777 /tmp
$ hadoop-2.7.3/bin/hadoop fs -chmod 777 /user/{user-name}/warehouse
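
To confirm the directories were created with the expected permissions (the two chmod'ed directories should show drwxrwxrwx):

$ hadoop-2.7.3/bin/hadoop fs -ls /
$ hadoop-2.7.3/bin/hadoop fs -ls /user/{user-name}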

Step 5

Create hive-site.xml 

$ gedit apache-hive-2.1.0-bin/conf/hive-site.xml

Add following (replace {user-name} with system username):

<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/{user-name}/warehouse</value>
  </property>
</configuration>


Copy hive-jdbc-2.1.0-standalone.jar to the lib directory:

$ cp apache-hive-2.1.0-bin/jdbc/hive-jdbc-2.1.0-standalone.jar apache-hive-2.1.0-bin/lib/
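
To confirm the copy, the standalone JDBC jar should now appear under lib:

$ ls apache-hive-2.1.0-bin/lib/ | grep standalone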

Step 6

Initialise the Hive metastore with Derby by running:

$ ./apache-hive-2.1.0-bin/bin/schematool -dbType derby -initSchema
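
If the initialisation succeeded, schematool can report the schema version it just created (the -info flag queries the metastore):

$ ./apache-hive-2.1.0-bin/bin/schematool -dbType derby -info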

Step 7

Run HiveServer2:

$ ./apache-hive-2.1.0-bin/bin/hiveserver2
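
HiveServer2 can take a while to start. Once up, it listens on port 10000 by default; a quick check (assuming the netstat tool from net-tools is installed):

$ netstat -lnt | grep 10000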

View the HiveServer2 logs:

$ tail -f /tmp/{user-name}/hive.log

Step 8

Run Beeline in another terminal:

$ ./apache-hive-2.1.0-bin/bin/beeline -u jdbc:hive2://localhost:10000
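
Once connected, Beeline presents a JDBC prompt where HiveQL statements can be issued; !quit exits. For example:

0: jdbc:hive2://localhost:10000> SHOW DATABASES;
0: jdbc:hive2://localhost:10000> !quit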

Step 9

Enable fully local execution mode:

hive> SET mapreduce.framework.name=local;
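
This makes Hive execute MapReduce jobs in-process rather than submitting them to YARN, which is convenient on a single node. Running SET with the property name alone prints its current value, so the change can be verified:

hive> SET mapreduce.framework.name;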

Step 10

Create a table:

hive> CREATE TABLE pokes (foo INT, bar STRING);

Browse tables:

hive> SHOW TABLES;
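
To check the table end to end, insert a row and read it back (Hive 2.x supports INSERT ... VALUES; with the setting from Step 9 this runs as a local job):

hive> INSERT INTO TABLE pokes VALUES (1, 'one');
hive> SELECT * FROM pokes;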

Setting up Hadoop to run on Single Node in Ubuntu 15.04

This is tested on hadoop-2.7.3.

An improvement on the Hadoop documentation: http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html

Step 1 

Make sure Java is installed

Installation instructions: http://suhothayan.blogspot.com/2010/02/how-to-set-javahome-in-ubuntu.html

Step 2

Install the prerequisites:

$ sudo apt-get install ssh
$ sudo apt-get install rsync

Step 3

Set up Hadoop

$ gedit hadoop-2.7.3/etc/hadoop/core-site.xml

Add the following (replace {user-name} with your system username, e.g. "foo" for /home/foo/). The hadoop.proxyuser properties allow the {user-name} user to impersonate other users, which HiveServer2 relies on:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.proxyuser.{user-name}.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.{user-name}.hosts</name>
        <value>*</value>
    </property>
</configuration>

$ gedit hadoop-2.7.3/etc/hadoop/hdfs-site.xml 

Add the following:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
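
To sanity-check that the configuration files are being picked up, hdfs getconf prints the effective value of a key:

$ ./hadoop-2.7.3/bin/hdfs getconf -confKey fs.defaultFS
# should print hdfs://localhost:9000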

Step 4

Run

$ ssh localhost 

If it asks for a password, run:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys

Try ssh localhost again.
If it still asks for a password, run the following and try again:

$ ssh-keygen -t rsa
# Press Enter at each prompt to accept the defaults
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod og-wx ~/.ssh/authorized_keys 

Step 5

Format the NameNode (note: this erases any existing data in HDFS)

$ ./hadoop-2.7.3/bin/hdfs namenode -format

Step 6 * Not covered in the Hadoop documentation

Replace ${JAVA_HOME} with a hardcoded path in hadoop-env.sh

$ gedit hadoop-2.7.3/etc/hadoop/hadoop-env.sh

Edit the file as follows (replace {path} with your JDK location):

# The java implementation to use.
export JAVA_HOME={path}/jdk1.8.0_111
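
If you are unsure where the JDK lives, resolving the java binary through its symlinks reveals the real path (on Ubuntu, java is typically a symlink managed by update-alternatives):

$ readlink -f $(which java)
# e.g. {path}/jdk1.8.0_111/bin/java; JAVA_HOME is the directory above bin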

Step 7

Start Hadoop 

$ ./hadoop-2.7.3/sbin/start-all.sh
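
Note that start-all.sh is deprecated in Hadoop 2.x; the equivalent is to start HDFS and YARN separately:

$ ./hadoop-2.7.3/sbin/start-dfs.sh
$ ./hadoop-2.7.3/sbin/start-yarn.sh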

The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

Browse the web interface for the NameNode:

http://localhost:50070/

Step 8

Check the processes running:

$ jps

Expected output:

xxxxx NameNode
xxxxx ResourceManager
xxxxx DataNode
xxxxx NodeManager
xxxxx SecondaryNameNode

Step 9

Make the HDFS directories required for MapReduce jobs:

$ ./hadoop-2.7.3/bin/hdfs dfs -mkdir /user
$ ./hadoop-2.7.3/bin/hdfs dfs -mkdir /user/{user-name}
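
To verify that the cluster accepts MapReduce jobs, the examples jar bundled with the distribution can be run; the pi example needs no input data:

$ ./hadoop-2.7.3/bin/hadoop jar hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 5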