How to Set Up and Configure Hadoop CDH5 on Ubuntu 14.04

This document describes how to install and configure a single-node Hadoop cluster on Ubuntu. A Hadoop cluster running on a single machine is also known as Hadoop Pseudo-Distributed Mode. The steps given here are short and to the point, so you can complete the installation within a few minutes. Once the installation is done, you can experiment with Hadoop and its components, such as MapReduce for data processing and HDFS for data storage.

Install Hadoop Cluster on a Single Node on Ubuntu OS

1. Recommended Platform

I. Platform Requirements

Operating system: Ubuntu 14.04 or later (other Linux flavors such as CentOS or Red Hat can also be used)
Hadoop: Cloudera Distribution for Apache Hadoop CDH5.x (you can also use Apache Hadoop 2.x)

II. Configure & Set Up Platform

If you are using Windows or Mac OS, you can create a virtual machine and install Ubuntu in it using VMware Player or, alternatively, Oracle VirtualBox.

2. Prerequisites

I. Install Java 8

a. Install Python Software Properties

To add the Java repository, we first need to install python-software-properties. To download and install it, run the command below in a terminal:

$ sudo apt-get install python-software-properties

NOTE: After you press “Enter”, you will be asked for your password, because the “sudo” command is used to obtain root privileges for the installation. Installing or configuring system software generally requires root privileges.

b. Add Repository

Now we will manually add the repository from which Ubuntu will install Java. To add the repository, type the command below in a terminal:

$ sudo add-apt-repository ppa:webupd8team/java

You will now be asked to press [Enter] to continue. Press “Enter”.

c. Update the source list

It is recommended to update the source list periodically. Always update the source list before you upgrade or install a package. The source list is the set of locations from which Ubuntu can download and install software. To update the source list, type the command below in a terminal:

$ sudo apt-get update

When you run the above command, Ubuntu refreshes its source list.

d. Install Java

Now we will download and install Java. To do so, type the command below in a terminal:

$ sudo apt-get install oracle-java8-installer

When you press “Enter”, Java will start downloading and installing.
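Steps a to d can also be collected into a single script. The following is only a minimal sketch and not part of the original procedure: the names DRY_RUN, run and install_java8 are made up for this example, and the -y flags are added on the assumption that you want apt to skip its confirmation prompts (the Oracle installer may still show its own license prompt). By default the script only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch: the Java 8 installation steps (a-d) collected into one script.
# DRY_RUN, run and install_java8 are hypothetical names used for illustration.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it as given.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Run the four steps in order and stop at the first one that fails.
install_java8() {
  run sudo apt-get install -y python-software-properties &&
  run sudo add-apt-repository -y ppa:webupd8team/java &&
  run sudo apt-get update &&
  run sudo apt-get install -y oracle-java8-installer
}

install_java8
```

Chaining the steps with && means a failure in an early step, for example a repository that cannot be added, stops the script before the install is attempted.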

To confirm that the Java installation completed successfully and to check the installed version, type the command below in a terminal:

$ java -version
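If you would rather check the version from a script than by eye, keep in mind that java -version writes its output to standard error, not standard output. The sketch below assumes the first line of that output has the usual form java version "1.8.0_..." (or openjdk version "11.0.2" on newer releases); the function name java_major_version is made up for this example.

```shell
#!/bin/sh
# Sketch: extract the major Java version from the first line of `java -version`.
# java_major_version is a hypothetical helper name used for illustration.
java_major_version() {
  # Pull out the quoted version string, e.g. 1.8.0_201 or 11.0.2.
  jmv_version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$jmv_version" in
    "")  return 1 ;;                               # no version string found
    1.*) jmv_version=${jmv_version#1.}             # legacy scheme: 1.8.x means Java 8
         printf '%s\n' "${jmv_version%%.*}" ;;
    *)   printf '%s\n' "${jmv_version%%[._-]*}" ;; # newer scheme: 11.0.2 means Java 11
  esac
}

# `java -version` prints to stderr, so redirect it before reading the first line.
if command -v java >/dev/null 2>&1; then
  first_line=$(java -version 2>&1 | head -n 1)
  if major=$(java_major_version "$first_line"); then
    echo "Detected Java major version: $major"
  else
    echo "Could not parse: $first_line"
  fi
else
  echo "java not found on PATH"
fi
```

For this tutorial the detected major version should be 8; any other value suggests that a different Java installation comes first on your PATH.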

II. Configure SSH

SSH stands for Secure Shell, which is used for remote login: with SSH we can log in to a remote machine. We now need to configure password-less SSH, which means we can log in to a remote machine without entering a password. Password-less SSH is required for remote script invocation: the master uses it to start the daemons on the slaves automatically.
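A common way to set up password-less SSH on a single node is to generate a key pair with an empty passphrase and add the public key to your own authorized_keys file. The following is a minimal sketch under that assumption, not necessarily the exact procedure this guide goes on to use; SSH_DIR, generate_key and authorize_key are names invented for this example.

```shell
#!/bin/sh
# Sketch: password-less SSH to localhost for a pseudo-distributed node.
# SSH_DIR, generate_key and authorize_key are hypothetical names for illustration.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"

# Create an RSA key pair with an empty passphrase, unless one already exists.
generate_key() {
  mkdir -p "$SSH_DIR" && chmod 700 "$SSH_DIR"
  [ -f "$SSH_DIR/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$SSH_DIR/id_rsa" -q
}

# Append a public key to authorized_keys (only once) and restrict its permissions.
authorize_key() {
  pub_key_file="$1"
  auth_file="$SSH_DIR/authorized_keys"
  mkdir -p "$SSH_DIR" && chmod 700 "$SSH_DIR"
  touch "$auth_file"
  grep -qxF -e "$(cat "$pub_key_file")" "$auth_file" || cat "$pub_key_file" >> "$auth_file"
  chmod 600 "$auth_file"
}
```

To use it, call generate_key, then authorize_key "$SSH_DIR/id_rsa.pub", and finally run ssh localhost; if the setup worked, you are logged in without being asked for a password. Checking for an existing entry before appending keeps authorized_keys free of duplicates when the script is run more than once.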

Read more at Data Flair