This document describes how to install and configure a single-node Hadoop cluster on Ubuntu. A single-machine Hadoop cluster is also called Hadoop pseudo-distributed mode. The steps in this document are simple and to the point, so you can have Hadoop installed within a few minutes. Once the installation is done, you can experiment with Hadoop and its components, such as MapReduce for data processing and HDFS for data storage.
Install Hadoop Cluster on a Single Node on Ubuntu OS
1. Recommended Platform
I. Platform Requirements
Operating system: Ubuntu 14.04 or later (other Linux distributions such as CentOS and Red Hat can also be used)
Hadoop: Cloudera Distribution for Apache Hadoop, CDH 5.x (Apache Hadoop 2.x can be used instead)
II. Configure and Set Up the Platform
If you are using Windows or macOS, create a virtual machine and install Ubuntu in it using VMware Player or, alternatively, Oracle VirtualBox.
2. Prerequisites
I. Install Java 8
a. Install Python Software Properties
To add the Java repository, we first need the python-software-properties package. To download and install it, run the following command in a terminal:
$ sudo apt-get install python-software-properties
NOTE: After you press Enter, you will be asked for your password, because the “sudo” command runs the installation with root privileges. Installing system packages and changing system-wide configuration both require root privileges.
b. Add Repository
Next, manually add the repository from which Ubuntu will install Java. To add it, run the following command in a terminal:
$ sudo add-apt-repository ppa:webupd8team/java
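You will be prompted to press [ENTER] to continue; press Enter.
NOTE: The webupd8team Java PPA was later discontinued, so this step may fail on current systems. Installing OpenJDK 8 (for example, the openjdk-8-jdk package, where your Ubuntu release provides it) is a common alternative.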
c. Update the source list
The source list tells Ubuntu where it can download and install software from. It is recommended to update the package lists from these sources periodically, and always before installing or upgrading a package. To update them, run the following command in a terminal:
$ sudo apt-get update
This command refreshes Ubuntu's package lists, including those from the newly added Java repository.
d. Install Java
Now download and install Java by running the following command in a terminal:
$ sudo apt-get install oracle-java8-installer
When you press Enter, it starts downloading and installing Java.
To confirm that Java was installed successfully and to check its version, run the following command in a terminal:
$ java -version
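The version check can also be scripted, which is handy when several JDKs are installed. The sketch below assumes Bash; the function names java_major_from_string and installed_java_major are hypothetical helpers, not part of Java or Ubuntu. It relies on `java -version` printing to standard error, and on Java 8 reporting its version as "1.8.x" while Java 9 and later drop the "1." prefix.

```shell
#!/usr/bin/env bash
# Sketch: confirm which Java major version is active before installing Hadoop.
# java_major_from_string and installed_java_major are illustrative names only.

# Convert a Java version string to its major version number:
#   "1.8.0_201" -> 8   (Java 8 and older use the legacy "1.x" scheme)
#   "11.0.2"    -> 11  (Java 9 and later drop the "1." prefix)
java_major_from_string() {
  local ver="$1"
  if [[ "$ver" == 1.* ]]; then
    ver="${ver#1.}"        # strip the legacy "1." prefix
  fi
  echo "${ver%%[._-]*}"    # keep everything before the first '.', '_' or '-'
}

# Read the version of the `java` found on PATH (java -version writes to stderr).
installed_java_major() {
  local raw
  raw=$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')
  [[ -n "$raw" ]] || return 1   # java missing or output not recognized
  java_major_from_string "$raw"
}
```

For example, `[[ "$(installed_java_major)" == 8 ]] && echo "Java 8 is active"` prints the confirmation only when the java on your PATH is Java 8.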
II. Configure SSH
SSH (Secure Shell) is used to log in to remote machines. We now need to configure password-less SSH, which lets us log in to a machine without entering a password. Password-less SSH is required for remote script invocation: the master uses it to start the daemons on the slave nodes automatically.
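A minimal sketch of password-less SSH to localhost, as needed for pseudo-distributed mode, is shown below. It assumes Bash and that the OpenSSH client and server are installed (for example via `sudo apt-get install openssh-client openssh-server`); the function name setup_passwordless_ssh and its optional directory argument are hypothetical, added so the steps can be rerun safely.

```shell
#!/usr/bin/env bash
# Sketch: password-less SSH to localhost for Hadoop pseudo-distributed mode.
# Assumes OpenSSH client and server are installed.
# setup_passwordless_ssh is a hypothetical helper name used for illustration.

# $1: directory holding the keys (defaults to ~/.ssh)
setup_passwordless_ssh() {
  local ssh_dir="${1:-$HOME/.ssh}"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"   # sshd's StrictModes rejects group/world-writable key dirs

  # Create an RSA key pair with an empty passphrase (-N ""), unless one exists.
  if [[ ! -f "$ssh_dir/id_rsa" ]]; then
    ssh-keygen -t rsa -N "" -f "$ssh_dir/id_rsa" -q
  fi

  # Authorize our own public key once, so `ssh localhost` skips the password.
  touch "$ssh_dir/authorized_keys"
  if ! grep -qxF "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys"; then
    cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
  fi
  chmod 600 "$ssh_dir/authorized_keys"
}
```

Call `setup_passwordless_ssh` with no argument to use ~/.ssh, then run `ssh localhost`; apart from a one-time prompt to accept the host key, it should log you in without asking for a password.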
Read more at Data Flair