
Hadoop Setup Using Ansible


Content

  • About Ansible
  • About Hadoop
  • Project Understanding


About Ansible

What is Ansible?

Ansible is a radically simple IT automation engine that automates cloud provisioning, configuration management, application deployment, intra-service orchestration, and many other IT needs.

Why use Ansible?

  1. Simple
    • Human readable automation
    • No special coding skills needed
    • Tasks executed in order
    • Get productive quickly

  2. Powerful
    • App deployment
    • Configuration management
    • Workflow orchestration
    • Orchestrate the app lifecycle

  3. Agentless
    • Agentless architecture
    • Uses OpenSSH and WinRM
    • No agents to exploit or update
    • Predictable, reliable and secure

For Ansible Documentation, visit the link mentioned below:
https://docs.ansible.com/
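
The agentless model described above can be sketched as a minimal playbook: tasks are pushed over plain SSH, with no agent installed on the managed nodes. The host group `webservers` and the package name below are illustrative, not part of this repository.

```yaml
# minimal-playbook.yml — sketch of Ansible's agentless, ordered execution.
# Runs over OpenSSH; nothing needs to be installed on the managed host.
- hosts: webservers          # illustrative inventory group
  become: yes                # escalate privileges for package/service work
  tasks:
    - name: Install a package on every host in the group
      package:
        name: httpd          # illustrative package
        state: present

    - name: Start and enable the service
      service:
        name: httpd
        state: started
        enabled: yes
```

Tasks run top to bottom on each host, which is what makes the automation human-readable: the playbook reads like the procedure it performs.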

About Hadoop

  • Hadoop is one of the software products used to implement distributed storage. Its topology consists of one Master and multiple Slaves, and the protocol used between Master and Slave is HDFS (Hadoop Distributed File System), which distributes files across multiple file systems.
  • In Hadoop, the Master Node is also known as the NameNode, while a Slave Node is known as a DataNode. A cluster involving a single node is known as a Single Node Cluster, whereas a Multi-Node Cluster involves multiple nodes.
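
Concretely, each DataNode learns where the NameNode is from its `core-site.xml`. A sketch of that file for Hadoop 1.x is shown below; the IP address and port are illustrative values, not the repository's actual configuration.

```xml
<!-- core-site.xml on a DataNode: fs.default.name (the Hadoop 1.x
     property) points the node at the NameNode over HDFS.
     192.168.1.10 and port 9001 are illustrative values. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.10:9001</value>
  </property>
</configuration>
```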

For Hadoop Documentation, visit the link mentioned below:
https://hadoop.apache.org/docs/r1.2.1/

Project Understanding

Versions

  • JDK: 1.8.0_171
  • Hadoop: 1.2.1
  • Ansible: 2.9.11

Some Important Points To Be Noted 👈

  1. In main.yml, for both the NameNode and the DataNode, a change in the directory-creation task triggers the task(s) defined in the corresponding handler.
  2. For the NameNode, the handler stops the existing NameNode process, formats it, and starts it again, whereas for the DataNode, the handler stops the existing DataNode process and starts it again.
  3. A Dummy Host has been created using the "add_host" module to pass the NameNode's IP address to the hosts acting as DataNodes, since it must be specified in each DataNode's configuration file in order to set up the cluster.
  4. Template files for the configuration of both the NameNode and the DataNode have been created. The files are hdfs-site.xml and core-site.xml respectively.
  5. To keep the playbook dynamic in nature, a variable file, vars.yml, has been created that consists of the variables namenode_dir and datanode_dir.
  6. The firewall and SELinux can hinder the cluster setup performed by the above code, so they should be configured accordingly.
  7. The above code sets up a cluster consisting of a single NameNode and multiple DataNodes.
  8. More DataNodes can easily be added using the above code.
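
The ideas in points 1–5 can be sketched as a single playbook. This is a hedged illustration, not the repository's exact main.yml: the task names, paths, handler commands, and the `nn_ip` variable are assumptions made for the example.

```yaml
# Sketch of the notify/handler, add_host, template, and vars.yml ideas.
# All names and paths here are illustrative assumptions.
- hosts: namenode
  vars_files:
    - vars.yml          # defines namenode_dir and datanode_dir
  tasks:
    - name: Create the NameNode directory (a change triggers the handler)
      file:
        path: "{{ namenode_dir }}"
        state: directory
      notify: restart namenode

    - name: Share this host's IP with the DataNode play via a dummy host
      add_host:
        name: namenode_ip_holder        # illustrative dummy host name
        nn_ip: "{{ ansible_default_ipv4.address }}"

  handlers:
    - name: restart namenode
      shell: |
        hadoop-daemon.sh stop namenode
        echo Y | hadoop namenode -format
        hadoop-daemon.sh start namenode

- hosts: datanode
  vars_files:
    - vars.yml
  tasks:
    - name: Create the DataNode directory (a change triggers the handler)
      file:
        path: "{{ datanode_dir }}"
        state: directory
      notify: restart datanode

    - name: Render core-site.xml pointing at the NameNode
      template:
        src: core-site.xml              # Jinja2 template using nn_ip
        dest: /etc/hadoop/core-site.xml # illustrative destination path
      vars:
        namenode_ip: "{{ hostvars['namenode_ip_holder']['nn_ip'] }}"
      notify: restart datanode

  handlers:
    - name: restart datanode
      shell: |
        hadoop-daemon.sh stop datanode
        hadoop-daemon.sh start datanode
```

Because handlers run only when a notifying task reports "changed", rerunning the playbook on an already-configured cluster leaves the daemons untouched.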

Output

Hadoop_Dfsadmin_Report

Hadoop_Web_Interface

Thank You 😃


LinkedIn Profile

https://www.linkedin.com/in/satyam-singh-95a266182