Hello All,

As part of this series on Big Data Extensions deployment, we need to know which nodes get deployed as Big Data components: what they are called, what they do, and how they fit together. So let's check it out today.

Node Architecture

DataMaster Node Group

The DataMaster node is a virtual machine that runs the Hadoop NameNode service. This node manages the HDFS namespace and metadata and coordinates the Hadoop DataNode services deployed in the worker node group. Select a resource template from the drop-down menu, or select Customize to customize a resource template. For the master node, use shared storage so that you can protect this virtual machine with vSphere HA and vSphere FT.

ComputeMaster Node Group

The ComputeMaster node is a virtual machine that runs the Hadoop JobTracker service. This node assigns tasks to Hadoop TaskTracker services deployed in the worker node group. Select a resource template from the drop-down menu, or select Customize to customize a resource template. For the master node, use shared storage so that you can protect this virtual machine with vSphere HA and vSphere FT.

HBaseMaster Node Group (HBase cluster only)

The HBaseMaster node is a virtual machine that runs the HBase master service. This node orchestrates a cluster of one or more RegionServer slave nodes. Select a resource template from the drop-down menu, or select Customize to customize a resource template. For the master node, use shared storage so that you can protect this virtual machine with vSphere HA and vSphere FT.

Worker Node Group

Worker nodes are virtual machines that run the Hadoop DataNode, TaskTracker, and HBase HRegionServer services. These nodes store HDFS data and execute tasks. Select the number of nodes and the resource template from the drop-down menu, or select Customize to customize a resource template. For worker nodes, use local storage. NOTE: You can add nodes to the worker node group by using Scale Out Cluster. You cannot reduce the number of nodes.
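The scale-out rule in the note above (nodes can be added but never removed) can be sketched as a small check. This is only an illustration in Python: the function name `nodes_to_add` is made up for this post and is not part of Big Data Extensions or the Serengeti CLI.

```python
def nodes_to_add(current_count: int, requested_count: int) -> int:
    """Return how many nodes a scale-out would add to a node group.

    Hypothetical helper: it only models the rule from the note, namely
    that a node group can grow but cannot shrink.
    """
    if current_count < 0 or requested_count < 0:
        raise ValueError("node counts must not be negative")
    if requested_count < current_count:
        # Reducing the number of nodes is not allowed.
        raise ValueError(
            f"cannot reduce node group from {current_count} to {requested_count} nodes"
        )
    return requested_count - current_count


if __name__ == "__main__":
    print(nodes_to_add(3, 5))  # growing from 3 to 5 worker nodes adds 2
    try:
        nodes_to_add(5, 3)
    except ValueError as err:
        print("rejected:", err)
```

The same check would apply to the client node group, which follows the same rule.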

Client Node Group

A client node is a virtual machine that contains Hadoop client components. From this virtual machine you can access HDFS, submit MapReduce jobs, run Pig scripts, run Hive queries, and run HBase commands. Select the number of nodes and a resource template from the drop-down menu, or select Customize to customize a resource template. NOTE: You can add nodes to the client node group by using Scale Out Cluster. You cannot reduce the number of nodes.
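To keep the five node groups straight, here is a minimal Python sketch that records what each group runs and which storage is recommended for it. The class and function names (`NodeGroup`, `groups_running`, `shared_storage_groups`) are my own for illustration and are not a Big Data Extensions API. Storage for the client group is left as `None` because no recommendation is given for it above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class NodeGroup:
    name: str
    services: Tuple[str, ...]
    storage: Optional[str]   # "shared", "local", or None when not stated
    scale_out_noted: bool    # True where the text mentions Scale Out Cluster


# Values taken from the node group descriptions in this post.
NODE_GROUPS = (
    NodeGroup("DataMaster", ("NameNode",), "shared", False),
    NodeGroup("ComputeMaster", ("JobTracker",), "shared", False),
    NodeGroup("HBaseMaster", ("HBase master",), "shared", False),
    NodeGroup("Worker", ("DataNode", "TaskTracker", "HRegionServer"), "local", True),
    NodeGroup("Client", ("Hadoop client components",), None, True),
)


def groups_running(service: str) -> List[str]:
    """Names of the node groups that run the given service."""
    return [g.name for g in NODE_GROUPS if service in g.services]


def shared_storage_groups() -> List[str]:
    """Groups placed on shared storage, i.e. candidates for vSphere HA and FT."""
    return [g.name for g in NODE_GROUPS if g.storage == "shared"]


if __name__ == "__main__":
    for g in NODE_GROUPS:
        storage = g.storage or "not stated"
        print(f"{g.name:<14} storage={storage:<11} services={', '.join(g.services)}")
    print("HA/FT candidates:", shared_storage_groups())
```

Laid out this way, the pattern is easy to see: the three master groups sit on shared storage so they can be protected, while the worker group uses local storage.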

The Serengeti Management Server clones the template virtual machine to create the nodes in the cluster. When each virtual machine starts, the agent on that virtual machine pulls the appropriate Big Data Extensions software components to that node and deploys the software.

I hope this information will be useful for your future deployments. Happy reading.

