■ To avoid opening a network firewall port to access Hadoop services,
  log in to the Hadoop client node and access your cluster from that
  node.
■ To connect to the Internet (for example, to create an internal yum
  repository from which to install Hadoop distributions), you can use a
  proxy.
■ To enable communications, make sure that firewalls and web filters do
  not block the Serengeti Management Server or other Serengeti nodes.
Direct Attached Storage
Attach and configure direct attached storage on the physical controller to
present each disk separately to the operating system. This configuration is
commonly described as Just A Bunch Of Disks (JBOD). Create VMFS
datastores on direct attached storage using the following disk drive
recommendations.
■ 8-12 disk drives per host. The more disk drives per host, the better the
  performance.
■ 1-1.5 disk drives per processor core.
■ 7,200 RPM Serial ATA disk drives.
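
The per-host and per-core drive counts above can be checked with simple arithmetic. The following is a minimal sketch, not part of Big Data Extensions: the HostStorage class, the check_disk_recommendations function, and the host names are hypothetical and are used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class HostStorage:
    """Hypothetical description of one host's direct attached storage."""
    name: str
    disk_drives: int
    processor_cores: int


def check_disk_recommendations(host):
    """Return warnings for each disk drive recommendation the host misses."""
    if host.processor_cores <= 0:
        raise ValueError("processor_cores must be positive")
    warnings = []
    # Recommendation: 8-12 disk drives per host.
    if not 8 <= host.disk_drives <= 12:
        warnings.append(
            f"{host.name}: {host.disk_drives} disk drives (8-12 recommended)"
        )
    # Recommendation: 1-1.5 disk drives per processor core.
    ratio = host.disk_drives / host.processor_cores
    if not 1.0 <= ratio <= 1.5:
        warnings.append(
            f"{host.name}: {ratio:.2f} disk drives per core (1-1.5 recommended)"
        )
    return warnings


# Example hosts with made-up values.
for host in (HostStorage("esx-01", 12, 8), HostStorage("esx-02", 4, 16)):
    for line in check_disk_recommendations(host) or [f"{host.name}: OK"]:
        print(line)
```

The sketch only counts drives and cores. It does not verify the drive type or rotational speed, which you must confirm against the hardware inventory.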
Do not use Big Data Extensions in conjunction with vSphere Storage DRS
Big Data Extensions places virtual machines on hosts according to available
resources, Hadoop best practices, and user-defined placement policies before
it creates the virtual machines. For this reason, do not deploy
Big Data Extensions in vSphere environments that use Storage DRS. Storage
DRS continuously balances storage space usage and storage I/O load to meet
application service levels in specific environments. If you use Storage DRS
with Big Data Extensions, it disrupts the placement policies of your
Big Data cluster virtual machines.
Migrating virtual machines in vCenter Server may disrupt the virtual machine
placement policy
Big Data Extensions places virtual machines based on available resources,
Hadoop best practices, and the user-defined placement policies that you
specify. For this reason, DRS is disabled on all virtual machines created
within the Big Data Extensions environment. Although this prevents vSphere
from migrating virtual machines automatically, it does not prevent you from
inadvertently moving virtual machines using the vCenter Server user
interface. Doing so may break the placement policy that Big Data Extensions
defines. For example, it may disrupt the number of instances per host and
the group associations.
Resource Requirements for the vSphere Management Server and Templates
■ Resource pool with at least 27.5 GB RAM.
■ 40 GB or more (recommended) disk space for the management server
  and Hadoop template virtual disks.
Resource Requirements for the Hadoop Cluster
■ Datastore free space of at least the total size needed by the Hadoop
  cluster, plus a swap disk for each Hadoop node that is equal in size to
  the memory requested for that node.
■ Network configured across all relevant ESXi hosts, with connectivity
  to the network in use by the management server.
■ vSphere HA enabled for the master node if vSphere HA protection is
  needed. To use vSphere HA or vSphere FT to protect the Hadoop master
  node, you must use shared storage.
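
The datastore free space requirement above can be estimated before you create a cluster. The following is a minimal sketch that assumes the "total size needed by the Hadoop cluster" is the sum of the storage requested for every node, and that each node adds one swap disk equal to its requested memory. The NodeGroup class, the function names, and all sizes are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeGroup:
    """Hypothetical description of one group of identical Hadoop nodes."""
    name: str
    instances: int
    disk_gb: float    # storage requested per node
    memory_gb: float  # memory requested per node; assumed equal to swap size


def required_datastore_gb(groups):
    """Sum the requested storage plus one memory-sized swap disk per node."""
    return sum(g.instances * (g.disk_gb + g.memory_gb) for g in groups)


def has_enough_free_space(free_gb, groups):
    """Return True if the datastore free space covers the estimate."""
    return free_gb >= required_datastore_gb(groups)


# Example cluster with made-up sizes.
cluster = [
    NodeGroup("master", instances=1, disk_gb=50, memory_gb=7.5),
    NodeGroup("worker", instances=3, disk_gb=50, memory_gb=3.75),
]
print(required_datastore_gb(cluster))         # 218.75
print(has_enough_free_space(200.0, cluster))  # False
```

In this example the estimate is 1 x (50 + 7.5) + 3 x (50 + 3.75) = 218.75 GB, so a datastore with 200 GB free does not satisfy the requirement.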