2 (Optional) Install and configure a distribution other than Apache Hadoop for use with
Big Data Extensions.
Apache Hadoop is included in the Serengeti Management Server, but you can use any Hadoop
distribution that Big Data Extensions supports.
What to do next
After you have successfully installed and configured your Big Data Extensions environment, you can
perform the following additional tasks, in any order.
n
Stop and start the Serengeti services, create user accounts, manage passwords, and log in to cluster
nodes to perform troubleshooting.
n
Manage the vSphere resource pools, datastores, and networks that you use to create Hadoop and HBase
clusters.
n
Create, provision, and manage big data clusters.
n
Monitor the status of the clusters that you create, including their datastores, networks, and resource
pools, through the vSphere Web Client and the Serengeti Command-Line Interface.
n
On your Big Data clusters, run HDFS commands, Hive and Pig scripts, and MapReduce jobs, and access
Hive data.
n
If you encounter any problems when using Big Data Extensions, see Chapter 11, “Troubleshooting,” on
page 109.
Big Data Extensions and Project Serengeti
Big Data Extensions runs on top of Project Serengeti, the open source project initiated by VMware to
automate the deployment and management of Hadoop and HBase clusters on virtual environments such as
vSphere.
Big Data Extensions and Project Serengeti provide the following components.
Project Serengeti
An open source project initiated by VMware, Project Serengeti lets users
deploy and manage big data clusters in a vCenter Server managed
environment. The major components are the Serengeti Management Server,
which provides cluster provisioning, software configuration, and
management services; an elastic scaling framework; and command-line
interface. Project Serengeti is made available under the Apache 2.0 license,
under which anyone can modify and redistribute Project Serengeti according
to the terms of the license.
Serengeti Management
Server
Provides the framework and services to run Big Data clusters on vSphere.
The Serengeti Management Server performs resource management, policy-
based virtual machine placement, cluster provisioning, software
configuration management, and environment monitoring.
VMware vSphere Big Data Extensions Administrator's and User's Guide
10 VMware, Inc.