• Verify that the Big Data Extensions features that you want to use, such as data-compute separated
  clusters and elastic scaling, are supported by Big Data Extensions for the Hadoop distribution that you
  want to use. See “Big Data Extensions Support for Hadoop Features By Distribution,” on page 14.
• Understand which features are supported by your Hadoop distribution. See “Hadoop Feature Support
  By Distribution,” on page 16.
Procedure
1 Do one of the following.
  • Install Big Data Extensions for the first time. Review the system requirements, install vSphere, and
    install the Big Data Extensions components: Big Data Extensions vApp, Big Data Extensions plug-in
    for vCenter Server, and Serengeti Remote Command-Line Interface Client. See Chapter 2,
    “Installing Big Data Extensions,” on page 19.
  • Upgrade Big Data Extensions from a previous version. Perform the upgrade steps. See Chapter 3,
    “Upgrading Big Data Extensions,” on page 33.
2 (Optional) Install and configure a distribution other than Apache Hadoop for use with Big Data
Extensions.
Apache Hadoop is included in the Serengeti Management Server, but you can use any Hadoop
distribution that Big Data Extensions supports. See Chapter 4, “Managing Hadoop Distributions,” on
page 43.
What to do next
After you have successfully installed and configured your Big Data Extensions environment, you can
perform the following additional tasks, in any order.
• Stop and start the Serengeti services, create user accounts, manage passwords, and log in to cluster
  nodes to perform troubleshooting. See Chapter 5, “Managing the Big Data Extensions Environment,” on
  page 63.
• Manage the vSphere resource pools, datastores, and networks that you use to create Hadoop and HBase
  clusters. See Chapter 6, “Managing vSphere Resources for Hadoop and HBase Clusters,” on page 69.
• Create, provision, and manage Hadoop and HBase clusters. See Chapter 7, “Creating Hadoop and
  HBase Clusters,” on page 75 and Chapter 8, “Managing Hadoop and HBase Clusters,” on page 83.
• Monitor the status of the clusters that you create, including their datastores, networks, and resource
  pools, through the vSphere Web Client and the Serengeti Command-Line Interface. See Chapter 9,
  “Monitoring the Big Data Extensions Environment,” on page 97.
• On your Big Data clusters, run HDFS commands, Hive and Pig scripts, and MapReduce jobs, and access
  Hive data. See Chapter 10, “Using Hadoop Clusters from the Serengeti Command-Line Interface,” on
  page 103.
• If you encounter any problems when using Big Data Extensions, see Chapter 12, “Troubleshooting,” on
  page 111.