Dell High Performance Computing Solution Resources Owner's manual

REFERENCE ARCHITECTURES OF DELL EMC READY BUNDLE FOR HPC LIFE SCIENCES
Refresh with 14th Generation servers

ABSTRACT
Dell EMC's flexible HPC architecture for Life Sciences has been dramatically improved with the new Intel® Xeon® Scalable Processors. The Dell EMC Ready Bundle for HPC Life Sciences, equipped with 14th-generation (14G) servers, faster CPUs, and more memory, delivers much higher throughput than the previous generation, especially in genomic data processing.

March 2018

TABLE OF CONTENTS
EXECUTIVE SUMMARY
  AUDIENCE
INTRODUCTION
SOLUTION OVERVIEW
  Architecture
    Compute and Management Components
    Storage Components
    Network Components
    Reference Architecture Case 1
    Reference Architecture Case 2
    Reference Architecture Case 3
  Software Components
PERFORMANCE EVALUATION AND ANALYSIS
  Genomics/NGS data analysis performance
    The throughput of Dell HPC Solution for Life Sciences
  Molecular dynamics simulation software performance
    Molecular dynamics application test configuration
    Amber benchmark suite
    LAMMPS
  Cryo-EM performance
CONCLUSION
REFERENCES
EXECUTIVE SUMMARY
Since Dell EMC announced the Dell EMC HPC Solution for Life Sciences in September 2016, the solution has continued to advance: in our benchmarking, the current Dell EMC Ready Bundle for HPC Life Sciences can process 485 genomes per day^i with 64x C6420 servers and Dell EMC Isilon F800. This is roughly a two-fold improvement over Dell EMC HPC System for Life Science v1.1, driven by the introduction of Dell EMC's 14th-generation (14G) servers, which include the latest Intel® Xeon® Scalable processors (code name Skylake), an updated server portfolio, improved memory, and better storage subsystem performance (1).
This whitepaper describes the architectural changes and updates to the follow-on of Dell EMC HPC System for Life Science v1.1. It
explains new features, demonstrates the benefits, and shows the improved performance.
AUDIENCE
This document is intended for organizations interested in accelerating genomic research with advanced computing and data
management solutions. System administrators, solution architects, and others within those organizations constitute the target audience.

INTRODUCTION
Although the successful completion of the Human Genome Project was announced on April 14, 2003, after a 13-year endeavor and numerous exciting breakthroughs in technology and medicine, much work remains to understand and use the human genome. As sequencing technology has advanced iteratively, these cutting-edge technologies allow us to examine many different perspectives or levels of genetic organization, such as whole genome sequencing, copy number variations, chromosomal structural variations, chromosomal methylation, global gene expression profiling, differentially expressed genes, and so on. However, despite the volumes of data we have generated, we still face the challenge of understanding the basic biology behind the human genome and the mechanisms of human disease. There has never been a better time to ask whether we have overlooked the limits of the current approach, "analyze large numbers of diverse samples at the highest resolution possible," because analyzing large amounts of sequencing data alone has not been sufficient to identify key genes and variants in many common diseases in which the majority of genetic variation appears to be involved at random. This is the main reason that data integration is becoming more important than before. We have no doubt that genome research can help fight human disease; however, combining proteomics and clinical data with genomics data will increase the chances of winning that fight. As the life sciences community gradually moves toward data integration, the Extract, Transform, Load (ETL) process will become a new burden for IT departments (2).
The Dell EMC Ready Bundle for HPC Life Sciences has been evolving to cover a wide range of Life Sciences data analysis needs, from a system dedicated to Next Generation Sequencing (NGS) data processing to one that can also be used for Molecular Dynamics Simulations (MDS), Cryo-EM data analysis, and de novo assembly in addition to DNA sequencing data processing. Further, we plan to cover more applications from other areas of the Life Sciences and to redesign the system for data integration. In this project, we tested two additional Dell EMC Isilon storage systems, as finding suitable storage for different purposes is becoming increasingly important. This is a further step toward building an HPC system that can integrate cellular data, biochemistry, genomics, proteomics, and biophysics into a single framework and be ready for the demanding era of the ETL process.
SOLUTION OVERVIEW
HPC in the Life Sciences requires a particularly flexible architecture to accommodate varied system requirements, and the Dell EMC Ready Bundle for HPC Life Sciences was created to meet this need. It is pre-integrated, tested, and tuned, and it leverages the most relevant of Dell EMC's high-performance computing line of products along with best-in-class partner products (3). It encompasses all of the hardware resources required for various life sciences data analyses, while providing an optimal balance of compute density, energy efficiency, and performance.
ARCHITECTURE
The Dell EMC Ready Bundle for HPC Life Sciences provides high flexibility. The platform is available in three variants, determined by the cluster interconnect and the storage subsystem selected. In the current version, the following options are available:
• Dell EMC PowerEdge™ C6420 compute subsystem with Intel® Omni-Path (OPA) fabric or Mellanox InfiniBand® (IB) EDR fabric
  o Storage choices:
    - Dell EMC Ready Bundle for HPC Lustre Storage as a performance scratch space
    - Dell EMC Ready Bundle for HPC NFS Storage as a home directory space
• PowerEdge C6420 compute subsystem with 10/40GbE fabric
  o Storage choices:
    - Either Dell EMC Isilon F800 or Dell EMC Isilon H600 as a performance scratch space
    - Dell EMC Ready Bundle for HPC NFS Storage as a home directory space
• Add-on compute nodes:
  o Dell EMC PowerEdge R940 - covers large-memory applications such as de novo assembly
  o Dell EMC PowerEdge C4130 - a server for accelerators such as NVIDIA GPUs for Molecular Dynamics Simulations
In addition to the compute, network, and storage options, there are several other components that perform different functions in the Dell EMC Ready Bundle for HPC Life Sciences. These include a CIFS gateway, a fat node, an acceleration node, and other management components. Each of these components is described in detail in the subsequent sections.

The solutions are nearly identical for the Intel® OPA and IB EDR versions except for a few changes in the switching infrastructure and network adapters. The solution ships in a deep and wide 48U rack enclosure, which helps to make PDU mounting and cable management easier. Figure 8 shows the components of two fully loaded racks using 64x Dell EMC PowerEdge C6420 rack servers as a compute subsystem, a Dell EMC PowerEdge R940 as a fat node, a Dell EMC PowerEdge C4130 as an accelerator node, Dell EMC Ready Bundle for HPC NFS Storage, Dell EMC Ready Bundle for HPC Lustre Storage, and Intel® OPA as the cluster's high-speed interconnect.
Compute and Management Components
There are several considerations when selecting the servers for the master node, login node, compute nodes, fat node, and accelerator node. For the master node, the 1U Dell EMC PowerEdge R440 is recommended. The master node is responsible for managing the compute nodes and optimizing the overall compute capacity. The login node (Dell EMC PowerEdge R640 is recommended) is used for user access, compilation, and job submission. Usually, the master and login nodes are the only nodes that communicate with the outside world, and they act as a middle point between the actual cluster and the outside network. For this reason, high availability can be provided for the master and login nodes. An example solution is illustrated in Figure 8, with the high-speed interconnect configured as a 2:1 blocking fat tree topology on an Intel® OPA fabric.
Ideally, the compute nodes in a cluster should be as identical as possible, since the performance of a parallel computation is bounded by the slowest component in the cluster. Heterogeneous clusters do work, but they require careful execution to achieve the best performance. For Life Sciences applications, however, heterogeneous clusters make perfect sense for handling completely independent workloads such as DNA-Seq, de novo assembly, or Molecular Dynamics Simulations, because these workloads require quite different hardware components. Hence, we recommend the Dell EMC PowerEdge C6420 as a compute node to handle NGS data processing due to its density, wide choice of CPUs, and high maximum memory capacity. The Dell EMC PowerEdge R940 is an optional node with 6TB of RDIMM/LRDIMM memory and is recommended for customers who need to run applications requiring large memory, such as de novo assembly. Accelerators are used to speed up computationally intensive applications, such as molecular dynamics simulation applications. We tested the C4130 in configurations G and K for this solution.
The compute and management infrastructure consists of the following components.
• Compute
  o Dell EMC PowerEdge C6400 enclosure with 4x C6420 servers
    - CPU: 2x Intel® Xeon® Gold 6148 @ 2.4GHz, 20 cores (Skylake)
    - Memory: 12x 32GB @ 2666 MHz
    - Disk: 6x 480GB 12Gbps SAS Mixed Use SSDs
    - Interconnect: Intel® Omni-Path, IB EDR and/or 10/40GbE
    - BIOS System Profile: Performance Optimized
    - Logical Processor: Disabled^ii
    - Virtualization Technology: Disabled
    - OS: RHEL 7.4
  o Dell EMC PowerEdge C4130 with up to four NVIDIA® Tesla™ P100 or V100 GPUs (K configuration recommended, Figure 1)
    - CPU: 2x Intel® Xeon® E5-2690 v4 @ 2.6GHz, 14 cores (Broadwell)
    - Memory: 16x 16GB @ 2400 MHz
    - Disk: 9TB HDD
    - BIOS System Profile: Performance Optimized
    - Logical Processor: Disabled
    - Virtualization Technology: Disabled
    - OS: RHEL 7.4
  o Dell EMC PowerEdge R940
    - CPU: 4x Intel® Xeon® Platinum 8168 @ 2.7GHz, 24 cores (Skylake)
    - Memory: 48x 32GB @ 2666 MHz
    - Disk: 12x 1.2TB 10K RPM SAS 12Gbps 512n 2.5in Hot-plug Hard Drives in RAID 0
    - BIOS System Profile: Performance Optimized
    - Logical Processor: Disabled
    - Virtualization Technology: Disabled
    - OS: RHEL 7.4

• Management
  o Dell EMC PowerEdge R440 for the master node
    - CPU: 2x Intel® Xeon® Gold 5118 @ 2.3GHz, 12 cores (Skylake)
    - Memory: 12x 8GB @ 2666 MHz
    - Disk: 10x 1.8TB 10K RPM SAS 12Gbps 512e 2.5in Hot-plug Hard Drives in RAID 10
    - BIOS System Profile: Performance Optimized
    - Logical Processor: Enabled
    - Virtualization Technology: Disabled
    - OS: RHEL 7.4
  o Dell EMC PowerEdge R640 for the login node and CIFS gateway (optional)
    - CPU: 2x Intel® Xeon® Gold 6132 @ 2.6GHz, 14 cores (Skylake)
    - Memory: 12x 16GB @ 2666 MHz
    - Disk: 4x 1.2TB 10K RPM SAS 12Gbps 512n 2.5in Hot-plug Hard Drives
    - BIOS System Profile: Performance Optimized
    - Logical Processor: Enabled
    - Virtualization Technology: Disabled
    - OS: RHEL 7.4
Storage Components
The storage infrastructure consists of the following components; configuration details can be found in the references:
• Dell EMC Ready Bundle for HPC NFS Storage (NSS7.0-HA) (4)
• Dell EMC Ready Bundle for HPC Lustre Storage (4)
• Dell EMC Isilon Scale-out NAS Product Family: Dell EMC Isilon F800 All-flash (5) or Dell EMC Isilon Hybrid Scale-out NAS H600 (5)
Dell EMC Ready Bundle for HPC NFS Storage (NSS7.0-HA)
NSS7.0-HA is designed to enhance the availability of storage services to the HPC cluster by using a pair of Dell EMC PowerEdge servers, a Dell EMC Networking S3048-ON switch, and Dell EMC PowerVault storage arrays along with the Red Hat HA software stack (Figure 2). The two PowerEdge servers have shared access to disk-based Dell EMC PowerVault storage in a variety of capacities, and both servers can be directly connected to the HPC cluster using OPA, IB, or 10GbE. The two servers are equipped with two fence devices: iDRAC8 Enterprise and an APC Power Distribution Unit (PDU). If a failure such as a storage disconnection, a network disconnection, or a system hang occurs on one server, the HA cluster fails the storage service over to the healthy server with the assistance of the two fence devices, and ensures that the failed server does not return to life without the administrator's knowledge or control.

Figure 1 K configuration
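To make the failover flow concrete, the following minimal sketch outlines the decision logic described above in Python. It is illustrative only and is not the Red Hat HA (Pacemaker) stack shipped with NSS7.0-HA; the host names and stub functions are hypothetical.

    # Conceptual sketch of the NSS7.0-HA failover flow described above.
    # This is NOT the Red Hat HA / Pacemaker implementation shipped with the bundle;
    # host names and the stub functions below are hypothetical.

    def check_health(host: str) -> bool:
        """Stub: the real stack monitors storage, network, and node health continuously."""
        return host != "nss-server-1"          # pretend the active server has failed

    def fence(host: str, device: str) -> None:
        """Stub: power-fence a node via iDRAC8 Enterprise or the APC switched PDU."""
        print(f"fencing {host} via {device}")

    def start_nfs_service(host: str) -> None:
        """Stub: import the shared PowerVault LUNs and export NFS on the healthy node."""
        print(f"NFS storage service now running on {host}")

    def failover(active: str, standby: str) -> str:
        if check_health(active):
            return active                      # active server is healthy; nothing to do
        # Fence with both devices so the failed server cannot return without the
        # administrator's knowledge or control, then relocate the storage service.
        fence(active, "iDRAC8 Enterprise")
        fence(active, "APC switched PDU")
        start_nfs_service(standby)
        return standby

    if __name__ == "__main__":
        print("service owner:", failover("nss-server-1", "nss-server-2"))
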
• Dell EMC PowerEdge R730
  o CPU: 2x Intel® Xeon® E5-2660 v4 @ 2.0GHz, 14 cores (Broadwell)
  o Memory: 8x 16GB @ 2400 MHz
  o Disks: PERC H730 with five 300GB 15K SAS hard drives
    - Two drives are configured in RAID 1 for the OS
    - Two drives are configured in RAID 0 for swap space
    - The fifth drive is a hot spare for the RAID 1 disk group
• Dell EMC PowerVault MD3460 and Dell EMC PowerVault MD3060e
  o Disks in storage array: 60x 4TB 7.2K NL SAS hard drives
  o Virtual disk configuration: RAID6 8+2; each virtual disk is created across all five drawers in a storage array, two drives per drawer
  o Storage configurations (capacity arithmetic is sketched below)
    - Number of stripes: 6
    - 240 TB: One PowerVault MD3460 with 6 virtual disks
    - 480 TB: One PowerVault MD3460 + one PowerVault MD3060e with 12 virtual disks

Figure 2 NSS7.0-HA 480TB configuration; the recommended PDU is the APC switched rack PDU, model AP7921
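The raw capacities listed above follow directly from the drive count and the RAID6 8+2 layout; the short Python sketch below reproduces the 240 TB and 480 TB raw figures. Usable space is lower once RAID parity and file system overhead are taken into account.

    # Capacity arithmetic for the NSS7.0-HA PowerVault layout described above.
    DRIVE_TB = 4                 # 4 TB 7.2K NL SAS drives
    DRIVES_PER_ARRAY = 60        # one MD3460 (or MD3060e) holds 60 drives
    VDS_PER_ARRAY = 6            # six RAID6 8+2 virtual disks, 10 drives each

    def raw_tb(arrays: int) -> int:
        return arrays * DRIVES_PER_ARRAY * DRIVE_TB

    def data_tb(arrays: int) -> int:
        # RAID6 8+2: 8 data drives per 10-drive virtual disk (before file system overhead)
        return arrays * VDS_PER_ARRAY * 8 * DRIVE_TB

    print(raw_tb(1), data_tb(1))   # 240 TB raw, 192 TB after RAID6 parity (one MD3460)
    print(raw_tb(2), data_tb(2))   # 480 TB raw, 384 TB after RAID6 parity (MD3460 + MD3060e)
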
Dell EMC Ready Bundle for HPC Lustre Storage
The Dell EMC Ready Bundle for HPC Lustre Storage, referred to as Dell HPC Lustre Storage, is designed for academic and industry users who need to deploy a fully supported, easy-to-use, high-throughput, scale-out, and cost-effective parallel file system storage solution. The solution uses Intel® Enterprise Edition (EE) for Lustre® software v3.0. It is a scale-out storage appliance capable of providing a high-performance and highly available storage system. Utilizing an intelligent, extensive, and intuitive management interface, the Intel Manager for Lustre (IML), the solution greatly simplifies deploying, managing, and monitoring all of the hardware and storage system components. It is easy to scale in capacity, performance, or both, thereby providing a convenient path to grow in the future. Figure 3 shows the relationship of the MDS, MDT, MGS, OSS, and OST components of a typical Lustre configuration. The clients in the figure are the HPC cluster's compute nodes. The solution is a 120-drive configuration (60 drives per PowerVault MD3460) with 12 OSTs. Each OST consists of 10x 4TB drives. The total raw storage size is 480TB.
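Because the scratch file system exposes 12 OSTs, large genomics files benefit from being striped across several of them. The following minimal sketch shows how a user on a compute node could set and inspect striping with the standard Lustre lfs utility; the mount point and stripe parameters are illustrative assumptions, not values prescribed by the bundle.

    # Illustrative use of Lustre striping on the scratch space (assumes a client mount at /lustre).
    import os
    import subprocess

    scratch_dir = "/lustre/scratch/wgs_run01"   # hypothetical working directory
    os.makedirs(scratch_dir, exist_ok=True)

    # Stripe new files in this directory across all available OSTs (-c -1) with a
    # 1 MiB stripe size; large FASTQ/BAM files then spread over the 12 OSTs.
    subprocess.run(["lfs", "setstripe", "-c", "-1", "-S", "1m", scratch_dir], check=True)

    # Verify the resulting layout for files written into the directory.
    subprocess.run(["lfs", "getstripe", scratch_dir], check=True)
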
Figure 3 Dell EMC Ready Bundle for HPC Lustre Storage; Components Overview and Network Configurations

• Dell EMC PowerEdge R630
  o CPU: 2x Intel® Xeon® E5-2697 v4 @ 2.3GHz, 18 cores
  o Memory: 8x 16GB @ 2400 MHz
• 2x Dell EMC PowerEdge R730 for OSS nodes
  o CPU: 2x Intel® Xeon® E5-2630 v4 @ 2.2GHz, 10 cores
  o Memory: 16x 16GB @ 2133 MHz
  o OSS SAS controllers: 4x SAS 12Gbps HBA LSI 9300-8e
  o OSS Storage Arrays: 4x PowerVault MD3460
    - Disks: 240x 3.5" 4TB 7.2K RPM NL SAS
• 2x Dell EMC PowerEdge R730 for MDS nodes
  o CPU: 2x Intel® Xeon® E5-2630 v4 @ 2.2GHz, 10 cores
  o Memory: 16x 16GB @ 2133 MHz
  o MDS SAS controller: 2x SAS 12Gbps HBA LSI 9300-8e
  o MDS Storage Array: 1x PowerVault MD3420
    - Disks: 24x 2.5" 300GB NL SAS
Dell EMC Isilon Scale-out NAS Product Family
The storage configurations tested are shown in Table 1.
Table 1 Tested Dell EMC Isilon Configurations
                    Dell EMC Isilon F800                             Dell EMC Isilon H600
Number of nodes     4                                                4
CPU per node        Intel® Xeon® CPU E5-2697A v4 @ 2.60GHz           Intel® Xeon® CPU E5-2680 v4 @ 2.40GHz
Memory per node     256GB                                            256GB
Storage capacity    Total usable space: 166.8 TB (41.7 TB per node)  Total usable space: 126.8 TB (31.7 TB per node)
SSD L3 cache        N/A                                              2.9 TB per node
Network             Front end: 40GbE; back end: 40GbE                Front end: 40GbE; back end: IB QDR
OS                  Isilon OneFS v8.1.0 DEV.0                        Isilon OneFS v8.1.0.0 B_8_1_0_011

Network Components
Dell Networking H1048-OPF
Intel® Omni-Path Architecture (OPA) is an evolution of the Intel® True Scale Fabric, Cray Aries interconnect, and internal Intel® IP [9]. In contrast to Intel® True Scale Fabric edge switches, which support 36 ports of InfiniBand QDR 40Gbps performance, the new Intel® Omni-Path fabric edge switches support 48 ports at 100Gbps. The switching latency for True Scale edge switches is 165-175ns, while the switching latency for the 48-port Omni-Path edge switch has been reduced to around 100-110ns. The Omni-Path host fabric interface (HFI) MPI messaging rate is expected to be around 160 million messages per second (MMPS) with a link bandwidth of 100Gbps.
Mellanox SB7700
This 36-port Non-blocking Managed InfiniBand EDR 100Gb/s Switch System provides the highest performing fabric solution in a 1U
form factor by delivering up to 7.2Tb/s of non-blocking bandwidth with 90ns port-to-port latency.
Dell Networking Z9100-ON
The Dell EMC Networking Z9100-ON is a 10/25/40/50/100GbE fixed switch purpose-built for applications in high-performance data center and computing environments. It is a 1RU, high-density fixed switch with a choice of up to 32 ports of 100GbE (QSFP28), 64 ports of 50GbE (QSFP+), 32 ports of 40GbE (QSFP+), 128 ports of 25GbE (QSFP+), or 128+2 ports of 10GbE (using breakout cables).
Dell Networking S3048-ON
Management traffic typically communicates with the Baseboard Management Controller (BMC) on the compute nodes using IPMI. The management network is used to push images or packages to the compute nodes from the master node and to report data from the clients to the master node. The Dell EMC Networking S3048-ON is recommended for the management network.
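As an example of the out-of-band traffic this network carries, the snippet below queries a node's BMC power state over IPMI with the standard ipmitool client; the BMC address and credentials are placeholders.

    # Query a compute node's BMC over the S3048-ON management network (values are placeholders).
    import subprocess

    bmc_ip = "192.168.100.21"        # hypothetical iDRAC/BMC address on the management VLAN
    cmd = ["ipmitool", "-I", "lanplus", "-H", bmc_ip,
           "-U", "root", "-P", "changeme", "chassis", "power", "status"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout.strip())      # e.g. "Chassis Power is on"
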
Figure 4 Dell Networking H1048-OPF
Figure 5 Mellanox SB7700
Figure 6 Dell Networking Z9100-ON
Figure 7 Dell Networking S3048-ON

Reference Architecture Case 1
This reference architecture is configured with the Intel® OPA fabric, Dell EMC Ready Bundle for HPC NFS Storage, and Dell EMC Ready Bundle for HPC Lustre Storage. The Dell EMC Ready Bundle for HPC NFS Storage, referred to as the Dell NFS Storage Solution - High Availability configuration (NSS7.0-HA), is configured for general use such as home directories, while the Dell EMC Ready Bundle for HPC Lustre Storage serves as high-performance scratch storage.
Figure 9 illustrates the details of the management network. It requires three Dell Networking S3048-ON switches. Two switches in Rack 1 (Figure 8) make up the internal management network for deploying, provisioning, managing, and monitoring the cluster. One of the switches in Rack 2 has its 48 ports split into multiple untagged virtual LANs to accommodate multiple internal networks.

Figure 8 Dell EMC Ready Bundle for HPC Life Sciences with Intel® OPA fabric

Figure 10 shows the high-speed interconnect with Intel® OPA. The network topology is a 2:1 blocking fat tree, which requires three Dell Networking H1048-OPF 48-port switches.

Figure 10 Interconnection with Intel® OPA; 2:1 blocking fat tree topology
Figure 9 Management Network

Reference Architecture Case 2
In this reference architecture, Dell EMC Isilon F800 All-flash storage is configured into the Dell EMC Ready Bundle for HPC Life Sciences. The interconnect for the F800 uses 8 ports of 40GbE connections. Since the two 40GbE external connections in each node of the F800 can be aggregated, the Z9100-ON switch is configured to support two-port aggregation for each node. Hence, as shown in Figure 12, two 40GbE ports are aggregated into one port channel, which connects to a node in the F800.
Eight ports on each Z9100-ON switch in both racks are used to connect the switches to each other. All eight ports are aggregated into a single port channel.
The Z9100-ON switches are configured with multiple untagged virtual LANs to accommodate the backend network of the F800. VLAN 2 on both switches is private to the backend network of the F800. A total of eight 40GbE ports are assigned to this private network.

Figure 11 Dell EMC Ready Bundle for HPC Life Sciences with 10GbE fabric and F800

The management network is identical to case 1 except for the connections to the Lustre storage (Figure 13).
Figure 13 Management Network
Figure 12 Interconnection with 10GbE (1:1 Non-blocking)

Reference Architecture Case 3
This example shows the Dell EMC Ready Bundle for HPC Life Sciences with Dell EMC Isilon H600 storage. The H600 is one of the Dell EMC Isilon Hybrid Scale-out NAS platforms; details of the storage are described in the Storage Components section.
The management network for this reference architecture is identical to the one shown in Figure 13. The 10/40GbE and IB QDR interconnect networks are illustrated in Figure 15. The H600 uses IB QDR as its backend network; two 8-port Mellanox IS5022 switches are used to interconnect the four storage nodes in the H600. Like the F800's frontend network, the H600 supports 2x 40GbE aggregated external links per node.

Figure 14 Dell EMC Ready Bundle for HPC Life Sciences with 10GbE fabric and H600

Software Components
Along with the hardware components, the solution includes the following software components:
• Bright Cluster Manager®
• BioBuilds
Bright Cluster Manager
Bright Cluster Manager is commercial software from Bright Computing that provides comprehensive solutions for deploying and managing HPC clusters, big data clusters, and OpenStack in the data center and in the cloud (6). Bright Cluster Manager can be used to deploy complete clusters over bare metal and manage them effectively. Once the cluster is up and running, the graphical user interface monitors every single node and reports any software or hardware events it detects.
BioBuilds
BioBuilds is a well-maintained, versioned, and continuously growing collection of open-source bioinformatics tools from Lab7 (7). The tools are prebuilt and optimized for a variety of platforms and environments, and BioBuilds solves many of the software-management challenges faced in the life sciences domain.
Imagine a newer version of a tool being released. Updating it may not be straightforward and would probably involve updating all of its dependencies as well. BioBuilds includes the software and its supporting dependencies for ease of deployment.
Using BioBuilds among all collaborators helps ensure reproducibility, since everyone runs the same version of the software. In short, it is a turnkey application package.
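For example, collaborators can record the tool versions provided by their BioBuilds installation and compare manifests before sharing results. The sketch below assumes the BioBuilds binaries (for example, bwa and samtools) are already on the PATH; the tool list and approach are illustrative, not part of BioBuilds itself.

    # Record tool versions from a BioBuilds environment for reproducibility checks.
    # Assumes the BioBuilds bin directory is on PATH; the tool list is illustrative.
    import subprocess

    def version_of(cmd):
        out = subprocess.run(cmd, capture_output=True, text=True)
        # Some tools (e.g. bwa) print their version banner on stderr instead of stdout.
        text = (out.stdout or out.stderr).strip()
        return text.splitlines()[0] if text else "unknown"

    manifest = {
        "bwa": version_of(["bwa"]),                        # version appears in the usage banner
        "samtools": version_of(["samtools", "--version"]),
    }
    for tool, version in manifest.items():
        print(f"{tool}: {version}")
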
Figure 15 Interconnection with 10GbE (1:1 Non-blocking) and IB QDR

PERFORMANCE EVALUATION AND ANALYSIS
GENOMICS/NGS DATA ANALYSIS PERFORMANCE
A typical variant calling pipeline consists of three major steps: 1) aligning sequence reads to a reference genome sequence; 2) identifying regions containing SNPs/InDels; and 3) performing preliminary downstream analysis. In the tested pipeline, BWA 0.7.2-r1039 is used for the alignment step and the Genome Analysis Tool Kit (GATK) is selected for the variant calling step. These are considered standard tools for alignment and variant calling in whole genome or exome sequencing data analysis. The version of GATK used for the tests is 3.6, and the actual workflow tested was obtained from the workshop 'GATK Best Practices and Beyond'. In this workshop, a new workflow is introduced with three phases:
• Best Practices Phase 1: Pre-processing
• Best Practices Phase 2A: Calling germline variants
• Best Practices Phase 2B: Calling somatic variants
• Best Practices Phase 3: Preliminary analyses
Here we tested Phase 1, Phase 2A, and Phase 3 as a germline variant calling pipeline. The details of the commands used in the benchmark are in APPENDIX A. GRCh37 (Genome Reference Consortium Human build 37) was used as the reference genome sequence, and 30x whole human genome sequencing data from the Illumina Platinum Genomes project, named ERR091571_1.fastq.gz and ERR091571_2.fastq.gz, were used for a baseline test (11).
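As an orientation before the detailed command lines in APPENDIX A, the sketch below shows the general shape of a BWA plus GATK 3.6 germline flow (alignment, duplicate marking, and variant calling) for a single sample. The file names, thread count, and the samtools/Picard steps are illustrative assumptions and are not the exact benchmarked commands.

    # Illustrative single-sample BWA + GATK 3.6 germline flow (not the benchmarked commands; see APPENDIX A).
    import subprocess

    ref = "GRCh37.fa"                                   # reference genome, indexed for BWA and GATK
    r1, r2 = "ERR091571_1.fastq.gz", "ERR091571_2.fastq.gz"
    threads = "40"                                      # e.g. one C6420 with 2x 20-core Gold 6148

    def run(cmd):
        subprocess.run(cmd, shell=True, check=True)

    # Phase 1: pre-processing - map reads, sort, mark duplicates
    run(f"bwa mem -t {threads} -R '@RG\\tID:rg1\\tSM:sample1\\tPL:ILLUMINA' {ref} {r1} {r2} "
        f"| samtools sort -@ {threads} -o sample1.sorted.bam -")
    run("samtools index sample1.sorted.bam")
    run("java -jar picard.jar MarkDuplicates I=sample1.sorted.bam O=sample1.dedup.bam M=dup_metrics.txt")
    run("samtools index sample1.dedup.bam")

    # Phase 2A: calling germline variants with HaplotypeCaller (GATK 3.6 jar), then genotyping
    run(f"java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R {ref} "
        "-I sample1.dedup.bam -o sample1.g.vcf.gz --emitRefConfidence GVCF")
    run(f"java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R {ref} "
        "-V sample1.g.vcf.gz -o sample1.vcf.gz")
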
It is ideal to use non-identical sequence data for each run. However, it is extremely difficult to collect non-identical sequence data with more than 30x depth of coverage from the public domain. Hence, we used a single sequence data set for multiple simultaneous runs. A clear drawback of this practice is that the running time of Phase 2, Step 2 might not reflect the true running time, since researchers tend to analyze multiple samples together, and this step is known to be less scalable: its running time increases as the number of samples increases. A subtler pitfall is the storage cache effect. Since all of the simultaneous runs read and write roughly at the same time, the run time will be shorter than in real cases. Despite these built-in inaccuracies, this variant analysis performance test provides valuable insight for estimating how many resources are required for an identical or similar analysis pipeline with a defined workload.
The throughput of Dell HPC Solution for Life Sciences
Total run time is the elapsed wall time from the earliest start of Phase 1, Step 1 to the latest completion of Phase 3, Step 2. The time for each step is measured from the latest completion time of the previous step to the latest completion time of the current step, as illustrated in Figure 16.
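The following worked example applies these definitions to hypothetical per-step start and completion times for two concurrently processed samples; the resulting total run time also yields the genomes-per-day metric used below.

    # Worked example of the timing definitions above (timestamps in hours, hypothetical values).
    # Each entry: step -> list of (start, end) pairs, one pair per concurrently processed sample.
    steps = {
        "P1S1": [(0.0, 5.1), (0.1, 5.3)],
        "P2S1": [(5.3, 9.8), (5.4, 10.0)],
        "P3S2": [(10.0, 12.4), (10.1, 12.6)],
    }

    # Total run time: earliest Phase 1, Step 1 start to latest Phase 3, Step 2 completion.
    total_runtime_h = max(e for _, e in steps["P3S2"]) - min(s for s, _ in steps["P1S1"])

    # Per-step time: latest completion of the previous step to latest completion of this step.
    order = ["P1S1", "P2S1", "P3S2"]
    for prev, cur in zip(order, order[1:]):
        step_time = max(e for _, e in steps[cur]) - max(e for _, e in steps[prev])
        print(cur, "took", round(step_time, 2), "hours")

    samples = len(steps["P1S1"])
    genomes_per_day = samples * 24.0 / total_runtime_h
    print("total:", total_runtime_h, "h ->", round(genomes_per_day, 1), "genomes per day")
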
Feeding multiple samples into an analytical pipeline is the simplest way to increase parallelism, and this practice improves the throughput of a system if the system is well designed to accommodate the sample load. Figure 17 summarizes the throughput, in total number of genomes per day, for all tests with various numbers of 30x whole genome sequencing samples. The tests performed here are designed to demonstrate performance at the server level, not to compare individual components. At the same time, the tests were also designed to help estimate sizing information for the Dell EMC Isilon F800/H600 and Dell EMC Lustre Storage. The data points in Figure 17 are calculated from the total number of samples (the X axis in the figure) processed concurrently. The genomes-per-day metric is obtained from the total running time taken to process all of the samples in a test. The smoothed curves are generated with a degree-3 polynomial spline (B-spline basis). The details of the BWA-GATK pipeline can be obtained from the Broad Institute web site (10).
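The smoothing can be reproduced in outline with SciPy, as in the sketch below, which fits a degree-3 B-spline through illustrative (sample count, genomes per day) points; the data values are hypothetical and the routine is analogous to, though not necessarily identical to, the spline basis used for Figure 17.

    # Degree-3 B-spline smoothing of throughput points (illustrative data, not Figure 17 values).
    import numpy as np
    from scipy.interpolate import make_interp_spline

    samples = np.array([64, 128, 192, 256, 320])            # concurrent 30x genomes (x axis)
    genomes_per_day = np.array([210, 330, 410, 460, 485])   # hypothetical throughput values

    spline = make_interp_spline(samples, genomes_per_day, k=3)   # cubic (degree-3) B-spline
    xs = np.linspace(samples.min(), samples.max(), 100)
    print(np.round(spline(xs[:5]), 1))                       # smoothed curve values near x = 64
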
Figure 16 Running time measurement method

The number of compute nodes used for the tests was 64x C6420s and 63x C6320s (64x C6320s for testing the H600). The number of samples per node was increased to reach the desired total number of samples processed concurrently. For the C6320 (13G), 3 samples per node was the maximum each node could process. The 64-, 104-, and 126-sample test results for the 13G system (blue) were obtained with 2 samples per node, while the 129-, 156-, 180-, 189-, and 192-sample test results were obtained with 3 samples per node. For the C6420 (14G), the tests were performed with a maximum of 5 samples per node; the 14G plot was generated by processing 1, 2, 3, 4, and 5 samples per node. The number of samples per node is limited by the amount of memory in a system: 128 GB and 192 GB of RAM were used in the 13G and 14G systems, respectively. The C6420s show better scaling behavior than the C6320s. The 13G servers with Broadwell CPUs appear to be more sensitive to the number of samples loaded onto the system, as shown by the results of the 126- versus 129-sample tests on all of the storage systems tested in this study.
The Dell EMC PowerEdge C6420 shows at least a 12% performance gain over the previous generation. Each C6420 compute node with 192 GB of RAM can process about seven 30x whole human genomes per day, and this number could be increased if the C6420 compute node were configured with more memory. In addition to the improvement on the 14G server side, four Isilon F800 nodes in a 4U chassis can support 64x C6420s processing 320 30x whole human genomes concurrently.
Figure 17 Performances in 13/14 generation servers with Isilon and Lustre

MOLECULAR DYNAMICS SIMULATION SOFTWARE PERFORMANCE
Over the past decade, GPUs have become popular in scientific computing because of their ability to exploit a high degree of parallelism. NVIDIA has a handful of life sciences applications optimized to run on its general-purpose GPUs. Unfortunately, these GPUs can only be programmed with CUDA, OpenACC, or the OpenCL framework. Most of the life sciences community is not familiar with these frameworks, so few biologists or bioinformaticians can make efficient use of GPU architectures. However, GPUs have been making inroads into the molecular dynamics and electron microscopy fields. These fields require heavy computational work to simulate biomolecular structures and their interactions, or to reconstruct 3D images from the millions of 2D images generated by an electron microscope.
We selected two molecular dynamics applications, Amber and LAMMPS (12) (13), to run tests on a PowerEdge C4130 with P100 and V100 GPUs.
Molecular dynamics application test configuration
The Dell EMC PowerEdge C4130 was configured with dual Intel® Xeon® E5-2690 v4 processors, 256 GB of DDR4-2400 memory, and P100 or V100 GPUs in the G and K configurations (14) (15). Unfortunately, we were not able to complete integration tests with the Dell EMC Ready Bundle for HPC Life Sciences across the various interconnect settings. However, we believe the integration tests will not pose any problem, since molecular dynamics applications are not bound by either storage I/O or interconnect bandwidth.
The NVIDIA® Tesla® V100 accelerator is one of the most advanced accelerators available on the market right now and was launched within one year of the P100 release. In fact, Dell EMC was the first in the industry to integrate the Tesla V100 and bring it to market. As was the case with the P100, the V100 is available in two form factors: V100-PCIe and the mezzanine version V100-SXM2. The Dell EMC PowerEdge C4130 server supports both types of V100 and P100 GPU cards.
Amber benchmark suite
This suite includes the Joint Amber-CHARMM (JAC) benchmark, which simulates dihydrofolate reductase (DHFR) in an explicit water bath with cubic periodic boundary conditions. The major assumptions are that the DHFR molecule is present in water without surface effects and that its motion follows the microcanonical (NVE) ensemble, which assumes a constant amount of substance (N), volume (V), and energy (E). Hence, the sum of kinetic (KE) and potential energy (PE) is conserved; in other words, temperature (T) and pressure (P) are unregulated. The JAC benchmark repeats the simulations with the isothermal-isobaric (NPT) ensemble, which holds N, P, and T constant; this corresponds most closely to laboratory conditions, with a flask open to ambient temperature and pressure. Besides these settings, Particle Mesh Ewald (PME) is the algorithm of choice for calculating electrostatic forces in the molecular dynamics simulations. Other biomolecules simulated in this benchmark suite are Factor IX (one of the serine proteases of the coagulation system), cellulose, and Satellite Tobacco Mosaic Virus (STMV). Here, we report the results from the DHFR and STMV data.
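For reference, a GPU run of the JAC/DHFR NVE benchmark typically amounts to writing a short mdin input with the settings described above (constant volume, no thermostat, PME, a 2 fs time step) and launching Amber's GPU engine. The sketch below is a hedged illustration; the input values, file names, and launch line are assumptions rather than the exact benchmark files used in our tests.

    # Illustrative JAC DHFR (NVE, PME) run with Amber's GPU engine pmemd.cuda.
    # ntc=2/ntf=2 apply SHAKE to bonds involving hydrogen; ntb=1, ntp=0, ntt=0 give
    # constant volume with no barostat or thermostat (NVE); cut=8.0 is the direct-space
    # cutoff, with PME handling long-range electrostatics. Values and file names are assumptions.
    import subprocess
    import textwrap

    mdin = textwrap.dedent("""\
        DHFR NVE benchmark (illustrative)
         &cntrl
           imin=0, nstlim=10000, dt=0.002,
           ntc=2, ntf=2,
           ntb=1, ntp=0, ntt=0,
           cut=8.0,
           ntpr=1000, ntwx=0,
         /
        """)
    with open("mdin.nve", "w") as f:
        f.write(mdin)

    subprocess.run(["pmemd.cuda", "-O", "-i", "mdin.nve",
                    "-p", "JAC.prmtop", "-c", "JAC.inpcrd", "-o", "mdout.nve"], check=True)
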
Figure 18 V100 and P100 Topologies on C4130 configuration K