Ready Solutions Engineering Test Results
Copyright © 2017 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be
the property of their respective owners. Published in the USA. Dell EMC believes the information in this document is accurate as of its publication date. The information is
subject to change without notice.
NAMD Performance Analysis on Skylake Architecture
Author: Joseph Stanfield
The purpose of this blog is to provide a comparative performance analysis of the Intel® Xeon® Gold 6150 processor and the previous-generation Xeon® E5-2697 v4 processor using the NAMD benchmark. The Xeon® Gold 6150 CPU features 18 physical cores, or 36 logical cores with Hyper-Threading enabled. This processor is based on Intel’s new microarchitecture, codenamed “Skylake”. Intel significantly increased the L2 cache per core from 256 KB on Broadwell to 1 MB on Skylake. The 6150 also offers 24.75 MB of L3 cache and a six-channel DDR4 memory interface.
Nanoscale Molecular Dynamics (NAMD) is a molecular dynamics simulation application built on the Charm++ parallel programming model. It is popular for its parallel efficiency, its scalability, and its ability to simulate systems of millions of atoms.
Test Cluster Configurations:

               Dell EMC PowerEdge C6420                    Dell EMC PowerEdge C6320
CPU            2x Xeon® Gold 6150 18c 2.7 GHz (Skylake)    2x Xeon® E5-2697 v4 16c 2.3 GHz (Broadwell)
RAM            12x 16GB @ 2666 MHz                         8x 16GB @ 2400 MHz
HDD            1 TB SATA                                   1 TB SATA
OS             RHEL 7.3                                    RHEL 7.3
InfiniBand     EDR ConnectX-4                              EDR ConnectX-4
CHARM++        6.7.1
NAMD           2.12_Source

BIOS Options                 Settings
System Profile               Performance Optimized
Logical Processor            Disabled
Virtualization Technology    Disabled
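The memory rows in the configuration table can be turned into a rough theoretical peak memory bandwidth per node, which helps put the six-channel interface in context. The sketch below is a back-of-the-envelope calculation, not a measurement. It assumes one DIMM per channel (12 DIMMs across 2 × 6 Skylake channels, 8 DIMMs across 2 × 4 Broadwell channels), an 8-byte data path per DDR4 channel, and that the "MHz" figures in the table are DIMM data rates in MT/s. The function name is hypothetical.

```python
def peak_memory_bandwidth_gbs(sockets: int, channels_per_socket: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak DRAM bandwidth per node in GB/s (10**9 bytes/s).

    Assumes every channel is populated and runs at its rated data rate,
    with a 64-bit (8-byte) data bus per DDR4 channel (ECC bits excluded).
    """
    bytes_per_transfer = 8  # 64-bit DDR4 channel width
    bytes_per_s = sockets * channels_per_socket * mega_transfers_per_s * 1e6 * bytes_per_transfer
    return bytes_per_s / 1e9


if __name__ == "__main__":
    # C6420: 2 sockets x 6 channels, DIMMs rated 2666 MT/s (12 DIMMs = 1 per channel)
    skylake = peak_memory_bandwidth_gbs(2, 6, 2666)
    # C6320: 2 sockets x 4 channels (Broadwell-EP), DIMMs rated 2400 MT/s
    broadwell = peak_memory_bandwidth_gbs(2, 4, 2400)
    print(f"Skylake node:   {skylake:.1f} GB/s")
    print(f"Broadwell node: {broadwell:.1f} GB/s")
    print(f"Ratio:          {skylake / broadwell:.2f}x")
```

Under these assumptions the C6420 node has roughly 256 GB/s of theoretical peak bandwidth versus about 154 GB/s for the C6320 node. Sustained bandwidth in practice is lower on both systems.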
The benchmark dataset selected for this series of tests was the Satellite Tobacco Mosaic Virus (STMV). STMV contains 1,066,628 atoms, which makes it well suited to demonstrating scaling in large clustered environments. Performance is measured in nanoseconds per day (ns/day), the number of nanoseconds of simulated time that can be computed in one day of wall-clock time. A larger value indicates faster performance.
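The ns/day metric follows from the wall-clock time per MD step and the integration timestep, and its reciprocal (days/ns) is the time needed to simulate one nanosecond. The sketch below shows that conversion. The seconds-per-step and 2 fs timestep inputs are illustrative assumptions, not values taken from these STMV runs.

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000  # 1 ns = 10**6 fs


def ns_per_day(seconds_per_step: float, timestep_fs: float) -> float:
    """Nanoseconds of simulated time computed per day of wall-clock time."""
    steps_per_day = SECONDS_PER_DAY / seconds_per_step
    return steps_per_day * timestep_fs / FS_PER_NS


def days_per_ns(ns_day: float) -> float:
    """Reciprocal metric: wall-clock days needed to simulate 1 ns."""
    return 1.0 / ns_day


if __name__ == "__main__":
    # Hypothetical inputs: 0.25 s of wall-clock time per step, 2 fs timestep.
    rate = ns_per_day(seconds_per_step=0.25, timestep_fs=2.0)
    print(f"{rate:.3f} ns/day, {days_per_ns(rate):.2f} days per ns")
    # Single-node Xeon Gold 6150 average reported in this blog: 0.70 ns/day.
    print(f"0.70 ns/day -> {days_per_ns(0.70):.2f} days to simulate 1 ns")
```

For example, at the single-node rate of 0.70 ns/day reported in this blog, simulating 1 ns of STMV takes about 1.43 days of wall-clock time.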
The first series of benchmark tests measured CPU performance. The test environment consisted of one, two, four, and eight nodes, with the NAMD STMV dataset run three times for each configuration. The nodes were connected with EDR InfiniBand, as noted in the table above. A single node averaged 0.70 ns/day, while a two-node run improved performance by 80% to 1.25 ns/day. Each further doubling of node count produced a similar gain (87% at four nodes and 83% at eight nodes), as seen in Figure 1.
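The per-doubling gains in Figure 1 and the overall scaling quality can be computed directly from the ns/day averages. The sketch below derives two common scaling measures: the percent gain between consecutive node counts, and parallel efficiency (measured speedup divided by the ideal linear speedup). The function names are hypothetical, and the inputs in the example are the rounded values shown on the chart.

```python
from __future__ import annotations


def increase_per_doubling(results: dict[int, float]) -> dict[int, float]:
    """Percent gain in ns/day from each node count to the next larger one.

    `results` maps node count -> average ns/day. The returned dict is keyed
    by the larger node count of each consecutive pair.
    """
    nodes = sorted(results)
    return {
        n: (results[n] / results[prev] - 1.0) * 100.0
        for prev, n in zip(nodes, nodes[1:])
    }


def parallel_efficiency(results: dict[int, float]) -> dict[int, float]:
    """Measured speedup over the smallest run divided by the ideal (linear) speedup.

    1.0 means perfectly linear scaling; lower values mean each added node
    contributes less than the first one did.
    """
    base = min(results)
    return {
        n: (results[n] / results[base]) / (n / base)
        for n in sorted(results)
    }


if __name__ == "__main__":
    # Rounded ns/day values read from Figure 1 (one decimal place), so the
    # recomputed gains will not exactly match the percentages on the chart.
    stmv_gold_6150 = {1: 0.7, 2: 1.3, 4: 2.3, 8: 4.3}
    for n, gain in increase_per_doubling(stmv_gold_6150).items():
        print(f"{n} nodes: +{gain:.0f}% vs. previous node count")
    for n, eff in parallel_efficiency(stmv_gold_6150).items():
        print(f"{n} nodes: {eff:.0%} parallel efficiency")
```

With these rounded inputs, the eight-node efficiency comes out near 77%. A gain of 100% per doubling would correspond to perfectly linear scaling.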
Ready Specs
Figure 1. Xeon® Gold 6150, NAMD stmv: ns/day (higher is better) of 0.7, 1.3, 2.3, and 4.3 on 1, 2, 4, and 8 nodes; performance increase from scaling of 80%, 87%, and 83% at 2, 4, and 8 nodes.
The second series of benchmarks compared the Xeon® Gold 6150 against the previous-generation Xeon® E5-2697 v4, using the same STMV dataset in both environments. As Figure 2 shows, the Xeon® Gold results surpass the Xeon® E5 v4 by 111% on a single node, and the relative performance advantage decreases to 63% at eight nodes.
Figure 2. Xeon® E5 v4 vs. Xeon® Gold, NAMD stmv: ns/day (higher is better) for the E5-2697 v4 vs. the Xeon Gold 6150 of 0.3 vs. 0.7 (1 node), 0.7 vs. 1.3 (2 nodes), 1.3 vs. 2.3 (4 nodes), and 2.6 vs. 4.3 (8 nodes); performance increase over the E5-2697 v4 of 112%, 82%, 78%, and 63%.
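The relative advantage in Figure 2 is the ratio of the two systems' ns/day at the same node count, minus one. The sketch below computes it for every node count measured on both clusters. The function names are hypothetical, and the example inputs are the rounded chart values.

```python
from __future__ import annotations


def relative_advantage_pct(new: float, old: float) -> float:
    """How much faster `new` is than `old`, in percent (0% means equal)."""
    return (new / old - 1.0) * 100.0


def advantage_by_node_count(new: dict[int, float], old: dict[int, float]) -> dict[int, float]:
    """Relative advantage at every node count measured on both systems."""
    return {n: relative_advantage_pct(new[n], old[n]) for n in sorted(new.keys() & old.keys())}


if __name__ == "__main__":
    # Rounded ns/day values read from Figure 2 (one decimal place).
    e5_2697_v4 = {1: 0.3, 2: 0.7, 4: 1.3, 8: 2.6}
    gold_6150 = {1: 0.7, 2: 1.3, 4: 2.3, 8: 4.3}
    for n, adv in advantage_by_node_count(gold_6150, e5_2697_v4).items():
        print(f"{n} nodes: Gold 6150 ahead by {adv:.0f}%")
```

Recomputing from the one-decimal chart values gives noticeably different percentages from those labeled on the chart. For example, 0.7/0.3 suggests about 133% at one node rather than 112%. The labeled percentages were therefore presumably derived from the unrounded averages.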
Summary
In this blog, we analyzed the performance of a Dell EMC PowerEdge C6420 cluster running NAMD with the STMV dataset, scaling from a single node to eight nodes. NAMD performance scaled well with node count, improving by roughly 80% or more with each doubling of nodes.
At the time this blog was published, there was a known issue compiling NAMD with Intel Parallel Studio 2017.x. Intel recommends using Parallel Studio 2016.4 or 2018 (still in beta at the time) with -xCORE-AVX512 in the FLOATOPTS variable for best performance.
A comparative analysis was also conducted against the previous-generation Dell EMC PowerEdge C6320 server with Xeon® E5-2697 v4 (Broadwell) processors. The Xeon® Gold 6150 outperformed the E5-2697 v4 by 111% on a single node and retained a 63% advantage at eight nodes, while both clusters continued to gain performance as nodes were added.
Resources
Intel NAMD Recipe: https://software.intel.com/en-us/articles/building-namd-on-intel-xeon-and-intel-xeon-phi-processor
Intel Fabric Tuning & Application Performance: https://www.intel.com/content/www/us/en/high-performance-computing-fabrics/omni-path-architecture-application-performance-mpi.html