Eurotech Aurora Hive Development Kit Owner's manual

Type
Owner's manual
© 2015 Eurotech
Trademarks
All trademarks both marked and unmarked appearing in this document are the property of their respective
owners.
Revision history
Revision        Description      Date
Revision 1.0    First release    27 August 2015
Aurora Hive Development Kit Installation and Operation Manual Table of contents
3
AUHPC-30-20-00-DK0_InstMan_En_1.0
Table of contents
Trademarks
Revision history
Table of contents
List of Figures
List of Tables
Important user information
Alerts that can be found throughout this manual
Safety notices and warnings
Do not operate in an explosive atmosphere
Antistatic precautions
Connection to power supply or other devices
Installation
Ventilation
Maintenance
Cleaning
Life support policy
Warranty
WEEE
RoHS
Technical assistance
Transportation
1 Overview of the Aurora HiVe systems
2 Aurora HiVe Development Kit content
2.1 Development Kit shipping content
2.2 Additional Peripherals
3 DK Setup
3.1 Preparing the Koolance Cooling Unit
3.2 Connecting the Server to the Koolance Cooling unit
3.3 Switching on the KCU and starting the water cycles
3.3.1 Setting fan speed on the KCU
3.4 Connecting peripherals to the Server
3.5 Setting up the electrical connections and switching on the Server
3.6 Switching off the Server and the KCU
4 Connecting 2 DK
4.1 NVIDIA GPUDirect™ functionality
4.1.1 NVIDIA GPUDirect™ version 2: P2P
4.1.2 NVIDIA GPUDirect™ version 3: RDMA
5 Accessing and managing the DK
5.1 Accessing the DK
6 DK management
6.1 Network access
6.2 Out of band management
6.3 Boot
6.4 Network configuration
7 DK software
7.1 Software installed from CentOs repository
7.2 X Window System
7.3 Development and scientific Software
7.3.1 Default runtime environment
7.3.2 Compilers
7.3.3 Nvidia - CUDA
7.3.4 Nvidia K-40 tuning
7.3.5 OFED - Mellanox
8 Benchmarks for CPU and GPU
8.1.1 HPL
8.1.2 XHPL
8.1.3 STREAM2
8.2 Molecular Dynamics software
8.2.1 GROMACS
8.2.2 HOOMD
8.2.3 LAMMPS
8.2.4 AMBER
9 Performance Analysis tools
9.1.1 LIKWID
9.1.2 Open|SpeedShop
10 Installing additional software
11 Aurora HiVe Development Kit acceptance procedure
12 DK system specifications
12.1 Aurora HiVe Server
12.1.1 Midplane
12.1.2 CPU card
12.1.3 Accelerators (Nvidia K40)
12.1.4 Network Interface Card
12.1.5 Hard Disk
12.1.6 Server internal cabling
12.2 System interfaces and LEDs
12.2.1 I/O (Input/Output)
12.2.2 LEDs
12.2.3 Electrical specifications and power supply units
12.2.4 Cooling unit (Koolance) specifications
12.2.5 Liquid for KCU specifications
12.2.6 SMC Connectors specifications
12.3 Cooling
12.3.1 Maintenance of the cooling loop
13 Technical assistance
14 Notes
List of Figures
Figure 1: Node Configuration
Figure 2: Development Kit shipping content
Figure 3: Suggested setup for the DK and its peripherals
Figure 4: Filling the tank of the Koolance cooling unit with coolant fluid.
Figure 5: Connectors on the KCU and on the Server
Figure 6: Connecting the KCU to the Server
Figure 7: Switching on and configuring the KCU
Figure 8: Connecting peripherals to the Server
Figure 9: Switching on the Server
Figure 10: Server and KCU connected. The system is switched off.
Figure 11: Connecting 2 Development Kits using an IB cable.
Figure 12: NVIDIA GPU Direct - Versions 2 and 3.
Figure 13: Midplane connections
Figure 14: The CPU card and Midplane schematics
Figure 15: Server internal cabling (rear view)
Figure 16: Server I/O
Figure 17: Nodes assembly and cooling
Figure 18: Connectors on the Server
Figure 19: Maintenance of the cooling loop
Figure 20: Label on the DK
List of Tables
Table 1: System specifications
Table 2: CPU card
Table 3: GPU cards
Table 4: Power supply electric specifications
Table 5: KCU technical specifications
Important user information
Carefully read and understand the instructions in this manual before using this device.
Whenever you have any doubt regarding the operation of this device, consult this manual or contact your local
Eurotech Technical Support Team (see the last page of this manual for details).
Keep this manual for future reference.
To lower the risk of personal injury, electric shock, fire or damage to equipment, observe the following
precautions, as well as use good technical judgment, whenever installing or using the device.
Eurotech has made every effort to ensure the accuracy of this document; however, Eurotech assumes no liability
resulting from any error/omission in this document, or from the use of the information contained herein.
Eurotech reserves the right to revise this document or to make changes to its content at any time without any
obligation to notify any person of such revision or changes.
Alerts that can be found throughout this manual
MEANING
DANGER!
Information highlighting potential electrical shock hazards:
Personal injury or death could occur.
Damage to the system, connected peripheral devices, or software could occur.
Always use appropriate safety precautions. Also ensure that the installation meets all the requirements as
set out for the environment that the equipment will be deployed in.
WARNING!
Information highlighting potential hazards:
Personal injury or death could occur.
Damage to the system, connected peripheral devices, or software could occur.
Always use appropriate safety precautions. Also ensure that the installation meets all the requirements as
set out for the environment that the equipment will be deployed in.
NOTE
These will highlight important features or instructions.
Safety notices and warnings
Observe the following safety precautions during all phases of operation, service, and repair of the device.
Failure to comply with these precautions or with specific warnings elsewhere in this manual violates safety
standards of design, manufacture, and intended use of the device.
Eurotech assumes no liability for the customer’s failure to comply with these requirements.
The safety precautions listed below represent warnings of certain dangers of which Eurotech is aware. You, as
the user of the device, should follow these warnings and all other safety precautions necessary for the safe
operation of the device in your operating environment.
Do not operate in an explosive atmosphere
WARNING!
Do not operate the equipment in the presence of flammable gases or fumes. Operation of any
electrical equipment in such an environment constitutes a definite safety hazard.
Antistatic precautions
WARNING!
To avoid ESD (electrostatic discharge) damage, always use appropriate antistatic precautions
when handling any electronic equipment.
Connection to power supply or other devices
DANGER!
Before applying power to the system, thoroughly review all installation, operation, and safety
instructions.
Failure to install the system power supply correctly or to follow all operating instructions correctly
may create an electrical shock hazard, which can result in personal injury or loss of life, and/or
damage to equipment or other property.
To avoid injuries, always disconnect power and discharge circuits before touching them.
Only start the device with a power supply that meets the requirements stated on the voltage label. In case of
uncertainty about the required power supply, please contact the Eurotech Technical Support Team or the
electricity authority.
Before connecting other equipment, carefully read any supplied instructions.
Always disconnect the power before connecting or disconnecting cables.
Do not perform connections with wet hands.
Check any power cords for damage before use.
Use certified power cables. The power cable must meet the requirements (voltage and current) of the device.
Position cables with care. Avoid positioning cables in places where they may be trampled on or compressed
by objects placed on them. Take particular care of the plug, power point and outlet of the power cable.
Avoid overloading any power outlets.
Only apply power to the device or connected equipment after checking that all the above conditions have
been met.
Installation
WARNING!
Verify that the mounting location can withstand the added load of the
device. The device should be firmly secured so that it cannot cause any potentially hazardous
situation (e.g. falling down due to vibration or shock).
Do not operate the device near heat sources or flames.
NOTE:
If the device must be moved from one place to another with different ambient temperatures, ensure sufficient
time for the temperature of the device to stabilize before repowering.
Ventilation
WARNING!
Ensure adequate ventilation to avoid overheating. Eurotech suggests the following steps:
When installing the device within a cabinet, rack or other enclosed space, be sure to leave
sufficient space to allow adequate air circulation
Do not block any ventilation openings
Maintenance
DANGER!
Never open, dismantle or repair the device!
For maintenance or repair, please contact a qualified Eurotech engineer.
If the device does not function correctly and you are unable to find a solution, feel free to contact
the Eurotech Technical Support Team.
If the equipment does not work properly, especially if it smells unusual, unplug it immediately and contact the
Eurotech Technical Support Team (see last page of this manual for further details).
Cleaning
WARNING!
When cleaning the device, remember to:
Ensure sufficient ESD protection during the cleaning process.
Remove any power from the device.
When cleaning an enclosed system or peripheral use a dry cloth on the external casing.
With single boards, use only a low power air brush or soft bristled paintbrush.
Do not use detergents, aerosol sprays, solvents or abrasive sponges.
Life support policy
WARNING!
Users must not use Eurotech products as critical components of life support devices or systems
without the express written approval of Eurotech Spa.
Warranty
Please contact your local Eurotech Sales Office for detailed warranty terms and conditions.
Refer to the back cover of this manual for full contact details.
WEEE
The information below complies with the regulations set out in the 2002/96/EC directive, subsequently
amended by 2003/108/EC. It refers to electrical and electronic equipment and the waste management of such
products.
When disposing of a device, including all of its components, subassemblies and materials that are an integral part
of the product, you should consider the WEEE directive.
The use of the following symbol, attached to the equipment, packaging, instruction literature, or the
guarantee sheet, states that the device has been marketed after August 13th 2005, and implies that
you must separate all of its components when possible, and dispose of them in accordance with all
waste disposal legislation:
Because of the substances present in the equipment, improper use or disposal of the refuse can cause
damage to human health and the environment.
With reference to WEEE, it is compulsory not to dispose of the equipment with normal urban refuse; an
arrangement for separate collection and disposal is essential.
To avoid any possible legal implications contact your local waste collection body for full recycling information.
RoHS
This device, including all the components, subassemblies and the consumable materials that are an integral part
of the product, has been manufactured in compliance with the European directive 2002/95/EC known as the
RoHS directive (Restriction of the use of certain Hazardous Substances). This directive targets the reduction of
certain hazardous substances previously used in electrical and electronic equipment (EEE).
Technical assistance
For any technical questions, or if you cannot isolate a problem with your device, or for any enquiry about repair
and returns policies, feel free to contact your local Eurotech Technical Support Team.
See the back cover for full contact details.
Transportation
When transporting any module or system, for any reason, it should be packed using anti-static material and
placed in a sturdy box with enough packing material to adequately cushion it.
Warning:
Any product returned to Eurotech that is damaged due to inappropriate packaging will not be
covered by the warranty!
1 Overview of the Aurora HiVe systems
Aurora HiVe is a family of HPC systems optimized for accelerated workloads. The system building block (Aurora
HiVe node) is a highly modular integration of different components, and it forms the heart of your Development
Kit.
The Aurora HiVe node can have different configurations, depending on the kind of CPU card and
accelerators/coprocessors used. Your Development Kit hosts a node which presents the following configuration:
Figure 1: Node Configuration
All of the components are cooled using the second generation of Aurora Direct Hot Water Cooling. Each card is
paired with a light and compact aluminium cold plate, which allows a high packaging density.
2 Aurora HiVe Development Kit content
Dimensions and Weight
Unit description          Length [mm]   Height [mm]   Width [mm]   Weight [kg]
Server                    502           190           131          8
Koolance Cooling Unit     490           450           150          13
Total packaged content    502           450           281          21
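The "Total packaged content" row can be cross-checked against the two unit rows. The short Python sketch below assumes that the Server and the KCU are packed side by side, so that widths and weights add up while length and height are set by the larger unit. This packing arrangement is inferred from the figures and is not stated in this manual; the function name packaged_totals is illustrative only.

```python
# Values copied from the "Dimensions and Weight" table (mm and kg).
units = {
    "Server": {"length": 502, "height": 190, "width": 131, "weight": 8},
    "Koolance Cooling Unit": {"length": 490, "height": 450, "width": 150, "weight": 13},
}


def packaged_totals(units):
    """Combine per-unit figures into package totals.

    Assumption (not from the manual): the units sit side by side, so
    widths and weights are summed, while length and height are taken
    from the larger unit.
    """
    rows = list(units.values())
    return {
        "length": max(r["length"] for r in rows),
        "height": max(r["height"] for r in rows),
        "width": sum(r["width"] for r in rows),
        "weight": sum(r["weight"] for r in rows),
    }


totals = packaged_totals(units)
print(totals)
```

Under this assumption the computed values match the last row of the table (502 mm, 450 mm, 281 mm, 21 kg).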
2.1 Development Kit shipping content
The Aurora HiVe Development Kit (from now on, DK) includes the following elements:
Aurora HiVe Server, which includes:
o Aurora HiVe node
o Delta Power supply (see 12.2.3 Electrical specifications for description and specifications)
Cooling unit: Koolance ERM-3K3U (see 12.2.4 Cooling unit (Koolance) specifications for description and
specifications). From now on indicated as KCU (Koolance Cooling Unit).
Liquid for Cooling Unit: PrimoChill PC-ICE (see 12.2.5 Liquid for KCU specifications for description and
specifications)
SMC Connectors and Tubes: KKA4 series (see 12.2.6 SMC Connectors specifications for description and
specifications)
2 auxiliary discharge tubes, one with a female SMC connector (socket) and one with a male SMC connector
(nipple). You will need these tubes for the cooling loop maintenance (see 12.3.1 Maintenance of the cooling
loop)
Plugs and cables
o 2 x EU Power cord: D04/QT3-W for 200-240 V AC, 50 Hz, 10 A
o 2 x US Power cord: UPH US IEC C13 for 100-120 V AC, 60 Hz, 20 A
Figure 2: Development Kit shipping content
WARNING! It is strictly forbidden to open the Server enclosure. Doing so may void the warranty.
2.2 Additional Peripherals
To operate your DK, you are advised to use the following peripherals, which are NOT included in the kit. (For a detailed
description of the front panel and its ports, see 12.2 System interfaces and LEDs.)
A keyboard, which you can connect to the Server using the USB port located on the front panel
A screen, which you can connect to the Server using the VGA port located on the front panel
3 DK Setup
The KCU should always be placed in the same room as the Server (or in a room at the same
temperature). This avoids condensation issues.
Figure 3 illustrates the suggested setup for the DK and its
peripherals. Ideally, the Server is placed on a desk and the KCU stays under the desk in a vertical position.
Although the KCU can also operate in a horizontal position, keeping it vertical is recommended for
optimal operation of the unit.
Figure 3: Suggested setup for the DK and its peripherals
3.1 Preparing the Koolance Cooling Unit
Once the DK (Aurora HiVe Server, KCU, liquid for KCU, water tubes with their respective SMC connectors, and
power cables) is unpacked, the first step is to prepare the KCU.
IMPORTANT! Note that you cannot switch on the Server before the KCU is connected to the
Server and is fully operational. The Server will be damaged if cooling is not provided.
IMPORTANT! To avoid burning the pump of the KCU, please remember to NEVER switch on the
KCU before all the pipes (IN and OUT) have been properly connected to the Server following the
procedure explained below.
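The two IMPORTANT notes above amount to an ordering rule: tubes first, then KCU, then Server. As a minimal sketch only, the rule can be written as two checks in Python. The class and function names below are hypothetical and are not part of any Eurotech or Koolance software.

```python
from dataclasses import dataclass


@dataclass
class KitState:
    """Hypothetical snapshot of the kit, for illustration only."""
    pipes_connected: bool = False  # IN and OUT tubes both attached to the Server
    kcu_running: bool = False      # KCU switched on and fully operational


def may_switch_on_kcu(state: KitState) -> bool:
    # The KCU pump must never run before all the pipes are connected.
    return state.pipes_connected


def may_switch_on_server(state: KitState) -> bool:
    # The Server needs cooling: KCU connected to it AND fully operational.
    return state.pipes_connected and state.kcu_running
```

With a freshly unpacked kit (KitState() with its defaults) both checks fail; the Server check passes only once both fields are true.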
To get the KCU ready and working, perform the following steps in exactly this
sequence:
1. Check that the KCU is off and disconnected from the electricity outlet.
2. Connect the flexible pipes to the KCU, inserting the SMC connectors into the IN and OUT connectors on the back
of the KCU (see Figure 5: Connectors on the KCU and on the Server).
3. With an appropriate screwdriver, remove the cap on the top (narrow side) of the KCU (see footnote 1).
4. Place a funnel in the hole.
5. Fill up the tank of the KCU with the Coolant Fluid until you start seeing the liquid at the bottom of the funnel.
To correctly fill the KCU, you will typically need approximately 1.5 litres of Coolant Fluid.
6. To verify that the pre-circulation steps were performed correctly, check that no air bubbles are visible in the
reservoir of the KCU.
At this point the KCU is NOT powered on yet.
Figure 4: Filling the tank of the Koolance cooling unit with coolant fluid.
3.2 Connecting the Server to the Koolance Cooling unit
WARNING: It is strictly forbidden to connect the Server to any cooling loop other than the
Koolance Cooling Unit. Doing so may void the warranty.
Now it is time to connect the water tubes to the Server. The SMC connectors have already been placed on the
tubes. Follow the steps below with reference to Figure 5: Connectors on the KCU and on the Server.
Footnote 1: When we mention the sides of the KCU (front, back, top, bottom…), we consider it to be placed in the
position illustrated in Figure 4: Filling the tank of the Koolance cooling unit with coolant fluid.
1. Connect the KCU OUT connector (on the back of the device, RED circle) to the Server UPPER
connector (also on the back of the device, RED circle). To do so, use either of the two tubes provided.
To insert the female SMC connector into its respective male connector, simply push it
in until you hear a “click” sound.
2. Connect the KCU IN connector (on the back of the device, BLUE circle) to the Server LOWER
connector (also on the back of the device, BLUE circle). To do so, use the remaining tube.
To insert the male SMC connector into its respective female connector, simply push it
in until you hear a “click” sound.
Now you are all set to switch on the KCU and start the first water cycle.
Figure 5: Connectors on the KCU and on the Server
Figure 6: Connecting the KCU to the Server
3.3 Switching on the KCU and starting the water cycles
The first time you use the DK, you need to repeat the water cycle more than once before switching on the Server.
After the first filling, the liquid needs to be maintained and periodically changed (see 12.3.1 Maintenance of the
cooling loop).
To perform the water cycles, with the Server POWERED OFF:
1. Connect the KCU to an electricity outlet with the power cable.
2. Press the POWER button (on the front side of the KCU, see Figure 7: Switching on and configuring the
KCU).
3. Check that the fan speed on the KCU is set to 100% (see 3.3.1 Setting fan speed on the KCU).
4. Wait until the water level in the KCU reservoir goes down and stabilizes.
5. Switch off the KCU by pressing the POWER button.
6. Refill the water following the procedure described in section 3.1 Preparing the Koolance Cooling Unit.
Repeat steps 2 to 4 until you see that the water level in the KCU reservoir does not get any lower. At this
point, check the water level (see Note 1).
Note 1: The best way to check the water level is to verify that the reservoir is 70-80% full,
that the level is not decreasing, and that there are few or no air bubbles. The line marking
the surface of the water should be above the middle of the reservoir (at 70-80% of the height of the glass) and should
be steady, not disturbed by air bubbles. If the level is decreasing, you need to add water; if
there are air bubbles, you need to leave the KCU on without attaching the Server.
Note 2: If the unit is powered off within 5 seconds of being powered on, the front display may
lock up or stop responding. In this case, reset the display to the manufacturer default settings by holding
▼ + ▲ for 5 seconds. See 3.3.1 Setting fan speed on the KCU for detailed instructions.
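The check described in Note 1 can be sketched as a small decision function. This is only an illustration of the logic: the function name and its parameters are hypothetical, and treating a level below 70% as a reason to add water is an assumption drawn from the 70-80% target stated in the note.

```python
def reservoir_check(level_percent, level_is_falling, bubbles_visible):
    """Return the action suggested by Note 1 for the observed reservoir state.

    level_percent    -- fill level read by eye, as a percentage of the glass height
    level_is_falling -- True if the level is still decreasing while the KCU runs
    bubbles_visible  -- True if air bubbles disturb the surface of the water
    """
    # Assumption: a level below the 70-80% target is handled like a falling level.
    if level_is_falling or level_percent < 70:
        return "add water"
    if bubbles_visible:
        return "keep KCU running"
    return "ready"
```

For example, a reading of 75% with a steady level and no bubbles yields "ready", while the same reading with visible bubbles yields "keep KCU running".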
Figure 7: Switching on and configuring the KCU
3.3.1 Setting fan speed on the KCU
By default, the KCU fans work at 100% airflow capacity.
To check this (the numbers refer to Figure 7: Switching on and configuring the KCU):
1. Press the central button on the right side of the KCU display.
2. Select “FAN SET” (use the up and down buttons until the desired option is blinking, then press the
central button to select it).
3. Select “ALL FANS 100%”.
4. Wait until the blinking stops. The selected value is now set.
When the HiVe server is not running at full capacity, fan speed can be reduced.
If you need to change fan speed, please refer to the KCU manual.
The automatic KCU temperature measurement system is not integrated into the DK, but you can manually
change the speed of the fans based on the temperature of the GPUs of the DK. To determine the
temperature of the NVIDIA GPUs, run the following command from the host OS:
/usr/bin/nvidia-smi
Based on this reading, the fan speed may be reduced or increased as desired.
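If you prefer to read the temperatures from a script rather than from the full nvidia-smi report, the following Python sketch shows one possible approach. It assumes that the installed nvidia-smi supports the --query-gpu and --format options and that Python 3 is available on the host OS; the function names are illustrative and are not part of the DK software.

```python
import subprocess

NVIDIA_SMI = "/usr/bin/nvidia-smi"  # path given in this manual


def parse_temperatures(csv_text):
    """Parse 'index, temperature' CSV lines into {gpu_index: degrees_celsius}."""
    temperatures = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, temperature = (field.strip() for field in line.split(","))
        temperatures[int(index)] = int(temperature)
    return temperatures


def read_gpu_temperatures():
    """Query nvidia-smi on the host OS and return {gpu_index: degrees_celsius}."""
    result = subprocess.run(
        [NVIDIA_SMI,
         "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_temperatures(result.stdout)
```

Calling read_gpu_temperatures() on the host OS returns one entry per GPU. The manual does not specify temperature thresholds, so the decision to raise or lower the fan speed remains with the operator.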
3.4 Connecting peripherals to the Server
WARNING: The Aurora HiVe Server MUST ALWAYS be in vertical position with the power supply
on top. This is required to minimize electrical shock hazards.
It is possible to connect the following peripherals to your DK (please note that they are NOT included in the kit).
For a description of the system interfaces, see chapter 12.2 System interfaces and LEDs.
Ethernet (ETH) network: connect the device to the ETH network through one of the 2 ETH ports with an
RJ45 cable.
Screen: connect the screen to the VGA port with a VGA cable.
Keyboard: connect a USB keyboard to one of the USB ports just below the VGA port using a USB cable.
Mouse: connect a mouse to the other USB port using a USB cable.
Infiniband (IB) network: connect the Server to an IB network using one of the two IB ports and an IB
cable (QDR/FDR supported). If you need to connect two DK to each other back-to-back, see 4
Connecting 2 DK.
Figure 8: Connecting peripherals to the Server
3.5 Setting up the electrical connections and switching on the Server
At this point, the KCU should be connected to the Server, switched on and fully operational.
Remember to always verify that no leakages have occurred and that the cooling liquid level inside
the KCU reservoir is stable and free from bubbles before switching on the Server.
Now you are ready to switch on the Server:
1. Using the supplied power cord, connect the Server to an electricity outlet.
The plug (BLUE circle in Figure 9) is on the front side of the Power Supply.
2. To switch on the Server, press the POWER button (RED circle in Figure 9) for 2 seconds (for a
description/illustration of the system interfaces, see 12.2 System interfaces and LEDs).
3. The second LED on the bottom right side of the front panel of the Server turns green. This means
that the Server has been switched on correctly.
4. Now look at the upper-central LED group (5 LEDs in total). When the lower-right LED turns off, the
CPU has booted and the OS starts to run. Booting may take a few minutes.
5. Wait until the boot completes. Now you can log in to the system.
Figure 9: Switching on the Server
3.6 Switching off the Server and the KCU
When you want to power off the Server, first shut down the OS from the command line or from the desktop
environment. You are then ready to switch off the system. To do so:
1. Press the POWER button (RED circle in Figure 9) for 2 seconds.
2. Verify that all the upper LEDs are off.
3. Unplug the Server.
4. Only after you have unplugged the Server can you switch off and unplug the KCU as well.
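The ordering rules stated in this chapter (pipes connected before the KCU is switched on, KCU running before the Server is switched on, Server off before the KCU is switched off) can be summarised as a small state model. The sketch below is purely illustrative: the class and method names are hypothetical and it does not control any hardware.

```python
class DevKitPower:
    """Tracks which units are on and rejects out-of-order power operations."""

    def __init__(self):
        self.pipes_connected = False
        self.kcu_on = False
        self.server_on = False

    def connect_pipes(self):
        self.pipes_connected = True

    def switch_on_kcu(self):
        # Section 3.1: never switch on the KCU before the pipes are connected.
        if not self.pipes_connected:
            raise RuntimeError("connect the IN and OUT pipes before switching on the KCU")
        self.kcu_on = True

    def switch_on_server(self):
        # Section 3.1: the Server is damaged if cooling is not provided.
        if not self.kcu_on:
            raise RuntimeError("the KCU must be running before the Server is switched on")
        self.server_on = True

    def switch_off_server(self):
        self.server_on = False

    def switch_off_kcu(self):
        # Section 3.6: the KCU is switched off only after the Server.
        if self.server_on:
            raise RuntimeError("switch off and unplug the Server before the KCU")
        self.kcu_on = False
```

A valid sequence is connect_pipes(), switch_on_kcu(), switch_on_server(), switch_off_server(), switch_off_kcu(); any other order raises an error in this model.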
For instructions on how to operate your DK, please go to: 5 Accessing and managing the DK.
Figure 10: Server and KCU connected. The system is switched off.
Power Consumption
Unit      Maximum power consumption
Server    1300 Watts
KCU       90 Watts
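When sizing the electrical supply for the DK, the two maximum figures can be added together. The sketch below shows the arithmetic; the mains voltage is an assumption that you must replace with the value for your site, and the current estimate ignores power factor, so treat it as a rough guide only.

```python
# Maximum power consumption values from the Power Consumption table.
MAX_POWER_WATTS = {"Server": 1300, "KCU": 90}


def total_max_power(watts_by_unit):
    """Sum the maximum power consumption of all units, in watts."""
    return sum(watts_by_unit.values())


def approx_current_amps(total_watts, mains_voltage):
    """Rough current estimate (I = P / V), ignoring power factor."""
    return total_watts / mains_voltage


total = total_max_power(MAX_POWER_WATTS)
print(total)  # 1390

# Assumption: 230 V mains; substitute your local supply voltage.
print(round(approx_current_amps(total, 230.0), 1))
```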
4 Connecting 2 DK
You can connect 2 DK back-to-back in an Infiniband network.
To do so, use an Infiniband cable to connect the two Servers through their FDR IB ports (for a description of
system interfaces and I/O ports, see chapter 12.2 System interfaces and LEDs). The cable is inserted correctly
when you hear a “click” sound. Once your Servers are connected, the GPUs can share data through the RDMA
GPUDirect™ functionality (for an explanation of how this works, see 4.1.2 NVIDIA GPUDirect™ version 3:
RDMA). Figure 11 below illustrates this procedure.
Figure 11: Connecting 2 Development Kits using an IB cable.
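After cabling, you may want to confirm from the host OS that the IB link has come up. The sketch below assumes that the ibstat utility (from the infiniband-diags package) is installed, which this manual does not confirm, and that its output uses the usual "Port N:", "State:" and "Rate:" lines. The function names are illustrative.

```python
import subprocess


def parse_port_states(ibstat_text):
    """Return a list of (port_name, state, rate) tuples from ibstat-style text."""
    ports = []
    current = None
    for raw_line in ibstat_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Port ") and line.endswith(":"):
            # A header such as "Port 1:" starts a new port section.
            current = {"name": line[:-1], "state": None, "rate": None}
            ports.append(current)
        elif current is not None and line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Rate:"):
            current["rate"] = line.split(":", 1)[1].strip()
    return [(p["name"], p["state"], p["rate"]) for p in ports]


def read_port_states():
    """Run ibstat (assumed to be on the PATH) and parse its output."""
    result = subprocess.run(["ibstat"], capture_output=True, text=True, check=True)
    return parse_port_states(result.stdout)
```

Under these assumptions, a port on which the cable is correctly inserted and the link is established would be expected to report the state "Active" on both Servers.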
4.1 NVIDIA GPUDirect™ functionality
The Aurora HiVe architecture supports NVIDIA GPUDirect™ (see the NVIDIA website for more details:
https://developer.nvidia.com/gpudirect), thus allowing for both:
P2P (Peer-to-Peer) functionality: direct communication between GPUs within the same node
RDMA (Remote Direct Memory Access) functionality: direct communication between GPUs in different nodes
across an Infiniband network.
This shortens data transfer time, increasing performance and decreasing latency. This
is possible because, thanks to this functionality, the data from the first GPU do not need to pass through (and be
copied to) the CPU before travelling to the second GPU.
Your DK supports all 3 versions of NVIDIA GPUDirect™, including both the P2P and RDMA versions.
4.1.1 NVIDIA GPUDirect™ version 2: P2P
In the Aurora HiVe node, all cards share the same PCIe bus, which is managed by the PLX PCIe Switch
PEX8796. Because all relevant drivers (see 7 DK software for the complete list) are preinstalled, you can
directly transfer data from one GPU to another (on the same shared-memory Server) using P2P
NVIDIA GPUDirect™. Figure 12 below illustrates how the transfer (red line) takes place.
4.1.2 NVIDIA GPUDirect™ version 3: RDMA
The Aurora HiVe architecture supports the RDMA version of GPUDirect™ as well. This means that a GPU in node nr. 1
can communicate with a GPU in node nr. 2 across an Infiniband network. As illustrated in Figure 12 (black
line), data travel from one GPU to the other via an Infiniband cable (NOT included in the kit), which is connected
to the IB ports in node nr. 1 and node nr. 2.