Bull CEC hot add and hot-node repair maintenance Planning Guide

Type
Planning Guide
Planning for CEC hot add and hot-node
repair maintenance
ESCALA Power7
REFERENCE
86 A1 43FF 05
ESCALA Models Reference
The ESCALA Power7 publications concern the following models:
Bull Escala E1-700 / E3-700 (31E/2B ,8231-E2B)
Bull Escala E1-705 (31E/1C, 8231-E1C)
Bull Escala E1-715 (31E/1D, 8231-E1D)
Bull Escala E3-705 (31E/2C, 8231-E2C)
Bull Escala E3-715 (31E/2D, 8231-E2D)
Bull Escala E2-700 / E2-700T (02E/4B, 8202-E4B)
Bull Escala E2-705 / E2-705T (02E/4C, 8202-E4C)
Bull Escala E2-715 / E2-715T (02E/4D, 8202-E4D)
Bull Escala E4-700 / E4-700T (05F/6B, 8205-E6B)
Bull Escala E4-705 (05E/6C, 8205-E6C)
Bull Escala E4-715 (05E/6D, 8205-E6D)
Bull Escala E5-700 (33E/8B, 8233-E8B)
Bull Escala E5-715 (08E/8D, 8408-E8D)
Bull Escala M5-715 (09R/MD, 9109-RMD)
Bull Escala M6-700 (17M/MB, 9117-MMB)
Bull Escala M6-705 (17M/MC, 9117-MMC)
Bull Escala M6-715 (17M/MD, 9117-MMD)
Bull Escala M7-700 (79M/HB, 9179-MHB)
Bull Escala M7-705 (79M/HC, 9179-MHC)
Bull Escala M7-715 (79M/HD, 9179-MHD)
Bull Escala H9-700 (19F/HB, 9119-FHB)
References to 8236-E8C models are irrelevant.
Hardware
February 2013
BULL CEDOC
357 AVENUE PATTON
B.P.20845
49008 ANGERS CEDEX 01
FRANCE
The following copyright notice protects this book under Copyright laws which prohibit such actions as, but not limited to, copying,
distributing, modifying, and making derivative works.
Copyright © Bull SAS 2013
Printed in France
Trademarks and Acknowledgements
We acknowledge the rights of the proprietors of the trademarks mentioned in this manual.
All brand names and software and hardware product names are subject to trademark and/or patent protection.
Quoting of brand and product names is for information purposes only and does not represent trademark misuse.
The information in this document is subject to change without notice. Bull will not be liable for errors contained herein, or
for incidental or consequential damages in connection with the use of this material.
Contents
Safety notices .................................v
Planning for CEC hot-node add and hot-node repair maintenance...........1
What's new in Planning for CEC hot-node add and hot-node repair maintenance ............2
CEC hot-node add and hot-node repair maintenance operations checklists for models 9117-MMB, 9117-MMC,
9179-MHB, and 9179-MHC ..............................2
CEC hot-node add and hot-node repair maintenance operations checklists for model 9119-FHB........6
Model 9117-MMB, 9117-MMC, 9179-MHB, and 9179-MHC system firmware and HMC levels ........9
Model 9117-MMB and 9179-MHB system firmware and HMC levels................9
Model 9117-MMC and 9179-MHC system firmware and HMC levels ...............11
Model 9119-FHB system firmware and HMC levels ......................12
Preparing for Hot Repair or Upgrade utility ........................14
Model 9117-MMB, 9117-MMC, 9179-MHB, and 9179-MHC multidrawer external SMP and FSP flex cabling . . . 28
CEC hot-node add and hot-node repair maintenance on a two drawer system with a two socket CPU card -
Front....................................31
CEC hot-node add and hot-node repair maintenance on a two drawer system with a four socket CPU card -
Front....................................32
CEC hot-node add and hot-node repair maintenance on a two drawer system - Rear ..........33
CEC hot-node add and hot-node repair maintenance on a three drawer system with a two socket CPU card -
Drawer 1 front .................................34
CEC hot-node add and hot-node repair maintenance on a three drawer system with a two socket CPU card -
Drawer 2 front .................................34
CEC hot-node add and hot-node repair maintenance on a three drawer system with a two socket CPU card -
Drawer 3 front .................................34
CEC hot-node add and hot-node repair maintenance on a three drawer system with a four socket CPU card -
Drawer 1 front .................................35
CEC hot-node add and hot-node repair maintenance on a three drawer system with a four socket CPU card -
Drawer 2 front .................................36
CEC hot-node add and hot-node repair maintenance on a three drawer system with a four socket CPU card -
Drawer 3 front .................................36
CEC hot-node add and hot-node repair maintenance on a three drawer system - Drawer 1 rear ......37
CEC hot-node add and hot-node repair maintenance on a three drawer system - Drawer 2 rear ......37
CEC hot-node add and hot-node repair maintenance on a three drawer system - Drawer 3 rear ......38
CEC hot-node add and hot-node repair maintenance on a four drawer system with a two or four socket CPU
card - Drawer 1 front ...............................39
CEC hot-node add and hot-node repair maintenance on a four drawer system with a two or four socket CPU
card - Drawer 2 front ...............................40
CEC hot-node add and hot-node repair maintenance on a four drawer system with a two or four socket CPU
card - Drawer 3 front ...............................40
CEC hot-node add and hot-node repair maintenance on a four drawer system with a two or four socket CPU
card - Drawer 4 front ...............................41
CEC hot-node add and hot-node repair maintenance on a four drawer system - Drawer 1 rear ......42
CEC hot-node add and hot-node repair maintenance on a four drawer system - Drawer 2 rear ......42
CEC hot-node add and hot-node repair maintenance on a four drawer system - Drawer 3 rear ......43
CEC hot-node add and hot-node repair maintenance on a four drawer system - Drawer 4 rear ......43
Notices ...................................45
Trademarks ...................................46
Electronic emission notices ..............................46
Class A Notices .................................46
Class B Notices .................................50
Terms and conditions ................................53
© Copyright IBM Corp. 2010, 2013 iii
iv
Safety notices
Safety notices may be printed throughout this guide:
v DANGER notices call attention to a situation that is potentially lethal or extremely hazardous to
people.
v CAUTION notices call attention to a situation that is potentially hazardous to people because of some
existing condition.
v Attention notices call attention to the possibility of damage to a program, device, system, or data.
World Trade safety information
Several countries require the safety information contained in product publications to be presented in their
national languages. If this requirement applies to your country, a safety information booklet is included
in the publications package shipped with the product. The booklet contains the safety information in
your national language with references to the U.S. English source. Before using a U.S. English publication
to install, operate, or service this product, you must first become familiar with the related safety
information in the booklet. You should also refer to the booklet any time you do not clearly understand
any safety information in the U.S. English publications.
German safety information
Das Produkt ist nicht für den Einsatz an Bildschirmarbeitsplätzen im Sinne§2der
Bildschirmarbeitsverordnung geeignet.
Laser safety information
IBM
®
servers can use I/O cards or features that are fiber-optic based and that utilize lasers or LEDs.
Laser compliance
IBM servers may be installed inside or outside of an IT equipment rack.
© Copyright IBM Corp. 2010, 2013 v
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
v Connect power to this unit only with the IBM provided power cord. Do not use the IBM
provided power cord for any other product.
v Do not open or service any power supply assembly.
v Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration
of this product during an electrical storm.
v The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
v Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet
supplies proper voltage and phase rotation according to the system rating plate.
v Connect any equipment that will be attached to this product to properly wired outlets.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or structural damage.
v Disconnect the attached power cords, telecommunications systems, networks, and modems before
you open the device covers, unless instructed otherwise in the installation and configuration
procedures.
v Connect and disconnect cables as described in the following procedures when installing, moving,
or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
DANGER
vi
Observe the following precautions when working on or around your IT rack system:
v Heavy equipment–personal injury or equipment damage might result if mishandled.
v Always lower the leveling pads on the rack cabinet.
v Always install stabilizer brackets on the rack cabinet.
v To avoid hazardous conditions due to uneven mechanical loading, always install the heaviest
devices in the bottom of the rack cabinet. Always install servers and optional devices starting
from the bottom of the rack cabinet.
v Rack-mounted devices are not to be used as shelves or work spaces. Do not place objects on top
of rack-mounted devices.
v Each rack cabinet might have more than one power cord. Be sure to disconnect all power cords in
the rack cabinet when directed to disconnect power during servicing.
v Connect all devices installed in a rack cabinet to power devices installed in the same rack
cabinet. Do not plug a power cord from a device installed in one rack cabinet into a power
device installed in a different rack cabinet.
v An electrical outlet that is not correctly wired could place hazardous voltage on the metal parts of
the system or the devices that attach to the system. It is the responsibility of the customer to
ensure that the outlet is correctly wired and grounded to prevent an electrical shock.
CAUTION
v Do not install a unit in a rack where the internal rack ambient temperatures will exceed the
manufacturer's recommended ambient temperature for all your rack-mounted devices.
v Do not install a unit in a rack where the air flow is compromised. Ensure that air flow is not
blocked or reduced on any side, front, or back of a unit used for air flow through the unit.
v Consideration should be given to the connection of the equipment to the supply circuit so that
overloading of the circuits does not compromise the supply wiring or overcurrent protection. To
provide the correct power connection to a rack, refer to the rating labels located on the
equipment in the rack to determine the total power requirement of the supply circuit.
v (For sliding drawers.) Do not pull out or install any drawer or feature if the rack stabilizer brackets
are not attached to the rack. Do not pull out more than one drawer at a time. The rack might
become unstable if you pull out more than one drawer at a time.
v (For fixed drawers.) This drawer is a fixed drawer and must not be moved for servicing unless
specified by the manufacturer. Attempting to move the drawer partially or completely out of the
rack might cause the rack to become unstable or cause the drawer to fall out of the rack.
(R001)
Safety notices vii
CAUTION:
Removing components from the upper positions in the rack cabinet improves rack stability during
relocation. Follow these general guidelines whenever you relocate a populated rack cabinet within a
room or building:
v Reduce the weight of the rack cabinet by removing equipment starting at the top of the rack
cabinet. When possible, restore the rack cabinet to the configuration of the rack cabinet as you
received it. If this configuration is not known, you must observe the following precautions:
Remove all devices in the 32U position and above.
Ensure that the heaviest devices are installed in the bottom of the rack cabinet.
Ensure that there are no empty U-levels between devices installed in the rack cabinet below the
32U level.
v If the rack cabinet you are relocating is part of a suite of rack cabinets, detach the rack cabinet from
the suite.
v Inspect the route that you plan to take to eliminate potential hazards.
v Verify that the route that you choose can support the weight of the loaded rack cabinet. Refer to the
documentation that comes with your rack cabinet for the weight of a loaded rack cabinet.
v Verify that all door openings are at least 760 x 230 mm (30 x 80 in.).
v Ensure that all devices, shelves, drawers, doors, and cables are secure.
v Ensure that the four leveling pads are raised to their highest position.
v Ensure that there is no stabilizer bracket installed on the rack cabinet during movement.
v Do not use a ramp inclined at more than 10 degrees.
v When the rack cabinet is in the new location, complete the following steps:
Lower the four leveling pads.
Install stabilizer brackets on the rack cabinet.
If you removed any devices from the rack cabinet, repopulate the rack cabinet from the lowest
position to the highest position.
v If a long-distance relocation is required, restore the rack cabinet to the configuration of the rack
cabinet as you received it. Pack the rack cabinet in the original packaging material, or equivalent.
Also lower the leveling pads to raise the casters off of the pallet and bolt the rack cabinet to the
pallet.
(R002)
(L001)
(L002)
viii
(L003)
or
All lasers are certified in the U.S. to conform to the requirements of DHHS 21 CFR Subchapter J for class
1 laser products. Outside the U.S., they are certified to be in compliance with IEC 60825 as a class 1 laser
product. Consult the label on each part for laser certification numbers and approval information.
CAUTION:
This product might contain one or more of the following devices: CD-ROM drive, DVD-ROM drive,
DVD-RAM drive, or laser module, which are Class 1 laser products. Note the following information:
v Do not remove the covers. Removing the covers of the laser product could result in exposure to
hazardous laser radiation. There are no serviceable parts inside the device.
v Use of the controls or adjustments or performance of procedures other than those specified herein
might result in hazardous radiation exposure.
(C026)
Safety notices ix
CAUTION:
Data processing environments can contain equipment transmitting on system links with laser modules
that operate at greater than Class 1 power levels. For this reason, never look into the end of an optical
fiber cable or open receptacle. (C027)
CAUTION:
This product contains a Class 1M laser. Do not view directly with optical instruments. (C028)
CAUTION:
Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the following
information: laser radiation when open. Do not stare into the beam, do not view directly with optical
instruments, and avoid direct exposure to the beam. (C030)
CAUTION:
The battery contains lithium. To avoid possible explosion, do not burn or charge the battery.
Do Not:
v ___ Throw or immerse into water
v ___ Heat to more than 100°C (212°F)
v ___ Repair or disassemble
Exchange only with the IBM-approved part. Recycle or discard the battery as instructed by local
regulations. In the United States, IBM has a process for the collection of this battery. For information,
call 1-800-426-4333. Have the IBM part number for the battery unit available when you call. (C003)
Power and cabling information for NEBS (Network Equipment-Building System)
GR-1089-CORE
The following comments apply to the IBM servers that have been designated as conforming to NEBS
(Network Equipment-Building System) GR-1089-CORE:
The equipment is suitable for installation in the following:
v Network telecommunications facilities
v Locations where the NEC (National Electrical Code) applies
The intrabuilding ports of this equipment are suitable for connection to intrabuilding or unexposed
wiring or cabling only. The intrabuilding ports of this equipment must not be metallically connected to the
interfaces that connect to the OSP (outside plant) or its wiring. These interfaces are designed for use as
intrabuilding interfaces only (Type 2 or Type 4 ports as described in GR-1089-CORE) and require isolation
from the exposed OSP cabling. The addition of primary protectors is not sufficient protection to connect
these interfaces metallically to OSP wiring.
Note: All Ethernet cables must be shielded and grounded at both ends.
The ac-powered system does not require the use of an external surge protection device (SPD).
The dc-powered system employs an isolated DC return (DC-I) design. The DC battery return terminal
shall not be connected to the chassis or frame ground.
x
Planning for CEC hot-node add and hot-node repair
maintenance
CEC hot-node add and hot-node repair maintenance operations improve system availability by avoiding
a scheduled system outage. Hardware can be added and repaired while the system is powered on, but
depending on the specific model, all critical applications should be quiesced (stopped or powered off), or
moved to another system using Live Partition Mobility (LPM). In addition, the Electronic Service Agent
(ESA) must be enabled.
CEC hot-add and hot-repair tasks can be performed on IBM Power
®
770 (9117-MMB and 9117-MMC),
IBM Power 780 (9179-MHB and 9179-MHC), and IBM Power 795 (9119-FHB) systems that are managed
by a Hardware Management Console (HMC).
CEC hot-add and hot-repair tasks provide the capability to perform maintenance (repair, add, or
upgrade) on field replaceable units (FRUs) within a CEC while the system is powered on. Even though
the system remains powered on, there may be impacts to the partitions and the performance of the
system. For example, when a CEC node is physically removed from the system to repair a FRU, the
processors, memory, and I/O resources hosted by that node will no longer be available to the partitions.
The impacts on a specific system configuration will vary depending on a number of factors including
spare memory resources, spare processor resources, and I/O redundancy configuration (mirroring or
multipath).
For additional information about the CEC concurrent maintenance functions on Power 770, Power 780,
and Power 795 systems, see the IBM
®
Power 770/780 and 795 Servers CEC Hot Add & Repair
Maintenance Technical Overview white paper.
Prerequisites for CEC hot-add and hot-repair
The following prerequisites apply to CEC hot-node add and hot-node repair maintenance operations.
1. Before performing a CEC hot-add or hot-repair operation, the CEC hot-add or hot-repair operation
distance learning course must be completed.
2. Verify that the HMC level is the same as, or later than, the level shown in “Model 9117-MMB and
9179-MHB system firmware and HMC levels” on page 9, “Model 9117-MMC and 9179-MHC system
firmware and HMC levels” on page 11, or “Model 9119-FHB system firmware and HMC levels” on
page 12. If there are two HMCs attached to the system, both HMCs must be at the same level. If not,
the HMC that is not at the required level must be correctly disconnected from the managed system
and powered off.
For information about verifying the HMC code level and release, see Determining your HMC code
level and release.
For information about disconnecting an HMC from a managed system, see Removing HMC
connection data.
3. Carefully follow all ESD (electrostatic discharge) handing procedures.
4. When you perform multiple hot adds that include a node and GX adapter, the GX adapter must be
installed in the node before the node is installed in the system. If this order is observed, the memory
in the new node will be used for the 128 MB of memory that is required for the GX adapter. (The
node must have approximately 128 MB of memory per GX adapter to support the adapter's
translation control entry (TCE) table.) The GX adapter in the new node will be activated during the
addition of the node (if the 128 MB memory requirement is met). If the GX adapter is installed after
the new node is installed and activated, the plugging rules for GX adapters will be enforced. In this
case, the GX adapter must be installed in another node with another hot add operation.
© Copyright IBM Corp. 2010, 2013 1
Note: All of the hot add and repair activities must be performed by a service representative.
Checklists are available that list the prerequisite planning tasks that are required prior to performing hot
maintenance.
Special considerations for CEC hot-add and hot-repair
The following special considerations need to be taken into account when performing CEC hot-node add
and hot-node repair maintenance operations.
1. Only a single hot add or repair operation can be performed at one time from a single HMC.
2. A second hot add or repair operation cannot be started until the first one has been completed without
failure. If a hot operation fails, the same operation must be restarted and completed before attempting
another operation.
3. Multiple hot add or repair operations must be completed by doing a series of single hot add or repair
operations.
What's new in Planning for CEC hot-node add and hot-node repair
maintenance
Read about new or significantly changed information in Planning for CEC hot-node add and hot-node
repair maintenance since the previous update of this topic collection.
October 2011
The following updates have been made to the content:
v Added new firmware levels for 9117-MMC (IBM Power 770) and 9179-MHC (IBM Power 780) servers:
“Model 9117-MMC and 9179-MHC system firmware and HMC levels” on page 11.
March 2013
The following updates have been made to the content:
v Added new information for 9179-MHC (IBM Power 780) servers with a four socket CPU card: “Model
9117-MMB, 9117-MMC, 9179-MHB, and 9179-MHC multidrawer external SMP and FSP flex cabling” on
page 28.
v Added new firmware levels for 9119-FHB (IBM Power 795) servers: “Model 9119-FHB system firmware
and HMC levels” on page 12.
CEC hot-node add and hot-node repair maintenance operations
checklists for models 9117-MMB, 9117-MMC, 9179-MHB, and 9179-MHC
The planning checklists contain a list of tasks that must be completed before you perform CEC hot-node
add and hot-node repair maintenance operations on IBM Power 770 (9117-MMB and 9117-MMC) and
IBM Power 780 (9179-MHB and 9179-MHC) systems. It also lists which tasks are associated with each
type of operation.
For guidelines that apply to all CEC adapter hot-add and hot-repair maintenance operations, see
“Prerequisites for CEC hot-add and hot-repair” on page 1. These general guidelines must be followed as
well as the items in the following checklist.
2
Table 1. CEC hot-node add and hot-node repair maintenance operations checklists for models 9117-MMB,
9117-MMC, 9179-MHB, and 9179-MHC
Operations Models 9117-MMB, 9117-MMC, 9179-MHB, and 9179-MHC tasks Responsibility
Hot-node add
Ensure that sufficient space is available in the rack that contains
the system for another node to be installed under the existing
nodes.
Customer
Hot-node add
Adding a node increases the power requirements of the system. If
necessary, the power circuits that supply the system must be
upgraded to handle these additional power requirements. This
upgrade must be complete before the hot-node add begins.
The power requirements of additional nodes are detailed in
Chapter 4, Server Specifications and Chapter 8, Planning for
Power in Site and Hardware Planning.
Customer responsibility to
perform the required site
planning activities
Hot-node add,
hot-node upgrade
(memory),
hot-node repair,
concurrent GX
adapter add, and
hot GX adapter
repair
If the system is managed by a Hardware Management Console
(HMC), verify that the HMC level is the same as, or later than,
the level shown in “Model 9117-MMB and 9179-MHB system
firmware and HMC levels” on page 9 or “Model 9117-MMC and
9179-MHC system firmware and HMC levels” on page 11.
For information about disconnecting an HMC from a managed
system, see Removing HMC connection data.
Verify that the system firmware level is the same as, or later than,
the level shown in “Model 9117-MMB and 9179-MHB system
firmware and HMC levels” on page 9 or “Model 9117-MMC and
9179-MHC system firmware and HMC levels” on page 11.
For information about determining firmware levels using the
Hardware Management Console (HMC), see Using the HMC to
view the existing firmware levels.
Customer
Service representative must
verify
Hot-node upgrade
(memory),
hot-node add, and
hot-node repair
CAUTION:
Observe the following with the Symmetric Multiprocessing (SMP)
and the service processor cables on Power 770 and Power 780
systems when performing a CEC hot add or repair operation:
v Verify that the SMP cable (on the front of the system) and the
service processor cable (on the rear of the system) are layered
correctly. See “Model 9117-MMB, 9117-MMC, 9179-MHB, and
9179-MHC multidrawer external SMP and FSP flex cabling” on
page 28. Do not start the CEC hot add or repair operation if the
layering is not correct. If the layering is not correct, schedule a
system outage with the customer to correct the cabling and
perform the CEC hot add or repair operation with the power
off.
v Do not disturb the SMP and the service processor cables on
active nodes. If a node repair is being performed, remove only
the cables on the node that is being repaired (powered off)
unless otherwise directed by WCII instructions or the removal
and replacement procedure.
Service representative must
verify
Hot-node upgrade
(memory),
hot-node repair,
and hot GX
adapter repair
If there is only one partition running on the system, the system
can be powered down if that partition must be shut down during
the CEC hot add or repair operation. To verify or change the
setting that controls this behavior, see System power off policy in
Powering the system on and off.
Service representative
Planning for CEC hot-node add and hot-node repair maintenance 3
Table 1. CEC hot-node add and hot-node repair maintenance operations checklists for models 9117-MMB,
9117-MMC, 9179-MHB, and 9179-MHC (continued)
Operations Models 9117-MMB, 9117-MMC, 9179-MHB, and 9179-MHC tasks Responsibility
Hot-node add,
hot-node upgrade
(memory),
hot-node repair,
concurrent GX
adapter add, and
hot GX adapter
repair
Verify that service processor redundancy is enabled, except on a
system with a single node.
To enable service processor redundancy on the managed system
on an HMC, in the navigation area, select Serviceability > FSP
Failover > Setup. Failover status is displayed in the contents area
and can be enabled or disabled.
Note: The server must be power cycled for the enabling of service
processor redundancy to take effect. If power cycling is required,
the node can be added nonconcurrently (with power off).
For information about redundancy on the HMC, see FSP Failover.
Customer
Service representative must
verify
Hot-node add,
hot-node upgrade
(memory),
hot-node repair,
concurrent GX
adapter add, and
hot GX adapter
repair
If the system is managed by an HMC, verify the network
connections between the system service processors and the HMC
before the service representative arrives by using the service
processor status function on the HMC. For information about
testing the network connections on the HMC, see Testing the
connection between the HMC and the managed system.
Customer
Service representative must
verify
Hot-node add,
hot-node upgrade
(memory),
hot-node repair,
concurrent GX
adapter add, and
hot GX adapter
repair
Ensure that Electronic Service Agent
(ESA) is enabled to call
home.
For additional details about the call home function on the HMC,
see Manage Systems Call-Home.
Customer
Service representative must
verify
Hot-node add,
hot-node upgrade
(memory),
hot-node repair,
and hot GX
adapter repair
For 9117-MMB and 9179-MHB systems, you should quiesce the
server before the hot-node operation. This means that all critical
application should be halted or moved to another server using
Live Partition Mobility (LPM) before the operation begins.
Noncritical applications can be left running. The partitions can be
left running at the operating system command prompt.
For 9117-MMC and 9179-MHC systems, you must quiesce the
server before the hot-node operation. This means that all critical
applications must be halted or moved to another server using
Live Partition Mobility (LPM) before the operation begins.
Noncritical applications can be left running. The partitions can be
left running at the operating system command prompt.
For additional information on a system managed by an HMC, see
Mobility in the Systems management for partitions topic in
Managing the Hardware Management Console.
Customer
Hot-node add and
hot-node upgrade
(memory)
Obtain and apply Capacity Upgrade on Demand (CUoD)
activation codes prior to the operation so that the memory
resources are available immediately after the hot add operation.
For information about CUoD activation codes, see Capacity
Upgrade on Demand activation codes.
Customer
4
Table 1. CEC hot-node add and hot-node repair maintenance operations checklists for models 9117-MMB,
9117-MMC, 9179-MHB, and 9179-MHC (continued)
Operations Models 9117-MMB, 9117-MMC, 9179-MHB, and 9179-MHC tasks Responsibility
Hot-node add and
hot-node upgrade
(memory)
If the customer wants to dynamically add the new processors and
memory resources to activated partitions after the Hot-node add,
the profiles that were used to activate the partitions must have the
maximums for memory set to the appropriate values. When the
Hot-node add is complete, the processor and memory resources
can be added to the partitions using dynamic logical partitions
(DLPAR). For information about DLPAR, see Dynamic logical
partitioning.
Customer
Hot-node upgrade
(memory)
Verify that more than one node is present in the system. Hot-node
upgrade (memory) cannot be done if the system has only one
node.
Customer
Service representative must
verify
Hot-node repair,
hot GX adapter
repair, and
hot-node upgrade
(memory)
Run the Preparing for Hot Repair or Upgrade utility on the HMC
to determine if the system is prepared for a node upgrade
(memory). For more information about the utility, see Preparing
for hot repair or upgrade.
Customer
Service representative must
verify
Hot-node add and
concurrent GX
adapter add
System firmware enforces the node and GX adapter plugging
order. Only the next GX adapter slot or node position based on
the plugging order is available
Customer
Service representative must
verify
Concurrent GX
adapter add
Ensure that there is a GX adapter slot reservation for each GX
adapter that is being added. Each installed GX adapter and GX
adapter slot reservation consumes approximately 128 MB of
memory. Additional GX adapter slots can be reserved using the
Advanced System Management Interface (ASMI). New GX
adapter slot reservations take effect the next time the system is
powered on.
Customer
Concurrent GX
adapter add
The GX adapter cannot be added to a node that has been
deconfigured. The node must be repaired and reconfigured before
another GX adapter can be added. To verify that all nodes in the
system are configured, see Using the ASMI to view deconfigured
resources.
Customer
Service representative must
verify onsite
1
Hot-node add
When multiple hot adds are planned that include node and GX
adapter adds, the GX adapter(s) should be installed in the node
prior to the node being installed in the system.
Service representative
Hot-node add and
concurrent GX
adapter add
For multiple upgrades that include node or GX adapter adds, as
well as I/O drawer adds, the node or GX adapter add must be
completed first. The I/O drawer can then be added as a separate
concurrent I/O drawer add.
Service representative
Hot-node repair
and hot GX
adapter repair
Ensure that the replacement parts are the same type as the parts
that are being removed.
Service representative
Hot-node repair
and hot GX
adapter repair
Replace only the existing parts. New hardware must not be added
during a Hot-node repair or a hot GX adapter repair.
Service representative
Hot-node repair
Ensure that existing node hardware is not moved during the
Hot-node repair.
Service representative
Planning for CEC hot-node add and hot-node repair maintenance 5
Table 1. CEC hot-node add and hot-node repair maintenance operations checklists for models 9117-MMB,
9117-MMC, 9179-MHB, and 9179-MHC (continued)
Operations Models 9117-MMB, 9117-MMC, 9179-MHB, and 9179-MHC tasks Responsibility
1
Hardware in a node can fail at any time. When this happens a node can become unconfigured during system IPL
to isolate the failure. The failure and unconfigured event can occur after the hardware upgrade package has arrived
at the customer site and the service representative has been dispatched. The service representative must ask the
customer if the system has any outstanding deferred repairs.
Note: The hot-node add function adds and enables the I/O adapters only in the new node.
CEC hot-node add and hot-node repair maintenance operations
checklists for model 9119-FHB
The planning checklists contain a list of tasks that must be completed before you perform CEC hot-node
add and hot-node repair maintenance operations on an IBM Power 795 (9119-FHB) system. It also lists
which tasks are associated with each type of operation.
For guidelines that apply to all CEC adapter hot-add and hot-repair maintenance operations, see
“Prerequisites for CEC hot-add and hot-repair” on page 1. These general guidelines must be followed as
well as the items in the following checklist.
Table 2. CEC hot-node add and hot-node repair maintenance operations checklists for model 9119-FHB
Operations Model 9119-FHB tasks Responsibility
Hot-node add,
hot-node upgrade
(memory), and
hot-node repair
Engineering change announcement (ECA) 256 must be installed
on the system before starting a CEC hot-node add and hot-node
repair maintenance operation. This ECA ensures that the system
controllers are at the minimum hardware EC level that is required
to perform one of these operations on a model 9119-FHB. Contact
your authorized service provider for more information on ECA
256.
Customer
Service representative must
verify
Hot-node add,
hot-node upgrade
(memory),
hot-node repair,
concurrent GX
adapter add, and
hot GX adapter
repair
If the system is managed by a Hardware Management Console
(HMC), verify that the HMC level is the same as, or later than,
the level shown in “Model 9119-FHB system firmware and HMC
levels” on page 12.
For information about disconnecting an HMC from a managed
system, see Removing HMC connection data.
Verify that the system firmware level is the same as, or later than,
the level shown in “Model 9119-FHB system firmware and HMC
levels” on page 12.
For information about determining firmware levels using the
Hardware Management Console (HMC), see Using the HMC to
view the existing firmware levels.
Customer
Service representative must
verify
Hot-node upgrade
(memory),
hot-node repair,
and hot GX
adapter repair
If the system has one or more partitions running IBM i, PTF
MF52971 must be must be downloaded from the Fix Central
website (http://www-933.ibm.com/support/fixcentral) and
installed.
Customer
Service representative must
verify
Hot-node add,
hot-node upgrade
(memory), and
concurrent GX
adapter add
All serviceable hardware events must be repaired and closed
before starting a hot add or upgrade operation. This eliminates
the possibility that an existing hardware failure will cause the hot
add operation to fail. Prior to a hot-node add operation, the
existing nodes in the system must be in a functional state.
Service representative
6
Table 2. CEC hot-node add and hot-node repair maintenance operations checklists for model 9119-FHB (continued)
Operations Model 9119-FHB tasks Responsibility
Hot-node repair
and hot GX
adapter repair
Other hardware serviceable events outside of the CEC must be
repaired before starting a hot repair operation. This eliminates the
possibility of other hardware failures interfering with the hot
repair operation.
Service representative
Hot-node upgrade
(memory),
hot-node repair,
and hot GX
adapter repair
v The Prepare for Hot Repair or Upgrade (PHRU) utility must be
run by the system administrator to determine the processor,
memory and I/O resources that need to be freed.
v All resources identified by the PHRU utility must be freed up
by the system administrator prior to the start of hot upgrade or
repair procedure by the SSR.
Customer
Service representative must
verify
Hot-node upgrade
(memory),
hot-node repair,
and hot GX
adapter repair
If there is only one partition running on the system, the system
can be powered down if that partition must be shut down during
the CEC hot add or repair operation. To verify or change the
setting that controls this behavior, see System power off policy in
Controlling the system power using the ASMI at Managing the
Advanced System Management Interface.
Service representative
Hot-node add,
hot-node upgrade
(memory),
hot-node repair,
concurrent GX
adapter add, hot
GX adapter repair
Verify that service processor redundancy is enabled.
To enable service processor redundancy on the managed system
on an HMC, in the navigation area, select Serviceability > FSP
Failover > Setup. Failover status is displayed in the contents area
and can be enabled or disabled.
Note: The server must be power cycled for the enabling of service
processor redundancy to take effect. If power cycling is required,
the node can be added nonconcurrently (with power off).
For information about redundancy on the HMC, see FSP Failover.
Customer
Service representative must
verify
Hot-node add,
hot-node upgrade
(memory),
hot-node repair,
concurrent GX
adapter add, hot
GX adapter repair
If the system is managed by an HMC, verify the network
connections between the system service processors and the HMC
before the service representative arrives by using the service
processor status function on the HMC. For information about
testing the network connections on the HMC, see Testing the
connection between the HMC and the managed system.
Customer
Service representative must
verify
Hot-node add,
hot-node upgrade
(memory),
hot-node repair,
concurrent GX
adapter add, hot
GX adapter repair
Ensure that Electronic Service Agent (ESA) is enabled to call
home.
For additional details about the call home function on the HMC,
see Manage Systems Call-Home.
Customer
Service representative must
verify
Hot-node add,
hot-node upgrade
(memory),
hot-node repair,
and hot GX
adapter repair
For 9119-FHB systems, you should quiesce the server before the
hot-node operations. This means that all critical applications must
be halted or moved to another server using Live Partition
Mobility (LPM) before the operation begins. Noncritical
applications can be left running. The partitions can be left running
at the operating system command prompt.For additional
information on a system managed by an HMC, see Mobility in
the Systems management for partitions topic in Managing the
Hardware Management Console.
Customer
Hot-node upgrade
(memory)
Verify that more than one node is present in the system. Hot-node
upgrade (memory) cannot be done if the system has only one
node.
Customer
Service representative must
verify
Planning for CEC hot-node add and hot-node repair maintenance 7
Table 2. CEC hot-node add and hot-node repair maintenance operations checklists for model 9119-FHB (continued)
Operations Model 9119-FHB tasks Responsibility
Hot-node add and
hot-node upgrade
(memory)
Obtain and apply Capacity Upgrade on Demand (CUoD)
activation codes prior to the operation so that the memory
resources are available immediately after the hot add operation.
For information about CUoD activation codes, see Capacity
Upgrade on Demand activation codes.
Customer
Hot-node upgrade
(memory)
If the customer wants to dynamically add the new processors and
memory resources to activated partitions after the hot-node add,
the profiles that were used to activate the partitions must have the
maximums for memory set to the appropriate values. When the
hot-node add is complete, the processor and memory resources
can be added to the partitions using dynamic logical partitions
(DLPAR). For information about DLPAR, see Dynamic logical
partitioning.
Customer
Hot-node upgrade
(memory),
hot-node repair,
and hot GX
adapter repair
Features and capabilities that do not support resource relocation,
which is required for a hot-node upgrade (memory), hot-node
repair, and GX adapter repair:
v Systems clustered using RIO-SAN technology. This technology
is used only by IBM i customers with clustered systems using
switchable towers and virtual OptiConnect technologies.
v Systems clustered using InfiniBand technology. This capability
is typically used by high performance computing customers
using an InfiniBand switch.
v I/O Processors (IOPs) used by IBM i partitions do not support
node evacuation. Any IBM i partition that has an IOP assigned
must either have the IOP powered off, or the partition must be
powered off.
v Memory pages of 16 GB, also known as huge pages, do not
support memory relocation. Partitions with 16 GB pages must
be powered off to allow CEC hot-node add and hot-node repair
maintenance operations
Customer
Hot-node add and
concurrent GX
adapter add
System firmware enforces the node and GX adapter plugging
order. Only the next GX adapter slot or node position based on
the plugging order is available
Customer
Service representative must
verify
Concurrent GX
adapter add
Ensure that there is a GX adapter slot reservation for each GX
adapter that is being added. Each installed GX adapter and GX
adapter slot reservation consumes approximately 128 MB of
memory. Additional GX adapter slots can be reserved using the
Advanced System Management Interface (ASMI). New GX
adapter slot reservations take effect the next time the system is
powered on.
Customer
Hot-node add and
concurrent GX
adapter add
When multiple hot adds are planned that include node and GX
adapter adds, the GX adapters should be installed in the node
prior to the node being installed in the system.
Service representative
Hot-node add and
concurrent GX
adapter add
For multiple upgrades that include node or GX adapter adds, as
well as I/O drawer adds, the node or GX adapter add must be
completed first. The I/O drawer can then be added as a separate
concurrent I/O drawer add.
Service representative
8
  • Page 1 1
  • Page 2 2
  • Page 3 3
  • Page 4 4
  • Page 5 5
  • Page 6 6
  • Page 7 7
  • Page 8 8
  • Page 9 9
  • Page 10 10
  • Page 11 11
  • Page 12 12
  • Page 13 13
  • Page 14 14
  • Page 15 15
  • Page 16 16
  • Page 17 17
  • Page 18 18
  • Page 19 19
  • Page 20 20
  • Page 21 21
  • Page 22 22
  • Page 23 23
  • Page 24 24
  • Page 25 25
  • Page 26 26
  • Page 27 27
  • Page 28 28
  • Page 29 29
  • Page 30 30
  • Page 31 31
  • Page 32 32
  • Page 33 33
  • Page 34 34
  • Page 35 35
  • Page 36 36
  • Page 37 37
  • Page 38 38
  • Page 39 39
  • Page 40 40
  • Page 41 41
  • Page 42 42
  • Page 43 43
  • Page 44 44
  • Page 45 45
  • Page 46 46
  • Page 47 47
  • Page 48 48
  • Page 49 49
  • Page 50 50
  • Page 51 51
  • Page 52 52
  • Page 53 53
  • Page 54 54
  • Page 55 55
  • Page 56 56
  • Page 57 57
  • Page 58 58
  • Page 59 59
  • Page 60 60
  • Page 61 61
  • Page 62 62
  • Page 63 63
  • Page 64 64
  • Page 65 65
  • Page 66 66
  • Page 67 67
  • Page 68 68

Bull CEC hot add and hot-node repair maintenance Planning Guide

Type
Planning Guide

Ask a question and I''ll find the answer in the document

Finding information in a document is now easier with AI