Huawei FusionServer Pro G5500 Routine Maintenance

www.huawei.com
Copyright © Huawei Technologies Co., Ltd. 2020
FusionServer Pro G5500 Server Routine Maintenance
About This Document
This document describes the routine maintenance and
troubleshooting of the FusionServer Pro G5500 server.
Objectives
Upon completion of this course, you will understand:
How to implement routine inspection and maintenance on the FusionServer Pro G5500 server
Server fault diagnosis methods
Server log collection methods
Server troubleshooting methods
Process and precautions for server component replacement
How to obtain help to solve common problems
Contents
1. Server Routine Maintenance
1.1 Maintenance Preparations
1.2 Routine Inspection
2. Server Troubleshooting
1.1 Maintenance Preparations-Hardware Tools
The following table lists the hardware tools required for routine
maintenance of the server. (Prepare them on demand in advance.)
Name | Description
Floating nut hook | Used to guide floating nuts to the holes in the mounting bars of a rack.
Screwdriver | Used to tighten and loosen screws. A screwdriver can be a flat-head, Phillips, or hex screwdriver.
Diagonal pliers | Used to trim insulation tubes and cable ties.
Multimeter | Used to measure resistance and voltage and to check conductivity.
ESD wrist strap | Used to prevent ESD damage when you touch or operate devices or components.
ESD gloves | Used to prevent ESD damage when you insert, remove, and hold a board or hold a precision instrument.
Cable tie | Used to bind cables.
Ladder | Used to install devices at heights.
Portable computer | Used to access the management network port or service network port over the network to capture data. (Prepare a network cable yourself to connect the portable computer to a server.)
Serial cable | Used to connect the serial port on the server. The serial port is usually a DB9 or RJ45 port.
Thermometer and hygrometer | Used to measure the equipment room temperature and relative humidity.
1.1 Maintenance Preparations-Software Tools
The following table lists the software tools required for routine
maintenance of the server. (Prepare them on demand in advance.)
Name | Description
SSH client | Used to access the Linux system and transfer files through the CLI of the Windows client. (The SSH client is an open source tool.)
uTest tool | Used to check drives, DIMMs, SSD cards, and BBUs, and to run the burn-in factory test of the server.
Inspection tool | Used for remote batch inspection and out-of-band log collection of the server.
Fusion upgrade tool | Used to upgrade the intelligent baseboard management controller (iBMC) and BIOS firmware of the server and to configure the BIOS in batches.
Decompression software | Used to compress and decompress files. Prepare the third-party decompression software yourself.
Office software | Used to edit Word and Excel documents. Prepare the third-party Office software yourself.
bmc_collect.sh, mm_collect.sh | Used to collect out-of-band logs of the server. Contact Technical Assistance Center (TAC) engineers to obtain these two files.
Collection.sh, WinInfoCollection.bat | Used to collect Linux/Windows logs. Contact TAC engineers to obtain these two files.
1.1 Maintenance Preparations-Essential Materials
The following table lists the materials that you must read before routine
maintenance of the server.
Name | Description | Obtaining Documents
User Guide | Describes the server structure, specifications, installation, removal, configuration, parts replacement, and standards compliance. Each Huawei server has a user guide. | Visit http://support.huawei.com/enterprise/en/index.html, choose Support > Product Support > IT > Server, and go to the corresponding server directory.
Maintenance Guide | Describes the server structure, specifications, installation, removal, configuration, parts replacement, and standards compliance. Each Huawei server has a maintenance guide. | Same as above.
Alarm Reference | Describes the common alarms reported to the server iBMC or Hyper Management Module (HMM) and alarm handling suggestions. Each Huawei server has an alarm reference. | Same as above.
Equipment Room Management Regulations | Describes the regulations for equipment room management. | Comply with the customer's equipment room management regulations during onsite maintenance.
Contents
1. Server Routine Maintenance
1.1 Maintenance Preparations
1.2 Routine Inspection
2. Server Troubleshooting
2.1 Troubleshooting Flowchart
2.2 Fault Information Collection Methods
2.3 Fault Diagnosis and Locating
2.4 Parts Replacement Process
2.5 Typical Cases
2.6 Help-seeking Channels
1.2 Routine Inspection-Purposes
The purposes of routine maintenance and inspection are as follows:
Find and remove potential defects or hazards during device operation in a timely manner. Take measures to rectify faults to ensure normal device operation and reduce the device fault rate.
Monitor the running status and trends of devices and networks in real time, which helps maintenance personnel handle emergencies more efficiently.
Periodically maintain devices to ensure that devices run properly and that the system runs safely, stably, and reliably.
Periodically check, test, and clean devices and back up data. This helps you find defects (such as natural aging, loss of function, and performance deterioration) that occur during device operation. Take measures to handle the defects to rectify potential faults and avoid accidents.
1.2 Routine Inspection-Basic Rules
Give each device a unique name.
Keep a log that records troubleshooting activities.
Make one change at a time, and record the result of each change.
Use the tools, resources, and software provided by Huawei.
Keep track of the latest updates to the operating system (OS) and applications.
Make a reliable backup plan.
Prepare spare parts onsite. When a component becomes faulty, replace it with a new one in a timely manner.
Keep the network topology up to date, which helps rectify network faults.
1.2 Routine Inspection-Onsite Inspection
Onsite inspection includes inspection of the equipment room environment
and device running status.
Icon | Description
[icon] | Indicates a hazard. To prevent an electric shock, do not remove the cover of the component. Warning: All components with this icon have electric shock risks and there are no serviceable parts inside these components.
[icon] | Indicates a hazard. Operation of the component may cause an electric shock. There are no serviceable parts inside the component, so do not remove its cover. Warning: To prevent an electric shock, do not remove the cover of the component.
[icon] | Indicates high temperatures. Warning: Do not touch the component before it cools down. Otherwise, you may get burnt.
[icon] | Indicates a hazard. Misoperations may damage the device or cause personal injury.
[icon] | Indicates the external grounding of the device. Both ends of a ground cable are connected to different devices, and the devices must be grounded by connecting to ground points. This ensures proper device operation and personal safety.
[icon] | Indicates the grounding inside the device. Both ends of a ground cable are connected to different components of a device, and the device must be grounded by connecting to ground points. This ensures proper device operation and personal safety.
[icon] | Indicates an ESD-sensitive area. Do not touch the device with bare hands. When you operate a device in an ESD-sensitive area, take ESD measures, such as wearing an ESD wrist strap or ESD gloves.
1.2 Routine Inspection-Onsite Inspection
Equipment room environment inspection covers temperature, humidity,
and power supply.
No. | Item | Result | Reference
1 | Operating temperature | | 10°C to 35°C (41°F to 95°F)
2 | Storage temperature | | -40°C to +65°C (-40°F to +149°F)
3 | Maximum fluctuation rate | | 15°C/h (59°F/h)
4 | Operating humidity | | 8% to 90% RH (non-condensing)
5 | Storage humidity | | 5% to 95% RH (non-condensing)
6 | Operating altitude | | ≤ 3000 m (9842 ft.)
7 | PSUs | | AC input: 100 V to 240 V AC at 50 Hz or 60 Hz. DC input: 48 V DC (nominal voltage), 38.4 V to 57.6 V DC (voltage range)
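The comparison of measured values against the reference column can be sketched in a few lines of Python. This is only an illustration: the dictionary keys and the function name `check_environment` are hypothetical, and the limits are simply the operating values copied from the table above.

```python
# Reference ranges copied from the table above as (low, high), inclusive.
# The key names are hypothetical labels chosen for this sketch.
OPERATING_LIMITS = {
    "temperature_c": (10.0, 35.0),          # operating temperature in °C
    "humidity_rh": (8.0, 90.0),             # operating humidity in % RH
    "altitude_m": (float("-inf"), 3000.0),  # operating altitude in metres
}


def check_environment(readings):
    """Return 'Normal' or 'Abnormal' for every reading that has a known limit."""
    results = {}
    for item, value in readings.items():
        if item not in OPERATING_LIMITS:
            continue  # the table gives no reference value for this item
        low, high = OPERATING_LIMITS[item]
        results[item] = "Normal" if low <= value <= high else "Abnormal"
    return results


if __name__ == "__main__":
    # Hypothetical readings taken with a thermometer and hygrometer.
    print(check_environment(
        {"temperature_c": 24.5, "humidity_rh": 93.0, "altitude_m": 120.0}
    ))
```

The result strings match the Normal/Abnormal wording used in the inspection report, so the output can be copied into the Result column directly.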
1.2 Routine Inspection-Onsite Inspection
Refer to the following table for cable layout inspection. Obtain customer
authorization before removing or connecting cables.
No. | Check Item | Check Result | Remarks
1 | General cable layout | |
2 | Power cable layout | |
3 | Service cable layout | |
4 | Ground cable connection | |
5 | Cable labels | |
6 | Power cable connector | | are securely connected to power sockets.
7 | Signal cable connector | |
1.2 Routine Inspection-Onsite Inspection
Refer to the following table for checking the server running status.
No. | Item | Remarks
1 | Server indicator inspection | The front and rear panels of Huawei servers provide multiple indicators and buttons such as the UID button/indicator, health indicator, network port status indicator, and power button/indicator. You can observe the indicators on a server to determine the server status. For a description of the indicator states, see the server product documentation.
2 | iBMC health inspection | Use the onsite management network for preventive maintenance inspection (PMI) or connect the portable computer to the iBMC network port. Log in to the iBMC WebUI and query the health status. For details about the alarms, see iBMC Alarm Reference.
3 | HMM health inspection | Run the ipmcget -d healthevents command on the CLI to obtain the health information about the HMM. For details about the alarms, see HMM Alarm Reference. Most alarms are converged to the iBMC.
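If the management module CLI is reachable over SSH and accepts a command passed on the SSH command line (both are assumptions, not statements from this course), the health query could be scripted roughly as follows. Only the `ipmcget -d healthevents` command comes from the table above. The helper names, the host, and the user are hypothetical, and the output is returned unparsed because its format is not described here.

```python
import subprocess
from typing import List

# Command quoted in the table above.
HEALTH_COMMAND = "ipmcget -d healthevents"


def build_ssh_command(host: str, user: str) -> List[str]:
    """Build an OpenSSH client invocation that runs the health query remotely."""
    return ["ssh", f"{user}@{host}", HEALTH_COMMAND]


def collect_health_events(host: str, user: str, timeout_s: int = 30) -> str:
    """Run the query and return its raw text output.

    Assumption: non-interactive command execution over SSH is allowed.
    Raises subprocess.CalledProcessError if the remote command fails.
    """
    completed = subprocess.run(
        build_ssh_command(host, user),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return completed.stdout


if __name__ == "__main__":
    # Hypothetical address and account; replace with site-specific values.
    print(collect_health_events("192.0.2.10", "admin"))
```

Keeping the raw output and attaching it to the inspection report avoids guessing at the alarm format. Interpret the alarms with the HMM Alarm Reference.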
1.2 Routine Inspection-Onsite Inspection
Refer to the following table for preparing a report after onsite inspection.
XXX Server Preventive Maintenance Report
Inspected By/Contact Information:
Inspection Time:
Inspection Address:
Customer's Maintenance Personnel:
Onsite Coordinator:
Primary Fault Coordinator:
Huawei On-duty Site Engineer:
Maintenance Hotline:
  Enterprise China region: 4008229999
  Enterprise global TAC: http://e.huawei.com/en/service-hotline
  Carrier China region TAC: (customers) 400830218/800830218/02986360000; (engineers and partners) 8008303118/02981770177
  Carrier global TAC: 02981770999
Host SN/Server SN:
Device Location:
1.2 Routine Inspection-Onsite Inspection
Inspection Item | Subitem | Description | Inspection Result | Remarks
Health indicator on the front panel | Status of the system fault indicator | If the indicator is steady red or blinking red, the system is abnormal. If the indicator is green, the system is operating properly. | Normal / Abnormal |
Power button/indicator on the front panel | Status of the system power indicator | If the button or indicator is steady green, the system is operating properly. | Normal / Abnormal |
Drive indicator on the front panel | Status of the drive indicator | If the indicator is steady green or blinking green, the drive is operating properly. If the indicator is yellow or off, the drive is abnormal. | Normal / Abnormal |
Indicator on the rear panel | Status of the AC power indicator (on the PSU) | If the indicator is steady green, the system is running properly. If the indicator is off, the system is not powered on. | Normal / Abnormal |
Fan module | Fan operating status | If loud or abnormal noise is generated, the fans are not operating properly. Otherwise, the fans are operating properly. | Normal / Abnormal |
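The indicator rules in the table above amount to a small lookup table. The sketch below encodes them that way. The indicator labels (`health`, `power`, `drive`, `psu`) and the function name `classify` are hypothetical. Any state the table does not describe is reported as "Unknown" rather than guessed, so that the product documentation is consulted.

```python
NORMAL, ABNORMAL, UNKNOWN = "Normal", "Abnormal", "Unknown"

# (indicator, observed state) -> result, following the table above.
INDICATOR_RULES = {
    ("health", "steady red"): ABNORMAL,
    ("health", "blinking red"): ABNORMAL,
    ("health", "green"): NORMAL,
    ("power", "steady green"): NORMAL,
    ("drive", "steady green"): NORMAL,
    ("drive", "blinking green"): NORMAL,
    ("drive", "yellow"): ABNORMAL,
    ("drive", "off"): ABNORMAL,
    ("psu", "steady green"): NORMAL,
    # For an unlit PSU indicator the table only says the system is not
    # powered on, so that state is deliberately left unclassified.
}


def classify(indicator: str, state: str) -> str:
    """Map an observed indicator state to an inspection result."""
    key = (indicator.strip().lower(), state.strip().lower())
    return INDICATOR_RULES.get(key, UNKNOWN)


if __name__ == "__main__":
    for observed in [("health", "green"), ("drive", "Yellow"), ("psu", "off")]:
        print(observed, "->", classify(*observed))
```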
1.2 Routine Inspection-Onsite Inspection
Inspection Item | Subitem | Description | Inspection Result | Remarks
Network cable and other cables | Cable connection status | Check whether the network cables and optical cables are securely installed, and whether port status indicators are on. | Normal / Abnormal |
iBMC health information | Server health status and alarm logs on the iBMC WebUI | Check the health status logs of the server and whether alarms are generated for thermal management and power supply management. | Normal / Abnormal |
HMM health information | HMM health information query | Run the ipmcget -d healthevents command on the CLI to obtain the health information about the HMM. For details about the alarms, see HMM Alarm Reference. Most alarms are converged to the iBMC. | Normal / Abnormal |
Other | Other parts | For other hardware exceptions, contact the on-duty site engineers. | Normal / Abnormal |
Remarks | For details about the indicators and how to query health and alarm information on the iBMC and HMM WebUIs, see the server product documentation shipped with the product or obtained from http://support.huawei.com/enterprise/en/index.html.
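Once every row of the report has a Normal or Abnormal result, the abnormal items can be pulled out for follow-up. The sketch below shows one possible way to do this. The `InspectionRow` class and the `summarize` function are hypothetical and are not part of any Huawei tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InspectionRow:
    """One row of the preventive maintenance report."""
    item: str
    subitem: str
    result: str       # "Normal" or "Abnormal"
    remarks: str = ""


def summarize(rows: List[InspectionRow]) -> str:
    """Return a short text summary that lists every abnormal row."""
    abnormal = [row for row in rows if row.result == "Abnormal"]
    lines = [f"Checked {len(rows)} items, {len(abnormal)} abnormal."]
    for row in abnormal:
        note = f" ({row.remarks})" if row.remarks else ""
        lines.append(f"Abnormal: {row.item} / {row.subitem}{note}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical inspection results.
    report = [
        InspectionRow("Fan module", "Fan operating status", "Abnormal", "loud noise"),
        InspectionRow("iBMC health information", "Server health status", "Normal"),
    ]
    print(summarize(report))
```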
1.2 Routine Inspection-Remote Inspection
Remotely access the server out-of-band management software (iBMC or
HMM) over the customer network, and use the inspection tool to check
the server health status.
The inspection tool has the following features:
Supports the GUI and CLI.
Supports 32-bit and 64-bit OSs.
Inspects a single server or multiple servers in a batch.
Exports the inspection report.
Collects the server iBMC logs in batches.
1.2 Routine Inspection-Remote Inspection
Preparations for Using the Inspection Tool
1. Console Requirements
The console is the customer's PC or laptop on which the inspection tool runs. A Windows system, or a Linux system later than SUSE Linux Enterprise Server (SLES) 11 SP1, is recommended. (For details about the clients supported by the tool, see the user guide.)
Other auxiliary tools are also needed, such as the Excel component used to edit configuration files in batches, the SSH tool used to upload tools to the Linux system console, and a compression tool (such as WinRAR) used to decompress logs.
2. Configuration Information About Servers to Be Inspected
The BMC IP address, root user password, SNMP version, and port number of each server to be inspected.
For batch inspection, prepare a list of the servers to be inspected. The .xls, .xlsx, and .xml formats are supported. This mode also supports single-node server inspection.
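Before the server list is entered into the tool's configuration file, each entry can be sanity-checked. The following is a minimal sketch under stated assumptions: the field names (`bmc_ip`, `password`, `snmp_version`, `port`) and the accepted SNMP version strings are hypothetical. The actual file layout is defined by the inspection tool's user guide, not by this example.

```python
import ipaddress
from typing import Dict, List

# Assumption: the standard SNMP versions, written as v1, v2c, and v3.
ALLOWED_SNMP_VERSIONS = {"v1", "v2c", "v3"}


def validate_entry(entry: Dict[str, str]) -> List[str]:
    """Return the problems found in one server entry (empty list if none)."""
    problems = []
    try:
        ipaddress.ip_address(entry.get("bmc_ip", ""))
    except ValueError:
        problems.append("invalid BMC IP address")
    if not entry.get("password"):
        problems.append("missing password")
    if entry.get("snmp_version", "").lower() not in ALLOWED_SNMP_VERSIONS:
        problems.append("unsupported SNMP version")
    port = str(entry.get("port", ""))
    if not (port.isdecimal() and 1 <= int(port) <= 65535):
        problems.append("port out of range")
    return problems


if __name__ == "__main__":
    # Hypothetical entry using a documentation-reserved IP address.
    sample = {"bmc_ip": "192.0.2.10", "password": "example",
              "snmp_version": "v3", "port": "161"}
    print(validate_entry(sample) or "entry looks consistent")
```

Running the check over every row before a batch inspection catches typing errors early, which is cheaper than diagnosing a failed connection from the tool's report.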
1.2 Routine Inspection-Remote Inspection
Log in to the iBMC through the remote management port of the server and perform the inspection remotely.
Inspecting devices using the WebUI or CLI