H3C S6890 Series Troubleshooting Manual

H3C S6890 Switch Series

Troubleshooting Guide

Document version: 6W100-20190725

No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of New H3C Technologies Co., Ltd.

Except for the trademarks of New H3C Technologies Co., Ltd., any trademarks that may be mentioned in this document are the property of their respective owners.

The information in this document is subject to change without notice.

Contents

Introduction
  General guidelines
  Collecting log and operating information
    Collecting common log messages
    Collecting diagnostic log messages
    Collecting operating statistics
  Contacting technical support
Removing deployment errors
Troubleshooting hardware
  Switch reboot failure
    Symptom
    Troubleshooting flowchart
    Solution
  Operating power module failure
    Symptom
    Solution
  Newly installed power module failure
    Symptom
    Solution
  Fan tray failure
    Symptom
    Solution
  Related commands
Troubleshooting system management
  High CPU utilization
    Symptom
    Troubleshooting flowchart
    Solution
  High memory utilization
    Symptom
    Troubleshooting flowchart
    Solution
  Temperature alarms
    Symptom
    Troubleshooting flowchart
    Solution
  Related commands
Troubleshooting ports
  10-Gigabit SFP+ fiber port fails to come up
    Symptom
    Troubleshooting flowchart
    Solution
  100-GE QSFP28 fiber port fails to come up
    Symptom
    Troubleshooting flowchart
    Solution
  Non-H3C transceiver module error message
    Symptom
    Troubleshooting flowchart
    Solution
  Transceiver module does not support digital diagnosis
    Symptom
    Troubleshooting flowchart
    Solution
  Error frames (for example, CRC errors) on a port
    Symptom
    Troubleshooting flowchart
    Solution
  Failure to receive packets
    Symptom
    Troubleshooting flowchart
    Solution
  Failure to send packets
    Symptom
    Troubleshooting flowchart
    Solution
  Incorrect port information
    Symptom
    Troubleshooting flowchart
    Solution
  A port fails to come up
    Symptom
    Troubleshooting flowchart
    Solution
  A port flaps
    Symptom
    Troubleshooting flowchart
    Solution
  CRC error packets on a port
    Symptom
    Troubleshooting flowchart
    Solution
  Related commands
Troubleshooting IRF
  IRF fabric setup failure
    Symptom
    Troubleshooting flowchart
    Solution
  IRF split
    Symptom
    Troubleshooting flowchart
    Solution
  BFD MAD failure
    Symptom
    Troubleshooting flowchart
    Solution
  LACP MAD failure
    Symptom
    Troubleshooting flowchart
    Solution
  Related commands
Troubleshooting QoS and ACL
  ACL application failure for unsupported ACL rules or insufficient resources
    Symptom
    Troubleshooting flowchart
    Solution
  ACL application failure without an error message
    Symptom
    Troubleshooting flowchart
    Solution
  Related commands

Introduction

This document provides information about troubleshooting common software and hardware issues with the S6890 Switch Series.

This document is not restricted to specific software or hardware versions.

General guidelines

IMPORTANT:

To prevent an issue from causing loss of configuration, save the configuration each time you finish configuring a feature. For configuration recovery, regularly back up the configuration to a remote server.

When you troubleshoot S6890 switches, follow these general guidelines:

• To help identify the cause of the issue, collect system and configuration information, including:
  ○ Symptom, time of failure, and configuration.
  ○ Network topology information, including the network diagram, port connections, and points of failure.
  ○ Log messages and diagnostic information. For more information about collecting this information, see "Collecting log and operating information."
  ○ Physical evidence of failure:
    − Photos of the hardware.
    − Status of the LEDs.
  ○ Steps you have taken, such as reconfiguration, cable swapping, and reboot.
  ○ Output from the commands executed during the troubleshooting process.

• To ensure safety, wear an ESD wrist strap when you replace or maintain a hardware component.

Collecting log and operating information

IMPORTANT:

By default, the information center is enabled. If the feature is disabled, you must use the info-center enable command to enable the feature for collecting log messages.

Table 1 shows the types of files that the system uses to store operating log and status information.

You can export these files by using FTP, TFTP, or USB.

In an IRF system, these files are stored on the master device. Multiple MPUs will have log files if

master/subordinate switchovers have occurred. You must collect log files from all these devices. To

more easily locate log information, use a consistent rule to categorize and name files. For example,

save log files to a separate folder for each member device, and include their slot numbers in the

folder names.
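The naming rule described above can be sketched as follows. This is a minimal illustration in Python, not an H3C tool. The function name member_log_dir, the logs root folder, and the <root>/<date>/slot<N> layout are assumptions chosen for the example.

```python
from pathlib import PurePosixPath


def member_log_dir(root: str, slot: int, date: str) -> PurePosixPath:
    """Return a per-member folder path that embeds the slot number.

    The layout <root>/<date>/slot<N> is a hypothetical convention for
    the collection host. The switch does not create it by itself.
    """
    if slot < 1:
        raise ValueError("slot number must be positive")
    return PurePosixPath(root) / date / f"slot{slot}"


# Example: folders for a two-member IRF fabric.
for slot in (1, 2):
    print(member_log_dir("logs", slot, "2019-07-06"))
```

Keeping the slot number in the folder name makes it possible to tell which member device a log file came from after the files have left the switch.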

Table 1 Log and operating information

Category: Common log
File name format: logfile.log
Content: Command execution and operational log messages.

Category: Diagnostic log
File name format: diagfile.log
Content: Diagnostic log messages about device operation, including the following items:
• Parameter settings in effect when an error occurs.
• Information about a device startup error.
• Handshaking information between member devices when a communication error occurs.

Category: Operating statistics
File name format: file-basename.gz
Content: Current operating statistics for feature modules, including the following items:
• Device status.
• CPU status.
• Memory status.
• Configuration status.
• Software entries.
• Hardware entries.

Collecting common log messages

1. Save common log messages from the log buffer to a log file:

By default, the log file is saved in the logfile directory of the flash memory on each member

device.

<Sysname> logfile save

The contents in the log file buffer have been saved to the file

flash:/logfile/logfile.log

2. Identify the log file on each member device:

# Display the log file on the master device.

<Sysname> dir flash:/logfile/

Directory of flash:/logfile

0 -rw- 21863 Jul 11 2013 16:00:37 logfile.log

1048576 KB total (38812 KB free)

# Display the log file on each subordinate device:

<Sysname> dir slot2#flash:/logfile/

Directory of flash:/logfile

0 -rw- 21863 Jul 11 2013 16:00:37 logfile.log

1048576 KB total (38812 KB free)

3. Transfer the files to the desired destination by using FTP, TFTP, or USB. (Details not shown.)
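If you capture the dir output from each member device, you can check the file names and sizes offline before and after the transfer. The following Python sketch parses listing lines in the format shown in step 2. The regular expression and the function name parse_dir_output are assumptions based only on that sample output. Other software versions might print the listing differently.

```python
import re
from datetime import datetime

# Matches lines such as:
#   0 -rw- 21863 Jul 11 2013 16:00:37 logfile.log
DIR_ENTRY = re.compile(
    r"^\s*\d+\s+\S+\s+(?P<size>\d+)\s+"
    r"(?P<stamp>\w{3}\s+\d{1,2}\s+\d{4}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<name>\S+)\s*$"
)


def parse_dir_output(text):
    """Return (name, size_in_bytes, timestamp) for each file entry."""
    entries = []
    for line in text.splitlines():
        match = DIR_ENTRY.match(line)
        if match is None:
            continue  # Skip the "Directory of" and "KB total" lines.
        stamp = datetime.strptime(
            " ".join(match["stamp"].split()), "%b %d %Y %H:%M:%S"
        )
        entries.append((match["name"], int(match["size"]), stamp))
    return entries


sample = """Directory of flash:/logfile
0 -rw- 21863 Jul 11 2013 16:00:37 logfile.log
1048576 KB total (38812 KB free)
"""
print(parse_dir_output(sample))
```

Comparing the parsed size with the size of the transferred copy is one simple way to confirm that a log file was exported completely.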

Collecting diagnostic log messages

1. Save diagnostic log messages from the diagnostic log file buffer to a diagnostic log file:

By default, the diagnostic log file is saved in the diagfile directory of the flash memory on each

member device.

<Sysname> diagnostic-logfile save

The contents in the diagnostic log file buffer have been saved to the file

flash:/diagfile/diagfile.log

2. Identify the log file on each member device:

# Display the log file on the master device.

<Sysname> dir flash:/diagfile/

Directory of flash:/diagfile

0 -rw- 161321 Jul 11 2013 16:16:00 diagfile.log

1048576 KB total (38812 KB free)

# Display the log file on each subordinate device:

<Sysname> dir slot2#flash:/diagfile/

Directory of flash:/diagfile

0 -rw- 161321 Jul 11 2013 16:16:00 diagfile.log

1048576 KB total (38812 KB free)

3. Transfer the files to the desired destination by using FTP, TFTP, or USB. (Details not shown.)
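In an IRF fabric with several member devices, step 2 must be repeated for each subordinate device. The following Python sketch only builds the command strings, using the dir flash:/ and dir slotN#flash:/ forms shown above. The function name dir_commands is hypothetical, and slot 3 in the example is an assumed member that does not appear in this document.

```python
def dir_commands(directory, subordinate_slots):
    """Build the dir commands for the master and each subordinate device.

    directory is the folder name under flash:, for example "diagfile".
    subordinate_slots lists the slot numbers of the subordinate devices.
    """
    commands = [f"dir flash:/{directory}/"]
    commands += [
        f"dir slot{slot}#flash:/{directory}/" for slot in subordinate_slots
    ]
    return commands


# Example: a fabric with subordinate devices in slots 2 and 3 (assumed).
for command in dir_commands("diagfile", [2, 3]):
    print(command)
```

The same helper works for the common log directory by passing "logfile" instead of "diagfile".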

Collecting operating statistics

You can collect operating statistics by saving the statistics to a file or displaying the statistics on the

screen.

When you collect operating statistics, follow these guidelines:

• Log in to the device through a network or management port instead of the console port, if possible. Network and management ports are faster than the console port.

• Do not execute commands while operating statistics are being collected.

• As a best practice, save operating statistics to a file to retain the information.

To collect operating statistics:

1. Disable pausing between screens of output if you want to display operating statistics on the

screen. Skip this step if you are saving statistics to a file.

<Sysname> screen-length disable

2. Collect operating statistics for multiple feature modules.

<Sysname> display diagnostic-information

Save or display diagnostic information (Y=save, N=display)? [Y/N] :

3. At the prompt, choose to save or display operating statistics:

# To save operating statistics, enter y at the prompt and then specify the destination file path.

Save or display diagnostic information (Y=save, N=display)? [Y/N] : Y

Please input the file name(*.tar.gz)[flash:/diag.tar.gz] :

Diagnostic information is outputting to flash:/diag.tar.gz.

Please wait...

Save successfully.

<Sysname> dir flash:/

Directory of flash:

…

6 -rw- 898180 Jun 26 2013 09:23:51 diag.tar.gz

1021808 KB total (259072 KB free)

# To display operating statistics on the monitor terminal, enter n at the prompt. The output from

this command varies by software version.

Save or display diagnostic information (Y=save, N=display)? [Y/N] :n

===============================================

===============display clock===============

00:08:02.487 UTC Sat 07/06/2019

=================================================

===============display version===============

H3C Comware Software, Version 7.1.070, Release 2712

H3C S6890-54HF uptime is 0 weeks, 0 days, 0 hours, 8 minutes

Last reboot reason : User reboot

Boot image: flash:/S6890-CMW710-BOOT-R2712.bin

Boot image version: 7.1.070P2214, Release 2712

Compiled Jul 18 2018 14:00:00

System image: flash:/S6890-CMW710-SYSTEM-R2712.bin

System image version: 7.1.070, Release 2712

Compiled Jul 18 2018 14:00:00

……
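When the statistics are displayed on the screen and captured to a text file, the banner lines (for example, ===============display clock===============) mark where the output of each command starts. The following Python sketch splits captured text into one section per command. It assumes the banner format shown in the sample output above, which might differ in other software versions. The function name split_sections is hypothetical.

```python
import re

# A banner is a command name surrounded by runs of "=" characters.
BANNER = re.compile(r"^=+(?P<cmd>[^=]+)=+$")


def split_sections(text):
    """Map each command name to the list of output lines that follow it."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"="}:
            continue  # Skip blank lines and pure separator lines.
        match = BANNER.match(line)
        if match:
            current = match["cmd"].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


sample = """===============================================
===============display clock===============
00:08:02.487 UTC Sat 07/06/2019
=================================================
===============display version===============
H3C Comware Software, Version 7.1.070, Release 2712
"""
for command, lines in split_sections(sample).items():
    print(command, "->", lines)
```

Splitting the capture this way makes it easier to send only the relevant sections to technical support or to compare two captures command by command.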

Contacting technical support

If you cannot resolve an issue after using the troubleshooting procedures in this document, contact

H3C Support. When you contact an authorized H3C support representative, be prepared to provide

the following information:

• Information described in "General guidelines."

• Product serial numbers.

This information will help the support engineer assist you as quickly as possible.

The following is the contact information for H3C Support:

•

Telephone number—400-810-0504.

•

E-mail—servi[email protected]m.

Removing deployment errors

Use the deployment checklist in Table 2 to eliminate issues that might be introduced at the

deployment stage. Select items that are suitable for your site.

Table 2 Deployment checklist

Question

Command or method

Result

Remarks

Environment and device

hardware status

Is the sensor temperature

betwee

n the

low-

temperature and

high-temperature warning

thresholds?

display environment

□OK

□Not OK

□Not related

Make sure the temperature

of each sensor is between

the low-

temperature and

high-temperature warning

thresholds.

Are the fan trays operating

correctly?

display fan

□OK

□Not OK

□Not related

Make sure the fan trays are

operating correctly.

Are sufficient power

modules installed

and are

they operating correctly?

display power

□OK

□Not OK

□Not related

Make sure the following

conditions are met:

•

You have installed

sufficient power

modules

to provide

power redundancy.

Question

Command or method

Result

Remarks

• The power modules

are operating

correctly. The

display power

command shows that

their state is Normal.

Are the LEDs all displaying

correct statuses?

Visually check the status of

LEDs on each device.

□OK

□Not OK

□Not related

The LED shows the status

of the device:

• Steady green—The

switch is operating

correctly.

• Steady red—The

switch has

failed to

pass POST

or has

problems such as fan

failure.

• Off—

The switch is

powered off

or has

failed to start up.

CPU and memory usage

Does the CPU usage

change rate exceed 10%?

Does the sustained CPU

usage exceed 60%?

display cpu-usage

□OK

□Not OK

□Not related

Execute the

display

cpu-usage

command

repeatedly.

If the CPU sustains a

usage level of over 60% or

has a change rate higher

than 10%, e

xecute the

debugging ip packet

command

to view the

packets delivered to the

CPU for analysis.

Does the memory usage

exceed 60%?

display memory

□OK

□Not OK

□Not related

If memory usage exceeds

60%, execute the

display

memory

command to

identify the module that is

using the most memory.

Ports

Is half duplex used in port

negotiation?

display interface

brief

□OK

□Not OK

□Not related

If the duplex mode of a port

is half, verify that the peer

port uses the same duplex

mode.

Is flow control

unnecessarily

enabled on

ports?

Verify the port settings.

□OK

□Not OK

□Not related

Disable flow control on the

ports.

Are large numbers of error

packets generated

continuously

in the

outbound or inbound

direction of the port?

display interface

□OK

□Not OK

□Not related

If an error counter displays

a non-zero value and the

value is increasing, check

for the following errors:

• L

ink and

optical-electrical

converter errors.

•

Port setting

inconsistencies with

the peer port.

Question

Command or method

Result

Remarks

Does the port change

between an up and down

state frequently?

display logbuffer

□OK

□Not OK

□Not related

If the port state flaps, check

for the following errors:

• L

ink and

optical-electrical

converter errors.

• O

ptical power

threshold crossing

events if the port is a

fiber port.

•

Port setting

inconsistencies with

the peer port.

Fiber ports

Do the ports at

the two

ends use the same port

settings?

display

current-configuration

interface

□OK

□Not OK

□Not related

When you connect an H3C

device to a device from

another vendor, set the

same port rate and duplex

mode settings at

the two

ends as a best practice.

Are CRC errors present on

any

fiber port? Is the

number of CRC errors

increasing?

display interface

□OK

□Not OK

□Not related

If CRC errors persist,

eplace the transceiver

module or pigtail fiber, or

lean the transceiver

module connector.

Trunk port configuration

Do the peer trunk ports use

the same PVID?

display

current-configuration

interface

□OK

□Not OK

□Not related

Make sure the same PVID

is configured on the trunk

ports between two devices.

Are the

peer ports

assigned to the same

VLANs?

display

current-configuration

interface

□OK

□Not OK

□Not related

Make sure the trunk ports

between two devices are

assigned to the same

VLANs.

For example, if you assign

a trunk port to all VLANs,

also assign its peer port to

all VLANs.

Are the peer ports

set to

the same link type?

display

current-configuration

interface

□OK

□Not OK

□Not related

Make sure the ports

between

two devices use

the same link type.

Is a loop present

in VLAN

loopback-detection

global enable vlan 1

□OK

□Not OK

□Not related

Remove ports from VLAN 1

as needed.

Spanning tree feature

Is the

timeout factor

correctly set?

display

current-configuration

□OK

□Not OK

□Not related

As a best practice, set a

timeout factor in the range

of 5 to 7

on a stable

network to avoid

unnecessary

recalculations.

Are ports

connected to

end-user devices

display

current-configuration

□OK

Verify that the output from

Question

Command or method

Result

Remarks

configured as edge ports?

interface

□Not OK

□Not related

the

display

current-configuratio

n interface

command

contains the "

stp

edged-port enable

" string

for ports connected to

end-user devices.

As a best practice,

configure ports connected

to end-user devices (PCs,

for example) as edge ports,

or disable

the spanning

tree feature on the ports.

Is the spanning tree

feature disa

bled on ports

connected to devices that

do not support spanning

tree protocols?

display

current-configuration

interface

□OK

□Not OK

□Not related

Disable

the spanning tree

feature on ports connected

to devices that do not

support

spanning tree

protocols.

Make sure the

output from the

display

current-configuratio

n interface

command

contains the "

undo stp

enable

" string for these

ports.

Is the device running

MSTP, STP, or RSTP, and

working with a Cisco

PVST+ device?

display stp

□OK

□Not OK

□Not related

As a best practice to avoid

interoperability issues, set

up a Layer 3 connection to

the Cisco device.

Do the topologies of MSTIs

meet the design?

Are there

as few

overlapping paths as

possible among MSTIs?

display

current-configuration

interface

□OK

□Not OK

□Not related

If the topologies deviate

from the design, reassign

ports to VLANs and revise

the

VLAN and instance

mappings.

For optimal load balancing,

lan VLANs and

VLAN-to-instance

mappings

to minimize

overlapping paths among

different MSTIs.

Does a TC a

ttack exist to

cause frequent STP status

changes on any ports?

display stp tc

display stp history

□OK

□Not OK

□Not related

Examine the following

items in the command

output for TC attacks:

•

Incoming and

outgoing TC/TCN

BPDU statistics.

• H

istorical port role

calculation

information.

There is a risk of TC attack

if frequent STP status

changes occur on a stable

network.

Make sure

you have

configured

the following

settings:

• Configure ports

connected to end-user

devices as edge ports,

Question

Command or method

Result

Remarks

and enable BPDU

guard. Alternatively,

disable

the spanning

tree feature

on the

ports.

• Disable the spanning

tree feature on ports

connected to devices

that do not support

spanning tree

protocols.

•

Do not disable

TC-BPDU guard.

VRRP

Is the handshake interval

correctly set?

Are the handshake

intervals of the two ends

the same?

display vrrp

□OK

□Not OK

□Not related

Change the handshake

interval to 3 seconds if the

number of VRRP groups is

less than five.

If five or more VRRP

groups exist, assign three

or five VRRP groups into

one group, and configure

the handshake interval as 3

seconds, 5 seconds, and 7

seconds for each group.

ARP

Are there ARP conflicts?

display logbuffer

□OK

□Not OK

□Not related

If the log contains ARP

conflict records, verify that

the hosts in conflict are

legitimate, and remove the

conflicts.

OSPF

Is the router ID of the

device unique on the

network?

display ospf peer

□OK

□Not OK

□Not related

Change the router ID if it is

not unique on the network.

To restart route learning

after you remove the router

ID conflict, you must

execute the

reset ospf

process

command.

Are there a lot of errors in

the output from the

display

ospf statistics error

command?

display ospf

statistics error

□OK

□Not OK

□Not related

If a large number of OSPF

errors has occurred and

the number

continues to

increase,

collect the error

information

for further

analysis.

Are there

severe route

flappings?

display ip

routing-table

statistics

□OK

□Not OK

□Not related

Examine the statistics for

added and

deleted routes

during the system uptime.

If route flapping occurs,

locate the flapping route

and the source device to

Question

Command or method

Result

Remarks

analyze the cause. You

can use the

display

ospf lsdb

command

multiple times to view the

age

of routes and locate

the flapping route.

Is the OSPF status stable?

display ospf peer

□OK

□Not OK

□Not related

View the up time of the

OSPF neighbor.

Routes

Question: Is the default route correct? Are there any routing loops?

Command or method: tracert; debug ip packet

Result: □OK □Not OK □Not related

Remarks: Use the tracert command to trace the path to a nonexistent network (1.1.1.1, for example) to check for routing loops. If a routing loop exists, check the configuration of the involved devices for errors. Adjust the route to remove the loop.

Use the debug ip packet command to check for packets with TTL 0 or 1. If TTL exceeded packets are received, check for network route errors.

CPU security

Question: Are there packet attacks on CPU?

Command or method: debug rxtx softcar show

Result: □OK □Not OK □Not related

Remarks: Execute the debug rxtx softcar show command in probe view to view packet rate limit information for cards. The CPU is under attack if the number of packets of a type keeps increasing unusually.

Records in the local log buffer

Question: Does the local log buffer contain exception records?

Command or method:

• In standalone mode: local logbuffer slot slot-number display

• In IRF mode: local logbuffer chassis chassis-number slot slot-number display

Result: □OK □Not OK □Not related

Remarks: Execute the local logbuffer display command in probe view. If the local log buffer contains exception records, contact H3C Support to troubleshoot the exceptions.

Use the following commands in probe view to clear the history records after the exceptions are removed:

• In standalone mode: local logbuffer slot slot-number clear

• In IRF mode: local logbuffer chassis chassis-number slot slot-number clear
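Several checklist items above ask whether a counter "keeps increasing" (for example, OSPF error statistics or per-type CPU packet counts). The comparison can be sketched as follows, assuming two snapshots of the counters have already been captured into dictionaries. The counter names, the values, and the `min_delta` threshold are hypothetical and are not taken from any command output.

```python
def rising_counters(before, after, min_delta=1):
    """Return {name: delta} for counters that grew by at least min_delta."""
    deltas = {}
    for name, new_value in after.items():
        # A counter missing from the first snapshot is treated as 0.
        old_value = before.get(name, 0)
        delta = new_value - old_value
        if delta >= min_delta:
            deltas[name] = delta
    return deltas


# Hypothetical snapshots taken a few minutes apart.
first = {"bad_checksum": 10, "hello_mismatch": 4, "bad_area_id": 0}
second = {"bad_checksum": 10, "hello_mismatch": 250, "bad_area_id": 1}

print(rising_counters(first, second, min_delta=100))
```

Taking more than two snapshots and checking that the same counter rises in each interval gives a more reliable indication than a single comparison.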

Troubleshooting hardware

This section provides troubleshooting information for common hardware issues.

NOTE:

This section describes how to troubleshoot switch reboot failure, power module failure, and fan tray failure. To troubleshoot transceiver modules, ports, and temperature alarms, see "Troubleshooting ports" and "Troubleshooting system management."

Switch reboot failure

Symptom

The switch fails to reboot.

Troubleshooting flowchart

Figure 1 Troubleshooting switch reboot failure

(Flowchart decision points: System software image correct? Memory runs correctly? Error continues to be reported? Resolved? Actions: Reload the system software image; Replace the switch; Contact the support.)

Solution

To resolve the issue:

1. Verify that the system software image on the switch is correct.

a. Log in to the switch through the console port and restart the switch. If the system reports

that a CRC error occurs or that no system software image is available during the BootWare

loading process, reload the system software image.

b. Verify that the system software image in the flash memory is the same size as the one on

the server. If no system software image is available in the flash memory, or if the image size

is different from the one on the server, reload the system software image. Then set the

reloaded system software image to the current system software image.

The system software image in the flash memory is automatically set to the current system

software image during the BootWare loading process.

2. Verify that the memory is running correctly.

Reboot the switch, and immediately press CTRL+T to examine the memory. If a memory fault is

detected, replace the switch.

3. Verify that no error is reported during the BootWare loading process.

If the memory is running correctly but there are still errors reported during the BootWare loading

process, replace the switch.

4. If the issue persists, contact H3C Support.
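The size check in step 1b can be sketched as a small helper. This is an illustration only: it assumes you have already read both sizes in bytes by whatever means is available, and the function name and the byte counts are hypothetical.

```python
from typing import Optional


def image_needs_reload(flash_size: Optional[int], server_size: int) -> bool:
    """Return True when step 1b calls for reloading the image.

    flash_size is None when no image was found in flash memory.
    """
    if flash_size is None:
        return True
    return flash_size != server_size


# Hypothetical byte counts, for illustration only.
print(image_needs_reload(None, 104857600))       # no image in flash
print(image_needs_reload(104857000, 104857600))  # sizes differ
print(image_needs_reload(104857600, 104857600))  # sizes match
```

A size match does not prove that the image is intact. It is only the check this procedure describes.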

Operating power module failure

Symptom

An operating power module fails.

Solution

To resolve the issue:

1. Identify the operating state of the power module.

 Execute the display power command to view the operating state of the power module.

<Sysname> display power

Input Power:132W

PowerID  State   InPower(W)  Current(A)  Voltage(V)  OutPower(W)  Type
1        Absent  --          --          --          --           ---
2        Normal  --          --          --          --           PSR300-A

 Execute the display alarm command to view alarm information about the power module.

<Sysname> display alarm

Slot CPU Level Info

1 0 INFO Chassis 1 power 1 is absent.

If the power module is in Absent state, go to step 2. If the power module is in Fault state, go to

step 3.

2. Verify that the power module is installed securely.

Remove and reinstall the power module to ensure that the power module is installed securely.

Then execute the display power command to verify that the power module has changed to

Normal state. If the power module remains in Absent state, replace the power module.

3. Verify that the power module is operating correctly.

a. Verify that the power cord is connected to the power module securely.

<Sysname> display power

Input Power:132W

PowerID  State   InPower(W)  Current(A)  Voltage(V)  OutPower(W)  Type
1        Absent  --          --          --          --           ---
2        Normal  --          --          --          --           PSR300-A

If the voltage and current of the power module are 0 and the power module state is Fault,

the power cord is disconnected. Connect the power cord securely to the power module.

Then execute the display power command to verify that the power module has changed

to Normal state.

b. Determine whether the power module is overheating. If dust accumulation on the power module causes the high temperature, remove the dust. Then remove and reinstall the power module. Execute the display power command to verify that the power module has changed to Normal state.

c. Install the power module into an empty power module slot. Then execute the display

power command to verify that the power module has changed to Normal state in the new

slot. If the power module remains in Fault state, replace the power module.

4. If the issue persists, contact H3C Support.
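The branching in step 1 (Absent leads to step 2, Fault leads to step 3) can be sketched as a small parser over saved display power output. This is a minimal illustration that assumes the column layout matches the sample in this section. Output on other software versions might differ, and the function names are hypothetical.

```python
SAMPLE = """\
Input Power:132W
PowerID State InPower(W) Current(A) Voltage(V) OutPower(W) Type
1 Absent -- -- -- -- ---
2 Normal -- -- -- -- PSR300-A
"""

# Mapping taken from step 1 of this procedure.
NEXT_STEP = {"Absent": "step 2", "Fault": "step 3"}


def power_states(output):
    """Map each power module ID to its State column."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a numeric PowerID; header rows do not.
        if len(fields) >= 2 and fields[0].isdigit():
            states[int(fields[0])] = fields[1]
    return states


def next_steps(states):
    """Return the follow-up step for every module that is not Normal."""
    return {pid: NEXT_STEP[s] for pid, s in states.items() if s in NEXT_STEP}


print(next_steps(power_states(SAMPLE)))
```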

Newly installed power module failure

Symptom

A newly installed power module fails.

Solution

To resolve the issue:

1. Identify the operating state of the power module.

 Execute the display power command to view the operating state of the power module.

<Sysname> display power

Input Power:132W

PowerID  State   InPower(W)  Current(A)  Voltage(V)  OutPower(W)  Type
1        Absent  --          --          --          --           ---
2        Normal  --          --          --          --           PSR300-A

 Execute the display alarm command to view alarm information about the power module.

<Sysname> display alarm

Slot CPU Level Info

1 0 INFO Chassis 1 power 1 is absent.

If the power module is in Absent state, go to step 2. If the power module is in Fault state, go to

step 3.

2. Verify that the power module is installed securely.

a. Remove and reinstall the power module to make sure the power module is installed securely. Then execute the display power command to verify that the power module has changed to Normal state.

b. Remove and install the power module into an empty power module slot. Then execute the

display power command to verify that the power module has changed to Normal state

in the new slot. If the power module remains in Absent state, go to step 4.

3. Verify that the power module is operating correctly.

a. Verify that the power module is connected to the power source correctly. If it is not, connect it to the power source correctly. Then execute the display power command to verify that the power module has changed to Normal state.

b. Remove and install the power module into an empty power module slot. Then execute the

display power command to verify that the power module has changed to Normal state in

the new slot. If the power module remains in Fault state, go to step 4.

4. If the issue persists, contact H3C Support.

Fan tray failure

Symptom

An operating fan tray or a newly installed fan tray fails.

Solution

To resolve the issue:

1. Identify the operating state of the fan tray.

 Execute the display fan command to view the operating state of the fan tray.

<Sysname> display fan

Fan-tray 1:

Status : Normal

Fan Type : LSWM1FANSA

Fan number: 2

Fan mode : Auto

Airflow Direction: Port-to-power

Fan Speed(rpm)

--- ----------

1 10692

2 9105

Fan-tray 2:

Status : Normal

Fan Type : LSWM1FANSA

Fan number: 2

Fan mode : Auto

Airflow Direction: Port-to-power

Fan Speed(rpm)

--- ----------

1 10702

2 9133

Fan-tray 3:

Status : Normal

Fan Type : LSWM1FANSA

Fan number: 2

Fan mode : Auto

Airflow Direction: Port-to-power

Fan Speed(rpm)

--- ----------

1 10692

2 9162

Fan-tray 4:

Status : Normal

Fan Type : LSWM1FANSA

Fan number: 2

Fan mode : Auto

Airflow Direction: Port-to-power

Fan Speed(rpm)

--- ----------

1 10731

2 9183

Fan-tray 5:

Status : Normal

Fan Type : LSWM1FANSA

Fan number: 2

Fan mode : Auto

Airflow Direction: Port-to-power

Fan Speed(rpm)

--- ----------

1 10672

2 9183

 Execute the display alarm command to view alarm information about the fan tray.

<Sysname> display alarm

Slot CPU Level Info

1 0 INFO Chassis 1 power 1 is absent.

If the fan tray is in Absent state, go to step 2. If the fan tray is in Fault state, go to step 3.

2. Verify that the fan tray is installed securely.

Remove and reinstall the fan tray to ensure that the fan tray is installed securely. Then execute

the display fan command to verify that the fan tray has changed to Normal state. If the fan

tray remains in Absent state, replace the fan tray.

3. Verify that the fan tray is operating correctly.

a. Identify whether the fan tray is faulty.

− Execute the display environment command to view temperature information.

If the temperature continues to rise, place your hand at the air outlet to check whether air is being expelled. If no air is being expelled, the fan tray is faulty.

− Execute the display fan command to view the fan speed information.

If the fan speed is less than 500 rpm, the fan tray is faulty.

b. If the fan tray is faulty, remove and reinstall the fan tray to make sure the fan tray is installed

securely. Then execute the display fan command to verify that the fan tray has changed

to Normal state.

c. If the fan tray remains in Fault state, replace the fan tray.

You must make sure the switch operating temperature is below 60°C (140°F) while you replace the fan tray. If a new fan tray is not readily available, power off the switch to avoid damage caused by high temperature.

4. If the issue persists, contact H3C Support.
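The two checks in this procedure (a fan tray state other than Normal, and a fan speed below 500 rpm) can be sketched as a parser over saved display fan output. This is an illustration that assumes the layout shown in this section. The second fan tray in the sample is invented to demonstrate a fault, and the function name is hypothetical.

```python
import re

MIN_RPM = 500  # threshold stated in step 3a

SAMPLE = """\
Fan-tray 1:
Status : Normal
Fan Type : LSWM1FANSA
Fan number: 2
Fan Speed(rpm)
--- ----------
1 10692
2 9105
Fan-tray 2:
Status : Fault
Fan Type : LSWM1FANSA
Fan number: 2
Fan Speed(rpm)
--- ----------
1 0
2 9133
"""


def fan_problems(output):
    """Return (tray, description) pairs for abnormal states and slow fans."""
    problems = []
    tray = None
    for raw in output.splitlines():
        line = raw.strip()
        m = re.match(r"Fan-tray (\d+):", line)
        if m:
            tray = int(m.group(1))
            continue
        m = re.match(r"Status\s*:\s*(\S+)", line)
        if m and m.group(1) != "Normal":
            problems.append((tray, "status " + m.group(1)))
            continue
        # Speed rows consist of a fan index followed by an rpm value.
        m = re.fullmatch(r"(\d+)\s+(\d+)", line)
        if m and int(m.group(2)) < MIN_RPM:
            problems.append((tray, f"fan {m.group(1)} at {m.group(2)} rpm"))
    return problems


print(fan_problems(SAMPLE))
```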

Related commands

This section lists the commands that you might use for troubleshooting the hardware.

Command              Description
display alarm        Displays alarm information.
display environment  Displays temperature information.
display fan          Displays the operating states of the fan tray.
display power        Displays power module information.

Troubleshooting system management

This section provides troubleshooting information for common system management issues.

High CPU utilization

Symptom

The sustained CPU utilization on a device is noticeably higher than the CPU utilization on other devices.

Troubleshooting flowchart

Figure 2 Troubleshooting high CPU utilization

Solution

To resolve the issue:

1. Identify the job that has a high CPU utilization. For example:

<Sysname> system-view

[Sysname] probe

[Sysname-probe] display process cpu slot 1


CPU utilization in 5 secs: 6.0%; 1 min: 5.6%; 5 mins: 5.7%

JID 5Sec 1Min 5Min Name

1 0.0% 0.0% 0.0% scmd

2 0.0% 0.0% 0.0% [kthreadd]

3 0.0% 0.0% 0.0% [migration/0]

4 0.0% 0.0% 0.0% [ksoftirqd/0]

5 0.0% 0.0% 0.0% [watchdog/0]

6 0.0% 0.0% 0.0% [migration/1]

7 0.0% 0.0% 0.0% [ksoftirqd/1]

8 0.0% 0.0% 0.0% [watchdog/1]

9 0.0% 0.0% 0.0% [migration/2]

10 0.0% 0.0% 0.0% [ksoftirqd/2]

11 0.0% 0.0% 0.0% [watchdog/2]

12 0.0% 0.0% 0.0% [migration/3]

13 0.0% 0.0% 0.0% [ksoftirqd/3]

14 0.0% 0.0% 0.0% [watchdog/3]

15 0.0% 0.0% 0.0% [migration/4]

16 0.0% 0.0% 0.0% [ksoftirqd/4]

17 0.0% 0.0% 0.0% [watchdog/4]

18 0.0% 0.0% 0.0% [migration/5]

19 0.0% 0.0% 0.0% [ksoftirqd/5]

20 0.0% 0.0% 0.0% [watchdog/5]

21 0.0% 0.0% 0.0% [migration/6]

The output shows the average CPU usage values of jobs for the last 5 seconds, 1 minute, and

5 minutes. Typically, the average CPU usage of a job is less than 5%.

2. Display the job's stack. In this example, the job ID is 284.

[Sysname-probe] follow job 284 slot 1

Attaching to process 284 ([OPTK])

Iteration 1 of 5

------------------------------

Kernel stack:

[<ffffffff804ad9f0>] schedule+0x710/0x1050

[<ffffffff804ae5d8>] schedule_timeout+0x98/0xe0

[<ffffffff803187d0>] kepoll_wait+0x2d0/0x450

[<ffffffffc71b29d4>] DWARE_OPTMOD_TaskEntry+0xa4/0xd0 [system]

[<ffffffffc72e1894>] thread_boot+0x74/0x90 [system]

[<ffffffff80266470>] kthread+0x140/0x150

[<ffffffff8021d910>] kernel_thread_helper+0x10/0x20

Iteration 2 of 5

------------------------------

Kernel stack:

[<ffffffff804ad9f0>] schedule+0x710/0x1050

[<ffffffff804ae5d8>] schedule_timeout+0x98/0xe0

[<ffffffff803187d0>] kepoll_wait+0x2d0/0x450

[<ffffffffc71b29d4>] DWARE_OPTMOD_TaskEntry+0xa4/0xd0 [system]

[<ffffffffc72e1894>] thread_boot+0x74/0x90 [system]

[<ffffffff80266470>] kthread+0x140/0x150

[<ffffffff8021d910>] kernel_thread_helper+0x10/0x20

Iteration 3 of 5

------------------------------

Kernel stack:

[<ffffffff804ad9f0>] schedule+0x710/0x1050

[<ffffffff804ae5d8>] schedule_timeout+0x98/0xe0

[<ffffffff803187d0>] kepoll_wait+0x2d0/0x450

[<ffffffffc71b29d4>] DWARE_OPTMOD_TaskEntry+0xa4/0xd0 [system]

[<ffffffffc72e1894>] thread_boot+0x74/0x90 [system]

[<ffffffff80266470>] kthread+0x140/0x150

[<ffffffff8021d910>] kernel_thread_helper+0x10/0x20

Iteration 4 of 5

------------------------------

Kernel stack:

[<ffffffff804ad9f0>] schedule+0x710/0x1050

[<ffffffff804ae5d8>] schedule_timeout+0x98/0xe0

[<ffffffff803187d0>] kepoll_wait+0x2d0/0x450

[<ffffffffc71b29d4>] DWARE_OPTMOD_TaskEntry+0xa4/0xd0 [system]

[<ffffffffc72e1894>] thread_boot+0x74/0x90 [system]

[<ffffffff80266470>] kthread+0x140/0x150

[<ffffffff8021d910>] kernel_thread_helper+0x10/0x20

Iteration 5 of 5

------------------------------

Kernel stack:

[<ffffffff804ad9f0>] schedule+0x710/0x1050

[<ffffffff804ae5d8>] schedule_timeout+0x98/0xe0

[<ffffffff803187d0>] kepoll_wait+0x2d0/0x450

[<ffffffffc71b29d4>] DWARE_OPTMOD_TaskEntry+0xa4/0xd0 [system]

[<ffffffffc72e1894>] thread_boot+0x74/0x90 [system]

[<ffffffff80266470>] kthread+0x140/0x150

[<ffffffff8021d910>] kernel_thread_helper+0x10/0x20

3. Save the information displayed in the previous steps and use the display

diagnostic-information command to collect diagnostic information.

4. Contact H3C Support.
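Step 1 can be sketched as a filter over saved display process cpu output that lists jobs at or above the 5% figure mentioned above. This is a rough illustration that assumes the row layout shown in this section. The utilization values in the sample, including the row for job 284, are invented, and the function name is hypothetical.

```python
import re

# JID, three percentage columns, then the job name.
ROW = re.compile(r"\s*(\d+)\s+([\d.]+)%\s+([\d.]+)%\s+([\d.]+)%\s+(.+)")

SAMPLE = """\
CPU utilization in 5 secs: 51.0%; 1 min: 48.6%; 5 mins: 47.7%
JID 5Sec 1Min 5Min Name
1 0.0% 0.0% 0.0% scmd
2 0.0% 0.0% 0.0% [kthreadd]
284 45.2% 44.0% 43.1% [OPTK]
"""


def busy_jobs(output, threshold=5.0):
    """Return (jid, name, [5sec, 1min, 5min]) for jobs at or above threshold."""
    jobs = []
    for line in output.splitlines():
        m = ROW.fullmatch(line)
        if not m:
            continue  # summary and header lines do not match the row pattern
        usage = [float(m.group(i)) for i in (2, 3, 4)]
        if max(usage) >= threshold:
            jobs.append((int(m.group(1)), m.group(5).strip(), usage))
    # Highest 5-second utilization first.
    return sorted(jobs, key=lambda job: job[2][0], reverse=True)


for jid, name, usage in busy_jobs(SAMPLE):
    print(jid, name, usage)
```

The job ID at the top of the result is the one to pass to the follow job command in step 2.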

High memory utilization

Symptom

The display memory command shows that the memory utilization of the device is higher than 60%

during a period of time (typically 30 minutes).
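The symptom condition (utilization above 60% for a sustained period, typically 30 minutes) can be sketched as a check over periodic samples. This is an illustration only: it assumes the percentages have already been collected as (minute, percent) pairs in time order, and the function name and sample values are hypothetical.

```python
def sustained_high(samples, threshold=60.0, window_minutes=30):
    """Return True if usage stays above threshold for at least window_minutes."""
    run_start = None
    for minute, percent in samples:
        if percent > threshold:
            if run_start is None:
                run_start = minute  # a new run of high readings begins
            if minute - run_start >= window_minutes:
                return True
        else:
            run_start = None  # any dip resets the run
    return False


# Hypothetical samples taken every 5 minutes.
steady = [(m, 65.0) for m in range(0, 35, 5)]
dipped = [(m, 40.0 if m == 15 else 65.0) for m in range(0, 35, 5)]

print(sustained_high(steady))
print(sustained_high(dipped))
```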
