Aruba IMC Orchestrator 6.3 Solution Routine Inspection User guide

IMC Orchestrator 6.3 Solution
Routine Inspection Guide
The information in this document is subject to change without notice.
© Copyright 2023 Hewlett Packard Enterprise Development LP
Contents
About this document
Routine inspection guide for IMC PLAT
  Check services
    Access to IMC PLAT
    Webpage response
    Data collection
    Alarming
  Check the running environment
    Size of the database backup directory
    Database backup time
    Time zone and system time
    Cluster state
    Cluster resources
  Check resource information
    Licensing state
    Pod running state
    PXC database service state
  Check the license server
    License server network quality
    License server state
    License server HA state
    License server client state
  Check the container platform information
    Information about the container platform
    Collection of additional logs
    State of the platform nodes, pods, and services
    Network transmission between node IP addresses and the VIP
    Unnecessary or decompressed installation packages
    Grafana and JobService file directory permissions
  Check automatic inspection check items
    One-click health check on IMC PLAT in E0706L01 or a later version
    Automatic inspection for IMC PLAT in E06 or a version earlier than E0706L01
  Inspection reports
Routine inspection guide for IMC Orchestrator
  Perform one-click check
  Check data consistency
  IMC Orchestrator inspection reports
Routine inspection guide for IMC Orchestrator Analyzer
  Routine infrastructure inspection with IMC Orchestrator Analyzer
    Check topology overview
    Check area overview
    Check health state of an application
    Perform TCP flow analysis
    Perform UDP flow analysis
    Perform compliance analysis
    Perform host analysis
    Check network health
    Perform change analysis
    Perform RoCE network analysis
    Perform on-path flow analysis
    Check problem center
    Perform issue analysis
    Check AI tasks
    Check SeerCollector
  Inspection guide for the IMC Orchestrator Analyzer backend
    Check CPU usage
    Check memory usage
    Check disk usage
    Check key process state of the IMC Orchestrator Analyzer
  Inspection guide for network devices
  IMC Orchestrator Analyzer inspection reports
Routine inspection guide for fixed-port data center switches
  Routine maintenance checklists for fixed-port data center switches
About this document
This document describes the routine inspection services of the IMC Orchestrator solution in a
systematic way.
The complete routine inspection of the IMC Orchestrator solution covers IMC PLAT, IMC
Orchestrator, IMC Orchestrator Analyzer, core switches, core firewalls, and access switches.
Routine inspection guide for IMC PLAT
Check services
Access to IMC PLAT
Check item
Access to IMC PLAT from a browser
Targets
Time used for logging in to IMC PLAT.
Pass criteria
The system can load the dashboard within 10 seconds after login.
Example
Webpage response
Check item
Response time of the webpages to operations
Targets
Response time for the alarm management page and monitor list page of the monitor module.
Pass criteria
The system loads the pages within 5 seconds.
Example
Data collection
Check item
Functionality of data collection
Targets
Collection items configured in the monitoring template.
Pass criteria
The system can correctly collect data for the collection items at an interval of 5 minutes.
Example
Select Monitor > Monitor List > Network Monitors, and then click the device label of a
monitor-enabled device.
On the page that opens, view the monitor details.
Alarming
Check item
Alarm functionality
Targets
Verify that active alarms and alarming functionality are normal, and that alarms can be generated
and classified correctly.
Pass criteria
The system can generate active alarms and display the alarms by alarm level.
Example
Select Monitor > Alarm > Active Alarms, and view alarm information.
Select Monitor > Alarm > Active Alarms, and then select a severity to view alarms with that
severity.
Check the running environment
Size of the database backup directory
Check item
Database backup directory
Targets
The size of the database backup directory.
Pass criteria
The database backup directory does not exceed 50 GB.
Example
By default, IMC PLAT automatically backs up data in the database at 00:00 every day. Check the
size of the database backup directory.
In this example, the size of the database backup directory is 9.4 MB, less than 50 GB.
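The size comparison can also be scripted. The following is a minimal sketch, not part of the product: the function name backup_dir_within_limit is hypothetical, and the backup directory path is an assumption that you must replace with the directory used in your deployment.

```shell
# Sketch: compare the size of a directory against a limit given in GB.
# The function name and the directory you pass in are assumptions.
# Usage: backup_dir_within_limit <directory> [limit_gb]
backup_dir_within_limit() {
    local dir="$1" limit_gb="${2:-50}"
    local used_kb limit_kb
    if [ ! -d "$dir" ]; then
        echo "ERROR: $dir is not a directory" >&2
        return 2
    fi
    # du -sk prints the total size in 1 KB blocks, followed by the path.
    used_kb=$(du -sk "$dir" | awk '{print $1}')
    limit_kb=$((limit_gb * 1024 * 1024))
    if [ "$used_kb" -le "$limit_kb" ]; then
        echo "PASS: $dir uses ${used_kb} KB (limit ${limit_gb} GB)"
    else
        echo "FAIL: $dir uses ${used_kb} KB (limit ${limit_gb} GB)"
        return 1
    fi
}
```

Call the function with the backup directory as the first argument. The limit defaults to the 50 GB value in the pass criteria.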
Database backup time
Check item
Last database backup time
Targets
Last database backup time.
Pass criteria
The last database backup was performed at the scheduled backup time. By default, the scheduled
time is 00:00 every day. If you have edited the scheduled backup time, the last database backup
was performed at the time you specified.
Example
Log in to IMC PLAT, and navigate to the System > Backup & Restore page to check the execution
time of the last database backup record.
In this example, the scheduled backup time is set to 00:00, and the last database backup was
performed at 00:00.
Time zone and system time
Check item
Time zone and system time of IMC PLAT cluster
Targets
Time zone and system time of each node in IMC PLAT cluster.
Pass criteria
The time zone and system time of each node match the local time zone and the current local time.
Example
Log in to the backend (CLI) of each node, and execute the date command to check the time zone
and system time.
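To make the per-node output easier to compare, you can print the same fields on every node and compare the epoch values. The following is a sketch under assumptions: the function names are hypothetical, and the 5-second tolerance is an illustrative value that this guide does not define.

```shell
# Print the fields worth comparing: zone abbreviation, UTC offset, local time.
node_time_summary() {
    date '+%Z %z %Y-%m-%d %H:%M:%S'
}

# Compare two epoch timestamps (for example, "date +%s" output from two nodes).
# The default tolerance of 5 seconds is an assumption, not a documented limit.
# Usage: clock_skew_ok <epoch_a> <epoch_b> [max_skew_seconds]
clock_skew_ok() {
    local a="$1" b="$2" max="${3:-5}"
    local diff=$((a - b))
    if [ "$diff" -lt 0 ]; then
        diff=$((-diff))
    fi
    [ "$diff" -le "$max" ]
}
```

Run node_time_summary on each node to compare the time zone and offset by eye, and pass the date +%s values collected from two nodes to clock_skew_ok to compare the system time.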
Cluster state
Check item
States of the three master nodes in the cluster
Targets
Cluster state.
Pass criteria
The master nodes in the cluster are in normal states.
Example
Log in to https://northbound service VIP:8443/matrix/ui by using the username and password
admin/Pwd@12345. Select DEPLOY > Clusters > Deploy Cluster to verify that the nodes are
normal.
Cluster resources
Check item
Usage of cluster resources and state of free resources
Targets
Cluster resources.
Pass criteria
The CPU usage and memory usage of the cluster resources do not exceed 70%.
Example
1. Log in to https://northbound service VIP:8443/matrix/ui by using the username and password
admin/Pwd@12345.
2. Open the dashboard to view resource usage of different nodes.
3. Verify that the CPU usage and memory usage of the cluster resources do not exceed 70%.
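The same 70% threshold can be checked from the CLI. This is not the method described above, which uses the dashboard. It is a sketch that assumes the Kubernetes metrics API is available so that kubectl top nodes returns data. The function name flag_busy_nodes is hypothetical.

```shell
# Sketch: read "kubectl top nodes" output on stdin and flag nodes above a limit.
# Expected columns: NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
# Usage: kubectl top nodes | flag_busy_nodes [limit_percent]
flag_busy_nodes() {
    local limit="${1:-70}"
    awk -v limit="$limit" '
        NR == 1 { next }                      # skip the header row
        {
            cpu = $3 + 0; mem = $5 + 0        # "83%" converts to 83
            if (cpu > limit || mem > limit) {
                printf "RISK: %s cpu=%s mem=%s\n", $1, $3, $5
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

The function prints one line per node that exceeds the limit and returns a non-zero status if any node is flagged.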
Check resource information
Licensing state
Check item
Whether software licensing and node registration of IMC PLAT are performed correctly
Targets
Software licensing and node registration of IMC PLAT.
Pass criteria
Licenses of all nodes are accessible. The valid license names and license quantity are the same as
those on the license server.
Example
Log in to IMC PLAT, and select System > License Management > License Information to view
license information.
Pod running state
Check item
Running state and reboot count of each pod
Targets
Running state and reboot count of each pod.
Pass criteria
The pod state is Running, and the reboot count for each pod is less than 10.
Example
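1. Execute the kubectl get pods -o wide -A | sort -nr -k5 command to check the running
state and reboot count of each pod. The fifth column (RESTARTS) is the reboot count, so the
output is sorted by reboot count in descending order.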
2. Mark the pods that have rebooted more than 10 times as risks, and list the top 10 pods by
reboot count.
3. For a pod that has rebooted more than 10 times, identify whether its resources are sufficient. If
the resources are sufficient, contact HPE Support.
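The marking in steps 1 and 2 can be sketched as a filter over the kubectl output. The function name flag_risky_pods is hypothetical. The default threshold flags pods that have rebooted more than 10 times, as in step 2. Pass 9 as the argument to apply the stricter "less than 10" pass criterion instead.

```shell
# Sketch: read "kubectl get pods -o wide -A" output on stdin and flag pods
# that are not Running or whose reboot count exceeds a threshold.
# Columns used: NAMESPACE(1) NAME(2) READY(3) STATUS(4) RESTARTS(5)
# Usage: kubectl get pods -o wide -A | flag_risky_pods [max_restarts]
flag_risky_pods() {
    local max_restarts="${1:-10}"
    awk -v max="$max_restarts" '
        NR == 1 { next }                      # skip the header row
        $4 != "Running" || ($5 + 0) > max {
            printf "RISK: %s/%s status=%s restarts=%d\n", $1, $2, $4, $5
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

Pods of finished jobs report a state such as Completed and are also flagged by this filter, so review the output before you record a risk.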
PXC database service state
Check item
State of the PXC database service
Targets
Running state of the service nodes in the PXC cluster.
Pass criteria
Cluster environment: The READY field of each PXC service pod displays 1/1, and the STATUS field
displays Running.
Standalone environment: The READY field of the PXC service pod displays 1/1, and the STATUS
field displays Running.
Example
1. Use K8s commands to check the database service state:
Log in to the backend of any IMC PLAT node, and use the kubectl get pod -n service-software |
grep pxc command to view the state of all PXC database services.
Cluster environment:
Standalone environment:
2. Use the script to check the database service state:
PxcCheckStatus.sh
Log in to the backend of any IMC PLAT node, and use an FTP tool to upload the script to any path
on that node. Use the chmod 777 PxcCheckStatus.sh command to make the file executable,
and then use the ./PxcCheckStatus.sh command to execute the file. In a normal situation,
the output is as follows:
Cluster environment:
Standalone environment:
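For the K8s command method in step 1, the pass criteria can be applied to the command output automatically. The following sketch is independent of the PxcCheckStatus.sh script, and the function name pxc_pods_healthy is hypothetical. It expects the header-less lines produced by the grep pxc filter.

```shell
# Sketch: read the output of
#   kubectl get pod -n service-software | grep pxc
# on stdin and verify READY is 1/1 and STATUS is Running for every line.
# Columns used: NAME(1) READY(2) STATUS(3)
pxc_pods_healthy() {
    awk '
        { seen++ }
        $2 != "1/1" || $3 != "Running" {
            printf "RISK: %s ready=%s status=%s\n", $1, $2, $3
            bad = 1
        }
        END {
            if (seen == 0) { print "RISK: no PXC pods found"; bad = 1 }
            exit (bad ? 1 : 0)
        }
    '
}
```

The same check applies to both environments, because the criteria for each PXC pod are identical. Only the number of pods differs.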
Check the license server
License server network quality
Check item
Network quality between the license server and the nodes in the cluster
Targets
Network quality between the license server and each node in the cluster.
Pass criteria
The packet loss ratio is 0%, and the average response time is shorter than 0.25 ms.
Example
Ping the license server 50 times from a cluster node, and verify that the packet loss ratio is 0%
and the average response time is shorter than 0.25 ms.
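The two criteria can be read from the ping summary automatically. The following sketch assumes the summary format printed by the Linux ping command (a "packet loss" line and an "rtt min/avg/max/mdev" line). The function name ping_quality_ok is hypothetical.

```shell
# Sketch: read ping output on stdin and apply the pass criteria:
# packet loss of 0% and an average response time below a limit in ms.
# Usage: ping -c 50 <license server IP> | ping_quality_ok [max_avg_ms]
ping_quality_ok() {
    local max_avg_ms="${1:-0.25}"
    awk -v max_avg="$max_avg_ms" '
        /packet loss/ {
            # The loss figure is the field that ends with a percent sign.
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) { loss = $i + 0; have_loss = 1 }
        }
        /^(rtt|round-trip)/ {
            # "rtt min/avg/max/mdev = a/b/c/d ms": avg is the second value.
            split($0, halves, " = ")
            split(halves[2], v, "/")
            avg = v[2] + 0; have_avg = 1
        }
        END {
            if (!have_loss || !have_avg) {
                print "UNKNOWN: summary lines not found"
                exit 2
            }
            ok = (loss == 0 && avg < max_avg)
            printf "%s: loss=%s%% avg=%sms\n", (ok ? "PASS" : "FAIL"), loss, avg
            exit (ok ? 0 : 1)
        }
    '
}
```

The function returns status 2 when it cannot find the summary lines, for example when the ping output uses a different format.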
License server state
Check item
License server state
Targets
License server state.
Pass criteria
The license server is accessible, and the authorization state is normal.
Example
Log in to the license server through https://licenseserverIP:port number/licsmanager to verify that
the license server runs correctly.
The default port number is 28443, and the default username and password are
admin/admin@h3c.
Check the usage information for exceptions, and identify whether the authorization will expire in
less than 30 days.
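The 30-day check is simple date arithmetic. The following sketch assumes GNU date (for the -d option) and an expiry date that you copy from the license server page in YYYY-MM-DD form. The function names are hypothetical.

```shell
# Sketch: number of whole days from "today" until an expiry date.
# Assumes GNU date; dates are in YYYY-MM-DD form and evaluated in UTC.
# Usage: days_until <expiry_date> [today]
days_until() {
    local expiry="$1" today="${2:-$(date -u +%F)}"
    local exp_s now_s
    exp_s=$(date -u -d "$expiry" +%s) || return 2
    now_s=$(date -u -d "$today" +%s) || return 2
    echo $(( (exp_s - now_s) / 86400 ))
}

# Succeeds when the authorization expires in less than warn_days (default 30).
# Usage: license_expires_soon <expiry_date> [today] [warn_days]
license_expires_soon() {
    local days
    days=$(days_until "$1" "${2:-}") || return 2
    [ "$days" -lt "${3:-30}" ]
}
```

For example, days_until 2024-01-31 2024-01-01 prints 30, so license_expires_soon does not flag that date with the default threshold.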
License server HA state
Check item
HA state of the license server
Targets
HA state of the license server.
Pass criteria
If HA is disabled, the page displays Configure HA.
If HA is enabled, the page displays HA Configuration, in which the HA state is normal.
Example
Log in to the license server web page, and then select HA.
License server client state
Check item
Client configuration and license deployment information
Targets
Client configuration and license deployment information.
Pass criteria
The client is connected to the license server.
Example
1. Log in to the license server through https://licenseserverIP:port number/licsmanager.
2. Select License Management > Connections to check the connection state of the client and
license deployment information.
Check the container platform information
Information about the container platform
Check item
Information about the container platform
Targets
Execute the ./env_check.sh script in the /opt/matrix/tools/ directory on each master node. For
information about how to use the script, execute the ./env_check.sh -h command.
Pass criteria
All check items in the script pass the inspection. If a check item fails the inspection, mark the item
as a risk and locate the cause.
Example
Collection of additional logs
Check item
Collection of additional logs (other than system logs, operation logs, and running logs)
Targets
Execute the /opt/matrix/tools/matrix_log_collection.sh script on each node to collect additional
logs, including the following types:
OS basic information, network information, K8s node/pod information, docker ps -a output, and
additional information for Matrix.
InfluxDB data that contains container and node monitor data.
OS message information.
You can export any type of the preceding information as needed.
Pass criteria
By default, the additional logs are stored in the /home/matrix_log_collection directory. To edit the
storage path, edit the path="/home/matrix-log-collect" parameter in the script.
Example
State of the platform nodes, pods, and services
Check item
State of the platform nodes, pods, and services (docker, kubelet, etcd, and Matrix)
Targets
State of the platform nodes, pods, and services (docker, kubelet, etcd, and Matrix).
Pass criteria
The STATUS field displays Ready for each node.
The READY field displays n/n (for example, 1/1) and the STATUS field displays Running for each
pod.
For each node, the kubelet and docker services are in active (running) state and the etcd and
Matrix services on the master node are in active (running) state.
Example
Access the CLI of the master node, and use the kubectl get nodes command to view the state of
each node.
Use the kubectl get pods -n kube-system command to view the state of each pod.
Use the systemctl status kubelet command to view the state of the kubelet service.
Use the systemctl status etcd command to view the state of the etcd service.
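The node and service checks above can be condensed into two helper functions. This is a sketch with hypothetical function names. It uses systemctl is-active, which prints the unit state and returns a non-zero status when the unit is not active. The systemd unit name of the Matrix service is not given in this guide, so it is not hard-coded here.

```shell
# Sketch: read "kubectl get nodes" output on stdin and flag nodes whose
# STATUS column is not exactly Ready.
# Usage: kubectl get nodes | flag_unready_nodes
flag_unready_nodes() {
    awk '
        NR == 1 { next }                      # skip the header row
        $2 != "Ready" {
            printf "RISK: node %s status=%s\n", $1, $2
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Print the state of each named systemd unit; return 1 if any is not active.
# Usage: check_services <service>...
check_services() {
    local svc state rc=0
    for svc in "$@"; do
        state=$(systemctl is-active "$svc" 2>/dev/null) || rc=1
        echo "$svc: ${state:-unknown}"
    done
    return "$rc"
}
```

On a master node, a call such as check_services kubelet docker etcd covers the services named in the pass criteria, except Matrix. Add the Matrix unit name used in your deployment as an extra argument.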