NetXtreme-E® Tuning Guide for AMD® EPYC™ 7002 Series on Linux
User Guide
Broadcom Confidential
NetXtreme-E-UG600
October 1, 2020
Broadcom, the pulse logo, Connecting everything, Avago Technologies, Avago, and the A logo are among the trademarks
of Broadcom and/or its affiliates in the United States, certain other countries, and/or the EU. AMD and EPYC are trademarks
of AMD.
Copyright © 2020 Broadcom. All Rights Reserved.
The term “Broadcom” refers to Broadcom Inc. and/or its subsidiaries. For more information, please visit www.broadcom.com.
Broadcom reserves the right to make changes without further notice to any products or data herein to improve reliability,
function, or design. Information furnished by Broadcom is believed to be accurate and reliable. However, Broadcom does
not assume any liability arising out of the application or use of this information, nor the application or use of any product or
circuit described herein, neither does it convey any license under its patent rights nor the rights of others.
Table of Contents
1 Introduction
2 AMD EPYC 7002 Series Microarchitecture
  2.1 Core Cache Dies (CCD) and Core Cache Complexes (CCX)
  2.2 NUMA and NUMA Per Socket (NPS)
    2.2.1 NPS=1
    2.2.2 NPS=2
    2.2.3 NPS=4
  2.3 Memory Optimizations
    2.3.1 Platforms Specifically Designed for AMD EPYC 7002
3 BIOS Tuning
  3.1 NPS (NUMA Per Socket)
  3.2 X2APIC
  3.3 Determinism Control and Determinism Slider
  3.4 APBDIS
  3.5 Preferred I/O and Enhanced Preferred I/O
  3.6 PCIe Ten Bit Tag
  3.7 Memory Clock Speed
  3.8 L3 LLC (Last Level Cache) as NUMA
  3.9 Socket/Inter-Chip Global Memory Interconnect (xGMI)
4 TCP Performance Tuning
  4.1 BIOS Tuning
  4.2 NIC Tuning
    4.2.1 NUMA: Local vs. Non Local
    4.2.2 Configuring Queues
    4.2.3 Configure IRQ and Application Affinity
    4.2.4 TX and RX Flow Steering
    4.2.5 TX/RX Queue Size
    4.2.6 Interrupt Moderation
    4.2.7 GRO (Generic Receive Offload)
    4.2.8 TX-NoCache-Copy
    4.2.9 Relaxed Ordering
    4.2.10 PCIe MRRS (Maximum Read Request Size)
  4.3 OS Tuning (Linux)
    4.3.1 IOMMU
    4.3.2 Performance Governor
    4.3.3 TCP Memory Configuration
    4.3.4 nohz=off
    4.3.5 TCP Example with the BCM957508-P2100G
5 DPDK Tunings
  5.1 BIOS Tuning
  5.2 Kernel Tuning
  5.3 PCIe Configuration
  5.4 DPDK Configuration
  5.5 DPDK Results
6 IP Forwarding Tunings
  6.1 BIOS Tuning
  6.2 Kernel Tuning
  6.3 NIC Tuning
  6.4 IP Forwarding Results
7 RDMA Tunings
  7.1 BIOS Tuning
  7.2 OS Configuration
  7.3 RDMA Results
8 Debugging Performance Issues
  8.1 Identifying CPU Bottleneck
  8.2 Checking the PCIe Gen4 Speed and Width
9 References
Revision History
  NetXtreme-E-UG600; October 1, 2020
1 Introduction
This document provides detailed steps to configure Broadcom NetXtreme-E 100G/200G adapters for optimal performance on the AMD EPYC PCIe Gen4 architecture.
No single configuration works optimally for all scenarios, and custom tuning may be required to achieve the desired results for specific workloads. This tuning guide broadly covers each tunable item and provides specific examples for common workloads.
2 AMD EPYC 7002 Series Microarchitecture
2.1 Core Cache Dies (CCD) and Core Cache Complexes (CCX)
The AMD EPYC 7002 series processor has a unique CPU microarchitecture, and the tuning steps in this document leverage the new architecture.
It is designed as a multi-chip package with up to nine dies: a central 14 nm I/O die surrounded by up to eight 7 nm core chiplets, also called core cache dies (CCDs). Each CCD has two CCXs (Core Cache Complexes), each with up to four cores and 16 MB of L3 cache.
The I/O die provides the I/O and memory functions and the Infinity Fabric connection between sockets. Up to eight memory channels are supported per socket.
Figure 1: 2nd Generation EPYC
Another important aspect to consider in dual-socket systems is interconnect distance. A CPU might access memory connected to its local memory controller (local memory), or it might access memory attached to the other socket (non-local memory). The following command provides details about the NUMA distances:
numactl -H
For the best performance and latency, it is recommended to use local memory only, as memory access across the socket incurs a significant performance penalty.
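As a usage sketch (the node number and application name are placeholders), an application can be bound to the adapter's local NUMA node so that it allocates only local memory:
numactl --cpunodebind=[numa_node] --membind=[numa_node] application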
2.2 NUMA and NUMA Per Socket (NPS)
The AMD EPYC 7002 Series Processor uses the NUMA (Non Uniform Memory Access) architecture. The architecture is
divided into four logical quadrants per socket that can be further partitioned into different NUMA domains called NPS (NUMA
Per Socket).
2.2.1 NPS=1
The processor is a single NUMA domain. That is, all the cores on the processor and all memory and PCIe devices connected to it are in one NUMA domain. Memory is interleaved across the eight local memory channels, so this mode offers the maximum memory bandwidth.
Figure 2: EPYC 7002 Series Processors
NOTE: Dual-socket machine in NPS=1 configuration.
2.2.2 NPS=2
NPS2 divides the processor into two NUMA domains. Half of the cores and half of the memory channels connected to the processor are grouped into each NUMA domain. Memory is interleaved across the four memory channels in each NUMA domain.
Figure 3: EPYC 7002 Series Processors
NOTE: Dual-socket machine in NPS=2 configuration.
2.2.3 NPS=4
NPS4 partitions the CPU into four NUMA domains. As shown in Figure 4, each quadrant is a NUMA domain and memory is interleaved across the two memory channels in each domain. PCIe devices are local to one of the four NUMA domains on the socket, depending on which quadrant of the I/O die has the PCIe root for that device. In other words, only two memory channels are local to a PCIe device, which may not offer enough memory bandwidth for 200 Gb/s.
Figure 4: EPYC 7002 Series Processors
NOTE: Certain EPYC 7002 series SKUs may not support NPS4 (specifically, those SKUs manufactured with six CCD dies)
and lower-stack SKUs manufactured with four CCD dies are not recommended for 200 Gb/s.
NOTE: In many cases, it is advisable to use the cores local to your PCIe device to get the optimal throughput and latency.
Going across CCD and across NUMA can result in performance degradation in some cases.
2.3 Memory Optimizations
For optimal performance, populate 8 DIMMs for 1 DPC (DIMMs per Channel) configuration, or 16 DIMMs for 2 DPC (DIMMs
per Channel) configuration, per processor. Other configurations, such as 12 DIMMs per processor, do not provide optimal
performance. 1 DPC configuration runs the memory DIMMs at a higher speed when compared to 2 DPC. OEM servers
supporting AMD EPYC 7002 Series processors are built to either support previous generations of AMD EPYC (EPYC 7001
series) or are specifically designed for AMD EPYC 7002. Contact your OEM to determine the characteristics of your servers.
2.3.1 Platforms Specifically Designed for AMD EPYC 7002
These platforms are specifically designed for AMD EPYC 7002 Series processors and are not compatible with AMD EPYC 7001 processors.
The maximum memory bus frequency supported on these platforms is 3200 MT/s.
For throughput-sensitive applications, higher I/O throughput can be obtained by setting the maximum memory bus frequency to the maximum allowed (3200 MT/s), provided the memory DIMM hardware supports it. However, the Infinity Fabric clock on these platforms does not synchronize optimally with a memory bus frequency of 3200 MT/s, which can lead to higher latency.
For latency-sensitive applications, better performance is obtained by setting the maximum memory bus frequency down to 2933 MT/s, 2667 MT/s, or 2400 MT/s, since these frequencies synchronize with the Infinity Fabric clock. The best latency is observed by setting the maximum memory bus frequency to 2933 MT/s, synchronized with the Infinity Fabric clock, in NPS=4 mode, although the total memory bandwidth is then limited.
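As a quick read-only check from Linux (requires root; field names can vary with the dmidecode version), the populated DIMMs and their configured speed can be listed with:
dmidecode -t memory | grep -iE 'locator|speed'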
3 BIOS Tuning
This section covers the various BIOS configuration options used to tune the system for optimal performance. The BIOS screens in this section are for reference only and were captured on an AMD EPYC reference platform. It is recommended to find the equivalent settings in the target system's BIOS console.
3.1 NPS (NUMA Per Socket)
NOTE: The NPS=1 configuration is recommended for 200 Gb/s and above. NPS=4 is recommended for speeds up to 100 Gb/s, as it provides better CPU and memory locality.
To access the NUMA Nodes Per Socket setting (see Figure 5), select Advanced > AMD CBS > DF Common Options > Memory Addressing > NUMA Nodes Per Socket > NPS1.
Figure 5: NUMA Nodes Per Socket Settings
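After rebooting with the new NPS setting, the resulting NUMA topology can be confirmed from Linux:
numactl -H
lscpu | grep -i numa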
3.2 X2APIC
Set X2APIC = Enabled to allow the OS to work with 256 threads and to improve performance over the legacy APIC.
NOTE: Disable SMT if you are running an OS that does not support X2APIC on a dual-socket 64-core processor system.
To access the Local APIC Mode setting (see Figure 6), select Advanced > AMD CBS > CPU Common Options > Local APIC Mode > X2APIC.
Figure 6: Local APIC Mode Settings
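Whether the kernel actually enabled x2APIC can be confirmed from the kernel boot log; the exact message text varies by kernel version:
dmesg | grep -i x2apic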
3.3 Determinism Control and Determinism Slider
Set Determinism Control to Manual and the Determinism Slider to Performance (see Figure 7) to ensure consistent performance across a fleet of similarly configured systems.
1. To access the Determinism Control setting, select Advanced > AMD CBS > NBIO Common Options > SMU Common Options > Determinism Control > Manual.
2. To access the Determinism Slider setting, select Advanced > AMD CBS > NBIO Common Options > SMU Common Options > Determinism Slider > Performance.
Figure 7: Determinism Control/Determinism Slider Settings
3.4 APBDIS
Set APBDIS=1 to disable Algorithmic Performance Boost (APB) (see Figure 8). This disables the dynamic switching of P-states in the Infinity Fabric (CPU P-states remain unaffected) and forces the fabric into P0, the highest-performing Infinity Fabric P-state. The APBDIS values are as follows:
0: Disable APBDIS – APB stays active and the fabric clock switches dynamically between P-states.
1: Enable APBDIS – APB is disabled and the fabric clock is locked to the fixed P-state selected below (P0).
Auto (default setting) – Use the default value for APBDIS. The default value is 0.
1. To access the APBDIS setting, select Advanced > AMD CBS > NBIO Common Options > SMU Common Options > APBDIS > 1.
2. To access the Fixed SOC Pstate setting, select Advanced > AMD CBS > NBIO Common Options > SMU Common Options > Fixed SOC Pstate > P0.
Figure 8: APBDIS Settings
3.5 Preferred I/O and Enhanced Preferred I/O
Preferred I/O (see Figure 9) is a new capability in the EPYC 7002 series BIOS that prioritizes traffic from the selected I/O device and facilitates the ordering of PCIe packets, which reduces overhead and results in better adapter performance.
Enhanced Preferred I/O (see Figure 9) further ensures that the same configured I/O device remains at the highest performance by keeping its clocks at the maximum frequency.
1. To access the Preferred I/O setting, select Advanced > AMD CBS > NBIO Common Options > Preferred I/O > Manual.
2. To access the Preferred I/O Bus setting, select Advanced > AMD CBS > NBIO Common Options > Preferred I/O Bus > [PCIe Bus Number]. (The adapter's bus number can be found as shown in the example after Figure 9.)
3. To access the Enhanced Preferred I/O setting, select Advanced > AMD CBS > NBIO Common Options > Enhanced Preferred I/O Mode > Auto/Enable P0.
Figure 9: Preferred I/O and Enhanced Preferred I/O Settings
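The PCIe bus number required by the Preferred I/O Bus setting can be taken from the adapter's PCI address. As an example (the interface name is a placeholder; 14e4 is the Broadcom PCI vendor ID):
ethtool -i [interface] | grep bus-info
lspci -d 14e4:
The bus number is the first field of the reported address (bus:device.function).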
3.6 PCIe Ten Bit Tag
Enable the PCIe Ten Bit Tag (see Figure 10) to increase the number of outstanding non-posted requests from 256 to 768 for better performance. As latency increases, more unique tags are required to maintain peak performance at 16 GT/s.
To access the PCIe Ten Bit Tag setting, select Advanced > AMD CBS > NBIO Common Options > PCIe Ten Bit Tag > Enable.
Figure 10: PCIe Ten Bit Tag Settings
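Recent lspci versions decode 10-bit tag support in the Device Capabilities 2 and Device Control 2 registers, which can be used to confirm the setting from the OS (the PCI address is a placeholder):
lspci -vvv -s [pcie_address] | grep -i 10bittag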
3.7 Memory Clock Speed
Set the Memory Clock Speed (see Figure 11) to match the maximum fabric clock speed supported by the installed EPYC 7002 series server, which is either 1467 MHz or 1333 MHz (the double data rate is 2x this clock; for example, MCLK = 1467 means a 2933 MT/s data rate).
NOTE: A platform may be capable of supporting higher-speed memory (for example, a 1600 MHz memory clock); while this may increase the overall platform memory bandwidth, the average memory latency is higher.
1. To access the Overclock setting, select Advanced > AMD CBS > UMC Common Options > DDR4 Common Options > DRAM Timing Configuration > Accept > Overclock > Enabled.
2. To access the Memory Clock setting, select Advanced > AMD CBS > UMC Common Options > DDR4 Common Options > DRAM Timing Configuration > Accept > Memory Clock Speed > 1467MHz.
Figure 11: Memory Clock Speed Settings
3.8 L3 LLC (Last Level Cache) as NUMA
Enable L3 as NUMA to create NUMA nodes equal to the number of L3 caches (CCXs). This helps the operating system scheduler maintain locality to the LLC without causing unnecessary cache-to-cache transactions, and it improves performance.
NOTE: Currently this is a benchmarking feature meant for isolating L3 caches and is not recommended for production
deployments.
To access the ACPI settings (see Figure 12), select Advanced > AMD CBS > DF Common Options > ACPI > ACPI SRAT L3 Cache As NUMA Domain > Enabled.
Figure 12: ACPI SRAT L3 cache As NUMA Domain Setting
3.9 Socket/Inter-Chip Global Memory Interconnect (xGMI)
xGMI Dynamic Link Width Management (see Figure 13) saves power during periods of low socket-to-socket data traffic by reducing the number of active xGMI lanes per link from 16 to 8. However, in certain scenarios involving low-bandwidth but latency-sensitive traffic, the transition from low-power to full-power xGMI can adversely impact latency. Setting xGMI Link Width Control to Manual and specifying a maximum link width of 16 forces the xGMI interface into full-power mode, eliminating any latency jitter.
NOTE: Socket/Inter-Chip Global Memory Interconnect option only applies to a 2P system.
1. To access the xGMI Link Width Control setting, select Advanced > AMD CBS > SMU Common Options > xGMI Link Width Control > Manual.
2. To access the xGMI Force Link Width setting, select Advanced > AMD CBS > SMU Common Options > xGMI Force Link Width > 2.
3. To access the xGMI Force Link Width Control setting, select Advanced > AMD CBS > SMU Common Options > xGMI Force Link Width Control > Force.
4. To access the xGMI Max Link Width Control setting, select Advanced > AMD CBS > SMU Common Options > xGMI Max Link Width Control > Auto.
Figure 13: Socket/Inter-Chip Global Memory Interconnect (xGMI) Settings
Applications that are known to be insensitive to both socket-to-socket bandwidth and latency can set a fixed link width of
eight to save power, which can divert more power to the cores for boost.
4 TCP Performance Tuning
4.1 BIOS Tuning
Configure all BIOS settings as described in Section 3, BIOS Tuning.
4.2 NIC Tuning
Contact the OEM to get the latest driver, firmware, and tools and follow the installation instructions.
4.2.1 NUMA: Local vs. Non Local
Non-Uniform Memory Access (NUMA) is a memory architecture in which each CPU is connected to its own local memory. CPUs in the local NUMA node access local memory faster (shorter distance); accessing memory on a remote NUMA node is possible but slower.
Local NUMA: cat /sys/class/net/[interface]/device/numa_node
Local CPUs: cat /sys/class/net/[interface]/device/local_cpulist
Alternatively, install hwloc and use the output of lstopo to find the local NUMA nodes and CPU list:
yum install hwloc hwloc-gui
lstopo --logical --output-format png > cpu.png
NOTE: On a single-socket system with NPS=1, all cores appear local.
NOTE: On a single-socket system with NPS=4, only a single NUMA node is local to the adapter.
4.2.2 Configuring Queues
The Broadcom NIC supports both combined and separate queue configurations using ethtool. For most cases, the use of combined queues is recommended. With combined queues, the TX and RX queues share a single IRQ.
ethtool -L [interface] combined 8 tx 0 rx 0
For more granular control of packet processing, at the cost of additional resources, separate queues can be used. In this configuration, the TX and RX queues are separate and have independent IRQs, giving finer control over the IRQ affinities.
ethtool -L [interface] combined 0 tx 4 rx 4
In either case, it is recommended to have no more than a single IRQ per physical core. It may also be beneficial to allocate only a single IRQ per local core in a CCD.
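The resulting channel configuration can be verified with ethtool (lowercase -l reads the current and maximum channel counts):
ethtool -l [interface]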
4.2.3 Configure IRQ and Application Affinity
IRQ affinity refers to the binding of interrupts from a specific device to one or multiple logical processors. The distribution of
the IRQs across different local logical cores results in improved performance due to better CPU utilization.
Use the following steps for IRQ affinity configuration:
1. Disable irqbalance (to prevent the service from dynamically moving the IRQs) using the following commands:
systemctl stop irqbalance
systemctl disable irqbalance (to keep the setting persistent across reboots)
2. Identify local CPUs using the following command:
cat /sys/class/net/[interface]/device/local_cpulist
3. Identify IRQ numbers using the following command:
cat /proc/interrupts | grep [interface] | awk -F ":" '{print $1}'
4. Pin each interrupt to a different local NUMA CPU using the following command:
echo [cpu_core] > /proc/irq/[irq_number]/smp_affinity_list
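For example, the following minimal sketch pins the adapter's IRQs round-robin across a set of local CPUs; the interface name and CPU list are assumptions taken from local_cpulist and must be adjusted for the actual system:
IFACE=ens1f0np0                 # example interface name; substitute your own
CPUS=(0 1 2 3 4 5 6 7)          # fill in from /sys/class/net/$IFACE/device/local_cpulist
i=0
for irq in $(grep "$IFACE" /proc/interrupts | awk -F: '{print $1}'); do
    echo ${CPUS[$((i % ${#CPUS[@]}))]} > /proc/irq/$irq/smp_affinity_list
    i=$((i + 1))
done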
NOTE: It is preferable to use the same CPUs for application affinity, which preserves cache locality between interrupts and application threads and reduces processing overhead. The taskset and numactl tools or application-specific options (for example, netperf with -T) can be used to configure application locality:
taskset -c [cpu_core list] application
or
numactl -C [cpu_cores list] application
or use application-specific options; for example, netperf provides a -T option to set both server and client application affinity.
4.2.4 TX and RX Flow Steering
For the best performance, it is important to distribute traffic across multiple software queues. Distributing the traffic allows multiple CPU cores to share the processing load. There are several ways to distribute traffic across multiple software queues.
Enabled by default, RSS (Receive Side Scaling) provides a good mechanism for RX load distribution: it hashes different streams to separate RX queues to spread the load evenly. However, it does not consider application locality. For example, a flow could hash to queue 0, which is processed on core 0, while the application consuming that data runs on core 64, so nothing is gained from locality. (If the processing is highly CPU intensive, there can still be a benefit to having the application and the IRQ on different cores.) RFS (Receive Flow Steering) overcomes this shortcoming by steering packets to the CPU cores where the application thread is running, which increases the data cache hit rate. Broadcom NICs additionally support this steering in hardware, called aRFS (Accelerated RFS).
TX flow steering can be achieved by configuring XPS (Transmit Packet Steering), which ensures that TX interrupts are generated on the same core that is running the application.
Configure XPS by setting the appropriate local CPU mask for every TX queue as shown in the following steps:
1. Configure XPS for each TX queue using the following command:
echo [cpumask] > /sys/class/net/$ifname/queues/tx-$i/xps_cpus
NOTE: Traffic generated by applications running on the CPUs in this mask is transmitted on the ring tx-$i specified here.
To configure aRFS, use the following steps:
1. Enable ntuple filtering (required for aRFS) using the following command:
ethtool -K [interface] ntuple on
2. Configure RFS for each RX queue using the following commands:
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
Set each ring's flow count to rps_sock_flow_entries divided by the number of rings (32768 / number of rings):
echo [rps_flow_value] > /sys/class/net/[interface]/queues/rx-$i/rps_flow_cnt
NOTE: See the Linux kernel networking scaling documentation (scaling.txt) for more details on both of these mechanisms. A combined XPS and aRFS example follows.
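Putting the XPS and aRFS steps together, the following is a minimal sketch; the interface name, ring count, and the assumption that TX ring i is served by CPU i are placeholders to adjust for the actual system:
IFACE=ens1f0np0                                # example interface name; substitute your own
NRINGS=8                                       # number of combined rings configured with ethtool -L
ethtool -K $IFACE ntuple on                    # required for aRFS
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
FLOW_CNT=$((32768 / NRINGS))
for i in $(seq 0 $((NRINGS - 1))); do
    printf '%x' $((1 << i)) > /sys/class/net/$IFACE/queues/tx-$i/xps_cpus   # XPS: CPU i -> tx-$i
    echo $FLOW_CNT > /sys/class/net/$IFACE/queues/rx-$i/rps_flow_cnt        # per-ring RFS flow count
done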
4.2.5 TX/RX Queue Size
Increasing the TX and RX queue sizes allows more data to be queued for transmit and receive and helps avoid packet drops during periods of high data transfer.
Increase the TX and RX queue size to 2047 using the following command:
ethtool -G [interface] tx 2047 rx 2047
However, this is not suggested for all cases, as it also results in higher latency due to a larger buildup in the software rings. There can be other side effects, such as poor cache utilization, if the ring size exceeds the cache size.
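The supported maximum and currently configured ring sizes can be displayed with:
ethtool -g [interface]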
4.2.6 Interrupt Moderation
Interrupt moderation controls the rate of interrupts to the CPU during TX and RX processing. Too many interrupts (one interrupt per packet) increase CPU usage and adversely impact throughput, while too few interrupts (coalesced over time or a number of packets) increase latency.
Enabling adaptive-rx improves RX latency at low packet-receiving rates and improves throughput at high packet-receiving rates, providing a good performance balance.
ethtool -C [interface] adaptive-rx on rx-usecs 40 rx-frames 40
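The active interrupt coalescing parameters can be read back for verification:
ethtool -c [interface]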
4.2.7 GRO (Generic Receive Offload)
GRO is an aggregation technique that coalesces several received packets from the same stream into a single large packet, saving CPU cycles because fewer packets need to be processed by the kernel. By default, GRO is performed in software by the Linux kernel; however, Broadcom NICs also support hardware GRO.
ethtool -K [interface] rx-gro-hw on lro off gro on
Broadcom NICs support the aggregation in hardware, and it can coexist with software GRO.
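The resulting offload state can be confirmed with ethtool; the rx-gro-hw flag is only listed on kernels and drivers that expose hardware GRO:
ethtool -k [interface] | grep -iE 'generic-receive-offload|rx-gro-hw|large-receive-offload'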