FreeBSD kernel driver for Elastic Network Adapter (ENA) family:
===============================================================

Version:
========
0.8.1

Supported FreeBSD Versions:
===========================
- FreeBSD release/11.0
- FreeBSD release/11.1
- FreeBSD 12, starting from rS309111,
  commit cbc182c1724fcd3ee240ed9933fcc53eb6db5b9b (Nov 24, 2016).

Overview:
=========
ENA is a networking interface designed to make good use of modern CPU
features and system architectures.

The ENA device exposes a lightweight management interface with a
minimal set of memory-mapped registers and an extendable command set
through an Admin Queue.

The driver supports a range of ENA devices, is link-speed independent
(i.e., the same driver is used for 10GbE, 25GbE, 40GbE, etc.), and has
a negotiated and extendable feature set.

Some ENA devices support SR-IOV. This driver is used for both the
SR-IOV Physical Function (PF) and Virtual Function (VF) devices.

ENA devices enable high speed and low overhead network traffic
processing by providing multiple Tx/Rx queue pairs (the maximum number
is advertised by the device via the Admin Queue), a dedicated MSI-X
interrupt vector per Tx/Rx queue pair, and CPU cacheline optimized
data placement.

The ENA driver supports industry standard TCP/IP offload features such
as checksum offload and TCP transmit segmentation offload (TSO).
Receive-side scaling (RSS) is supported for multi-core scaling.

The ENA driver and its corresponding devices implement debug logs and
health monitoring mechanisms such as a watchdog, which enables the
device and driver to recover in a manner transparent to the
application.

Some ENA devices support a working mode called Low-latency Queue
(LLQ), which saves several more microseconds. This feature will be
implemented in the driver in a future release.

Driver compilation:
===================
Prerequisites:
--------------
Building and running the standalone driver requires the OS sources
corresponding to the currently installed OS version.

The way to retrieve the sources depends on the user's configuration.
On a fresh installation on an EC2 instance, the necessary steps may
look as shown below (some may require superuser privileges):

pkg install subversion
mkdir /usr/src
# Get sources for current installation. This step may require
# accepting certificate.
# Getting sources may vary between system versions. The resources need
# to be adjusted accordingly.
# - for stable:
svn checkout https://svn0.us-east.FreeBSD.org/base/stable/11/ /usr/src
# - for release (FreeBSD 11.1)
svn checkout https://svn0.us-east.FreeBSD.org/base/releng/11.1/ /usr/src
# - for -CURRENT (unstable)
svn checkout https://svn0.us-east.freebsd.org/base/head /usr/src
# - for kernel version currently running in the system
uname -a # provides revision of running kernel
# Sample output:
# FreeBSD host 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r316750: current_date
# r316750 is indicating revision of current kernel
# In this example, we have to pull kernel tree with revision r316750 from head:
svn checkout -r316750 https://svn0.us-east.freebsd.org/base/head /usr/src
# r316750 must be changed to the revision number from the 'uname -a' output

Compilation:
------------
Run "make" in the amzn-drivers/kernel/fbsd/ena/ directory.
Compilation creates the if_ena.ko kernel module file in the same
directory.

Driver installation:
====================
Loading driver:
---------------
kldload ./if_ena.ko

Automatic driver start upon OS boot:
------------------------------------
vi /boot/loader.conf
# insert 'if_ena_load="YES"' in the above file
cp if_ena.ko /boot/modules/
sync; sleep 30;

Then restart the OS (reboot and reconnect).

Driver update - if the kernel was built with ENA:
-------------------------------------------------
vi /boot/loader.conf
# insert 'if_ena_load="YES"' in the above file
cp if_ena.ko /boot/modules/

# remove old module
rm /boot/kernel/if_ena.ko
sync; sleep 30;

Then restart the OS (reboot and reconnect).

Supported PCI vendor ID/device IDs:
===================================
1d0f:0ec2 - ENA PF
1d0f:1ec2 - ENA PF with LLQ support
1d0f:ec20 - ENA VF
1d0f:ec21 - ENA VF with LLQ support
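
A probe routine typically compares the vendor and device IDs read from
PCI configuration space against a table like the one above. The
following is a minimal user-space sketch of such a lookup. The names
ena_id_entry, ena_id_table and ena_lookup_device are hypothetical and
are not taken from the driver sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry; not the driver's actual structure. */
struct ena_id_entry {
	uint16_t vendor_id;
	uint16_t device_id;
	const char *description;
};

/* IDs copied from the list of supported devices above. */
static const struct ena_id_entry ena_id_table[] = {
	{ 0x1d0f, 0x0ec2, "ENA PF" },
	{ 0x1d0f, 0x1ec2, "ENA PF with LLQ support" },
	{ 0x1d0f, 0xec20, "ENA VF" },
	{ 0x1d0f, 0xec21, "ENA VF with LLQ support" },
};

/* Return the description of a supported device, or NULL if unsupported. */
static const char *
ena_lookup_device(uint16_t vendor_id, uint16_t device_id)
{
	size_t i;

	for (i = 0; i < sizeof(ena_id_table) / sizeof(ena_id_table[0]); i++) {
		if (ena_id_table[i].vendor_id == vendor_id &&
		    ena_id_table[i].device_id == device_id)
			return (ena_id_table[i].description);
	}
	return (NULL);
}
```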

ENA Source Code Directory Structure:
====================================
/*
ena.[ch]          - Main FreeBSD kernel driver.
ena_sysctl.[ch]   - ENA sysctl nodes for ENA configuration and statistics.

ena_com/*
ena_com.[ch]      - Management communication layer. This layer is
                    responsible for handling all the management
                    (admin) communication between the device and the
                    driver.
ena_eth_com.[ch]  - Tx/Rx data path.
ena_admin_defs.h  - Definition of ENA management interface.
ena_eth_io_defs.h - Definition of ENA data path interface.
ena_common_defs.h - Common definitions for ena_com layer.
ena_regs_defs.h   - Definition of ENA PCI memory-mapped (MMIO) registers.
ena_plat.h        - Platform dependent code for FreeBSD.

Management Interface:
=====================
The ENA management interface is exposed by means of:
- PCIe Configuration Space
- Device Registers
- Admin Queue (AQ) and Admin Completion Queue (ACQ)
- Asynchronous Event Notification Queue (AENQ)

ENA device MMIO Registers are accessed only during driver
initialization and are not involved in further normal device
operation.

AQ is used for submitting management commands, and the
results/responses are reported asynchronously through ACQ.

ENA introduces a very small set of management commands with room for
vendor-specific extensions. Most of the management operations are
framed in a generic Get/Set feature command.

The following admin queue commands are supported:
- Create I/O submission queue
- Create I/O completion queue
- Destroy I/O submission queue
- Destroy I/O completion queue
- Get feature
- Set feature
- Configure AENQ
- Get statistics

Refer to ena_admin_defs.h for the list of supported Get/Set Feature
properties.

The Asynchronous Event Notification Queue (AENQ) is a uni-directional
queue used by the ENA device to send the driver events that cannot
be reported using ACQ. AENQ events are subdivided into groups. Each
group may have multiple syndromes.

The events are:
	Group			Syndrome
	Link state change	- X -
	Fatal error		- X -
	Notification		Suspend traffic
	Notification		Resume traffic
	Keep-Alive		- X -

ACQ and AENQ share the same MSI-X vector.

Keep-Alive is a special mechanism that allows monitoring of the
device's health. The driver maintains a watchdog (WD) handler which,
if fired, logs the current state and statistics, then resets and
restarts the ENA device and driver. The device delivers a Keep-Alive
event every second, and the driver re-arms the WD each time it
receives one. A missed Keep-Alive event causes the WD handler to
fire.
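
The re-arm and expiry logic described above can be sketched in plain C
as follows. This is an illustration under stated assumptions, not the
driver's code: the structure ena_wd, both functions and the timeout
value used with them are hypothetical, and time is passed in as a
millisecond counter rather than read from a kernel clock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical watchdog state; all times are in milliseconds. */
struct ena_wd {
	uint64_t last_keep_alive_ms;	/* arrival time of the last event */
	uint64_t timeout_ms;		/* assumed grace period */
};

/* Called when a Keep-Alive AENQ event arrives: re-arm the watchdog. */
static void
ena_wd_rearm(struct ena_wd *wd, uint64_t now_ms)
{
	wd->last_keep_alive_ms = now_ms;
}

/* Called periodically: true means the WD fires and a reset is needed. */
static bool
ena_wd_expired(const struct ena_wd *wd, uint64_t now_ms)
{
	return (now_ms - wd->last_keep_alive_ms > wd->timeout_ms);
}
```

With Keep-Alive events arriving once per second, a timeout of a few
seconds would tolerate a delayed event while still detecting a device
that has stopped responding.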

Data Path Interface:
====================
I/O operations are based on Tx and Rx Submission Queues (Tx SQ and Rx
SQ respectively). Each SQ has a completion queue (CQ) associated
with it.

The SQs and CQs are implemented as descriptor rings in contiguous
physical memory.

Version 0.8.1 of the ENA driver supports one Queue Operation mode for Tx SQs:
- Regular mode
  * In this mode the Tx SQs reside in the host's memory. The ENA
    device fetches the ENA Tx descriptors and packet data from host
    memory.

The Rx SQs support only the regular mode.

The driver supports multi-queue for both Tx and Rx. This has various
benefits:
- Reduced CPU/thread/process contention on a given Ethernet interface.
- Cache miss rate on completion is reduced, particularly for data
  cache lines that hold the mbuf structures.
- Increased process-level parallelism when handling received packets.
- Increased data cache hit rate, by steering kernel processing of
  packets to the CPU where the application thread consuming the
  packet is running.
- Interrupt redirection performed in hardware.

Interrupt Modes:
================
The driver assigns a single MSI-X vector per queue pair (for both Tx
and Rx directions). The driver assigns an additional dedicated MSI-X vector
for management (for ACQ and AENQ).

Management interrupt registration is performed when the FreeBSD kernel
attaches the adapter, and it is de-registered when the adapter is
removed. I/O queue interrupt registration is performed when the FreeBSD
interface of the adapter is opened, and it is de-registered when the
interface is closed.

The management interrupt is named:
   ena-mgmnt@pci:<PCI domain:bus:slot.function>
and for each queue pair, an interrupt is named:
   <interface name>-TxRx-<queue index>

The ENA device operates in auto-mask and auto-clear interrupt
modes. That is, once an MSI-X interrupt is delivered to the host, its
Cause bit is automatically cleared and the interrupt is masked. The
driver unmasks the interrupt after all Tx and Rx packets have been
cleaned, or after the cleanup routine has been called 8 times while
handling a single interrupt.
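
The bounded cleanup loop can be sketched as below. This is a simplified
user-space model, not the driver's handler: struct ena_qpair, the
clean_some() helper and the per-call budget are assumptions made for
illustration. Only the cap of 8 cleanup calls per interrupt comes from
the description above.

```c
#include <stdbool.h>

#define CLEAN_LOOP_MAX	8	/* cap on cleanup calls per interrupt */

/* Hypothetical queue pair: descriptors still waiting to be cleaned. */
struct ena_qpair {
	int tx_pending;
	int rx_pending;
	bool irq_masked;	/* set by the device on interrupt delivery */
};

/* Clean at most 'budget' descriptors; return how many were cleaned. */
static int
clean_some(int *pending, int budget)
{
	int done = (*pending < budget) ? *pending : budget;

	*pending -= done;
	return (done);
}

/* Handle one interrupt; returns the number of cleanup iterations run. */
static int
ena_handle_irq(struct ena_qpair *q, int budget)
{
	int iters = 0, txc, rxc;

	do {
		txc = clean_some(&q->tx_pending, budget);
		rxc = clean_some(&q->rx_pending, budget);
		iters++;
		/* A full budget in either direction means more work may remain. */
	} while ((txc == budget || rxc == budget) && iters < CLEAN_LOOP_MAX);

	q->irq_masked = false;	/* the driver unmasks after cleanup */
	return (iters);
}
```

Capping the number of iterations keeps one busy queue pair from holding
the handler indefinitely; any remaining work is picked up on the next
interrupt.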

Statistics:
===========
The user can obtain ENA device and driver statistics using sysctl.

MTU:
====
The driver supports an arbitrarily large MTU with a maximum that is
negotiated with the device. The driver configures MTU using the
SetFeature command (ENA_ADMIN_MTU property). The user can change MTU
via ifconfig.

Stateless Offloads:
===================
The ENA driver supports:
- TSO over IPv4/IPv6
- IPv4 header checksum offload
- TCP/UDP over IPv4/IPv6 checksum offloads

RSS:
====
- The ENA device supports RSS that allows flexible Rx traffic
  steering.
- Toeplitz and CRC32 hash functions are supported.
- Different combinations of L2/L3/L4 fields can be configured as
  inputs for hash functions.
- The driver configures RSS settings using the AQ SetFeature command
  (ENA_ADMIN_RSS_HASH_FUNCTION, ENA_ADMIN_RSS_HASH_INPUT and
  ENA_ADMIN_RSS_REDIRECTION_TABLE_CONFIG properties).
- The driver selects the CRC32 function by default. In 0.8.1 the RSS
  settings cannot be configured manually.

DATA PATH:
==========
Tx:
---
ena_mq_start() is called by the stack. This function does the following:
- Assigns the mbuf to the proper Tx queue according to hash type and flowid.
- Puts the packet in the drbr (multi-producer, {single, multi}-consumer
  lock-less ring buffer).
- If the drbr was empty before the packet was added, tries to acquire the
  Tx queue lock and, on success, runs ena_start_xmit() to send the packet
  that was just added.
- If the lock cannot be acquired, enqueues the ena_deferred_mq_start() task,
  which runs ena_start_xmit() in a different thread and processes all of
  the packets in the drbr.
- ena_start_xmit() performs the following steps:
  * Checks whether there is still enough space in the hw queues; if not,
    calls ena_tx_cleanup() directly.
  * Calls ena_xmit_mbuf() for each mbuf in the drbr, until the drbr is empty
    or a transmission error occurs.
  * ena_xmit_mbuf() sends mbufs to the ENA device in the following steps:
    + Maps the mbufs for DMA transactions, defragmenting them if necessary.
    + Allocates a new request ID from the empty req_id ring. The request
      ID is the index of the packet in the Tx info. This is used for
      out-of-order Tx completions.
    + Adds the packet to the proper place in the Tx ring.
    + Calls ena_com_prepare_tx(), an ENA communication layer function that
      converts the ena_bufs to ENA descriptors (and adds meta ENA
      descriptors as needed).
      # This function also copies the ENA descriptors and the push buffer
        to the device memory space (if in push mode).
  * Writes doorbells to the ENA device.
  * After emptying the drbr, if the hw Tx queue is low on space, calls
    ena_tx_cleanup().
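
The first step, choosing a Tx queue, can be sketched as a small pure
function. This is a hedged illustration: the function name, its
parameters and the fallback to the current CPU index for packets that
carry no flow hash are assumptions, not details confirmed by this
document.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical queue selection. Packets that carry a flow hash are
 * spread by flowid; others fall back to the current CPU index
 * (an assumed policy). num_queues must be non-zero.
 */
static uint32_t
ena_select_tx_queue(bool has_flow_hash, uint32_t flowid, uint32_t cur_cpu,
    uint32_t num_queues)
{
	uint32_t key = has_flow_hash ? flowid : cur_cpu;

	return (key % num_queues);
}
```

Using the flowid keeps all packets of one flow on the same queue, which
preserves their ordering on the wire.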

When the ENA device finishes sending the packet, a completion
interrupt is raised:
- The interrupt handler cleans Rx and Tx descriptors in a loop until all
  descriptors are cleaned up or the number of loop iterations exceeds the
  maximum value.
- The ena_tx_cleanup() function is called. This function handles the
  completion descriptors generated by the ENA, with a single completion
  descriptor per completed packet.
  * req_id is retrieved from the completion descriptor. The tx_info of
    the packet is retrieved via the req_id. The data buffers are
    unmapped and req_id is returned to the empty req_id ring.
  * The function stops when all completion descriptors have been processed
    or the given budget is depleted.
- All interrupts are unmasked.
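
The req_id handling on the transmit and completion paths can be modelled
with a small ring of free IDs. The sketch below is illustrative only:
the structure, the function names and the ring size of 4 are
hypothetical and do not come from the driver.

```c
#include <stdint.h>

#define RING_SIZE	4	/* small size, for illustration only */

/* Hypothetical ring of free request IDs. */
struct req_id_ring {
	uint16_t ids[RING_SIZE];
	uint16_t next_to_use;	/* slot the next allocation reads from */
	uint16_t next_to_clean;	/* slot the next returned ID is written to */
	uint16_t free_count;
};

static void
req_id_ring_init(struct req_id_ring *r)
{
	uint16_t i;

	for (i = 0; i < RING_SIZE; i++)
		r->ids[i] = i;
	r->next_to_use = 0;
	r->next_to_clean = 0;
	r->free_count = RING_SIZE;
}

/* Take a free request ID; returns -1 when every ID is in flight. */
static int
req_id_alloc(struct req_id_ring *r)
{
	int id;

	if (r->free_count == 0)
		return (-1);
	id = r->ids[r->next_to_use];
	r->next_to_use = (r->next_to_use + 1) % RING_SIZE;
	r->free_count--;
	return (id);
}

/* Return a completed request ID; completions may arrive in any order. */
static void
req_id_free(struct req_id_ring *r, uint16_t id)
{
	r->ids[r->next_to_clean] = id;
	r->next_to_clean = (r->next_to_clean + 1) % RING_SIZE;
	r->free_count++;
}
```

Because the ring stores the IDs themselves rather than assuming they
return in sequence, an ID freed out of order is simply handed out again
on a later allocation.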

Rx:
---
When a packet is received from the ENA device:
- The interrupt handler cleans Rx and Tx descriptors in a loop until all
  descriptors are cleaned up or the total number of loop iterations exceeds
  the maximum value.
- The ena_rx_cleanup() function is called. This function calls
  ena_com_rx_pkt(), an ENA communication layer function, which returns the
  number of descriptors used for a new unhandled packet, and zero if
  no new packet is found.
- It then calls the ena_rx_mbuf() function. The new mbuf is updated with
  the necessary information (protocol, checksum hw verify result, etc.).
- The mbuf is then passed to the network stack using the ifp->if_input
  function, or to tcp_lro_rx() if LRO is enabled and the packet is of type
  TCP/IP with the TCP checksum computed by the hardware.
- The function stops when all packets are handled or the given budget is
  depleted.

Unsupported features:
=====================
- LLQ support
- RSS configuration by user

Known issues:
=============
- The FLOWTABLE option (per-CPU routing cache) leads to a system crash on
  both FreeBSD 11 and FreeBSD 12-CURRENT.