111 lines
4.7 KiB
ReStructuredText
111 lines
4.7 KiB
ReStructuredText
.. SPDX-License-Identifier: GPL-2.0
|
|
|
|
============
|
|
Introduction
|
|
============
|
|
|
|
The Linux compute accelerators subsystem is designed to expose compute
|
|
accelerators in a common way to user-space and provide a common set of
|
|
functionality.
|
|
|
|
These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU.
|
|
Although these devices are typically designed to accelerate
|
|
Machine-Learning (ML) and/or Deep-Learning (DL) computations, the accel layer
|
|
is not limited to handling these types of accelerators.
|
|
|
|
Typically, a compute accelerator will belong to one of the following
|
|
categories:
|
|
|
|
- Edge AI - doing inference at an edge device. It can be an embedded ASIC/FPGA,
|
|
or an IP inside a SoC (e.g. laptop web camera). These devices
|
|
are typically configured using registers and can work with or without DMA.
|
|
|
|
- Inference data-center - single/multi user devices in a large server. This
|
|
type of device can be stand-alone or an IP inside a SoC or a GPU. It will
|
|
have on-board DRAM (to hold the DL topology), DMA engines and
|
|
command submission queues (either kernel or user-space queues).
|
|
It might also have an MMU to manage multiple users and might also enable
|
|
virtualization (SR-IOV) to support multiple VMs on the same device. In
|
|
addition, these devices will usually have some tools, such as profiler and
|
|
debugger.
|
|
|
|
- Training data-center - Similar to Inference data-center cards, but typically
|
|
have more computational power and memory b/w (e.g. HBM) and will likely have
|
|
a method of scaling-up/out, i.e. connecting to other training cards inside
|
|
the server or in other servers, respectively.
|
|
|
|
All these devices typically have different runtime user-space software stacks,
|
|
that are tailored-made to their h/w. In addition, they will also probably
|
|
include a compiler to generate programs to their custom-made computational
|
|
engines. Typically, the common layer in user-space will be the DL frameworks,
|
|
such as PyTorch and TensorFlow.
|
|
|
|
Sharing code with DRM
|
|
=====================
|
|
|
|
Because this type of devices can be an IP inside GPUs or have similar
|
|
characteristics as those of GPUs, the accel subsystem will use the
|
|
DRM subsystem's code and functionality. i.e. the accel core code will
|
|
be part of the DRM subsystem and an accel device will be a new type of DRM
|
|
device.
|
|
|
|
This will allow us to leverage the extensive DRM code-base and
|
|
collaborate with DRM developers that have experience with this type of
|
|
devices. In addition, new features that will be added for the accelerator
|
|
drivers can be of use to GPU drivers as well.
|
|
|
|
Differentiation from GPUs
|
|
=========================
|
|
|
|
Because we want to prevent the extensive user-space graphic software stack
|
|
from trying to use an accelerator as a GPU, the compute accelerators will be
|
|
differentiated from GPUs by using a new major number and new device char files.
|
|
|
|
Furthermore, the drivers will be located in a separate place in the kernel
|
|
tree - drivers/accel/.
|
|
|
|
The accelerator devices will be exposed to the user space with the dedicated
|
|
261 major number and will have the following convention:
|
|
|
|
- device char files - /dev/accel/accel*
|
|
- sysfs - /sys/class/accel/accel*/
|
|
- debugfs - /sys/kernel/debug/accel/accel*/
|
|
|
|
Getting Started
|
|
===============
|
|
|
|
First, read the DRM documentation at Documentation/gpu/index.rst.
|
|
Not only it will explain how to write a new DRM driver but it will also
|
|
contain all the information on how to contribute, the Code Of Conduct and
|
|
what is the coding style/documentation. All of that is the same for the
|
|
accel subsystem.
|
|
|
|
Second, make sure the kernel is configured with CONFIG_DRM_ACCEL.
|
|
|
|
To expose your device as an accelerator, two changes are needed to
|
|
be done in your driver (as opposed to a standard DRM driver):
|
|
|
|
- Add the DRIVER_COMPUTE_ACCEL feature flag in your drm_driver's
|
|
driver_features field. It is important to note that this driver feature is
|
|
mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want
|
|
to expose both graphics and compute device char files should be handled by
|
|
two drivers that are connected using the auxiliary bus framework.
|
|
|
|
- Change the open callback in your driver fops structure to accel_open().
|
|
Alternatively, your driver can use DEFINE_DRM_ACCEL_FOPS macro to easily
|
|
set the correct function operations pointers structure.
|
|
|
|
External References
|
|
===================
|
|
|
|
email threads
|
|
-------------
|
|
|
|
* `Initial discussion on the New subsystem for acceleration devices <https://lkml.org/lkml/2022/7/31/83>`_ - Oded Gabbay (2022)
|
|
* `patch-set to add the new subsystem <https://lkml.org/lkml/2022/10/22/544>`_ - Oded Gabbay (2022)
|
|
|
|
Conference talks
|
|
----------------
|
|
|
|
* `LPC 2022 Accelerators BOF outcomes summary <https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html>`_ - Dave Airlie (2022)
|