REFEX: Remote FPGA-as-a-Service Exploitation
Newly reported attacks and vulnerabilities in FPGA-as-a-Service (FaaS) cloud platforms.
In this blog post, we disclose a series of security vulnerabilities in FPGA-as-a-Service (FaaS) platforms. We also plan to establish a common mechanism for sharing vulnerabilities and bugs discovered in such platforms, urging FaaS providers to offer their users secure computing platforms built with trust, assurance, and integrity in mind.
FaaS platforms let users rent cloud systems equipped with FPGA resources and accelerate custom applications using the reconfigurability and parallelism of FPGA hardware. In a typical use case, the environment resembles a heterogeneous acceleration ecosystem in which an application running on the host CPU offloads some of its computational tasks to an accelerator (such as a GPU or a reconfigurable fabric) to gain efficiency: parallel execution across a large number of processing elements, low latency, high bandwidth, and so on. On modern platforms that offer FPGAs as accelerators, the user can reconfigure the accelerator's underlying hardware to suit the intended application and dataset. This challenges a fundamental assumption of cloud environments: that the underlying hardware specification and functionality remain static and that users only ever access virtualized hardware.
- Remote DoS and code injection vulnerabilities
- Partial Reconfiguration DoS Vulnerability (CVE-2019-11165)
- DMA Page Lock Failure
- NULL Pointer Dereferencing (CVE-2019-11165)
- MemDrool: Global Memory Leakage
FaaS Platform Architecture
FaaS is an emerging architecture with only a handful of providers, such as AWS, Nimbix, and Baidu, serving public cloud users. In most cases, the FPGA hardware comes from well-known FPGA vendors such as Xilinx or Intel (formerly Altera). (For the SDKs supported by Intel and Xilinx, see the important links.)
As in a heterogeneous computing architecture, the host machine and the FPGA in the cloud are connected by a high-bandwidth PCIe link, as shown in Fig 1. The primary application running on the host first transfers the data that requires accelerated processing to the assigned FPGA, then reads back the result once the computation is done (often in chunks). Both transfers use the PCIe link with the help of vendor-provided APIs and libraries (such as OpenCL libraries). The host can dynamically reconfigure the hardware functionality on the FPGA through partial reconfiguration over PCIe to support a different set of host applications. At a lower abstraction level, the host communicates with the FPGA through a device driver, which is also provided by the FPGA vendor.
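The write, compute, and chunked read-back flow described above can be sketched with a small in-memory model. This is only an illustration: `MockAccelerator` and `offload` are hypothetical names, no real PCIe transfer or vendor API (such as OpenCL) is involved, and the "kernel" is an ordinary Python function standing in for the configured hardware.

```python
from typing import Callable, List


class MockAccelerator:
    """In-memory stand-in for an FPGA reached over PCIe (hypothetical)."""

    def __init__(self, kernel: Callable[[int], int]) -> None:
        self.kernel = kernel         # models the function configured on the fabric
        self.memory: List[int] = []  # models the device's external memory

    def write(self, data: List[int]) -> None:
        # Host -> device transfer of the whole input buffer.
        self.memory = list(data)

    def run(self) -> None:
        # The kernel processes every element held in device memory.
        self.memory = [self.kernel(x) for x in self.memory]

    def read(self, offset: int, length: int) -> List[int]:
        # Device -> host transfer of one chunk of results.
        return self.memory[offset:offset + length]


def offload(device: MockAccelerator, data: List[int], chunk_size: int = 4) -> List[int]:
    """Send data to the device, run the kernel, and read results back in chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    device.write(data)
    device.run()
    result: List[int] = []
    for offset in range(0, len(data), chunk_size):
        result.extend(device.read(offset, chunk_size))
    return result


if __name__ == "__main__":
    squarer = MockAccelerator(kernel=lambda x: x * x)
    print(offload(squarer, [1, 2, 3, 4, 5], chunk_size=2))
```

In a real deployment each of these three steps would go through the vendor's runtime and driver rather than a Python list, which is exactly the software layer the vulnerabilities in this post concern.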
Before the host application can partially reconfigure the FPGA for hardware acceleration, the FPGA must be recognized as a valid PCI/PCIe device and registered, and a corresponding driver module must be loaded into the host’s kernel. To that end, a static design is programmed into the FPGA’s flash; at power-on, it initializes the PCIe hard block, the external memory controller, the partial reconfiguration controller, and other important modules. Successful initialization of the FPGA device enables the host to recognize it as a valid PCI/PCIe device during PCI/PCIe enumeration (Link explains the PCI/PCIe enumeration process). The entire flow is shown in Fig 2.
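On a Linux host, the outcome of enumeration can be inspected through sysfs, where each enumerated device appears as a directory containing a `vendor` file. The following is a minimal sketch, assuming a Linux host; the function name `find_fpga_candidates` is ours, and matching on vendor ID alone only yields candidates, not confirmed FPGA boards.

```python
from pathlib import Path
from typing import List, Tuple

# PCI vendor IDs of the FPGA vendors named in this post.
# 0x8086 (Intel) is deliberately left out: it also matches most chipset
# devices on an Intel host, so it would need a device-ID filter as well.
FPGA_VENDOR_IDS = {0x10EE: "Xilinx", 0x1172: "Altera"}


def find_fpga_candidates(sysfs_root: str = "/sys/bus/pci/devices") -> List[Tuple[str, str]]:
    """Return (bus address, vendor name) for enumerated devices from known FPGA vendors."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []  # not a Linux host, or sysfs is not mounted
    found: List[Tuple[str, str]] = []
    for dev in sorted(root.iterdir()):
        try:
            # The file holds a hex string such as "0x10ee".
            vendor_id = int((dev / "vendor").read_text().strip(), 16)
        except (OSError, ValueError):
            continue  # unreadable or malformed entry; skip it
        name = FPGA_VENDOR_IDS.get(vendor_id)
        if name is not None:
            found.append((dev.name, name))
    return found


if __name__ == "__main__":
    for address, vendor in find_fpga_candidates():
        print(f"{address}: {vendor}")
```

If the static design failed to initialize the PCIe hard block, the board would simply be absent from this listing, and no driver could bind to it.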
Virtualizing a PCI/PCIe device in the hypervisor is costly and degrades throughput. As a performance-oriented solution, FaaS platforms generally use “PCI Passthrough” (Link explains the PCI Passthrough), a technology that assigns PCI/PCIe devices to virtual machines and gives each VM direct access to the physical hardware, as shown in Fig 3. This preserves the data transfer bandwidth between the VM and the PCI/PCIe devices.
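As an illustration of what passthrough looks like from the host side, the sketch below checks which kernel driver a PCI device is bound to. It assumes a Linux/KVM host that uses VFIO, where a device handed to a VM is typically bound to the `vfio-pci` driver. The post does not say which hypervisor stack any given FaaS provider uses, so treat this as one possible setup; the function names are ours.

```python
import os
from typing import Optional

SYSFS_PCI_DEVICES = "/sys/bus/pci/devices"


def bound_driver(bdf: str, sysfs_root: str = SYSFS_PCI_DEVICES) -> Optional[str]:
    """Return the name of the driver bound to the device, or None if unbound."""
    link = os.path.join(sysfs_root, bdf, "driver")
    if not os.path.islink(link):
        return None  # no driver bound, or the device does not exist
    # The symlink points at the driver's directory; its last component is the name.
    return os.path.basename(os.readlink(link))


def is_passed_through(bdf: str, sysfs_root: str = SYSFS_PCI_DEVICES) -> bool:
    """True if the device is bound to vfio-pci (assumed VFIO-based passthrough)."""
    return bound_driver(bdf, sysfs_root) == "vfio-pci"


if __name__ == "__main__":
    address = "0000:03:00.0"  # hypothetical bus address of the FPGA
    print(address, "->", bound_driver(address))
```

Inside the guest the same device shows up bound to the vendor's own driver, which is why driver-level bugs are reachable directly from a rented VM.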