XNNPACK: High-efficiency floating-point neural network inference operators

XNNPACK is a library of highly optimized low-level primitives for deep learning inference, developed by Google. It provides computational building blocks that target CPUs on mobile, desktop, server, and Web (WebAssembly) platforms. Rather than being used directly by most applications, XNNPACK serves as an acceleration backend for higher-level frameworks such as TensorFlow Lite, PyTorch, and ONNX Runtime, and it is designed to maximize inference throughput while minimizing latency and memory consumption.

XNNPACK offers a set of operators tailored to accelerate the computations that dominate inference networks, such as convolution, matrix multiplication (fully connected layers), pooling, activation functions, and element-wise operations. These operators are backed by efficient low-level implementations supporting 32-bit and 16-bit floating-point as well as quantized 8-bit integer representations. XNNPACK includes optimized code paths for ARM NEON (32-bit and 64-bit ARM), x86/x86-64 from SSE2 up to AVX-512, WebAssembly (including WebAssembly SIMD), and RISC-V.
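
To make the operator interface concrete, here is a minimal sketch that runs a single 32-bit floating-point fully-connected operator through XNNPACK's C API: initialize the library, create the operator with its weights and bias, bind input and output buffers, and execute it. The parameter lists shown follow the API as it appeared in earlier releases of xnnpack.h; more recent releases add weight-cache arguments and split setup into separate reshape and setup calls, so treat this as an illustration and check the header in your checkout for the exact signatures.

    #include <math.h>    /* INFINITY, used to disable output clamping */
    #include <stdio.h>
    #include <xnnpack.h>

    int main(void) {
      /* Initialize the library; NULL selects the default allocator. */
      if (xnn_initialize(NULL) != xnn_status_success) {
        fprintf(stderr, "failed to initialize XNNPACK\n");
        return 1;
      }

      /* A tiny fully-connected layer: 4 input channels -> 2 output channels.
         By default the kernel layout is [output_channels, input_channels]. */
      const float weights[2 * 4] = {
        1.0f, 0.0f, 0.0f, 0.0f,   /* output 0 reads input 0 */
        0.0f, 1.0f, 0.0f, 0.0f,   /* output 1 reads input 1 */
      };
      const float bias[2] = {0.5f, -0.5f};
      const float input[4] = {1.0f, 2.0f, 3.0f, 4.0f};
      float output[2] = {0.0f, 0.0f};

      xnn_operator_t fc = NULL;
      enum xnn_status status = xnn_create_fully_connected_nc_f32(
          /*input_channels=*/4, /*output_channels=*/2,
          /*input_stride=*/4, /*output_stride=*/2,
          weights, bias,
          /*output_min=*/-INFINITY, /*output_max=*/INFINITY,
          /*flags=*/0, &fc);
      if (status != xnn_status_success) {
        fprintf(stderr, "failed to create fully-connected operator\n");
        return 1;
      }

      /* Bind buffers and run; a NULL threadpool means single-threaded execution. */
      xnn_setup_fully_connected_nc_f32(fc, /*batch_size=*/1, input, output,
                                       /*threadpool=*/NULL);
      xnn_run_operator(fc, /*threadpool=*/NULL);

      printf("output = [%.1f, %.1f]\n", output[0], output[1]);  /* expect [1.5, 1.5] */

      xnn_delete_operator(fc);
      xnn_deinitialize();
      return 0;
    }

In practice most applications never call this API directly: frameworks such as TensorFlow Lite create, configure, and schedule XNNPACK operators on the application's behalf.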

XNNPACK's primary goals are to achieve the highest possible throughput on the hardware it supports while maintaining a low memory footprint. It is well suited to low-latency, real-time workloads such as video processing, speech recognition, and object detection. The project also ships microbenchmarks and end-to-end benchmarks that can be used to profile operator and model performance on different hardware architectures.

Overall, XNNPACK is a valuable tool for developers and researchers who want to get the most out of CPU hardware when running deep learning inference. It is open source, and because it is consumed through widely used frameworks, models can often benefit from its optimizations with little or no application-level code change.

Read more here: External Link