Arm has created a new version of its microNPU (neural processing unit) IP that is suitable for use alongside Cortex-A CPU cores in application processors. Lead licensee NXP plans to use this IP in an upcoming family of application processors that can handle AI applications such as pose estimation, multi-face recognition and object detection in videos, and speech recognition beyond basic keyword spotting.
Arm’s existing microNPU product, the Ethos-U55, launched in February 2020, is aimed at microcontroller-class products alongside Cortex-M cores. It provides up to 0.5 TOPS of acceleration (based on smaller geometries such as 16 or 7 nm, running at 1 GHz), with between 32 and 256 multiply-accumulate units (MACs). Arm’s portfolio also has the Ethos-N77, N57 and N37, which offer 4, 2 and 1 TOPS, respectively.
The Ethos-U65 is designed to maintain the Ethos-U55’s power efficiency while doubling the MACs available – up to 512 parallel MACs at 1 GHz – for a total of 1 TOPS. This power/performance combination is specifically for use alongside Cortex-A cores in application processor-class devices. The Ethos-U line’s native support for ML operators has also been updated and expanded, according to Arm.
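The headline TOPS figures follow from the MAC counts and clock speed quoted above. The short sketch below reproduces that arithmetic, assuming the common convention that one MAC counts as two operations (a multiply and an add); the article does not state which convention Arm uses, and the function name `peak_tops` is purely illustrative.

```python
def peak_tops(macs: int, clock_hz: float, ops_per_mac: int = 2) -> float:
    """Peak throughput in TOPS.

    Assumes each MAC unit completes one multiply-accumulate per cycle and
    that a MAC counts as two operations (assumption, not from the article).
    """
    return macs * ops_per_mac * clock_hz / 1e12


# MAC counts and the 1 GHz clock are the figures quoted in the article.
u55 = peak_tops(macs=256, clock_hz=1e9)
u65 = peak_tops(macs=512, clock_hz=1e9)

print(f"Ethos-U55 (256 MACs): {u55:.3f} TOPS")
print(f"Ethos-U65 (512 MACs): {u65:.3f} TOPS")
```

Under that assumption the results are roughly 0.5 and 1 TOPS, in line with the rounded figures Arm quotes. These are peak numbers; sustained throughput depends on memory bandwidth and on how well a given network maps onto the hardware.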
Lead technology partner (and lead licensee) NXP worked closely with Arm on defining the system-level aspects of the Ethos-U65 and has said that it will integrate the Ethos-U65 IP into its next generation of i.MX application processors.
The architecture NXP has in mind for an AI applications processor would see the Ethos-U65 microNPU sitting alongside Cortex-M and Cortex-A cores, explained Ben Eckermann, Chief Engineer of AI/ML Hardware, NXP. The Cortex-A runs the application, handling drivers for (say) a microphone or camera, and presenting a workload to the NPU. The microNPU and the Cortex-M compute the machine learning workload and present the answer back to the Cortex-A.
“Just like an Ethos-U55, [the Ethos-U65] relies on a Cortex-M processor to be somewhere nearby in the system, just in case there are machine learning operators from the neural network which do not make sense to be offloaded entirely in hardware,” Eckermann said. “There are a lot of machine learning operators which are rarely called and may not justify being implemented purely in hardware.”
The flexibility offered by the Cortex-M in this configuration also allows a certain amount of future-proofing, Eckermann said.
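The division of labor Eckermann describes can be pictured as a partitioning step: operators the microNPU supports are grouped for offload, and anything else falls back to the Cortex-M. The sketch below is only an illustration of that idea. The operator names, the `NPU_SUPPORTED` set and the `partition` function are hypothetical and do not reflect Arm's or NXP's actual tooling or the real list of accelerated operators.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical operator set, for illustration only: the article does not
# say which operators the Ethos-U65 accelerates in hardware.
NPU_SUPPORTED = {"conv2d", "depthwise_conv2d", "fully_connected", "max_pool"}


@dataclass
class Segment:
    target: str      # "npu" or "cortex-m"
    ops: List[str]   # consecutive operators that run on that target


def partition(ops: List[str]) -> List[Segment]:
    """Group consecutive operators by where they would execute."""
    segments: List[Segment] = []
    for op in ops:
        target = "npu" if op in NPU_SUPPORTED else "cortex-m"
        if segments and segments[-1].target == target:
            segments[-1].ops.append(op)
        else:
            segments.append(Segment(target, [op]))
    return segments


# A made-up network containing one rarely used operator.
model = ["conv2d", "max_pool", "rare_custom_op", "fully_connected"]
plan = partition(model)
for seg in plan:
    print(seg.target, seg.ops)
```

In this toy model, a new operator that appears after the silicon ships simply lands in a Cortex-M segment, which is one way to read the future-proofing argument.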
The U65 has been given wider internal system buses than the U55, and has been tailored to cope with the extra buffering and latency associated with DRAM (common in systems that use application processors, whereas Cortex-M systems typically use SRAM).
Populating the portfolio
NXP is filling out its portfolio of AI-enabled SoCs.
The company’s previously announced i.MX 8M+ application processors have an NPU AI accelerator block alongside dual or quad Cortex-A53 cores. The NPU IP included in these products is not from Arm. It’s a VeriSilicon design that offers 2.3 TOPS, enough for scene segmentation, live video face and object recognition or speech accent interpretation. At the lower end of the spectrum, NXP’s microcontrollers with Arm Ethos-U55 NPUs can handle person detection, wake word detection and video denoising.
Ethos-U65-enabled products will sit in between these two product categories, handling AI applications like multi-face recognition, more involved speech recognition, or pose estimation. Any new products in this category will therefore not overlap with the i.MX 8M+, said Eckermann.