
Arm and Microsoft have joined forces to supercharge AI experiences across PCs and mobile devices powered by Arm CPUs. The integration of Arm's KleidiAI library into ONNX Runtime delivers up to 2.6x faster inference, letting intelligent features like chatbots and productivity tools run directly on the device: no cloud, no lag. The collaboration aims to simplify development while accelerating AI for end users. From flagship smartphones to Windows on Arm laptops, devices can now deliver faster, smarter AI thanks to this scalable, future-ready optimization. It's AI that works where you are: on-device, instantly, and efficiently.
Optimized AI, Seamless Integration
Arm's KleidiAI is now embedded in ONNX Runtime, bringing efficient neural network inference to developers without requiring code changes. The library uses Arm vector extensions such as Neon and SVE2 to improve performance across a wide range of Arm devices. The impact is real: Windows on Arm PCs see up to 2.4x faster prompt processing and 12% faster token generation with models like Phi-3 Mini. That translates into more responsive AI, smoother interaction, and better productivity experiences on-device.
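To make the "no code changes" point concrete, here is a minimal, hypothetical sketch of ordinary ONNX Runtime inference in Python. Nothing in it is Arm-specific: it uses the stock CPU execution provider, and on Arm hardware the runtime is expected to pick its optimized kernels internally. The model path and the assumption of a single float32 input are illustrative, not details published by Arm or Microsoft.

```python
# Minimal sketch of unchanged ONNX Runtime inference code.
# "model.onnx" is a placeholder path; assumes a model with a single float32 input.
import numpy as np
import onnxruntime as ort

# The default CPU execution provider is enough: on Arm devices the runtime is
# expected to select its optimized kernels internally, with no backend-specific code.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Build a dummy input matching the model's first declared input.
meta = session.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in meta.shape]  # fill dynamic dims with 1
dummy = np.zeros(shape, dtype=np.float32)

# Run inference exactly as on any other platform.
outputs = session.run(None, {meta.name: dummy})
print([o.shape for o in outputs])
```

The same script runs unmodified on an x86 desktop, a Windows on Arm laptop, or an Android device with Python bindings available; the performance difference comes from the kernels the runtime chooses, not from the application code.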
The benefits extend to Android, too. Benchmark tests on the vivo X200 Pro, powered by the latest Armv9 CPUs, show a 2.6x speed-up in prompt response. These performance gains are available today with ONNX Runtime v1.22, making AI acceleration widely accessible. Developers don’t need to rewrite backends or overhaul applications. Whether enhancing Microsoft 365, Copilot features, or third-party apps, this integration enables better local AI with less latency and improved power efficiency. It’s AI at the edge, faster and easier.
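Because the optimizations ship with ONNX Runtime v1.22, the only preparation an existing app needs is a version check. The snippet below is an illustrative sketch: it confirms the installed version and lists the execution providers compiled into the build; on Arm devices, the default CPU execution provider is where the optimized kernels are expected to be picked up.

```python
# Sketch: verify the runtime version and inspect available execution providers.
import onnxruntime as ort

# The Arm-optimized path is described as shipping with ONNX Runtime 1.22 or newer.
major, minor = (int(part) for part in ort.__version__.split(".")[:2])
assert (major, minor) >= (1, 22), f"found ONNX Runtime {ort.__version__}, need 1.22+"

# No special provider is required; the default CPU execution provider is the
# path expected to carry the Arm-optimized kernels on Arm hardware.
print(ort.get_available_providers())
```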
Scalable AI for Every Arm Device
This collaboration between Arm and Microsoft enables more than just faster inference; it unlocks consistent AI performance across platforms. KleidiAI is designed to be architecture-aware, working with current and next-gen Arm features like SME and SVE2. That means developers building AI apps today can expect them to scale easily on future hardware without rewriting their code. The result is a reliable, long-term foundation for AI development across Windows and Android.
It's also a big win for developers and OEMs. With this integration, they can build or enhance applications without added cost or engineering burden. AI models run locally, improving responsiveness while reducing reliance on internet connectivity or cloud servers. From everyday apps to advanced productivity tools, the user experience gets an instant upgrade. Microsoft's ONNX Runtime already powers tools like Copilot and Microsoft 365; this integration makes those experiences even faster. Together, Arm and Microsoft are removing friction and democratizing access to optimized AI.
The Edge Gets Smarter, Faster
By integrating KleidiAI into ONNX Runtime, Arm and Microsoft have delivered a major upgrade to on-device AI performance. Developers get efficient, scalable inference without complex changes, while users benefit from smoother, smarter applications. Whether it's a PC, tablet, or phone, Arm-powered devices can run intelligent features locally, with better performance and lower power usage. This sets a new standard for AI at the edge. It's not just about speed; it's about putting computation, and control over it, back on the device. As AI continues to evolve, this collaboration lays the groundwork for widespread, practical adoption.