
F5 has introduced powerful new AI-enhanced capabilities in its BIG-IP Next for Kubernetes platform, leveraging NVIDIA BlueField-3 DPUs and the NVIDIA DOCA framework. The collaboration, validated by Sesterce, a leading European AI infrastructure provider, marks a significant step in optimizing AI application delivery with better speed, security, and resource efficiency.
F5 and NVIDIA Deliver AI-First App Infrastructure
The collaboration between F5 and NVIDIA delivers high-performance traffic management, robust security, and advanced multi-tenancy for AI workloads. Running natively on BlueField-3 DPUs, F5's BIG-IP Next for Kubernetes now offers up to 20% improved GPU utilization, smart LLM routing via NVIDIA NIM microservices, and reduced inference latency through integration with NVIDIA Dynamo and its KV Cache Manager. It also supports scalable, secure Model Context Protocol (MCP) deployments and enhanced programmability with F5 iRules for real-time AI infrastructure customization.
Sesterce Validates Real-World Performance Gains
Sesterce's deployment of the F5-NVIDIA solution demonstrates its efficiency in high-volume Kubernetes environments. According to Sesterce CEO Youssef El Manssouri, the integration has improved GPU traffic distribution and optimized workload performance, especially under demanding LLM inference conditions.
F5 and NVIDIA Boost AI Performance with Smart LLM Routing and KV Caching
F5’s smart LLM routing technology dynamically directs user requests between lightweight and large language models based on task complexity. This intelligent traffic management reduces latency, improves time-to-first-token performance, and enhances the overall accuracy of AI applications by assigning the right model for each query. The customizable routing logic ensures users get faster, more relevant results with optimized resource use.
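As a concrete illustration, the sketch below shows what complexity-based routing logic could look like. It is a minimal Python approximation: the complexity heuristic, model names, endpoints, and threshold are invented for demonstration and do not represent F5's actual routing implementation, which is configured within BIG-IP Next and iRules.

```python
# Conceptual sketch of complexity-based LLM routing (illustrative only;
# F5's production routing lives in BIG-IP Next configuration, not app code).

from dataclasses import dataclass

@dataclass
class ModelEndpoint:
    name: str
    url: str  # hypothetical inference endpoint

LIGHT = ModelEndpoint("light-llm", "http://light-llm.internal/v1/completions")
LARGE = ModelEndpoint("large-llm", "http://large-llm.internal/v1/completions")

def estimate_complexity(prompt: str) -> float:
    """Toy heuristic: longer prompts and reasoning keywords score higher."""
    keywords = ("explain", "analyze", "compare", "prove", "derive")
    score = min(len(prompt) / 2000, 1.0)
    score += 0.25 * sum(kw in prompt.lower() for kw in keywords)
    return min(score, 1.0)

def route(prompt: str, threshold: float = 0.5) -> ModelEndpoint:
    """Send simple queries to the lightweight model, complex ones to the large model."""
    return LARGE if estimate_complexity(prompt) >= threshold else LIGHT

if __name__ == "__main__":
    print(route("What time is it in Paris?").name)                          # light-llm
    print(route("Explain and compare two consensus algorithms in depth.").name)  # large-llm
```

The point of the sketch is the design choice itself: routing at the traffic layer lets cheap queries avoid the cost of a large model without any change to the applications behind the proxy.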
In partnership with NVIDIA, the system further benefits from Dynamo’s distributed key-value (KV) caching, which intelligently routes traffic based on real-time GPU and memory availability. This reduces redundant computation and speeds up generative AI tasks. By offloading certain CPU tasks to DPUs, organizations can improve performance and cut costs, making enterprise-scale AI deployment more efficient and scalable.
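To make the cache-aware idea tangible, here is a minimal scheduling sketch. The scoring weights, worker fields, and selection rule are assumptions for illustration only; NVIDIA Dynamo's actual KV-aware router relies on its own telemetry and APIs.

```python
# Illustrative sketch of KV-cache-aware scheduling: prefer the worker that
# already holds the request's prefix in its KV cache (avoiding recomputation),
# using free GPU memory as a tiebreaker. Not Dynamo's real API.

from dataclasses import dataclass

@dataclass
class Worker:
    name: str
    cached_prefix_tokens: int   # tokens of this request already in the worker's KV cache
    free_gpu_mem_gb: float

def score(worker: Worker, prompt_tokens: int) -> float:
    """Weight cache reuse heavily; free GPU memory breaks ties (weights are assumptions)."""
    overlap = min(worker.cached_prefix_tokens, prompt_tokens) / max(prompt_tokens, 1)
    return 0.8 * overlap + 0.2 * min(worker.free_gpu_mem_gb / 80.0, 1.0)

def pick_worker(workers: list[Worker], prompt_tokens: int) -> Worker:
    return max(workers, key=lambda w: score(w, prompt_tokens))

workers = [
    Worker("gpu-0", cached_prefix_tokens=1500, free_gpu_mem_gb=10.0),
    Worker("gpu-1", cached_prefix_tokens=0,    free_gpu_mem_gb=60.0),
]
print(pick_worker(workers, prompt_tokens=2000).name)  # gpu-0: cache reuse outweighs free memory
```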
Securing MCP Deployments with Reverse Proxy Capabilities
The collaboration also strengthens security for servers implementing the Model Context Protocol (MCP), an open standard introduced by Anthropic. F5's reverse proxy functionality and iRules programmability help secure LLM infrastructure against emerging cyber threats and evolving protocol requirements, making the platform well suited for agentic AI and real-time applications.
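The kind of inspection such a reverse proxy might apply can be sketched as follows. In production this logic would be written as an F5 iRule in its Tcl-based language; the Python version below only illustrates the concept of allow-listing MCP (JSON-RPC 2.0) methods and tool calls, and the tool names are hypothetical.

```python
# Conceptual sketch of reverse-proxy inspection for MCP traffic.
# Real deployments would express this as F5 iRules on BIG-IP; this
# Python version only demonstrates the filtering idea.

import json

ALLOWED_METHODS = {"initialize", "tools/list", "tools/call", "resources/read"}
ALLOWED_TOOLS = {"search_docs", "summarize"}   # hypothetical tool allow-list

def inspect(raw_body: bytes) -> bool:
    """Return True if the MCP (JSON-RPC 2.0) request should be forwarded upstream."""
    try:
        msg = json.loads(raw_body)
    except json.JSONDecodeError:
        return False                     # drop malformed payloads
    if msg.get("jsonrpc") != "2.0" or msg.get("method") not in ALLOWED_METHODS:
        return False                     # drop unknown or disallowed methods
    if msg.get("method") == "tools/call":
        tool = msg.get("params", {}).get("name")
        return tool in ALLOWED_TOOLS     # block calls to unapproved tools
    return True

print(inspect(b'{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"search_docs"}}'))  # True
print(inspect(b'{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"rm_rf"}}'))        # False
```

Because the proxy sits in front of the MCP server, policies like this can be updated as the protocol evolves without touching the LLM infrastructure behind it.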
Conclusion
Enterprises adopting LLMs or building AI factories can now leverage a unified solution that efficiently handles traffic routing, enhances GPU utilization, and scales secure multi-tenant deployments. The combined F5 and NVIDIA stack is positioned as a future-ready AI infrastructure solution for data ingestion, RAG, inference, and agentic AI workflows.