Meta Launches LlamaFirewall to Block AI Jailbreaks and Prompt Injection Attacks

Meta has introduced LlamaFirewall, a new open-source framework aimed at securing large language models (LLMs) against growing cyber threats such as prompt injection, jailbreaks, and the generation of insecure code. Announced on Tuesday, the initiative is a key step in addressing escalating concerns around the misuse of artificial intelligence technologies.

Designed to serve as a modular, real-time guardrail system, LlamaFirewall provides a multi-layered security architecture that spans everything from user input to the final actions of an AI agent, making it applicable to both simple chatbots and advanced autonomous agents.

Three Key Modules- PromptGuard 2, Agent Alignment Checks, and CodeShield

The LlamaFirewall framework includes three core components:

PromptGuard 2: A real-time defense system that can detect and block direct prompt injections and jailbreak attempts.

Agent Alignment Checks: A tool for evaluating the reasoning of AI agents to catch goal hijacking and indirect prompt injections, which are more subtle and dangerous forms of manipulation.

CodeShield: A static code analysis engine that aims to prevent AI agents from generating insecure or malicious code, especially in coding environments.

Meta emphasized that these tools allow developers and cybersecurity professionals to build layered defenses suited to the complexity of modern AI deployments. “LlamaFirewall is flexible and designed to work across varied use cases, from conversational assistants to code-generating LLMs,” the company said on GitHub.

CyberSecEval 4 and AutoPatchBench Benchmark Introduced

Alongside LlamaFirewall, Meta released updated versions of LlamaGuard and CyberSecEval, tools designed to detect policy violations and measure the cybersecurity resilience of AI systems. The new version, CyberSecEval 4, includes AutoPatchBench—a benchmark specifically created to test AI’s ability to automatically repair vulnerabilities in C/C++ code.

AutoPatchBench focuses on assessing how well LLMs can fix bugs found through fuzzing, a technique used to discover vulnerabilities by feeding unexpected or random data into a program. According to Meta, the benchmark provides a much-needed framework for comparing various AI-powered patching solutions.

Llama for Defenders and Privacy-Focused WhatsApp AI

In addition to these cybersecurity tools, Meta launched the Llama for Defenders program. This initiative is meant to help security teams and developers access open, closed, and early-access tools tailored to tackle AI-specific threats such as phishing, scams, and fraud involving AI-generated content.

Meanwhile, Meta-owned WhatsApp previewed a new feature called Private Processing, which aims to deliver AI-powered features without compromising user privacy. The system will process requests in a secure, confidential environment and is currently undergoing external audits to ensure its safety before full deployment.

Conclusion

As the adoption of generative AI accelerates, so do the threats targeting these systems. Meta’s launch of LlamaFirewall, AutoPatchBench, and the Llama for Defenders program reflects an industry-wide push toward more responsible and secure AI development. These tools not only aim to prevent prompt injection and jailbreak attacks but also to safeguard AI-generated code—an increasingly critical vector in modern cybersecurity.