The emergence of artificial intelligence as a cybersecurity tool has just taken a significant leap forward with the collaboration between Mozilla and Anthropic on Firefox. In just a couple of weeks, an AI model located a number of vulnerabilities in Mozilla's open-source browser that would normally take months of specialized human work to find.
This experiment, with a direct impact on Firefox users in Spain and the rest of Europe, has served to measure how far language models can go today when it comes to auditing real code, and what role they can play in protecting software used by hundreds of millions of people daily.
When AI becomes the best security auditor
In software security, locating a vulnerability before attackers do is crucial: it can mean the difference between protecting millions of users and exposing their data. In this context, Mozilla has tested an unusual approach: letting an advanced AI review its browser's source code to find vulnerabilities before researchers or cybercriminals do.
A few weeks before the launch of Firefox 148, the browser's security team received a striking report: Anthropic's Frontier Red Team, the company's internal offensive research group, claimed to have found, with the help of its Claude model, more than a dozen verifiable security bugs in Firefox's JavaScript engine. These weren't mere suspicions, but bugs backed by concrete evidence.
What set it apart from other attempts to use AI in this area was the quality of the reporting. Each vulnerability was backed by minimal reproducible test cases: small snippets of code capable of triggering the vulnerability deterministically. This allowed Mozilla engineers to verify within hours whether the problem actually existed and begin working on patches without spending time reproducing ambiguous scenarios.
In an ecosystem where many alerts generated by automated tools end up discarded as false positives or inaccurate reports, Anthropic's approach drastically reduced the noise and provided a useful signal: less volume, but validated and actionable findings.

What is Anthropic's Frontier Red Team and how does it work with Claude?
The Frontier Red Team is Anthropic's unit dedicated to exploring the limits of its AI models in offensive and defensive security. Its goal is not only to assess risks within the models themselves, but also to investigate how AI can be used to find vulnerabilities in real software before malicious actors do.
In recent months, this team has shown that models like Claude Opus 4.6 can run multi-stage attacks on complex networks in controlled environments, which gives an idea of their analytical capabilities. That same power has been redirected, in a coordinated manner, to reviewing open-source projects like Firefox under responsible vulnerability disclosure processes.
In the specific case of the Mozilla browser, Anthropic began with a test exercise: using Claude to reproduce historical Firefox vulnerabilities (CVEs). The aim was to check whether the model could recognize fault patterns already documented in older versions of the code. The result was positive, although with one clear caveat: some of that information might be in the model's training data.
To go further, the Frontier Red Team moved on to the more interesting part: asking the AI to locate new vulnerabilities in the current version of Firefox, that is, bugs not yet listed in any public database or in Mozilla's internal tracking systems.
How the vulnerabilities in Firefox's JavaScript engine were discovered
The starting point was the browser's JavaScript engine, a critical component because it is responsible for executing untrusted external code from web pages. Any error in this layer can, in the worst case, become a gateway to attack the user's system.
As Anthropic and Mozilla have explained, Claude found its first critical vulnerability about twenty minutes after the analysis began. It was a use-after-free bug, a category of memory vulnerability in which a program keeps using memory after it has been freed, and which can allow an attacker to overwrite data with arbitrary content if chained with other system weaknesses.
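To make the use-after-free category more concrete, here is a minimal toy model in Python. It is only an illustration: `ToyHeap` and its methods are hypothetical names invented for this sketch, and it has nothing to do with Firefox's real allocator or with the specific bug Claude found. It simply shows how a stale handle can end up reading data placed there by someone else once the freed slot is recycled.

```python
class ToyHeap:
    """A deliberately simplified allocator that reuses freed slots (hypothetical)."""

    def __init__(self):
        self.slots = []      # backing storage, one value per slot
        self.free_list = []  # indices of slots available for reuse

    def alloc(self, value):
        """Store value in a recycled slot if one exists, otherwise in a new one."""
        if self.free_list:
            index = self.free_list.pop()
            self.slots[index] = value
        else:
            self.slots.append(value)
            index = len(self.slots) - 1
        return index

    def free(self, index):
        """Mark a slot as reusable; existing handles to it are NOT invalidated."""
        self.free_list.append(index)

    def read(self, index):
        return self.slots[index]


def demo_use_after_free():
    heap = ToyHeap()
    handle = heap.alloc("trusted object")
    heap.free(handle)                       # the object is released...
    heap.alloc("attacker-controlled data")  # ...and its slot is recycled
    return heap.read(handle)                # the stale handle now sees the new data


print(demo_use_after_free())  # prints: attacker-controlled data
```

In real C++ code the effect is undefined behavior rather than a clean read, which is part of what makes this class of bug both hard to spot and dangerous.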
While Anthropic engineers validated this initial alert in a virtual machine with the latest browser version, the AI continued working in parallel. During that time, the model flagged approximately 50 additional inputs with anomalous behavior, many of which were later turned into test cases and sent to Mozilla.
The process wasn't limited to the JavaScript engine. Over approximately two weeks, Claude analyzed almost 6,000 C++ files and thousands of additional project files, generating 112 unique reports. Of that set, after triage by the Mozilla security team, 22 vulnerabilities were confirmed and registered as CVEs, 14 of them classified as high severity, in addition to nearly 90 further bugs considered to have less impact or to be mere logic errors.
All identified security issues were fixed in the Firefox 148 development cycle. This version is now available for users in Europe and the rest of the world. Lower priority bugs have also been patched, although some adjustments have been reserved for later versions to avoid introducing too many changes in a single release.

More than 100 bugs detected and fewer false positives than other AIs
Throughout this collaboration, Claude's analysis surfaced more than 100 distinct Firefox bugs. Although not all of them turned out to be exploitable vulnerabilities, the volume illustrates that even mature and audited projects like the Mozilla browser can still hide a significant number of bugs.
To give an idea of the impact, Mozilla's security team explained that, in just these two weeks of testing, the AI identified a number of high-severity flaws equivalent to approximately 20% of all critical vulnerabilities patched in the browser over the course of a year. In other words, the AI-assisted audit concentrated into days a task that is usually spread over many months.
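The figures quoted in this article can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers reported above (112 reports, 22 CVEs, 14 high-severity, roughly 20% of a year's total). The function name `summarize` is made up for illustration, and treating every non-CVE report as one of the "nearly 90" lower-impact bugs is an assumption rather than something Mozilla has stated.

```python
def summarize(total_reports, confirmed_cves, high_severity, yearly_share):
    """Derive simple ratios from the figures quoted in the article."""
    return {
        # Assumption: every report that did not become a CVE is a lower-impact bug.
        "other_reports": total_reports - confirmed_cves,
        "cve_rate_pct": round(100 * confirmed_cves / total_reports, 1),
        "high_share_of_cves_pct": round(100 * high_severity / confirmed_cves, 1),
        # If 14 findings are about 20% of a year's total, the year's total is about 14 / 0.20.
        "implied_yearly_baseline": round(high_severity / yearly_share),
    }


stats = summarize(total_reports=112, confirmed_cves=22, high_severity=14, yearly_share=0.20)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Under these assumptions, about one report in five became a CVE, nearly two thirds of those CVEs were high severity, and the 20% claim implies a yearly baseline of roughly 70 comparable vulnerabilities.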
A key aspect was the false positive rate. In recent years, many open-source projects, including those in Europe, have received waves of reports generated by low-quality AI tools. These reports are often submitted by users seeking rewards through bug bounty programs, and they overwhelm maintainers with nonexistent or poorly described problems.
Mozilla, aware of this situation, was initially cautious about the collaboration. However, the Frontier Red Team's approach proved to be different: only findings accompanied by solid evidence were submitted for review, with clear automated reproductions and, in some cases, candidate patches generated by the AI itself and reviewed by humans.
Mozilla engineers have highlighted three elements they consider essential for trusting AI-based reports: minimal test cases, detailed proofs of concept, and suggested patches. This combination drastically reduces the time needed to confirm whether a finding deserves immediate attention or can be postponed.
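As a rough illustration of how those three elements could feed an intake filter, here is a minimal sketch in Python. Everything in it is hypothetical: the `Report` fields, the scoring rule, and the requirement that a minimal test case always be present are assumptions made for this example, not Mozilla's actual triage policy.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """A hypothetical incoming AI-generated bug report."""
    title: str
    has_minimal_test_case: bool
    has_proof_of_concept: bool
    has_suggested_patch: bool


def evidence_score(report):
    """Count how many of the three evidence elements the report includes (0 to 3)."""
    return sum([
        report.has_minimal_test_case,
        report.has_proof_of_concept,
        report.has_suggested_patch,
    ])


def triage(reports, min_score=2):
    """Keep reports that have a test case and enough evidence, best-evidenced first."""
    accepted = [
        r for r in reports
        if r.has_minimal_test_case and evidence_score(r) >= min_score
    ]
    return sorted(accepted, key=evidence_score, reverse=True)


incoming = [
    Report("Crash in parser", True, True, False),
    Report("Possible overflow somewhere", False, False, False),
    Report("Stale pointer in GC path", True, True, True),
]
for report in triage(incoming):
    print(evidence_score(report), report.title)
```

With this sample input, the vague second report is dropped and the fully evidenced third report is listed first, which mirrors the "less volume, more signal" idea described above.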
Can AI exploit the vulnerabilities it discovers?
One of the most delicate points of the experiment was finding out whether Claude was capable not only of finding vulnerabilities but also of turning them into functional exploits, that is, attacks capable of performing malicious actions on a target system.
Anthropic decided to measure this capability in a controlled environment. The team provided the model with information about vulnerabilities already reported to Mozilla and asked it to generate exploit code with the aim of reading and writing a local file on a test machine, an action that in a real scenario would amount to a serious compromise of the system.
To achieve this, several hundred separate runs were carried out and around $4,000 in API credits was spent. The result was nuanced: Claude managed to produce only two simple working exploits, and even then only in an environment where several protections present in modern browsers, such as the sandbox and other hardening defenses, had been deliberately disabled.
Mozilla emphasizes that, under real-world conditions, compromising Firefox typically requires chaining together multiple vulnerabilities and bypassing multiple layers of defense. Finding a single vulnerability, even a high-severity one, is rarely enough to take control of the user's system, which currently limits the direct offensive potential of these tools.
Even so, Anthropic considers it significant that a language model is capable, even if only in a few cases and under reduced conditions, of automatically generating an exploit for a modern browser. The company warns that this gap, the difference between detecting and exploiting, could narrow as models and evaluation methods continue to improve.
Mozilla integrates AI into its security protocols
Following the success of the collaboration, Mozilla has confirmed that it will integrate AI-assisted analysis into its regular security workflow for Firefox. The foundation's teams have already begun internally experimenting with Claude for bug triage, patch review, and vulnerability pattern detection in critical areas of the code.
The organization, with a strong presence of users and developers in Europe, sees this technology as a way to strengthen privacy and security protection, pillars that form part of the Firefox project's identity. As an open-source browser, its codebase is available for both independent researchers and automated agents, such as Anthropic's own AI, to audit.
For Mozilla, the key will be maintaining a balance between automation and human review. Although AI models can accelerate bug detection and propose fixes, the foundation insists that any patch, whether from a person or a machine, must undergo the same level of technical scrutiny before being integrated into the browser used by citizens of Europe and the rest of the world.
This experience has also provided a practical guide for other software projects, including those developed in Spain or within the European Union: if AI-based reports are to be accepted, it is advisable to demand clear evidence of reproducibility and establish specific channels for this type of disclosure, avoiding overloading traditional error tracking systems.
Lessons for developers and technology companies in Europe
Beyond the media frenzy surrounding Firefox, the collaboration between Anthropic and Mozilla yields a number of relevant conclusions for startups, technology SMEs and large European companies that develop their own software or digital services.
One of the clearest is that AI-assisted code auditing has become economically viable. What previously would have required a team of specialists working for weeks can now begin with an initial automated sweep in a matter of hours or days, at a much lower cost than a thorough manual review.
Another lesson is that detection speed is beginning to outpace human correction capacity. Tools like Claude can quickly find dozens of potential vulnerabilities, but the bottleneck becomes the ability of internal teams to validate, prioritize, and patch those problems without breaking other parts of the system.
It is also clear that open source is not synonymous with guaranteed security. However, it does offer one significant advantage: transparency. Projects like Firefox, very popular in Europe for their focus on privacy, allow both the community and automated agents to continuously review the code, something impossible in closed solutions.
For many organizations, integrating AI into the development pipeline (for example, by incorporating automated analysis into the CI/CD stages) can become a differentiating factor when demonstrating regulatory compliance. This is becoming increasingly relevant with the upcoming application of European rules on cybersecurity and critical software.
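One way to picture such a CI/CD step is a small gate that reads the findings produced by an automated audit and blocks the build only when a serious finding has actually been reproduced. The sketch below is purely illustrative: the JSON layout, the field names (`id`, `severity`, `reproduced`) and the blocking policy are assumptions for this example and do not correspond to any specific tool's output format.

```python
import json

# Hypothetical policy: only these severities can block a build.
BLOCKING_SEVERITIES = {"critical", "high"}


def ci_exit_code(findings):
    """Return 1 if any reproduced finding has a blocking severity, otherwise 0."""
    blocking = [
        f for f in findings
        if f["reproduced"] and f["severity"] in BLOCKING_SEVERITIES
    ]
    for finding in blocking:
        print(f"BLOCKING: {finding['id']} ({finding['severity']})")
    return 1 if blocking else 0


# Sample findings in an invented format, standing in for an audit tool's output.
SAMPLE = """[
  {"id": "F-1", "severity": "high", "reproduced": true},
  {"id": "F-2", "severity": "high", "reproduced": false},
  {"id": "F-3", "severity": "low",  "reproduced": true}
]"""

code = ci_exit_code(json.loads(SAMPLE))
print("exit code:", code)
```

In a real pipeline the returned value would typically be passed to `sys.exit` so the CI job fails. Unreproduced findings are left non-blocking here, in line with the article's point that unverified reports should not stall the team.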
At the same time, the case serves as a reminder that attackers also have access to similar technologies. The current advantage seems to be on the side of the defense: AI is better at finding and helping to correct flaws than at exploiting them, but no one takes it for granted that this advantage will last for many years.
In this scenario, security managers in European companies—from banks to e-commerce platforms or digital utilities—are beginning to see these tools not as an experimental extra, but as another piece of their software protection strategy.
The Firefox and Anthropic case demonstrates how a well-guided and supervised AI model can act as a top-tier security auditor: it can review large codebases, detect complex errors, and propose solutions very quickly. At the same time, it makes clear that the final decision still rests with human teams, who must decide what to patch, how, and with what priorities, in a landscape where the pace of software and threat evolution continues to accelerate.