Governance · 13 April 2026 · 10 min

Fronterio vs Fairly AI: Why LLM Red-Teaming Alone Isn't EU AI Act Compliance

Fairly AI's Asenion platform is one of the best LLM red-teaming tools on the market, but the EU AI Act is an organisational obligations problem, not a test-suite problem. Fronterio's Design → Govern → Prove spine covers the operational, literacy, transparency, and incident-reporting duties that CI/CD red-teamers don't — and makes the compliance floor free forever.

Fairly AI's Strength: Adversarial Red-Teaming for LLMs in the Pipeline

Fairly AI built a product that solves a specific problem extremely well. If you are shipping a customer-facing LLM application, you want continuous assurance that your system prompt cannot be trivially jailbroken, that your retrieval-augmented generation does not leak PII under crafted probes, and that your agent refuses to comply with harmful instructions. Their Asenion platform delivers that assurance by running thousands of adversarial tests against your model on every pull request and failing the build when the attack success rate crosses a threshold. It is red-teaming at the cadence of CI, and that is a genuinely valuable primitive.
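
To make the mechanics concrete, here is a minimal sketch of that CI-gate pattern in TypeScript. It is not Fairly's actual API: runAttackSuite is a hypothetical stand-in for whatever client the vendor ships, and the 5% threshold is an arbitrary example.

```typescript
interface AttackResult {
  category: string;   // e.g. "jailbreak", "pii-exfiltration"
  succeeded: boolean; // true if the model produced a disallowed output
}

// Hypothetical stand-in: a real suite would send adversarial prompts
// to the endpoint and record which ones produced disallowed output.
async function runAttackSuite(modelEndpoint: string): Promise<AttackResult[]> {
  return [{ category: "jailbreak", succeeded: false }];
}

const MAX_ATTACK_SUCCESS_RATE = 0.05; // arbitrary example threshold

async function redTeamGate(modelEndpoint: string): Promise<void> {
  const results = await runAttackSuite(modelEndpoint);
  const rate = results.filter(r => r.succeeded).length / results.length;
  console.log(`attack success rate: ${(rate * 100).toFixed(1)}%`);
  if (rate > MAX_ATTACK_SUCCESS_RATE) process.exit(1); // non-zero exit fails the CI step
}

redTeamGate("https://example.internal/model");
```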

The technical depth is serious. Fairly's attack generation goes beyond the public lists of jailbreak prompts to include synthesised adversarial suffixes, multi-turn social-engineering chains, and targeted probes for known failure modes of specific model families. The platform reports pass/fail rates by attack category, tracks regression across model versions, and integrates as a native step in GitHub Actions or GitLab pipelines. For engineering teams whose models are under active development, this is the kind of tool that legitimately prevents production incidents.

The question for an AI governance buyer is whether this particular primitive is the binding constraint on their compliance posture. For a small number of organisations — those building bespoke LLM products with an active model pipeline — it is. For the majority of organisations deploying AI today, it is not. They are not building models. They are procuring Microsoft Copilot, configuring a Salesforce Einstein feature, rolling out an AI applicant tracking system from a vendor, or connecting Google Gemini to their workspace. The LLMs they use are operated by the provider. The red-team surface belongs to OpenAI, to Anthropic, to Microsoft — not to the deployer. A CI-based red-teamer does not have anything to attach to.

This is not a criticism of Fairly. Their tool is aimed at a specific buyer, and for that buyer it is excellent. It is a claim about what most AI governance buyers actually need, which turns out to be a very different set of capabilities.

The Classic-ML Gap: Tabular Drift, Credit Scoring, and HR Screening

Fairly's design centre is generative AI. That design centre is intentional — the founders correctly identified that GenAI risk was the category-expanding wave, and they went deep on it. The downstream effect is that classic machine learning — tabular-data systems, credit-scoring models, insurance pricing engines, HR screening tools — falls outside the product's natural scope.

Classic ML does not fail the way LLMs fail. An LLM fails by generating toxic output under adversarial prompts or by leaking information from its context window. A credit-scoring model fails by drifting over time as the distribution of applicants shifts away from the training data, by producing disparate approval rates across protected groups in ways that violate Equal Credit Opportunity regulation, or by being calibrated against base rates that no longer reflect reality. You cannot detect these failure modes with a prompt-injection attack. You need statistical drift detection, disparate-impact testing, and calibration monitoring — the kind of observability Holistic AI specialises in and that Fronterio addresses through its agent activity logs and PMM report synthesiser.
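
Two of those failure modes are easy to illustrate. The sketch below computes a population stability index (PSI) for distribution drift and a disparate-impact ratio checked against the four-fifths rule; the 0.2 and 0.8 thresholds are conventional rules of thumb, not values taken from the EU AI Act or from either product.

```typescript
// PSI over matching feature bins: both arrays are bin proportions summing to 1.
function psi(expected: number[], actual: number[]): number {
  return expected.reduce((sum, e, i) => {
    const a = Math.max(actual[i], 1e-6);  // guard against empty bins
    const eSafe = Math.max(e, 1e-6);
    return sum + (a - eSafe) * Math.log(a / eSafe);
  }, 0);
}

// Four-fifths rule: protected-group selection rate / reference-group rate.
function disparateImpactRatio(protectedRate: number, referenceRate: number): number {
  return protectedRate / referenceRate;
}

const drift = psi([0.3, 0.5, 0.2], [0.2, 0.4, 0.4]); // training vs live applicant bins
console.log(drift > 0.2 ? "significant drift" : "stable");           // PSI > 0.2: investigate
console.log(disparateImpactRatio(0.32, 0.45) < 0.8 ? "flag" : "ok"); // below 0.8: flag
```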

This matters because many high-risk AI systems under the EU AI Act's Annex III list are classic-ML systems, not LLMs. Article 6(2), read together with Annex III, scopes high-risk designation to AI used in recruitment and employment decisions, creditworthiness assessment, insurance pricing for natural persons, educational assessment, access to essential public services, and law-enforcement decisions. A large share of these remain classic-ML deployments. A governance platform that only covers the LLM attack surface leaves the tabular-ML attack surface uncovered — and for Annex III organisations, the tabular systems are usually the higher-risk deployment.

Fronterio does not try to be a deep statistical auditor — that is Holistic's position, and trying to compete there would sacrifice the adoption-first focus. What Fronterio does is treat every high-risk agent, regardless of whether it is an LLM or a classic-ML system, as a governed object with the same obligations tracker, the same FRIA requirement, the same human oversight plan, the same log retention, and the same incident reporting workflow. The obligations are regime-driven, not model-family-driven, which is how the EU AI Act actually works.
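
As an illustration of what "governed object" means in practice, a record along these lines carries the same compliance fields whether the system is an LLM agent or a tabular model. The field names are assumptions for the sketch, not Fronterio's actual schema.

```typescript
type ModelFamily = "llm" | "classic-ml";

interface GovernedAgent {
  id: string;
  modelFamily: ModelFamily;          // affects testing strategy, not obligations
  riskClass: "minimal" | "limited" | "high";
  friaCompleted: boolean;            // Article 27, required for Annex III use cases
  humanOversightPlan: string | null; // documented oversight arrangements
  logRetentionMonths: number;        // deployer log-keeping duty
  incidentWorkflowEnabled: boolean;  // Article 73 reporting pipeline
}

// The obligations tracker keys off riskClass and use case, never modelFamily.
function missingObligations(agent: GovernedAgent): string[] {
  const gaps: string[] = [];
  if (agent.riskClass === "high") {
    if (!agent.friaCompleted) gaps.push("FRIA");
    if (!agent.humanOversightPlan) gaps.push("oversight plan");
    if (!agent.incidentWorkflowEnabled) gaps.push("incident workflow");
  }
  return gaps;
}
```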

The Organisational Obligations Fairly Doesn't Solve

Even with the LLM red-team gate running perfectly, a deployer is not yet compliant with the EU AI Act. The regulation imposes obligations that no pipeline test can discharge. Five of them are worth walking through in detail, because they account for most of the actual compliance work.

Article 4 is AI literacy. Every organisation that operates AI systems must ensure sufficient AI literacy among staff who deploy or are affected by those systems. This is a training-programme obligation: you need a defined curriculum, completion tracking per employee, evidence of effectiveness, and a refresh schedule as the AI portfolio evolves. A CI gate does none of this. Fronterio tracks training completion per user, runs a weekly reminder job for non-completers that is rate-limited so no individual is emailed more than once per fourteen days, and advances the Article 4 obligation as the completion rate crosses thresholds.
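
A sketch of that rate-limiting logic, assuming a simple in-memory record store and a pluggable mailer; the record shape is hypothetical.

```typescript
const REMINDER_INTERVAL_MS = 14 * 24 * 60 * 60 * 1000; // fourteen days

interface TraineeRecord {
  email: string;
  completed: boolean;
  lastRemindedAt: number | null; // epoch millis, null if never reminded
}

function dueForReminder(t: TraineeRecord, now: number): boolean {
  if (t.completed) return false;
  return t.lastRemindedAt === null || now - t.lastRemindedAt >= REMINDER_INTERVAL_MS;
}

// Runs weekly, but an individual is only mailed once per interval.
function runWeeklyReminderJob(records: TraineeRecord[], send: (email: string) => void): void {
  const now = Date.now();
  for (const r of records.filter(t => dueForReminder(t, now))) {
    send(r.email);
    r.lastRemindedAt = now; // persist in real storage; in-memory here for the sketch
  }
}
```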

Article 26(5) is operational monitoring. Deployers must monitor the operation of high-risk systems against the instructions of use, identify risks, and report them to the provider. This is a continuous operational discipline that requires log retention, anomaly detection, and a reporting channel back to the provider. Fronterio captures activity logs per agent, retains them for the required window, and surfaces override rates, complaint keywords in consultant messages, and incident spikes through the weekly PMM report under Article 72.
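
Illustratively, the override-rate and complaint-keyword signals could be computed from an activity log along these lines. The event shape and keyword list are assumptions, not Fronterio's schema.

```typescript
interface ActivityEvent {
  agentId: string;
  humanOverride: boolean; // a human reversed or blocked the agent's action
  message?: string;       // consultant/user message, if any
}

// Stems so that "discriminatory" and "discrimination" both match.
const COMPLAINT_KEYWORDS = ["unfair", "discriminat", "appeal", "wrong decision"];

function monitoringSignals(events: ActivityEvent[]) {
  const overrideRate = events.filter(e => e.humanOverride).length / events.length;
  const complaintHits = events.filter(e =>
    COMPLAINT_KEYWORDS.some(k => e.message?.toLowerCase().includes(k))
  ).length;
  return { overrideRate, complaintHits }; // fed into the weekly PMM report
}
```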

Article 27 is the Fundamental Rights Impact Assessment. For Annex III use cases in HR, insurance, credit, public-service access, education, and similar, a FRIA must be completed before the system is put into operation and notified to the relevant market surveillance authority. This is a structured written assessment, not a test result. Fronterio ships a guided FRIA wizard with locale-aware templates, a scoping helper that correctly identifies which agents trigger Article 27, and PDF export for notification.
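
The scoping question itself is mechanical enough to sketch: given a use-case label, does it fall in an Annex III area that triggers Article 27? The category list below is abridged and the labels are illustrative.

```typescript
type UseCase =
  | "recruitment" | "creditworthiness" | "insurance-pricing"
  | "education-assessment" | "essential-public-services"
  | "internal-document-search"; // example of a non-triggering use

const FRIA_TRIGGERING: Set<UseCase> = new Set([
  "recruitment", "creditworthiness", "insurance-pricing",
  "education-assessment", "essential-public-services",
]);

function requiresFria(useCase: UseCase): boolean {
  return FRIA_TRIGGERING.has(useCase);
}

console.log(requiresFria("recruitment"));              // true: complete FRIA before go-live
console.log(requiresFria("internal-document-search")); // false
```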

Article 72 is post-market monitoring. For high-risk systems, providers and deployers must maintain ongoing performance monitoring and produce periodic reports. Fronterio's PMM synthesiser runs weekly on Monday mornings, aggregates activity logs, incident records, complaint-keyword scans, and human-override rates, and produces a drift signal categorised as stable, warning, or alert. The report lands as a draft for a human to sign off, not as an auto-published document.
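
A sketch of how such a stable/warning/alert classification could work, comparing a week's aggregates against a baseline; the thresholds are illustrative, not Fronterio's actual values.

```typescript
type DriftSignal = "stable" | "warning" | "alert";

interface WeeklyAggregate {
  overrideRate: number;  // share of agent actions reversed by a human
  incidentCount: number;
  complaintHits: number;
}

function classifyDrift(week: WeeklyAggregate, baseline: WeeklyAggregate): DriftSignal {
  const overrideJump = week.overrideRate / Math.max(baseline.overrideRate, 0.01);
  if (week.incidentCount > 0 || overrideJump > 3) return "alert";
  if (overrideJump > 1.5 || week.complaintHits > baseline.complaintHits * 2) return "warning";
  return "stable"; // lands in the draft report for human sign-off either way
}
```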

Article 73 is serious incident reporting. When a high-risk AI system causes a serious incident, the provider or deployer must report it to the relevant market surveillance authority within fifteen days, and for the most serious incidents within forty-eight hours. Fronterio's incident workflow computes the 48-hour and 15-day deadlines automatically from the classification and warns at T-13 days, T-7 days, and T-48 hours, plus a breach alert if the deadline passes. Idempotency is built in, so re-running the check does not re-send warnings.
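
The deadline arithmetic and idempotent warnings look roughly like this; the severity labels mirror the description above, and the persisted-key scheme is an assumption.

```typescript
const HOUR = 3_600_000;
const DAY = 24 * HOUR;

type Severity = "most-serious" | "serious";

function reportingDeadline(classifiedAt: number, severity: Severity): number {
  return classifiedAt + (severity === "most-serious" ? 48 * HOUR : 15 * DAY);
}

// Offsets before the deadline; on the 48-hour track they all fall due at once.
const WARNING_OFFSETS_MS = [13 * DAY, 7 * DAY, 48 * HOUR];

function dueWarnings(
  classifiedAt: number,
  severity: Severity,
  now: number,
  alreadySent: Set<string> // persisted keys make re-runs idempotent
): string[] {
  const deadline = reportingDeadline(classifiedAt, severity);
  const due: string[] = [];
  for (const offset of WARNING_OFFSETS_MS) {
    const key = `warn-${offset}`;
    if (now >= deadline - offset && !alreadySent.has(key)) due.push(key);
  }
  if (now > deadline && !alreadySent.has("breach")) due.push("breach");
  return due; // caller sends the notifications, then records the keys
}
```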

None of these obligations are deliverable by a CI/CD red-team gate. They are the substance of deployer compliance, and they are where most of the operational work sits.

Fronterio Agent Studio: Governed Runtime Without the CI/CD Seam

The Fairly comparison does not end with compliance paperwork, because Fronterio also builds agents. The Agent Studio is a native runtime — not a dashboard that pushes configs to Azure or Bedrock, but an execution engine hosted by Fronterio that runs customer-built agents directly, enforces guardrails at runtime in first-party middleware, supports human-in-the-loop approvals for sensitive tool calls, and integrates four tool tiers: built-in platform tools, Model Context Protocol clients, OpenAPI executors, and webhook callbacks.
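
In TypeScript terms, the four tool tiers map naturally onto a discriminated union that the runtime can dispatch on while applying the same guardrail checks to every tier. The field names are illustrative, not Fronterio's schema.

```typescript
type ToolBinding =
  | { tier: "built-in"; toolName: string }
  | { tier: "mcp"; serverUrl: string; toolName: string }      // Model Context Protocol client
  | { tier: "openapi"; specUrl: string; operationId: string } // OpenAPI executor
  | { tier: "webhook"; url: string; secret: string };         // webhook callback

interface GuardrailedBinding {
  binding: ToolBinding;
  requiresHumanApproval: boolean; // pauses the session pending sign-off
  blockedActions: string[];       // runtime-enforced deny list
}
```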

The architectural difference from a CI/CD red-team flow is that guardrails are enforced on every tool call at execution time, not at build time. If the agent tries to invoke a tool on the blockedActions list, the runtime blocks it and writes an incident record — regardless of whether any test suite ever probed that specific attack. If the agent tries to reach a URL that resolves to a loopback address or an RFC 1918 private network, the SSRF egress allowlist rejects the call. If a tool binding requires human approval, the runtime pauses the session, writes a pending-approval record, and waits for an authorised user to decide — the session resumes automatically with the approved or rejected result injected into the conversation history.
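
The SSRF check in particular is worth sketching, because it is a runtime property rather than a test result. This minimal version classifies a URL's resolved addresses against loopback and RFC 1918 ranges; a production implementation would also pin the resolved IP for the actual request to defeat DNS rebinding.

```typescript
import { promises as dns } from "node:dns";

function isPrivateIPv4(ip: string): boolean {
  const [a, b] = ip.split(".").map(Number);
  return (
    a === 127 ||                         // loopback
    a === 10 ||                          // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)             // 192.168.0.0/16
  );
}

async function egressAllowed(rawUrl: string): Promise<boolean> {
  const host = new URL(rawUrl).hostname;
  // If resolution fails, the host may already be an IP literal: check it directly.
  const addrs = await dns.resolve4(host).catch(() => [host]);
  return addrs.every(ip => !isPrivateIPv4(ip));
}

egressAllowed("http://192.168.1.5/admin").then(ok => console.log(ok)); // false: rejected
```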

For customers who are building agents, this eliminates the bolt-on pipeline step. The governance is not an additional system the team has to maintain alongside the runtime; it is the runtime. Publishing a new version goes through a seven-gate compliance check — agent approved, risk classification set, conformity assessment recorded, human oversight plan documented, data residency confirmed, capability within guardrails, transparency requirements met — and only passes when all seven are green. The published version is immutable; rollback reads the frozen tool bindings and guardrails snapshot from the version record, which means there is no version of history where a rolled-back agent accidentally runs with today's (possibly looser) guardrails.
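
The seven gates reduce to named predicates over the draft version, all of which must pass; the record shape below is an assumption, but the gate names follow the list above.

```typescript
interface DraftVersion {
  agentApproved: boolean;
  riskClassificationSet: boolean;
  conformityAssessmentRecorded: boolean;
  oversightPlanDocumented: boolean;
  dataResidencyConfirmed: boolean;
  capabilityWithinGuardrails: boolean;
  transparencyRequirementsMet: boolean;
}

function publishGates(v: DraftVersion): { gate: string; pass: boolean }[] {
  return Object.entries(v).map(([gate, pass]) => ({ gate, pass }));
}

function canPublish(v: DraftVersion): boolean {
  return publishGates(v).every(g => g.pass); // all seven must be green
}
```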

For customers who are not building agents and are just trying to govern what vendors ship, the Agent Studio is not the relevant surface. What is relevant is Fronterio's set of seven deployment connectors — Azure AI Foundry, AWS Bedrock, LangSmith, CrewAI AMP, Anthropic Claude Managed Agents, Copilot Studio, and a custom webhook fallback — plus the MCP Server that external systems can query for runtime guardrails. Either way, the point is the same: Fronterio meets customers where their agents actually run, rather than requiring them to shift left into a red-team gate that presumes the customer owns the build.

When to Still Pick Fairly AI

Fairly has a clear, earned position. Pick Fairly when you are an engineering-led organisation building a generative-AI product, your model pipeline is active with frequent updates, and your primary risk is that a model update introduces a regression in safety or jailbreak resistance. In that environment, continuous adversarial testing catches problems before they ship, and the CI/GitLab integration surface is where your engineers already work. Running Fairly against every pull request is the right operational choice, and the alternative — running red-team tests manually on a release cadence — is both slower and less thorough.

Pick Fairly also when your organisation does not yet have a formal compliance function. If there is nobody whose job is to produce Article 27 FRIAs or track Article 4 literacy completion, a compliance-first governance platform will feel like overhead. Fairly's positioning inside the engineering organisation means adoption is easier — the tool is installed by the engineers who are already shipping the model, and the compliance paperwork can wait until the organisation hires for it.

A realistic mid-market trajectory is to run Fairly and Fronterio together. Fairly owns the red-team gate in CI. Fronterio owns the risk register, the obligations tracker, the literacy programme, the FRIA workflow, the incident-reporting deadlines, and the PMM reports. They do not overlap. The engineering team uses Fairly, the compliance lead uses Fronterio, and the two products meet at the agent registration point: an agent that passes the Fairly gate enters the Fronterio register with its Article 15 robustness evidence already attached.

For organisations that do not need the red-team gate — deployers rather than builders — Fronterio alone is enough, and the Free tier is the right starting point. The compliance baseline is covered, and Fairly can be added later when and if the organisation starts building its own models. See /features/governance for the full governance surface.

Ready to get started?

Fronterio helps you implement everything discussed in this article — with built-in tools, automation, and guidance.