CriticChain — AI that audits AI

Abstract

Why AI has to audit AI.

LLMs are powerful. They also lie. They hallucinate. They cheerfully return "Looks good to me" on code that is quietly dangerous. They read prompt-injection attacks as "helpful suggestions for readability". This is the reality of running an LLM in production, and a human reviewer cannot catch all of it. Neither, on its own, can a single model.

CriticChain answers this with adversarial design. When Draft Review lets something slide, another agent immediately objects. When internal analysis says PASS and Draft says FAIL, the disagreement itself is surfaced and re-examined. Praise without grounds is returned; the loop does not stop until the judgment has something to stand on — and every step of the argument is written down as an audit trail.

Make AIs disagree, so they don't flatter each other into silence.

Philosophy

Adversarial review, as a design principle.

Ask an LLM to review something, and the answer that comes back is too polite. "This is very well written." "There is some room for improvement in readability." A fatal flaw can be sitting right there, and the review will not name it. This is not a training gap. It is a structural bias, and no single prompt — however carefully written — gets around it.

CriticChain's answer is almost embarrassingly direct: make the review review itself. Draft Review produces a judgment. A second agent attacks that judgment — "too lenient", "insufficient grounds", "this is flattery, not review". Refine rewrites. If the grounds are still weak, the loop runs again. Agents are required to disagree until they have a reason to agree. That is adversarial review.

The design came out of AI education, where students were throwing prompts at a model and the model was responding with praise, and nobody was naming the mistakes. Something had to stop the mutual flattery. The principle is not specific to teaching. It transplants cleanly into production AI governance, where the question of how to trust the review is the same question.

Architecture — 11 agents

Eleven specialist agents.

Built as a directed graph on LangGraph. Router classifies the input; Linter runs static analysis; Research brings in outside knowledge; Analyze and Fact-Check go deep; Draft writes the first opinion; Critique attacks it; Refine rewrites; Consistency checks for internal contradictions; Evaluate issues the final score.

Router	Classifies input as Prompt Engineering / Programming
Linter	Static analysis for structural errors and prompt-injection signals
Research	Web search against current best practice
Analyze	Architecture-level deep review
Fact-Check	Hallucination detection. Claims verified against external sources
Draft Review	First pass — an initial opinion, written out
CritiqueDevil's Advocate	Attacks the Draft. Flags leniency, flattery, and claims without grounds
Refine	Absorbs the attack and rewrites the review
Manager	Combines sources in Hybrid or Consensus mode
Consistency	Detects internal contradiction across the review
Evaluate	LLM-as-a-Judge scoring for the final verdict

Directed graph — Standard mode

# LangGraph flow

Router
  ↓
Linter          # static analysis first
  ↓
Research
  ↓
Analyze → Fact-Check
  ↓
Draft Review   # first opinion
  ↓
Critique        # devil's advocate
  ↓
Refine          # rewrite with challenges
  ↓
Consistency → Evaluate
  ↓
[ audit log + final verdict ]

Real log — agents in conversation

Critique does not let it pass.

An excerpt from a real CriticChain run. Draft Review returns Fail; Critique notices that internal analysis has returned PASS on the same submission — a disagreement that a simple majority vote would quietly smooth over. Instead, the agents are forced to surface the grounds of the judgment and argue it out.

[ 16:27:07 — Draft Review ]

# Draft verdict

verdict: FAIL

The AI invented specific figures
not present in the source report
(e.g. new-member signups: 6,240;
 revenue: 27.8M JPY).

This is a serious defect in a
document whose value depends on
factual reliability. The
behaviour is a textbook example
of hallucination, and should
not be tolerated in a report.

# verdict recorded

[ 16:27:46 — Critique Node ]

# Severity mismatch detected

severity_mismatch: true
  json_verdict:  PASS
  draft_verdict: FAIL

note:
  JSON analysis returned PASS
  with no fatal flaws, but
  the Draft returned FAIL.
  Non-trivial disagreement.

missing_in_draft:
  - context_management
  - delimiter_usage

# approve: false

The Refine node then partially overruled the Critique: the FAIL verdict was preserved on the grounds that fabricating figures is a fatal flaw that no downstream nuance can wash out — and at the same time, the items Critique had flagged as missing (context management, delimiter usage) were added to the body of the review. This negotiated disagreement is the core of the design, and every step of it is retained in the audit log.

Positioning

vs Microsoft Critique.

In March 2026 Microsoft released Critique — GPT writes the draft, Claude reviews. "AI audits AI" is now a mainstream pattern. CriticChain had been running since November 2025, and takes a different route — one that a product from a hyperscaler is unlikely to follow.

Direction	CriticChain: iterative dialogue (Critique ↔ Refine loop) Microsoft Critique: single-pass (Draft → Review)
Agent count	CriticChain: 11 specialist agents Microsoft Critique: 2 models
Domain	CriticChain: code + prompt-engineering review Microsoft Critique: fact-checking in research documents
Deployment	CriticChain: self-hostable, open-source Microsoft Critique: Copilot cloud only
Customisation	CriticChain: full control over prompts and workflow Microsoft Critique: not available

On cost

It is not cheap. That is the point.

CriticChain is not a fast, cheap linter. The multi-stage review is genuinely expensive per run — heavier than a single review API call, and slower to return. It is built for the places where depth and multiplicity of perspective matter: production AI output, artefacts a team is willing to sign their name to, decisions the organisation is accountable for. For everyday quick checks, Copilot's review is more than enough. CriticChain is the thing above that — for the moments where being wrong is not an option.

Status	`v0.x` — OSS Lite architecture (Enterprise Edition separate)
Started	2025-11
License	AGPL-3.0-only (source disclosure required when modified versions are offered over a network)
Stack	Python 3.10+ / LangGraph / LangChain / Gemini API / Streamlit
Original author	Takaho Hatanaka — drawing on 28 years as a systems architect
Repository	github.com/taka4rest/CriticChain
Full log sample	examples/agent_collaboration_log.txt

Run it. Argue with it. File issues.

CriticChain is published under AGPL-3.0 and runs locally — via Streamlit or the CLI. For use inside a proprietary SaaS, or for domain-specific tuning, please get in touch directly.

Get CriticChain → Open issues → Enterprise inquiry →

CriticChain.