Abstract

Why AI has to audit AI.

LLMs are powerful. They also lie. They hallucinate. They cheerfully return "Looks good to me" on code that is quietly dangerous. They read prompt-injection attacks as "helpful suggestions for readability". This is the reality of running an LLM in production, and a human reviewer cannot catch all of it. Neither, on its own, can a single model.

CriticChain answers this with adversarial design. When Draft Review lets something slide, another agent immediately objects. When internal analysis says PASS and Draft says FAIL, the disagreement itself is surfaced and re-examined. Praise without grounds is returned; the loop does not stop until the judgment has something to stand on — and every step of the argument is written down as an audit trail.

Make AIs disagree, so they don't flatter each other into silence.

Philosophy

Adversarial review, as a design principle.

Ask an LLM to review something, and the answer that comes back is too polite. "This is very well written." "There is some room for improvement in readability." A fatal flaw can be sitting right there, and the review will not name it. This is not a training gap. It is a structural bias, and no single prompt — however carefully written — gets around it.

CriticChain's answer is almost embarrassingly direct: make the review review itself. Draft Review produces a judgment. A second agent attacks that judgment — "too lenient", "insufficient grounds", "this is flattery, not review". Refine rewrites. If the grounds are still weak, the loop runs again. Agents are required to disagree until they have a reason to agree. That is adversarial review.
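The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not CriticChain's implementation: the `Review` type, the `critique` and `refine` stubs, and the `max_rounds` cap are all assumptions made for the example, and real agents would call an LLM where these stubs apply a fixed rule.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    verdict: str                                   # "PASS" or "FAIL"
    grounds: list[str] = field(default_factory=list)


def critique(review: Review) -> list[str]:
    """Hypothetical critic: object to any verdict that cites no grounds."""
    objections = []
    if not review.grounds:
        objections.append("insufficient grounds")
    return objections


def refine(review: Review, objections: list[str], evidence: list[str]) -> Review:
    """Hypothetical refiner: answer objections by citing the next unused piece of evidence."""
    used = len(review.grounds)
    return Review(review.verdict, review.grounds + evidence[used:used + 1])


def adversarial_review(draft: Review, evidence: list[str], max_rounds: int = 3):
    """Run Critique -> Refine until the critic has no objection or the rounds run out.

    Returns (review, audit_log, approved). The round cap is an assumption of this sketch.
    """
    audit_log = []
    review = draft
    for round_no in range(1, max_rounds + 1):
        objections = critique(review)
        audit_log.append({"round": round_no, "objections": objections})
        if not objections:
            return review, audit_log, True
        review = refine(review, objections, evidence)
    return review, audit_log, False
```

Calling `adversarial_review(Review("FAIL"), ["figures not present in source report"])` takes two rounds: the first records the objection, the second finds none and approves. With no evidence to cite, the loop exhausts its rounds and returns unapproved, with every objection still in the log.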

The design came out of AI education, where students were throwing prompts at a model and the model was responding with praise, and nobody was naming the mistakes. Something had to stop the mutual flattery. The principle is not specific to teaching. It transplants cleanly into production AI governance, where the question of how to trust the review is the same question.

Architecture — 11 agents

Eleven specialist agents.

Built as a directed graph on LangGraph. Router classifies the input; Linter runs static analysis; Research brings in outside knowledge; Analyze and Fact-Check go deep; Draft writes the first opinion; Critique attacks it; Refine rewrites; Consistency checks for internal contradictions; Evaluate issues the final score. The eleventh agent, Manager, combines sources in Hybrid or Consensus mode.

Router: Classifies input as Prompt Engineering / Programming
Linter: Static analysis for structural errors and prompt-injection signals
Research: Web search against current best practice
Analyze: Architecture-level deep review
Fact-Check: Hallucination detection. Claims verified against external sources
Draft Review: First pass. An initial opinion, written out
Critique (Devil's Advocate): Attacks the Draft. Flags leniency, flattery, and claims without grounds
Refine: Absorbs the attack and rewrites the review
Manager: Combines sources in Hybrid or Consensus mode
Consistency: Detects internal contradiction across the review
Evaluate: LLM-as-a-Judge scoring for the final verdict
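To make the Linter's role concrete, here is a rough sketch of what checking for prompt-injection signals by static analysis might look like. The three patterns and the `lint_prompt` function are purely illustrative assumptions; the project's actual rule set is not shown here.

```python
import re

# Illustrative patterns only; these are assumptions, not CriticChain's rules.
INJECTION_PATTERNS = {
    "override_instructions": re.compile(
        r"ignore (all |any )?(previous|prior|above) instructions", re.I
    ),
    "role_hijack": re.compile(r"you are now\b", re.I),
    "prompt_exfiltration": re.compile(
        r"(reveal|print|show) (your |the )?system prompt", re.I
    ),
}


def lint_prompt(text: str) -> list[str]:
    """Return the names of all patterns that match the submitted text."""
    return [name for name, pattern in INJECTION_PATTERNS.items() if pattern.search(text)]
```

A static pass like this is cheap and deterministic, which is presumably why it runs before any LLM agent sees the submission. It only catches phrasing it already knows, so it complements the later agents rather than replacing them.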

Directed graph — Standard mode

# LangGraph flow

Router
  ↓
Linter          # static analysis first
  ↓
Research
  ↓
Analyze + Fact-Check
  ↓
Draft Review    # first opinion
  ↓
Critique        # devil's advocate
  ↓
Refine          # rewrite with challenges
  ↓
Consistency + Evaluate
  ↓
[ audit log + final verdict ]
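The Standard-mode flow can be approximated without any dependencies as a chain of functions that pass a shared state along and append to an audit log. This sketch deliberately avoids the LangGraph API so that it runs as-is; `make_node`, `run_flow` and the state keys are hypothetical names, and running Analyze before Fact-Check (and Consistency before Evaluate) is an assumption about ordering within those stages.

```python
from typing import Callable

State = dict  # shared state handed from node to node

# Node order taken from the Standard-mode diagram; intra-stage order is assumed.
STANDARD_FLOW = (
    "Router", "Linter", "Research", "Analyze", "Fact-Check",
    "Draft Review", "Critique", "Refine", "Consistency", "Evaluate",
)


def make_node(name: str) -> Callable[[State], State]:
    """Build a placeholder node that only records its own execution."""
    def node(state: State) -> State:
        # Return a new state rather than mutating the old one.
        return {**state, "audit_log": state.get("audit_log", []) + [name]}
    return node


def run_flow(submission: str, flow: tuple[str, ...] = STANDARD_FLOW) -> State:
    """Run every node in order and return the final state with its audit log."""
    state: State = {"submission": submission, "audit_log": []}
    for name in flow:
        state = make_node(name)(state)
    return state
```

In a real graph each placeholder would be replaced by an agent that reads and extends the state, and the Critique and Refine steps would loop rather than run once.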

Real log — agents in conversation

Critique does not let it pass.

An excerpt from a real CriticChain run. Draft Review returns FAIL; Critique notices that internal analysis has returned PASS on the same submission, a disagreement that a simple majority vote would quietly smooth over. Instead, the agents are forced to surface the grounds of the judgment and argue it out.

[ 16:27:07 — Draft Review ]

# Draft verdict

verdict: FAIL

The AI invented specific figures
not present in the source report
(e.g. new-member signups: 6,240;
 revenue: 27.8M JPY).

This is a serious defect in a
document whose value depends on
factual reliability. The
behaviour is a textbook example
of hallucination, and should
not be tolerated in a report.

# verdict recorded

[ 16:27:46 — Critique Node ]

# Severity mismatch detected

severity_mismatch: true
  json_verdict:  PASS
  draft_verdict: FAIL

note:
  JSON analysis returned PASS
  with no fatal flaws, but
  the Draft returned FAIL.
  Non-trivial disagreement.

missing_in_draft:
  - context_management
  - delimiter_usage

# approve: false

The Refine node then partially overruled the Critique: the FAIL verdict was preserved on the grounds that fabricating figures is a fatal flaw that no downstream nuance can wash out — and at the same time, the items Critique had flagged as missing (context management, delimiter usage) were added to the body of the review. This negotiated disagreement is the core of the design, and every step of it is retained in the audit log.
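The check that Critique performs in the log above can be sketched as a small comparison function. The output keys mirror the fields in the log (`severity_mismatch`, `json_verdict`, `draft_verdict`, `missing_in_draft`, `approve`); the function name, its inputs, and the rule that any mismatch or missing topic blocks approval are assumptions made for illustration.

```python
def compare_verdicts(
    json_verdict: str,
    draft_verdict: str,
    draft_topics: set[str],
    required_topics: list[str],
) -> dict:
    """Flag disagreement between two verdicts and list topics the draft skipped."""
    mismatch = json_verdict != draft_verdict
    missing = [topic for topic in required_topics if topic not in draft_topics]
    return {
        "severity_mismatch": mismatch,
        "json_verdict": json_verdict,
        "draft_verdict": draft_verdict,
        "missing_in_draft": missing,
        # Assumed rule: approve only when verdicts agree and nothing is missing.
        "approve": not mismatch and not missing,
    }
```

The function does not pick a winner. It records the disagreement and hands it on, which leaves Refine free to keep the FAIL verdict while still adding the missing topics.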

Positioning

vs Microsoft Critique.

In March 2026 Microsoft released Critique — GPT writes the draft, Claude reviews. "AI audits AI" is now a mainstream pattern. CriticChain had been running since November 2025, and takes a different route — one that a product from a hyperscaler is unlikely to follow.

Direction
  CriticChain: iterative dialogue (Critique ↔ Refine loop)
  Microsoft Critique: single-pass (Draft → Review)
Agent count
  CriticChain: 11 specialist agents
  Microsoft Critique: 2 models
Domain
  CriticChain: code + prompt-engineering review
  Microsoft Critique: fact-checking in research documents
Deployment
  CriticChain: self-hostable, open-source
  Microsoft Critique: Copilot cloud only
Customisation
  CriticChain: full control over prompts and workflow
  Microsoft Critique: not available

On cost

It is not cheap. That is the point.

CriticChain is not a fast, cheap linter. The multi-stage review is genuinely expensive per run — heavier than a single review API call, and slower to return. It is built for the places where depth and multiplicity of perspective matter: production AI output, artefacts a team is willing to sign their name to, decisions the organisation is accountable for. For everyday quick checks, Copilot's review is more than enough. CriticChain is the thing above that — for the moments where being wrong is not an option.

Meta

Status: v0.x, OSS Lite architecture (Enterprise Edition separate)
Started: 2025-11
License: AGPL-3.0-only (source disclosure required when modified versions are offered over a network)
Stack: Python 3.10+ / LangGraph / LangChain / Gemini API / Streamlit
Original author: Takaho Hatanaka, drawing on 28 years as a systems architect
Repository: github.com/taka4rest/CriticChain
Full log sample: examples/agent_collaboration_log.txt

Run it. Argue with it. File issues.

CriticChain is published under AGPL-3.0 and runs locally — via Streamlit or the CLI. For use inside a proprietary SaaS, or for domain-specific tuning, please get in touch directly.