A practical guide to the LLM Council — for business leaders, executives, and professionals. No coding required.
I have written earlier about why AI produces inaccuracies with such confidence and what that can cost various sectors. Today, I begin helping business teams and individuals address this problem with practical tools. The first strategy is the Large Language Model (LLM) Council, a framework that, used appropriately, makes AI models fact-check themselves.
The Problem You Cannot Afford to Ignore
AI language models generate the most statistically probable text given a prompt. They are not retrieving facts from a verified database. They are predicting the next most plausible word, sentence, and paragraph. When they encounter a gap in their training data, they do not pause. They fill the gap — smoothly, confidently, with complete sentences that sound authoritative.
A 2025 theoretical analysis showed that perfect hallucination control in large language models is mathematically impossible — the same architecture that enables fluency also produces plausible falsehoods (Karpowicz, 2025). This is not a bug to fix; it is a structural feature of the technology.
The cost is measurable and significant. AllAboutAI estimated hallucinations cost businesses $67.4 billion globally in 2024, including bad decisions, regulatory penalties, and correction overhead (AllAboutAI, 2025). Forrester found employees spend 4.3 hours weekly verifying AI outputs, about $14,200 per employee annually in lost productivity (Forrester, 2025). In a 100-person knowledge organization, that is $1.42 million a year spent just checking AI work.
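The organization-wide figure is simple multiplication, so it is easy to rerun for your own headcount. A minimal sketch in Python, using only the two Forrester numbers quoted above; the implied hourly rate additionally assumes 52 working weeks a year, which the source does not state:

```python
# Figures quoted in the text above (Forrester, 2025).
HOURS_PER_WEEK = 4.3
COST_PER_EMPLOYEE_USD = 14_200


def annual_verification_cost(headcount: int) -> int:
    """Yearly verification overhead in USD for `headcount` employees."""
    return headcount * COST_PER_EMPLOYEE_USD


def implied_hourly_rate(weeks_per_year: int = 52) -> float:
    """Hourly cost implied by the quoted figures.

    The 52-week year is an assumption made for illustration only.
    """
    return COST_PER_EMPLOYEE_USD / (HOURS_PER_WEEK * weeks_per_year)


print(annual_verification_cost(100))  # 1420000
print(round(implied_hourly_rate(), 2))
```

Swap in your own headcount to estimate what unverified AI output already costs your team in checking time.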
The risk is greatest at the leadership level. Deloitte reports that 47% of enterprise AI users have made at least one major decision based on unverified AI-generated content (Deloitte, 2025). Nearly half have, at some point, acted on information that may have been invented.
The most dangerous AI output is not the obviously wrong one. It is the one that sounds exactly right.
Arize AI's LibreEval benchmark (72,155 samples across seven languages) found that even advanced, purpose-built LLM evaluation tools catch hallucinations at 92% accuracy (Arize AI, 2025). In other words, even the best automated systems get roughly 8% of cases wrong. For a single model used without any verification layer, the exposure is substantially higher.
What Is the LLM Council — and Where Did It Come From?
In late November 2025, Andrej Karpathy — co-founder of OpenAI, former Director of AI at Tesla, and one of the most influential AI researchers in the world — published a project on GitHub he described as a "fun Saturday hack" (Karpathy, 2025). He named it the LLM Council.
The idea is elegant. Instead of trusting one AI model with your question, you convene a council of multiple models. Each one answers independently. Then they anonymously review and rank each other's answers — without knowing which model produced which response, so no model can play favorites. Then a designated Chairman model reads all the answers, all the reviews, and synthesizes the most accurate, pressure-tested final response.
"The idea of this repo is that instead of asking a question to your favorite LLM provider, you can group them into your 'LLM Council'... it then asks them to review and rank each other's work, and finally a Chairman LLM produces the final response."
— Andrej Karpathy, GitHub repository description (github.com/karpathy/llm-council)
The logic maps to practices every leader already understands: peer review in science, a second opinion in medicine, a board that debates before deciding. One voice can be confident and wrong. A structured panel that critiques itself is far harder to mislead — and far more likely to surface what a single model would miss.
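For readers who want to see the three stages as a procedure, here is a minimal sketch in Python. It is not Karpathy's code: the `Model` type, the `run_council` function, and the prompt wording are all hypothetical, and each "model" is just a function that takes a prompt and returns text, so the flow can be followed without any API.

```python
import random
from typing import Callable, Dict

# Hypothetical stand-in for a real model call: prompt in, text out.
Model = Callable[[str], str]


def run_council(question: str, members: Dict[str, Model],
                chairman: Model, seed: int = 0) -> str:
    # Stage 1: every member answers the question independently.
    answers = {name: ask(question) for name, ask in members.items()}

    # Hide authorship: shuffle the answers and relabel them A, B, C...
    names = list(answers)
    random.Random(seed).shuffle(names)
    anonymized = "\n\n".join(
        f"Response {chr(65 + i)}:\n{answers[name]}"
        for i, name in enumerate(names)
    )

    # Stage 2: every member reviews and ranks the anonymized answers.
    review_prompt = (
        f"Question: {question}\n\n{anonymized}\n\n"
        "Rank these responses from best to worst and explain why."
    )
    reviews = [ask(review_prompt) for ask in members.values()]

    # Stage 3: the Chairman sees all answers and all reviews.
    review_text = "\n\n".join(
        f"Review {i + 1}:\n{text}" for i, text in enumerate(reviews)
    )
    return chairman(
        f"Question: {question}\n\n{anonymized}\n\n"
        f"Peer reviews:\n{review_text}\n\n"
        "Synthesize the most accurate final answer."
    )


if __name__ == "__main__":
    # Stub members that always return a fixed reply.
    stubs = {
        "model-one": lambda prompt: "Answer from the first stub.",
        "model-two": lambda prompt: "Answer from the second stub.",
    }
    print(run_council("What is our exposure?", stubs,
                      chairman=lambda prompt: "Chairman synthesis (stub)."))
```

The design point to notice is the relabeling step: reviewers and the Chairman only ever see "Response A" or "Response B", never a model name, which is what keeps the review blind.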
One important clarification: the LLM Council is Karpathy's open-source software project, built to run multiple AI models simultaneously through a local web app. The prompt-based "LLM Council" methodology — running structured multi-advisor analysis through a system prompt in a single AI interface — is a separate, related practice that draws on the same intellectual architecture. Both are valid. This guide covers both.
Two Ways to Access the LLM Council
Option 1 — The Full Technical Setup (for organizations with IT support)
Visit github.com/karpathy/llm-council. Your technical team clones the repository, configures an OpenRouter API key, and runs the full multi-model council as a local web application. It queries multiple frontier models simultaneously (GPT, Claude, Gemini, Grok), runs blind peer review across them, and delivers a Chairman synthesis. Cost per query through OpenRouter is usually small, though it depends on which models you select and how long their responses are.
If you want a hosted version with no installation required, a working demo is available at huggingface.co/spaces/burtenshaw/karpathy-llm-council. Open it in any browser. No setup. No API key. This is a valuable tool available to every individual.
Option 2 — The No-Code Manual Council (for individuals and small organizations)
No API. No installation. No technical background required. You need access to one major AI platform (ChatGPT, Claude, or Gemini) and to a second, different platform for cross-model verification. A paid plan such as ChatGPT Plus, Claude Pro, or Gemini Advanced is helpful but not required: free tiers on most platforms are sufficient to run this method.
The No-Code LLM Council: Exact Steps and Copy-Paste Prompts
STEP 1 — Paste the Council Prompt into Your Primary AI Platform
Open chat.openai.com, claude.ai, or gemini.google.com. If you want the Council to run automatically on every conversation, paste the prompt into your system settings:
- In ChatGPT: Click your profile icon → "Customize ChatGPT" → under "How would you like ChatGPT to respond?" → paste the prompt → Save.
- In Claude: Click your profile icon → Settings → Custom Instructions → paste the prompt → Save.
- In Gemini: Click the gear icon (Settings) → Personal Context → toggle ON → Add → paste the prompt → Submit.
- In Perplexity: Click your profile icon → All Settings → Personalization → Custom Instructions → paste the prompt → Save.
Here is the Council prompt — copy it exactly:
Operate as an LLM Council for every response. Run this 3-stage process before answering:
STAGE 1 — 5 ADVISORS (internal, not shown):
1. First-Principles: Rebuild truth from facts only, no assumptions.
2. Contrarian: Find failure points, risks, and flaws in common narratives.
3. Expansionist: Identify missing context and overlooked implications.
4. Outside Observer: Zero bias, no industry assumptions.
5. Practical Executor: Focus on what actually works in reality.
STAGE 2 — BLIND PEER REVIEW (internal, not shown):
Advisors cross-evaluate anonymously. Find: where they agree (Consensus),
where they clash (Dissent), what all 5 missed, and flag anything
unverifiable as [UNCERTAIN].
STAGE 3 — FINAL OUTPUT:
### The Consensus
[Verified, pressure-tested answer. Label uncertain claims.]
### Critical Dissent & Risks
[Blind spots and risks uncovered. Be direct.]
### Recommended Next Step
[One actionable step from the Practical Executor.]
If I start with "Quick:" — skip the council and answer directly.
STEP 2 — Ask Your Question
After the Council prompt is in place, ask anything: analyze this contract, review this press release, evaluate this market entry strategy, fact-check this report. The model runs the five-advisor analysis internally and delivers structured output in three labeled sections.
Pay careful attention to anything the model flags as [UNCERTAIN]. Those flags are the most valuable part of the output. They tell you exactly where human judgment, domain expertise, or independent research needs to step in. Do not ask a second AI to fill those gaps automatically. Investigate them yourself with primary sources.
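If a colleague with basic scripting skills wants to pull those flags out of long outputs automatically, a minimal Python sketch might look like the following. It assumes the model followed the Council prompt's format exactly (three `###` headings and the literal tag `[UNCERTAIN]`); the function names and the sample text are hypothetical.

```python
import re
from typing import Dict, List

# Matches the "### Heading" lines requested by the Council prompt.
SECTION_HEADING = re.compile(r"^###\s+(.*)$")


def split_sections(output: str) -> Dict[str, str]:
    """Split a Council response into {heading: body text}."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in output.splitlines():
        match = SECTION_HEADING.match(line.strip())
        if match:
            current = match.group(1).strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {title: "\n".join(body).strip() for title, body in sections.items()}


def uncertain_claims(output: str) -> List[str]:
    """Return every line the model tagged as [UNCERTAIN]."""
    return [line.strip() for line in output.splitlines()
            if "[UNCERTAIN]" in line]


# Invented sample output, for illustration only.
sample = """### The Consensus
The contract renews annually.
The penalty clause caps damages at 10%. [UNCERTAIN]
### Critical Dissent & Risks
The termination notice period is not stated. [UNCERTAIN]
### Recommended Next Step
Ask counsel to confirm the penalty clause."""

for claim in uncertain_claims(sample):
    print("Check by hand:", claim)
```

The script only collects the flagged lines into a to-do list. Resolving them is still a human task with primary sources, as described above.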
STEP 3 — Cross-Model Verification: The Step That Separates Verification from Illusion
A single AI model, however well prompted, can only fact-check within the limits of its own training data. If a model was trained on incorrect information, or if a claim falls outside its training window, the council running inside that same model cannot catch the error. It does not know what it does not know.
The real verification move: take the full output from Steps 1 and 2 and move it to a completely different AI system. If you ran the Council on ChatGPT — open Claude or Gemini. If you used Claude — open ChatGPT or Gemini. Then paste (or attach as a docx/pdf) the output along with this prompt:
You are the Chairman of this LLM Council. Your job is to act as the
ultimate objective arbiter.
Below is an output from a Council session run on a different AI model.
Using the same LLM Council framework — five advisors, blind peer review,
structured synthesis — verify, critique, and identify any errors,
hallucinations, contradictions, or blind spots in this response.
[PASTE THE FULL OUTPUT FROM YOUR FIRST SESSION HERE]
For any claim that cannot be verified with confidence, write explicitly:
INSUFFICIENT DATA — requires human verification.
Do not guess. Do not fill gaps with plausible-sounding language.
Accuracy over fluency.
Where both models agree after Council scrutiny, your confidence is meaningfully higher. Where they diverge — on a statistic, a claim, a strategic assertion — you have found exactly what needs investigating before it reaches the world under your name.
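For teams that repeat this hand-off often, the paste step can be scripted so nobody forgets to replace the placeholder. A minimal Python sketch, under stated assumptions: the function names are hypothetical, the template shown is a shortened excerpt of the Chairman prompt above (use the full prompt in practice), and the second helper assumes the verifying model used the literal phrase "INSUFFICIENT DATA" as instructed.

```python
from typing import List

PLACEHOLDER = "[PASTE THE FULL OUTPUT FROM YOUR FIRST SESSION HERE]"

# Shortened excerpt of the Chairman prompt; replace with the full text.
CHAIRMAN_TEMPLATE_EXCERPT = (
    "You are the Chairman of this LLM Council. Your job is to act as the\n"
    "ultimate objective arbiter.\n"
    "Below is an output from a Council session run on a different AI model.\n"
    f"{PLACEHOLDER}\n"
    "Do not guess. Do not fill gaps with plausible-sounding language.\n"
    "Accuracy over fluency."
)


def fill_chairman_prompt(template: str, first_output: str) -> str:
    """Insert the first session's output where the placeholder sits."""
    if PLACEHOLDER not in template:
        raise ValueError("template is missing the paste placeholder")
    return template.replace(PLACEHOLDER, first_output.strip())


def needs_human_check(verifier_output: str) -> List[str]:
    """Collect lines the second model marked as INSUFFICIENT DATA."""
    return [line.strip() for line in verifier_output.splitlines()
            if "INSUFFICIENT DATA" in line]


prompt = fill_chairman_prompt(
    CHAIRMAN_TEMPLATE_EXCERPT,
    "### The Consensus\nRevenue grew 12% last year. [UNCERTAIN]",
)
print(prompt)
```

The filled prompt is what you paste into the second platform. Anything `needs_human_check` returns from that platform's reply belongs on the same hand-verification list as the [UNCERTAIN] flags from the first session.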
A Note on Perplexity AI
Perplexity AI offers a native multi-model feature that approximates the Council methodology with the added advantage of real-time web sourcing — which provides a verification layer that memory-based models cannot replicate. However, access to diverse frontier models and the full multi-model logic in Perplexity is available only on the paid Max plan. On free or standard tiers, you are working with a single underlying model, which does not provide the cross-model deliberation the Council methodology requires. The Max plan is worth evaluating for organizations using Perplexity as a primary research tool, but it is not a prerequisite for the no-code method above.
What to Do With the Output
When the Chairman synthesis arrives, the output appears in three labeled sections. Read all three. Most people skip the second one. That is expensive.
- The Consensus is your working answer. Use it — but hold it lightly until cross-verified.
- Critical Dissent & Risks is your risk register. Treat it like one. The Council has surfaced what a single model would have smoothed over.
- Recommended Next Step is one clear action. Follow it.
For any business-critical output — legal, financial, strategic, reputational — treat the Chairman synthesis as a well-reasoned, pressure-tested first draft. Then verify the claims that matter most before you act on them or publish them under your name.
For Organizations: Roll This Out Today
You do not need an IT department, a budget line, or a consultant. You need thirty minutes and a shared document. Have every team member paste the Council prompt into their preferred AI tool using the setup instructions above. Create one shared team document with the prompt ready to copy. Then establish one standard: any AI output that is customer-facing, legal, financial, or brand-related must run through the Council on one model and be cross-verified on a second model before it is published or acted on.
That is your AI governance policy. Written in plain language. Implementable this afternoon. No vendor required.
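If your organization later wants to encode that standard in a checklist tool or intake form, the rule is small enough to write down precisely. A minimal Python sketch, assuming hypothetical names throughout (`AIOutput`, `may_publish`, and the category labels are illustrations of the policy above, not part of any product):

```python
from dataclasses import dataclass
from typing import Optional

# Categories the policy above says must be council-checked and cross-verified.
REVIEW_REQUIRED = {"customer-facing", "legal", "financial", "brand-related"}


@dataclass
class AIOutput:
    category: str
    council_model: Optional[str] = None   # platform that ran the Council prompt
    verifier_model: Optional[str] = None  # second platform used to cross-check


def may_publish(item: AIOutput) -> bool:
    """Apply the one-line standard: Council on one model, verified on another."""
    if item.category not in REVIEW_REQUIRED:
        return True
    return (
        item.council_model is not None
        and item.verifier_model is not None
        and item.council_model != item.verifier_model
    )


print(may_publish(AIOutput("legal", "ChatGPT", "Claude")))   # True
print(may_publish(AIOutput("legal", "ChatGPT", "ChatGPT")))  # False
```

The second example fails on purpose: running the check twice on the same platform does not count as cross-model verification.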
For solo operators and small teams: this applies to you especially. You do not have a team of fact-checkers behind you. The Council is your backstop — the structure that catches what confidence in a single tool cannot.
Conclusion
Somewhere in your organization right now, someone is trusting an AI output they have not verified. It might be in a proposal. In a client brief. In a board presentation. In a press release that goes out tomorrow.
The cost of catching that error before it goes out is zero. The cost of catching it after — in corrections, in credibility, in legal exposure, in client relationships — is not.
You now have the exact prompt. You have the step-by-step instructions. You have the cross-model verification workflow. There is no technical barrier between you and a more defensible AI practice.
A confident AI is not a verified AI. The Council does not eliminate that gap — but it makes it much harder to ignore.
Set this up today. Share it with your team. Make it a habit before it becomes a headline.
References
AllAboutAI. (2025). AI hallucination statistics and research report 2025–2026. https://www.allaboutai.com/resources/ai-statistics/ai-hallucinations/
Arize AI. (2025). LibreEval: The open-source benchmark for RAG hallucination detection. https://arize.com/llm-hallucination-dataset/
Deloitte. (2025). Global AI survey 2025: Enterprise AI decision-making patterns. Deloitte Insights. https://www.scribd.com/document/898051189/
Forrester Research. (2025). Enterprise AI cost analysis 2025: Verification overhead and productivity impact. https://www.forrester.com/report/the-state-of-ai-2025/RES189955
Karpathy, A. (2025, November 22). llm-council [GitHub repository]. https://github.com/karpathy/llm-council
Karpowicz, M. P. (2025). On the fundamental impossibility of hallucination control in large language models (arXiv:2506.06382). arXiv. https://doi.org/10.48550/arXiv.2506.06382
Magesh, V., Surani, F., Dahl, M., Suzgun, M., Manning, C. D., & Ho, D. E. (2024). AI on trial: Legal models hallucinate in 1 out of 6 (or more) benchmarking queries. Stanford Human-Centered AI Institute. https://hai.stanford.edu/
Umesiri, F. E. (2026, June 5). AI told you that? Here's how to know if it's actually true. Axitos Publishing House. https://www.axitos.ai/
AI Use Disclosure: The author advocates for transparency in AI use and notes that AI models assisted with initial organization of this guide. All claims, citations, and final text reflect the author's independent judgment and primary source verification.








