How Kepler built verifiable AI for financial services with Claude

Kepler, a financial services firm, has partnered with Anthropic to build verifiable AI systems on Claude, marking a significant advance in applying large language models (LLMs) within a highly regulated industry.

Daily Neural Digest Team · May 4, 2026 · 12 min read · 2,240 words

Inside Kepler's Bet on Verifiable AI: Why Claude Is Becoming Wall Street's New Compliance Officer

The financial services industry has long faced an uncomfortable paradox when it comes to artificial intelligence: the most powerful models are often the most opaque, and the most transparent models are often the least useful. For years, this tension has kept cutting-edge AI on the sidelines of regulated financial workflows, where every decision must be defensible, auditable, and explainable to regulators wielding frameworks like the EU's AI Act [1]. But a new partnership between Kepler, a financial services firm, and Anthropic is attempting to crack that code—by building verifiable AI systems on top of Claude that don't just produce answers, but produce receipts [1].

This isn't just another enterprise AI integration. It represents a fundamental rethinking of how large language models (LLMs) should operate in high-stakes environments, where the cost of a hallucination isn't a funny chatbot response but a regulatory fine or a ruined customer relationship. The implications ripple far beyond Kepler's balance sheet, touching on everything from AI security vulnerabilities to the future of model valuation in a market where Anthropic is reportedly fielding pre-emptive investment offers valuing it between $850 billion and $900 billion [4].

The Black Box Problem Meets Its Match

To understand why Kepler's initiative matters, you have to appreciate the depth of the trust deficit that currently exists between financial institutions and AI. Traditional machine learning models in finance—credit scoring algorithms, fraud detection systems, trading bots—have long operated as "black boxes," where even their creators struggle to explain why a particular decision was made [1]. This opacity was tolerated because these models were relatively narrow in scope and their outputs could be validated against historical data.

LLMs have shattered that tolerance. When a model like Claude processes a complex financial document, analyzes market conditions, and generates a recommendation, the chain of reasoning involves billions of parameters, contextual embeddings, and probabilistic sampling. The result can be brilliant or disastrous, and often it's impossible to tell which without extensive manual review. For a financial services firm subject to regulations that demand clear audit trails, this is a non-starter [1].

Kepler's solution combines Claude's native capabilities, particularly its ability to generate step-by-step reasoning through techniques like chain-of-thought prompting, with proprietary verification frameworks that create granular audit trails of every decision [1]. Think of it as installing a flight recorder on an AI model. Every inference, every reasoning step, and every data point consulted is logged and made available for automated checks and human review. The system likely also leverages Claude's constitutional AI training, which guides responses according to pre-defined principles and helps keep outputs within acceptable boundaries even when the model is operating autonomously [1].
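To make the flight-recorder idea concrete, here is a minimal Python sketch of an append-only decision log. It is an illustration only: the source does not describe Kepler's actual schema, so the names `AuditRecord` and `FlightRecorder` and every field shown are hypothetical.

```python
import json
import time
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class AuditRecord:
    """One logged step behind a decision (hypothetical schema)."""
    step: int
    kind: str                      # e.g. "retrieval", "reasoning", "final_answer"
    content: str                   # the text produced or consulted at this step
    sources: List[str] = field(default_factory=list)  # IDs of data consulted
    timestamp: float = field(default_factory=time.time)


class FlightRecorder:
    """Append-only, in-memory log of the steps behind a single model decision."""

    def __init__(self, decision_id: str) -> None:
        self.decision_id = decision_id
        self._records: List[AuditRecord] = []

    def log(self, kind: str, content: str,
            sources: Optional[List[str]] = None) -> AuditRecord:
        # Steps are numbered in the order they are logged, starting at 1.
        record = AuditRecord(step=len(self._records) + 1, kind=kind,
                             content=content, sources=list(sources or []))
        self._records.append(record)
        return record

    def export(self) -> str:
        """Serialize the full trail as JSON for reviewers or automated checks."""
        return json.dumps(
            {"decision_id": self.decision_id,
             "records": [asdict(r) for r in self._records]},
            indent=2,
        )
```

A caller would invoke `log()` once per retrieval, reasoning step, and final answer, then hand the output of `export()` to a reviewer or an automated checker. A production system would presumably write to durable, access-controlled storage instead of memory.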

This approach doesn't just satisfy compliance requirements; it fundamentally changes what's possible. Financial institutions can now deploy AI in areas that were previously off-limits due to regulatory risk—automated wealth management advice, real-time compliance monitoring, complex contract analysis—because they can reconstruct and validate the reasoning behind every output [1]. The trade-off, of course, is complexity. Integrating verification layers into existing AI pipelines introduces friction and increases development costs, requiring specialized skills in AI ethics, model verification, and regulatory compliance that are currently in short supply [1].

Security Incidents and the Transparency Imperative

The timing of Kepler's announcement is no accident. The AI industry has been rocked by a series of security incidents that have exposed the fragility of even the most sophisticated LLM deployments [2]. BeyondTrust's demonstration of credential theft via crafted GitHub branch names showed how seemingly innocuous inputs could compromise entire systems [2]. The accidental public release of Claude Code source code onto npm highlighted the risks of inadequate security protocols in AI development pipelines [2]. These aren't edge cases; they're symptoms of an industry that prioritized speed and capability over security and transparency.

For financial services, these incidents are existential threats. A compromised AI model in a trading system could execute unauthorized transactions. A leaked API key could expose sensitive customer data. A manipulated reasoning chain could produce fraudulent compliance reports. The industry's response has been predictable: increased scrutiny, stricter procurement requirements, and a growing preference for models that can demonstrate their trustworthiness rather than just claim it [2].

This is where Kepler's verifiable AI framework becomes a competitive weapon. By building audit trails directly into the model's reasoning process, the system makes it dramatically harder for malicious actors to manipulate outputs without detection. Every deviation from expected behavior leaves a digital footprint that can be investigated and traced back to its source. It's not just about catching problems after they occur; it's about creating a system where problems are inherently visible [1].
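One common way to make an audit trail tamper-evident is a hash chain, in which each entry's hash covers the previous entry's hash. The sketch below shows that general technique in Python. It is an assumption for illustration, not a description of Kepler's implementation, and the function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal payloads hash equally.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: list, payload: dict) -> None:
    """Append a payload, linking it to the hash of the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload,
                  "prev": prev_hash,
                  "hash": _digest(prev_hash, payload)})


def first_tampered_index(chain: list):
    """Return the index of the first entry that fails verification, or None."""
    prev_hash = GENESIS
    for i, entry in enumerate(chain):
        expected = _digest(prev_hash, entry["payload"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return i
        prev_hash = entry["hash"]
    return None
```

Editing any logged step changes its digest, so verification fails at that entry. The guarantee only holds if the most recent hash is also stored somewhere the attacker cannot rewrite. Otherwise the whole chain could simply be recomputed.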

The broader ecosystem is responding to these pressures as well. Tools like "claude-mem," a Claude Code plugin that has garnered 34,287 GitHub stars, and "everything-claude-code," a performance-optimization system with 72,946 stars, demonstrate the community's hunger for more robust, transparent Claude-based applications [2]. These aren't just developer toys; they're evidence of a maturing ecosystem that recognizes that the next frontier of AI adoption isn't raw capability—it's trust.

The Economics of Verifiable AI: Winners, Losers, and $900 Billion Valuations

Anthropic's reported valuation of $850 billion to $900 billion [4] might seem stratospheric for a company that, by some measures, is still playing catch-up to OpenAI in terms of market share. But that valuation reflects something deeper than current revenue: it reflects market confidence in Anthropic's ability to deliver safe, reliable AI in an environment where safety and reliability are becoming the primary purchasing criteria [4].

Kepler's partnership is a case study in why that confidence exists. By positioning Claude as the foundation for verifiable AI, Anthropic is targeting the most demanding, highest-value use cases in the enterprise market. Financial services firms have deep pockets, long procurement cycles, and zero tolerance for failure. If Anthropic can win their trust, it can win virtually any enterprise account [1].

The economics work differently for different players. For large financial institutions, the initial investment in building verifiable AI systems is significant but manageable, and the payoff—reduced regulatory risk, increased customer trust, the ability to deploy AI in previously restricted areas—is substantial [1]. For smaller firms, the cost of building and maintaining these systems could create barriers to entry, potentially widening the competitive gap between incumbents and challengers [1].

Companies that offer robust, user-friendly verifiable AI solutions are positioned to be the big winners. Anthropic, by providing the underlying model on which verification layers are built, is capturing value at the platform level. Firms like Datavent, which are actively recruiting AI developers with Claude and AWS expertise, are positioning themselves to capitalize on the implementation demand [1]. On the other side, firms that continue to rely on opaque "black box" models risk falling behind and facing increased regulatory scrutiny as transparency requirements tighten [1].

Beyond Financial Services: A Blueprint for Responsible AI

While Kepler's initiative is focused on financial services, its implications extend across the entire AI landscape. The verifiable AI framework being developed here could serve as a blueprint for any industry where AI decisions have significant consequences—healthcare, legal services, insurance, government, and beyond [1].

The technical architecture is instructive. By combining Claude's capabilities with custom verification layers, Kepler is essentially creating a two-tier system: the model generates outputs and explanations, and the verification framework validates those explanations against ground truth data, regulatory requirements, and business rules [1]. This separation of concerns is crucial. It means that the verification process can be independently audited and improved without requiring changes to the underlying model. It also means that different verification frameworks can be applied to the same model for different use cases, increasing flexibility [1].
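The two-tier separation can be sketched in a few lines of Python. In this illustration the model's output is plain data, and each verification rule is an independent function that can be added, audited, or swapped without touching the model. All names here (`ModelOutput`, `make_ground_truth_check`, `make_banned_phrase_check`, `verify`) and the example rules are hypothetical, since the source does not publish Kepler's interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class ModelOutput:
    """Tier 1: what the model produced (hypothetical shape)."""
    recommendation: str
    explanation: str
    cited_figures: Dict[str, float]  # figure name -> value the model claims


# Tier 2: a check returns None on pass, or a failure message.
Check = Callable[[ModelOutput], Optional[str]]


def make_ground_truth_check(ledger: Dict[str, float],
                            tolerance: float = 0.01) -> Check:
    """Compare every figure the model cites against a trusted ledger."""
    def check(output: ModelOutput) -> Optional[str]:
        for name, claimed in output.cited_figures.items():
            if name not in ledger:
                return f"unknown figure: {name}"
            if abs(ledger[name] - claimed) > tolerance:
                return f"figure mismatch: {name}"
        return None
    return check


def make_banned_phrase_check(phrases: List[str]) -> Check:
    """A stand-in for a regulatory or business rule on wording."""
    def check(output: ModelOutput) -> Optional[str]:
        text = output.recommendation.lower()
        for phrase in phrases:
            if phrase.lower() in text:
                return f"banned phrase: {phrase}"
        return None
    return check


def verify(output: ModelOutput, checks: List[Check]) -> List[str]:
    """Run every check in order and collect failure messages (empty = passed)."""
    results = (check(output) for check in checks)
    return [message for message in results if message is not None]
```

Because `verify` only takes a list of checks, the same model output can be run against one rule set for wealth-management advice and another for contract analysis, which is the flexibility the paragraph above describes.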

Bias detection and mitigation are built into this architecture at multiple levels. Training data is screened for problematic patterns. Model outputs are checked for biased language or recommendations. The verification framework itself is designed to flag decisions that deviate from expected norms [1]. This multi-layered approach doesn't guarantee perfect fairness—no system can—but it creates a feedback loop where biases can be identified and addressed systematically rather than discovered after they've caused harm [1].

The consumer side of this trend is also evolving. Amazon's AI-powered price tracking feature, now displaying year-long price history data, signals that consumers are increasingly expecting transparency from AI-driven systems [3]. This expectation will inevitably extend to financial services, where customers want to understand not just what decisions are being made about their money, but why [3]. Verifiable AI provides the infrastructure for meeting that expectation.

The Technical Reality Check: What Verifiable AI Can and Can't Do

For all its promise, Kepler's verifiable AI initiative faces significant technical challenges that are often glossed over in mainstream coverage. The most fundamental issue is that Claude, like all LLMs, remains prone to generating inaccurate or misleading information, and its explanations may not fully reflect its actual reasoning process [1]. This is not a bug that can be fixed with better verification; it's a feature of how these models work. They don't "reason" in the human sense; they generate text that is statistically likely given the input. The chain-of-thought explanations can amount to post-hoc rationalizations rather than faithful representations of internal processes [1].

This creates a paradox at the heart of verifiable AI: if the model's explanations are unreliable, how can the verification framework validate them? The answer is that verification must operate at multiple levels. It checks the model's outputs against known facts and business rules. It looks for internal consistency in the reasoning chain. It flags outputs that deviate from established patterns. But it cannot, ultimately, verify that the model's reasoning was "correct" in any fundamental sense, because the model doesn't have reasoning in the way we understand it [1].
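Two of these levels, internal consistency and deviation from established patterns, can be approximated with simple checks. The Python sketch below is illustrative only. The function names, the regex-based number matching, and the z-score threshold are all assumptions, and neither check says anything about whether the reasoning itself was sound.

```python
import re
import statistics

# Matches plain integers and decimals. Thousands separators such as "1,000"
# are not handled in this sketch.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unsupported_numbers(reasoning_steps, final_answer):
    """Consistency check: numbers in the answer that no reasoning step mentions."""
    seen = {n for step in reasoning_steps for n in NUMBER.findall(step)}
    return [n for n in NUMBER.findall(final_answer) if n not in seen]


def deviates_from_history(value, history, z_threshold=3.0):
    """Pattern check: is value more than z_threshold sample standard
    deviations away from the mean of past values?"""
    if len(history) < 2:
        return False  # not enough data to judge either way
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > z_threshold
```

For example, a final answer that quotes a figure never mentioned in the logged reasoning would be returned by `unsupported_numbers` and could be routed to human review. A pass means only that the answer is consistent with its own trail, not that it is right.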

Integrating verification frameworks into complex AI pipelines is also technically challenging. It requires significant expertise in both AI and software engineering, and it can introduce latency and computational overhead that may be unacceptable for real-time applications [1]. The recent security vulnerabilities in AI coding agents, Claude Code among them [2], serve as a stark reminder that even sophisticated systems are not immune to attacks, and the attackers' focus on credential theft [2] highlights a critical, often overlooked aspect of AI security: the security of underlying infrastructure and access controls [1].

The question that remains unanswered is whether verifiable AI can truly eliminate bias and ensure fairness, or whether it will merely provide a veneer of transparency that masks underlying issues [1]. The answer depends on the rigor of the verification process and the ongoing commitment to ethical AI development. It's not a one-time fix; it's an ongoing practice that requires continuous investment, monitoring, and improvement [1].

The Road Ahead: From Performance to Trust

Kepler's partnership with Anthropic represents a broader industry shift away from purely performance-driven AI development toward what might be called "trust-driven" AI [1]. This shift is being driven by multiple forces: regulatory pressure from frameworks like the EU's AI Act, public awareness of AI risks, and the growing recognition that trust is essential for widespread adoption [1].

Competitors like OpenAI and Google are also investing in transparency and explainability, but Kepler's focus on verifiable AI—providing demonstrable audit trails rather than just explanations—represents a more rigorous approach [1]. It's the difference between a doctor explaining a diagnosis and a doctor providing the lab results, imaging studies, and clinical notes that support it. Both are valuable, but only the latter provides the evidence needed for independent verification.

The ecosystem is responding. The popularity of Claude-based tools like "claude-mem" and "everything-claude-code" [2] indicates a broader effort to address LLM security and usability challenges. The rapid growth of Anthropic's valuation [4] reflects market confidence in the company's ability to deliver safe, reliable AI. And the rise of distilled reasoning models like Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF, which has seen 724,285 downloads [3], shows that the developer community is hungry for Claude-based solutions that can be deployed efficiently.

For developers, the implications are clear. The era of "move fast and break things" in AI is giving way to an era of "move carefully and verify everything." This means learning new skills in AI ethics, model verification, and regulatory compliance. It means building systems that are auditable by design rather than as an afterthought. And it means recognizing that the most valuable AI models won't be the ones that are the most capable, but the ones that are the most trustworthy [1].

Kepler's verifiable AI initiative is a significant step in this direction. It's not a complete solution—the technical challenges are real, and the limitations of current LLM technology are significant. But it points the way toward a future where AI can be deployed in the most demanding, high-stakes environments with confidence. And in a world where trust is becoming the ultimate currency, that's worth more than any valuation.


References

[1] Editorial_board — Original article — https://claude.com/blog/how-kepler-built-verifiable-ai-for-financial-services-with-claude

[2] VentureBeat — Claude Code, Copilot and Codex all got hacked. Every attacker went for the credential, not the model. — https://venturebeat.com/security/six-exploits-broke-ai-coding-agents-iam-never-saw-them

[3] The Verge — Amazon’s built-in AI price history expands to show the entire last year — https://www.theverge.com/tech/922302/amazon-price-tracker-year

[4] TechCrunch — Sources: Anthropic could raise a new $50B round at a valuation of $900B — https://techcrunch.com/2026/04/29/sources-anthropic-could-raise-a-new-50b-round-at-a-valuation-of-900b/

[5] ArXiv — How Kepler built verifiable AI for financial services with Claude — related_paper — http://arxiv.org/abs/2208.04791v1

[6] ArXiv — How Kepler built verifiable AI for financial services with Claude — related_paper — http://arxiv.org/abs/2603.28944v1

[7] ArXiv — How Kepler built verifiable AI for financial services with Claude — related_paper — http://arxiv.org/abs/2501.02842v1
