Amateur armed with ChatGPT solves an Erdős problem

A self-described amateur mathematician using ChatGPT has reportedly contributed to a solution for a longstanding and notoriously difficult Erdős problem concerning the distribution of prime numbers.

Daily Neural Digest Team | April 27, 2026 | 10 min read | 1,916 words

The Amateur Who Used ChatGPT to Crack a Six-Decade-Old Math Problem

In the rarefied world of pure mathematics, where problems can languish unsolved for generations and breakthroughs typically emerge from the hallowed halls of elite institutions, an unlikely protagonist has just rewritten the rules of engagement. A self-described amateur mathematician, armed with little more than curiosity and a subscription to OpenAI's ChatGPT, has reportedly contributed to a solution for one of Paul Erdős's notoriously intractable problems—a conjecture concerning the distribution of prime numbers that has resisted the best efforts of professional mathematicians for over sixty years [1].

The story, first reported by Scientific American, has sent shockwaves through the mathematical community, not merely because of the solution itself, but because of how it was achieved [1]. This isn't a tale of a lone genius scribbling equations on a napkin. It's a story about a new kind of collaboration—one between human intuition and a large language model that can process and generate mathematical language at a scale no human could match. While the full details remain under peer review and are not yet publicly available, the implications are already reverberating far beyond the ivory tower [1].

The Unexpected Synergy: Human Intuition Meets Machine Reasoning

To understand why this matters, we need to appreciate the sheer difficulty of what was accomplished. Paul Erdős, the legendary Hungarian mathematician who wandered the globe with a suitcase full of unsolved problems, left behind a legacy of conjectures that have haunted mathematicians for decades. These "Erdős problems" are characterized by their deceptive simplicity: easy to state, maddeningly difficult to prove. The specific problem in question concerns the distribution of twin primes, those pairs of prime numbers separated by exactly two (like 3 and 5, or 17 and 19). Whether there are infinitely many twin primes is still unproven (this is the twin prime conjecture); the problem reportedly explored a more generalized form: the existence of infinitely many pairs of primes separated by a specific, larger distance [1].
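To make the objects in question concrete, here is a minimal Python sketch that lists prime pairs separated by a fixed distance below a bound. It is only an illustration of the definition, not the method used in the reported work; the function names `primes_up_to` and `prime_pairs` are chosen here for the example. Listing pairs in a finite range says nothing about whether infinitely many exist.

```python
from typing import List, Tuple


def primes_up_to(limit: int) -> List[int]:
    """Return all primes <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Cross out every multiple of p, starting at p * p.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]


def prime_pairs(limit: int, gap: int = 2) -> List[Tuple[int, int]]:
    """Return pairs (p, p + gap) where both are prime and p + gap <= limit."""
    primes = primes_up_to(limit)
    prime_set = set(primes)
    return [(p, p + gap) for p in primes if p + gap in prime_set]


if __name__ == "__main__":
    print(prime_pairs(20))         # [(3, 5), (5, 7), (11, 13), (17, 19)]
    print(prime_pairs(20, gap=6))  # [(5, 11), (7, 13), (11, 17), (13, 19)]
```

With `gap=2` the function returns twin primes; a larger `gap` gives the kind of wider-spaced pairs the article describes.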

The complexity here is profound. Prime numbers are the atoms of arithmetic, yet their distribution across the number line appears almost random, defying simple pattern recognition. Mathematicians have spent decades developing sophisticated tools to probe these patterns, tools that typically require years of specialized training to master. What makes this case remarkable is that the amateur mathematician reportedly used ChatGPT not as a calculator, but as a collaborator—a system to explore different approaches, test conjectures, and generate potential solutions [1].

This represents a fundamental shift in how we think about AI's role in research. Instead of treating ChatGPT as a search engine or a glorified autocomplete, this user engaged with it as a thinking partner. The model's architecture, based on generative pre-trained transformers (GPTs), allows it to predict the next token in a sequence based on patterns learned from massive datasets of text and code. When applied to mathematical problems, this capability can surface connections and approaches that might not occur to a human researcher working alone [1].
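The phrase "predict the next token" can be illustrated with a deliberately tiny stand-in. The sketch below counts which word follows which in a short text and predicts the most frequent follower. This bigram counter is an assumption made for illustration only: it is far simpler than a transformer and is not how ChatGPT works internally, but it shows the basic idea of choosing a continuation from patterns seen in training text. The names `train_bigram` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def train_bigram(tokens: List[str]) -> Dict[str, Counter]:
    """Count, for each token, how often each other token follows it."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts: Dict[str, Counter], token: str) -> Optional[str]:
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


if __name__ == "__main__":
    text = "three and five are twin primes and five and seven are twin primes"
    model = train_bigram(text.split())
    print(predict_next(model, "twin"))  # primes
    print(predict_next(model, "and"))   # five
    print(predict_next(model, "nine"))  # None
```

A real language model replaces the lookup table with a neural network conditioned on the whole preceding context, which is what lets it produce continuations it has never seen verbatim.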

Yet this partnership is not without its perils. As Wired has cautioned, the reliability of AI-generated advice remains a significant concern, particularly in domains where subtle errors can cascade into catastrophic failures [2]. The mathematical community is now grappling with a fundamental question: How do you verify a solution when you don't fully understand the reasoning of the system that generated it? The current ChatGPT architecture remains, in many ways, a "black box," making it difficult to trace the logical steps that led to its outputs [2].

The Democratization of Discovery: Lowering the Barriers to Mathematical Research

Perhaps the most profound implication of this incident is what it says about access. For decades, advanced mathematical research has been the province of those with elite training, institutional support, and access to specialized computational resources. The amateur mathematician in this story had none of these advantages—just a problem, a laptop, and a willingness to experiment with AI tools.

This democratization of discovery is happening against a backdrop of rapid innovation in the AI ecosystem. OpenAI has been actively adapting its models for specialized applications, including clinical settings, where they've made ChatGPT for Clinicians free for verified U.S. physicians [3]. This demonstrates a commitment to deploying AI in high-stakes domains, but it also underscores the potential for misuse. The same technology that can help solve a sixty-year-old math problem can also be weaponized for increasingly sophisticated scams, as MIT Technology Review has documented, noting that cybercriminals are leveraging LLMs to craft more convincing and targeted malicious emails [4].

For the broader research community, the implications are staggering. Startups specializing in AI-powered research tools are likely to see a surge of interest from academic institutions and corporate R&D labs. The ability to integrate AI into research workflows is becoming a competitive advantage, not just a novelty. But this also raises uncomfortable questions about equity. Access to advanced AI tools is not universal, and the gap between those who can afford premium models and those who cannot may exacerbate existing inequalities in scientific research [1].

The winners in this new landscape will be those who can effectively integrate AI tools into their workflows while maintaining a healthy dose of skepticism. The losers may include those who blindly trust AI-generated results or fail to adapt to the changing nature of scientific inquiry [1]. As the ecosystem evolves, we're seeing a proliferation of tools designed to augment ChatGPT's capabilities, from browser extensions to open-source projects like "chatgpt-on-wechat," which has garnered over 42,000 stars on GitHub. This community-driven innovation is accelerating the pace of change, but it also introduces new vectors for error and misuse.

The Black Box Problem: Trust, Verification, and the Limits of AI Reasoning

For all the excitement surrounding this breakthrough, there's an uncomfortable truth that the mathematical community must confront: we don't fully understand how ChatGPT arrived at its contributions to the solution. The model's reasoning process is opaque, making it difficult to distinguish genuine insight from statistical coincidence.

This lack of transparency poses a particular challenge for mathematics, a discipline built on the foundation of rigorous proof. Every step must be verifiable, every assumption must be explicit. When an AI system generates a potential solution, how do we validate it? The current approach relies on human mathematicians to check the work, but this defeats much of the purpose of using AI as a collaborator. If we have to verify every step manually, we're not really augmenting human capability—we're just shifting the bottleneck.
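One reason mathematics demands proof rather than spot checks can be shown with a classic example that is not taken from the reported work: Euler's polynomial n² + n + 41 yields a prime for every n from 0 to 39, then fails at n = 40. The sketch below, with the hypothetical helper `first_counterexample`, searches a finite range for a failure of a claim. Finding one refutes the claim; finding none proves nothing beyond the range tested.

```python
from typing import Callable, Optional


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def first_counterexample(
    claim: Callable[[int], bool], start: int, stop: int
) -> Optional[int]:
    """Return the first n in [start, stop) where claim(n) is False, else None."""
    for n in range(start, stop):
        if not claim(n):
            return n
    return None


def euler_claim(n: int) -> bool:
    """The (false) claim that n^2 + n + 41 is always prime."""
    return is_prime(n * n + n + 41)


if __name__ == "__main__":
    print(first_counterexample(euler_claim, 0, 40))   # None
    print(first_counterexample(euler_claim, 0, 100))  # 40, since 1681 = 41 * 41
```

A check like this can cheaply filter out some wrong suggestions from a model, but a claim that survives it still needs a proof that a human or a proof assistant can follow step by step.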

The need for explainable AI (XAI) has never been more urgent. As AI becomes increasingly integrated into scientific research, we must prioritize the development of techniques that can shed light on the decision-making processes of these systems [2]. This isn't just an academic concern; it has practical implications for how we deploy AI in high-stakes domains. OpenAI's work on specialized models for clinical settings demonstrates an awareness of this challenge, but the gap between current capabilities and what's needed for rigorous scientific validation remains significant [3].

The irony is that the very features that make ChatGPT so powerful—its ability to generate novel combinations of ideas, to explore vast hypothesis spaces, to make unexpected connections—are the same features that make it difficult to trust. The model can produce brilliant insights and subtle errors with equal confidence, and distinguishing between the two requires human expertise that is itself becoming increasingly scarce.

The Competitive Landscape: A New Arms Race in AI-Assisted Research

This incident is likely to intensify the already fierce competition among AI companies vying for dominance in the LLM space. OpenAI's success with ChatGPT has spurred a wave of innovation, with competitors like Google (Gemini) and Anthropic (Claude) racing to demonstrate the value of their models in solving complex problems [1].

For researchers and developers, this competition is a double-edged sword. On one hand, it's driving rapid improvements in model capabilities, accuracy, and reliability. On the other hand, it creates a fragmented ecosystem where the best tool for a given task may depend on factors that are difficult to evaluate. The open-source LLM ecosystem is also evolving rapidly, offering alternatives that provide greater transparency and customization at the cost of requiring more technical expertise to deploy effectively.

The business implications are significant. Research institutions that invest in AI infrastructure and talent now may gain a substantial competitive advantage in the years ahead. But they also face the challenge of building workflows that can effectively integrate AI tools while maintaining rigorous standards of verification and reproducibility. The startups that succeed in this space will be those that can demonstrate not just impressive demos, but reliable, verifiable results.

As the technology matures, we can expect to see increased specialization. Just as ChatGPT is being adapted for clinical use, we'll likely see models fine-tuned for specific research domains, from number theory to drug discovery. These specialized models will face unique challenges, including ensuring data privacy, addressing ethical concerns, and maintaining transparency in their reasoning processes [3].

The Future of Human-AI Collaboration: Augmentation, Not Replacement

Perhaps the most important lesson from this incident is what it tells us about the future of human-AI collaboration. The amateur mathematician didn't ask ChatGPT to solve the problem—they used it as a tool to explore possibilities, test hypotheses, and refine their thinking. This is a model of augmentation, not replacement.

The real breakthrough isn't just the solution to the Erdős problem, but the demonstration of a new collaborative paradigm. Human intuition provides the direction, the creativity, the ability to recognize what's interesting and important. AI provides the computational power to explore vast spaces of possibility, to generate and test hypotheses at a scale that would be impossible for a human working alone.

This partnership has the potential to unlock new levels of innovation across scientific disciplines. But it also requires a fundamental rethinking of how we train researchers and evaluate their work. The mathematician of the future will need not just deep domain expertise, but also the ability to effectively collaborate with AI systems—to know when to trust them, when to question them, and how to extract the most value from their capabilities.

The next 12 to 18 months are likely to see continued advancements in LLMs, with a focus on improving accuracy, reliability, and transparency [2]. We can also expect to see increased integration of AI tools into research workflows, as well as a growing emphasis on responsible AI development and ethical considerations [2, 3, 4]. The rise of increasingly sophisticated AI-driven scams underscores the urgent need for robust security measures and public awareness campaigns [4].

Ultimately, the question becomes: How do we ensure that AI empowers human creativity and innovation without eroding the foundations of critical thinking and rigorous scientific inquiry? The amateur mathematician who used ChatGPT to crack an Erdős problem has given us a glimpse of one possible future—a future where the boundaries between human and machine intelligence blur, and where the greatest breakthroughs come not from either alone, but from their collaboration. The challenge now is to build the infrastructure, the methodologies, and the ethical frameworks that will allow this partnership to flourish without sacrificing the rigor that makes science trustworthy.


References

[1] Scientific American — Original article — https://www.scientificamerican.com/article/amateur-armed-with-chatgpt-vibe-maths-a-60-year-old-problem/

[2] Wired — 5 Reasons to Think Twice Before Using ChatGPT—or Any Chatbot—for Financial Advice — https://www.wired.com/story/5-reasons-to-think-twice-before-using-chatgpt-for-financial-advice/

[3] OpenAI Blog — Making ChatGPT better for clinicians — https://openai.com/index/making-chatgpt-better-for-clinicians

[4] MIT Tech Review — The Download: supercharged scams and studying AI healthcare — https://www.technologyreview.com/2026/04/24/1136400/the-download-supercharged-scams-questionable-ai-healthcare/
