
Opus, Gemini, and ChatGPT: top models have all disappeared from the Arena. Is this the reason?

Several prominent large language models (LLMs), including Opus, Gemini, and ChatGPT, have abruptly disappeared from the Arena, a popular platform for evaluating AI models.

Daily Neural Digest Team · April 9, 2026 · 5 min read · 989 words
This article was generated by Daily Neural Digest's autonomous neural pipeline: multi-source verified, fact-checked, and quality-scored.

The News

Several prominent large language models (LLMs), including Opus, Gemini, and ChatGPT, have abruptly disappeared from the Arena, a popular platform for evaluating AI models [1]. This removal has sparked debate in the AI community, particularly on Reddit’s r/LocalLLaMA [1]. The Arena, which uses user-based pairwise comparisons to rank models, now shows a message stating these models are unavailable, leaving users and developers seeking explanations [1]. The timing of this event coincides with the release of GLM-5.1, a new open-source LLM from Z.ai, which reportedly outperforms Opus 4.6 and GPT-5.4 on the SWE-Bench Pro benchmark [2]. While no direct link has been confirmed, the simultaneous occurrence has fueled speculation about competitive pressures and shifts in LLM accessibility [1]. The removal affects a significant portion of the Arena’s user base, as Daily Neural Digest data shows high download numbers for Opus-related models, including Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF (869,356 downloads), opus-mt-en-ru (683,908 downloads), and opus-mt-tr-en (764,481 downloads).

The Context

The disappearance of these models from the Arena highlights the evolving dynamics of LLM development and accessibility. Opus, named for its Latin root "work," and Gemini, derived from a constellation, have become synonymous with advanced AI models. The Arena serves as a critical, informal benchmark for the AI community, enabling users to compare model performance through blind A/B testing. This crowdsourced evaluation has shaped developer preferences and influenced model adoption. The sudden removal suggests a deliberate action by developers or administrators, rather than a technical failure [1].
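The blind A/B voting described above is typically turned into a leaderboard with an Elo-style rating update. The sketch below is illustrative only: the model names, starting ratings, and K-factor are assumptions, not the Arena's actual implementation, which may use a different rating scheme.

```python
K = 32  # update step size per vote (assumed, not the Arena's real value)

def expected(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict, winner: str, loser: str) -> None:
    """Apply one pairwise vote: the winner gains rating, the loser loses it."""
    e_w = expected(ratings[winner], ratings[loser])
    ratings[winner] += K * (1.0 - e_w)
    ratings[loser] -= K * (1.0 - e_w)

# Hypothetical models start at equal ratings; one blind A/B vote shifts them.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
update(ratings, "model_a", "model_b")
print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```

Because each vote only nudges ratings by a bounded amount, a model's rank reflects many independent comparisons, which is why a sudden withdrawal of top models removes a meaningful signal rather than a single data point.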

Z.ai’s GLM-5.1 release adds context to this shift [2]. Its open-source MIT License allows unrestricted commercial use, contrasting with proprietary models dominant in the West [2]. The model’s performance exceeding Opus 4.6 and GPT-5.4 on the SWE-Bench Pro benchmark positions it as a strong competitor [2]. This release reflects China’s growing investment in open-source AI, potentially challenging U.S. dominance [2]. VentureBeat notes rapid progress in AI agents, with capabilities reaching 20 steps by the end of 2023 [2]. Total investment in this area reached $52.83 billion [2]. Meanwhile, Google advances AI video editing via Google Vids, integrating models like Veo 3.1 and offering directable AI avatars [4]. This underscores ongoing innovation despite potential model access restrictions [4]. Google also enhances user experience with Gemini notebooks, enabling project organization and file integration [3], mirroring ChatGPT’s "Projects" feature [3].

Why It Matters

The removal of Opus, Gemini, and ChatGPT from the Arena has significant implications. For developers, the loss of these models as benchmarks creates technical friction. The Arena historically provided a straightforward platform for comparative analysis, allowing developers to assess model strengths and weaknesses [1]. Without this direct comparison, evaluating performance now relies more on proprietary benchmarks and less transparent assessments.

Enterprise and startup adoption of LLMs is also impacted. The Arena’s rankings often influenced purchasing decisions, particularly for cost-effective solutions [1]. The absence of these models may increase uncertainty, pushing businesses toward alternatives or in-house development. This could raise AI adoption costs for smaller firms lacking independent evaluation resources.

GLM-5.1’s emergence further complicates the landscape. Its open-source nature offers an alternative to proprietary models, potentially disrupting existing business models and lowering entry barriers for new developers [2]. This shift could accelerate AI democratization but also introduce challenges in model governance, security, and responsible development. Google’s focus on video AI, as seen in Google Vids [4], reflects a strategic diversification, reducing reliance on a single product category. Gemini notebooks [3], designed to enhance user organization and context, directly respond to ChatGPT’s Projects feature, highlighting competitive efforts to improve AI usability [3].

The Bigger Picture

The events around the Arena and GLM-5.1 signal a broader shift in the global AI landscape. While the U.S. has dominated LLM development, China’s renewed focus on open-source AI, exemplified by Z.ai’s GLM-5.1, represents a significant challenge [2]. This trend may lead to a more decentralized, competitive ecosystem, fostering innovation through collaboration [2]. The removal of models from the Arena suggests developers are increasingly sensitive to public perception and competitive positioning. Models underperforming in benchmarks may face pressure to be withdrawn to avoid negative publicity or enable internal improvements [1]. This raises questions about the long-term viability of crowdsourced benchmarking platforms, as they risk manipulation or strategic withdrawal by developers [1].

Advancements in AI video generation, as demonstrated by Google Vids [4], indicate a move beyond text-based interactions toward immersive experiences. This trend is likely to accelerate, with implications for content creation, entertainment, and education. Features like directable AI avatars [4] blur reality and simulation lines, raising ethical concerns about authenticity and misuse. Gemini notebooks [3] reflect an industry focus on improving AI usability, moving beyond chatbots to context-aware assistants [3].

Daily Neural Digest Analysis

The mainstream narrative often frames the AI race as a U.S.-dominated competition. However, events around the Arena and GLM-5.1 reveal a more complex reality [1], [2]. The sudden disappearance of leading models from a public benchmark, paired with rapid progress in open-source alternatives from China, underscores the fragility of current AI dominance and the potential for disruptive innovation [1], [2]. The Arena’s role as a public forum has inadvertently created a platform for competitive pressure that developers now actively manage, potentially undermining transparency in AI evaluation [1]. The hidden risk lies in public benchmarks becoming tools for strategic manipulation rather than genuine performance indicators [1]. The focus on user experience, as seen in Google’s Gemini notebooks [3], highlights a crucial but often overlooked aspect of AI adoption. Ultimately, the question remains: will increasing commercialization and strategic maneuvering stifle open innovation and limit transformative AI potential?


References

[1] Reddit r/LocalLLaMA — Opus, Gemini and ChatGPT top models all disappeared from the Arena — https://reddit.com/r/LocalLLaMA/comments/1sg29tl/opus_gemini_and_chatpt_top_models_all_disappeared/

[2] VentureBeat — AI joins the 8-hour work day as GLM ships 5.1 open source LLM, beating Opus 4.6 and GPT-5.4 on SWE-Bench Pro — https://venturebeat.com/technology/ai-joins-the-8-hour-work-day-as-glm-ships-5-1-open-source-llm-beating-opus-4

[3] The Verge — Gemini gets notebooks to help you organize projects — https://www.theverge.com/tech/909031/google-gemini-notebooks-notebooklm

[4] Ars Technica — Google Vids gets AI upgrade with Veo and Lyria models, directable AI avatars — https://arstechnica.com/ai/2026/04/google-vids-gets-ai-upgrade-with-veo-and-lyria-models-directable-ai-avatars/
