Higher token prices and dwindling subsidies have forced AI developers to ask a simple question: do they really need the most powerful model for every task? The answer appears to be shifting toward "no." A wave of cost‑conscious decision‑making is sweeping the industry, and the impact could be profound.

Coinbase co‑founder Brian Armstrong laid out a bold forecast on X, saying 80 percent of AI workloads will run on "99 percent cheaper models" within the next 12 to 18 months. Only the most demanding 20 percent of jobs, he added, would stay on cutting‑edge systems where maximum intelligence matters.
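To see what that forecast would mean for a total bill, here is a rough back-of-the-envelope sketch. It assumes the 80 percent figure is measured by share of current spend and that the remaining 20 percent keeps paying today's prices; Armstrong's post specifies neither, and the function name is purely illustrative.

```python
def blended_cost_fraction(cheap_share: float, cheap_discount: float) -> float:
    """Return the fraction of the original bill left after moving
    `cheap_share` of spend to models that are `cheap_discount` cheaper.

    Assumption (not from the forecast itself): shares are measured by
    current spend, and the rest stays on full-price frontier models.
    """
    frontier_share = 1.0 - cheap_share          # work that stays on top models
    cheap_price = 1.0 - cheap_discount          # relative price of cheap models
    return frontier_share + cheap_share * cheap_price


# 80 percent of workloads on models that are 99 percent cheaper
remaining = blended_cost_fraction(0.80, 0.99)
print(f"Remaining share of original spend: {remaining:.3f}")  # 0.208
```

Under those assumptions, the overall bill would fall to roughly a fifth of its current level, with almost all of what remains going to the frontier tier.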

That projection challenges the prevailing assumption that bigger models automatically deliver better results. For years, AI companies have raced to train ever larger architectures, betting that clients would choose raw performance over price. Investors poured money into the pursuit, effectively subsidizing the cost of running the most advanced models.

But the financial landscape is changing. As token fees climb, businesses are feeling the pinch. Some are trimming usage by sending fewer queries, shortening prompts, or abandoning marginal projects. Others are experimenting with smaller, more economical models.

Legal-tech startup Harvey recently partnered with inference platform Fireworks AI to test this approach. By combining Claude Opus with Fireworks' GLM 5.1 and reserving the larger model for only the most intensive tasks, Harvey cut its inference spend by a factor of three without any noticeable drop in output quality. Co-founder Gabe Pereyra told TechCrunch that "quality comes first, and in legal it always will," but added that the definition of quality now includes delivering the right answer efficiently, not just using the biggest model.
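The general pattern, sending routine work to a smaller model and escalating only the hardest tasks, can be sketched in a few lines. This is a hypothetical illustration, not Harvey's or Fireworks AI's actual system: the `Task` type, the difficulty score, the threshold, and the model labels are all assumptions made for the example.

```python
from dataclasses import dataclass

# Placeholder labels, not real API model identifiers
LARGE_MODEL = "large-frontier-model"
SMALL_MODEL = "small-efficient-model"


@dataclass
class Task:
    prompt: str
    difficulty: float  # hypothetical score from 0.0 (trivial) to 1.0 (hardest)


def route(task: Task, threshold: float = 0.8) -> str:
    """Pick a model for a task: only tasks at or above the difficulty
    threshold go to the large model; everything else uses the small one."""
    return LARGE_MODEL if task.difficulty >= threshold else SMALL_MODEL


tasks = [
    Task("Summarize this one-page memo", 0.2),
    Task("Extract the parties named in a contract", 0.4),
    Task("Draft a multi-jurisdiction legal analysis", 0.9),
]

# Map each prompt to the model that would handle it
assignments = {task.prompt: route(task) for task in tasks}
for prompt, model in assignments.items():
    print(f"{model}: {prompt}")
```

In practice the hard part is the difficulty estimate itself. How a task gets scored, and where the threshold sits, determines whether savings come at the expense of quality.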

The experiment underscores a broader trend: the divide is no longer between proprietary and open-weight models, but between large and small models regardless of their source. Whether a company opts for DeepSeek's V4 Flash or a trimmed-down version of GPT-5, the goal is the same: cut costs while preserving performance.

This shift threatens the economics of heavyweight labs like OpenAI and Anthropic, which have built their valuations on the promise of ever‑larger models. A move toward cheaper alternatives could erode the revenue streams that these firms rely on, especially as they approach high‑profile IPOs.

Meanwhile, a price war is brewing between in‑house inference from big labs and independent providers offering open‑weight models. As users evaluate options, the market may see a rapid consolidation around the most cost‑effective solutions, reshaping the competitive landscape.

Whether the industry fully embraces smaller models remains to be seen. Early results suggest that many tasks can be handled just as well by less compute‑intensive systems, but enterprises will continue to balance cost, speed, and accuracy as they chart their AI strategies.

This article was written with the assistance of AI.