OpenAI rolled out Images 2.0 on Tuesday, positioning the new system as a "visual thought partner" that goes beyond pixel‑by‑pixel rendering. The model first scans a prompt, plans composition, reasons about spatial relationships, and can pull real‑time data from the web before drawing. The result: up to eight images that share consistent characters and settings, all generated from a single instruction.

Text rendering, a long‑standing weak spot for AI image tools, reaches roughly 99% accuracy across languages and scripts, including Japanese, Korean, Chinese, Hindi and Bengali. If independent tests confirm the claim, designers could finally rely on the model for production‑grade graphics rather than novelty demos.

Within 12 hours of launch, Images 2.0 claimed the number‑one spot on the Image Arena leaderboard, posting a score of 1,512—242 points ahead of the runner‑up, Google’s Nano Banana 2. The lead marks the largest margin ever recorded on the benchmark, underscoring the model’s immediate impact on the competitive landscape.

OpenAI split the offering into two access modes. "Instant" mode ships to every ChatGPT user, free or paid, delivering the core visual upgrades: sharper layouts, better text, and richer detail. "Thinking" mode, reserved for Plus ($20/month), Pro ($200/month), Business and Enterprise plans, unlocks web search, multi‑image batching, and output verification. The premium tier therefore carries the bulk of the reasoning capability that fuels the quality jump.

Pricing follows a token‑based structure. Input tokens cost $8 per million, cached inputs $2 per million, and image output $30 per million tokens. In practice, a single image runs between $0.04 and $0.35, depending on prompt complexity and resolution, which tops out at 2K.
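At those rates, estimating per‑image cost is simple arithmetic. A minimal sketch, using the published per‑million‑token prices; the token counts in the example are illustrative assumptions, not OpenAI figures:

```python
def image_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate cost in USD from the reported per-million-token rates."""
    INPUT_RATE = 8.00 / 1_000_000    # $8 per million input tokens
    CACHED_RATE = 2.00 / 1_000_000   # $2 per million cached input tokens
    OUTPUT_RATE = 30.00 / 1_000_000  # $30 per million image-output tokens
    return (input_tokens * INPUT_RATE
            + cached_tokens * CACHED_RATE
            + output_tokens * OUTPUT_RATE)

# Hypothetical mid-size request: ~500 prompt tokens, ~4,000 output tokens
print(round(image_cost(500, 4_000), 4))  # ≈ $0.124, inside the quoted range
```

Actual token counts per image vary with resolution and prompt length, which is why OpenAI quotes a range rather than a flat price.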

The model’s API identifier is gpt-image-2, with developer access slated for early May 2026. OpenAI noted a knowledge cutoff of December 2025; anything that emerged after that date requires the model’s web‑search function to stay current.
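Once developer access opens, a request would presumably follow the shape of OpenAI's existing Images API. The sketch below only builds a request body; the parameter names mirror the current `images/generations` endpoint, and whether `gpt-image-2` accepts the same fields is an assumption:

```python
# Hypothetical request body for the reported "gpt-image-2" identifier.
# Field names follow OpenAI's current Images API; actually sending the
# request would require the official SDK and an API key.
payload = {
    "model": "gpt-image-2",
    "prompt": "storyboard frame: a fox crossing a snowy bridge at dusk",
    "n": 1,                 # Thinking mode reportedly batches up to eight
    "size": "1024x1024",
}
print(payload["model"])
```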

Safety measures include content filtering, C2PA metadata for provenance, and ongoing monitoring, reflecting the company’s response to growing regulatory scrutiny over synthetic media. OpenAI declined to disclose the underlying architecture, describing Images 2.0 simply as a "generalist model" or "GPT for images" without confirming whether it uses diffusion, autoregressive or hybrid techniques.

Industry observers see the multi‑image capability as a workflow game‑changer. Marketing teams can now generate a family of social‑media assets, storyboard frames, or children’s‑book illustrations in one go, eliminating the need to prompt and stitch each image individually. Integration with Codex means developers can produce UI mockups and visual assets directly within the same environment they use for code, slides and browser automation.

With DALL‑E 2 and DALL‑E 3 slated for retirement on May 12, 2026, Images 2.0 becomes OpenAI’s flagship visual tool. The shift signals a broader strategy: embed generative image capabilities across the ChatGPT platform, positioning the service as a direct competitor not only to Midjourney and Google’s models but also to design‑focused SaaS products like Canva and Figma.

This article was written with the assistance of AI.