The Wonder And The Promise Of GPT 5.2 Is Here

Amid heavy competition from rivals like Google, OpenAI CEO Sam Altman issued a “code red” just weeks ago, calling for an all-hands-on-deck effort to expedite a new model. Now we have it: GPT 5.2 is officially out, and the curious are already probing its abilities to see where cutting-edge model design stands and what these LLMs can do for us.

“It’s better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long contexts, using tools, and handling complex, multi-step projects,” explains an OpenAI spokesperson in the official announcement of the model that dropped Thursday, citing 5.2’s performance on SWE-Bench metrics and ARC tests.

There’s also this interesting brand call-out in the announcement, where OpenAI suggests that Notion, Box, Shopify, Harvey and Zoom have seen GPT 5.2 excel at “state-of-the-art long-horizon reasoning and tool-calling performance,” that Databricks, Hex and Triple Whale like its proficient work on agentic data science and document analysis tasks, and that Cognition, Warp, Charlie Labs, JetBrains and Augment Code have seen the model provide excellent agentic coding performance.

Economic Expertise

OpenAI says it created 5.2 to help with “common professional tasks” and to “unlock even more economic value” for users, with a model that can work on tasks like cap tables and workforce planning. One reviewer credited it with “stronger abstraction, clearer, more realistic balance and strategic responses and … deeper conceptual insights and ‘vibe,’” noting that 5.2 is well suited to tasks requiring strong analytical or mathematical reasoning.

In terms of value, an OpenAI enterprise survey found that previous models saved professional users roughly 40 to 60 minutes per day, and 5.2 is expected to outdo this.

Besides the SWE-Bench and ARC tests, there are other concrete ways to measure model evolution. Earlier this year, OpenAI highlighted the GDPval concept, using the idea of gross domestic product to explain the role of large language models in business.

“Previous AI evaluations like challenging academic tests and competitive coding challenges have been essential in pushing the boundaries of model reasoning capabilities, but they often fall short of the kind of tasks that many people handle in their everyday work,” the company stated. “To bridge this gap, we’ve been developing evaluations that measure increasingly realistic and economically relevant capabilities.”

This almost seems like it is written specifically for ChatGPT 5.2 — and it’s the type of thing that insiders are touting about the model’s power.

It also has a certain breadth of application: GDPval covers a set of 44 occupations in the top nine industries contributing to U.S. GDP (nurse practitioners? data scientists?) along with 1,320 specialized tasks.

Here’s more of what OpenAI says about the breadth of what GDPval measures and its practicality compared to other benchmarks: “GDPval is distinctive both in its realism and diversity of tasks being evaluated. Unlike other evaluations tied to economic value which concentrate on specific domains (e.g., SWE-Lancer), GDPval covers many tasks and occupations. And unlike benchmarks which involve synthetically creating tasks in the style of an academic exam or test (e.g., Humanity’s Last Exam or MMLU), GDPval focuses on tasks based on deliverables that are either an actual piece of work or product that exists today or are a similarly constructed piece of work product.”

I thought the reference to Humanity’s Last Exam was cogent, partly because I wrote about that particular analysis tool last week, and partly because, given AI’s progress, it seems like HLE might really be the last “exam” dominated by human expertise.

Getting to the Point

Another improvement that users are talking about with 5.2 has to do with classical machine learning, where engineers pondered a program’s ability to “converge” or coalesce information in a targeted way. There are a lot of ways to think about this, from analyzing how a biological organism understands visual data in sight, to noting how dimensional changes influence a neural net’s attention result. But some early users contend 5.2 is better at converging, cohering, targeting a result that makes sense. To be fair, some humans are better at this than others, too.

I heard one user, in the context of ChatGPT 5.2, mention “concision of thinking.” I didn’t think “concision” was a word until I looked it up.

Anyway…

Whatever you want to call it, those who are enthusiastic about 5.2 are hoping it can do this consistently.

An Early Test

I wanted to include a shout-out to Ethan Mollick, a power user with MIT connections, who is often front and center in reviewing new models. Sure enough, there’s an X post by Mollick making the rounds, where he asks 5.2 to render an undersea world with what he called “neo-gothic towers.” In my view, the result looks like what you’d see if you were a tiny bug walking between the hairs of a dog or cat, albeit underwater.

The one-shot result? Terrific.

“Its an impressive model,” Mollick wrote, with concision.

Moving On

That’s some of what I’m hearing about GPT 5.2, but it’s only the first full day that the model has been out for use, so I’m sure we’ll be getting a lot of updates. Meanwhile, do yourself a favor and check out my blog on PaCoRe and similar models coming out of China to compare and contrast the American approach with what’s going on elsewhere in the world. Stay tuned.


