Tech

OpenAI’s GPT‑5.2 is Here and It Beats Human Experts on Most Knowledge Work Benchmarks

OpenAI has announced GPT‑5.2, which it describes as its “most capable model series yet for professional knowledge work.” The company says ChatGPT Enterprise users already report saving 40–60 minutes per day with AI, while heavy users save more than 10 hours per week. GPT‑5.2 is designed to extend those gains with better performance at creating spreadsheets and presentations, writing code, understanding images, handling long‑context tasks, using tools, and carrying out complex multi‑step projects.

Knowledge‑Work and Coding Benchmarks

OpenAI states that GPT‑5.2 sets a new state of the art on multiple benchmarks, including GDPval, which measures well‑specified knowledge‑work tasks across 44 occupations. GPT‑5.2 Thinking beats or ties top industry professionals in 70.9% of GDPval comparisons, while GPT‑5.2 Pro reaches 74.1%. OpenAI calls GPT‑5.2 Thinking its first model performing at or above human expert level on this benchmark and says it produces outputs more than 11 times faster and at under 1% of expert cost, based on historical metrics.

GDPval tasks include sales presentations, accounting spreadsheets, urgent‑care schedules, manufacturing diagrams, and short videos. A GDPval judge described one GPT‑5.2 output as “an exciting and noticeable leap in output quality” that “appears to have been done by a professional company with staff,” while still noting minor errors.

On internal junior investment‑banking spreadsheet tasks, GPT‑5.2 Thinking scores 68.4%, up from 59.1% for GPT‑5.1, while GPT‑5.2 Pro scores 71.7%. These tasks include three‑statement models and leveraged buyout models for take‑private deals.

For coding, GPT‑5.2 Thinking scores 55.6% on SWE‑Bench Pro, 80.0% on SWE‑bench Verified, and 74.6% on SWE‑Lancer IC Diamond, all above GPT‑5.1 Thinking. OpenAI says GPT‑5.2 more reliably debugs production code, implements feature requests, refactors large codebases, and ships end‑to‑end fixes, with stronger front‑end performance including complex and 3D interfaces.

From single prompts, GPT‑5.2 has produced an “Ocean Wave Simulation” app, a holiday card builder, and a typing‑rain game. Early testers such as Windsurf, Warp, JetBrains, Augment Code, Cline, Charlie Labs, Kilo, and Azad describe GPT‑5.2 as state‑of‑the‑art for “agentic” coding. Windsurf CEO Jeff Wang calls it “the biggest leap for GPT models in agentic coding since GPT‑5.”

Factuality, Long‑Context Reasoning and Vision

OpenAI says GPT‑5.2 Thinking hallucinates less than GPT‑5.1 Thinking. On de‑identified ChatGPT queries, responses containing at least one error were 30% less common in relative terms. With search and maximum reasoning, GPT‑5.2 Thinking answers 93.9% of questions without errors, versus 91.2% for GPT‑5.1 Thinking; without search, it scores 88.0% versus 87.3%. OpenAI notes that GPT‑5.2 remains imperfect and urges double‑checking for critical work.
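
The gap between a "relative" reduction and a percentage-point change is easy to misread, so here is a minimal Python sketch of the arithmetic. It uses only the error-free percentages quoted above. The helper names are illustrative, and the assumption that the quoted 30% figure corresponds to the with-search numbers is ours, not something OpenAI states here.

```python
def error_rate(error_free_pct: float) -> float:
    """Convert a percentage of error-free answers into an error rate (percent)."""
    return 100.0 - error_free_pct


def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Drop in error rate expressed as a fraction of the old error rate."""
    return (old_rate - new_rate) / old_rate


# Error-free percentages as quoted in the article (GPT-5.1 Thinking vs GPT-5.2 Thinking).
with_search = {"gpt_5_1": 91.2, "gpt_5_2": 93.9}
without_search = {"gpt_5_1": 87.3, "gpt_5_2": 88.0}

for label, scores in (("with search", with_search), ("without search", without_search)):
    old = error_rate(scores["gpt_5_1"])
    new = error_rate(scores["gpt_5_2"])
    print(
        f"{label}: {old:.1f}% -> {new:.1f}% errors, "
        f"{old - new:.1f} points absolute, "
        f"{relative_reduction(old, new):.1%} relative"
    )
```

With search, the error rate falls from 8.8% to 6.1%, a change of 2.7 percentage points but roughly 31% in relative terms. Without search, the same calculation gives a relative reduction of only about 5.5%.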

For long‑context reasoning, GPT‑5.2 Thinking sets a new state of the art on MRCRv2. On the “4‑needle” variant up to 256,000 tokens, OpenAI says it is the first model it has seen reach near‑100% accuracy and that it consistently outperforms GPT‑5.1 Thinking from 4K to 256K tokens. GPT‑5.2 Thinking also scores higher on long‑context BrowseComp and GraphWalks. OpenAI says this enables use on long reports, contracts, research papers, transcripts, and multi‑file projects, and it pairs GPT‑5.2 Thinking with a new /compact Responses endpoint to extend effective context.

OpenAI describes GPT‑5.2 Thinking as its strongest vision model so far, citing higher scores than GPT‑5.1 Thinking on CharXiv Reasoning and ScreenSpot‑Pro, and better spatial understanding in image examples. On tool‑use benchmarks such as τ2‑bench Telecom and Retail, BrowseComp, Scale MCP‑Atlas, and Toolathlon, GPT‑5.2 Thinking and GPT‑5.2 Pro also outperform GPT‑5.1 Thinking.

Science, Reasoning and Rollout

On science and math tests, GPT‑5.2 Pro and GPT‑5.2 Thinking improve GPQA Diamond and FrontierMath scores over GPT‑5.1 Thinking. On abstract reasoning benchmarks ARC‑AGI‑1 and ARC‑AGI‑2, GPT‑5.2 Pro and GPT‑5.2 Thinking also post higher results, with GPT‑5.2 Pro crossing 90% on ARC‑AGI‑1.

OpenAI is rolling out GPT‑5.2 Instant, Thinking, and Pro to paid ChatGPT plans and the API, with GPT‑5.1 remaining as a legacy model for three months. The company says GPT‑5.2 is part of an ongoing push to improve general intelligence, long‑context understanding, tool use, vision, safety, and reliability.





Tech

Apple’s Edge Light Adds A Virtual Ring Light To Your Mac Video Calls: Here’s How To Use It

Apple recently released a new macOS update that brings the Edge Light feature to all Apple silicon-powered Macs. Here’s how you can use it on your next video call.



Tech

If You See This Message From PayPal, You Are Under Attack

This email comes from PayPal, but it is an attack — what you need to know and do to be safe from these hackers.



Tech

Pakistan to Establish Its First Dedicated Council for Digital Businesses

The government has decided to establish the Pakistan Council of Digital Economy, the country’s first dedicated council for digital businesses, as part of its efforts to strengthen the expanding digital sector.

The initiative is aimed at creating an institutional framework to support digital businesses and accelerate the growth of the digital economy.

According to official sources, the Pakistan Digital Authority is preparing to form a separate council for digital businesses, structured along the lines of a chamber of commerce. The proposed council is expected to act as a formal platform representing digital companies operating across different segments of the economy.

The Prime Minister is expected to inaugurate the Pakistan Council of Digital Economy once the initial framework is finalized. The main objective of the council will be to bring digital companies together on a single platform to address common challenges and opportunities within the sector.

The council will facilitate coordination between the government and private digital companies, focusing on consultations related to online business issues, regulatory frameworks, taxation, and policy development. Through this platform, private sector input will be incorporated into decision-making processes related to the digital economy.

Officials said the Pakistan Digital Authority has already held consultations with around 40 private digital companies. The move is intended to ensure greater private sector participation in promoting the digital economy and to create a more supportive environment for digital innovation and investment in Pakistan.




