Google has continued to push the frontier of context length and multimodal reasoning with the release of Gemini 3.1 Ultra, a model that operates natively across text, images, audio and video and supports a context window in the region of two million tokens. The release pairs with an updated open-weights family, Gemma 4, engineered specifically for advanced reasoning and agentic workflows on smaller deployments.
The headline number, a two-million-token window, is more relevant for builders than for end users. At that scale, applications can keep the entirety of a long legal contract, a multi-day call-centre transcript or a large software repository in working memory rather than relying on retrieval. That changes the design pattern for several classes of enterprise application, in particular knowledge management and compliance review.
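In practice, the shift from retrieval to full-context prompting comes down to a budget check at request time. The sketch below is illustrative only: the two-million-token limit is taken from the announcement, the four-characters-per-token ratio is a crude heuristic rather than a real tokenizer, and `retrieve_relevant_chunks` is a hypothetical stand-in for whatever retrieval layer an application already has.

```python
# Illustrative sketch: choose between full-context prompting and retrieval.
# The token limit matches the announced window; the chars-per-token ratio
# is a rough heuristic, not a real tokenizer.

CONTEXT_WINDOW_TOKENS = 2_000_000
CHARS_PER_TOKEN = 4  # crude English-text heuristic


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def retrieve_relevant_chunks(document: str, question: str) -> list[str]:
    """Placeholder for an existing retrieval layer (vector search, BM25, ...).
    A real system would rank passages by relevance to the question."""
    return document.split("\n\n")[:3]


def build_prompt(document: str, question: str,
                 reserve_for_output: int = 8_192) -> str:
    """Put the whole document in the prompt if it fits the window;
    otherwise fall back to retrieval."""
    budget = CONTEXT_WINDOW_TOKENS - reserve_for_output
    if estimate_tokens(document) + estimate_tokens(question) <= budget:
        # Long-context path: no chunking, no vector store.
        return f"{document}\n\nQuestion: {question}"
    # Retrieval path: keep only the most relevant excerpts.
    excerpts = retrieve_relevant_chunks(document, question)
    return "\n\n".join(excerpts) + f"\n\nQuestion: {question}"
```

The interesting consequence for builders is that the retrieval branch becomes a fallback rather than the default, which simplifies the pipeline for anything that fits in the window.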
Multimodality without transcription
Gemini 3.1 Ultra is positioned as a native multimodal model rather than a pipeline that converts audio and video into text before processing. For products such as meeting copilots, surveillance analytics and customer-experience recording, that means fewer integration seams and lower latency. It also implies a different cost surface, since audio and video tokens scale differently from text tokens.
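A back-of-envelope comparison shows why that cost surface matters. All rates below are hypothetical placeholders chosen for illustration, not Google's published numbers; real multimodal tokenizers vary by model, sample rate and resolution.

```python
# Back-of-envelope: how differently a one-hour meeting consumes context
# depending on modality. All per-second rates are assumed for illustration.

TOKENS_PER_SECOND_AUDIO = 32    # assumption, not a published figure
TOKENS_PER_SECOND_VIDEO = 300   # assumption, not a published figure
CHARS_PER_TEXT_TOKEN = 4        # rough heuristic for English text


def media_tokens(seconds: int, rate: int) -> int:
    """Tokens consumed by a media stream at a fixed per-second rate."""
    return seconds * rate

HOUR = 3_600
audio_cost = media_tokens(HOUR, TOKENS_PER_SECOND_AUDIO)   # 115,200 tokens
video_cost = media_tokens(HOUR, TOKENS_PER_SECOND_VIDEO)   # 1,080,000 tokens

# The same hour as a transcript: ~150 words/min, ~6 chars/word.
transcript_chars = HOUR // 60 * 150 * 6
transcript_cost = transcript_chars // CHARS_PER_TEXT_TOKEN  # 13,500 tokens
```

Under these assumed rates, one hour of video alone consumes over half of a two-million-token window, while the same hour as text is a rounding error, which is why native media ingestion changes pricing and capacity planning, not just architecture.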
Gemma 4 takes a complementary route. Google has framed the family around "intelligence per parameter", suggesting a tighter focus on agentic tasks where developers want a model that can plan, call tools and run inside a constrained environment. Open-weights models still trail closed frontiers on the most demanding reasoning benchmarks, but the practical gap on common enterprise tasks has narrowed substantially over the past twelve months.
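The plan/call-tools loop that agentic framing refers to can be sketched in a few lines. Everything here is schematic: `model_step` is a hard-coded stand-in for a real model's decision, the tool registry is a toy, and no Google API is being depicted.

```python
# Minimal sketch of an agentic tool-calling loop. The "model" is a
# hard-coded placeholder policy; a real deployment would call a model
# that emits the next action given the history.

from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    # Toy calculator: eval with builtins stripped (illustration only).
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}}, {})),
    "echo": lambda text: text,
}


def model_step(history: list[str]) -> str:
    """Placeholder for the model's policy. This stub always makes one
    calculator call, then returns the tool's result as the final answer."""
    if not any(line.startswith("TOOL_RESULT") for line in history):
        return "CALL calculator 6*7"
    return "FINAL " + history[-1].split(" ", 1)[1]


def run_agent(task: str, max_steps: int = 5) -> str:
    """Loop: ask the policy for an action, run tools, stop on FINAL."""
    history = [f"TASK {task}"]
    for _ in range(max_steps):
        action = model_step(history)
        if action.startswith("FINAL "):
            return action[len("FINAL "):]
        _, tool, arg = action.split(" ", 2)
        history.append(f"TOOL_RESULT {TOOLS[tool](arg)}")
    return "step budget exhausted"
```

The "constrained environment" point is visible even in the toy: the tool registry and step budget are the containment surface, and an open-weights model small enough to run inside it is what the intelligence-per-parameter framing is selling.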
Enterprise positioning
The release lands at a moment when enterprise spending on AI has shifted toward platforms that offer multiple model families rather than a single foundation. Google Cloud has accelerated its own enterprise stack, and the company has continued to invest in security tooling that wraps frontier-model use with logging, retention controls and policy enforcement.
For regional buyers, the most relevant question is residency. Long-context models are particularly attractive in regulated sectors, but only if the data those windows hold can be processed in a way that satisfies UAE, Saudi or European requirements. Google has expanded its sovereign and hybrid offerings, and competitors are doing the same.
What it changes
The competitive picture continues to compress. OpenAI, Anthropic, Google and a clutch of well-funded challengers now ship updates within weeks of each other, and feature parity on basic capabilities is the norm rather than the exception. Where models still differentiate is on deployment surface, agentic orchestration depth and the realism of the customer references each vendor can put on a slide.