Making Vertex AI the most enterprise-ready generative AI platform

It’s been amazing to see the impressive things customers are doing with generative AI and agents. Less than three months ago, we shared 101 real-world gen AI use cases from the world’s leading organizations. Since then, to enable businesses to roll out compelling AI agents faster, Google DeepMind has continued to pioneer model advances, in particular with Gemini and Imagen, and we’ve delivered dozens of groundbreaking features in our enterprise AI platform, Vertex AI. 

Customers are doing great things with generative AI, including UberEats, Ipsos, Jasper, Shutterstock, Quora, and many other organizations are accelerating their gen AI use cases into production with Google Cloud. 

For example, before Gemini 1.5 Pro, it was impossible to pursue most multimodal use cases, like submitting a video and simply asking questions about it. But since its release, we’ve seen innovative examples of customers having conversation with their data like:

A fast food retailer is using Gemini to analyze video footage from its stores to identify peak traffic periods and to optimize store layouts for improved customer experiences. The retailer also plans to combine this video analysis with sales data to better understand the factors that drive efficient and successful service.

A financial institution is processing scanned images of identification with submitted data forms, leveraging Gemini’s multimodality to automatically (and quickly) process both images and text to compare information for accuracy and help customers more conveniently open and access accounts. 

A sports company is leveraging Gemini to analyze a player’s swing. By overlaying Gemini’s insights onto their existing application, the AI’s analysis enhances the functionality of their swing analysis tool.

An insurance company can now analyze dashcam footage of accidents using Gemini to better understand and describe scenarios. This analysis can help calculate risk scores and even provide personalized driving tips based on observed behaviors.

An advertising and marketing services company is revolutionizing video description solutions by developing real-time streaming capabilities for both description and narration. This innovation streamlines video creation, increases efficiency, and allows for personalized content.

And that’s just looking at multimodality coupled with long-context windows — Gemini is equally powerful with code bases, long documents with embedded images, audio interviews, and much more. 

In addition to the reception from customers, it’s been encouraging to see industry analysts recognize us. For example, in just the last two months, Forrester Research named Google a Leader in The Forrester Wave™: AI Foundation Models for Language, Q2 2024 and Gartner® named Google a Leader in the 2024 Magic Quadrant™ for Cloud AI Developer Services and the 2024 Magic Quadrant™ for Data Science and Machine Learning Platforms1

Today, to accelerate this momentum, we are announcing significant advancements in models and enterprise platform capabilities with Vertex AI. 

Let’s start with models. 

Gemini 1.5 Flash: Market-leading cost-performance and low latency

Announced last month in public preview and now generally available, Gemini 1.5 Flash combines low latency, competitive pricing, and our groundbreaking 1 million-token context window, making it an excellent option for a wide variety of use cases at scale, from retail chat agents, to document processing, to research agents that can synthesize entire repositories. 

Most important of all, Gemini 1.5 Flash’s strong capabilities, low latency, and cost efficiency has quickly become a favorite with our customers, offering many compelling advantages over comparable models like GPT 3.5 Turbo: 

  • 1 million-token context window, which is approximately 60x bigger than the context window provided by GPT-3.5 Turbo 
  • On average, 40% faster than GPT-3.5 Turbo when given input of 10,000 characters2 
  • Up to 4X lower input price than GPT-3.5 Turbo, with context caching enabled for inputs larger than 32,000 characters 

“At UberEats, we are actively reimagining the way people get the things they want and need,” said Narendran Thangarajan, Staff Software Engineer at Uber. “As a result, we built the Uber Eats AI assistant, which enables our users to learn, ideate, discover and shop for things in our catalog seamlessly via natural language conversations. With Gemini 1.5 Flash, we are seeing close to 50% faster response times, which is critical to the overall customer experience. We look forward to the impact the model will have on efficiency and customer satisfaction and the new opportunities it unlocks with multimodality and longer context windows.”

“Gemini 1.5 Flash makes it easier for us to continue our scale-out phase of applying generative AI in high-volume tasks without the trade-offs on quality of the output or context window, even for multimodal use cases,” said JC Escalante, Global Head of Generative AI at market research firm Ipsos. “Gemini 1.5 Flash creates opportunities to better manage ROI.”

“As an AI-first company focused on empowering enterprise marketing teams to get work done faster, it is imperative that we use high-quality multimodal models that are cost-effective yet fast, so that our customers can create amazing content quickly and easily and reimagine existing assets,” said Suhail Nimji, Chief Strategy Officer at “With Gemini 1.5 Pro and now 1.5 Flash, we will continue raising the bar for content generation, ensuring adherence to brand voice and marketing guidelines, all while improving productivity in the process.”

Businesses and developers can click here to get started now with Gemini 1.5 Flash on Vertex AI. 

Gemini 1.5 Pro: With industry-leading 2 million-token context capabilities 

Now available with an industry-leading context window of up to 2 million tokens, Gemini 1.5 Pro is equipped to unlock unique multimodal use cases that no other model can handle.

Processing just six minutes of video requires over 100,000 tokens and large code bases can exceed 1 million tokens — so whether the use case involves finding bugs across countless lines of code, locating the right information across libraries of research, or analyzing hours of audio or video, Gemini 1.5 Pro’s expanded context window is helping organizations break new ground. 

Businesses and developers can click here to get started now with Gemini 1.5 Pro with 2 million-token context capabilities

Imagen 3: Faster image generation, superior prompt comprehension

Imagen 3 is Google’s latest image generation model. It delivers outstanding image quality alongside several improvements over Imagen 2 — including over 40% faster generation for rapid prototyping and iteration; better prompt understanding and instruction-following; photo-realistic generations, including of groups of people; and greater control over text rendering within an image. 

Launching in preview for Vertex AI customers with early access, Imagen 3 also includes multi-language support, built-in safety features like Google DeepMind’s SynthID digital watermarking, and support for multiple aspect ratios.

Image generated via Imagen 3

“The early results of Imagen 3 models have pleasantly surprised us with its quality and speed in our testing,” said Gaurav Sharma, Head of AI Research, Typeface, a startup that specializes in leveraging generative AI for enterprise content creation. “It brings improvements in generating details, as well as lifestyle images of humans. As early partners of Google’s foundation models, we are looking forward to exploring the new Imagen and Gemini models further on the journey ahead together.”

“We make it easy for our users to turn their ideas into eye-catching presentations, websites, and other visual documents generated with the power of AI. To enable even greater personalization and creativity while reducing manual tasks, we offer the high-quality text-to-image capabilities of Imagen,” said Jon Noronha, Co-Founder, Gamma. “Our users have already generated over 4 million images with Imagen, and we’re excited about how Imagen 3 will enable them to create images even faster, include text in images, and safely improve the generation of photorealistic images with people.” 

“Since adding Imagen to our AI image generator, our users have generated millions of pictures with the model. We’re excited by the enhancements Imagen 3 promises as it enables our users to execute their ideas faster without sacrificing quality. As an important enhancement to Shutterstock’s launch of the first ethically-sourced AI image generator, we also appreciate how safety is built in and that the content that is created is protected under Google Cloud’s indemnification for generative AI,” said Justin Hiza, VP of Data Services, Shutterstock.

Customers can click here to apply for access to Imagen 3 on Vertex AI. 

Third-party and open models: Delivering expanded model choice with Vertex AI

At Google Cloud, we’re committed to empowering customer choice and innovation through our curated collection of first-party, open, and third-party models available on Vertex AI. That’s why we’re thrilled that we recently added Anthropic’s newly released model, Claude 3.5 Sonnet, to Vertex AI. Customers can begin experimenting with or deploying in production Claude 3.5 Sonnet on Google Cloud. Later this summer, we’ll be deepening our partnership with Mistral with the addition of Mistral Small, Mistral Large, and Mistral Codestral to Vertex AI Model Garden. 

Continuing our push to meet customers where they are, earlier this year, we introduced Gemma, a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. We’re officially releasing Gemma 2 to researchers and developers globally. Available in both 9-billion (9B) and 27-billion (27B) parameter sizes, Gemma 2 is much more powerful and efficient than the first generation, with significant safety advancements built in. Starting next month, customers will be able to access Gemma 2 on Vertex AI.

Lower costs: Context caching for both Gemini 1.5 Pro and Flash

To help our customers efficiently take advantage of Gemini’s vast context windows, starting today, we are rolling out context caching in public preview for both 1.5 Pro and Flash. As context length increases, it can be expensive and slow to get responses for long-context applications, making it difficult to deploy to production. Vertex AI context caching helps customers significantly reduce input costs, by 75 percent, leveraging cached data of frequently-used context. Today, Google is the only provider to offer a context caching API. 

Predictable performance: Provisioned throughput for Gemini models 

Generally available today, with allowlist, provisioned throughput lets customers responsibly scale their usage of Google’s first-party models, like 1.5 Flash, providing assurances for both capacity and price. This Vertex AI feature brings predictability and reliability to customer production workloads, giving them the assurance required to scale gen AI workloads aggressively. 

Delivering enterprise truth: Grounding with Google Search and now, grounding with third-party data 

Enterprise readiness requires more than the model. Enterprises need to maximize factuality and drastically minimize hallucinations, which means grounding model output in web, first-party, and third-party truth and data, while meeting stringent enterprise-readiness  standards, such as data governance and sovereignty. 

At Google I/O, we announced the general availability of Grounding with Google Search in Vertex AI. With the service now generally available, businesses of all kinds can augment Gemini outputs with Google Search grounding, giving the models access to fresh and high-quality information. Customers can easily integrate the enhanced Gemini models into their AI agents. 

“Gemini 1.5 Flash creates opportunities to better manage ROI moving forward. With the ability to ground model responses in Google Search, we can better increase the relevancy of results of our conversational experience, Ipsos Facto, with fresh data,” said JC Escalante of Ipsos. “This capability is a key component in our efforts to improve output quality and researcher experience.”

“Grounding with Google Search translates into more accurate, up-to-date, and trustworthy answers,” said Spencer Chan, Product Lead at Quora, which offers Grounding with Google Search on its Poe platform. “We’ve been delighted with the positive feedback so far, as users are now able to interact with Gemini bots with even greater confidence.”

Customers can click here to get started with Grounding with Google Search.

Additionally, today we are announcing that starting next quarter, Vertex AI will offer a new service that will enable customers to ground their AI agents with specialized third-party data. This will help enterprises integrate third-party data into their generative AI agents to unlock unique use cases and drive greater enterprise truth across their AI experiences. We are working with premier providers such as Moody’s, MSCI, Thomson Reuters, and Zoominfo to bring their data to this service.

“Google Cloud’s third-party data grounding offerings will open up new applications for KPMG and our clients,” said Brad Brown, Global Tax & Legal CTO at KPMG. “By seamlessly integrating specialized third-party data from industry leaders into our generative AI offerings, we can reduce time to insight, drive more informed decision-making, and ultimately deliver greater value using highly trustworthy data sources.”

To learn more about grounding, click here for a deeper dive. 

More factual responses: Grounding with high-fidelity mode 

In data-intensive industries like financial services, healthcare, and insurance, generative AI use cases often require the generated response to be sourced from only the provided context, not the model’s world knowledge. Grounding with high-fidelity, announced in experimental preview, is purpose-built to support such grounding use cases, including summarization across multiple documents, data extraction against a set corpus of financial data, or processing across a predefined set of documents. High-fidelity mode is powered by a version of Gemini 1.5 Flash that’s been fine-tuned to only use customer-provided content to generate answers and ensures high levels of factuality in response. 

Best options for data sovereignty: Data residency for data stored at-rest, limiting ML processing to region

Customers, especially those from regulated industries, demand control over where their data is stored and processed when using generative AI capabilities. To meet these data sovereignty requirements, we have data residency for data stored at-rest guarantees in 23 countries (of which 13 — Spain, Italy, Israel, Switzerland, Poland, Finland, Brazil, India, Taiwan, Hong Kong, Australia, KSA, Qatar — were added in 2024), with additional guarantees around limiting related ML processing to the US and EU. We are also working on expanding our ML processing commitments to eight more countries, starting with four countries in 2024.

Get started with Vertex AI today

As the customer stories we’ve shared today demonstrate, Vertex AI helps businesses turn the power of generative AI into tangible, transformative results. We look forward to continuing to bring innovations like Gemini 1.5 Flash and Grounding with Google Search to our customers, and to making Vertex AI the most enterprise-ready generative AI platform.

To get started with Gemini 1.5 Flash on Vertex AI, click here

To learn more about how Vertex AI can help your organization, click here, and to learn more about how Google Cloud customers are innovating with generative AI, read How 7 businesses are putting Google Cloud’s AI innovations to work

1. Gartner, Magic Quadrant for Cloud AI Developer Services, Jim Scheibmeir, Arun Batchu, Mike Fang – April 29, 2024. Gartner, Gartner Magic Quadrant for Data Science and Machine Learning Platforms, Afraz Jaffri, Aura Popa, Peter Krensky, Jim Hare, Raghvender Bhati, Maryam Hassanlou and Tong Zhang – June 17, 2024. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally, and MAGIC QUADRANT is a registered trademark of Gartner Inc. and/or its affiliates and are used herein with permission. All rights reserved. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose
2. Per study published by Gemini team, 14 June 2024 Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Posted in

Share this article
Shareable URL
Prev Post

Free to be SRE — how to use generative AI to code, test and troubleshoot your systems

Next Post

Google Cloud expands grounding capabilities on Vertex AI

Leave a Reply

Your email address will not be published. Required fields are marked *

Read next