Google unveils upgrades to Gemini, new video and image generator
These announcements highlighted Google's prioritization of AI as it competes with rivals like OpenAI
Google rolled out a host of advanced features across its artificial intelligence (AI) models, including Gemini, at its annual developer conference, which the company dubbed “Google’s version of the Eras Tour.”
Chief executive Sundar Pichai took to the stage to declare the “Gemini era” at Google, emphasizing the company’s commitment to expanding its Gemini model after more than a decade of investment in research, product, and infrastructure.
The company announced a slate of AI tools, including new Gemini and Gemma models, an answer to OpenAI’s text-to-video generator Sora, and its own version of GPT-4o.
Here are the top announcements:
A revamped AI model
Google announced significant upgrades to its powerhouse AI model, Gemini 1.5 Pro. The model will now be able to handle massive amounts of data, allowing it to summarize 1,500 pages of user-uploaded text.
Google also introduced Gemini 1.5 Flash, a cost-effective model designed for tackling smaller tasks with speed.
In a move to broaden accessibility, Pichai announced that the improved Gemini will offer translations in 35 languages and will be available to developers worldwide. He also shared that Gemini 1.5 Pro will be integrated into Gmail, where it will be able to summarize emails and analyze PDFs and videos, giving users a comprehensive overview if they miss a few days of work.
Changes to videos and images
Google unveiled two groundbreaking AI tools: Veo, Google’s answer to OpenAI’s recently launched Sora, capable of generating high-resolution videos, and Imagen 3, the company’s most advanced text-to-image model yet. With Imagen 3, Pichai promised incredibly realistic images with minimal visual imperfections, an improvement over the previous model, launched earlier this year, which Google had to pull after inaccurate images it generated went viral online.
These tools will be accessible to select creators starting Monday, integrated with Vertex AI, Google’s platform for machine learning.
Google’s answer to the viral GPT-4o
Google’s DeepMind division unveiled Project Astra, a leap towards a next-gen AI assistant. Think Samantha from Spike Jonze’s Her, but less sad and more useful.
The technology, currently a prototype, integrates with the user’s surroundings. During the demo, the assistant showcased its capabilities using both video and audio input, helping the user with day-to-day tasks such as locating lost glasses, explaining code, and identifying components within a video frame, all in real time.
Google’s focus on an audio-visual AI assistant echoes OpenAI’s recent demonstration of similar real-time, audio-based interaction with its ChatGPT model, GPT-4o, which went viral in a video where the assistant helped a volunteer get dressed for an interview based on visual cues.
Interview prep with GPT-4o pic.twitter.com/st3LjUmywa
— OpenAI (@OpenAI) May 13, 2024