Google has introduced its “most capable” and “most flexible” AI model, Gemini, that will rival OpenAI’s GPT.
Google began making Gemini available to users across the globe on Wednesday through its various products.
Developed by Google subsidiary DeepMind, Gemini is a multimodal AI model that can process and understand various types of information such as text, code, audio, images, and video while also applying knowledge or understanding from one context to another.
Gemini will be able to run on everything from data centers to mobile devices, the company said.
The initial release of Gemini is offered in three distinct sizes tailored for various purposes: Gemini Ultra for highly complex tasks, Gemini Pro for scaling across a wide range of tasks, and Gemini Nano for on-device tasks.
Gemini will be available in 170 countries across Google products, including chatbot Bard.
The chatbot Bard will use a fine-tuned version of Gemini Pro for more advanced reasoning, planning, and understanding, Google said, adding that this is the biggest upgrade to Bard since it launched in February.
Google said it plans to expand Bard to additional modalities and support new languages and locations in the near term.
Gemini will also be available on the Pixel 8 Pro, making it the first smartphone engineered to run Gemini Nano. Beyond that, Gemini will come to more Google products and services, such as Search, Ads, Chrome, and Duet AI. Google is also experimenting with Gemini in Search, making the Search Generative Experience (SGE) faster for users.
Demis Hassabis, chief executive officer (CEO) and co-founder of Google DeepMind, said Gemini is the most capable and general model the company has ever built.
Technical specifications released by DeepMind showed Gemini surpassing OpenAI’s GPT-4V on a range of multimodal benchmarks. “Its state-of-the-art capabilities will significantly enhance the way developers and enterprise customers build and scale with AI,” Hassabis said.
“We designed Gemini to be natively multimodal, pre-trained from the start on different modalities. Then we fine-tuned it with additional multimodal data to further refine its effectiveness. This helps Gemini seamlessly understand and reason about all kinds of inputs from the ground up, far better than existing multimodal models—and its capabilities are state of the art in nearly every domain,” he added.
Gemini 1.0 is trained to recognize and understand text, images, and audio simultaneously. It can also understand, explain, and generate high-quality code in some of the world’s most popular programming languages, including Python, Java, C++, and Go.
“This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company. I’m genuinely excited for what’s ahead, and for the opportunities Gemini will unlock for people everywhere,” Google CEO Sundar Pichai said.