Gemini 1.5 Pro Now Available
Written by Kay Ewbank   
Monday, 29 April 2024

Google has released Gemini 1.5 Pro with improvements including Native Audio Understanding, System Instructions, and a JSON mode.

Google uses the name Gemini both for its conversational chatbot previously known as Bard, and its multimodal large language model (LLM) developed by Google DeepMind. Gemini Pro refers to the LLM.


gemini

Google Gemini 1.5 Pro is now available in 180 countries in the form of a public preview of the Gemini API that was first announced in February as a private preview.

Gemini Pro significantly increases the context window for a publicly available LLM. Until the release of Gemini 1.5 Pro, the largest context window in the world for a publicly available large language model was 200,000 tokens. 1.5 Pro runs up to 1 million tokens consistently, achieving the longest context window of any large-scale foundation model. Gemini 1.5 Pro will come with a 128,000 token context window by default, but the preview version has access to the experimental 1 million token context window.

The updated version adds a native audio (speech) understanding capability so it can hear information for the first time. Google says Gemini 1.5 Pro is now able to reason across both image (frames) and audio (speech) for videos uploaded in Google AI Studio, and "we look forward to adding API support for this soon".

The update also has a new File API to make it easy to handle files. The preview also adds features including system instructions and a JSON mode to give developers more control over the model's output. The JSON mode lets developers instruct the model to only output JSON objects. This mode enables structured data extraction from text or images. At the moment, it can be used via cURL, with Python SDK support "coming soon".

This release also has improvements to function calling. Users can now select modes to limit the model's outputs, improving reliability. Choose text, function call, or just the function itself.

The new version also adds support for developers being able to use Google's next generation text embedding model via the Gemini API. The new model, text-embedding-004, (text-embedding-preview-0409 in Vertex AI), achieves a stronger retrieval performance and outperforms existing models with comparable dimensions, on the MTEB benchmarks.

Gemini 1.5 Pro is available in public preview now in Google AI Studio.

gemini

More Information

Google AI Studio

Related Articles

Google Rebrands Bard As Gemini With Subscription

Google Releases Gemma Open Models

Google Adds Gemini To Bard

Google Adds Code Generation To Bard

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

Banner


Google Intensive AI Course - Free On Kaggle
05/11/2024

Google is offering a 5-Day Gen AI Intensive Course designed to equip data scientists with the knowledge and skills to tackle generative AI projects with confidence. It runs on the Kaggle platform from [ ... ]



Hour Of Code 2024 Is About To Kick Off
04/12/2024

This year the event that aims to provide a coding experience for all school students and anyone else who wants to join in runs between December 9th and 15th and includes new activities. Let's find out [ ... ]


More News

espbook

 

Comments




or email your comment to: comments@i-programmer.info