American tech giant Google is expanding its generative AI catalog with PaliGemma, a brand-new AI model. Announced at the recently concluded Google I/O, PaliGemma is a vision-language model (VLM) that understands image and text prompts together.
“Today, we’re excited to further expand the Gemma family with the introduction of PaliGemma, a powerful open vision-language model (VLM),” the company stated during the event. The model is inspired by PaLI-3, a smaller-scale VLM developed by Google Research. It integrates open components from both the SigLIP (Sigmoid Language-Image Pre-training) vision model and the Gemma language model.
According to Google, the model is designed for “class-leading fine-tune performance” on several tasks, including writing captions for images, answering visual questions, and understanding text in images. Google added, “We’re providing both pre-trained and fine-tuned checkpoints at multiple resolutions, as well as checkpoints specifically tuned to a mixture of tasks for immediate exploration.”
Unlike many of Google’s other AI models, PaliGemma is an open model. It is available to developers and researchers on platforms such as GitHub, Hugging Face, Kaggle, Vertex AI Model Garden, and ai.nvidia.com. Interested developers can also try the model in a dedicated Hugging Face Space. The launch of PaliGemma coincides with Google’s other AI announcements, such as Gemma 2 and Gemini 1.5 Flash.
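For readers who want to experiment, the Hugging Face transformers library includes a PaliGemma integration. The snippet below is a minimal sketch of captioning an image with one of the mixture-tuned checkpoints; the checkpoint id matches Google's public Hugging Face listing, while the image URL is a placeholder, not something from the announcement.

```python
# Minimal sketch: image captioning with PaliGemma via Hugging Face transformers.
# Assumes transformers >= 4.41; the image URL is a placeholder.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"  # checkpoint tuned on a mixture of tasks
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)
prompt = "caption en"  # task prefix; other prefixes cover VQA and OCR-style prompts

inputs = processor(text=prompt, images=image, return_tensors="pt")
input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=50, do_sample=False)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(processor.decode(output[0][input_len:], skip_special_tokens=True))
```

Swapping in a different prompt prefix, such as an English question, turns the same call into visual question answering, which is the "mixture of tasks" behavior Google describes.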