Google Releases EmbeddingGemma - State Of The Art On-Device Embedding
Written by Nikos Vaggalis   
Thursday, 02 October 2025

Google has released a small, specialized but powerful embedding model that can run on low-resource devices.

EmbeddingGemma is a new open embedding model that delivers a lot of value for its size. Based on the Gemma 3 architecture, it is trained on 100+ languages and, with quantization, is small enough to run in less than 200MB of RAM.

At this point, let's not conflate an embedding model with a Large Language Model. The embedding step comes before the LLM takes action. For instance, when doing RAG you generate the embedding of a user's prompt and calculate its similarity with the embeddings of all the documents in question. When the relevant chunks are found, they are passed together with the user's query to the LLM to let it perform its GenAI magic and give an answer back to the user.
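To make the similarity step concrete, here is a minimal sketch in Dart, assuming the embeddings are already available as plain vectors of doubles; the cosineSimilarity and topMatches helpers are illustrative, not part of any particular library:

import 'dart:math';

// Cosine similarity between two embedding vectors; values close to 1.0
// mean the texts are semantically similar.
double cosineSimilarity(List<double> a, List<double> b) {
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (sqrt(normA) * sqrt(normB));
}

// Rank document chunks by similarity to the query embedding and return the
// indices of the k best matches to pass to the LLM with the user's query.
List<int> topMatches(List<double> query, List<List<double>> docs, int k) {
  final indices = List<int>.generate(docs.length, (i) => i);
  indices.sort((x, y) => cosineSimilarity(query, docs[y])
      .compareTo(cosineSimilarity(query, docs[x])));
  return indices.take(k).toList();
}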

The model itself has around 300 million parameters, produces embeddings of up to 768 dimensions, and is tuned for performance and minimal resource consumption. Usually when doing RAG, even with a local LLM run through, say, Ollama, you generate the embeddings first by calling the embedding API of a remote service like OpenAI's. This is because generating embeddings is a resource-hungry task that requires strong hardware, so it's easier to offload it to a server. Being able to generate your embeddings efficiently on your mobile phone itself is a game changer. It is also crucial for complete privacy and control, as models using private data can run locally without connecting to external servers.

As such, it is a good choice for building local, on-device AI-powered applications. Of course, for that you also need a framework like Cactus, which we covered in Cactus Lets You Build LLM Powered Applications On Your Mobile Phone:

Cactus is also cross-platform, so you can build AI applications using popular frameworks like Flutter, React Native, and Kotlin Multiplatform. Key features are:

  • Supports GGUF Models: Works with any GGUF model from Hugging Face, including Qwen, Gemma, Llama, and DeepSeek.
  • Multi-Modal AI: Run various models including LLMs, VLMs, Embedding Models, and TTS (Text-to-Speech) models.

It can also generate embeddings offline. For instance, using Flutter:

import 'package:cactus/cactus.dart';

// Download the GGUF embedding model and enable embedding generation.
final lm = await CactusLM.download(
  modelUrl: 'https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe-GGUF/resolve/main/nomic-embed-text-v2-moe.Q8_0.gguf',
  contextSize: 2048,
  generateEmbeddings: true,
);
await lm.init();

final text = 'Your text to embed';
final result = await lm.embedding(text);

The embedding parameters include:

mode:

  • "local": Only use device model
  • "remote": Only use cloud API
  • "localfirst": Try local, fallback to cloud if it fails
  • "remotefirst": Try cloud, fallback to local if it fails, so that you can fallback to a cloud embedding API.

Now you can replace "nomic-embed-text-v2-moe.Q8_0.gguf" with the EmbeddingGemma GGUF file from Hugging Face.
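Sketching the change against the download call above, with a placeholder URL since the exact repository and file name on Hugging Face may vary:

// Hypothetical: swap modelUrl for an EmbeddingGemma GGUF build.
// Replace <repo> and the file name with the actual Hugging Face values.
final lm = await CactusLM.download(
  modelUrl: 'https://huggingface.co/<repo>/resolve/main/embeddinggemma-300m.Q8_0.gguf',
  contextSize: 2048,
  generateEmbeddings: true,
);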

Google's engineers themselves make the following suggestions:

  • For on-device, offline use cases: EmbeddingGemma is your best choice, optimized for privacy, speed, and efficiency.

  • For most large-scale, server-side applications: Explore our state-of-the-art Gemini Embedding model via the Gemini API for highest quality and maximum performance.

So if you're building local-first, AI-powered applications, EmbeddingGemma is the way to go.

 

More Information

Introducing EmbeddingGemma: The Best-in-Class Open Model for On-Device Embeddings  

Related Articles

Cactus Lets You Build LLM Powered Applications On Your Mobile Phone 

 

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.


Last Updated ( Thursday, 02 October 2025 )