Google Releases EmbeddingGemma - State Of The Art On-Device Embedding
Written by Nikos Vaggalis
Thursday, 02 October 2025
Google has released a small, specialized, but powerful model that can run on low-resource devices. EmbeddingGemma is a new open embedding model that delivers a lot of value for its size. Based on the Gemma 3 architecture, it is trained on 100+ languages and is small enough to run in less than 200MB of RAM with quantization.

At this point let's not conflate an embedding model with a Large Language Model. The embedding step comes before the LLM takes action. For instance, when doing RAG you generate the embedding of the user's prompt and calculate its similarity with the embeddings of all the documents in question. When the relevant chunks are found, they are passed together with the user's query to the LLM to let it perform its GenAI magic and give an answer back to the user. A sketch of that similarity step follows the Flutter example below.

Other than that, the model is 300 million parameters wide, supports embeddings of up to 768 dimensions, and is tuned for performance and minimal resource consumption. The sub-200MB figure checks out: at 4-bit precision, 300 million weights alone come to roughly 150MB.

Usually when doing RAG, even with a local LLM run through, say, Ollama, you generate the embeddings first by calling the embedding API of a remote service like OpenAI's. That is because generating embeddings is a resource-hungry task that requires strong hardware, so it's easier to offload it to a server. Being able to generate your embeddings efficiently on your lowly mobile phone itself is a game changer. It is also crucial for complete privacy and control, as models using private data can run locally without connecting to external servers.

As such, EmbeddingGemma makes a good choice when you want to build local, on-device AI-powered applications. Of course, for that you also need a framework like Cactus, which we covered in Cactus Lets You Build LLM Powered Applications On Your Mobile Phone. Cactus is also cross-platform, so you can build AI applications using popular frameworks like Flutter, React Native and Kotlin Multiplatform.
Among its key features is the ability to generate embeddings offline. For instance, taking Flutter:

```dart
import 'package:cactus/cactus.dart';

// Download and initialise a gguf embedding model.
// (Reconstructed snippet - parameter names may differ across Cactus versions.)
final lm = await CactusLM.download(
  modelUrl: 'https://huggingface.co/.../nomic-embed-text-v2-moe.Q8_0.gguf',
);

final text = 'Your text to embed';
final embedding = await lm.embedding(text);
```

The parameters of the embedding call include mode.
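To make the retrieval step of RAG concrete, here is a minimal sketch in plain Dart of the similarity search described earlier. Note that cosineSimilarity and topKMatches are illustrative helpers, not part of Cactus; in a real pipeline the document embeddings would come from running the same embedding call over your document chunks.

```dart
import 'dart:math';

// Cosine similarity between two equal-length embedding vectors.
double cosineSimilarity(List<double> a, List<double> b) {
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (sqrt(normA) * sqrt(normB));
}

// Indices of the topK document embeddings most similar to the query embedding.
List<int> topKMatches(
    List<double> query, List<List<double>> docEmbeddings, int topK) {
  final scores = [for (final d in docEmbeddings) cosineSimilarity(query, d)];
  final indices = List<int>.generate(docEmbeddings.length, (i) => i);
  indices.sort((a, b) => scores[b].compareTo(scores[a])); // best match first
  return indices.take(topK).toList();
}
```

The returned indices point at the chunks that get passed, along with the user's query, to the LLM.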
Now you can replace "nomic-embed-text-v2-moe.Q8_0.gguf" with the EmbeddingGemma gguf file from Hugging Face. Google's engineers themselves suggest reaching for EmbeddingGemma for on-device, offline use cases, while pointing large-scale, server-side applications to the Gemini Embedding model served through the Gemini API.
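Concretely, the swap is just a different model URL. The sketch below is hypothetical: the Hugging Face path and gguf filename are placeholders, and the Cactus parameter and return types are assumptions. The truncation trick it demonstrates is real, though. EmbeddingGemma is trained with Matryoshka representation learning, so its 768-dimension vectors can be cut down to 512, 256 or 128 dimensions and renormalized when memory or storage is tight.

```dart
import 'dart:math';
import 'package:cactus/cactus.dart';

// Hypothetical URL - check Hugging Face for the actual EmbeddingGemma gguf release.
const embeddingGemmaUrl =
    'https://huggingface.co/.../embeddinggemma-300m-Q8_0.gguf';

// Matryoshka truncation: keep the first dims components and renormalize.
List<double> truncate(List<double> v, int dims) {
  final cut = v.sublist(0, dims);
  final norm = sqrt(cut.fold(0.0, (s, x) => s + x * x));
  return [for (final x in cut) x / norm];
}

Future<List<double>> embedCompact(String text) async {
  // Same Cactus calls as in the earlier snippet, pointed at the new model.
  final lm = await CactusLM.download(modelUrl: embeddingGemmaUrl);
  final embedding = await lm.embedding(text); // assumed to return a List<double>
  return truncate(embedding, 256); // 256 dimensions instead of the full 768
}
```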
So if you're building local-first AI-powered applications, EmbeddingGemma is the way to go.

More Information

Introducing EmbeddingGemma: The Best-in-Class Open Model for On-Device Embeddings

Related Articles

Cactus Lets You Build LLM Powered Applications On Your Mobile Phone
To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.
Comments
or email your comment to: comments@i-programmer.info
Last Updated ( Thursday, 02 October 2025 )