Gemma 4 Local

Mobile Setup Guide

Run Gemma 4 on iPhone & Android

Offline AI on your phone: find the right model for your device


Why run Gemma 4 on your phone?

Gemma 4 Edge models (E2B, E4B) are designed to run entirely on-device — no internet, no API keys, no data leaving your phone.

🔒

100% Private

All processing happens on your device. No data sent to any server. Perfect for sensitive queries.

✈️

Works Offline

Use AI on flights, in subways, or anywhere without connectivity. Download once, use forever.

🆓

Zero Cost

No subscription, no API fees. The model is free and open-weight. The app is free.

Which phones can run Gemma 4?

 iPhone / iOS

| Device | RAM | Model |
|---|---|---|
| iPhone 16 Pro Max | 8 GB | E2B ✓ |
| iPhone 15 Pro | 8 GB | E2B ✓ |
| iPhone 15 / 14 | 6 GB | E2B ✓ |
| iPhone 13 / SE 3 | 4 GB | Too little RAM |

E4B (~9.6 GB) will crash on any iPhone — stick to E2B for iOS.

🤖 Android

| Device | RAM | Model |
|---|---|---|
| Galaxy S25 Ultra | 12 GB | E4B ✓ E2B ✓ |
| Pixel 9 Pro | 16 GB | E4B ✓ E2B ✓ |
| Mid-range | 8 GB | E2B ✓ |
| Budget | 4–6 GB | E2B (marginal) |

E4B needs 10 GB+ available RAM. Most phones should use E2B.
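The RAM guidance in the two tables above can be sketched as a simple model picker. This is an illustrative helper, not part of any official tool; the function name, the `is_ios` flag, and the exact thresholds are assumptions derived from the tables.

```python
def pick_gemma_model(total_ram_gb: float, is_ios: bool = False) -> str:
    """Illustrative model picker based on the RAM tables above.

    E4B needs 10 GB+ of *available* RAM, which in practice means
    12 GB+ total, and it is Android-only (E4B crashes on iPhones).
    E2B (~2 GB download) runs comfortably with 6 GB+ total RAM and
    is marginal between 4 and 6 GB.
    """
    if not is_ios and total_ram_gb >= 12:
        return "E4B"
    if total_ram_gb >= 6:
        return "E2B"
    if total_ram_gb >= 4:
        return "E2B (marginal)"
    return "too little RAM"
```

For example, a 16 GB Pixel 9 Pro maps to E4B, an 8 GB iPhone 15 Pro maps to E2B, and a 4 GB iPhone SE 3 is flagged as marginal at best.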

How to install Gemma 4 on your phone

 iPhone

  1. Download Google AI Edge Gallery from the App Store.

  2. Open the app, find Gemma 4 E2B, and tap Download. The model is ~2 GB.

  3. Start chatting. Works fully offline once downloaded.

🤖 Android

  1. Install AI Edge Gallery from Google Play.

  2. Select Gemma 4 E2B (or E4B if your phone has 10 GB+ RAM).

  3. Tap Download, then chat. No API key or account needed.

⚠ Safety note: Only install from Google Play or the official GitHub repo. Avoid APK mirror sites — search results for "gemma 4 apk" often lead to unverified repackages.

Building a mobile app? Use LiteRT-LM

AI Edge Gallery is for testing. If you're integrating Gemma 4 into your own iOS or Android app, Google's recommended path is LiteRT-LM — the current on-device LLM runtime in the Google AI Edge stack.

| Goal | Use this | Platform |
|---|---|---|
| Try Gemma 4 on your phone | AI Edge Gallery | iOS & Android |
| Embed Gemma 4 in your own app | LiteRT-LM | iOS & Android |
| Legacy integration path | MediaPipe LLM Inference | Deprecated |

LiteRT-LM supports Gemma 4 E2B at under 1.5 GB on some devices, with dynamic context handling for up to 128K tokens. The older MediaPipe LLM Inference API is marked deprecated — avoid it for new projects.

Watch: Gemma 4 running on iPhone

Running Gemma 4 entirely offline on an iPhone — no internet, no API key required.

What can you do with Gemma 4 on mobile?

Offline translation — translate text between languages without internet. Great for travel.

Private Q&A — ask sensitive questions (medical, legal, financial) without any data leaving your device.

Writing assistant — draft emails, messages, or social posts on the go.

Learning & study — explain concepts, summarize articles, or practice flashcards without connectivity.

When should you use a server instead?

On-device AI is great for privacy and offline use, but some workloads don't fit on a phone. Consider a server-backed approach when you need:

  • Larger models — 26B MoE or 31B Dense won't run on any phone. If you need that capability level from a mobile UI, host the model on a server and call it via API.
  • Long conversations — Edge models handle short prompts well, but extended multi-turn chats can exhaust mobile RAM and cause thermal throttling.
  • Consistent latency across devices — if your app needs to work the same on a budget Android and a flagship iPhone, a cloud endpoint is more predictable than on-device inference.

What to do after mobile testing

Stay mobile

If E2B handles your use case well, build around it. Design for short prompts, narrow scope, and offline-first — that's where mobile AI shines.

Go hybrid

Use the Edge model locally for quick tasks, with a cloud fallback for complex queries. Best of both: privacy when possible, power when needed.
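The hybrid pattern can be sketched as local-first routing: try the on-device model, and fall back to a cloud endpoint when the request looks too heavy or local inference fails. `run_local` and `call_cloud` are placeholders for whatever runtime and API you actually wire in, and the character threshold is an arbitrary stand-in for a real capability check.

```python
from typing import Callable

def hybrid_answer(
    prompt: str,
    run_local: Callable[[str], str],
    call_cloud: Callable[[str], str],
    max_local_chars: int = 2000,
) -> str:
    """Local-first routing: privacy when possible, power when needed.

    `run_local` and `call_cloud` are placeholders; connect them to
    your on-device runtime and your server endpoint respectively.
    """
    # Long prompts are what exhaust mobile RAM and trigger thermal
    # throttling, so route them straight to the server.
    if len(prompt) > max_local_chars:
        return call_cloud(prompt)
    try:
        return run_local(prompt)
    except (MemoryError, RuntimeError):
        # If on-device inference fails (out of memory, model not
        # loaded), fall back to the cloud endpoint.
        return call_cloud(prompt)
```

The design choice worth noting: the routing decision happens before inference, so a request that is clearly too large never wastes battery on a doomed local attempt.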

Move to desktop

If the phone is too constrained, the same workflow often becomes effortless on a Mac or PC with a GPU.

Mobile FAQ

Does running Gemma 4 drain my battery quickly?
Running inference is compute-intensive, so yes — expect noticeable battery drain during active use (similar to gaming). For short conversations (a few minutes), the impact is minor. The model doesn't use battery when idle.
Can I run the 26B model on my phone?
No. The 26B MoE model requires ~15 GB of RAM just for weights. No current phone has enough available memory. The Edge models (E2B, E4B) are specifically designed for mobile — they're smaller, faster, and optimized for on-device inference.
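The memory figure above follows from simple arithmetic: weight memory ≈ parameter count × bytes per parameter. A quick sketch (the 4-bit quantization level is an illustrative assumption, not a statement about how the model ships):

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate RAM needed just to hold the model weights."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9  # decimal GB

# At an assumed 4 bits per parameter, a 26B model needs ~13 GB for
# weights alone; runtime overhead (activations, KV cache) pushes the
# real footprint toward the ~15 GB figure quoted above.
```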
Is the mobile version as smart as the desktop version?
Edge models are smaller and less capable than the 26B/31B models you'd run on a desktop. They handle basic conversation, translation, and simple Q&A well. For complex reasoning, coding, or long documents, use a desktop with the full-size model.