Mobile Setup Guide
Run Gemma 4 on iPhone & Android
Offline AI on your phone — auto-detects your device and finds the right model
Why run Gemma 4 on your phone?
Gemma 4 Edge models (E2B, E4B) are designed to run entirely on-device — no internet, no API keys, no data leaving your phone.
100% Private
All processing happens on your device. No data sent to any server. Perfect for sensitive queries.
Works Offline
Use AI on flights, in subways, or anywhere without connectivity. Download once, use forever.
Zero Cost
No subscription, no API fees. The model is free and open-weight. The app is free.
Which phones can run Gemma 4?
🍎 iPhone / iOS
| Device | RAM | Model |
|---|---|---|
| iPhone 16 Pro Max | 8 GB | E2B ✓ |
| iPhone 15 Pro | 8 GB | E2B ✓ |
| iPhone 15 / 14 | 6 GB | E2B ✓ |
| iPhone 13 / SE 3 | 4 GB | ✗ (too little RAM) |
E4B (~9.6 GB) will crash on any iPhone — stick to E2B for iOS.
🤖 Android
| Device | RAM | Model |
|---|---|---|
| Galaxy S25 Ultra | 12 GB | E4B ✓ E2B ✓ |
| Pixel 9 Pro | 16 GB | E4B ✓ E2B ✓ |
| Mid-range (8 GB) | 8 GB | E2B ✓ |
| Budget | 4–6 GB | E2B (marginal) |
E4B needs 10 GB+ available RAM. Most phones should use E2B.
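The selection rule above is simple enough to encode directly. A minimal sketch, assuming the 10 GB threshold from this guide (the class and method names are illustrative, not part of any official SDK):

```java
// Hypothetical helper: pick an Edge model variant from available RAM.
// The 10 GiB threshold follows the guidance above.
public class ModelPicker {
    static final long GIB = 1024L * 1024 * 1024;

    /** Returns "E4B" when at least 10 GiB of RAM is available, else "E2B". */
    public static String pick(long availableRamBytes) {
        return availableRamBytes >= 10L * GIB ? "E4B" : "E2B";
    }

    public static void main(String[] args) {
        System.out.println(pick(12L * GIB)); // flagship-class phone
        System.out.println(pick(6L * GIB));  // mid-range phone
    }
}
```

On Android, the available-RAM figure can come from `ActivityManager.getMemoryInfo()` (the `availMem` field); iOS apps targeting E2B only don't need the check at all.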
How to install Gemma 4 on your phone
🍎 iPhone
1. Download Google AI Edge Gallery from the App Store.
2. Open the app, find Gemma 4 E2B, and tap Download. The model is ~2 GB.
3. Start chatting. Works fully offline once downloaded.
🤖 Android
1. Install AI Edge Gallery from Google Play.
2. Select Gemma 4 E2B (or E4B if your phone has 10 GB+ RAM).
3. Tap Download, then chat. No API key or account needed.
⚠ Safety note: Only install from Google Play or the official GitHub repo. Avoid APK mirror sites — search results for "gemma 4 apk" often lead to unverified repackages.
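If you do obtain a file outside an app store, one way to act on this note is to compare its SHA-256 digest against the value published by the official source. A minimal sketch using the standard `java.security.MessageDigest` API (the file path and expected digest are placeholders you'd supply yourself):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha256Check {
    /** Hex-encoded SHA-256 digest of a byte stream. */
    public static String sha256Hex(InputStream in)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Path file = Path.of(args[0]);   // the downloaded file
        String expected = args[1];      // digest published by the official source
        try (InputStream in = Files.newInputStream(file)) {
            boolean ok = sha256Hex(in).equalsIgnoreCase(expected);
            System.out.println(ok ? "digest matches" : "DIGEST MISMATCH - do not install");
        }
    }
}
```

If the digest doesn't match, discard the file; a mismatch means it is not the file the publisher released.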
Building a mobile app? Use LiteRT-LM
AI Edge Gallery is for testing. If you're integrating Gemma 4 into your own iOS or Android app, Google's recommended path is LiteRT-LM — the current on-device LLM runtime in the Google AI Edge stack.
| Goal | Use this | Platform |
|---|---|---|
| Try Gemma 4 on your phone | AI Edge Gallery | iOS & Android |
| Embed Gemma 4 in your own app | LiteRT-LM | iOS & Android |
| Maintain a legacy integration | MediaPipe LLM Inference API | Deprecated |
LiteRT-LM runs Gemma 4 E2B in under 1.5 GB of memory on some devices, with dynamic context handling for up to 128K tokens. The older MediaPipe LLM Inference API is marked deprecated — avoid it for new projects.
Watch: Gemma 4 running on iPhone
Running Gemma 4 entirely offline on an iPhone — no internet, no API key required.
What can you do with Gemma 4 on mobile?
Offline translation — translate text between languages without internet. Great for travel.
Private Q&A — ask sensitive questions (medical, legal, financial) without any data leaving your device.
Writing assistant — draft emails, messages, or social posts on the go.
Learning & study — explain concepts, summarize articles, or practice flashcards without connectivity.
When should you use a server instead?
On-device AI is great for privacy and offline use, but some workloads don't fit on a phone. Consider a server-backed approach when you need:
- → Larger models — 26B MoE or 31B Dense won't run on any phone. If you need that capability level from a mobile UI, host the model on a server and call it via API.
- → Long conversations — Edge models handle short prompts well, but extended multi-turn chats can exhaust mobile RAM and cause thermal throttling.
- → Consistent latency across devices — if your app needs to work the same on a budget Android and a flagship iPhone, a cloud endpoint is more predictable than on-device inference.
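The RAM pressure from long chats comes mostly from the ever-growing conversation context. One common mitigation is to drop the oldest turns once the history exceeds a token budget. A minimal sketch, using a crude word count as a stand-in for a real tokenizer (which will count differently):

```java
import java.util.Deque;

public class ContextTrimmer {
    /** Very rough token estimate: whitespace-separated words. Real tokenizers differ. */
    static int estimateTokens(String text) {
        return text.isBlank() ? 0 : text.trim().split("\\s+").length;
    }

    /** Drops the oldest turns until the history fits the budget. */
    static void trimToBudget(Deque<String> turns, int maxTokens) {
        int total = turns.stream().mapToInt(ContextTrimmer::estimateTokens).sum();
        while (total > maxTokens && !turns.isEmpty()) {
            total -= estimateTokens(turns.removeFirst());
        }
    }
}
```

Trimming keeps memory bounded but loses early context, which is exactly the trade-off that makes very long conversations a better fit for a server.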
What to do after mobile testing
Stay mobile
If E2B handles your use case well, build around it. Design for short prompts, narrow scope, and offline-first — that's where mobile AI shines.
Go hybrid
Use the Edge model locally for quick tasks, with a cloud fallback for complex queries. Best of both: privacy when possible, power when needed.
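In practice, a hybrid setup comes down to a routing decision per query. A minimal sketch, using a word-count proxy for prompt complexity and a simple connectivity flag (both placeholders for whatever real signals your app has):

```java
public class HybridRouter {
    enum Backend { ON_DEVICE, CLOUD }

    /**
     * Route short prompts to the local Edge model; send long or complex
     * prompts to a cloud endpoint when the network is available.
     * Offline always stays local, so the app keeps working without a signal.
     */
    static Backend route(String prompt, boolean online, int localWordLimit) {
        int words = prompt.isBlank() ? 0 : prompt.trim().split("\\s+").length;
        if (words <= localWordLimit || !online) {
            return Backend.ON_DEVICE;
        }
        return Backend.CLOUD;
    }
}
```

The privacy-first default here means a query only leaves the device when it is both complex and the network is up; flip the conditions if your priority is capability over privacy.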
Move to desktop
If the phone is too constrained, the same workflow often becomes effortless on a Mac or PC with a GPU.