Gemma 4 Local

Mobile Setup Guide

Run Gemma 4 on iPhone & Android

Offline AI on your phone: find the right model for your device


Why run Gemma 4 on your phone?

Gemma 4 Edge models (E2B, E4B) are designed to run entirely on-device — no internet, no API keys, no data leaving your phone.

🔒

100% Private

All processing happens on your device. No data sent to any server. Perfect for sensitive queries.

✈️

Works Offline

Use AI on flights, in subways, or anywhere without connectivity. Download once, use forever.

🆓

Zero Cost

No subscription, no API fees. The model is free and open-weight. The app is free.

Which phones can run Gemma 4?

 iPhone / iOS

| Device | RAM | Model |
|---|---|---|
| iPhone 16 Pro Max | 8 GB | E2B ✓ |
| iPhone 15 Pro | 8 GB | E2B ✓ |
| iPhone 15 / 14 | 6 GB | E2B ✓ |
| iPhone 13 / SE 3 | 4 GB | Too little RAM |

E4B (~9.6 GB) will crash on any iPhone — stick to E2B for iOS.

🤖 Android

| Device | RAM | Model |
|---|---|---|
| Galaxy S25 Ultra | 12 GB | E4B ✓ E2B ✓ |
| Pixel 9 Pro | 16 GB | E4B ✓ E2B ✓ |
| Mid-range | 8 GB | E2B ✓ |
| Budget | 4–6 GB | E2B (marginal) |

E4B needs 10 GB+ available RAM. Most phones should use E2B.
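The RAM guidance in the two tables above can be sketched as a simple model picker. This is an illustrative helper, not part of any official tool; the function name, the `is_ios` flag, and the exact thresholds are assumptions derived from the tables.

```python
def pick_gemma_model(total_ram_gb: float, is_ios: bool = False) -> str:
    """Illustrative model picker based on the RAM tables above.

    E4B needs 10 GB+ of *available* RAM, which in practice means
    12 GB+ total, and it is Android-only (E4B crashes on iPhones).
    E2B (~2 GB download) runs comfortably with 6 GB+ total RAM and
    is marginal between 4 and 6 GB.
    """
    if not is_ios and total_ram_gb >= 12:
        return "E4B"
    if total_ram_gb >= 6:
        return "E2B"
    if total_ram_gb >= 4:
        return "E2B (marginal)"
    return "too little RAM"
```

For example, a 16 GB Pixel 9 Pro maps to E4B, an 8 GB iPhone 15 Pro maps to E2B, and a 4 GB iPhone SE 3 is flagged as marginal at best.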

How to install Gemma 4 on your phone

 iPhone

  1. Download Google AI Edge Gallery from the App Store.

  2. Open the app, find Gemma 4 E2B, and tap Download. The model is ~2 GB.

  3. Start chatting. Works fully offline once downloaded.

🤖 Android

  1. Install AI Edge Gallery from Google Play.

  2. Select Gemma 4 E2B (or E4B if your phone has 10 GB+ RAM).

  3. Tap Download, then chat. No API key or account needed.

⚠ Safety note: Only install from Google Play or the official GitHub repo. Avoid APK mirror sites — search results for "gemma 4 apk" often lead to unverified repackages.

Building a mobile app? Use LiteRT-LM

AI Edge Gallery is for testing. If you're integrating Gemma 4 into your own iOS or Android app, Google's recommended path is LiteRT-LM — the current on-device LLM runtime in the Google AI Edge stack.

| Goal | Use this | Platform |
|---|---|---|
| Try Gemma 4 on your phone | AI Edge Gallery | iOS & Android |
| Embed Gemma 4 in your own app | LiteRT-LM | iOS & Android |
| Legacy integration path | MediaPipe LLM Inference | Deprecated |

LiteRT-LM supports Gemma 4 E2B at under 1.5 GB on some devices, with dynamic context handling for up to 128K tokens. The older MediaPipe LLM Inference API is marked deprecated — avoid it for new projects.

Watch: Gemma 4 running on iPhone

Running Gemma 4 entirely offline on an iPhone — no internet, no API key required.

What can you do with Gemma 4 on mobile?

Offline translation — translate text between languages without internet. Great for travel.

Private Q&A — ask sensitive questions (medical, legal, financial) without any data leaving your device.

Writing assistant — draft emails, messages, or social posts on the go.

Learning & study — explain concepts, summarize articles, or practice flashcards without connectivity.

When should you use a server instead?

On-device AI is great for privacy and offline use, but some workloads don't fit on a phone. Consider a server-backed approach when you need:

  • Larger models — 26B MoE or 31B Dense won't run on any phone. If you need that capability level from a mobile UI, host the model on a server and call it via API.
  • Long conversations — Edge models handle short prompts well, but extended multi-turn chats can exhaust mobile RAM and cause thermal throttling.
  • Consistent latency across devices — if your app needs to work the same on a budget Android and a flagship iPhone, a cloud endpoint is more predictable than on-device inference.

What to do after mobile testing

Stay mobile

If E2B handles your use case well, build around it. Design for short prompts, narrow scope, and offline-first — that's where mobile AI shines.

Go hybrid

Use the Edge model locally for quick tasks, with a cloud fallback for complex queries. Best of both: privacy when possible, power when needed.
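The hybrid pattern can be sketched as local-first routing: try the on-device model, and fall back to a cloud endpoint when the request looks too heavy or local inference fails. `run_local` and `call_cloud` are placeholders for whatever runtime and API you actually wire in, and the character threshold is an arbitrary stand-in for a real capability check.

```python
from typing import Callable

def hybrid_answer(
    prompt: str,
    run_local: Callable[[str], str],
    call_cloud: Callable[[str], str],
    max_local_chars: int = 2000,
) -> str:
    """Local-first routing: privacy when possible, power when needed.

    `run_local` and `call_cloud` are placeholders; connect them to
    your on-device runtime and your server endpoint respectively.
    """
    # Long prompts are what exhaust mobile RAM and trigger thermal
    # throttling, so route them straight to the server.
    if len(prompt) > max_local_chars:
        return call_cloud(prompt)
    try:
        return run_local(prompt)
    except (MemoryError, RuntimeError):
        # If on-device inference fails (out of memory, model not
        # loaded), fall back to the cloud endpoint.
        return call_cloud(prompt)
```

The design choice worth noting: the routing decision happens before inference, so a request that is clearly too large never wastes battery on a doomed local attempt.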

Move to desktop

If the phone is too constrained, the same workflow often becomes effortless on a Mac or PC with a GPU.

Mobile FAQ

Does running Gemma 4 drain my battery quickly?
Running inference is compute-intensive, so yes — expect noticeable battery drain during active use (similar to gaming). For short conversations (a few minutes), the impact is minor. The model doesn't use battery when idle.
Can I run the 26B model on my phone?
No. The 26B MoE model requires ~15 GB of RAM just for weights. No current phone has enough available memory. The Edge models (E2B, E4B) are specifically designed for mobile — they're smaller, faster, and optimized for on-device inference.
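The memory figure above follows from simple arithmetic: weight memory ≈ parameter count × bytes per parameter. A quick sketch (the 4-bit quantization level is an illustrative assumption, not a statement about how the model ships):

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate RAM needed just to hold the model weights."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9  # decimal GB

# At an assumed 4 bits per parameter, a 26B model needs ~13 GB for
# weights alone; runtime overhead (activations, KV cache) pushes the
# real footprint toward the ~15 GB figure quoted above.
```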
Is the mobile version as smart as the desktop version?
Edge models are smaller and less capable than the 26B/31B models you'd run on a desktop. They handle basic conversation, translation, and simple Q&A well. For complex reasoning, coding, or long documents, use a desktop with the full-size model.