Run AI Models Locally in Your Browser – No Subscriptions, 100% Private.
A Chrome sidepanel extension that runs AI models entirely in your browser using WebGPU. No external software, no cloud—just install and chat with Llama 3.2, DeepSeek-R1, and Mistral.
Completely free, you get:
- True privacy—everything runs locally in your browser
- Messages and chat history stored locally (IndexedDB); export or delete anytime
- Pre-optimized models that work immediately
- Ongoing updates and support
In exchange, the tool stays free as long as you share your email address and usage statistics (not your conversation data) via GA4 Events, and provide feedback when prompted.
So what—yet another chatbot. What's the big deal?
Unlike typical AI tools, which:
- Often need access to your data
- Can't be installed when you lack admin rights
NoAIBills runs entirely in your browser—just add the Chrome extension. No cloud, no desktop app.
The "Standalone" Advantage
Unlike extensions that control Ollama or other apps, NoAIBills is self-contained. WebGPU and Wasm run everything in the browser—no terminal, no background processes.
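Whether this standalone approach works on a given machine comes down to WebGPU availability. A minimal feature check, sketched as a pure function so it can be tested outside a browser (the fallback comment is illustrative, not the extension's actual behavior):

```javascript
// Detect WebGPU support on a navigator-like object.
// Taking the object as a parameter keeps the check testable
// outside a real browser environment.
function supportsWebGPU(nav) {
  return typeof nav === "object" && nav !== null && "gpu" in nav;
}

// In a real page you would call it with the global navigator:
// if (!supportsWebGPU(navigator)) {
//   // e.g. show a "please use a WebGPU-capable browser" notice
// }
```

Chrome and Edge expose `navigator.gpu` by default on supported hardware; the check above is the standard way to gate WebGPU features.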
Tech Stack
MLC LLM
Machine Learning Compilation for Large Language Models - the core engine that enables running LLMs directly in your browser using WebGPU. Licensed under Apache License 2.0.
Transformers.js
State-of-the-art Machine Learning for the web - run transformer models directly in your browser with no need for a server. Functionally equivalent to Hugging Face's transformers Python library, using ONNX Runtime for browser-based inference. Licensed under Apache License 2.0.
Browser AI
TypeScript library for using in-browser AI models with the Vercel AI SDK, providing seamless fallback to server-side models. Supports Chrome/Edge built-in AI, WebLLM, and Transformers.js models. Licensed under MIT License.
Prompt API
W3C proposal for a web API that enables JavaScript code to prompt browser-provided language models directly. Provides a uniform interface for accessing on-device AI capabilities with support for multimodal inputs, tool use, and structured outputs. Published under W3C Community Contributor License Agreement.
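The proposal's API surface is still evolving, so any usage is best treated as a sketch. Assuming the `LanguageModel` global from recent drafts of the proposal, feature detection might look like this (the detection helper takes a global-like object so it can be exercised outside a browser):

```javascript
// Check whether the environment exposes the proposed Prompt API.
// The `LanguageModel` global reflects recent drafts of the W3C
// proposal and may change before standardization.
function hasPromptAPI(globalLike) {
  return typeof globalLike === "object" && globalLike !== null &&
    "LanguageModel" in globalLike;
}

// Hypothetical usage where the API is available (per current drafts):
// const session = await LanguageModel.create();
// const answer = await session.prompt("Summarize this paragraph: ...");
```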
React
The user interface is built with React, providing a responsive and interactive experience. Licensed under MIT License.
Dexie.js
Minimalistic wrapper around IndexedDB for storing and querying data in the browser. Used to persist chat history and model weights locally. Licensed under Apache License 2.0.
AI Elements
Component library built on top of shadcn/ui for building AI-native applications faster. Provides pre-built, customizable React components for conversations, messages, code blocks, and reasoning displays. Licensed under MIT License.
Driver.js
Powerful and lightweight JavaScript library for creating interactive, feature-rich guided tours and tooltips. Helps users navigate and understand features through step-by-step tutorials with smooth animations and customizable styling. Licensed under MIT License.
ShadCN UI
Beautiful, accessible component library that powers the extension's interface. Licensed under MIT License.
Lucide Icons
Icon library used throughout the extension for a consistent visual language. Licensed under ISC License.
Supported AI Models
The AI models featured in this extension are all small, open-source models. They may not match the intelligence or sophistication of paid premium enterprise models, but they are more than sufficient for everyday low-intensity tasks: fixing grammar, summarizing text, drafting emails, quick translations, and basic content generation.
When compared to enterprise-level AI solutions, these models have some limitations: they may struggle with highly complex reasoning tasks, lack the depth of knowledge in specialized domains, may produce less nuanced responses, and might require more iterations to achieve desired results. However, for most everyday tasks, they provide an excellent balance of functionality, privacy, and cost-effectiveness.
Llama 3.2
Meta's Llama 3.2 models including Llama 3.2 1B, 3B, and variants. Licensed under Llama 3.2 Community License Agreement.
Phi Models
Microsoft's Phi family, including Phi 3, Phi 2, and Phi 1.5: compact models optimized for efficient inference.
Gemma 2
Google's Gemma 2 2B model, designed for efficient inference. Licensed under Gemma Terms of Use.
Mistral 7B
Mistral AI's 7B model and variants including Hermes-2-Pro-Mistral-7B, NeuralHermes-2.5-Mistral-7B, and OpenHermes-2.5-Mistral-7B.
Qwen2
Alibaba's Qwen2 models in 0.5B, 1.5B, and 7B sizes, providing multilingual capabilities.
DeepSeek-R1
DeepSeek-R1-Distill-Qwen-7B model for advanced reasoning capabilities. Licensed under MIT License.
Open Source Power
Free to use; we support the projects that power the extension.
Use Cases: When NoAIBills Fits (and When It Doesn't)
Understanding the right tool for the right job helps you maximize productivity while maintaining privacy.
Use NoAIBills When:
- Privacy is Non-Negotiable: You are working with proprietary code, sensitive legal documents, or personal journals that you never want stored on a corporate server.
- The "Email & Essay" Workflow: You need a quick grammar check, a tone adjustment, or a draft for an email. These are "lightweight" tasks that don't require a trillion-parameter cloud model.
- Coding Assistance: You want a coding buddy that understands Python or React logic but doesn't need to "know" what happened in the news 5 minutes ago.
- Offline Productivity: You're on a flight, a train with spotty Wi-Fi, or in a "dead zone." Since the models are cached, your productivity doesn't stop.
- Subscription Fatigue: You're tired of paying monthly fees for AI tools. NoAIBills is free with unlimited use.
Stick to Cloud AI When:
- Real-Time Data is Vital: You need today's stock prices, current weather, or the latest news headlines from this morning.
- Ultra-Massive Reasoning: You are asking the AI to cross-reference 500-page PDFs or perform complex multi-modal analysis that requires high-end server-side GPUs.
- High-End Image Generation: If you need photorealistic DALL-E or Midjourney-style images, that still requires massive cloud power.
Supported Models Reference
Publication dates and training data cutoffs for models available in NoAIBills.
| Source | Model ID | Publication Date | Training Data Cutoff |
|---|---|---|---|
| WEBLLM_MODELS | Llama-3.2-1B-Instruct-q4f16_1-MLC | September 2024 | December 2023 |
| WEBLLM_MODELS | Llama-3.2-1B-Instruct-q0f16-MLC | September 2024 | December 2023 |
| WEBLLM_MODELS | Llama-3.2-3B-Instruct-q4f16_1-MLC | September 2024 | December 2023 |
| WEBLLM_MODELS | Llama-3.1-8B-Instruct-q4f16_1-MLC | July 2024 | December 2023 |
| WEBLLM_MODELS | Qwen3-0.6B-q4f16_1-MLC | April 2025 | Not publicly specified |
| WEBLLM_MODELS | Qwen3-0.6B-q0f16-MLC | April 2025 | Not publicly specified |
| WEBLLM_MODELS | Qwen3-1.7B-q4f16_1-MLC | April 2025 | Not publicly specified |
| WEBLLM_MODELS | Qwen3-4B-q4f16_1-MLC | April 2025 | Not publicly specified |
| WEBLLM_MODELS | Qwen3-8B-q4f16_1-MLC | April 2025 | Not publicly specified |
| WEBLLM_MODELS | SmolLM2-360M-Instruct-q0f16-MLC | November 2024 | November 2024 |
| WEBLLM_MODELS | SmolLM2-360M-Instruct-q4f16_1-MLC | November 2024 | November 2024 |
| WEBLLM_MODELS | SmolLM2-135M-Instruct-q0f16-MLC | November 2024 | November 2024 |
| WEBLLM_MODELS | SmolLM2-1.7B-Instruct-q4f16_1-MLC | November 2024 | November 2024 |
| WEBLLM_MODELS | Phi-3.5-mini-instruct-q4f16_1-MLC | August 2024 | October 2023 |
| WEBLLM_MODELS | Mistral-7B-Instruct-v0.3-q4f16_1-MLC | May 2024 | Not publicly specified |
| WEBLLM_MODELS | gemma-2-2b-it-q4f16_1-MLC | July 2024 | Not publicly specified |
| WEBLLM_MODELS | Hermes-3-Llama-3.2-3B-q4f16_1-MLC | October 2024 | December 2023 (base model) |
| WEBLLM_MODELS | Hermes-3-Llama-3.1-8B-q4f16_1-MLC | August 2024 | December 2023 (base model) |
| WEBLLM_MODELS | DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC | January 2025 | Not publicly specified |
| WEBLLM_MODELS | DeepSeek-R1-Distill-Llama-8B-q4f16_1-MLC | January 2025 | Not publicly specified |
| TRANSFORMERS_JS_MODELS | HuggingFaceTB/SmolLM2-360M-Instruct | November 2024 | November 2024 |
| TRANSFORMERS_JS_MODELS | HuggingFaceTB/SmolLM2-1.7B-Instruct | November 2024 | November 2024 |
| TRANSFORMERS_JS_MODELS | Xenova/Llama-3.2-1B-Instruct | September 2024 | December 2023 |
| TRANSFORMERS_JS_MODELS | Xenova/Llama-3.2-3B-Instruct | September 2024 | December 2023 |
| TRANSFORMERS_JS_MODELS | Qwen/Qwen2-0.5B-Instruct | June 2024 | March 2024 |
| TRANSFORMERS_JS_MODELS | Qwen/Qwen2-1.5B-Instruct | June 2024 | March 2024 |
| TRANSFORMERS_JS_MODELS | microsoft/Phi-3-mini-4k-instruct | April 2024 | October 2023 |
| REASONING_MODEL_IDS | DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC | January 2025 | Not publicly specified |
| REASONING_MODEL_IDS | DeepSeek-R1-Distill-Llama-8B-q4f16_1-MLC | January 2025 | Not publicly specified |
Notes
- Publication dates are approximate based on official announcements and releases.
- Training data cutoff dates vary; some models don't publicly specify exact cutoff dates.
- "q4f16_1-MLC" and similar suffixes indicate quantization formats, not different model versions.
- Qwen3 is the most recently released model family in the list, though its exact training cutoff is not publicly detailed.
- DeepSeek-R1-Distill models are very recent (January 2025) but specific training cutoffs aren't publicly detailed.
- Hermes-3 models are fine-tuned versions, so they inherit the base model's training cutoff.
General knowledge, not real-time
These models offer strong general knowledge up to their training cutoff, but they do not have access to live or real-time information. For drafting, learning code syntax, summarization, and quick edits, that’s the right fit—and these models deliver.
Completely Free - No Limits!
We believe private AI should be accessible to everyone. No subscriptions, no hidden fees, no cloud processing costs.
What's included?
Messages stored locally
All chat messages live in your browser (IndexedDB) and can be exported or deleted anytime. Storage is subject to your browser’s quota (typically hundreds of MB to several GB depending on disk space), so you can keep a large amount of chat history.
Chat history stored locally
Keep all your conversations in your browser; export anytime. No limits on chat history or threads within your storage quota.
No usage limits on models
Unlimited access to all supported model families, including Llama (and Hermes variants), Phi, Gemma, Mistral, Qwen, SmolLM2, and DeepSeek-R1 distills
Powered by IndexedDB
Uses your browser’s built-in IndexedDB: downloaded model weights and chat history are stored there—your data never leaves your device.
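Because everything sits in IndexedDB, exporting your data is just a local serialization step. As an illustration (this is a hypothetical helper, not the extension's actual export code), turning stored message records into a portable JSON string could look like:

```javascript
// Convert locally stored chat records into a portable JSON export.
// `messages` mirrors the kind of record an IndexedDB store might hold:
// { threadId, role, text, createdAt } — field names are illustrative.
function exportChatHistory(messages) {
  const threads = {};
  for (const m of messages) {
    if (!threads[m.threadId]) threads[m.threadId] = [];
    threads[m.threadId].push({
      role: m.role,
      text: m.text,
      createdAt: m.createdAt,
    });
  }
  // Grouped by thread, pretty-printed for easy inspection or re-import.
  return JSON.stringify({ threads }, null, 2);
}
```

The point is that export is a pure, local transformation: no network request is involved at any step.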
No remote API calls
All open-source models run completely offline once their weights are downloaded
Free in exchange for email & usage stats
The tool remains free as long as you share your email address and usage statistics (not your conversation data) via GA4 Events, plus feedback when prompted
Product Roadmap
Create / use from Prompt Gallery
Access a library of pre-built prompts and create your own
Access to prompt gallery and prompt editor
Browse and edit prompts with an intuitive interface
Create system prompts
Customize system prompts for personalized AI behavior
Access to browser tabs content
Integrate content from your browser tabs into conversations
Access to new models
Get access to new models as they become available on WebLLM and HuggingFace
Structured JSON generation
Generate structured JSON outputs with schema validation and JSON mode support
Ollama Integration
Connect to local Ollama instances to run larger models on your machine
No payment required. Free in exchange for anonymous usage statistics — see FAQ.
Frequently Asked Questions
Everything you need to know about NoAIBills. Can't find the answer you're looking for? Reach out to us.