Run AI Models Locally in Your Browser – No Subscriptions, 100% Private.
A Chrome sidepanel extension that runs AI models entirely in your browser using WebGPU. No external software, no cloud—just install and chat with Llama 3.2, DeepSeek-R1, and Mistral.
Completely free, you get:
- True privacy—everything runs locally in your browser
- Messages and chat history stored locally (IndexedDB); export or delete anytime
- Pre-optimized models that work immediately
- Ongoing updates and support
In exchange, the tool stays free as long as you share your email address and usage statistics (not your conversation data) via GA4 Events, and provide feedback when prompted.
So what—yet another chatbot. What's the big deal?
Unlike typical AI tools, which:
- Often need access to your data
- Can't be installed when you lack admin rights
NoAIBills runs entirely in your browser—just add the Chrome extension. No cloud, no desktop app.
The "Standalone" Advantage
Unlike extensions that control Ollama or other apps, NoAIBills is self-contained. WebGPU and Wasm run everything in the browser—no terminal, no background processes.
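Whether this standalone approach works on a given machine comes down to WebGPU availability. A minimal feature check, sketched as a pure function so it can be tested outside a browser (the fallback comment is illustrative, not the extension's actual behavior):

```javascript
// Detect WebGPU support on a navigator-like object.
// Taking the object as a parameter keeps the check testable
// outside a real browser environment.
function supportsWebGPU(nav) {
  return typeof nav === "object" && nav !== null && "gpu" in nav;
}

// In a real page you would call it with the global navigator:
// if (!supportsWebGPU(navigator)) {
//   // e.g. show a "please use a WebGPU-capable browser" notice
// }
```

Chrome and Edge expose `navigator.gpu` by default on supported hardware; the check above is the standard way to gate WebGPU features.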
Tech Stack
MLC LLM
Machine Learning Compilation for Large Language Models - the core engine that enables running LLMs directly in your browser using WebGPU. Licensed under Apache License 2.0.
Transformers.js
State-of-the-art Machine Learning for the web - run transformer models directly in your browser with no need for a server. Functionally equivalent to Hugging Face's transformers Python library, using ONNX Runtime for browser-based inference. Licensed under Apache License 2.0.
Browser AI
TypeScript library for using in-browser AI models with the Vercel AI SDK, providing seamless fallback to server-side models. Supports Chrome/Edge built-in AI, WebLLM, and Transformers.js models. Licensed under MIT License.
Prompt API
W3C proposal for a web API that enables JavaScript code to prompt browser-provided language models directly. Provides a uniform interface for accessing on-device AI capabilities with support for multimodal inputs, tool use, and structured outputs. Published under W3C Community Contributor License Agreement.
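The proposal's API surface is still evolving, so any usage is best treated as a sketch. Assuming the `LanguageModel` global from recent drafts of the proposal, feature detection might look like this (the detection helper takes a global-like object so it can be exercised outside a browser):

```javascript
// Check whether the environment exposes the proposed Prompt API.
// The `LanguageModel` global reflects recent drafts of the W3C
// proposal and may change before standardization.
function hasPromptAPI(globalLike) {
  return typeof globalLike === "object" && globalLike !== null &&
    "LanguageModel" in globalLike;
}

// Hypothetical usage where the API is available (per current drafts):
// const session = await LanguageModel.create();
// const answer = await session.prompt("Summarize this paragraph: ...");
```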
React
The user interface is built with React, providing a responsive and interactive experience. Licensed under MIT License.
Dexie.js
Minimalistic wrapper around IndexedDB for storing and querying data in the browser. Used to persist chat history and model weights locally. Licensed under Apache License 2.0.
AI Elements
Component library built on top of shadcn/ui for building AI-native applications faster. Provides pre-built, customizable React components for conversations, messages, code blocks, and reasoning displays. Licensed under MIT License.
Driver.js
Powerful and lightweight JavaScript library for creating interactive, feature-rich guided tours and tooltips. Helps users navigate and understand features through step-by-step tutorials with smooth animations and customizable styling. Licensed under MIT License.
ShadCN UI
Beautiful, accessible component library that powers the extension's interface. Licensed under MIT License.
Lucide Icons
Icon library used throughout the extension for a consistent visual language. Licensed under ISC License.
Supported AI Models
The AI models featured in this extension are all small, open-source models. They may not match the intelligence or sophistication of paid premium enterprise models, but they are more than sufficient for everyday low-intensity tasks: fixing grammar, summarizing text, drafting emails, quick translations, and basic content generation.
When compared to enterprise-level AI solutions, these models have some limitations: they may struggle with highly complex reasoning tasks, lack the depth of knowledge in specialized domains, may produce less nuanced responses, and might require more iterations to achieve desired results. However, for most everyday tasks, they provide an excellent balance of functionality, privacy, and cost-effectiveness.
Llama 3.2
Meta's Llama 3.2 models including Llama 3.2 1B, 3B, and variants. Licensed under Llama 3.2 Community License Agreement.
Phi Models
Microsoft's Phi family, including Phi 3, Phi 2, and Phi 1.5: compact models optimized for efficient inference.
Gemma 2
Google's Gemma 2 2B model, designed for efficient inference. Licensed under Gemma Terms of Use.
Mistral 7B
Mistral AI's 7B model and variants including Hermes-2-Pro-Mistral-7B, NeuralHermes-2.5-Mistral-7B, and OpenHermes-2.5-Mistral-7B.
Qwen2
Alibaba's Qwen2 models in 0.5B, 1.5B, and 7B sizes, providing multilingual capabilities.
DeepSeek-R1
DeepSeek-R1-Distill-Qwen-7B model for advanced reasoning capabilities. Licensed under MIT License.
Open Source Power
Free to use; we support the projects that power the extension.
Use Cases: When NoAIBills Fits (and When It Doesn't)
Understanding the right tool for the right job helps you maximize productivity while maintaining privacy.
Use NoAIBills When:
- Privacy is Non-Negotiable: You are working with proprietary code, sensitive legal documents, or personal journals that you never want stored on a corporate server.
- The "Email & Essay" Workflow: You need a quick grammar check, a tone adjustment, or a draft for an email. These are "lightweight" tasks that don't require a trillion-parameter cloud model.
- Coding Assistance: You want a coding buddy that understands Python or React logic but doesn't need to "know" what happened in the news 5 minutes ago.
- Offline Productivity: You're on a flight, a train with spotty Wi-Fi, or in a "dead zone." Since the models are cached, your productivity doesn't stop.
- Subscription Fatigue: You're tired of paying monthly fees for AI tools. NoAIBills is free with unlimited use.
Stick to Cloud AI When:
- Real-Time Data is Vital: You need today's stock prices, current weather, or the latest news headlines from this morning.
- Ultra-Massive Reasoning: You are asking the AI to cross-reference 500-page PDFs or perform complex multi-modal analysis that requires high-end server-side GPUs.
- High-End Image Generation: If you need photorealistic DALL-E or Midjourney-style images, that still requires massive cloud power.
Supported Models Reference
Publication dates and training data cutoffs for models available in NoAIBills.
| Source | Model ID | Publication Date | Training Data Cutoff |
|---|---|---|---|
| WEBLLM_MODELS | Llama-3.2-1B-Instruct-q4f16_1-MLC | September 2024 | December 2023 |
| WEBLLM_MODELS | Llama-3.2-1B-Instruct-q0f16-MLC | September 2024 | December 2023 |
| WEBLLM_MODELS | Llama-3.2-3B-Instruct-q4f16_1-MLC | September 2024 | December 2023 |
| WEBLLM_MODELS | Llama-3.1-8B-Instruct-q4f16_1-MLC | July 2024 | December 2023 |
| WEBLLM_MODELS | Qwen3-0.6B-q4f16_1-MLC | April 2025 | Not publicly specified |
| WEBLLM_MODELS | Qwen3-0.6B-q0f16-MLC | April 2025 | Not publicly specified |
| WEBLLM_MODELS | Qwen3-1.7B-q4f16_1-MLC | April 2025 | Not publicly specified |
| WEBLLM_MODELS | Qwen3-4B-q4f16_1-MLC | April 2025 | Not publicly specified |
| WEBLLM_MODELS | Qwen3-8B-q4f16_1-MLC | April 2025 | Not publicly specified |
| WEBLLM_MODELS | SmolLM2-360M-Instruct-q0f16-MLC | November 2024 | November 2024 |
| WEBLLM_MODELS | SmolLM2-360M-Instruct-q4f16_1-MLC | November 2024 | November 2024 |
| WEBLLM_MODELS | SmolLM2-135M-Instruct-q0f16-MLC | November 2024 | November 2024 |
| WEBLLM_MODELS | SmolLM2-1.7B-Instruct-q4f16_1-MLC | November 2024 | November 2024 |
| WEBLLM_MODELS | Phi-3.5-mini-instruct-q4f16_1-MLC | August 2024 | October 2023 |
| WEBLLM_MODELS | Mistral-7B-Instruct-v0.3-q4f16_1-MLC | May 2024 | Not publicly specified |
| WEBLLM_MODELS | gemma-2-2b-it-q4f16_1-MLC | July 2024 | Not publicly specified |
| WEBLLM_MODELS | Hermes-3-Llama-3.2-3B-q4f16_1-MLC | October 2024 | December 2023 (base model) |
| WEBLLM_MODELS | Hermes-3-Llama-3.1-8B-q4f16_1-MLC | August 2024 | December 2023 (base model) |
| WEBLLM_MODELS | DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC | January 2025 | Not publicly specified |
| WEBLLM_MODELS | DeepSeek-R1-Distill-Llama-8B-q4f16_1-MLC | January 2025 | Not publicly specified |
| TRANSFORMERS_JS_MODELS | HuggingFaceTB/SmolLM2-360M-Instruct | November 2024 | November 2024 |
| TRANSFORMERS_JS_MODELS | HuggingFaceTB/SmolLM2-1.7B-Instruct | November 2024 | November 2024 |
| TRANSFORMERS_JS_MODELS | Xenova/Llama-3.2-1B-Instruct | September 2024 | December 2023 |
| TRANSFORMERS_JS_MODELS | Xenova/Llama-3.2-3B-Instruct | September 2024 | December 2023 |
| TRANSFORMERS_JS_MODELS | Qwen/Qwen2-0.5B-Instruct | June 2024 | March 2024 |
| TRANSFORMERS_JS_MODELS | Qwen/Qwen2-1.5B-Instruct | June 2024 | March 2024 |
| TRANSFORMERS_JS_MODELS | microsoft/Phi-3-mini-4k-instruct | April 2024 | October 2023 |
| REASONING_MODEL_IDS | DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC | January 2025 | Not publicly specified |
| REASONING_MODEL_IDS | DeepSeek-R1-Distill-Llama-8B-q4f16_1-MLC | January 2025 | Not publicly specified |
Notes
- Publication dates are approximate based on official announcements and releases.
- Training data cutoff dates vary; some models don't publicly specify exact cutoff dates.
- "q4f16_1-MLC" and similar suffixes indicate quantization formats, not different model versions.
- Qwen3 is the most recently released model family in the list, though its exact training cutoff is not publicly detailed.
- DeepSeek-R1-Distill models are very recent (January 2025) but specific training cutoffs aren't publicly detailed.
- Hermes-3 models are fine-tuned versions, so they inherit the base model's training cutoff.
General knowledge, not real-time
These models offer strong general knowledge up to their training cutoff, but they do not have access to live or real-time information. For drafting, learning code syntax, summarization, and quick edits, that’s the right fit—and these models deliver.
Completely Free - No Limits!
We believe private AI should be accessible to everyone. No subscriptions, no hidden fees, no cloud processing costs.
What's included?
Messages stored locally
All chat messages live in your browser (IndexedDB) and can be exported or deleted anytime. Storage is subject to your browser’s quota (typically hundreds of MB to several GB depending on disk space), so you can keep a large amount of chat history.
Chat history stored locally
Keep all your conversations in your browser; export anytime. No limits on chat history or threads within your storage quota.
No usage limits on models
Unlimited access to all supported model families, including Llama (and Hermes variants), Phi, Gemma, Mistral, Qwen, SmolLM2, and DeepSeek-R1 distills
Powered by IndexedDB
Uses your browser’s built-in IndexedDB: downloaded model weights and chat history are stored there—your data never leaves your device.
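Because everything sits in IndexedDB, exporting your data is just a local serialization step. As an illustration (this is a hypothetical helper, not the extension's actual export code), turning stored message records into a portable JSON string could look like:

```javascript
// Convert locally stored chat records into a portable JSON export.
// `messages` mirrors the kind of record an IndexedDB store might hold:
// { threadId, role, text, createdAt } — field names are illustrative.
function exportChatHistory(messages) {
  const threads = {};
  for (const m of messages) {
    if (!threads[m.threadId]) threads[m.threadId] = [];
    threads[m.threadId].push({
      role: m.role,
      text: m.text,
      createdAt: m.createdAt,
    });
  }
  // Grouped by thread, pretty-printed for easy inspection or re-import.
  return JSON.stringify({ threads }, null, 2);
}
```

The point is that export is a pure, local transformation: no network request is involved at any step.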
No remote API calls
All open-source models run completely offline once their weights are downloaded
Free in exchange for email & usage stats
The tool remains free as long as you share your email address and usage statistics (not your conversation data) via GA4 Events, plus feedback when prompted
Product Roadmap
Create / use from Prompt Gallery
Access a library of pre-built prompts and create your own
Access to prompt gallery and prompt editor
Browse and edit prompts with an intuitive interface
Create system prompts
Customize system prompts for personalized AI behavior
Access to browser tabs content
Integrate content from your browser tabs into conversations
Access to new models
Get access to new models as they become available on WebLLM and HuggingFace
Structured JSON generation
Generate structured JSON outputs with schema validation and JSON mode support
Ollama Integration
Connect to local Ollama instances to run larger models on your machine
No payment required. Free in exchange for anonymous usage statistics — see FAQ.
Frequently Asked Questions
Everything you need to know about NoAIBills. Can't find the answer you're looking for? Reach out to us.