Own Your AI
Run AI models locally. No data leaves your machine.
Why Local AI Matters
Every time you use ChatGPT, Claude, or Gemini, your prompts — your questions, your writing, your code, your private thoughts — travel to a company's server. OpenAI, Anthropic, and Google process your data on their infrastructure. Their privacy policies say various reassuring things. But the point of building a tech fence is to not have to trust promises. The point is to not send the data in the first place.
Running AI locally means your conversations never leave your machine. No server logs, no training data extraction, no corporate access. The AI runs on your hardware, processes your prompt, and the data stays with you.
PewDiePie took this to an extreme. He built a 10-GPU rig running vLLM and created a custom YouTube extension powered by his local AI setup. That's overkill for most people. But the beauty of local AI in 2025-2026 is that you don't need a rig like that. A single decent GPU and a tool called Ollama get you surprisingly far.
What You Need
Local AI runs on your GPU (graphics card). The more GPU memory (VRAM) you have, the larger and smarter the models you can run. Here's a rough guide:
- 8 GB VRAM (RTX 3060/4060): Can run 7B parameter models well. Think basic assistant tasks, summarization, simple coding help.
- 12 GB VRAM (RTX 3060 12GB/4070): Can run 13B models. Noticeably smarter, better at nuanced tasks.
- 16-24 GB VRAM (RTX 4070 Ti/4090): Can run 30B+ models with quantization. Approaching cloud AI quality for many tasks.
- No dedicated GPU: You can still run small models (3B-7B) on CPU, but it will be slow. Apple Silicon Macs (M1/M2/M3) actually handle this well thanks to unified memory.
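If you're not sure what card or how much VRAM you have, one command answers it (nvidia-smi ships with the NVIDIA driver; the other two are stock macOS and Linux tools):

```shell
# NVIDIA cards: report the GPU model and total VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv

# Apple Silicon: the GPU shares system memory, so total RAM is the
# number that matters (prints bytes)
sysctl -n hw.memsize

# Linux without an NVIDIA card: check system RAM for CPU inference
free -h
```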
Getting Started
Install Ollama
Ollama is the easiest way to run LLMs locally. It handles downloading models, managing them, and running them with a simple command-line interface. Think of it as the "Docker for AI models."
Go to ollama.com and download the installer for your OS. On Linux, you can install it with one command:
curl -fsSL https://ollama.com/install.sh | sh
That's it. Ollama runs as a service in the background and provides a local API that other tools can connect to.
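That background service is worth knowing about: it listens on port 11434 and speaks JSON, so you can query it directly from a script. A minimal sketch, assuming the service is running and a model such as llama3.1 has already been pulled:

```shell
# Ask the local Ollama API a question and print just the model's reply.
# Assumes Ollama is running on its default port (11434) and the
# llama3.1 model has been downloaded.
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.1", "prompt": "In one sentence, why run AI locally?", "stream": false}' \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["response"])'
```

With "stream": false the endpoint returns one complete JSON object; leave it at the default and it streams tokens as they're generated instead, which is what interactive chat uses.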
Download and Run Your First Model
Open a terminal and run:
ollama run llama3.1

That command starts Meta's Llama 3.1 8B, a great all-rounder: good at conversation, summarization, and general tasks.
The first time you run a model, Ollama downloads it (a few GB). After that, it starts instantly. You're now chatting with an AI that runs entirely on your hardware. Nothing leaves your machine.
Other models to try:
- mistral — Mistral 7B. Fast, good at structured tasks and coding.
- phi3 — Microsoft's Phi-3. Small but surprisingly capable. Great for limited hardware.
- codellama — Specialized for code generation and understanding.
- llama3.1:70b — The 70B parameter version of Llama. Much smarter, but needs ~40GB VRAM or heavy quantization.
- deepseek-coder-v2 — Excellent for programming tasks.
- gemma2 — Google's open model. Good general performance.
Browse all available models at ollama.com/library. Start with the default (usually 7B-8B parameter) versions and move up if your hardware can handle it.
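A few other Ollama subcommands are worth knowing as your model collection grows (these are standard CLI commands; they all talk to the local service):

```shell
ollama pull mistral      # download a model without starting a chat
ollama list              # show downloaded models and their sizes on disk
ollama rm codellama      # delete a model you no longer need

# One-shot prompt: answer and exit, no interactive session
ollama run phi3 "Explain quantization in one paragraph."
```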
Set Up Open WebUI
Chatting in the terminal works, but a web interface is much nicer. Open WebUI (formerly Ollama WebUI) gives you a ChatGPT-like interface for your local models. It supports conversation history, multiple chats, model switching, system prompts, and file uploads.
Install it with Docker (assuming you have Docker from the Self-Hosting guide):
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
Open http://localhost:3000 in your browser, create an account (local only — no data leaves your machine), and you're in. Select your downloaded model from the dropdown and start chatting.
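If the page doesn't load, two quick checks narrow the problem down (standard Docker and curl invocations; nothing here is specific to Open WebUI beyond the container name used above):

```shell
# Is the container actually running?
docker ps --filter name=open-webui --format '{{.Names}}: {{.Status}}'

# Is anything answering on port 3000? (a 200 or a redirect code means yes)
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:3000
```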
Open WebUI can also connect to OpenAI's API if you occasionally want to use cloud models for complex tasks — but that's optional and defeats the privacy purpose.
Practical Use Cases
Local AI isn't just a novelty. Here are things you can actually do with it:
- Private brainstorming: Bounce ideas off the AI without worrying about data leaks. Business plans, journal entries, personal questions — all stay on your machine.
- Code assistance: Use Codellama or DeepSeek Coder as a local Copilot alternative. Many code editors (VS Code, Neovim) have plugins that connect to Ollama.
- Document summarization: Feed in PDFs, articles, or notes and get summaries. Open WebUI supports file uploads for this.
- Email drafting: Write emails without your drafts passing through a third-party server.
- Learning and research: Ask questions about topics you're studying. The AI's knowledge is baked into the model weights — no internet needed.
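Several of these use cases work straight from the terminal. Summarization, for instance, is one command, since ollama run accepts a prompt as an argument. A sketch, where notes.txt stands in for whatever file you want summarized:

```shell
# Summarize a local text file without it ever leaving the machine.
# Assumes llama3.1 has been pulled and notes.txt exists.
ollama run llama3.1 "Summarize the following notes in three bullet points: $(cat notes.txt)"
```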
PewDiePie's Setup (Advanced)
For context on the extreme end: PewDiePie runs vLLM on a 10-GPU rig. vLLM is a high-performance inference engine that's much faster than Ollama for serving large models. He built a custom YouTube browser extension that uses his local AI to summarize videos, answer questions about video content, and help with research.
This is a serious setup — we're talking about thousands of dollars in GPUs and significant technical knowledge. It's impressive, but it's not what most people need. The point is that the same concept applies at any scale: whether you're running a 7B model on a laptop or a 70B model on a GPU cluster, your data stays local.
Beyond Chatbots
Local AI isn't just about text chat. Here are other things you can run locally:
- Stable Diffusion (image generation): Run ComfyUI or Automatic1111 locally to generate images without sending your prompts to Midjourney or DALL-E. Needs 8GB+ VRAM.
- Whisper (speech-to-text): OpenAI's Whisper model runs locally and transcribes audio with remarkable accuracy. Great for transcribing meetings, lectures, or your own recordings.
- Coqui TTS (text-to-speech): Generate natural-sounding speech from text, locally.
- LocalAI: A drop-in replacement for OpenAI's API that runs everything locally. Any tool that integrates with the OpenAI API can be pointed at LocalAI instead.
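The OpenAI-compatible approach isn't limited to LocalAI: Ollama also serves an OpenAI-style API under /v1, so many existing tools can be repointed at your local machine just by changing the base URL. A sketch, assuming Ollama is running and llama3.1 is pulled:

```shell
# Standard chat-completions request, aimed at localhost instead of OpenAI
curl -s http://localhost:11434/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "llama3.1",
        "messages": [{"role": "user", "content": "Say hello from localhost."}]
      }'
```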
Honest Downsides
- You need decent hardware. Running AI locally requires a GPU with enough VRAM. If your computer has no dedicated GPU or an older card with 4GB VRAM, the experience will be painfully slow. Apple Silicon Macs are an exception — they handle local AI well thanks to unified memory architecture.
- Models are large. A 7B model is about 4 GB. A 13B model is about 7 GB. A 70B model is 40+ GB. If you want to try several models, you'll need significant disk space. Ollama stores models in its cache, so they add up.
- Local models aren't as good as GPT-4 or Claude for complex tasks. Open-source models have improved dramatically, but for tasks that require deep reasoning, nuanced understanding, or very long contexts, cloud models still have an edge. A local 7B model is roughly comparable to GPT-3.5 in capability. A local 70B model approaches GPT-4 for many tasks but still falls short on the hardest ones.
- No internet access. Local models can't browse the web, check current events, or access real-time information. Their knowledge is frozen at whatever date the training data cut off. This is a feature for privacy, but a limitation for research.
- Power consumption. Running a GPU at full load while chatting with an AI uses significantly more power than typing a query into ChatGPT. A high-end GPU can draw 300-450 watts. PewDiePie's 10-GPU rig probably uses as much power as a small apartment.
- It's evolving fast. The local AI landscape changes monthly. Models get better, tools get replaced, new approaches emerge. What's the best setup today might not be in six months. That's exciting but also means more maintenance and learning.