Own Your AI
Run AI models locally. No data leaves your machine.
Why Local AI Matters
Every time you use ChatGPT, Claude, or Gemini, your prompts — your questions, your writing, your code, your private thoughts — travel to a company's server. OpenAI, Anthropic, and Google process your data on their infrastructure. Their privacy policies say various reassuring things. But the point of building a tech fence is to not have to trust promises. The point is to not send the data in the first place.
Running AI locally means your conversations never leave your machine. No server logs, no training data extraction, no corporate access. The AI runs on your hardware, processes your prompt, and the data stays with you.
PewDiePie took this to an extreme. He built a 10-GPU rig running vLLM and created a custom YouTube extension powered by his local AI setup. That's overkill for most people. But the beauty of local AI in 2025-2026 is that you don't need a rig like that. A single decent GPU and a tool called Ollama get you surprisingly far.
What You Need
Local AI runs on your GPU (graphics card). The more GPU memory (VRAM) you have, the larger and smarter the models you can run. Here's a rough guide:
- 8 GB VRAM (RTX 3060/4060): Can run 7B parameter models well. Think basic assistant tasks, summarization, simple coding help.
- 12 GB VRAM (RTX 3060 12GB/4070): Can run 13B models. Noticeably smarter, better at nuanced tasks.
- 16-24 GB VRAM (RTX 4070 Ti/4090): Can run 30B+ models with quantization. Approaching cloud AI quality for many tasks.
- No dedicated GPU: You can still run small models (3B-7B) on CPU, but it will be slow. Apple Silicon Macs (M1/M2/M3) actually handle this well thanks to unified memory.
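If you're not sure what card or how much VRAM you have, one command answers it (nvidia-smi ships with the NVIDIA driver; the other two are stock macOS and Linux tools):

```shell
# NVIDIA cards: report the GPU model and total VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv

# Apple Silicon: the GPU shares system memory, so total RAM is the
# number that matters (prints bytes)
sysctl -n hw.memsize

# Linux without an NVIDIA card: check system RAM for CPU inference
free -h
```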
Getting Started
Install Ollama
Ollama is the easiest way to run LLMs locally. It handles downloading models, managing them, and running them with a simple command-line interface. Think of it as the "Docker for AI models."
Go to ollama.com and download the installer for your OS. On Linux, you can install it with one command:
curl -fsSL https://ollama.com/install.sh | sh
That's it. Ollama runs as a service in the background and provides a local API that other tools can connect to.
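That background service is worth knowing about: it listens on port 11434 and speaks JSON, so you can query it directly from a script. A minimal sketch, assuming the service is running and a model such as llama3.1 has already been pulled:

```shell
# Ask the local Ollama API a question and print just the model's reply.
# Assumes Ollama is running on its default port (11434) and the
# llama3.1 model has been downloaded.
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.1", "prompt": "In one sentence, why run AI locally?", "stream": false}' \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["response"])'
```

With "stream": false the endpoint returns one complete JSON object; leave it at the default and it streams tokens as they're generated instead, which is what interactive chat uses.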
Download and Run Your First Model
Open a terminal and run:
ollama run llama3.1

That command starts Meta's Llama 3.1 8B, a great all-rounder: good at conversation, summarization, and general tasks.
The first time you run a model, Ollama downloads it (a few GB). After that, it starts instantly. You're now chatting with an AI that runs entirely on your hardware. Nothing leaves your machine.
Other models to try:
- mistral — Mistral 7B. Fast, good at structured tasks and coding.
- phi3 — Microsoft's Phi-3. Small but surprisingly capable. Great for limited hardware.
- codellama — Specialized for code generation and understanding.
- llama3.1:70b — The 70B parameter version of Llama. Much smarter, but needs ~40GB VRAM or heavy quantization.
- deepseek-coder-v2 — Excellent for programming tasks.
- gemma2 — Google's open model. Good general performance.
Browse all available models at ollama.com/library. Start with the default (usually 7B-8B parameter) versions and move up if your hardware can handle it.
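A few other Ollama subcommands are worth knowing as your model collection grows (these are standard CLI commands; they all talk to the local service):

```shell
ollama pull mistral      # download a model without starting a chat
ollama list              # show downloaded models and their sizes on disk
ollama rm codellama      # delete a model you no longer need

# One-shot prompt: answer and exit, no interactive session
ollama run phi3 "Explain quantization in one paragraph."
```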
Set Up Open WebUI
Chatting in the terminal works, but a web interface is much nicer. Open WebUI (formerly Ollama WebUI) gives you a ChatGPT-like interface for your local models. It supports conversation history, multiple chats, model switching, system prompts, and file uploads.
Install it with Docker (assuming you have Docker from the Self-Hosting guide):
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
Open http://localhost:3000 in your browser, create an account (local only — no data leaves your machine), and you're in. Select your downloaded model from the dropdown and start chatting.
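If the page doesn't load, two quick checks narrow the problem down (standard Docker and curl invocations; nothing here is specific to Open WebUI beyond the container name used above):

```shell
# Is the container actually running?
docker ps --filter name=open-webui --format '{{.Names}}: {{.Status}}'

# Is anything answering on port 3000? (a 200 or a redirect code means yes)
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:3000
```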
Open WebUI can also connect to OpenAI's API if you occasionally want to use cloud models for complex tasks — but that's optional and defeats the privacy purpose.
Practical Use Cases
Local AI isn't just a novelty. Here are things you can actually do with it:
- Private brainstorming: Bounce ideas off the AI without worrying about data leaks. Business plans, journal entries, personal questions — all stay on your machine.
- Code assistance: Use Codellama or DeepSeek Coder as a local Copilot alternative. Many code editors (VS Code, Neovim) have plugins that connect to Ollama.
- Document summarization: Feed in PDFs, articles, or notes and get summaries. Open WebUI supports file uploads for this.
- Email drafting: Write emails without your drafts passing through a third-party server.
- Learning and research: Ask questions about topics you're studying. The AI's knowledge is baked into the model weights — no internet needed.
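Several of these use cases work straight from the terminal. Summarization, for instance, is one command, since ollama run accepts a prompt as an argument. A sketch, where notes.txt stands in for whatever file you want summarized:

```shell
# Summarize a local text file without it ever leaving the machine.
# Assumes llama3.1 has been pulled and notes.txt exists.
ollama run llama3.1 "Summarize the following notes in three bullet points: $(cat notes.txt)"
```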
PewDiePie's Setup (Advanced)
For context on the extreme end: PewDiePie runs vLLM on a 10-GPU rig. vLLM is a high-performance inference engine that's much faster than Ollama for serving large models. He built a custom YouTube browser extension that uses his local AI to summarize videos, answer questions about video content, and help with research.
This is a serious setup — we're talking about thousands of dollars in GPUs and significant technical knowledge. It's impressive, but it's not what most people need. The point is that the same concept applies at any scale: whether you're running a 7B model on a laptop or a 70B model on a GPU cluster, your data stays local.
Beyond Chatbots
Local AI isn't just about text chat. Here are other things you can run locally:
- Stable Diffusion (image generation): Run ComfyUI or Automatic1111 locally to generate images without sending your prompts to Midjourney or DALL-E. Needs 8GB+ VRAM.
- Whisper (speech-to-text): OpenAI's Whisper model runs locally and transcribes audio with remarkable accuracy. Great for transcribing meetings, lectures, or your own recordings.
- Coqui TTS (text-to-speech): Generate natural-sounding speech from text, locally.
- LocalAI: A drop-in replacement for OpenAI's API that runs everything locally. Any tool that integrates with the OpenAI API can be pointed at LocalAI instead.
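The OpenAI-compatible approach isn't limited to LocalAI: Ollama also serves an OpenAI-style API under /v1, so many existing tools can be repointed at your local machine just by changing the base URL. A sketch, assuming Ollama is running and llama3.1 is pulled:

```shell
# Standard chat-completions request, aimed at localhost instead of OpenAI
curl -s http://localhost:11434/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "llama3.1",
        "messages": [{"role": "user", "content": "Say hello from localhost."}]
      }'
```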
Honest Downsides
- You need decent hardware. Running AI locally requires a GPU with enough VRAM. If your computer has no dedicated GPU or an older card with 4GB VRAM, the experience will be painfully slow. Apple Silicon Macs are an exception — they handle local AI well thanks to unified memory architecture.
- Models are large. A 7B model is about 4 GB. A 13B model is about 7 GB. A 70B model is 40+ GB. If you want to try several models, you'll need significant disk space. Ollama stores models in its cache, so they add up.
- Local models aren't as good as GPT-4 or Claude for complex tasks. Open-source models have improved dramatically, but for tasks that require deep reasoning, nuanced understanding, or very long contexts, cloud models still have an edge. A local 7B model is roughly comparable to GPT-3.5 in capability. A local 70B model approaches GPT-4 for many tasks but still falls short on the hardest ones.
- No internet access. Local models can't browse the web, check current events, or access real-time information. Their knowledge is frozen at whatever date the training data cut off. This is a feature for privacy, but a limitation for research.
- Power consumption. Running a GPU at full load while chatting with an AI uses significantly more power than typing a query into ChatGPT. A high-end GPU can draw 300-450 watts. PewDiePie's 10-GPU rig probably uses as much power as a small apartment.
- It's evolving fast. The local AI landscape changes monthly. Models get better, tools get replaced, new approaches emerge. What's the best setup today might not be in six months. That's exciting but also means more maintenance and learning.