How to Run Local LLMs with Ollama in 10 Minutes
Running a large language model locally gives you complete privacy, zero API costs, and offline access. With Ollama, it takes about 10 minutes from zero to a working local AI assistant.
Step 1: Install Ollama
macOS / Linux:
curl -fsSL https://ollama.com/install.sh | sh
Windows: Download the installer from ollama.com and run it.
Verify the install:
ollama --version
# ollama version 0.3.14 (your version may be newer)
Step 2: Pull a Model
# Llama 4 Scout (large MoE model; the 4-bit download is roughly 67GB)
ollama pull llama4:scout
# Mistral 7B (smaller, faster; great for testing)
ollama pull mistral
# Gemma 3 1B (very small, runs on almost any machine)
ollama pull gemma3:1b
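After pulling, you can confirm what's available locally. The sketch below queries Ollama's /api/tags endpoint, which is the same information `ollama list` shows; the `model_names` helper is an illustrative name of mine, not part of Ollama.

```python
import json
import urllib.request

def model_names(tags_json):
    # Extract just the model names from the /api/tags payload.
    return [m["name"] for m in tags_json.get("models", [])]

if __name__ == "__main__":
    # Equivalent to running `ollama list` on the command line.
    with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
        print(model_names(json.load(resp)))
```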
Step 3: Run Your First Chat
ollama run llama4:scout
You'll see a >>> prompt. Type a message and press Enter (type /bye to exit):
>>> Explain quantum entanglement in simple terms
Step 4: Use the REST API
Ollama runs a local server on port 11434. You can call it like any HTTP API:
import requests

response = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama4:scout",
    "prompt": "Write a haiku about machine learning",
    "stream": False,
})
print(response.json()["response"])
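Setting "stream": True instead makes Ollama return one JSON object per line as tokens are generated, which is how you get output to appear word by word. A minimal sketch; `join_chunks` is an illustrative helper of mine, not an Ollama API.

```python
import json

def join_chunks(lines):
    # Each streamed line is a JSON object with a "response" text fragment;
    # concatenate the fragments into the full reply.
    return "".join(json.loads(line)["response"] for line in lines if line)

if __name__ == "__main__":
    import requests  # same library as the example above
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama4:scout", "prompt": "Write a haiku", "stream": True},
        stream=True,
    )
    print(join_chunks(r.iter_lines()))
```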
Step 5: OpenAI-Compatible Endpoint
Ollama also exposes an OpenAI-compatible endpoint, so you can drop it into any app that supports OpenAI:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="llama4:scout",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
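The endpoint is stateless, so multi-turn chat means resending the whole conversation history on every request. A sketch of that pattern, assuming the same local endpoint; `add_turn` is an illustrative helper, not part of the OpenAI client.

```python
def add_turn(history, role, content):
    # Append one message without mutating the original list;
    # the full list is resent with every request.
    return history + [{"role": role, "content": content}]

if __name__ == "__main__":
    from openai import OpenAI
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    history = add_turn([], "user", "Name one prime number.")
    reply = client.chat.completions.create(model="llama4:scout", messages=history)

    # Keep the model's answer in the history so the next turn has context.
    history = add_turn(history, "assistant", reply.choices[0].message.content)
    history = add_turn(history, "user", "Name another one.")
    reply = client.chat.completions.create(model="llama4:scout", messages=history)
    print(reply.choices[0].message.content)
```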
Hardware Tips
| Model | RAM Required | GPU Needed? |
|---|---|---|
| Gemma 3 1B | 4GB | No |
| Mistral 7B | 8GB | No |
| Llama 4 Scout (4-bit) | ~67GB | Recommended |
| Llama 4 Maverick (4-bit) | ~245GB | Recommended |
Without a GPU, models run on CPU; expect 5–15 tokens/second for 7B models, which is perfectly usable.
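You can measure your own throughput rather than guessing: a non-streaming /api/generate response includes an eval_count (tokens generated) and eval_duration (time spent, in nanoseconds). A quick sketch; `tokens_per_second` is an illustrative helper of mine.

```python
import json
import urllib.request

def tokens_per_second(eval_count, eval_duration_ns):
    # eval_duration is reported in nanoseconds, so convert to seconds first.
    return eval_count / (eval_duration_ns / 1e9)

if __name__ == "__main__":
    body = json.dumps({"model": "mistral", "prompt": "Hello", "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    print(f"{tokens_per_second(data['eval_count'], data['eval_duration']):.1f} tokens/s")
```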