
🚀 How to Install Ollama Locally and Run LLMs on Your PC (with UI Setup)

Learn how to install Ollama locally, pull and run models like LLaMA, Mistral, and Gemma, and connect a web UI to chat with your local LLM visually, improving your workflow while keeping your data private.

SuperML.dev
Run your own LLMs like Mistral, LLaMA, or Gemma on your laptop with Ollama, and see them working in your browser.

Why run LLMs locally?

  • Privacy and data sovereignty.
  • Experiment with prompt engineering without API costs.
  • Run fine-tuning experiments for your use case.
  • Reduce reliance on cloud LLM APIs for fast workflows.

Step 1: System Requirements

  • OS: macOS, Windows, or Linux
  • RAM: At least 8 GB (16 GB recommended for larger models)
  • Disk: 10–30 GB free for models
  • GPU: Optional; CPU-only inference works, but a GPU speeds things up considerably.

Step 2: Install Ollama

On macOS:

brew install ollama
ollama serve
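
If you prefer not to keep a terminal open for ollama serve, the Homebrew formula also registers a background service. A small sketch, assuming your Homebrew setup supports services:

# Run the Ollama server as a background service managed by Homebrew
brew services start ollama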

On Windows:

Download and run the installer from https://ollama.com/download. The Ollama desktop app starts the server in the background automatically; to start it manually from a terminal, run:

ollama serve

On Linux:

curl -fsSL https://ollama.com/install.sh | sh
ollama serve
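
Whichever platform you are on, it is worth confirming the server is reachable before pulling models. A quick sanity check:

# Print the installed CLI version
ollama --version

# The server listens on port 11434 by default;
# the root endpoint should respond with "Ollama is running"
curl http://localhost:11434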

Step 3: Pull and Run a Model

For example, to pull llama3:

ollama pull llama3

Then run:

ollama run llama3

You can replace llama3 with mistral, gemma, or any other model from the Ollama library (ollama.com/library).
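
Once you have a few models downloaded, the CLI's housekeeping subcommands help you manage disk and memory:

# List the models you have downloaded locally
ollama list

# Show which models are currently loaded and their memory usage
ollama ps

# Remove a model to free disk space
ollama rm llama3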


Step 4: Accessing via API or Terminal

  • Terminal: Keep chatting directly in the session opened by ollama run.
  • API: Ollama exposes a REST API (with an OpenAI-compatible endpoint under /v1) at:
http://localhost:11434

Use your favorite LLM client or test using:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Write a short poem about Ollama"
}'
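
Note that /api/generate streams its reply as a sequence of JSON lines by default; add "stream": false to the body if you prefer a single JSON response. To use the OpenAI-compatible endpoint mentioned above instead, a chat-style request looks like this (a minimal sketch using the llama3 model pulled earlier):

curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "llama3",
  "messages": [
    {"role": "user", "content": "Explain what Ollama does in one sentence"}
  ]
}'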

Step 5: Setting up a Web UI for Ollama

You can view and chat with your local LLM in your browser using lightweight frontends such as Open WebUI.

1️⃣ Open WebUI (Recommended)

git clone https://github.com/open-webui/open-webui.git
cd open-webui
docker-compose up -d
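
Once the containers are running, Open WebUI is typically available at http://localhost:3000 and should detect the local Ollama server at http://localhost:11434 automatically. If you'd rather not clone the repository, the project also publishes a prebuilt image on GHCR; a rough sketch (check the Open WebUI README for the currently recommended flags):

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main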

2๏ธโƒฃ LM Studio (Optional alternative)

Download LM Studio, which runs GGUF models with its own built-in chat UI. It works as a standalone alternative rather than as a frontend for Ollama.


Step 6: Fine-Tuning Your Model (Optional)

Ollama currently supports model configuration (system prompts, parameters) but not full parameter fine-tuning directly. For fine-tuning:

  • Fine-tune externally using LoRA or QLoRA methods.
  • Merge the fine-tuned weights into the base model, convert the result to GGUF format (for example with llama.cpp's conversion tooling), and load it into Ollama via a Modelfile, as in the sketch below.
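
As a concrete example of the configuration side, here is a minimal Modelfile sketch that builds a customized variant on top of the llama3 base model (the name my-llama3 and the parameter values are just placeholders); a fine-tuned GGUF file could be referenced in the FROM line instead:

# Write a Modelfile that customizes the base model
cat > Modelfile <<'EOF'
FROM llama3
PARAMETER temperature 0.7
SYSTEM "You are a concise assistant that answers in plain English."
EOF

# Build and run the customized model
ollama create my-llama3 -f Modelfile
ollama run my-llama3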

Alternatives to Ollama

If you want to run local LLMs or test other frameworks:

  1. LM Studio – GUI for local LLMs using GGUF; supports Mistral, LLaMA, Gemma.
  2. GPT4All – Local LLM runner with downloadable models.
  3. llm (by Simon Willison) – CLI tool for running prompts against local and remote models.
  4. LocalAI – Self-hosted, OpenAI-compatible API for running local models.
  5. LMQL – Constraint-based prompting language that can run against local quantized models.
  6. WebLLM – Run LLMs directly in your browser with WebGPU.

Summary

✅ Installed Ollama locally.
✅ Pulled and ran LLaMA, Mistral, or Gemma models.
✅ Set up a Web UI to interact visually.
✅ Explored other alternatives to run LLMs locally.


🎯 Next Steps

  • Try prompt engineering with your local Ollama setup.
  • Experiment with lightweight quantized models for faster inference.
  • Connect your local LLM to your personal projects and VS Code workflows.

If you need help with integrating Ollama into your workflow or creating agent workflows, let us know!


