
Local private LLM

From Torben's Wiki
 

Motivation

  • Large language models (LLMs) are very good at analyzing (large) texts
  • There are multiple free-to-test providers on the internet
  • Sensitive private or confidential data (like diary/journal, medical records, etc.) should NOT be shared with any external service

Solution

Run your own LLM

Installation

I use the nice open-source tool Ollama to manage and host the LLM.
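
Installation sketch for two common setups (Homebrew on macOS is an assumption; Windows users can download the installer from ollama.com):

# macOS, assuming Homebrew is installed
brew install ollama

# Linux, using the official install script
curl -fsSL https://ollama.com/install.sh | sh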

Running

Start in a terminal/command window by passing a model name (llama3.2 did not work well for me, see below)

ollama run llama3.2:latest

A list of supported LLM models can be found in the Ollama model library (https://ollama.com/library).

For me, the qwen3 model has worked best so far.

Besides the model, you need to select a model size that fits your hardware.

For me:

  • MacBook Pro M1, 16 GB RAM -> models up to 8b work, but with patience
  • Windows PC with a GeForce RTX 3060 gaming GPU (12 GB VRAM) -> models up to 14b run very smoothly
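
To pick a size explicitly, append the size tag to the model name. A sketch (the qwen3 tags below are assumptions; check the model library page for the tags that actually exist):

# 8b variant, the largest that ran acceptably on the 16 GB MacBook
ollama run qwen3:8b

# 14b variant, runs smoothly on the 12 GB RTX 3060
ollama run qwen3:14b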

To check if the model fits into your GPU memory, first run

ollama run <model>

then open a second terminal window and run

ollama ps

to see whether the model is loaded into GPU memory or into system RAM.

Usage

chat commands

/clear : clear current session
/bye   : exit
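
A sketch of a typical interactive session (model tag and prompt are only examples):

ollama run qwen3:8b
>>> Summarize the following diary entry: ...
>>> /clear
>>> /bye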

command line commands

show the currently running model(s) and where they are stored (RAM or GPU RAM)

ollama ps

stop a model (done automatically after 5 min of inactivity)

ollama stop llama3.2:latest

delete a downloaded model

ollama rm llama3.2:latest
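
For analyzing longer texts without an interactive session, ollama run also accepts a one-shot prompt and reads from stdin; a sketch (model tag and file name are only examples):

# summarize a local file without sending it to any external service
cat journal.txt | ollama run qwen3:8b "Summarize the following text:"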

Remote access

To reach an Ollama server from another machine, set the OLLAMA_HOST environment variable on the client (example LAN IP, default port 11434):

OLLAMA_HOST=192.168.0.123:11434
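
A minimal sketch of both sides, assuming the server's LAN IP is 192.168.0.123 and a qwen3:8b model is installed there:

# on the server: make the API listen on all interfaces instead of only localhost
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# on the client: point the CLI at the server ...
export OLLAMA_HOST=192.168.0.123:11434
ollama ps

# ... or talk to the REST API directly
curl http://192.168.0.123:11434/api/generate -d '{"model": "qwen3:8b", "prompt": "Hello", "stream": false}'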

Local storage of the models

Default locations (see "Where are models stored?" in the Ollama FAQ):

macOS: ~/.ollama/models 
Linux: /usr/share/ollama/.ollama/models
Windows: C:\Users\%username%\.ollama\models

Change the location via the env var OLLAMA_MODELS (a restart of the Ollama service/app is needed).
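
A sketch for moving the model storage to another drive (the path is only an illustration):

# macOS/Linux: export the variable, then restart the Ollama service/app
export OLLAMA_MODELS=/mnt/bigdisk/ollama-models

# Windows: set OLLAMA_MODELS as a user environment variable in the system settings,
# then restart the Ollama app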

Links