<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://entorb.net//wiki/index.php?action=history&amp;feed=atom&amp;title=Local_private_LLM</id>
	<title>Local private LLM - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://entorb.net//wiki/index.php?action=history&amp;feed=atom&amp;title=Local_private_LLM"/>
	<link rel="alternate" type="text/html" href="https://entorb.net//wiki/index.php?title=Local_private_LLM&amp;action=history"/>
	<updated>2026-05-06T10:24:37Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.43.1</generator>
	<entry>
		<id>https://entorb.net//wiki/index.php?title=Local_private_LLM&amp;diff=5254&amp;oldid=prev</id>
		<title>Torben: /* Installation */</title>
		<link rel="alternate" type="text/html" href="https://entorb.net//wiki/index.php?title=Local_private_LLM&amp;diff=5254&amp;oldid=prev"/>
		<updated>2025-06-01T14:40:41Z</updated>

		<summary type="html">&lt;p&gt;&lt;span class=&quot;autocomment&quot;&gt;Installation&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;== Motivation ==&lt;br /&gt;
* Large language models (LLMs) are very good at analyzing (large) texts&lt;br /&gt;
* There are multiple free-to-test providers on the internet&lt;br /&gt;
* Sensitive private or confidential data (like diary/journal, medical records, etc.) should &amp;#039;&amp;#039;&amp;#039;NOT&amp;#039;&amp;#039;&amp;#039; be shared with any external service&lt;br /&gt;
&lt;br /&gt;
== Solution ==&lt;br /&gt;
Run your own LLM&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
I use the nice open-source tool [https://ollama.com Ollama] to manage and host the LLM.&lt;br /&gt;
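&lt;br /&gt;
On macOS and Windows the installer is downloaded from the website; on Linux the install script from ollama.com can be used&lt;br /&gt;
 curl -fsSL https://ollama.com/install.sh | sh&lt;br /&gt;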
&lt;br /&gt;
== Running ==&lt;br /&gt;
Start it in a terminal/command window by passing a model (llama3.2 did not work well for me, see below)&lt;br /&gt;
 ollama run llama3.2:latest&lt;br /&gt;
&lt;br /&gt;
A list of supported LLM models can be found [https://ollama.com/search here].&lt;br /&gt;
&lt;br /&gt;
For me, the [https://ollama.com/library/qwen3 qwen3] model has worked best so far.&lt;br /&gt;
&lt;br /&gt;
Besides the model, you need to select a model size that fits your hardware.&lt;br /&gt;
&lt;br /&gt;
For me:&lt;br /&gt;
 MacBook Pro M1 16GB&lt;br /&gt;
 -&amp;gt; models up to 8b work, but require patience&lt;br /&gt;
 Windows PC with gaming graphics card (GPU) GeForce RTX 3060 12GB&lt;br /&gt;
 -&amp;gt; models up to 14b run very smoothly&lt;br /&gt;
&lt;br /&gt;
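For example, to download and run a mid-sized model (the tag qwen3:8b is an assumption based on the Ollama library; pick a size that matches your hardware)&lt;br /&gt;
 ollama pull qwen3:8b&lt;br /&gt;
 ollama run qwen3:8b&lt;br /&gt;
&lt;br /&gt;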
To check whether the model fits into your GPU memory, first run&lt;br /&gt;
 ollama run &amp;lt;model&amp;gt;&lt;br /&gt;
then open a second terminal window and run&lt;br /&gt;
 ollama ps&lt;br /&gt;
to see where the model is loaded; if the processor column shows 100% GPU, the model fits entirely into GPU memory, otherwise part of it spills over into system RAM and runs on the CPU.&lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
=== chat commands === &lt;br /&gt;
 /clear : clear current session&lt;br /&gt;
 /bye   : exit&lt;br /&gt;
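 /?     : show all available chat commands&lt;br /&gt;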
&lt;br /&gt;
=== command line commands === &lt;br /&gt;
show currently running model(s) and where they are loaded (system RAM or GPU RAM)&lt;br /&gt;
 ollama ps&lt;br /&gt;
&lt;br /&gt;
stop model (done automatically after 5 minutes of inactivity)&lt;br /&gt;
 ollama stop llama3.2:latest&lt;br /&gt;
&lt;br /&gt;
delete model&lt;br /&gt;
 ollama rm llama3.2:latest&lt;br /&gt;
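&lt;br /&gt;
list all downloaded models&lt;br /&gt;
 ollama list&lt;br /&gt;
&lt;br /&gt;
download a model without starting a chat session&lt;br /&gt;
 ollama pull &amp;lt;model&amp;gt;&lt;br /&gt;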
&lt;br /&gt;
===Remote access===&lt;br /&gt;
By default, Ollama listens only on localhost. To use it over the network, set the env variable (on the server to change the listen address, on a client to select which server the ollama command talks to)&lt;br /&gt;
 OLLAMA_HOST=192.168.0.123:11434&lt;br /&gt;
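&lt;br /&gt;
Once the server is reachable, other machines can also query it directly via the REST API. A minimal sketch using the /api/generate endpoint (IP from the example above; the model tag qwen3:8b is an assumption, use a model that is pulled on the server)&lt;br /&gt;
 curl http://192.168.0.123:11434/api/generate \&lt;br /&gt;
   -d &amp;#039;{&amp;quot;model&amp;quot;: &amp;quot;qwen3:8b&amp;quot;, &amp;quot;prompt&amp;quot;: &amp;quot;Hello&amp;quot;, &amp;quot;stream&amp;quot;: false}&amp;#039;&lt;br /&gt;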
&lt;br /&gt;
===Local storage of the models ===&lt;br /&gt;
[https://github.com/ollama/ollama/blob/main/docs/faq.md#where-are-models-stored where-are-models-stored]&lt;br /&gt;
 macOS: ~/.ollama/models &lt;br /&gt;
 Linux: /usr/share/ollama/.ollama/models&lt;br /&gt;
 Windows: C:\Users\%username%\.ollama\models&lt;br /&gt;
change via env var OLLAMA_MODELS (ollama service restart needed)&lt;br /&gt;
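For example, when starting the server manually from a shell (the path is just a placeholder)&lt;br /&gt;
 export OLLAMA_MODELS=/data/ollama-models&lt;br /&gt;
 ollama serve&lt;br /&gt;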
&lt;br /&gt;
=== Links ===&lt;br /&gt;
* [https://ollama.com/library/deepseek-r1 Deepseek Models]&lt;br /&gt;
* [https://github.com/ollama/ollama/blob/main/docs/faq.md FAQ]&lt;/div&gt;</summary>
		<author><name>Torben</name></author>
	</entry>
</feed>