Ollama

Fleet skill: Ollama — software inventory and operations reference

fleet
by skynet, v1.0.0
software-ollamafleetfleetsoftware

Compatible Agents

claude-code, codex, gemini, kimi

Instruction

---
name: software-ollama
description: Manage and interface with Ollama for local LLM inference across the fleet. Use this when you need to pull models, check service status on Spark, or route OpenAI-compatible requests to the local Ollama instance.
metadata:
  author: skynet
  version: 1.0.0
---

# Ollama Software Skill

Ollama is the primary local LLM runner for James's fleet, optimized for ease of use and rapid model swapping. While vLLM handles high-throughput production inference for Qwen, Ollama is used for experimentation, specialized models (Llama 3, Mistral, Phi-3), and vision tasks.

## Fleet Deployment Status

Ollama is strictly localized to the **Spark** node to leverage its GPU resources.

| Machine | Status | Role | Endpoint |
|---------|--------|------|----------|
| **Spark** (192.168.86.48) | **Installed** | Primary Inference Server | `http://localhost:11434` |
| **Dev Workstation** | Client only | Remote CLI access | `OLLAMA_HOST=192.168.86.48` |
| **Dev Server** | Client only | API consumer | `http://192.168.86.48:11434` |
| **Vault** | Not Installed | N/A | N/A |

## Service Configuration (Spark)

Ollama runs as a `systemd` service on Spark.

- **Service Name:** `ollama.service`
- **Binary Path:** `/usr/local/bin/ollama`
- **Model Storage:** `/usr/share/ollama/.ollama/models`
- **Environment Overrides:** `/etc/systemd/system/ollama.service.d/override.conf`

### Key Environment Variables

To allow fleet-wide access, the service must be configured with:

- `OLLAMA_HOST=0.0.0.0:11434` (Listen on all interfaces)
- `OLLAMA_ORIGINS=*` (Allow cross-origin requests from other fleet machines)

## Core CLI Commands

Always run these via SSH on Spark or set `OLLAMA_HOST` locally.

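
For scripted use, the second option (setting `OLLAMA_HOST` instead of opening an SSH session) might look like the following. This is a minimal sketch, not part of the fleet tooling: the helper names `ollama_env` and `run_ollama` are hypothetical, and it assumes the `ollama` CLI is installed on the calling machine and that Spark is reachable at the address listed in the deployment table.

```python
import os
import subprocess

# Spark address taken from the deployment table; adjust if the fleet changes.
SPARK_OLLAMA_HOST = "192.168.86.48:11434"


def ollama_env(host: str = SPARK_OLLAMA_HOST) -> dict:
    """Return a copy of the current environment with OLLAMA_HOST set.

    The process-wide environment is left untouched.
    """
    env = dict(os.environ)
    env["OLLAMA_HOST"] = host
    return env


def run_ollama(*args: str, host: str = SPARK_OLLAMA_HOST) -> str:
    """Run an `ollama` CLI subcommand against `host` and return its stdout.

    Hypothetical helper: assumes the `ollama` binary is on the local PATH.
    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        ["ollama", *args],
        env=ollama_env(host),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example (requires a reachable Ollama server):
#   print(run_ollama("list"))
```

Passing the variable through `env` rather than exporting it keeps the override scoped to a single command, so other tools on the same machine are unaffected.
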
### Management

```bash
# List currently downloaded models
ollama list

# List currently running/loaded models
ollama ps

# Download a new model from the library
ollama pull llama3:8b

# Remove a downloaded model to free up disk space
ollama rm mistral
```

### Interaction

```bash
# Start an interactive chat session
ollama run llama3

# Run a single prompt (non-interactive)
ollama run llama3 "Summarize the system logs."
```

### Customization

```bash
# Create a model from a Modelfile
ollama create my-custom-model -f ./Modelfile
```

## API Integration

Ollama provides two primary ways to interface with it programmatically from other machines in the fleet.

### 1. OpenAI Compatible API (Preferred)

Ollama exposes its models through an OpenAI-compatible `/v1/chat/completions` endpoint.

- **Base URL:** `http://192.168.86.48:11434/v1`
- **Model Name:** Use the name exactly as it appears in `ollama list`.

**Example Request (Python):**

```python
import openai

# Point the standard OpenAI client at Ollama on Spark
client = openai.OpenAI(
    base_url="http://192.168.86.48:11434/v1",
    api_key="ollama",  # Required by the client but ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

### 2. Native Ollama API

Used for Ollama-specific features like generating embeddings or model management.

- **Generate:** `POST /api/generate`
- **Chat:** `POST /api/chat`
- **Pull:** `POST /api/pull`

## Fleet-Specific Patterns

### Remote CLI Usage

To control Ollama on Spark from your local Dev Workstation without opening an SSH session:

```bash
export OLLAMA_HOST=192.168.86.48:11434
ollama list
```

### The "Spark Duo" Pattern

On Spark, vLLM and Ollama share the same GPU resources. If vLLM is consuming all VRAM with a large Qwen instance, Ollama may fail to load models or trigger OOM errors.

- **Priority:** vLLM (Production Services) > Ollama (Experimental/CLI).
- **Check VRAM:** Use `nvidia-smi` on Spark before starting an Ollama run.

### Custom Modelfiles

James prefers minimal abstractions.
When creating specialized tools, use a `Modelfile`:

```dockerfile
FROM llama3
SYSTEM "You are a specialized shell script assistant. Output only code."
PARAMETER temperature 0.2
```

## Troubleshooting

### Service Connectivity

If `curl http://192.168.86.48:11434` fails:

1. SSH into Spark: `ssh spark`
2. Check service: `systemctl status ollama`
3. Check listening port: `netstat -tulpn | grep 11434`
4. If it only shows `127.0.0.1`, the `OLLAMA_HOST` environment variable is missing from the systemd config. Add it to the override file, then run `sudo systemctl daemon-reload` and `sudo systemctl restart ollama`.

### GPU Not Detected

If Ollama runs in CPU-only mode (extremely slow):

1. Check drivers on Spark: `nvidia-smi`
2. Ensure the `ollama` user has permissions to the `/dev/nvidia*` devices.
3. Restart the service: `sudo systemctl restart ollama`

### Model Download Failures

Model downloads can be several gigabytes each. If a pull fails because the root partition is full, check the size of `/usr/share/ollama/.ollama/models`. Ideally this directory is a symlink to a larger data drive, if Spark has one.

## Common Workflows for Agents

1. **Verify Availability:** Run `ollama list` to see if the required model is present.
2. **Provision:** If missing, run `ollama pull <model>`.
3. **Validate:** Perform a test completion via the OpenAI-compatible endpoint.
4. **Cleanup:** If the task is finished and the model is large/niche, suggest `ollama rm` to James to free up disk space on Spark. Removing a model reclaims storage; VRAM is held only while a model is loaded, as shown by `ollama ps`.

## References

- Official Docs: https://ollama.com/docs
- Library: https://ollama.com/library
- GitHub: https://github.com/ollama/ollama
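
The first three steps of the agent workflow (verify, provision, validate) can also be driven over HTTP instead of the CLI. The sketch below uses only the Python standard library and is illustrative rather than fleet tooling. It assumes that `GET /api/tags` returns a `models` list whose entries carry a `name` field, and that `POST /api/pull` accepts a `model` field with `stream` set to false; neither detail is stated in this document, and older Ollama releases may expect `name` in the pull body. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://192.168.86.48:11434"  # Spark endpoint from this document


def _request_json(path: str, payload: Optional[dict] = None, timeout: float = 600.0) -> dict:
    """Send GET when payload is None, otherwise POST JSON; return the decoded JSON body."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def has_model(tags: dict, wanted: str) -> bool:
    """Check a /api/tags-style response for `wanted`; a bare name means `:latest`."""
    if ":" not in wanted:
        wanted += ":latest"
    return any(m.get("name") == wanted for m in tags.get("models", []))


def ensure_model(model: str) -> None:
    """Steps 1 and 2: pull the model only if it is not already present."""
    if not has_model(_request_json("/api/tags"), model):
        # Assumption: the pull body uses a "model" field (older releases used "name").
        _request_json("/api/pull", {"model": model, "stream": False})


def validate(model: str) -> str:
    """Step 3: run a test completion through the OpenAI-compatible endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": "Say OK."}]}
    reply = _request_json("/v1/chat/completions", body)
    return reply["choices"][0]["message"]["content"]


# Example (requires a reachable Ollama server):
#   ensure_model("llama3")
#   print(validate("llama3"))
```

Keeping `has_model` as a pure function over the parsed response makes the name-matching rule easy to check without a running server. Step 4 (cleanup) is left to James, as the workflow suggests.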

Install

curl -s https://skills.skynet.ceo/api/skills/software-ollama/skill.md