---
name: "Ollama"
description: "Fleet skill: Ollama — software inventory and operations reference"
version: "1.0.0"
author: "skynet"
category: "fleet"
agents: ["claude-code", "codex", "gemini", "kimi"]
tags: ["software-ollama", "fleet", "software"]
---
# Ollama
---
name: software-ollama
description: Manage and interface with Ollama for local LLM inference across the fleet. Use this when you need to pull models, check service status on Spark, or route OpenAI-compatible requests to the local Ollama instance.
metadata:
  author: skynet
  version: 1.0.0
---
# Ollama Software Skill
Ollama is the primary local LLM runner for James's fleet, optimized for ease of use and rapid model swapping. While vLLM handles high-throughput production inference for Qwen, Ollama is used for experimentation, specialized models (Llama 3, Mistral, Phi-3), and vision tasks.
## Fleet Deployment Status
Ollama is installed only on the **Spark** node, where it can use the GPU.
| Machine | Status | Role | Endpoint |
|---------|--------|------|----------|
| **Spark** (192.168.86.48) | **Installed** | Primary Inference Server | `http://localhost:11434` |
| **Dev Workstation** | Client only | Remote CLI access | `OLLAMA_HOST=192.168.86.48` |
| **Dev Server** | Client only | API consumer | `http://192.168.86.48:11434` |
| **Vault** | Not Installed | N/A | N/A |
## Service Configuration (Spark)
Ollama runs as a `systemd` service on Spark.
- **Service Name:** `ollama.service`
- **Binary Path:** `/usr/local/bin/ollama`
- **Model Storage:** `/usr/share/ollama/.ollama/models`
- **Environment Overrides:** `/etc/systemd/system/ollama.service.d/override.conf`
### Key Environment Variables
To allow fleet-wide access, the service must be configured with:
- `OLLAMA_HOST=0.0.0.0:11434` (Listen on all interfaces)
- `OLLAMA_ORIGINS=*` (Allow cross-origin (CORS) requests, which browser-based clients on other fleet machines need)
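
As a minimal sketch, the drop-in at the override path listed above would carry both variables as `Environment=` lines under a `[Service]` section. The helper name `render_override` is hypothetical and only renders the file text; writing it to disk and then running `sudo systemctl daemon-reload` and `sudo systemctl restart ollama` is left to the operator.

```python
def render_override(env: dict[str, str]) -> str:
    """Render a systemd drop-in that sets environment variables for a service."""
    lines = ["[Service]"]
    for key, value in env.items():
        # Quote each assignment so values such as "*" are passed through literally.
        lines.append(f'Environment="{key}={value}"')
    return "\n".join(lines) + "\n"


# The two variables this skill requires for fleet-wide access.
override_text = render_override({
    "OLLAMA_HOST": "0.0.0.0:11434",
    "OLLAMA_ORIGINS": "*",
})
print(override_text)
```

Generating the text rather than editing by hand keeps the two required settings together and makes the file easy to diff against what is on Spark.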
## Core CLI Commands
Always run these via SSH on Spark or set `OLLAMA_HOST` locally.
### Management
```bash
# List currently downloaded models
ollama list
# List currently running/loaded models
ollama ps
# Download a new model from the library
ollama pull llama3:8b
# Remove a model to free up disk space
ollama rm mistral
```
### Interaction
```bash
# Start an interactive chat session
ollama run llama3
# Run a single prompt (non-interactive)
ollama run llama3 "Summarize the system logs."
```
### Customization
```bash
# Create a model from a Modelfile
ollama create my-custom-model -f ./Modelfile
```
## API Integration
Ollama provides two primary ways to interface with it programmatically from other machines in the fleet.
### 1. OpenAI Compatible API (Preferred)
Ollama serves its models through an OpenAI-compatible `/v1/chat/completions` endpoint.
- **Base URL:** `http://192.168.86.48:11434/v1`
- **Model Name:** Use the name exactly as it appears in `ollama list`.
**Example Request (Python):**
```python
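import openai

# Point the standard OpenAI client at Spark's Ollama endpoint.
client = openai.OpenAI(
    base_url="http://192.168.86.48:11434/v1",
    api_key="ollama",  # Required by the client but ignored by Ollama
)
response = client.chat.completions.create(
    model="llama3",  # Must match a name shown by `ollama list`
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)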
```
### 2. Native Ollama API
Used for Ollama-specific features like generating embeddings or model management.
- **Generate:** `POST /api/generate`
- **Chat:** `POST /api/chat`
- **Pull:** `POST /api/pull`
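
A call to the native generate endpoint can be sketched with only the standard library. This is an illustration, not a fleet-standard client: the helper names `build_generate_request` and `generate` are hypothetical, and the sketch assumes a non-streaming request (`"stream": false`), in which case the generated text is returned in the `response` field of a single JSON object.

```python
import json
import urllib.request

OLLAMA_URL = "http://192.168.86.48:11434"  # Spark endpoint from the deployment table


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST to /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request to Spark and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

From any fleet machine that can reach Spark, `generate("llama3", "Summarize the system logs.")` would return the completion as a string. Separating request construction from sending makes the payload easy to inspect without touching the network.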
## Fleet-Specific Patterns
### Remote CLI Usage
To control Ollama on Spark from your local Dev Workstation without SSHing:
```bash
export OLLAMA_HOST=192.168.86.48:11434
ollama list
```
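
When an agent drives the CLI from a script, the same variable can be scoped to a single subprocess instead of exported into the shell. The sketch below assumes the `ollama` client binary is installed on the local machine; `remote_env` and `ollama_cli` are hypothetical helper names.

```python
import os
import subprocess

SPARK_OLLAMA = "192.168.86.48:11434"  # same value as the export above


def remote_env(host: str = SPARK_OLLAMA) -> dict[str, str]:
    """Copy the current environment and point the Ollama CLI at a remote host."""
    env = dict(os.environ)
    env["OLLAMA_HOST"] = host
    return env


def ollama_cli(*args: str) -> str:
    """Run the local `ollama` client against Spark and return its stdout."""
    result = subprocess.run(
        ["ollama", *args],
        env=remote_env(),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `ollama_cli("list")` would return the same table as the shell commands above, without leaving `OLLAMA_HOST` set for unrelated processes.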
### The "Spark Duo" Pattern
On Spark, vLLM and Ollama share the same GPU resources. If vLLM is consuming all VRAM with a large Qwen instance, Ollama may fail to load models or trigger OOM errors.
- **Priority:** vLLM (Production Services) > Ollama (Experimental/CLI).
- **Check VRAM:** Use `nvidia-smi` on Spark before starting an Ollama run.
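
The VRAM check can be scripted by asking `nvidia-smi` for machine-readable output. This is a sketch under assumptions: it uses the `--query-gpu` CSV mode, which reports memory in MiB, and the helper names are hypothetical. Rows that do not contain plain integers (some platforms report `[N/A]`) are skipped rather than guessed at.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_free_mib(csv_text: str) -> list[int]:
    """Return free memory in MiB for each GPU row of `used, total` values."""
    free = []
    for line in csv_text.strip().splitlines():
        parts = [field.strip() for field in line.split(",")]
        # Only accept rows with exactly two integer fields.
        if len(parts) == 2 and all(p.isdigit() for p in parts):
            used, total = int(parts[0]), int(parts[1])
            free.append(total - used)
    return free


def free_vram_mib() -> list[int]:
    """Run nvidia-smi on the current machine (Spark) and parse its output."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_free_mib(out)
```

An agent could compare the result of `free_vram_mib()` against the approximate size of the model it intends to load and defer to vLLM when there is not enough headroom.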
### Custom Modelfiles
James prefers minimal abstractions. When creating specialized tools, use a `Modelfile`:
```dockerfile
FROM llama3
SYSTEM "You are a specialized shell script assistant. Output only code."
PARAMETER temperature 0.2
```
## Troubleshooting
### Service Connectivity
If `curl http://192.168.86.48:11434` fails:
1. SSH into Spark: `ssh spark`
2. Check service: `systemctl status ollama`
3. Check listening port: `netstat -tulpn | grep 11434`
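4. If it only shows `127.0.0.1`, the `OLLAMA_HOST` environment variable is missing from the systemd config. Add it to the override file, then run `sudo systemctl daemon-reload` and `sudo systemctl restart ollama`.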
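
The loopback check in steps 3 and 4 can be automated by parsing the `netstat -tulpn` output. The sketch below assumes netstat's usual column layout, where the local address is the fourth field; `listen_addresses` and `loopback_only` are hypothetical helper names.

```python
def listen_addresses(netstat_output: str, port: int = 11434) -> set[str]:
    """Collect local addresses bound to `port` from `netstat -tulpn` output."""
    suffix = f":{port}"
    addresses = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # The local address is the fourth column, e.g. "127.0.0.1:11434".
        if len(fields) >= 4 and fields[3].endswith(suffix):
            addresses.add(fields[3][: -len(suffix)])
    return addresses


def loopback_only(netstat_output: str, port: int = 11434) -> bool:
    """True when the port is bound, but only on loopback addresses."""
    addresses = listen_addresses(netstat_output, port)
    return bool(addresses) and addresses <= {"127.0.0.1", "::1"}
```

If `loopback_only(...)` returns `True`, the service is up but unreachable from other machines, which points at the `OLLAMA_HOST` setting. An empty result means nothing is listening on the port at all.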
### GPU Not Detected
If Ollama runs in CPU-only mode (extremely slow):
1. Check drivers on Spark: `nvidia-smi`
2. Ensure the `ollama` user has permission to access the `/dev/nvidia*` devices.
3. Restart the service: `sudo systemctl restart ollama`
### Model Download Failures
Model pulls can be large. If a pull fails, check whether the root partition is full and how much space `/usr/share/ollama/.ollama/models` is using. Ideally, this directory is a symlink to a larger data drive, if Spark has one.
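
A free-space check before a large pull can be sketched with the standard library. The helper name `has_room` and the reserve parameter are hypothetical, and the required size has to come from the caller. Because `shutil.disk_usage` follows symlinks, it should report the data drive when the model directory is a symlink to one.

```python
import shutil

MODEL_DIR = "/usr/share/ollama/.ollama/models"  # model storage path on Spark


def has_room(path: str, needed_bytes: int, reserve_bytes: int = 0) -> bool:
    """True if the filesystem holding `path` can fit `needed_bytes` plus a reserve."""
    free = shutil.disk_usage(path).free
    return free >= needed_bytes + reserve_bytes
```

On Spark, `has_room(MODEL_DIR, 5 * 1024**3, reserve_bytes=10 * 1024**3)` would ask whether a roughly 5 GiB model fits while leaving 10 GiB free; both figures are illustrative.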
## Common Workflows for Agents
1. **Verify Availability:** Run `ollama list` to see if the required model is present.
2. **Provision:** If missing, run `ollama pull <model>`.
3. **Validate:** Perform a test completion via the OpenAI-compatible endpoint.
4. **Cleanup:** If the task is finished and the model is large/niche, suggest `ollama rm` to James to free up disk space on Spark. To release VRAM without deleting the model, use `ollama stop <model>`.
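
The first two steps of the workflow can be expressed as a small decision function. This sketch assumes the model list comes from the native `GET /api/tags` endpoint, whose response holds a `models` array with a `name` per entry, and that a name without a tag resolves to `:latest`. The helper names are hypothetical, and names that include a registry host and port are not handled.

```python
def normalize(name: str) -> str:
    """Append the default `latest` tag when a model name has none."""
    return name if ":" in name else f"{name}:latest"


def model_present(tags_payload: dict, wanted: str) -> bool:
    """Check a `GET /api/tags` response body for the wanted model."""
    installed = {normalize(m["name"]) for m in tags_payload.get("models", [])}
    return normalize(wanted) in installed


def next_step(tags_payload: dict, wanted: str) -> str:
    """Map the availability check to the workflow step an agent should take."""
    return "validate" if model_present(tags_payload, wanted) else "provision"


# Illustrative payload, not real output from Spark.
sample = {"models": [{"name": "llama3:latest"}, {"name": "mistral:7b"}]}
print(next_step(sample, "llama3"))
print(next_step(sample, "mistral"))
```

Comparing normalized names avoids a false "missing" result for `llama3` when the list shows `llama3:latest`, while still treating `mistral` (meaning `mistral:latest`) as absent when only `mistral:7b` is installed.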
## References
- Official Docs: https://ollama.com/docs
- Library: https://ollama.com/library
- GitHub: https://github.com/ollama/ollama