---
name: "Ollama"
description: "Fleet skill: Ollama — software inventory and operations reference"
version: "1.0.0"
author: "skynet"
category: "fleet"
agents: ["claude-code", "codex", "gemini", "kimi"]
tags: ["software-ollama", "fleet", "software"]
---

# Ollama

---
name: software-ollama
description: Manage and interface with Ollama for local LLM inference across the fleet. Use this when you need to pull models, check service status on Spark, or route OpenAI-compatible requests to the local Ollama instance.
metadata:
  author: skynet
  version: 1.0.0
---

# Ollama Software Skill

Ollama is the primary local LLM runner for James's fleet, optimized for ease of use and rapid model swapping. While vLLM handles high-throughput production inference for Qwen, Ollama is used for experimentation, specialized models (Llama 3, Mistral, Phi-3), and vision tasks.

## Fleet Deployment Status

Ollama runs only on the **Spark** node, where it has access to the GPU.

| Machine | Status | Role | Endpoint |
|---------|--------|------|----------|
| **Spark** (192.168.86.48) | **Installed** | Primary Inference Server | `http://localhost:11434` |
| **Dev Workstation** | Client only | Remote CLI access | `OLLAMA_HOST=192.168.86.48` |
| **Dev Server** | Client only | API consumer | `http://192.168.86.48:11434` |
| **Vault** | Not Installed | N/A | N/A |

## Service Configuration (Spark)

Ollama runs as a `systemd` service on Spark.

- **Service Name:** `ollama.service`
- **Binary Path:** `/usr/local/bin/ollama`
- **Model Storage:** `/usr/share/ollama/.ollama/models`
- **Environment Overrides:** `/etc/systemd/system/ollama.service.d/override.conf`

### Key Environment Variables
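To allow fleet-wide access, set these in the override file:
- `OLLAMA_HOST=0.0.0.0:11434` (listen on all interfaces instead of only `127.0.0.1`)
- `OLLAMA_ORIGINS=*` (allow cross-origin requests from browser-based clients, such as web UIs served from other fleet machines; curl, the CLI, and SDK clients are not subject to CORS)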
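A minimal sketch of the drop-in that sets these variables. It is written as a small Python helper so the file can be generated and reviewed before you install it. The helper name `render_override` is hypothetical. The `[Service]` / `Environment=` syntax is standard systemd. After writing the output to the override path above, run `sudo systemctl daemon-reload && sudo systemctl restart ollama`.

```python
def render_override(env: dict[str, str]) -> str:
    """Render a systemd drop-in that sets environment variables for ollama.service.

    Hypothetical helper; the output is meant for
    /etc/systemd/system/ollama.service.d/override.conf.
    """
    lines = ["[Service]"]
    for key, value in env.items():
        # Quote each assignment so values containing spaces stay intact.
        lines.append(f'Environment="{key}={value}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_override({
        "OLLAMA_HOST": "0.0.0.0:11434",
        "OLLAMA_ORIGINS": "*",
    }), end="")
```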

## Core CLI Commands

Always run these via SSH on Spark or set `OLLAMA_HOST` locally.

### Management
```bash
# List currently downloaded models
ollama list

# List currently running/loaded models
ollama ps

# Download a new model from the library
ollama pull llama3:8b

# Delete a model from disk (frees storage, not VRAM)
ollama rm mistral

# Unload a running model to free VRAM without deleting it
ollama stop mistral
```
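The native API exposes the same information over HTTP, and agents can parse it more easily than the CLI tables. Below is a minimal stdlib sketch that assumes Spark's address from the deployment table. `model_names` and `fetch` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://192.168.86.48:11434"  # Spark; use http://localhost:11434 on Spark itself


def model_names(payload: dict) -> list[str]:
    """Extract model names from an /api/tags or /api/ps JSON response."""
    return [m["name"] for m in payload.get("models", [])]


def fetch(path: str, base_url: str = OLLAMA_URL, timeout: float = 5.0) -> dict:
    """GET a native Ollama endpoint and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}{path}", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print("downloaded:", model_names(fetch("/api/tags")))  # like `ollama list`
    print("loaded:", model_names(fetch("/api/ps")))        # like `ollama ps`
```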

### Interaction
```bash
# Start an interactive chat session
ollama run llama3

# Run a single prompt (non-interactive); inline any context the model needs
ollama run llama3 "Summarize these logs: $(journalctl -u ollama --no-pager -n 50)"
```
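The HTTP equivalent of a one-shot `ollama run` is `POST /api/generate`. Setting `"stream": false` makes Ollama return a single JSON object instead of a stream of partial responses. This is a sketch that assumes the Spark address used elsewhere in this file. The helper names are hypothetical.

```python
import json
import urllib.request


def build_generate_request(model: str, prompt: str,
                           base_url: str = "http://192.168.86.48:11434") -> urllib.request.Request:
    """Build a non-streaming POST /api/generate request."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text from the "response" field."""
    with urllib.request.urlopen(build_generate_request(model, prompt), timeout=timeout) as resp:
        return json.load(resp)["response"]


if __name__ == "__main__":
    print(generate("llama3", "Explain what a systemd drop-in file is in one sentence."))
```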

### Customization
```bash
# Create a model from a Modelfile
ollama create my-custom-model -f ./Modelfile
```

## API Integration

Other fleet machines can reach Ollama programmatically through two HTTP APIs, both served on port 11434.

### 1. OpenAI Compatible API (Preferred)
Ollama serves an OpenAI-compatible API under `/v1`, including `/v1/chat/completions`, so the standard `openai` client works against it.

- **Base URL:** `http://192.168.86.48:11434/v1`
- **Model Name:** Use the name exactly as it appears in `ollama list`.

**Example Request (Python):**
```python
import openai

client = openai.OpenAI(
    base_url="http://192.168.86.48:11434/v1",
    api_key="ollama" # Required but ignored
)

response = client.chat.completions.create(
    model="llama3",  # must match a name from `ollama list`
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

### 2. Native Ollama API
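Used for Ollama-specific features such as embeddings and model management.

- **Generate:** `POST /api/generate`
- **Chat:** `POST /api/chat`
- **Embeddings:** `POST /api/embed`
- **List models:** `GET /api/tags`
- **Running models:** `GET /api/ps`
- **Pull:** `POST /api/pull`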
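Embeddings are the most common reason to use the native API. Here is a minimal sketch against `POST /api/embed`, which takes a `model` and an `input` (a string or a list of strings) and returns an `embeddings` array. The model `nomic-embed-text` is an assumption: it is not part of the documented fleet setup and would need to be pulled on Spark first. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://192.168.86.48:11434"  # Spark


def embed(texts: list[str], model: str = "nomic-embed-text", timeout: float = 60.0) -> list[list[float]]:
    """Return one embedding vector per input string via POST /api/embed.

    Assumes the (hypothetical) embedding model was pulled with
    `ollama pull nomic-embed-text`.
    """
    body = json.dumps({"model": model, "input": texts}).encode()
    req = urllib.request.Request(f"{BASE_URL}/api/embed", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_embeddings(json.load(resp), expected=len(texts))


def parse_embeddings(payload: dict, expected: int) -> list[list[float]]:
    """Pull the "embeddings" array out of the response and check there is one per input."""
    vectors = payload["embeddings"]
    if len(vectors) != expected:
        raise ValueError(f"expected {expected} embeddings, got {len(vectors)}")
    return vectors
```

Usage: `vectors = embed(["restart ollama", "check free VRAM"])` returns two vectors, one per input string.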

## Fleet-Specific Patterns

### Remote CLI Usage
To control Ollama on Spark from your local Dev Workstation without SSHing:
```bash
export OLLAMA_HOST=192.168.86.48:11434
ollama list
```
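Python scripts can honor the same `OLLAMA_HOST` variable the CLI reads. The helper `base_url_from_env` is hypothetical. It covers the bare `host` and `host:port` forms used in this document, defaults to `127.0.0.1` (Ollama's default bind address), and passes values that already include a scheme through unchanged. It is a simplification, not a reimplementation of the CLI's own parsing, and it does not handle IPv6 literals.

```python
import os

DEFAULT_PORT = 11434


def base_url_from_env(default: str = "127.0.0.1") -> str:
    """Build an HTTP base URL from OLLAMA_HOST (hypothetical, simplified helper)."""
    value = os.environ.get("OLLAMA_HOST", "").strip() or default
    if "://" in value:
        # Already a full URL such as http://spark:11434; just drop a trailing slash.
        return value.rstrip("/")
    host, sep, port = value.rpartition(":")
    if not sep:  # no colon: the whole value is the host
        host, port = value, ""
    return f"http://{host}:{port or DEFAULT_PORT}"
```

With `OLLAMA_HOST=192.168.86.48:11434` exported as above, `openai.OpenAI(base_url=base_url_from_env() + "/v1", api_key="ollama")` targets Spark.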

### The "Spark Duo" Pattern
On Spark, vLLM and Ollama share the same GPU. vLLM preallocates most of the GPU memory at startup (`--gpu-memory-utilization` defaults to 0.9), so while a large Qwen instance is running, Ollama may fail to load models or hit out-of-memory (OOM) errors.
- **Priority:** vLLM (Production Services) > Ollama (Experimental/CLI).
- **Check VRAM:** Use `nvidia-smi` on Spark before starting an Ollama run.
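The VRAM check can be scripted before an agent loads a model. Below is a sketch that runs on Spark and uses nvidia-smi's query mode. It assumes that nvidia-smi reports a numeric per-GPU `memory.free` value on Spark; a value such as `[N/A]` would make the parse raise `ValueError`. The threshold and helper names are hypothetical.

```python
import subprocess


def parse_free_mib(csv_text: str) -> list[int]:
    """Parse `--query-gpu=memory.free --format=csv,noheader,nounits` output: one MiB value per GPU."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def free_vram_mib() -> list[int]:
    """Query free memory per GPU on the local machine."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_free_mib(out)


if __name__ == "__main__":
    needed_mib = 8 * 1024  # hypothetical: model size from `ollama list` plus headroom
    free = free_vram_mib()
    if max(free, default=0) < needed_mib:
        print(f"Only {free} MiB free; vLLM likely holds the GPU. Check before loading.")
    else:
        print(f"OK to load: {free} MiB free")
```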

### Custom Modelfiles
James prefers minimal abstractions. When creating specialized tools, use a `Modelfile`:
```dockerfile
FROM llama3
SYSTEM "You are a specialized shell script assistant. Output only code."
PARAMETER temperature 0.2
```

## Troubleshooting

### Service Connectivity
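If `curl http://192.168.86.48:11434` fails (a healthy server replies `Ollama is running`):
1. SSH into Spark: `ssh spark`
2. Check the service: `systemctl status ollama`
3. Check the listening address: `sudo ss -tlnp | grep 11434`
4. If it shows only `127.0.0.1:11434`, the `OLLAMA_HOST` override is missing or was not loaded. Add it to the override file, then run `sudo systemctl daemon-reload && sudo systemctl restart ollama`.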
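Agent scripts can run the same first check from any fleet machine. A healthy server answers `GET /` with the text `Ollama is running`. The function name below is hypothetical. Connection errors, timeouts, and non-200 responses are all treated as "down".

```python
import urllib.request


def ollama_is_up(base_url: str = "http://192.168.86.48:11434", timeout: float = 3.0) -> bool:
    """Return True if the server answers GET / with Ollama's "Ollama is running" banner."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200 and b"Ollama is running" in resp.read()
    except OSError:
        # URLError, HTTPError, connection refused and timeouts all derive from OSError.
        return False


if __name__ == "__main__":
    print("up" if ollama_is_up() else "down: follow the checklist above")
```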

### GPU Not Detected
If Ollama runs in CPU-only mode (extremely slow; `ollama ps` shows `100% CPU` in the PROCESSOR column):
1. Check drivers on Spark: `nvidia-smi`
2. Ensure the `ollama` user can access the `/dev/nvidia*` devices (`ls -l /dev/nvidia*`).
3. Restart the service: `sudo systemctl restart ollama`

### Model Download Failures
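Ollama pulls can be large (several gigabytes per model). If the root partition fills up, check the size of the model store with `sudo du -sh /usr/share/ollama/.ollama/models`. If Spark has a larger data drive, move model storage there. You can either make this directory a symlink to the data drive or set `OLLAMA_MODELS` to a directory on it in the systemd override. Either way, the `ollama` user needs write access to the new location.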
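A pre-pull disk check avoids filling the root partition halfway through a download. Below is a stdlib sketch. `shutil.disk_usage` reports on the filesystem that actually holds the path, following a symlink if you set one up. The 5 GiB headroom is a hypothetical policy. Take `model_gib` from the model's page in the library.

```python
import shutil

MODELS_DIR = "/usr/share/ollama/.ollama/models"


def free_gib(path: str) -> float:
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def can_pull(path: str, model_gib: float, headroom_gib: float = 5.0) -> bool:
    """Rough check: is there room for the model plus some headroom? (Hypothetical policy.)"""
    return free_gib(path) >= model_gib + headroom_gib


if __name__ == "__main__":
    print(f"{free_gib(MODELS_DIR):.1f} GiB free for Ollama models")
```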

## Common Workflows for Agents

1. **Verify Availability:** Run `ollama list` to see if the required model is present.
2. **Provision:** If missing, run `ollama pull <model>`.
3. **Validate:** Perform a test completion via the OpenAI-compatible endpoint.
4. **Cleanup:** When the task is finished, unload the model with `ollama stop <model>` to free Spark's VRAM. If the model is large or niche, also suggest `ollama rm` to James to reclaim disk space.
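Steps 1 to 3 can be chained into one helper. This is a sketch under several assumptions. It uses the Spark address, the native `/api/tags` and `/api/pull` endpoints, and the OpenAI-compatible `/v1/chat/completions` endpoint. The pull body uses the `model` field documented for current Ollama versions; older versions used `name`. The name normalization treats an untagged name as `:latest`, which is how `ollama list` displays it. The logic is simplified and ignores registry hosts that carry a port. All function names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://192.168.86.48:11434"  # Spark


def normalize(name: str) -> str:
    """Treat an untagged name as ':latest' ("llama3" -> "llama3:latest")."""
    return name if ":" in name else f"{name}:latest"


def is_installed(wanted: str, installed: list[str]) -> bool:
    """Step 1: compare against the names reported by /api/tags (or `ollama list`)."""
    return normalize(wanted) in {normalize(n) for n in installed}


def call(path: str, payload: Optional[dict] = None, timeout: float = 3600.0) -> dict:
    """GET when no payload is given, otherwise POST JSON; returns the decoded response."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(f"{BASE_URL}{path}", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def ensure_ready(model: str) -> str:
    """Run steps 1-3 and return the model's reply to a tiny test prompt."""
    installed = [m["name"] for m in call("/api/tags").get("models", [])]
    if not is_installed(model, installed):
        # Step 2: a non-streaming pull returns a single status object when done.
        call("/api/pull", {"model": model, "stream": False})
    # Step 3: minimal completion through the OpenAI-compatible endpoint.
    reply = call("/v1/chat/completions", {
        "model": model,
        "messages": [{"role": "user", "content": "Reply with OK."}],
    })
    return reply["choices"][0]["message"]["content"]
```

Step 4 stays manual: removing models is James's call.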

## References
- Official Docs: https://ollama.com/docs
- Library: https://ollama.com/library
- GitHub: https://github.com/ollama/ollama