---
name: "Qdrant Vector Database"
description: "Skill for Qdrant Vector Database — auto-generated from documentation"
version: "1.0.0"
author: "skynet"
category: "infrastructure"
agents: ["claude-code", "codex", "gemini"]
tags: ["qdrant", "infrastructure", "auto-generated"]
---

# Qdrant Vector Database

---
name: Qdrant Vector Database
description: Use this skill when you need to set up, manage, and operate Qdrant vector database for similarity search, embeddings storage, and vector operations. Essential for AI applications requiring semantic search, recommendation systems, and vector-based machine learning workflows.
metadata:
  author: skynet
  version: 1.0.0
category: infrastructure
---

## Installation & Setup

### Docker Installation
```bash
# Pull and run Qdrant
docker pull qdrant/qdrant
docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant
```

### Local Installation
```bash
# Download binary (Linux)
wget https://github.com/qdrant/qdrant/releases/latest/download/qdrant-x86_64-unknown-linux-gnu.tar.gz
tar -xzf qdrant-x86_64-unknown-linux-gnu.tar.gz
./qdrant

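# Or build from source (requires a Rust toolchain)
git clone https://github.com/qdrant/qdrant.git
cd qdrant && cargo build --release --bin qdrant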
```

### Python Client Setup
```bash
pip install qdrant-client
```

## Collection Management

### Create Collection
```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient("localhost", port=6333)

# Create collection with vector configuration
client.create_collection(
    collection_name="my_collection",
    vectors_config=VectorParams(size=100, distance=Distance.COSINE)
)

# Create collection with multiple vectors
client.create_collection(
    collection_name="multi_vector",
    vectors_config={
        "text": VectorParams(size=384, distance=Distance.COSINE),
        "image": VectorParams(size=512, distance=Distance.EUCLID)
    }
)
```
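
The `distance` argument decides how two vectors are scored against each other. As a rough illustration only, the three metrics used in this file can be written in plain Python as below. The function names are hypothetical and are not part of `qdrant-client`; Qdrant computes these scores server-side.

```python
import math

def dot(a, b):
    """Dot product: a larger value means more similar."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: dot product of the length-normalized vectors."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0

def euclid(a, b):
    """Euclidean distance: a smaller value means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [1.0, 0.0], [2.0, 0.0]
print(cosine(a, b), dot(a, b), euclid(a, b))  # prints: 1.0 2.0 1.0
```

The two vectors point in the same direction, so cosine similarity treats them as identical while the dot product and Euclidean distance still reflect the difference in length.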

### Collection Operations
```python
# List collections
collections = client.get_collections()

# Get collection info
info = client.get_collection("my_collection")

# Delete collection
client.delete_collection("my_collection")

# Update collection parameters
from qdrant_client import models

client.update_collection(
    collection_name="my_collection",
    optimizers_config=models.OptimizersConfigDiff(
        indexing_threshold=10000  # Measured in kilobytes of vector data
    )
)
```

## Vector Operations

### Insert Vectors
```python
import random
import time

from qdrant_client.models import PointStruct

# Single vector insert
client.upsert(
    collection_name="my_collection",
    points=[
        PointStruct(
            id=1,
            vector=[0.1, 0.2, 0.3, ...],  # 100-dimensional vector
            payload={"title": "Document 1", "category": "tech"}
        )
    ]
)

# Batch insert
points = []
for i in range(1000):
    points.append(PointStruct(
        id=i,
        vector=[random.random() for _ in range(100)],
        payload={"doc_id": i, "timestamp": time.time()}
    ))

client.upsert(
    collection_name="my_collection",
    points=points,
    wait=True  # Wait for operation to complete
)
```
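
For larger loads it can help to send points in fixed-size chunks rather than one very large request. The helper below is a minimal sketch of that idea; `iter_batches` and the batch size of 256 are assumptions chosen for illustration, not Qdrant recommendations.

```python
from itertools import islice

def iter_batches(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# 1000 items split into chunks of 256
print([len(batch) for batch in iter_batches(range(1000), 256)])
# prints: [256, 256, 256, 232]
```

Each batch could then be passed to `client.upsert(collection_name="my_collection", points=batch, wait=True)` inside the loop.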

### Search Vectors
```python
# Basic similarity search
# query_points supersedes the deprecated client.search in recent qdrant-client releases
search_result = client.query_points(
    collection_name="my_collection",
    query=[0.1, 0.2, 0.3, ...],
    limit=10
).points

# Search with filters
from qdrant_client.models import Filter, FieldCondition, MatchValue

search_result = client.query_points(
    collection_name="my_collection",
    query=[0.1, 0.2, 0.3, ...],
    query_filter=Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchValue(value="tech")
            )
        ]
    ),
    limit=10,
    with_payload=True,
    with_vectors=True
).points
```

## Advanced Filtering

### Complex Filter Conditions
```python
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

# Range and multiple conditions
complex_filter = Filter(
    must=[
        FieldCondition(key="price", range=Range(gte=10.0, lt=100.0)),
        FieldCondition(key="category", match=MatchValue(value="electronics"))
    ],
    must_not=[
        FieldCondition(key="status", match=MatchValue(value="discontinued"))
    ]
)

# Search with complex filter (query_vector is your query embedding)
results = client.query_points(
    collection_name="products",
    query=query_vector,
    query_filter=complex_filter,
    limit=20
).points
```
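
To make the `must` / `must_not` semantics concrete, here is a small local model of the filter above applied to plain payload dicts. It is a simplified sketch: the helper names are hypothetical, and it ignores cases Qdrant handles server-side, such as array-valued or nested payload fields.

```python
def match_value(key, value):
    """Condition: payload[key] equals value."""
    return lambda payload: payload.get(key) == value

def in_range(key, gte=None, lt=None):
    """Condition: gte <= payload[key] < lt, with either bound optional."""
    def condition(payload):
        value = payload.get(key)
        if value is None:
            return False
        if gte is not None and value < gte:
            return False
        if lt is not None and value >= lt:
            return False
        return True
    return condition

def matches(payload, must=(), must_not=()):
    """All must conditions hold and no must_not condition holds."""
    return (all(cond(payload) for cond in must)
            and not any(cond(payload) for cond in must_not))

must = [in_range("price", gte=10.0, lt=100.0), match_value("category", "electronics")]
must_not = [match_value("status", "discontinued")]

product = {"price": 49.99, "category": "electronics", "status": "active"}
print(matches(product, must, must_not))  # prints: True
```

Note that `lt=100.0` is exclusive, so a product priced at exactly 100.0 would not match.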

### Geo-filtering
```python
from qdrant_client.models import FieldCondition, Filter, GeoPoint, GeoRadius

geo_filter = Filter(
    must=[
        FieldCondition(
            key="location",
            geo_radius=GeoRadius(
                center=GeoPoint(lon=13.4050, lat=52.5200),  # Berlin
                radius=1000.0  # Radius in meters (1 km)
            )
        )
    ]
)
```
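
A quick way to sanity-check a radius before running a geo query is to compute the great-circle distance locally. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km; both the helper names and that constant are assumptions for illustration, and Qdrant's internal distance computation may differ slightly.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # Mean Earth radius, an approximation

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def within_radius(point, center, radius_m):
    """True if point lies within radius_m meters of center."""
    distance = haversine_m(point["lon"], point["lat"], center["lon"], center["lat"])
    return distance <= radius_m

berlin = {"lon": 13.4050, "lat": 52.5200}
nearby = {"lon": 13.3777, "lat": 52.5163}  # Roughly 1.9 km west of the center

print(within_radius(nearby, berlin, 1000.0))  # prints: False
print(within_radius(nearby, berlin, 2000.0))  # prints: True
```

The payload shape `{"lon": ..., "lat": ...}` matches how geo points are stored for the `location` field used in the filter.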

## Performance Optimization

### Indexing Configuration
```python
from qdrant_client.models import TextIndexParams, PayloadSchemaType

# Create payload index
client.create_payload_index(
    collection_name="my_collection",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD
)

# Create text index for full-text search
client.create_payload_index(
    collection_name="my_collection",
    field_name="description",
    field_schema=TextIndexParams(
        type="text",
        tokenizer="word",
        min_token_len=2,
        max_token_len=20
    )
)
```

### HNSW Parameters
```python
from qdrant_client.models import HnswConfigDiff

# Update HNSW configuration
client.update_collection(
    collection_name="my_collection",
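    hnsw_config=HnswConfigDiff(
        m=16,  # Links per graph node; higher improves recall but uses more memory
        ef_construct=100,  # Candidate list size while building the index
        full_scan_threshold=10000  # Size in kilobytes below which a full scan is preferred
    )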
)
```

## Decision Tree: Collection Setup Strategy

```
Collection Setup Decision Tree:
│
├── Vector Size < 100 dimensions?
│   ├── Yes: Use Distance.COSINE, m=16, ef_construct=100
│   └── No: Vector Size > 1000?
│       ├── Yes: Use Distance.DOT, m=32, ef_construct=200
│       └── No: Use Distance.EUCLID, m=24, ef_construct=150
│
├── Expected Collection Size?
│   ├── < 10K vectors: indexing_threshold=1000
│   ├── 10K-1M vectors: indexing_threshold=10000
│   └── > 1M vectors: indexing_threshold=50000
│
└── Query Pattern?
    ├── Frequent filtering: Create payload indexes
    ├── Geo queries: Use geo fields + indexes
    └── Text search: Create text indexes
```
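
The first two branches of the tree can be encoded as a small lookup function, sketched below. `suggest_settings` is a hypothetical helper, and the values are simply the tree's starting points rather than tuned results; in practice the distance metric usually needs to match the one the embedding model was trained with.

```python
def suggest_settings(vector_size, expected_points):
    """Return the decision tree's starting values for a new collection."""
    if vector_size < 100:
        distance, m, ef_construct = "Cosine", 16, 100
    elif vector_size > 1000:
        distance, m, ef_construct = "Dot", 32, 200
    else:
        distance, m, ef_construct = "Euclid", 24, 150

    if expected_points < 10_000:
        indexing_threshold = 1000
    elif expected_points <= 1_000_000:
        indexing_threshold = 10000
    else:
        indexing_threshold = 50000

    return {
        "distance": distance,
        "m": m,
        "ef_construct": ef_construct,
        "indexing_threshold": indexing_threshold,
    }

print(suggest_settings(vector_size=384, expected_points=50_000))
# prints: {'distance': 'Euclid', 'm': 24, 'ef_construct': 150, 'indexing_threshold': 10000}
```

The returned values map onto `VectorParams(distance=...)`, `HnswConfigDiff(m=..., ef_construct=...)`, and `OptimizersConfigDiff(indexing_threshold=...)`.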

## Backup and Recovery

### Create Snapshots
```bash
# Create collection snapshot
curl -X POST "http://localhost:6333/collections/my_collection/snapshots"

# Create a full storage snapshot (all collections on this node)
curl -X POST "http://localhost:6333/snapshots"
```

```python
# Python client snapshot
snapshot_info = client.create_snapshot(collection_name="my_collection")
print(f"Snapshot created: {snapshot_info.name}")

# Download the snapshot through the REST endpoint
from urllib.request import urlretrieve

urlretrieve(
    f"http://localhost:6333/collections/my_collection/snapshots/{snapshot_info.name}",
    "./backup.snapshot"
)
```

### Restore from Snapshot
```bash
# Restore collection from an uploaded snapshot file
curl -X POST "http://localhost:6333/collections/my_collection/snapshots/upload" \
    -H "Content-Type: multipart/form-data" \
    -F "snapshot=@backup.snapshot"
```

## Monitoring and Health Checks

### Health Endpoints
```bash
# Check node health (/livez and /readyz are also available)
curl http://localhost:6333/healthz

# Get cluster info
curl http://localhost:6333/cluster

# Check collection info
curl http://localhost:6333/collections/my_collection
```
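
When scripting against a freshly started container, it is often useful to wait for the readiness endpoint before sending requests. The following is a minimal standard-library sketch; `http_status`, `wait_until_ready`, and the retry defaults are hypothetical names and values chosen for illustration.

```python
import time
import urllib.error
import urllib.request

def http_status(url, timeout=2.0):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError):
        return None

def wait_until_ready(url="http://localhost:6333/readyz", attempts=10,
                     delay=1.0, probe=http_status):
    """Poll url until it answers 200; return False if attempts run out."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A script could call `wait_until_ready()` once after `docker run` and exit early if it returns `False`. The `probe` parameter exists so the polling logic can be exercised without a running server.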

### Performance Metrics
```python
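import json
from urllib.request import urlopen

# Cluster info from the REST /cluster endpoint (peers appear only in distributed mode)
with urlopen("http://localhost:6333/cluster") as resp:
    cluster_info = json.load(resp)["result"]
print(f"Peer count: {len(cluster_info.get('peers', {}))}")

# Collection statistics
collection_info = client.get_collection("my_collection")
print(f"Points count: {collection_info.points_count}")
print(f"Indexed vectors: {collection_info.indexed_vectors_count}")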
```

## Troubleshooting

### Common Errors and Fixes

**Error: "Collection already exists"**
```python
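# Check if collection exists before creating
from qdrant_client.models import Distance, VectorParams

if client.collection_exists("my_collection"):
    print("Collection exists")
else:
    client.create_collection(
        collection_name="my_collection",
        vectors_config=VectorParams(size=100, distance=Distance.COSINE),
    )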
```

**Error: "Vector dimension mismatch"**
```python
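# Verify vector dimensions match collection config
# (for named-vector collections, .vectors is a dict keyed by vector name)
collection_info = client.get_collection("my_collection")
expected_size = collection_info.config.params.vectors.size
# `vector` is the embedding you are about to upsert or query with
assert len(vector) == expected_size, f"Expected {expected_size} dimensions, got {len(vector)}"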
```
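
For collections created with multiple named vectors, `config.params.vectors` is a dict rather than a single object, so the lookup needs one extra step. The sketch below covers both shapes; `expected_size` and `check_dimensions` are hypothetical helpers, and `SimpleNamespace` objects stand in for the client's `VectorParams` so the example runs without a server.

```python
from types import SimpleNamespace

def expected_size(vectors_config, vector_name=None):
    """Return the configured dimension for a single or named vector config."""
    if isinstance(vectors_config, dict):
        if vector_name not in vectors_config:
            raise KeyError(f"Unknown vector name: {vector_name!r}")
        return vectors_config[vector_name].size
    return vectors_config.size

def check_dimensions(vector, vectors_config, vector_name=None):
    """Raise ValueError if the vector length does not match the collection config."""
    size = expected_size(vectors_config, vector_name)
    if len(vector) != size:
        raise ValueError(f"Expected {size} dimensions, got {len(vector)}")

# Stand-ins for collection_info.config.params.vectors
single = SimpleNamespace(size=100)
named = {"text": SimpleNamespace(size=384), "image": SimpleNamespace(size=512)}

check_dimensions([0.0] * 100, single)          # passes silently
check_dimensions([0.0] * 512, named, "image")  # passes silently
```

With a live client, `collection_info.config.params.vectors` would be passed as the `vectors_config` argument.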

**Error: "Service unavailable"**
```bash
# Check Qdrant service status
docker ps | grep qdrant
# Restart if needed (replace qdrant_container with the name or ID from docker ps)
docker restart qdrant_container

# Check logs
docker logs qdrant_container
```

**Performance Issues:**
```python
from qdrant_client import models

# Check if indexing is complete
collection_info = client.get_collection("my_collection")
total = collection_info.points_count or 0
indexed = collection_info.indexed_vectors_count or 0
# Guard against empty collections to avoid division by zero
if total and indexed / total < 0.9:
    print("Indexing in progress, performance may be affected")

# Optimize collection
client.update_collection(
    collection_name="my_collection",
    optimizers_config=models.OptimizersConfigDiff(
        deleted_threshold=0.2,
        vacuum_min_vector_number=1000
    )
)
)
```

**Memory Issues:**
```bash
# Increase memory limits in Docker
docker run -p 6333:6333 -m 4g qdrant/qdrant

# Or adjust HNSW parameters to reduce memory usage
```
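
Before picking a memory limit, a back-of-the-envelope estimate of the vector data size can help. The sketch below uses a commonly cited rule of thumb (vectors × dimensions × 4 bytes for float32, times an overhead factor of 1.5); the function name and the 1.5 factor are assumptions here, and actual usage depends on payloads, indexes, and quantization.

```python
def estimate_ram_bytes(num_vectors, dimensions, bytes_per_value=4, overhead=1.5):
    """Rough RAM estimate for storing float32 vectors fully in memory."""
    return int(num_vectors * dimensions * bytes_per_value * overhead)

estimate = estimate_ram_bytes(1_000_000, 1024)
print(f"{estimate / 1024**3:.2f} GiB")  # prints: 5.72 GiB
```

If the estimate is above the container limit, the options mentioned above (a higher `-m` value or smaller HNSW parameters) are the first things to try.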

### Debug Commands
```bash
# Check storage usage
du -sh ./qdrant_storage/

# Follow Qdrant logs (written to stdout by default; use the container name from docker ps)
docker logs -f qdrant_container

# Check open files (if hitting limits)
lsof -p $(pgrep qdrant) | wc -l
```