PyTorch

Skill for PyTorch — auto-generated from documentation

dev

by skynet, v1.0.0

pytorch, dev, auto-generated

Compatible Agents

claude-code, codex, gemini

Instruction

---
name: PyTorch Deep Learning Framework
description: Use when building neural networks, training models, working with tensors, or implementing deep learning solutions. Essential for ML engineers, researchers, and data scientists working on computer vision, NLP, or custom AI models.
metadata:
  author: skynet
  version: 1.0.0
  category: dev
---

# PyTorch Deep Learning Framework

## Installation & Setup

```bash
# CPU version
pip install torch torchvision torchaudio

# CUDA 11.8 (check your CUDA version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Verify installation
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"

# Install additional tools
pip install tensorboard matplotlib scikit-learn
```

## Basic Tensor Operations

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Create tensors
x = torch.tensor([1, 2, 3])
y = torch.zeros(3, 4)
z = torch.randn(2, 3, requires_grad=True)

# Move to GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = x.to(device)
y = y.to(device)

# Basic operations
# x is an integer tensor, so cast to float to match y: (1, 3) @ (3, 4) -> (1, 4)
result = torch.matmul(x.unsqueeze(0).float(), y)
loss = torch.mean(z**2)
loss.backward()  # Compute gradients
print(z.grad)    # Access gradients
```

## Neural Network Definition

```python
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        return x

# Initialize model
model = SimpleNet(784, 128, 10).to(device)
print(f"Parameters: {sum(p.numel() for p in model.parameters())}")
```

## Training Loop Template

```python
# Setup
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
num_epochs = 10  # example value; adjust for your task

# Training loop
model.train()
for epoch in range(num_epochs):
    running_loss = 0.0
    for batch_idx, (data, targets) in enumerate(train_loader):
        data, targets = data.to(device), targets.to(device)

        # Forward pass
        optimizer.zero_grad()
        outputs = model(data)
        loss = criterion(outputs, targets)

        # Backward pass
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

        if batch_idx % 100 == 0:
            print(f'Epoch {epoch}, Batch {batch_idx}, Loss: {loss.item():.4f}')

    scheduler.step()  # Step the LR schedule once per epoch
    print(f'Epoch {epoch} finished. Avg Loss: {running_loss/len(train_loader):.4f}')
```

## Data Loading & Preprocessing

```python
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms, datasets

# Custom dataset
class CustomDataset(Dataset):
    def __init__(self, data, labels, transform=None):
        self.data = data
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = self.data[idx]
        if self.transform:
            sample = self.transform(sample)
        return sample, self.labels[idx]

# Data transforms (Resize and ToTensor expect PIL images as input)
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# DataLoader (`data` and `labels` are placeholders for your own samples and targets)
dataset = CustomDataset(data, labels, transform=transform)
train_loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)
```

## Model Evaluation & Inference

```python
# Evaluation mode
model.eval()
correct = 0
total = 0

with torch.no_grad():
    for data, targets in test_loader:
        data, targets = data.to(device), targets.to(device)
        outputs = model(data)
        _, predicted = torch.max(outputs, 1)
        total += targets.size(0)
        correct += (predicted == targets).sum().item()

accuracy = 100 * correct / total
print(f'Test Accuracy: {accuracy:.2f}%')

# Single prediction
model.eval()
with torch.no_grad():
    sample_input = torch.randn(1, 784).to(device)  # 784 matches SimpleNet's input_size
    prediction = model(sample_input)
    probabilities = torch.softmax(prediction, dim=1)
    predicted_class = torch.argmax(probabilities, dim=1)
```

## Save/Load Models

```python
# Save model
torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'epoch': epoch,
    'loss': loss.item()  # store a plain float rather than a tensor
}, 'checkpoint.pth')

# Load model
checkpoint = torch.load('checkpoint.pth', map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch']

# Save just the model
torch.save(model.state_dict(), 'model_weights.pth')
model.load_state_dict(torch.load('model_weights.pth', map_location=device))
```

## GPU Management

```bash
# Check GPU usage
nvidia-smi

# Monitor GPU in real-time
watch -n 1 nvidia-smi
```

```python
# GPU memory management
torch.cuda.empty_cache()  # Release unused cached memory back to the GPU
print(f"GPU memory allocated: {torch.cuda.memory_allocated()/1024**2:.2f} MB")
print(f"GPU memory reserved: {torch.cuda.memory_reserved()/1024**2:.2f} MB")

# Multiple GPU training
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
```

## Common Workflows Decision Tree

```
Need to build a model?
├── Image classification? → Use torchvision.models (ResNet, VGG)
├── Text processing? → Use torch.nn.LSTM/GRU or transformers
├── Custom architecture? → Inherit from nn.Module
└── Transfer learning? → Load pretrained, freeze layers

Training issues?
├── Loss not decreasing? → Check learning rate, data preprocessing
├── Overfitting? → Add dropout, regularization, reduce model size
├── Slow training? → Increase batch size, use DataLoader num_workers
└── Memory errors? → Reduce batch size, use gradient accumulation

Model deployment?
├── Production serving? → torch.jit.script() or ONNX export
├── Mobile deployment? → torch.utils.mobile_optimizer
└── Edge deployment? → Quantization with torch.quantization
```

## Troubleshooting

**RuntimeError: CUDA out of memory**

```python
# Reduce batch size
batch_size = 16  # instead of 64

# Use gradient accumulation
accumulation_steps = 4
optimizer.zero_grad()
for i, (data, target) in enumerate(train_loader):
    data, target = data.to(device), target.to(device)
    # Scale the loss so the accumulated gradient matches one large batch
    loss = criterion(model(data), target) / accumulation_steps
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

**RuntimeError: Expected all tensors to be on the same device**

```python
# Ensure all tensors are on same device
data = data.to(device)
targets = targets.to(device)
model = model.to(device)
```

**Loss becomes NaN**

```python
# Clip gradients (call after loss.backward() and before optimizer.step())
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# Lower learning rate
optimizer = optim.Adam(model.parameters(), lr=1e-4)  # instead of 1e-2
```

**Slow data loading**

```python
# Increase num_workers
train_loader = DataLoader(dataset, batch_size=32, num_workers=8, pin_memory=True)
```

**Model not learning**

```python
# Debug mode - check gradients (run after loss.backward() in your training script)
for name, param in model.named_parameters():
    if param.grad is not None:
        print(f'{name}: {param.grad.abs().mean():.6f}')
```

## Performance Optimization

```python
# Mixed precision training (torch.amp.GradScaler needs PyTorch 2.3+;
# older versions use torch.cuda.amp.GradScaler and torch.cuda.amp.autocast)
scaler = torch.amp.GradScaler('cuda')

for data, target in train_loader:
    data, target = data.to(device), target.to(device)
    optimizer.zero_grad()

    with torch.autocast(device_type='cuda'):
        output = model(data)
        loss = criterion(output, target)

    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

# Compile model (PyTorch 2.0+)
model = torch.compile(model)
```
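
## Transfer Learning Sketch

The decision tree above suggests "Load pretrained, freeze layers" for transfer learning without showing what freezing looks like. The following is a minimal sketch of one way to do it, not a definitive recipe. The `backbone` here is a hypothetical stand-in built from small `nn.Linear` layers so the example runs offline; in practice it would more likely come from `torchvision.models` or a saved checkpoint. The layer sizes, learning rate, and random data are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained backbone (assumption: a real one
# would be loaded from torchvision.models or a checkpoint).
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
head = nn.Linear(32, 4)  # new task-specific classifier, trained from scratch
model = nn.Sequential(backbone, head)

# Freeze the backbone so its weights receive no gradient updates
for param in backbone.parameters():
    param.requires_grad = False

# Hand only the trainable parameters (the head) to the optimizer
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.Adam(trainable, lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Snapshot weights to confirm which ones change after a step
backbone_before = backbone[0].weight.detach().clone()
head_before = head.weight.detach().clone()

# One training step on random data (illustrative only)
inputs = torch.randn(8, 16)
targets = torch.randint(0, 4, (8,))
optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()

backbone_unchanged = torch.equal(backbone_before, backbone[0].weight)
head_changed = not torch.equal(head_before, head.weight)
print(f"Backbone unchanged: {backbone_unchanged}, head updated: {head_changed}")
```

Filtering on `requires_grad` before building the optimizer keeps frozen weights out of the optimizer state. To fine-tune the backbone later, set `requires_grad = True` on the layers to unfreeze and add them to the optimizer, typically with a lower learning rate.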

Install

curl -s https://skills.skynet.ceo/api/skills/pytorch/skill.md