GCP Cloud Monitoring

Skill for GCP Cloud Monitoring — auto-generated from documentation

infrastructure
by skynet, v1.0.0
gcp-monitoring, infrastructure, auto-generated

Compatible Agents

claude-code, codex, gemini

Instruction

---
name: GCP Cloud Monitoring
description: Use this skill when you need to set up monitoring, alerting, and observability for GCP resources. Essential for tracking performance metrics, creating dashboards, configuring alerts, and troubleshooting system health issues across Google Cloud services.
category: infrastructure
metadata:
  author: skynet
  version: 1.0.0
---

# GCP Cloud Monitoring

## Overview

Cloud Monitoring provides visibility into the performance, uptime, and health of your applications and infrastructure on Google Cloud.

## Prerequisites

```bash
# Install and configure gcloud CLI
gcloud auth login
gcloud config set project YOUR_PROJECT_ID

# Enable Cloud Monitoring API
gcloud services enable monitoring.googleapis.com
```

## Core Commands

Command availability varies by gcloud version and release track (GA, beta, alpha). Confirm each subcommand with `gcloud monitoring --help` before scripting against it.

### Metric Operations

```bash
# List available metrics
gcloud monitoring metrics list --filter="metric.type:compute"

# Get metric descriptor
gcloud monitoring metrics describe compute.googleapis.com/instance/cpu/utilization

# Create custom metric descriptor
gcloud monitoring metrics create \
  --metric-type="custom.googleapis.com/my_app/requests" \
  --metric-kind=GAUGE \
  --value-type=DOUBLE \
  --description="Application request count"
```

### Time Series Data

```bash
# Query time series data
gcloud monitoring time-series list \
  --filter='metric.type="compute.googleapis.com/instance/cpu/utilization"' \
  --interval-start-time="2024-01-01T00:00:00Z" \
  --interval-end-time="2024-01-01T01:00:00Z"

# Write custom metric data point
gcloud monitoring time-series create \
  --time-series-data-from-file=timeseries.json
```

### Alert Policies

```bash
# List alert policies
gcloud alpha monitoring policies list

# Create alert policy
gcloud alpha monitoring policies create \
  --policy-from-file=alert-policy.json

# Delete alert policy
gcloud alpha monitoring policies delete POLICY_ID
```

## Configuration Files

### Alert Policy JSON

A threshold condition's filter must name both the metric type and the monitored resource type, and the policy must set a `combiner` even when it has a single condition.

```json
{
  "displayName": "High CPU Usage",
  "combiner": "OR",
  "conditions": [{
    "displayName": "CPU usage above 80%",
    "conditionThreshold": {
      "filter": "metric.type=\"compute.googleapis.com/instance/cpu/utilization\" AND resource.type=\"gce_instance\"",
      "comparison": "COMPARISON_GREATER_THAN",
      "thresholdValue": 0.8,
      "duration": "300s",
      "aggregations": [{
        "alignmentPeriod": "60s",
        "perSeriesAligner": "ALIGN_MEAN"
      }]
    }
  }],
  "notificationChannels": ["projects/PROJECT_ID/notificationChannels/CHANNEL_ID"],
  "alertStrategy": {
    "autoClose": "1800s"
  }
}
```

### Custom Metric Time Series

```json
{
  "timeSeries": [{
    "metric": {
      "type": "custom.googleapis.com/my_app/requests",
      "labels": {
        "environment": "production"
      }
    },
    "resource": {
      "type": "global",
      "labels": {
        "project_id": "YOUR_PROJECT_ID"
      }
    },
    "points": [{
      "interval": {
        "endTime": "2024-01-01T12:00:00Z"
      },
      "value": {
        "doubleValue": 42.5
      }
    }]
  }]
}
```

## Common Workflows

### Setting Up Basic Monitoring

```bash
# 1. Enable APIs
gcloud services enable monitoring.googleapis.com
gcloud services enable logging.googleapis.com

# 2. Create notification channel (email)
gcloud alpha monitoring channels create \
  --display-name="DevOps Team" \
  --type=email \
  --channel-labels=email_address=devops@company.com

# 3. List notification channels to get ID
gcloud alpha monitoring channels list

# 4. Create uptime check (display name is positional; the host is a resource label)
gcloud monitoring uptime create "Website Health Check" \
  --resource-type=uptime-url \
  --resource-labels=host=example.com,project_id=YOUR_PROJECT_ID \
  --protocol=https \
  --path=/health \
  --port=443
```

### Dashboard Creation

```bash
# Create dashboard from JSON
gcloud monitoring dashboards create --config-from-file=dashboard.json

# List dashboards
gcloud monitoring dashboards list

# Export dashboard
gcloud monitoring dashboards describe DASHBOARD_ID \
  --format="export" > dashboard-backup.yaml
```

### Log-based Metrics

```bash
# Create log-based metric
gcloud logging metrics create error_count \
  --description="Count of application errors" \
  --log-filter='severity>=ERROR AND resource.type="gce_instance"'

# List log metrics
gcloud logging metrics list

# Update log metric
gcloud logging metrics update error_count \
  --description="Updated error count metric"
```

## Decision Trees

### Choosing Metric Types

```
Need to track a value?
├─ Value accumulates over time? → CUMULATIVE
├─ Value represents current state? → GAUGE
└─ Need to distribute values in buckets? → DISTRIBUTION (a value type, paired with one of the kinds above)
```

### Alert Policy Strategy

```
What triggers the alert?
├─ Metric threshold exceeded?
│  ├─ Single resource → Condition Threshold
│  └─ Multiple resources → Condition Threshold with grouping
├─ Resource becomes unavailable? → Uptime Check
└─ Log pattern detected? → Log-based metric + threshold
```

### Notification Channel Selection

```
How urgent is the alert?
├─ Critical (immediate) → SMS + PagerDuty
├─ Important (< 1 hour) → Email + Slack
└─ Informational → Email only
```

## Monitoring Best Practices

### Resource Labeling

```bash
# Add monitoring labels to compute instances
gcloud compute instances add-labels INSTANCE_NAME \
  --labels=environment=prod,team=backend,service=api

# Create alerts with label filters
# User-defined instance labels are exposed as metadata.user_labels, not resource labels
# Filter: metadata.user_labels.environment="prod" AND metadata.user_labels.service="api"
```

### Custom Metrics Implementation

```python
# Python example for writing custom metrics (google-cloud-monitoring 2.x)
import time

from google.cloud import monitoring_v3

PROJECT_ID = "YOUR_PROJECT_ID"  # replace with your project ID

client = monitoring_v3.MetricServiceClient()
project_name = f"projects/{PROJECT_ID}"

# Describe the custom metric and the resource it is attached to
series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/my_app/active_users"
series.resource.type = "global"
series.resource.labels["project_id"] = PROJECT_ID

# A GAUGE point needs only an end time; build it from the current clock
now = time.time()
interval = monitoring_v3.TimeInterval({"end_time": {"seconds": int(now)}})
point = monitoring_v3.Point({"interval": interval, "value": {"int64_value": 1234}})
series.points = [point]

client.create_time_series(name=project_name, time_series=[series])
```

## Advanced Configuration

### Multi-Condition Alerts

```json
{
  "displayName": "Complex Service Health Alert",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "High Error Rate",
      "conditionThreshold": {
        "filter": "metric.type=\"logging.googleapis.com/user/error_count\"",
        "comparison": "COMPARISON_GREATER_THAN",
        "thresholdValue": 10
      }
    },
    {
      "displayName": "Low Request Rate",
      "conditionThreshold": {
        "filter": "metric.type=\"loadbalancing.googleapis.com/https/request_count\"",
        "comparison": "COMPARISON_LESS_THAN",
        "thresholdValue": 100
      }
    }
  ]
}
```

### MQL (Monitoring Query Language)

```bash
# MQL is sent to the timeSeries:query API method; it is not a --filter expression
curl -s -X POST "https://monitoring.googleapis.com/v3/projects/YOUR_PROJECT_ID/timeSeries:query" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" \
  -d @- <<'EOF'
{"query": "fetch gce_instance | metric 'compute.googleapis.com/instance/cpu/utilization' | group_by 1m, [value_utilization_mean: mean(value.utilization)] | every 1m"}
EOF
```

## Troubleshooting

### Common Errors

**Error: "Permission denied" when creating alerts**

```bash
# Solution: Add required IAM roles
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="user:user@example.com" \
  --role="roles/monitoring.alertPolicyEditor"
```

**Error: "Metric not found" for custom metrics**

```bash
# Check if metric descriptor exists
gcloud monitoring metrics list --filter="metric.type:custom.googleapis.com"

# Verify metric type spelling and wait up to 2 minutes for propagation
```

**Error: "Invalid time series" when writing metrics**

```bash
# Ensure timestamp is not older than 25 hours
# Verify resource type and labels match metric descriptor
# Check that metric kind matches the operation (GAUGE vs CUMULATIVE)
```

**Alert not firing despite threshold being exceeded**

```bash
# Debug checklist:
# 1. Verify notification channels are valid
gcloud alpha monitoring channels list

# 2. Check alert policy status
gcloud alpha monitoring policies list --filter="displayName:YOUR_ALERT_NAME"

# 3. Verify metric data exists in time range
gcloud monitoring time-series list --filter="YOUR_FILTER" --interval-start-time="1h ago"

# 4. Check alert policy condition duration vs actual breach duration
```

### Debugging Queries

```bash
# Test metric filters (instance_name is a metric label on Compute Engine metrics)
gcloud monitoring time-series list \
  --filter='metric.type="compute.googleapis.com/instance/cpu/utilization" AND metric.label.instance_name="my-instance"' \
  --interval-start-time="1h ago"

# Validate custom metric ingestion
gcloud logging read 'jsonPayload.message:"time series"' \
  --limit=10 \
  --format="table(timestamp, jsonPayload.message)"
```

### Performance Optimization

```bash
# Use appropriate aggregation periods
# 1m for real-time alerts
# 5m for standard monitoring
# 1h for trend analysis

# Limit metric cardinality
# Avoid high-cardinality labels (user IDs, timestamps)
# Use resource labels instead of metric labels when possible
```

---
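The Custom Metric Time Series payload above hard-codes `endTime`, which falls outside the 25-hour write window noted under Troubleshooting once that date has passed. The following is a minimal sketch, not part of the skill itself, of generating the same payload shape with a current timestamp and checking its age before writing. The helper names `build_time_series` and `is_writable` are hypothetical, and rejecting any future timestamp is a conservative assumption rather than a documented limit.

```python
import json
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=25)  # write window stated in the Troubleshooting section
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_time_series(project_id, metric_type, value, labels=None, end_time=None):
    """Return a payload shaped like the Custom Metric Time Series example."""
    end_time = end_time or datetime.now(timezone.utc)
    return {
        "timeSeries": [{
            "metric": {"type": metric_type, "labels": dict(labels or {})},
            "resource": {"type": "global", "labels": {"project_id": project_id}},
            "points": [{
                "interval": {"endTime": end_time.strftime(TS_FORMAT)},
                "value": {"doubleValue": float(value)},
            }],
        }]
    }


def is_writable(payload, now=None):
    """True if every point is at most 25 hours old and not in the future (assumption)."""
    now = now or datetime.now(timezone.utc)
    for series in payload["timeSeries"]:
        for point in series["points"]:
            end = datetime.strptime(point["interval"]["endTime"], TS_FORMAT)
            end = end.replace(tzinfo=timezone.utc)
            if end > now or now - end > MAX_AGE:
                return False
    return True


if __name__ == "__main__":
    payload = build_time_series(
        "YOUR_PROJECT_ID",
        "custom.googleapis.com/my_app/requests",
        42.5,
        {"environment": "production"},
    )
    # Redirect this output to timeseries.json to reuse it with the commands above
    print(json.dumps(payload, indent=2))
```

Building the timestamp at write time keeps the payload inside the window without editing the file by hand; `is_writable` can also be run against a saved file to explain an "Invalid time series" rejection.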

Install

curl -s https://skills.skynet.ceo/api/skills/gcp-monitoring/skill.md