🚨

Incident Response Automation

DevOps · Advanced · 1.5 hours

Detect production issues and coordinate incident response

Prerequisites

OpenClaw installed and running
At least one monitoring tool (Sentry, Datadog, or Pingdom)
Slack workspace
On-call rotation defined

Required Skills

sentry-debugger

openclaw install sentry-debugger

slack-digest

openclaw install slack-digest

docker-manager

openclaw install docker-manager

Installation Steps

Install required skills

Install the Sentry debugger, Slack digest, and Docker manager skills.

openclaw install sentry-debugger slack-digest docker-manager

Configure alert sources

In each monitoring tool you use (Sentry, Datadog, or Pingdom), add a webhook that points at the /webhooks/alert path on your OpenClaw instance.

Define severity levels

Decide what counts as critical, high, medium, or low, and map each level to an action under severityLevels: page, Slack mention, Slack message, or log only.

Set up on-call rotation

List the on-call engineers and configure the escalation path.

Add the config snippet

Copy the configuration below and customize the alert sources, severity levels, and on-call team.

Configuration

{
  "webhooks": {
    "alert": {
      "url": "/webhooks/alert",
      "sources": ["sentry", "datadog", "pingdom"],
      "actions": [
        "assess-severity",
        "create-incident-channel",
        "notify-on-call",
        "gather-diagnostics",
        "suggest-remediation"
      ]
    }
  },
  "incidentResponse": {
    "onCall": ["alice", "bob"],
    "severityLevels": {
      "critical": "page-immediately",
      "high": "slack-mention",
      "medium": "slack-message",
      "low": "log-only"
    }
  }
}

Add this to your openclaw.json and customize the values for your setup.
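Before relying on the severity mapping, it can help to sanity-check it outside OpenClaw. The following is a minimal sketch, not OpenClaw's own logic. It assumes the four action names in the example above are the only valid ones, and that unknown severities should fall back to log-only. The helper names (KNOWN_ACTIONS, validate_incident_config, action_for) are hypothetical.

```python
import json

# Assumption: the only valid actions are the four used in the example config.
KNOWN_ACTIONS = {"page-immediately", "slack-mention", "slack-message", "log-only"}


def validate_incident_config(config: dict) -> list[str]:
    """Return a list of problems found; an empty list means nothing obvious is wrong."""
    problems = []
    incident = config.get("incidentResponse", {})
    if not incident.get("onCall"):
        problems.append("incidentResponse.onCall is empty")
    for level, action in incident.get("severityLevels", {}).items():
        if action not in KNOWN_ACTIONS:
            problems.append(f"severity '{level}' maps to unknown action '{action}'")
    return problems


def action_for(config: dict, severity: str) -> str:
    """Look up the action for a severity.

    Assumption: an unrecognized severity falls back to log-only.
    """
    levels = config.get("incidentResponse", {}).get("severityLevels", {})
    return levels.get(severity, "log-only")


# The incidentResponse part of the example configuration above.
EXAMPLE = json.loads("""
{
  "incidentResponse": {
    "onCall": ["alice", "bob"],
    "severityLevels": {
      "critical": "page-immediately",
      "high": "slack-mention",
      "medium": "slack-message",
      "low": "log-only"
    }
  }
}
""")

if __name__ == "__main__":
    print(validate_incident_config(EXAMPLE))
    print(action_for(EXAMPLE, "critical"))
```

To check your real file, replace EXAMPLE with json.load(open("openclaw.json")).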

SOUL.md

## Incident Response Behavior
- Stay calm in all messaging. No exclamation marks, no "URGENT!!!" — a measured tone helps the team think clearly.
- In the first message to the incident channel, state only what you know for certain. Separate confirmed facts from hypotheses.
- Don't page for issues that self-resolve within 2 minutes (transient spikes, single-request failures). Wait, re-check, then escalate.
- If multiple alerts fire within 60 seconds, treat them as one incident. Look for a common cause before creating separate channels.
- When suggesting remediation, always include the rollback option first. The fastest fix is usually undoing the last deploy.
- Never restart services or roll back automatically — suggest it and wait for human confirmation. You don't have full context.
- Post a timeline of events in the incident channel as you gather information. This becomes the post-mortem foundation.

Add this to your SOUL.md to define the agent's behavior for this workflow.
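The two timing rules above (grouping alerts that fire within 60 seconds, and waiting 2 minutes before escalating a possibly transient issue) can be made concrete with a small sketch. This only illustrates the intent. The agent follows the SOUL.md text, not this code, and every name here (Alert, Incident, group_alerts, should_escalate) is hypothetical. It assumes the 60-second window is measured from the first alert of an incident.

```python
from dataclasses import dataclass, field

GROUP_WINDOW_SECONDS = 60     # "multiple alerts within 60 seconds" rule
RECHECK_DELAY_SECONDS = 120   # "self-resolve within 2 minutes" rule


@dataclass
class Alert:
    source: str       # e.g. "sentry", "datadog", "pingdom"
    fired_at: float   # Unix timestamp in seconds
    message: str


@dataclass
class Incident:
    alerts: list = field(default_factory=list)

    @property
    def started_at(self) -> float:
        # Alerts are appended in time order, so the first one marks the start.
        return self.alerts[0].fired_at


def group_alerts(alerts):
    """Group alerts into incidents.

    Assumption: an alert joins the current incident if it fired within
    GROUP_WINDOW_SECONDS of that incident's first alert.
    """
    incidents = []
    for alert in sorted(alerts, key=lambda a: a.fired_at):
        if incidents and alert.fired_at - incidents[-1].started_at <= GROUP_WINDOW_SECONDS:
            incidents[-1].alerts.append(alert)
        else:
            incidents.append(Incident(alerts=[alert]))
    return incidents


def should_escalate(incident, now, still_failing):
    """Escalate only after the re-check delay has passed and the issue persists."""
    waited_long_enough = now - incident.started_at >= RECHECK_DELAY_SECONDS
    return waited_long_enough and still_failing
```

For example, alerts at t=0 and t=30 end up in one incident, while an alert at t=200 opens a second one.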

Expected Behavior

When a production alert fires, OpenClaw assesses its severity, creates a dedicated Slack incident channel, notifies the on-call engineer according to the severity level (paging for critical alerts), gathers system diagnostics, and suggests remediation based on similar past incidents.

Usage Guide

Detection and triage run without manual steps. When an alert fires, OpenClaw creates a #incident-XXXX Slack channel, pages the on-call person for critical issues, and starts gathering diagnostics. Remediation is suggested rather than applied: per the SOUL.md rules, restarts and rollbacks wait for human confirmation. Suggestions draw on similar past incidents, so they should improve as incident history accumulates.
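One way to picture paging with an escalation path is to walk the onCall list in order until someone acknowledges. This is a sketch under assumptions the recipe does not confirm: that the order of the onCall list is the escalation order, and that a page either gets acknowledged or times out. The function page_until_acknowledged and the page callback are hypothetical, not part of OpenClaw.

```python
from typing import Callable, Optional, Sequence


def page_until_acknowledged(
    on_call: Sequence[str],
    page: Callable[[str], bool],
) -> Optional[str]:
    """Page each engineer in list order.

    `page` is a hypothetical callback that sends the page and returns True
    if the engineer acknowledged it (for example, before some timeout).
    Returns the first engineer who acknowledged, or None if nobody did.
    """
    for engineer in on_call:
        if page(engineer):
            return engineer
    return None


if __name__ == "__main__":
    # Stub callback for illustration: only bob acknowledges.
    responder = page_until_acknowledged(["alice", "bob"], lambda name: name == "bob")
    print(responder)
```

If nobody acknowledges, the function returns None, and the caller decides what happens next (for instance, posting in the incident channel).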

Community Use Cases

Deploy Monitoring + Root Cause Analysis While Walking the Dog

Put an OpenClaw agent on a Hetzner server. It checked on the deployment of a Railway project, reviewed logs, identified the root cause of failed builds, updated configs, redeployed, and confirmed everything worked, all while walking the dog.

George Dagg @georgedagg_
