🚨

Incident Response Automation

DevOps · Advanced · 1.5 hours

Detect production issues and coordinate incident response

Prerequisites

  • OpenClaw installed and running
  • Monitoring tools (Sentry, Datadog, or Pingdom)
  • Slack workspace
  • On-call rotation defined

Required Skills

sentry-debugger
openclaw install sentry-debugger
slack-digest
openclaw install slack-digest
docker-manager
openclaw install docker-manager

Installation Steps

1

Install required skills

Install the Sentry debugger, Slack digest, and Docker manager skills.

openclaw install sentry-debugger slack-digest docker-manager
2

Configure alert sources

Set up webhooks from Sentry, Datadog, and/or Pingdom to your OpenClaw instance.

3

Define severity levels

Configure the severity assessment rules and corresponding actions (page, Slack mention, message, or log).

4

Set up on-call rotation

List the on-call engineers and configure the escalation path.

5

Add the config snippet

Copy the configuration below and customize the alert sources, severity levels, and on-call team.

Configuration

{
  "webhooks": {
    "alert": {
      "url": "/webhooks/alert",
      "sources": ["sentry", "datadog", "pingdom"],
      "actions": [
        "assess-severity",
        "create-incident-channel",
        "notify-on-call",
        "gather-diagnostics",
        "suggest-remediation"
      ]
    }
  },
  "incidentResponse": {
    "onCall": ["alice", "bob"],
    "severityLevels": {
      "critical": "page-immediately",
      "high": "slack-mention",
      "medium": "slack-message",
      "low": "log-only"
    }
  }
}

Add this to your openclaw.json and customize the values for your setup.
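Before relying on a customized copy of the configuration, it can help to sanity-check it. The following is a minimal Python sketch, not part of OpenClaw: `check_incident_config` is a hypothetical helper, and the set of accepted action names is an assumption taken only from the snippet above.

```python
# Action names and severity levels copied from the configuration snippet.
# Whether OpenClaw accepts other values is not stated, so this is an assumption.
KNOWN_ACTIONS = {"page-immediately", "slack-mention", "slack-message", "log-only"}
EXPECTED_LEVELS = ("critical", "high", "medium", "low")


def check_incident_config(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    incident = config.get("incidentResponse", {})

    # Someone has to be reachable when an alert fires.
    if not incident.get("onCall"):
        problems.append("incidentResponse.onCall is empty or missing")

    # Every severity level should map to one of the actions shown above.
    levels = incident.get("severityLevels", {})
    for level in EXPECTED_LEVELS:
        action = levels.get(level)
        if action is None:
            problems.append(f"severity level '{level}' has no action")
        elif action not in KNOWN_ACTIONS:
            problems.append(f"severity level '{level}' uses unknown action '{action}'")

    # At least one alert source should be listed for the webhook.
    sources = config.get("webhooks", {}).get("alert", {}).get("sources", [])
    if not sources:
        problems.append("webhooks.alert.sources is empty or missing")
    return problems
```

To use it, load the file with `json.load(open("openclaw.json"))` and pass the resulting dictionary to `check_incident_config`. An empty result only means the keys shown in this recipe look consistent. It does not confirm that OpenClaw itself will accept the file.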

SOUL.md

## Incident Response Behavior
- Stay calm in all messaging. No exclamation marks, no "URGENT!!!" — a measured tone helps the team think clearly.
- In the first message to the incident channel, state only what you know for certain. Separate confirmed facts from hypotheses.
- Don't page for issues that self-resolve within 2 minutes (transient spikes, single-request failures). Wait, re-check, then escalate.
- If multiple alerts fire within 60 seconds, treat them as one incident. Look for a common cause before creating separate channels.
- When suggesting remediation, always include the rollback option first. The fastest fix is usually undoing the last deploy.
- Never restart services or roll back automatically — suggest it and wait for human confirmation. You don't have full context.
- Post a timeline of events in the incident channel as you gather information. This becomes the post-mortem foundation.

Add this to your SOUL.md to define the agent's behavior for this workflow.
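The two timing rules in the SOUL.md snippet (group alerts that fire within 60 seconds, and wait 2 minutes before paging for transient issues) can be pictured with a short Python sketch. This only illustrates the intended behavior and is not how OpenClaw implements it. The `Alert` type and both function names are hypothetical, and measuring the 60-second window from the first alert in a group is one possible reading of the rule.

```python
from dataclasses import dataclass

GROUP_WINDOW_SECONDS = 60    # "multiple alerts fire within 60 seconds"
SELF_RESOLVE_SECONDS = 120   # "self-resolve within 2 minutes"


@dataclass
class Alert:
    source: str         # e.g. "sentry", "datadog", "pingdom"
    received_at: float  # arrival time in seconds


def group_alerts(alerts: list[Alert]) -> list[list[Alert]]:
    """Group alerts arriving within 60 s of a group's first alert into one incident."""
    groups: list[list[Alert]] = []
    for alert in sorted(alerts, key=lambda a: a.received_at):
        if groups and alert.received_at - groups[-1][0].received_at <= GROUP_WINDOW_SECONDS:
            groups[-1].append(alert)   # same incident, look for a common cause
        else:
            groups.append([alert])     # start a new incident
    return groups


def should_escalate(first_seen: float, now: float, still_firing: bool) -> bool:
    """Escalate only if the alert is still firing after the 2-minute grace period."""
    return still_firing and (now - first_seen) >= SELF_RESOLVE_SECONDS
```

For example, alerts arriving at 0 s, 30 s, and 200 s would form two incidents under this reading: the first two together, the third on its own.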

Expected Behavior

When a production alert fires, OpenClaw assesses its severity, creates a dedicated Slack incident channel, notifies the on-call engineer according to the configured severity action (a page for critical alerts), gathers system diagnostics, and suggests remediation based on similar past incidents.

Usage Guide

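Detection, triage, and notification run without manual steps. When an alert fires, OpenClaw creates a #incident-XXXX Slack channel, pages the on-call person for critical issues, and starts gathering diagnostics. With the SOUL.md rules above, remediation such as a restart or rollback is only suggested and waits for human confirmation. The remediation suggestions improve over time as the system learns from past incidents.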
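To make the link between severity and notification concrete, here is a hedged Python sketch of how an action and a recipient could be derived from the configuration. `plan_notification` is a hypothetical helper. The recipe does not describe the escalation path in detail, so walking the `onCall` list in order, one step per missed page, is purely an assumption.

```python
from typing import Optional


def plan_notification(config: dict, severity: str, missed_pages: int = 0) -> tuple[str, Optional[str]]:
    """Return (action, person) for an alert of the given severity."""
    incident = config["incidentResponse"]
    action = incident["severityLevels"][severity]

    # "log-only" alerts are recorded without notifying anyone.
    if action == "log-only":
        return action, None

    # Assumption: escalation moves down the onCall list once per missed page
    # and stays on the last entry when the list is exhausted.
    on_call = incident["onCall"]
    person = on_call[min(missed_pages, len(on_call) - 1)]
    return action, person
```

With the sample configuration, a critical alert would map to `("page-immediately", "alice")`, and to `("page-immediately", "bob")` after one missed page.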
