Overview

VictorOps, now known as Splunk On-Call, is an incident management and on-call automation platform that helps development and operations teams manage system incidents from detection to resolution. Originally founded in 2012, VictorOps was acquired by Splunk in 2018 and has since been integrated into Splunk's Observability Cloud offerings, maintaining its core functionality for incident response and on-call scheduling. The platform serves as a central hub for alerts originating from various monitoring systems, consolidating them to prevent alert fatigue and ensure that critical issues are addressed promptly by the correct personnel.

The primary function of Splunk On-Call is to provide a structured approach to incident response. Its on-call scheduling lets teams define primary and secondary responders, escalation paths, and holiday overrides. When an integrated monitoring tool triggers an alert, Splunk On-Call evaluates its severity and context, then routes the notification through predefined escalation policies. This automation aims to reduce Mean Time To Acknowledge (MTTA) and Mean Time To Resolve (MTTR), two key metrics in site reliability engineering (SRE) practice, as discussed in industry resources like Atlassian's guide on incident management metrics.
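
To make the two metrics concrete, here is a minimal sketch of how MTTA and MTTR could be computed from incident timestamps. It is not part of Splunk On-Call: the function name incident_metrics, the input format (one incident per line as triggered, acknowledged, and resolved times in epoch seconds), and the sample values are all assumptions made for illustration.

```shell
# incident_metrics: reads "triggered acknowledged resolved" (epoch seconds),
# one incident per line, and prints the mean time to acknowledge and resolve.
# Hypothetical helper and input format, for illustration only.
incident_metrics() {
  awk '
    NF == 3 {
      ack_total += $2 - $1   # time from trigger to acknowledgement
      res_total += $3 - $1   # time from trigger to resolution
      count++
    }
    END {
      if (count == 0) { print "no incidents"; exit }
      printf "MTTA=%ds MTTR=%ds\n", ack_total / count, res_total / count
    }
  '
}

# Made-up sample data: three incidents.
printf '%s\n' \
  '1000 1120 2800' \
  '5000 5060 5900' \
  '9000 9300 12600' | incident_metrics
```

With these sample values the acknowledgement delays are 120, 60, and 300 seconds (mean 160) and the resolution times are 1800, 900, and 3600 seconds (mean 2100).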

Beyond simple notification, Splunk On-Call facilitates collaborative incident resolution through real-time communication channels, incident timelines, and runbook automation. Teams can communicate within the platform, attach relevant diagnostic information, and execute automated actions to mitigate or resolve issues. After an incident is closed, the platform supports post-incident analysis with detailed timelines and reporting features, enabling teams to identify root causes, improve processes, and prevent recurrence. This focus on continuous improvement aligns with the principles of blameless post-mortems, a practice widely adopted in organizations with mature incident response processes. The platform is best suited for organizations that require a comprehensive solution for managing complex on-call rotations, aggregating alerts from a diverse set of monitoring tools, and improving their overall incident response workflow.

Key features

  • On-Call Scheduling and Routing: Define complex on-call rotations, escalation policies, and overrides. Ensures alerts reach the right person at the right time based on a flexible schedule, supporting global teams and various time zones.
  • Alert Aggregation and Correlation: Consolidates alerts from multiple monitoring tools into a single pane of glass. Intelligent deduplication and correlation reduce alert noise, presenting only actionable incidents to on-call engineers.
  • Incident Timeline and Collaboration: Provides a chronological record of all events related to an incident, from initial alert to resolution. Facilitates real-time communication and collaboration among responders within the platform.
  • Runbook Automation: Automates common incident response tasks and diagnostic steps. Allows teams to define and execute pre-built or custom actions directly from an incident, accelerating resolution.
  • Post-Incident Analysis and Reporting: Generates detailed reports and timelines for completed incidents. Supports blameless post-mortems by providing data for root cause analysis and process improvement.
  • Integrations with Monitoring and IT Service Management (ITSM) Tools: Connects with a wide array of observability platforms, logging tools, and ITSM solutions to create a unified incident management workflow.
  • Mobile App: Dedicated mobile applications for iOS and Android provide on-call teams with the ability to receive, acknowledge, manage, and resolve incidents from anywhere.
  • API and Webhook Support: Offers a programmatic interface for sending alerts, managing incidents, and integrating with custom tools. See the VictorOps API documentation for details.
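
The deduplication idea from the feature list can be illustrated with a small sketch. This is not Splunk On-Call's actual correlation logic. It only assumes that alerts sharing an entity_id belong to the same incident and that the most recent message_type reflects its current state. The function name collapse_alerts and the sample alert stream are hypothetical.

```shell
# collapse_alerts: reads "entity_id message_type" lines (oldest first) and
# prints one line per entity with its latest state and the alert count.
# Simplified illustration, not the platform's algorithm.
collapse_alerts() {
  awk '
    {
      if (!($1 in state)) order[++n] = $1   # remember first-seen order
      state[$1] = $2                        # latest message_type wins
      seen[$1]++                            # count raw alerts per entity
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%s %s alerts=%d\n", order[i], state[order[i]], seen[order[i]]
    }
  '
}

# Made-up alert stream: four raw alerts collapse into two incidents.
printf '%s\n' \
  'web-01/cpu CRITICAL' \
  'web-01/cpu CRITICAL' \
  'db-02/disk WARNING' \
  'web-01/cpu RECOVERY' | collapse_alerts
```

Four raw alerts become two lines, one per entity, which is the kind of noise reduction the feature describes.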

Pricing

Splunk On-Call (formerly VictorOps) offers tiered pricing based on features and user count. All plans are billed annually. A free trial is typically available for evaluation.

Plan       | Price per user/month (billed annually) | Key features
Standard   | $10                                    | On-call scheduling, alert routing, incident timelines, 100+ integrations, basic reporting.
Advanced   | Contact Sales                          | All Standard features, plus advanced reporting, automated runbooks, service dependencies, custom roles, dedicated support.
Enterprise | Contact Sales                          | All Advanced features, plus unlimited custom roles, advanced security and compliance, premium support, single sign-on (SSO).

Pricing as of May 2026. For the most current details, refer to the Splunk On-Call pricing page.

Common integrations

  • Monitoring Tools: Datadog, New Relic, Prometheus, Grafana, AWS CloudWatch, Nagios, Zabbix. (See Splunk On-Call integrations list)
  • Logging and APM: Splunk Enterprise, LogRhythm, Sentry, Dynatrace.
  • Communication and Collaboration: Slack, Microsoft Teams, PagerDuty (for specific workflows), Jira Service Management.
  • Cloud Providers: Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure.
  • ITSM and Ticketing: Jira, ServiceNow, Zendesk.

Alternatives

  • PagerDuty: A widely used incident management platform offering on-call scheduling, alerting, and incident response automation.
  • Opsgenie (Atlassian): Provides similar capabilities for alert management, on-call scheduling, and incident communication, often integrated with Jira.
  • Grafana OnCall: An open-source friendly on-call management system, part of the Grafana ecosystem, focusing on alert routing and incident response.
  • DigitalOcean Monitoring and Alerts: Offers basic alerting and notification capabilities for resources hosted on DigitalOcean.

Getting started

Splunk On-Call (formerly VictorOps) primarily routes alerts from existing monitoring tools, but you can also send test alerts manually or from a basic script to simulate an incident. The example below uses curl to send a test alert to a Splunk On-Call REST endpoint, which requires a configured API key and routing key.

First, make sure the REST Endpoint integration is enabled in your Splunk On-Call account and that a routing key exists.

# Replace YOUR_API_KEY and YOUR_ROUTING_KEY with the values shown on the
# REST integration page in your Splunk On-Call account.
# message_type can be CRITICAL, WARNING, INFO, ACKNOWLEDGEMENT, or RECOVERY.

curl -X POST 'https://alert.victorops.com/integrations/generic/20131114/alert/YOUR_API_KEY/YOUR_ROUTING_KEY' \
-H 'Content-Type: application/json' \
-d '{
  "message_type": "CRITICAL",
  "entity_id": "my-app-server-01",
  "state_message": "High CPU usage detected on production server",
  "monitoring_tool": "cloudpicker-monitor",
  "severity": "critical",
  "host": "prod-web-01",
  "service": "web-application",
  "incident_number": "INC-20260509-001",
  "details": {
    "cpu_usage": "95%",
    "threshold": "80%",
    "link_to_dashboard": "http://dashboard.example.com/server-status"
  }
}'

This curl command sends a JSON payload to your Splunk On-Call REST endpoint. The message_type field determines how the alert is handled: CRITICAL, for example, opens an incident and notifies responders according to your routing policies. The entity_id identifies the component experiencing the issue, and later alerts that reuse the same entity_id update the same incident rather than opening a new one. Any additional fields are passed along as context for the on-call engineer, aiding quicker diagnosis and resolution.
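
Once the test incident has served its purpose, it can be closed by sending a follow-up alert with the same entity_id. The sketch below assumes the REST endpoint treats a message_type of RECOVERY as the resolving state, which is how the Splunk On-Call REST integration documentation describes it. The helper build_alert_payload and the SPLUNK_ONCALL_URL variable are hypothetical names introduced here, and the helper does no JSON escaping.

```shell
# build_alert_payload MESSAGE_TYPE ENTITY_ID STATE_MESSAGE
# Prints a minimal JSON alert body. Hypothetical helper: it does no JSON
# escaping, so the arguments must not contain double quotes or backslashes.
build_alert_payload() {
  printf '{"message_type":"%s","entity_id":"%s","state_message":"%s"}\n' \
    "$1" "$2" "$3"
}

# Reuse the entity_id from the test alert so the same incident is updated.
payload=$(build_alert_payload RECOVERY my-app-server-01 'CPU usage back to normal')
echo "$payload"

# Send only when SPLUNK_ONCALL_URL is set. It is an assumed variable holding
# the full endpoint URL, including your API key and routing key.
if [ -n "${SPLUNK_ONCALL_URL:-}" ]; then
  curl -sS -X POST "$SPLUNK_ONCALL_URL" \
    -H 'Content-Type: application/json' \
    -d "$payload"
fi
```

Keeping the URL in an environment variable avoids committing the API key to a script, and leaving it unset turns the snippet into a dry run that only prints the payload.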