Incident Management

From IT Process Wiki


Objective: Incident Management aims to manage the lifecycle of all Incidents (unplanned interruptions or reductions in quality of IT services). The primary objective of this ITIL process is to return the IT service to users as quickly as possible.

Part of: Service Operation

Process Owner: Incident Manager

 

ITIL 4 Incident Management

The Incident Management process described here (fig. 1) follows the specifications of ITIL V3, where Incident Management is a process in the service lifecycle stage of Service Operation.

ITIL V4 is no longer prescriptive about processes but shifts the focus to 34 'practices', giving organizations more freedom to define tailor-made processes.

ITIL 4 therefore refers to Incident Management as a service management practice, describing the key activities, inputs, outputs and roles. Based on this guidance, organizations are advised to design a process for managing Incidents in line with their specific requirements.

Since the processes defined in ITIL V3 have not been invalidated with the introduction of ITIL V4, organizations can still use the ITIL V3 process of Incident Management as a template.

In our YaSM Service Management Wiki we describe a leaner set of 19 service management processes that are more in tune with ITIL 4 and its focus on simplicity and "just enough process". The YaSM service management model includes a process for managing incidents that is a good starting point for organizations that wish to adopt ITIL 4.

 

Process Description

ITIL distinguishes between Incidents (service interruptions) and Service Requests (customer or user requests that do not represent a service disruption, such as a password reset). Service interruptions are handled through Incident Management, and Service Requests through Request Fulfilment.

The Incident Management process can be triggered in various ways: A user, customer or supplier may report an issue, technical staff may notice a (potential or actual) failure, or an Incident may be raised automatically by an event monitoring system.

All Incidents should be logged as Incident Records, where their status can be tracked, and a complete historical record maintained. Initial categorization and prioritization of Incidents is a critical step for determining how the Incident will be handled and how much time is available for its resolution (see checklist Incident Prioritization Guideline).
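As an illustration of an Incident Record that tracks status and keeps a complete history, here is a minimal Python sketch. The field names, the status values and the priority convention (1 = highest) are assumptions made for this example; ITIL does not prescribe a particular data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple

# Hypothetical status values; ITIL does not prescribe this exact set.
STATUSES = ("new", "in_progress", "resolved", "closed")


@dataclass
class IncidentRecord:
    incident_id: str
    description: str
    category: str
    priority: int  # assumed convention: 1 = highest
    status: str = "new"
    # (timestamp, status, note) entries forming the historical record
    history: List[Tuple[datetime, str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self._log("Incident logged")

    def _log(self, note: str) -> None:
        self.history.append((datetime.now(timezone.utc), self.status, note))

    def set_status(self, new_status: str, note: str = "") -> None:
        """Change the status and append the change to the history."""
        if new_status not in STATUSES:
            raise ValueError(f"unknown status: {new_status}")
        self.status = new_status
        self._log(note)


record = IncidentRecord("INC-0001", "E-mail service unavailable", "email", priority=2)
record.set_status("in_progress", "Assigned to 1st Level Support")
record.set_status("resolved", "Mail server restarted")
```

Because every status change is appended rather than overwritten, the record can later be reviewed from registration to closure.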

If possible, Incidents should be matched to other Incidents, Problems and Known Errors.

Organizations should use automated resolution tools and provide support portals with self-help information so users can resolve simple Incidents themselves. For other Incidents, 1st Level Support will try to diagnose and resolve the issue, typically using information from a knowledge base or pre-defined Incident Models.

If 1st Level Support is unable to resolve an Incident, it must be escalated to an appropriate specialist support group in 2nd Level Support ("functional escalation"). If required, 2nd Level Support may in turn involve external parties such as suppliers and vendors (in ITIL referred to as "3rd Level Support").
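The functional escalation path described above can be sketched as a simple loop over the support levels. This is only an illustration: the `resolvers` mapping and the keyword checks are hypothetical stand-ins for the real diagnostic work done by each support level.

```python
from typing import Callable, Dict, Optional

# Support levels in the order described above.
SUPPORT_LEVELS = ["1st Level Support", "2nd Level Support", "3rd Level Support"]


def resolve_with_escalation(
    incident: str,
    resolvers: Dict[str, Callable[[str], bool]],
) -> Optional[str]:
    """Try each support level in turn; return the level that resolved the Incident."""
    for level in SUPPORT_LEVELS:
        attempt = resolvers.get(level)
        if attempt is not None and attempt(incident):
            return level
    return None  # unresolved at every level


# Hypothetical resolvers: each returns True if that level can resolve the Incident.
resolvers = {
    "1st Level Support": lambda text: "printer" in text,
    "2nd Level Support": lambda text: "database" in text,
}

resolved_by = resolve_with_escalation("database connection errors", resolvers)
```

In this sketch an Incident that no level can resolve returns `None`; in practice such a case would be handed over to Problem Management.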

ITIL defines a special process for dealing with Major Incidents (emergencies that affect business-critical services and require immediate attention). Major Incidents typically require a temporary Major Incident Team to identify and implement the resolution.

Once Incidents are resolved, 1st Level Support will formally close them. This includes verifying that the users are satisfied and ensuring that the Incident Record is fully documented (see Incident Closure and Evaluation). Any new Problems, Workarounds or Known Errors identified during Incident resolution should be forwarded to the Problem Management process.

Incident Management interfaces with a number of other ITIL processes. The overview diagram of 'ITIL Incident Management' (fig. 1) shows the key information flows and interfaces of the process.

 

ITIL 4 refers to "Incident management" as a service management practice (see above). The service desk activities are described in the ITIL 4 practice of "Service desk".

 

Sub-Processes

These are the ITIL Incident Management sub-processes and their process objectives:

 

Incident Management Support

  • Process Objective: ITIL Incident Management Support aims to provide and maintain the tools, processes, skills and rules for an effective and efficient handling of Incidents.


Incident Logging and Categorization

  • Process Objective: To record and prioritize the Incident with appropriate diligence, in order to facilitate a swift and effective resolution.


Immediate Incident Resolution by 1st Level Support

  • Process Objective: To solve an Incident (service interruption) within the agreed time schedule. The aim is the fast recovery of the IT service, where necessary with the aid of a Workaround. As soon as it becomes clear that 1st Level Support is not able to resolve the Incident itself or when target times for 1st level resolution are exceeded, the Incident is transferred to a suitable group within 2nd Level Support.


Incident Resolution by 2nd Level Support

  • Process Objective: To solve an Incident (service interruption) within the agreed time schedule. The aim is the fast recovery of the service, where necessary by means of a Workaround. If required, specialist support groups or third-party suppliers (3rd Level Support) are involved. If the correction of the root cause is not possible, a Problem Record is created and the error-correction transferred to Problem Management.


Handling of Major Incidents

  • Process Objective: To resolve a Major Incident. Major Incidents cause serious interruptions of business activities and must be resolved with greater urgency. The aim is the fast recovery of the service, where necessary by means of a Workaround. If required, specialist support groups or third-party suppliers (3rd Level Support) are involved. If the correction of the root cause is not possible, a Problem Record is created and the error-correction transferred to Problem Management.


Incident Monitoring and Escalation

  • Process Objective: To continuously monitor the processing status of outstanding Incidents, so that counter-measures may be introduced as soon as possible if service levels are likely to be breached.
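One way to detect Incidents whose service levels are likely to be breached is to flag those that have used up most of their agreed resolution time. The following Python sketch assumes a hypothetical `OpenIncident` structure and a warning threshold of 80 %; both are illustrative choices, not ITIL requirements.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class OpenIncident(NamedTuple):
    incident_id: str
    opened_at: datetime
    target_resolution: timedelta  # agreed resolution time for this Incident


def at_risk(
    incidents: List[OpenIncident],
    now: datetime,
    warning_fraction: float = 0.8,  # assumed threshold
) -> List[str]:
    """Return IDs of Incidents that have used at least warning_fraction of their target time."""
    flagged = []
    for inc in incidents:
        elapsed = now - inc.opened_at
        if elapsed >= inc.target_resolution * warning_fraction:
            flagged.append(inc.incident_id)
    return flagged


now = datetime(2024, 1, 1, 12, 0)
open_incidents = [
    OpenIncident("INC-1", datetime(2024, 1, 1, 8, 0), timedelta(hours=4)),
    OpenIncident("INC-2", datetime(2024, 1, 1, 11, 0), timedelta(hours=4)),
]
flagged = at_risk(open_incidents, now)
```

Here only `INC-1` is flagged, because it has already consumed its full four-hour target while `INC-2` has used one hour.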


Incident Closure and Evaluation

  • Process Objective: To submit the Incident Record to a final quality control before it is closed. The aim is to make sure that the Incident is actually resolved and that all information required to describe the Incident's life-cycle is supplied in sufficient detail. In addition to this, findings from the resolution of the Incident are to be recorded for future use.


Pro-Active User Information

  • Process Objective: To inform users of service failures as soon as these are known to the Service Desk, so that users are in a position to adjust to interruptions. Proactive user information also aims to reduce the number of inquiries by users. This process is also responsible for distributing other information to users, e.g. security alerts.


Incident Management Reporting

  • Process Objective: ITIL Incident Management Reporting aims to supply Incident-related information to the other Service Management processes, and to ensure that potential improvements are derived from past Incidents.

 

Definitions

The following ITIL terms and acronyms (information objects) are used in the ITIL Incident Management process to represent process outputs and inputs:

 

Incident

  • An Incident is defined as an unplanned interruption or reduction in quality of an IT service (a Service Interruption).


Incident Escalation Rules

  • A set of rules defining a hierarchy for escalating Incidents, and triggers which lead to escalations. Triggers are usually based on Incident severity and resolution times. See also: Checklist Incident Priority
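A rule set of this kind could be represented as time thresholds per priority, each pointing to the next level in the escalation hierarchy. The priorities, thresholds and the "IT Management" level below are hypothetical values chosen for illustration; real rules would be derived from the agreed service levels.

```python
from datetime import timedelta
from typing import Dict, List, Optional, Tuple

# Hypothetical rules: for each priority (1 = highest, assumed), a list of
# (elapsed-time threshold, escalation target) pairs in ascending order.
RULES: Dict[int, List[Tuple[timedelta, str]]] = {
    1: [(timedelta(minutes=30), "Incident Manager"),
        (timedelta(hours=2), "IT Management")],
    2: [(timedelta(hours=4), "Incident Manager"),
        (timedelta(hours=8), "IT Management")],
}


def escalation_target(priority: int, elapsed: timedelta) -> Optional[str]:
    """Return the highest escalation level whose threshold has been reached."""
    target = None
    for threshold, role in RULES.get(priority, []):
        if elapsed >= threshold:
            target = role
    return target


first_stage = escalation_target(1, timedelta(minutes=45))
second_stage = escalation_target(1, timedelta(hours=3))
```

Priorities without an entry in `RULES` never trigger an escalation in this sketch, which keeps low-severity Incidents out of the hierarchy.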


Incident Management Report

  • A report supplying Incident-related information to the other Service Management processes.


Incident Model

  • An Incident Model contains the pre-defined steps that should be taken for dealing with a particular type of Incident. This is a way to ensure that routinely occurring Incidents are handled efficiently and effectively.


Incident Prioritization Guideline


Incident Record

  • A set of data with all details of an Incident, documenting the history of the Incident from registration to closure. An Incident is defined as an unplanned interruption or reduction in quality of an IT service. Every event that could potentially impair an IT service in the future is also an Incident (e.g. the failure of one hard-drive of a set of mirrored drives). See also: ITIL Checklist Incident Record


Incident Status Information

  • A message containing the present status of an Incident sent to a user who earlier reported a service interruption. Status information is typically provided to users at various points during an Incident's lifecycle.


Major Incident


Major Incident Review

  • A Major Incident Review takes place after a Major Incident has occurred. The review documents the Incident's underlying causes (if known) and the complete resolution history, and identifies opportunities for improving the handling of future Major Incidents.


Notification of Service Failure

  • The reporting of a service failure to the Service Desk, for example by a user via telephone or e-mail, or by a system monitoring tool.


Pro-Active User Information

  • A notification to users of existing or imminent service failures even if the users are not yet aware of the interruptions, so that users are in a position to prepare themselves for a period of service unavailability.


Status Inquiry

  • An inquiry regarding the present status of an Incident or Service Request, usually from a user who earlier reported an Incident or submitted a request.


Support Request

  • A request to support the resolution of an Incident or Problem, usually issued from the Incident or Problem Management processes when further assistance is needed from technical experts.


User Escalation

  • Escalation regarding the processing of an Incident or Service Request, initiated by a user experiencing delays or a failure to restore their services.


User FAQs

  • Self-help information for users supplied by the Service Desk, usually as part of the Support Pages on the intranet.

 

Templates | KPIs

 

Roles | Responsibilities

Incident Manager - Process Owner

  • The Incident Manager is responsible for the effective implementation of the Incident Management process and carries out the corresponding reporting. The Incident Manager represents the first stage of escalation for Incidents, should these not be resolvable within the agreed Service Levels.


1st Level Support

  • The responsibility of 1st Level Support is to register and classify received Incidents and to undertake an immediate effort in order to restore a failed IT service as quickly as possible. If no ad-hoc solution can be achieved, 1st Level Support will transfer the Incident to expert technical support groups (2nd Level Support). 1st Level Support also keeps users informed about their Incidents' status at agreed intervals.


2nd Level Support

  • 2nd Level Support takes over Incidents which cannot be solved immediately with the means of 1st Level Support. If necessary, it will request external support, e.g. from software or hardware manufacturers. The aim is to restore a failed IT service as quickly as possible. If no solution can be found, the 2nd Level Support passes on the Incident to Problem Management.


3rd Level Support

  • 3rd Level Support is typically located at hardware or software manufacturers (third-party suppliers). Its services are requested by 2nd Level Support if required for solving an Incident. The aim is to restore a failed IT Service as quickly as possible.


Major Incident Team

  • A dynamically established team of IT managers and technical experts, usually under the leadership of the Incident Manager, formed to concentrate on the resolution of a Major Incident.

 

Responsibility Matrix: ITIL Incident Management
ITIL Role / Sub-Process | Incident Manager | 1st Level Support | 2nd Level Support | Major Incident Team | Applications Analyst[3] | Technical Analyst[3] | IT Operator[3]
Incident Management Support | A[1]R[2] | - | - | - | - | - | -
Incident Logging and Categorization | A | R | - | - | - | - | -
Immediate Incident Resolution by 1st Level Support | A | R | - | - | - | - | -
Incident Resolution by 2nd Level Support | A | - | R | - | R[4] | R[4] | R[4]
Handling of Major Incidents | AR | R | - | R | - | - | R
Incident Monitoring and Escalation | AR | R | - | - | - | - | -
Incident Closure and Evaluation | A | R | - | - | - | - | -
Pro-Active User Information | A | R | - | - | - | - | -
Incident Management Reporting | AR | - | - | - | - | - | -

 

Remarks

[1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the Incident Management process.

[2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within Incident Management.

[3] see → Role descriptions...

[4] In cooperation, as required. 2nd Level Support Groups often include Applications Analysts and/or Technical Analysts.
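For readers who want to query the responsibility matrix programmatically, it can be encoded as a nested mapping. The sketch below transcribes the matrix above into a Python dictionary; the helper name `roles_with` is an assumption made for this example, and roles with no entry for a sub-process ("-") are simply omitted.

```python
from typing import Dict, List

# Sub-process -> role -> RACI letters, transcribed from the matrix above.
RACI: Dict[str, Dict[str, str]] = {
    "Incident Management Support": {"Incident Manager": "AR"},
    "Incident Logging and Categorization": {
        "Incident Manager": "A", "1st Level Support": "R"},
    "Immediate Incident Resolution by 1st Level Support": {
        "Incident Manager": "A", "1st Level Support": "R"},
    "Incident Resolution by 2nd Level Support": {
        "Incident Manager": "A", "2nd Level Support": "R",
        "Applications Analyst": "R", "Technical Analyst": "R", "IT Operator": "R"},
    "Handling of Major Incidents": {
        "Incident Manager": "AR", "1st Level Support": "R",
        "Major Incident Team": "R", "IT Operator": "R"},
    "Incident Monitoring and Escalation": {
        "Incident Manager": "AR", "1st Level Support": "R"},
    "Incident Closure and Evaluation": {
        "Incident Manager": "A", "1st Level Support": "R"},
    "Pro-Active User Information": {
        "Incident Manager": "A", "1st Level Support": "R"},
    "Incident Management Reporting": {"Incident Manager": "AR"},
}


def roles_with(sub_process: str, letter: str) -> List[str]:
    """Return the roles holding the given RACI letter for a sub-process."""
    return sorted(
        role for role, letters in RACI[sub_process].items() if letter in letters
    )


# Sanity check: every sub-process has exactly one accountable role.
single_accountable = all(len(roles_with(sp, "A")) == 1 for sp in RACI)
```

A check like `single_accountable` is a common way to validate a RACI matrix, since each activity should have exactly one accountable role.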

 

Example

Video: Introduction - ITIL Process Templates

The introductory ITIL Process Map video shows samples of the ITIL process templates with contents from Service Operation and Incident Management processes, including the

  • high-level view of the ITIL Service Lifecycle (Level 0)
  • overview of the Service Operation process (Level 1)
  • overview of ITIL Incident Management (Level 2)
  • detailed process flow for the process "Incident Management: Incident Resolution by 1st Level Support" (Level 3)

Watch the video: "The ITIL Process Map - Introduction" (10:58 min.)

 

Notes

By: Stefan Kempter, IT Process Maps.

 
