In this tutorial, we will discuss the ITIL Problem Management Process. In this chapter, you will learn what a problem is in ITIL, along with the definition, objective, scope, lifecycle, activities, roles, and sub-processes of Problem Management in ITIL V3. We will also discuss ITIL Proactive Problem Management and Reactive Problem Management.
What is ITIL Problem Management Process?
Problem Management is one of the main processes under the Service Operation module of the ITIL framework.
ITIL Problem Management Process is responsible for managing the lifecycle of all problems that happen or could happen in an IT service.
Though the ITIL Problem Management process is closely related to managing incidents, it goes a step beyond Incident Management. The two processes are so similar in nature that many organizations combine them and have the same team handle both.
Although the two processes look very similar from the outside, the real difference lies in their objectives. Incident Management works to restore affected services to their normal state, while Problem Management (ITIL V3) works to find and resolve the root cause. As an analogy, think of the problem as a disease and the incidents as the symptoms of that disease.
What is a Problem in ITIL?
Before going deeper into the ITIL Problem Management Process, let us first understand what a problem is.
In ITIL V3, the term “problem” is commonly used for one or more related incidents whose root cause has yet to be identified. As officially defined in the ITIL V3 documentation, a ‘problem’ is an underlying cause of one or more incidents.
Minor incidents involving consumable equipment, such as mouse or keyboard issues, are not considered problems. Recurring incidents, such as repeated network outages or repeated failures of server hardware or applications, are treated as problems and investigated by Problem Management.
ITIL Problem Management Objective:
The primary objective of the ITIL Problem Management Process is to prevent incidents from happening and to minimize the impact of incidents that cannot be prevented.
Some other important objectives of this process are as follows:
- Find the root cause of any problem.
- Resolve all problems as fast as possible (at least according to agreed service levels) and monitor the effectiveness of the implemented solution.
- Proactively prevent the recurrence of incidents caused by underlying problems, taking into account Incident Management data and suspected problems.
- Maintain information about problems and the appropriate workarounds and resolutions.
ITIL Problem Management Purpose:
If an incident keeps recurring and the service desk is not able to provide a permanent solution, the issue is transferred to Problem Management. The purpose of the transfer is to identify, troubleshoot, resolve, and document the root causes of repeated incidents.
As described in ITIL, Problem Management provides the service desk with the known error information (KEDB entries) and workarounds necessary to mitigate issues in the short term.
Another important purpose of ITIL Problem Management is therefore to reduce the frequency of incidents over the long term. A lower total number of incidents reduces the load on the service desk, improves customer and user satisfaction, and decreases the long-term costs associated with downtime.
If a problem cannot be solved immediately, Problem Management works jointly with the service desk to reduce the impact of the related incidents. The final goal of the Problem Management process (ITIL V3) should always be to bring down the total number of preventable incidents and thereby increase service quality.
The Scope of ITIL Problem Management Process:
Problem Management has a very limited scope and set of activities of its own within ITIL V3. To fulfill its objectives and purpose, it continuously coordinates with other ITIL processes and functions. To identify the scope of the ITIL Problem Management process, we will now discuss some of these interactions:
Service Desk: The most important function that interacts with Problem Management. Because of the nature of its work, the Service Desk is the central point of contact for both end users and other ITIL processes. Hence, when resolving any reported problem, the Problem Management team has to work side by side with the Service Desk.
Change Management: After finding the root cause of a problem, it is necessary to fix it so that the issue does not occur again. This sometimes requires changes to the service or component, so Problem Management calls on the Change Management process to carry them out.
Release and Deployment Management: Called by the ITIL Problem Management process when a proposed change requires a new release to be developed and deployed.
Knowledge Management: The KEDB created by Problem Management is managed and maintained by Knowledge Management. Hence, a seamless communication channel has to be created and maintained between these two processes.
ITIL Proactive Problem Management and Reactive Problem Management:
As described by ITIL V3, this process can be divided into two types depending on how it operates: (i) Reactive Problem Management and (ii) Proactive Problem Management.
(i) ITIL Reactive Problem Management:
This is the most common type of Problem Management seen in day-to-day operations. It is the means of finding the root cause of incidents and solving the problem as quickly as possible. It works as an integral part of ITIL Service Operation.
When incidents occur, Incident Management starts working on them as early as possible to resolve them and restore service to usable levels. In the process, some important indications and symptoms of the root cause can get lost.
So, to identify the root cause precisely, there should be a defined and agreed timeframe for handing incidents over from Incident Management to Problem Management.
(ii) ITIL Proactive Problem Management:
It covers the activities of identifying and solving problems and known errors before further related incidents can occur. It often includes reviewing reports from other processes to identify patterns and trends in recurring incident symptoms that may point to underlying problems.
Proactive Problem Management also identifies training opportunities for IT staff, customers, and end users. It may also coordinate with Availability Management and Capacity Management to take action to prevent potential incidents from happening.
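The trend review described above can be pictured with a small script that counts incidents per category in two consecutive periods and flags the categories whose count is rising. This is only a minimal sketch, not a method prescribed by ITIL: the sample data, the `min_count` threshold, and the function name `rising_categories` are all assumptions made for illustration.

```python
from collections import Counter
from typing import List


def rising_categories(previous: List[str], current: List[str], min_count: int = 3) -> List[str]:
    """Return categories whose incident count grew between two periods.

    previous, current: one category name per incident in each period.
    min_count: hypothetical threshold; smaller categories are ignored.
    """
    before = Counter(previous)
    now = Counter(current)
    flagged = []
    for category, count in now.items():
        if count >= min_count and count > before.get(category, 0):
            flagged.append(category)
    return sorted(flagged)


# Hypothetical incident categories for two consecutive months.
last_month = ["network", "network", "email", "printer"]
this_month = ["network", "network", "network", "network", "email", "printer"]
print(rising_categories(last_month, this_month))  # ['network']
```

A category flagged this way is only a candidate for investigation; whether it becomes a problem record is still a decision for the Problem Management team.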
ITIL Problem Management Lifecycle Activities:
The ITIL Problem Management Process describes a ten-step process for managing problems. These steps are also called the Problem Management lifecycle activities. They are listed below and are usually followed in sequential order:
(i) Problem Detection: This is the step where the problem is detected. A problem can be detected in any of the following ways:
- Detection or suspicion of a cause of one or more incidents by the Service Desk.
- Analysis of incidents by a technical support group to find incidents that keep recurring despite rigorous troubleshooting.
- Automated detection and reporting of infrastructure or application issues by Event Management tools.
- A notification from a supplier about an existing problem that has to be resolved.
(ii) Problem Logging: In this step, problems are logged in a Problem Record. The problem record should contain the following information:
- User details
- Service details
- Equipment details
- Priority and categorization details
- Date/time initially logged
- Problem Summary
- Related Incident Tickets
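As an illustration, the fields listed above could be held in a simple record type. The sketch below is an assumption about how such a record might look in code, not the schema of any particular ITSM tool; the class name `ProblemRecord`, its attribute names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class ProblemRecord:
    """Hypothetical problem record holding the fields listed above."""
    user: str
    service: str
    equipment: str
    category: str
    priority: int
    summary: str
    logged_at: datetime = field(default_factory=datetime.now)
    related_incidents: List[str] = field(default_factory=list)

    def link_incident(self, ticket_id: str) -> None:
        """Attach an incident ticket, ignoring duplicates."""
        if ticket_id not in self.related_incidents:
            self.related_incidents.append(ticket_id)


# Made-up sample values for illustration only.
record = ProblemRecord(
    user="jdoe",
    service="Email",
    equipment="mail-server-01",
    category="Application",
    priority=2,
    summary="Mail queue stalls every night",
)
record.link_incident("INC-1001")
record.link_incident("INC-1001")  # duplicate, not added again
print(record.related_incidents)  # ['INC-1001']
```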
(iii) Problem Categorization: Here problems are assigned to pre-defined categories according to the type, nature, attributes, and SLA of the underlying incidents. The assigned category should match the category of the related incidents.
(iv) Problem Prioritization: In this step, priorities are assigned to problems. A problem’s priority is determined in the same way as an incident’s: by its impact on users and the business, and by its urgency.
Problem prioritization should also consider the severity of the problem, that is, how serious it is from an infrastructure perspective (or from a service or customer perspective). Some questions to ask in this context are:
- Can the system be recovered, or does it need to be replaced?
- How much will it cost?
- How many people, with what type of skills, will be needed to fix the problem?
- How long will it take to fix the problem?
- How extensive is the problem (e.g., how many CIs are affected)?
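The rule that priority follows from impact and urgency can be sketched as a small calculation. The three-level scale and the formula `impact + urgency - 1` used here are assumptions borrowed from a commonly used style of priority matrix; each organization defines its own levels and mapping.

```python
# Hypothetical three-level scale: 1 is the most severe.
LEVELS = {"high": 1, "medium": 2, "low": 3}


def priority(impact: str, urgency: str) -> int:
    """Combine impact and urgency into a priority from 1 (highest) to 5 (lowest)."""
    return LEVELS[impact] + LEVELS[urgency] - 1


print(priority("high", "high"))  # 1
print(priority("high", "low"))   # 3
print(priority("low", "low"))    # 5
```

The severity questions above (cost, skills needed, number of CIs affected) are not part of this calculation; they would normally be weighed by the Problem Manager alongside the computed value.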
(v) Problem Investigation and Diagnosis: This step involves investigating and diagnosing the reported problems. The speed of the investigation depends on the assigned category and priority. If the problem is related to existing incidents, this step analyzes those incident records for trends and patterns that help identify the root cause.
(vi) Identify Workaround: This step helps to restore the service by identifying any possible workaround. Problems can take anywhere from hours to months to solve permanently, so service often has to be restored before the root cause is fixed.
A workaround helps the organization restore services to the user even if the original problem is not resolved. It should be considered only a temporary solution until the problem is resolved.
(vii) Creating a Known Error Record: Once the workaround is identified, the problem should be marked as a Known Error. It is essential to record the known error in the Known Error Database (KEDB) and upload it to the Service Knowledge Management System (SKMS).
Documenting the workaround allows the service desk to resolve incidents quickly and avoid further problems being raised on the same issue.
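To show how a documented workaround helps the service desk, the following sketch models a tiny in-memory KEDB that is searched by symptom keywords. It is a toy example under stated assumptions: the `KnownError` and `KEDB` classes, the keyword matching, and the sample entry are all hypothetical and do not reflect the API of any real ITSM product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KnownError:
    error_id: str
    symptoms: List[str]  # keywords describing how the error shows up
    root_cause: str
    workaround: str


class KEDB:
    """Toy in-memory Known Error Database (an assumption, not a product API)."""

    def __init__(self) -> None:
        self._records: List[KnownError] = []

    def add(self, record: KnownError) -> None:
        self._records.append(record)

    def find_workaround(self, incident_text: str) -> Optional[str]:
        """Return the workaround of the first record with a matching symptom keyword."""
        text = incident_text.lower()
        for record in self._records:
            if any(symptom.lower() in text for symptom in record.symptoms):
                return record.workaround
        return None


kedb = KEDB()
kedb.add(KnownError(
    error_id="KE-001",
    symptoms=["vpn timeout"],
    root_cause="Expired gateway certificate",
    workaround="Connect through the backup gateway",
))
print(kedb.find_workaround("User reports VPN timeout after login"))
print(kedb.find_workaround("Printer is out of toner"))
```

The first lookup returns the stored workaround and the second returns `None`, which in practice would mean the incident has no matching known error yet.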
(viii) Problem Resolution: This step provides the actual resolution of a problem. It resolves the underlying cause of a set of incidents and prevents those incidents from recurring. Once the resolution is identified, it should be documented in the knowledge base along with the steps taken and the problem details.
(ix) Problem Closure: It is the means of confirming the permanent resolution of a problem. It should also ensure that the problem record contains full historical detail of all events.
(x) Major Problem Review: For major problems, this step is initiated after the problem closure step. It is an important activity for preventing future problems. It also verifies whether the problems marked as closed have actually been eliminated. It is used to review and document the following:
- The things that were done correctly.
- The things that were done wrong.
- What could be done better in the future.
- How to prevent recurrence.
- The lessons learned.
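The ten activities above can also be read as a simple sequence in which the last step is only reached for major problems. The sketch below expresses that reading in code. It assumes a strictly linear flow, which is a simplification, and the names `Step`, `next_step`, and `walk` are illustrative only.

```python
from enum import Enum
from typing import List, Optional


class Step(Enum):
    DETECTION = 1
    LOGGING = 2
    CATEGORIZATION = 3
    PRIORITIZATION = 4
    INVESTIGATION_AND_DIAGNOSIS = 5
    WORKAROUND = 6
    KNOWN_ERROR_RECORD = 7
    RESOLUTION = 8
    CLOSURE = 9
    MAJOR_PROBLEM_REVIEW = 10


def next_step(current: Step, major: bool) -> Optional[Step]:
    """Return the step after `current`, or None when the lifecycle ends."""
    if current is Step.MAJOR_PROBLEM_REVIEW:
        return None
    if current is Step.CLOSURE and not major:
        return None  # the review is only held for major problems
    return Step(current.value + 1)


def walk(major: bool) -> List[Step]:
    """List every step a problem passes through, starting at detection."""
    steps: List[Step] = []
    step: Optional[Step] = Step.DETECTION
    while step is not None:
        steps.append(step)
        step = next_step(step, major)
    return steps


print(walk(major=False)[-1].name)  # CLOSURE
print(len(walk(major=True)))       # 10
```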
The below diagram shows the activities of ITIL Problem Management and also describes the interrelationship between them:
ITIL Problem Management Sub-Process:
As described by ITIL V3, this process has seven sub-processes. The objectives and descriptions of those sub-processes are given below, followed by a diagram illustrating the ITIL Problem Management process flow. Please note that, unlike the activities, these sub-processes are usually NOT sequential:
1) Proactive Problem Identification:
Responsible for proactively identifying Problems and providing a suitable workaround before an actual incident occurs. It helps to improve the overall availability of services as well as customer satisfaction.
2) Problem Categorization and Prioritization:
Used to record, categorize and prioritize reported Problems. It helps to facilitate a swift and effective resolution.
3) Problem Diagnosis and Resolution:
Responsible for identifying the underlying root cause of a Problem and implementing the most appropriate and economical solution. This sub-process is also responsible for providing a temporary Workaround where possible. It is a vital sub-process of ITIL Problem Management, responsible for restoring user services within SLA.
4) Problem and Error Control:
Used to continually monitor outstanding Problems with regard to their processing status so that corrective measures can be introduced whenever required.
5) Problem Closure and Evaluation:
Responsible for ensuring that the Problem Record contains a full historical description of the problem and that related Known Error Records are updated. This sub-process is only initiated after a successful resolution.
6) Major Problem Review:
For major problems, this step is initiated after the problem closure step. It is an important activity for preventing future problems. It also verifies whether the problems marked as closed have actually been eliminated.
7) Problem Management Reporting:
Problem Management Reporting is responsible for communicating outstanding Problems, their processing status, and existing Workarounds to other IT Service Management processes as well as to IT Management.
The following image shows the process flow between sub-processes and their interrelationships:
Important Terminologies and Definitions:
Problem Record:
- The document containing all details of a Problem, documenting the history of the Problem from detection to closure.
Workaround:
- Workarounds are temporary solutions provided to users, for reducing or eliminating the impact of Known Errors (and thus Problems) for which a full resolution is not yet available.
- Workarounds are often applied to reduce the impact of Incidents or Problems if their root causes cannot be readily identified or removed.
Known Error:
- A previously recorded problem that now has a documented Root Cause and a Workaround.
- Known Errors are managed throughout their lifecycle by the Problem Management process.
- Usually, Known Errors are identified by Problem Management, but Known Errors may also be pointed out by other Service Management disciplines, e.g. Incident Management, or by suppliers.
Known Error Database (KEDB):
- A database containing knowledge of previous requests and known errors.
- It is created by Problem Management and used by Incident and Problem Management to manage all Known Error Records.
- Though the KEDB is created by Problem Management, it is also a part of the SKMS.
Problem Management Report:
- A document to report Problem-related information to the other Service Management processes.
ITIL Problem Management Roles and Responsibilities:
Problem Manager:
- This role is the Process Owner of this ITIL Problem Management Process.
- The Problem Manager is responsible for managing the lifecycle of all Problems.
- The primary objective of this problem manager role is to prevent recurrence of incidents and to minimize the impact of Incidents that cannot be prevented.
- This role is also responsible for maintaining information about Known Errors and Workarounds.
Let’s start with something truly scandalous. In fact, it’s probably the most controversial thing ever written about ITSM, particularly Problem Management. I’m giddy with excitement, imagining the Twitter uproar (please use hashtag #ITSMwtf) this is going to cause.
Hide your kids. Pull the blinds. And don’t say I didn’t warn you. Okay, here goes:
Problem management isn’t about finding and fixing problems.
There, I said it. And I stand behind it. In day-to-day operations, it’s easy to get hyper-focused on root cause analysis and forget the much bigger picture. So let’s take a look at some of the most common obstacles that IT teams run into as they work relentlessly to keep all the alarms and sirens from going off at once. You'll walk away with some great tips not only for troubleshooting problems, but for preventing them altogether.
Problem #1: Falling into the reactive, root-cause trap
When incoming tickets are bombarding you all day long on the front lines of IT, it’s common to fall into an autopilot “find it and fix it” mode. In fact, many standard service desk metrics encourage agents to resolve as many issues as possible, and rightfully so.
So what’s my beef with root cause analysis? Nothing, except that it’s only a fraction of the true responsibility (and opportunity to add value back to the business) of the Problem Management process. As opposed to just reacting to problems, the true purpose of Problem Management is and always will be to prevent recurrence of incidents, so that IT service can be continuous and problem-free.
To do this, we recommend broadening the definition of Problem Management in your organization. Root cause analysis is part of the picture, but here is the full scope of what your Problem Management practice should be held accountable for:
- Preventing recurring incidents, and the service disruptions they can cause
- Keeping the impact of incidents to a minimum when they can’t be prevented altogether
- Updating information about problems and workarounds religiously, and ensuring that agents know where to find it and how to use it.
- Making sure the right processes are followed at every step.
To be fair, this is completely consistent with many best practices frameworks such as ITIL. In practice, though, high ticket volume and limited resources can make it easy to overlook critical functions like updating the knowledge base, or predictive monitoring that can cut the likelihood and severity of future outages by a significant percentage. Don’t fall into that trap! You simply can’t afford to.
Problem #2: Failing to share your problems.
No, I’m not asking you to read a self-help book (nothing in this article should be seen as a substitute for clinical therapy, if needed). Instead, I’m calling out another common misconception in the practice of Problem Management: that it is a single role, or the responsibility of a small subset of your service desk team.
Oh, the shame. Yes, Problem Manager is a role. And although ITIL’s responsibility matrix makes it look like the Problem Manager is both accountable for the process and responsible for doing the work at every step of the way, you’ll note that applications and technical analysts step in and do a ton of heavy lifting throughout the entire problem diagnosis and resolution phase.
Our stance? It takes a village, friends. As an IT team, we collectively reject the idea of a single root cause. Usually, several distinct failures lead up to a problem, so we encourage our specialized teams (networks, infrastructure, virtual hosting, etc.) to attack the challenge from multiple angles. Together, they work as an agile problem response team, uncovering and exploring a variety of theories or potential avenues for resolution.
On the surface, it might seem a bit luxurious or resource intensive. But time and time again, we’ve solved complex incidents that turned out to be culminations of many distinct failures. Without these collaborative work streams, it would have taken us days (instead of hours) to uncover the complexity behind the real root cause.
Our recommendations? First, if you don’t have the resources you need to deploy your own agile response teams during the diagnosis and resolution phase, you’ll need to assert yourself until you do. Trust me: the implications (in service disruption and lost productivity) of not having these resources far outweigh the costs of being prepared.
Second, don’t fence people in. Encourage an open, interactive environment where service desk agents help, mentor, and encourage each other. Trust your highly specialized experts, but ask for contributions and perspective from even your most junior analysts, too. The job is to prevent problems, and minimize the impact of those you can’t prevent altogether. And you’ll get there faster if your team works together.
Also, don’t underestimate the importance of having the right tools for the job. Atlassian-made or otherwise, you need strong collaboration tools for chatting and sharing knowledge, tracking processes, and auditing your performance.
Problem #3: Asking far too few questions.
I have a bone to pick with a few mechanics I’ve taken my car to recently. Here’s why. While I describe the symptom, they pretend to listen and nod their heads, and then hook my car up to a machine that tells them nothing is wrong. I pay for the “check up,” and two days later, I’m stranded on the side of the road.
At some point, it seems that they forgot how to think and ask questions – most importantly, “why?”
Ironically, it was the auto industry that developed one of the best techniques ever for determining the root cause of a problem. It’s called “The 5 Whys,” and it was pioneered by Sakichi Toyoda and used extensively at the Toyota Motor Corporation.
It’s a simple, brilliant methodology that works just as well in IT as it does in manufacturing. The best way to explain “Five Whys” is with an example. I stole this one from Wikipedia:
First, state the problem: The vehicle will not start.
Why? - The battery is dead. (First why)
Why? - The alternator is not functioning. (Second why)
Why? - The alternator belt has broken. (Third why)
Why? - The alternator belt was well beyond its useful service life and not replaced. (Fourth why)
Why? - The vehicle was not maintained according to the recommended service schedule. (Fifth why, a root cause)
Unlike in Six Degrees of Kevin Bacon, it’s okay to take more than the prescribed number of steps to get to the answer. If you need seven “whys” to get to the root cause, use seven. Five is just a generally sufficient number that sounds nice for marketing purposes. The point is to take a logical, stepwise approach that encourages troubleshooters to set aside their assumptions and carefully trace the possible causes until they arrive at the root problem.
In fact, we recommend using “Five Whys” as an early exercise within your agile problem response teams, to help identify some of the possible angles you will approach the problem from. Many times, the answers to each “why” can reveal one or more hypotheses that are worth testing and exploring.
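Here is one way to picture that branching, as a rough Python sketch rather than any official “Five Whys” tool. Each statement maps to the answers to “why?”, and the statements with no further answers are the candidate root causes. The `why` dictionary and the `root_causes` helper are names I made up for illustration, and the sketch assumes the chains of answers never loop back on themselves.

```python
from typing import Dict, List


def root_causes(why: Dict[str, List[str]], problem: str) -> List[str]:
    """Follow every chain of "why?" answers and return the statements with no further answer."""
    answers = why.get(problem, [])
    if not answers:
        return [problem]  # nothing left to ask: a candidate root cause
    causes: List[str] = []
    for answer in answers:
        for cause in root_causes(why, answer):
            if cause not in causes:
                causes.append(cause)
    return causes


# The car example from above, with statements shortened for readability.
why = {
    "The vehicle will not start": ["The battery is dead"],
    "The battery is dead": ["The alternator is not functioning"],
    "The alternator is not functioning": ["The alternator belt has broken"],
    "The alternator belt has broken": ["The belt was past its service life"],
    "The belt was past its service life": ["The service schedule was not followed"],
}
print(root_causes(why, "The vehicle will not start"))
# ['The service schedule was not followed']
```

Give any “why” two answers instead of one and the function returns two candidates, which is exactly the list of hypotheses a response team would split up and test.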
Problem #4: Not spreading the knowledge.
And finally, problem #4. You could also call this “Not closing the deal,” because it speaks directly to the tendency of problem management teams to “resolve and run.”
In your own life, when you learn something new or solve a tough problem, you can commit the result to memory so you benefit from it again. This generally works really well unless you are a teenager, in which case, all bets are off.
In a team environment, though, your own memory is the least beneficial place to retain the knowledge you learn. At minimum, it shouldn’t be the only place. Which is exactly where a knowledge base comes in. It’s a centralized place to store and search for articles that can aid in the problem-solving process.
I could write an entire blog post singing high praises for knowledge-centered support, but as luck would have it, Sarah Zorah already did. She tells you why knowledge management should be at the heart of your service desk, and how to get started.
My favorite point from her post: knowledge-based support is not something to do in addition to solving issues. It’s actually the way in which you resolve issues. Writing or updating knowledge base articles, then, isn’t a burdensome extra step preventing you from moving on to the next problem: It’s the most critical step to preventing (or minimizing the impact of) future ones.
If maintaining and updating a knowledge base isn’t already a central part of your service desk process, or you’re just missing the right software to make it happen, drop everything and close this gap today. It’s never too late to get started, but you are losing valuable knowledge (and increasing exposure to the business) every extra day you wait.
Practice
My conclusion today is far less scandalous than my intro, I’m afraid. Problem Management, like every discipline of ITSM, is a practice, which means you won’t be inherently perfect at it from the start. By simply looking at it as the sum of its parts – with an eye toward preventing problems, not just troubleshooting them – you’ll be building a much stronger, more sustainable service desk, which leads to happier customers and more profitable business, too. And I see absolutely no problems with that.