Find White Papers

IT Service Management Metrics That Matter

By : Tripwire
INFORMATION
Published : Nov 07, 2007
Length : 6
Type : White Paper
 
Overview :
High-performing IT organizations didn't get that way by accident, as a recent benchmark study by the IT Process Institute (ITPI) reveals. They operate on specific controls and measurements that prove significant to overall IT performance. Other organizations can also achieve better service management by replicating the practices of high performers. By measuring against the four factors that have a significant impact on the operations of IT organizations, companies can develop a baseline from which to improve performance.

This whitepaper, authored by Tripwire CTO Gene Kim, provides insights on how companies can improve their performance:
  • The two foundational IT controls that keep the high-performance engine running smoothly
  • The four metrics that matter in ensuring IT performance
  • The ROI of implementing a culture of change management and causality
Download this whitepaper and learn from Gene Kim, a well-known authority on IT processes, about the metrics that matter most for improving operational results, and which two controls any organization can adopt to put itself on the path to high performance.

Related Categories : Change Management, High Availability, IT Management, ITIL, Network Performance, Network Performance Management, Networking, Service Management, Software Compliance

Do you think great IT performance is achieved through luck or chance? You can bet real money it’s not! High-performing organizations have figured out which processes and controls really help them achieve their operational effectiveness and efficiency objectives. They have integrated those processes and controls into how they manage almost every aspect of their daily work, which helps them achieve their business goals and find variance before it causes a catastrophic outage, a failed change, a security incident or anything else that can affect the customer.


High performing organizations, as identified by the landmark IT Process Institute® (ITPI) IT Controls Performance Study published in April 2006, are able to achieve these results by implementing and enforcing two controls. These two controls, which every high performer practiced, but none of the medium and low performers did, are:


1. Actively monitor systems for unauthorized change
2. Have defined consequences for intentional, unauthorized changes

These two controls help high performers foster their desired culture of change management and causality, and provide a mechanism for tracking the key metrics that determine their efficiency and effectiveness. They also make the difference between high-performing organizations and those that struggle.
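The first control can be illustrated with a minimal Python sketch. It is not the method used in the ITPI study or by any particular product. It assumes a hypothetical setup in which each monitored file has a recorded SHA-256 fingerprint (the baseline) and the change management process supplies a set of paths with approved changes. The names `fingerprint`, `find_unauthorized_changes`, and the example paths are all illustrative assumptions.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest used as the file's fingerprint."""
    return hashlib.sha256(content).hexdigest()


def find_unauthorized_changes(baseline, current, authorized):
    """Return paths whose fingerprint differs from the baseline
    (modified, added, or deleted) and that have no approved change.

    baseline, current: dict mapping path -> fingerprint (hypothetical inventory)
    authorized: set of paths covered by an approved change request
    """
    changed = {
        path
        for path in baseline.keys() | current.keys()
        if baseline.get(path) != current.get(path)
    }
    return sorted(changed - authorized)


# Hypothetical example data, for illustration only.
baseline = {
    "/etc/hosts": fingerprint(b"10.0.0.1 app\n"),
    "/etc/passwd": fingerprint(b"root:x:0:0\n"),
}
current = {
    "/etc/hosts": fingerprint(b"10.0.0.2 app\n"),    # modified, approved
    "/etc/passwd": fingerprint(b"root:x:0:0\n"),     # unchanged
    "/etc/cron.d/job": fingerprint(b"* * * * * x\n"),  # added, not approved
}
authorized = {"/etc/hosts"}

unauthorized = find_unauthorized_changes(baseline, current, authorized)
print(unauthorized)
```

In this sketch, any path reported in `unauthorized` would be a candidate for the second control: follow-up with defined consequences if the change turns out to be intentional.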


The ITPI study, which was conducted in cooperation with Carnegie Mellon University, Florida State University and University of Oregon, identified the key metrics, which I call “Metrics That Matter.” They provide definitive guidance on where to start with IT best practices and give the highest rate of performance return for your organization. These metrics have significant impact on your organization’s ability to control system availability, compliance, risk and operational performance. They are Mean Time to Repair, First Fix Rate, Change Success Rate, and Server to System Administrator Ratio.

Mean Time to Repair
High performers know that 80% of all outages are due to a change, and that 80% of mean time to repair (MTTR) is spent trying to figure out what changed. Therefore, the first question that high performers ask when a system outage occurs is “What changed?” Contrast this behavior to how low performers work. When a system goes down, the first thing they do is reboot the server in question. If that doesn’t work, they’ll reboot the server next to it. That didn’t work? Reboot all the servers! Still not working? Reboot the firewall.
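The “What changed?” question can be sketched as a simple query against a change log. This is an illustrative Python example only: the `Change` record, the `what_changed` function, the 24-hour lookback window, and the sample entries are assumptions, not anything described in the study.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Change:
    """A hypothetical change-log entry."""
    system: str
    description: str
    applied_at: datetime


def what_changed(changes, outage_start, lookback=timedelta(hours=24)):
    """Return changes applied within the lookback window before the
    outage began, most recent first."""
    window_start = outage_start - lookback
    recent = [c for c in changes if window_start <= c.applied_at <= outage_start]
    return sorted(recent, key=lambda c: c.applied_at, reverse=True)


# Hypothetical change log, for illustration only.
change_log = [
    Change("db01", "kernel patch", datetime(2007, 11, 1, 9, 0)),
    Change("web01", "config update", datetime(2007, 11, 6, 22, 30)),
    Change("fw01", "rule change", datetime(2007, 11, 7, 1, 15)),
]
outage_start = datetime(2007, 11, 7, 2, 0)

suspects = what_changed(change_log, outage_start)
for change in suspects:
    print(change.applied_at, change.system, change.description)
```

Ordering the results by recency reflects the assumption that the latest change before the outage is the first one worth examining.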


These two extremes of diagnosing and resolving outages have a dramatic impact on how quickly the problem will be found and how long the outage will last. How an organization responds to an outage also serves as an incredibly accurate predictor of the processes, procedures and controls the IT organization will have in place. Analyzing the MTTR of the high, medium and low performers revealed some truly startling insights. The following figures show the MTTR of high, medium and low performers for small, medium and large outages.


For small incidents, all performers experienced similar MTTR. These are outages that typically require one to three people to fix, and they are usually resolved in 15 minutes or less.


For medium-severity outages (requiring up to eight people to fix), a growing difference in MTTR begins to appear among the three groups. High performers are almost always able to resolve the issue in minutes; medium performers’ resolution times begin creeping into minutes and hours.


In large outages, the differences are significant. High performers again resolve issues in minutes or hours, while medium performers take a low number of hours. Low performers take a much higher number of hours, and sometimes even days, to fix a problem.


Considering that large incidents mobilize between 25 and 50 people, low performers are sustaining an “all hands on deck” mobilization for a considerable portion of the workday. The level of disruption that this causes for an IT organization is difficult to overstate. While outages will occur, the frequency of incidents can be reduced, the problems fixed quickly, and the duration of each outage shortened if change control is rigorously practiced and enforced.
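To build a baseline like the one discussed above, an organization could compute its own MTTR per outage size. The Python sketch below assumes MTTR is the average time from the start of an incident to its resolution, and that each incident is recorded as a hypothetical `(severity, started_at, resolved_at)` tuple. The function name and sample records are illustrative and do not come from the study.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def mttr_by_severity(incidents):
    """Return the mean time to repair for each severity label.

    incidents: iterable of (severity, started_at, resolved_at) tuples
    """
    durations = defaultdict(list)
    for severity, started_at, resolved_at in incidents:
        durations[severity].append(resolved_at - started_at)
    # Average the repair times within each severity group.
    return {
        severity: sum(times, timedelta()) / len(times)
        for severity, times in durations.items()
    }


# Hypothetical incident records, for illustration only.
incidents = [
    ("small", datetime(2007, 11, 1, 9, 0), datetime(2007, 11, 1, 9, 10)),
    ("small", datetime(2007, 11, 2, 14, 0), datetime(2007, 11, 2, 14, 20)),
    ("large", datetime(2007, 11, 3, 8, 0), datetime(2007, 11, 3, 12, 0)),
]

mttr = mttr_by_severity(incidents)
for severity, average in mttr.items():
    print(severity, average)
```

Grouping by severity mirrors the small, medium and large breakdown in the text, so results can be compared like with like rather than averaged across very different kinds of outage.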
