Rebel with a Cause
At the user level, the systems powering the business applications need to be the ever-available, fully optimal, always-on magic behind the scenes that demands nothing of the user and is devoted to a life of service, duly processing jobs efficiently and with a minimum of fuss. As most System Managers would agree, when this utopian vision starts to fray around the edges, a fair amount of finger pointing is directed at the system as the ‘cause’ of issues that are felt directly at the user level, regardless of the user actions that may or may not have contributed. The system, as such, is not necessarily to blame; it is only guilty of not controlling the rebellious jobs in its charge. Rebel jobs, when provoked, have the potential to do things they should not. Rebel jobs loop, become inactive or gorge themselves on temporary storage. Rebel jobs gang up to cause mischief and hide out in obscure subsystems where they can avoid detection.
System Managers may be well aware of the chaos rebel jobs cause. The evidence of their rampage can be seen in user complaints, important jobs that do not run and resources that are being consumed. The challenge is to get the inside track on how to identify and round up the troublemakers before the damage is done. When faced with a system or network containing hundreds of thousands of jobs, this can become a task of overwhelming proportions in terms of both time and money.
Calculating the Cost
A system without an efficient means of identifying and resolving problematic jobs shifts the burden of these tasks onto what is likely an already stretched team. The team is forced to respond to problems as they occur, sift through the masses of jobs to find the culprit and, ultimately, resolve the issue. Jobs that go awry can readily impact the system environment around them. Anomalies in these environmental conditions, or in routine processes involving the job, are often the first sign of job issues that, left unchecked or unattended, add up to a significant financial loss over the course of a year. Two essential considerations for job monitoring are:
- Job Performance
- Job Status
Case Study
Company X is a large Financial Services organization that is struggling with job issues on its centrally managed System i network. The network supports 21,000 users nationwide and the company generates $4.2 billion in revenues annually. In a review to outline the extent of this issue, the company calculated the associated financial impact over the previous year’s operations. The cumulative effect of these problems throughout the network, and the impact on users’ productive time, was far greater than originally thought.
The Usual Suspects – Users Vs Jobs Pt 1
Whilst systems differ enormously from company to company, there are a few jobs that seem to attract problems and consume vast amounts of resources until they are detected, regardless of the system setup. Managers will be well aware of them and the legacy of trouble they have caused. If such jobs are well known, then there is a significant benefit in attaching appropriate automation to the detection of a problem job. CCSS has designed a specific job check feature for just such occasions. MONCHKJCP runs in the background and keeps an eye on potentially problematic jobs (or jobs that would cause substantial issues if problems went undetected). This command checks the CPU usage of the job and then takes the appropriate action: hold the job, lower its priority or take no action. Managers have the option to configure the level of CPU usage at which action will be taken. To refine the check further, Managers can include or exclude generics and users.
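The internals of MONCHKJCP are not described here, so the following is only a minimal Python sketch of the decision logic outlined above: compare a job's CPU usage against a configured threshold, honour include and exclude filters, and return one of the three actions. The names Job, CheckRule and decide are hypothetical and are not part of any CCSS product or IBM i API. The sketch also assumes that "generics" refers to generic job names with a trailing asterisk.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Job:
    """Hypothetical snapshot of one active job."""
    name: str
    user: str
    cpu_percent: float


@dataclass
class CheckRule:
    """Hypothetical rule mirroring the options described in the text."""
    cpu_threshold: float          # CPU % at which action is taken
    action: str                   # "hold", "lower_priority" or "none"
    include_jobs: tuple = ("*",)  # generic job names, e.g. "QPADEV*" (assumed meaning)
    exclude_users: tuple = ()     # user profiles that are never acted on


def decide(job: Job, rule: CheckRule) -> str:
    """Return the action to apply to a job under a rule."""
    # Excluded users are skipped regardless of CPU usage.
    if job.user in rule.exclude_users:
        return "none"
    # Only jobs matching at least one included generic name are checked.
    if not any(fnmatchcase(job.name, pattern) for pattern in rule.include_jobs):
        return "none"
    # Below the configured threshold, the job is left alone.
    if job.cpu_percent < rule.cpu_threshold:
        return "none"
    return rule.action


# Illustrative values only; the threshold and job names are invented.
rule = CheckRule(cpu_threshold=40.0, action="hold",
                 include_jobs=("QPADEV*",), exclude_users=("QSYSOPR",))
runaway = Job(name="QPADEV0001", user="QUSER", cpu_percent=85.0)
print(decide(runaway, rule))
```

Keeping the decision separate from the act of holding a job or changing its priority means the rule can be reviewed and tested on its own before any automation is attached to it.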
A good example of this check saving managers endless hours of investigation is when a user logs on to the iSeries and then switches off the PC, believing he is ‘logged out’. The user may have signed on to the iSeries under the generic QUSER profile, and as there could be hundreds of QUSER jobs on any given system (let alone network), hunting down the particular culprit that is consuming resources becomes a painfully drawn-out and expensive process.