DATA CENTRE MANAGEMENT FEATURE: AVERT DISASTER TOMORROW

Is it time to tighten up your maintenance approach?

When it comes to ensuring your data centre is resilient, regular maintenance of your UPS is essential. The maintenance process is designed to minimise risk and keep your UPS operating in a fail-safe, efficient manner. So far, so good. But what happens if the very act of carrying out maintenance itself poses a risk? What checks and balances can you put in place to ensure peace of mind and a watertight approach?

Leo Craig, general manager at Riello UPS, talks to Data Centre Management

Minimise human error

As British Airways discovered to its cost in the summer of 2017, human error is the main cause of problems occurring during UPS maintenance procedures; engineers may throw a wrong switch, or carry out a procedure in the wrong order. But, whilst it’s easy to lay blame solely at the feet of the engineer in these instances, errors of this kind are often the result of poor operational procedures, poor labelling or even poor training. Ironing out these issues at the start of a UPS installation can avoid such risks.

For example, if the system being installed is a critical system comprising large UPSs in parallel and a complex switchgear panel, Castell interlocks should be incorporated into the design. Castell interlocks force the user to switch in a controlled and safe fashion, but are often left out of the design to save costs at the start of the project.

Simple things can make a difference. Ensuring that basic labelling and switching schematics are up to date can avert disaster, and clearly documented switching procedures should always be available. If the site is extremely critical, a Pilot and Co-Pilot procedure (two engineers both check each step before it is carried out) will prevent most human errors.

Embrace technology

Maintenance typically involves intruding into the UPS or switchgear, so reducing that intrusion is always a good thing. Most problems, including the failure of electrical components, are preceded by an increase in heat. If a connection point isn’t tightened properly, for example, it will start to heat up and eventually fail in some way. Short of checking every connection physically, the most effective solution is thermal imaging. Thermal imaging technology can identify potential issues that wouldn’t necessarily be picked up using conventional techniques, without the need for physical intervention.

Monitor equipment and competency

Round-the-clock equipment monitoring also offers robust protection and should be part of your maintenance package. Rigorous training is also vital, as is ensuring that the attending engineer can carry out the work competently. Never be afraid to ask questions of your maintenance provider: it is your responsibility to request proof of competency levels, both for the company itself and for the engineers it uses. And always check ‘on the day’ that the engineer on site is competent and isn’t a last-minute sub-contractor sent in because the original engineer is off sick.

Read the small print

A strong maintenance package should ensure that when the UPS does fail, the response is timely and effective. Service level agreements need to be appropriate to the criticality of the application. There is no point having a maintenance contract with 24/7 response if access to the UPS can only be gained during normal business hours. Conversely, if operations are 24/7 and very critical to the business, then 24/7 response is a must.

Be clear on exactly what the ‘response’ constitutes – will it just be a phone call or will it be someone coming to site, and, if so, will that someone be a competent engineer?

Review today, protect tomorrow

Undertaking a review of your current UPS maintenance procedure will help to identify and reduce risks to critical operations that you may not have previously anticipated. By applying an extra level of due diligence today, you can help to avert disaster tomorrow.