Random Failures, Systematic Failures and the Systematic Capability

Last edit: 02/08/2023

In functional safety, failures are classified as either random (in hardware) or systematic (in hardware or software).

Random failures are normally attributed to hardware. They occur at a random time and result from one or more degradation mechanisms in the hardware, which impair the component's ability to perform its function. In other words, a random hardware failure involves only the equipment; it can occur suddenly and without warning, or be the outcome of slow deterioration over time. Based on historical data, random failures can be characterized by a single reliability parameter, the failure rate λ discussed so far, which can be controlled and managed using an asset integrity program.
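As a rough illustration of how a failure rate can be derived from historical data, the sketch below estimates λ as the number of observed failures divided by the cumulative operating hours. It then computes the probability of surviving a given operating time, assuming a constant failure rate (exponential model). The function names and the field figures are hypothetical and chosen only for illustration.

```python
import math


def estimate_failure_rate(failures: int, device_hours: float) -> float:
    """Point estimate of lambda (failures per hour) from field data."""
    if device_hours <= 0:
        raise ValueError("device_hours must be positive")
    return failures / device_hours


def survival_probability(failure_rate: float, hours: float) -> float:
    """R(t) = exp(-lambda * t); valid only if the failure rate is constant."""
    return math.exp(-failure_rate * hours)


# Hypothetical field data: 4 failures observed on 1000 devices,
# each one operated for 8000 hours.
lam = estimate_failure_rate(failures=4, device_hours=1000 * 8000)
print(f"lambda = {lam:.1e} failures per hour")
print(f"R(1 year) = {survival_probability(lam, 8760):.4f}")
```

No comparable calculation exists for systematic failures, since they do not follow a statistical distribution that field data could reveal.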

Systematic failures are, in essence, due to mistakes. They can only be eliminated by modifying the design, the manufacturing process, the operational procedures or other relevant factors. Examples of causes include human error in the design, manufacture, installation and operation of the hardware; a product used in the wrong environment, for example, is exposed to the risk of systematic failures. A systematic failure involves both the equipment and a human error: it exists from the moment the error is made and persists until the error is corrected. Once detected, a systematic failure can be eliminated, whereas random hardware failures cannot.

Failures are therefore either random or systematic, and the latter can be hidden in the hardware or in the software. A major difference between the two is that system failure rates (or other appropriate measures) arising from random hardware failures can be predicted with reasonable accuracy, while those arising from systematic failures cannot be statistically quantified, because the events leading to them cannot easily be predicted. In other words, the reliability parameters of random hardware failures can be estimated from field feedback, whereas this is very difficult for systematic failures, for which a qualitative approach is preferred.

Random failures are taken into account by calculating the different types of failure rates. For systematic failures, the concept of the Systematic Capability of a component is used.

A component's Systematic Capability is a measure (expressed on a scale of SC 1 to SC 4) of the confidence that the systematic safety integrity of the component meets the requirements of the specified SIL. In other words, a component with Systematic Capability SC 2 can only be used in safety instrumented systems with a safety integrity of up to SIL 2, regardless of how much redundancy is used for that component. That means that if a component has SC 1 (SIL 1), even if two of them are used in parallel, the maximum level that subsystem can reach is SIL 1.

To assess the systematic capability of a component, the laboratory that evaluates it looks, for example, at the quality of its development process and of its production process.

The concept of systematic capability was introduced in the second edition of IEC 61508. Since the term was not present in the first edition, terminology like “SIL n capable” component or “SIL n compliant” component appeared in datasheets and documents, which generated confusion. The confusion comes from the fact that, thanks to the concept of Systematic Safety Integrity, it is possible to assign a SIL to a component. That is unusual, since a SIL belongs to a safety function and not to a component.

With the second edition of the IEC 61508 series, that was clarified: a device with a systematic capability of SC 2, for example, meets the systematic safety integrity of SIL 2 when applied in accordance with the instructions stated by its manufacturer. That means that, even if its failure rates and SFF would allow the component to reach SIL 3, it can only be used in a SIL 2 safety subsystem. It also means that, even if it is used in redundancy with another component and, together, their subsystem could reach SIL 3, the safety system can only reach SIL 2.
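The rule described above can be sketched as taking the lower of two limits: the SIL that the hardware figures would allow and the systematic capability of the device. The function name and its parameters are hypothetical, and the sketch follows the simplified rule as stated in this text rather than the full set of provisions in the standard.

```python
def claimable_sil(hardware_sil: int, systematic_capability: int) -> int:
    """Return the SIL that can be claimed: the hardware integrity level
    capped by the systematic capability (SC) of the device."""
    for name, value in (("hardware_sil", hardware_sil),
                        ("systematic_capability", systematic_capability)):
        if value not in (1, 2, 3, 4):
            raise ValueError(f"{name} must be between 1 and 4")
    return min(hardware_sil, systematic_capability)


# Failure rates and SFF would allow SIL 3, but the device is only SC 2,
# so the claim is limited to SIL 2.
print(claimable_sil(hardware_sil=3, systematic_capability=2))
```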

Once again: safety integrity means both Hardware safety integrity and Systematic safety integrity. Systematic failures (hardware or software), and consequently Systematic safety integrity, cannot be quantified. On the other hand, Random Hardware Failures usually can.

A Safety Control System needs a certain level of Safety Integrity in order to be “reliable” or, in other terms, to be “immune” from both Systematic and Random failures.

Random hardware failures are quantifiable and are taken into account through values given by the component manufacturer, such as failure rates, MTTFD and PFHD. The issue is how to tackle systematic failures. That is done by guaranteeing a certain level of Systematic Capability, which is the term used in IEC 61508. The Systematic Capability of a safety component expresses the confidence that its Systematic Safety Integrity meets the requirements of the specified Safety Integrity Level (SIL).
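To show how the quantified and the qualitative side come together, the sketch below maps a PFHD value to a SIL using the IEC 61508 target bands for high demand or continuous mode, then caps the result with the Systematic Capability. It is a simplification: architectural constraints such as SFF and hardware fault tolerance are deliberately ignored, and the names used are hypothetical.

```python
# Upper PFH limits (dangerous failures per hour) of the SIL bands for
# high demand or continuous mode of operation, highest SIL first.
PFH_UPPER_LIMITS = [(4, 1e-8), (3, 1e-7), (2, 1e-6), (1, 1e-5)]


def sil_from_pfh(pfh_d: float) -> int:
    """Return the highest SIL whose PFH limit is met, or 0 if none is."""
    for sil, upper_limit in PFH_UPPER_LIMITS:
        if pfh_d < upper_limit:
            return sil
    return 0


def achieved_sil(pfh_d: float, systematic_capability: int) -> int:
    """Cap the SIL allowed by the random-failure figures with the SC."""
    return min(sil_from_pfh(pfh_d), systematic_capability)


# Hypothetical subsystem: the PFHD falls in the SIL 3 band,
# but the component is only SC 2.
print(achieved_sil(pfh_d=5e-8, systematic_capability=2))
```

The quantified figure alone would suggest SIL 3 here; the Systematic Capability is what limits the claim to SIL 2.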