Chapter 1 - The Basics of Reliability Engineering

Last edit: 11/08/2023

Chapter 1 summarises the basics of Reliability Engineering. The distinction between Random and Systematic failures is explained in detail, because Functional Safety is not only about Reliability Data and Formulas (the random part of failures), but also about the correct design, engineering, production and maintenance of a Safety System.

If the electrical control panel is not correctly designed (systematic failure), all the assumptions and calculations about random failures become meaningless. The Reliability function R(t) is the starting point; F(t) = 1 - R(t) is the Unreliability function. The key parameters, PFHD and PFDavg, are based upon the Unreliability function F(t). The chapter presents the concepts needed to understand those two parameters correctly: Failure Rates, the Mean Time to Failure (MTTF), the importance of a constant failure rate, the Weibull distribution and Markov modelling.
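
As a minimal sketch of how these quantities relate, the Python snippet below assumes a constant failure rate λ, in which case R(t) = exp(-λt), F(t) = 1 - R(t) and MTTF = 1/λ. The function names and the numerical value of λ are illustrative assumptions, not values taken from the chapter.

```python
import math


def reliability(t_hours, failure_rate):
    """R(t) = exp(-lambda * t): probability of surviving up to time t.

    Valid only under the assumption of a constant failure rate.
    """
    return math.exp(-failure_rate * t_hours)


def unreliability(t_hours, failure_rate):
    """F(t) = 1 - R(t): probability of having failed by time t."""
    return 1.0 - reliability(t_hours, failure_rate)


def mttf(failure_rate):
    """Mean Time to Failure for a constant failure rate: MTTF = 1 / lambda."""
    return 1.0 / failure_rate


lam = 1e-6   # hypothetical failure rate, in failures per hour
t = 8760.0   # one year of continuous operation, in hours

print("R(t)  =", reliability(t, lam))
print("F(t)  =", unreliability(t, lam))
print("MTTF  =", mttf(lam), "hours")
```

Note that, under this assumption, R(MTTF) = exp(-1), about 0.37: only about 37 % of the components are expected to survive until the MTTF.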

Finally, safety systems working in high and low demand modes of operation are presented. If you are less interested in the mathematical background of functional safety, you can go straight to the next chapter.


Some excerpts from the chapter follow.

1.1.1 Safety Critical Systems
A part of Reliability studies deals with Safety Critical Systems. Those are systems whose failure could result in loss of life or in significant damage to property or to the environment [58].
In the 1970s, the design principles of safety-critical systems, both in Machinery and in the Process Industry, were the following:

  • Single-channel system (no redundancy). This architecture would be regarded as a basic design having minimum safety performance.
  • Dual-channel system (redundancy), applicable to sensors (for example pressure switches), logic units, and final elements (like contactors and valves).
  • 2 out of 3 voting systems (2oo3). Those systems were originally used in the petrochemical industry: they give a good level of both Reliability and Availability. Reliability measures the ability of a system to function correctly, whereas Availability measures how often the system is available for use, even though it may not be functioning correctly. For example, a server may run forever and so have ideal Availability, but may be unreliable, with frequent data corruption.
  • All systems used the Fail Safe concept: a failure in any part of the system would lead to a safe state of the process or the machinery under control.
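
To give a feel for how the architectures listed above compare, here is a simplified Python sketch. It assumes identical and independent channels, each working with probability r, and it ignores common cause failures, diagnostics and repair; the function name and the value of r are hypothetical.

```python
from math import comb


def k_out_of_n(k, n, r):
    """Probability that at least k of n channels are working.

    Assumes identical, independent channels, each working with
    probability r (a simplification: no common cause failures).
    """
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


r = 0.9  # hypothetical probability that a single channel works

single = k_out_of_n(1, 1, r)  # single-channel system
dual = k_out_of_n(1, 2, r)    # dual-channel system: one working channel is enough
voting = k_out_of_n(2, 3, r)  # 2oo3: at least two channels must agree

print("single channel:", single)
print("dual channel  :", dual)
print("2oo3 voting   :", voting)
```

For 2oo3 the sum reduces to the closed form 3r² - 2r³, which for r = 0.9 gives 0.972.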

In the 1990s, a part of the reliability studies on Safety Critical Systems became known as Functional Safety and focussed on Electrical, Electronic, and Programmable Electronic (E/E/PE) systems.
The reference standard became the IEC 61508 series.


1.13 Logical and Physical Representation of a Safety Function
The Blocks used to represent a Safety Function are a logical view of the subsystem architectures.
Blocks can be in a Series configuration (i.e. the failure of any block causes the failure of the relevant safety function) or in a Parallel configuration (i.e. coincident block failures are necessary for the relevant safety function to fail). However, the Blocks do not necessarily represent a specific physical connection scheme.
A Hardware Fault Tolerance of 1 is represented by parallel subsystem elements or blocks, but the corresponding physical connections will depend upon the application of the subsystem.
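
The logical Series and Parallel configurations can be sketched numerically as follows. This is only an illustration that assumes independent block failures: a Series arrangement works when all blocks work, a Parallel arrangement fails only when all blocks fail. The helper names and the block reliabilities are hypothetical.

```python
from math import prod


def series(blocks):
    """Series: every block must work, so the reliabilities multiply."""
    return prod(blocks)


def parallel(blocks):
    """Parallel: the arrangement fails only if all blocks fail together."""
    return 1.0 - prod(1.0 - r for r in blocks)


# Hypothetical safety function: two redundant sensors (Hardware Fault
# Tolerance of 1), one logic unit, two redundant contactors.
sensors = parallel([0.95, 0.95])
logic = 0.99
final_elements = parallel([0.90, 0.90])

safety_function = series([sensors, logic, final_elements])
print("sensor subsystem :", sensors)
print("final elements   :", final_elements)
print("safety function  :", safety_function)
```

The calculation only uses the logical view: it says nothing about how the two sensors or the two contactors are physically wired.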