CSC365 : Critical Systems Engineering

 

Introduction

This course covers critical systems: systems where the cost of failure is particularly high. These are not just software systems but, more generally, socio-technical computer-based systems that incorporate hardware, software and people.

It is an important course because our national infrastructure increasingly depends on computer-based systems, and because computer-based control systems are now used almost universally in electrical and electronic devices of all kinds, from microwave ovens to aircraft engines.

This is an introductory course. Its aim is to introduce you to the problems of developing critical systems and to some of the techniques that can be used in their development. It is organised as 10 weekly class sessions. Each session lasts 2 hours (we do have a comfort break in the middle) and has 2 components. The first hour or so is (normally) a conventional lecture on the topics that I have summarised below. The second part of the session (normally a bit less than an hour) focuses on an example. This may be a running example that is used in a number of lectures (a portable insulin pump), or it may be based on real critical systems and critical systems failures. The real systems that I look at include the Space Shuttle, the Ariane 5 rocket and the London Ambulance Control System.

Each course topic lasts either 1 or 2 weeks. For each topic, there is an introductory page which includes a short list of self-assessment questions. The answers to these questions are all in the lecture notes or in the associated chapters of the course textbook.

Textbook: 'Software Engineering, 6th edition'. Ian Sommerville, Addison Wesley, 1996.
The relevant chapters are chapters 16, 17, 18 and 21. You should also understand the material in chapter 13, Real-time software design.

Knowledge of Z as a formal specification notation is assumed.


Lecture Summary

Lecture 1 - Introduction

A general introduction to the concept of critical systems. These are systems where failure may jeopardise people or the environment (safety-critical systems), the achievement of some organisational goal (mission-critical systems) or the successful functioning of a business (business-critical systems).

Critical Systems - intro page and self-assessment questions

Lecture 2 - Dependability

Introduces the key dependability attributes of a system: availability, reliability, safety and security. These are non-functional system attributes, i.e. they are emergent properties of the system as a whole and are not concerned with the functions that the system provides.

Dependability - intro page and self-assessment questions

Lectures 3 and 4 - Critical Systems Specification

In these two lectures, I will introduce techniques for the specification of critical systems. I will discuss metrics that may be used for reliability specification, derivative requirements that must be included so that a system meets its dependability requirements, and the use of hazard analysis to derive specifications for safety-critical systems. The case study focuses on the insulin delivery system, and I will discuss its operational specification and its dependability specification.
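As a rough illustration of what reliability metrics look like in practice, the sketch below computes four commonly used ones (probability of failure on demand, rate of occurrence of failures, mean time to failure and availability). These are not necessarily the exact metrics covered in the lectures. The `ObservationLog` record, its field names and the example figures are all hypothetical, invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class ObservationLog:
    """Hypothetical summary of one observation period (all fields are assumptions)."""
    demands: int            # number of service requests made
    failed_demands: int     # requests that ended in failure
    operating_hours: float  # total time the system was running
    failures: int           # failures observed while running
    downtime_hours: float   # time the system was unavailable


def pofod(log: ObservationLog) -> float:
    """Probability of failure on demand: failed requests / all requests."""
    return log.failed_demands / log.demands


def rocof(log: ObservationLog) -> float:
    """Rate of occurrence of failures, in failures per operating hour."""
    return log.failures / log.operating_hours


def mttf(log: ObservationLog) -> float:
    """Mean time to failure, in operating hours per failure."""
    return log.operating_hours / log.failures


def availability(log: ObservationLog) -> float:
    """Fraction of the total period during which the system was running."""
    return log.operating_hours / (log.operating_hours + log.downtime_hours)


# Invented figures, for illustration only.
example = ObservationLog(
    demands=1000,
    failed_demands=2,
    operating_hours=990.0,
    failures=3,
    downtime_hours=10.0,
)
print(pofod(example), rocof(example), mttf(example), availability(example))
```

A reliability requirement can then be stated as a bound on one of these values, for example a maximum acceptable probability of failure on demand. The functions assume non-zero demands, failures and operating time; a real analysis would have to handle those cases.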

Critical Systems Specification - intro page and self-assessment questions

Lectures 5 and 6 - Critical Systems Development

In these two lectures, I will discuss development techniques for critical systems. The two principal strategies that I will cover are fault minimisation and fault tolerance. Under fault minimisation, I will discuss programming language constructs that should be avoided because they are potentially error-prone, and approaches to programming that reduce the probability of errors. Under fault tolerance, I will discuss exception management, defensive programming and fault-tolerant architectures. In the case studies, I will cover the architecture of the Airbus A330/340 flight control system and will introduce the notion of survivability with a case study of a mental health support system.
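To give a flavour of fault tolerance, here is a minimal sketch of majority voting over independently written versions of the same function, combined with exception handling. It illustrates one well-known style of fault-tolerant architecture and is not a description of any system discussed in the lectures. The names `vote`, `NoMajorityError` and the three `square_*` functions are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence


class NoMajorityError(Exception):
    """Raised when no result is returned by more than half of the versions."""


def vote(versions: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every version on x and return the result that a majority agrees on."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            # A version that raises is treated as having produced no output.
            continue
    if not results:
        raise NoMajorityError("all versions failed")
    value, count = Counter(results).most_common(1)[0]
    # Require agreement from more than half of ALL versions, including crashed ones.
    if count * 2 <= len(versions):
        raise NoMajorityError(f"no majority among results {results}")
    return value


# Three hypothetical versions of the same specification ("square the input").
def square_a(x: int) -> int:
    return x * x


def square_b(x: int) -> int:
    return x ** 2


def square_faulty(x: int) -> int:
    # Deliberately seeded fault: wrong answer for inputs above 10.
    return x * x + 1 if x > 10 else x * x


print(vote([square_a, square_b, square_faulty], 12))
```

The voter masks the fault in `square_faulty` because the other two versions agree. If the versions share a common fault, voting does not help, which is why the versions are assumed to be developed independently.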

Critical Systems Development - intro page and self-assessment questions

Lectures 7 and 8 - Critical Systems Validation

In these lectures, I look at static and dynamic techniques for dependability validation. In the first lecture, I focus mostly on safety validation and the use of reviews and safety proofs in this process. In the second lecture, I cover reliability validation using statistical testing, including operational profiles and reliability growth modelling.
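The idea behind statistical testing with an operational profile can be sketched as follows: test inputs are drawn with the frequencies expected in real use, and the observed failure rate is used as a reliability estimate. This is a simplified sketch under stated assumptions, not a full method. The function `estimate_pofod`, the profile format and the deliberately faulty `buggy_abs` are all hypothetical, and a correct oracle is assumed to be available.

```python
import random
from typing import Callable, Dict, Tuple

# Hypothetical profile format: class name -> (probability of use, input generator).
Profile = Dict[str, Tuple[float, Callable[[random.Random], int]]]


def estimate_pofod(
    system: Callable[[int], int],
    oracle: Callable[[int], int],
    profile: Profile,
    n_tests: int,
    seed: int = 0,
) -> float:
    """Estimate the probability of failure on demand under the given profile."""
    rng = random.Random(seed)  # fixed seed so that a test run is repeatable
    names = list(profile)
    weights = [profile[name][0] for name in names]
    failures = 0
    for _ in range(n_tests):
        # Pick an input class according to its expected frequency of use.
        name = rng.choices(names, weights=weights)[0]
        generate = profile[name][1]
        test_input = generate(rng)
        if system(test_input) != oracle(test_input):
            failures += 1
    return failures / n_tests


def buggy_abs(x: int) -> int:
    # Deliberately seeded fault: negative inputs are returned unchanged.
    return x


# Invented profile: negative inputs are assumed to be rare in real use.
profile: Profile = {
    "non_negative": (0.95, lambda rng: rng.randint(0, 100)),
    "negative": (0.05, lambda rng: rng.randint(-100, -1)),
}

print(estimate_pofod(buggy_abs, abs, profile, n_tests=2000))
```

Because the fault is only triggered by a rarely used input class, the estimated failure probability is low even though the fault is always present. The estimate is only as good as the operational profile it is based on.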

Critical Systems Validation - intro page and self-assessment questions

Lecture 9 - Human and Organisational Factors

The dependability of a system does not just depend on the computers and other equipment. It also depends on the ways in which these are used and on the approaches used to avoid and recover from operator errors. This lecture is an introduction to human and organisational factors and how these relate to system dependability. In the second part of the session, I discuss the failure of the automated despatching system for the London Ambulance Service, a classic example of how failure to consider human and organisational issues resulted in complete system failure.

Human and organisational factors - intro page and self-assessment questions

Lecture 10 - Ethics and Professional Issues

Using a video of the crash of an Airbus A320, which may have been related to software failure, I discuss the ethical issues that have to be taken into account when developing critical systems. This session will be more of a discussion than a lecture, so there are no associated slides. I have produced some brief notes on some key issues.

Notes on ethics and professional issues

Remember: everything that has been distributed is examinable, papers as well as lecture notes.