Software Safety Engineering: Leveson's Safeware, Systems, and Computers

Safeware: System Safety and Computers, by Nancy Leveson (University of Washington, Seattle). We are building systems today, and using computers to control them, that have the potential for large-scale destruction of life and the environment. No technique in use now can guarantee the safety of a design, but some can increase the probability of ending up with a safe design at the end of the development cycle.

Most of the design techniques deal with reducing complexity, promoting system-wide understanding, and compensating for the idiosyncrasies of software. Good management is critical to having a successful safety plan. Management must promote the organization's safety culture.

Having a strong safety culture means that everyone in the organization, from project managers to administrative assistants, must buy into safety and be safety-conscious while doing their jobs. The safety culture must also extend from the people who design the product to the people who operate and maintain it.

If the safety culture declines and people become sloppy in doing their jobs, major disasters can result. Before the Challenger accident, NASA had become somewhat complacent and overconfident in its systems.

The agency was on a tight shuttle launch schedule and ignored many of the warnings about potential problems with the solid rocket boosters.

Had a stronger safety culture been in place, NASA might have decided to fix the solid rocket boosters rather than continue with the Challenger launch. Managers should also foster communication between project teams. Every team must understand how its part of the project affects system safety. Without communication, teams may not have a clear enough picture of the system to make informed decisions about safety. Since safety is an emergent property, some of the boundaries between subsystems must be blurred in order to address it.

Also, a great deal of formal analysis must be done to catch as many hazardous situations as possible before the system is implemented. This can only be done with clear communication between groups. The first principle of safety design is realizing that safety must be dealt with at the system level. Each design team on a project must be made aware of the safety issues it faces, and a small group of engineers with system-wide knowledge is needed to monitor safety issues. This group of safety engineers should have an understanding of the system as a whole and have system safety as their primary goal.

They must also have enough control of the design process to change the design if safety issues arise. Without such power, the safety engineers will be unable to deal with safety problems if those problems conflict with the design schedule or profit margins.

The second principle is that safety should be considered from the start of the design process. It is much easier to change ideas before a system has been built; in other words, the worst, most costly mistakes are made on the first day of design. Changes are always needed in a complex project, but they become more expensive later in the design cycle. Every important architectural change should be examined by the safety engineers for possible impact on system safety.

Complexity is the core challenge of any large project. Unfortunately, many of the hazardous systems we create are quite complicated, and our society depends on its nuclear power plants, oil refineries, jet aircraft, and automobiles. These systems become more complex with every generation. Modern industrial societies seem to drive the trend toward ever more complicated systems, and system engineers are forced to confront the problems that come with the added complexity.

Complexity affects the project at all levels, and any method that can reduce the complexity of the design should be considered. Safety is an extremely complex property because no single subsystem is responsible for it, and it is entirely relative to the environment in which the system operates. There is nothing close to a generalized way to make a system safe, especially for a unique product.

Software has added another level of complexity to modern systems because it is so difficult to verify that it works properly. Our ability to correctly design software does not match the power a software system can wield. It may seem that software has no power in a system, but the mechanical devices it controls can produce deadly effects if they are controlled improperly. It is clear, however, that simple systems are easier to make safe than complicated systems. Any technique that can reduce the amount of safety-related functionality in a system is useful.

Keeping the system software simple will make verification easier and may allow the designers to formally verify some parts of the safety system. A simple, clean design is also much easier to modify than a complicated one. Of course, sometimes the safest thing to do in an emergency is not the simplest, and even if software is correct it may still encounter situations for which it has no correct action.

Identifying the safety-critical sections of the software and focusing the development and testing effort on them can be beneficial in light of schedule pressures.

The trouble comes when the wrong safety-critical functions are chosen. Software alone should not be depended on to keep a system safe. Diversity in system safety features can help make up for software's frailties. There are many ways to introduce diversity into a system, and all of them should be considered if feasible. The goal of diversifying the system is to gain failure independence: if you add redundant systems that fail independently of each other, the overall reliability of the system will increase.
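
As a rough sketch of the failure-independence point (all probabilities invented for illustration, not taken from the book), the snippet below shows how the chance that every redundant channel fails on the same demand shrinks as independent channels are added:

```python
# Rough illustration of failure independence (hypothetical numbers).
# If redundant channels fail independently, the probability that the
# whole redundant set fails is the product of the individual
# probabilities, which shrinks rapidly as channels are added.

def prob_all_fail(per_channel_failure_prob: float, channels: int) -> float:
    """Probability that every independent channel fails on the same demand."""
    return per_channel_failure_prob ** channels

p = 0.01  # assumed probability that a single channel fails on demand
for n in (1, 2, 3):
    print(f"{n} channel(s): P(all fail) = {prob_all_fail(p, n):.0e}")
# 1 channel(s): P(all fail) = 1e-02
# 2 channel(s): P(all fail) = 1e-04
# 3 channel(s): P(all fail) = 1e-06
```

The multiplication only holds if the channels really are independent; common-cause failures are exactly what diversity is meant to guard against.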

The same principle applies to safety. If you have redundant safety systems with failure independence, it is more likely that at least one of them will work and do the correct thing in an emergency. One method is to diversify the software: I believe that having two pieces of software implemented with different levels of complexity in mind can help. A jet engine controller, for example, could use one highly optimized, fuel-efficient software control loop and also have a very simple backup control loop that is easy to test and possibly to prove correct.
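
A minimal sketch of that two-loop idea, assuming a hypothetical fuel-flow controller: the names, gains, and safe envelope below are invented, and the supervisor simply falls back to the simple loop whenever the optimized loop's command leaves the envelope.

```python
# Illustrative two-loop controller: an optimized primary loop plus a
# deliberately simple backup loop that is easier to test (and perhaps
# to prove correct). Hypothetical example; all limits are made up.

SAFE_MIN, SAFE_MAX = 0.0, 1.0   # assumed safe fuel-flow command range

def primary_loop(setpoint: float, measured: float) -> float:
    """Stands in for the complex, fuel-optimized control law."""
    return setpoint + 0.8 * (setpoint - measured)   # placeholder logic

def backup_loop(setpoint: float, measured: float) -> float:
    """Very simple proportional law: easy to review, test, and bound."""
    command = setpoint + 0.2 * (setpoint - measured)
    return min(max(command, SAFE_MIN), SAFE_MAX)    # always inside envelope

def control_step(setpoint: float, measured: float) -> float:
    """Use the primary loop, but fall back if its output is unsafe."""
    command = primary_loop(setpoint, measured)
    if SAFE_MIN <= command <= SAFE_MAX:
        return command
    return backup_loop(setpoint, measured)
```

The point is structural: the backup's small size is what makes exhaustive testing, or even formal verification, plausible.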

Software should also be used in conjunction with other types of safety systems. The most obvious choice is to use mechanical backups along with the safety software.

Mechanical devices fail differently than software and are easier to test. I do not believe there is ever a justifiable reason, at least from a safety standpoint, to remove them; when they are removed, it is usually about cost. There can be even more diversity in the sensing systems: alarms can be set off not just by redundant temperature sensors but by a combination of temperature and pressure sensors.

The diversity in what each sensor is reading may help the safety systems discover unsafe states more quickly. Use as much diversity in a system as can be afforded; it is one of the most effective ways to increase system safety and reliability.
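
One way to picture that kind of diverse sensing is alarm logic in which redundant temperature sensors vote and an independent pressure sensor can trip the alarm on its own; the thresholds and voting rule here are made up for illustration.

```python
# Illustrative alarm logic using diverse sensors (hypothetical thresholds).
# Redundant temperature sensors are combined by majority vote, and an
# independent pressure sensor can trip the alarm on its own, so a fault
# that blinds one kind of sensor need not blind the whole safety system.

TEMP_LIMIT_C = 120.0        # assumed alarm threshold, degrees Celsius
PRESSURE_LIMIT_KPA = 800.0  # assumed alarm threshold, kilopascals

def temperature_alarm(temps_c: list[float]) -> bool:
    """Majority of redundant temperature sensors above the limit."""
    over = sum(1 for t in temps_c if t > TEMP_LIMIT_C)
    return over * 2 > len(temps_c)

def pressure_alarm(pressure_kpa: float) -> bool:
    return pressure_kpa > PRESSURE_LIMIT_KPA

def unsafe_state(temps_c: list[float], pressure_kpa: float) -> bool:
    """Either diverse channel is enough to declare an unsafe state."""
    return temperature_alarm(temps_c) or pressure_alarm(pressure_kpa)

print(unsafe_state([118.0, 125.0, 131.0], 640.0))  # True: temperature vote trips
print(unsafe_state([90.0, 92.0, 91.0], 910.0))     # True: pressure alone trips
```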

Few tools exist to help the software engineer cope with safety problems, and many of the traditional methods of software engineering are not effective. Reliability engineering concentrates on component failure as the cause of accidents, and a variety of techniques, including redundancy and overdesign, are used to minimize such failures.

As early missile systems showed, however, losses may arise from interactions among system components; serious accidents have occurred when the system components were all functioning exactly as specified. The Mars Polar Lander loss is an example. Each component worked as specified but problems arose in the interactions between the landing leg sensors and the software logic responsible for shutting down the descent engines.

Reliability analysis considers only the possibility of accidents related to failures; it does not investigate potential damage that could result from successful operation of individual components. Software, ubiquitous in space systems today, is an important consideration here. In most software-related accidents, the software operates exactly as intended. Focusing on increasing the reliability with which the software satisfies its requirements will have little impact on system safety.

Reliability and safety may even conflict. Sometimes, in fact, increasing safety can decrease system reliability. Under some conditions, for instance, shutting down a system may be an appropriate way to prevent a hazard, even though each shutdown reduces availability and counts against the system's reliability.

That increasing reliability can diminish safety may be a little harder to see. For example, increasing the reliability (reducing the failure rate) of a tank by increasing the ratio of burst pressure to working pressure may result in worse losses if the tank does rupture at the higher pressure. System safety analyses start from hazards, not failures and failure rates, and include dysfunctional interactions among components and system design errors.
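
Returning to the tank example above, a crude way to see the trade-off is to treat risk as rupture probability times the loss if a rupture occurs. Every number below is hypothetical, and the assumption that a higher-pressure rupture causes a larger loss stands in for the extra stored energy released.

```python
# Hypothetical numbers only: comparing expected loss for two tank designs.
# Raising the burst-to-working-pressure ratio makes rupture less likely,
# but a rupture at the higher pressure is assumed to cause a larger loss,
# so a lower failure rate does not automatically mean lower risk.

def expected_loss(rupture_probability: float, loss_if_rupture: float) -> float:
    return rupture_probability * loss_if_rupture

baseline = expected_loss(rupture_probability=1e-3, loss_if_rupture=1.0)
stronger = expected_loss(rupture_probability=2e-4, loss_if_rupture=10.0)

print(f"baseline design: {baseline:.1e}")  # 1.0e-03
print(f"stronger tank:   {stronger:.1e}")  # 2.0e-03  (more 'reliable', yet riskier)
```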

The events leading to an accident may be a complex combination of equipment failure, faulty maintenance, instrumentation and control inadequacies, human actions, design errors, and poor management decision making. All these factors must be considered. System safety emphasizes analysis in addition to past experience and codes of practice. Standards and codes of practice incorporate experience and knowledge about how to reduce hazards, usually accumulated over long periods of time from previous mistakes.

While the use of such standards and learning from experience is essential in all aspects of engineering, including safety, the pace of change today does not always allow for such experience to accumulate. System safety analysis attempts to anticipate and prevent accidents and near misses before they occur, in addition to learning from the past.

Photo: witnesses seated at the table before the Senate Committee on Aeronautical and Space Sciences hearing on the Apollo 1 accident, including George E. Mueller, associate administrator for Manned Space Flight.
