A High Dependability Computing Consortium
James H. Morris, Dean
Carnegie Mellon School of Computer Science
April 13, 2000
This essay suggests that universities, government, and industry should
initiate a long-term research and education program to make computing and
communication systems dependable enough for people to trust with their everyday
lives and livelihoods.
Why is something needed?
We're about to reach the end of what might be known as the golden age of
personal computer software. Like the automobiles of the 1950's, the software of
the 1990's delighted and amused us despite its many flaws and shortcomings. In
the 1950's, what was good for the car industry was good for the US, an
argument that in some ways has applied to the software and
"dot com" industries in the 1990's. As with car quality in the 1950's, it is
widely argued that it is a disservice to stockholders to make software more
reliable than the market has demanded. Instead of solid engineering values,
fancy features and horsepower are the two factors used to sell computing
systems. While this euphoric era of desktop computing will be remembered fondly
by many, its days are numbered.
The current era of desktop computing will pass soon, just as it did for
automobiles when the combination of oil shortages and Japanese manufacturing
prowess threw Detroit from the leading force in the economy into part of the
rust belt. It will only be possible to pin down the triggering factors for the
demise of the current "golden era" of computer software in retrospect. But it
seems likely that the shift to a new era will involve factors such as the
globalization of the software business, the adoption of desktop computers as an
essential business tool rather than an occasional productivity enhancer, and the
continuing proliferation of computers into embedded systems that form the new
infrastructure of our society. The issue of what is eventually said to cause the
transition to this new era of computing systems is, however, not as important as
the fact that it is inevitable in our changing world.
A central theme of the new era of computing will be an absolute requirement
for high dependability (but, without the traditionally exorbitant price tag
usually associated with critical systems). Public computing fiascoes such as
probes crashing into Mars or on-line services of all sorts going belly up for
hours at a time are only the tip of the iceberg of this need. Every one of us
personally experiences computing system meltdowns on a regular basis, and it
would be no surprise if our children develop a quick reflex for pressing
control-alt-delete before they've memorized their multiplication tables. While
stories of bad software killing people are still rare, they exist and may
portend the future. The lessons of the Y2K experience are twofold: such problems
can indeed be overcome by dint of extraordinary effort and expenditures, but
just as importantly, we rely upon computers far more than we fully realize until
we're forced to step back and take notice of the true situation.
The point is that our enthusiasm for computers has progressed so far that
our society is already completely committed to using them, and is becoming
utterly dependent on them working correctly and continuously. But, commercial
computer systems, as we currently build them, simply aren't worthy of our
unreserved trust and confidence.
A number of related, long-term trends draw attention to the need for high
dependability computing and communication (HDCC):
- Increased reliance on software to optimize everything from business
processes to engine fuel economy
- Relentlessly growing scale and complexity of systems
- Near-universal reliance on a commodity technology base that is not
specifically designed for dependability
- Growing stress on legacy architectures (both hardware and software) due to
ever-increasing performance demands
- Worldwide interconnectivity of systems
- Continual threats of malicious attacks on critical systems
What is it?
We propose to create a consortium of universities, government agencies, and
corporations, the High Dependability Computing and Communication Consortium
(HDCC), to undertake basic, empirical, and engineering research
aimed at making the creation and maintenance of computer systems a true
professional discipline comparable to civil engineering and medicine:
disciplines people stake their lives on without question.
It will have a permanent research and education program that transforms
computing practices over the next 50 years. The researchers and educators
should number about 500, contributed by the partner organizations.
It is envisioned to have a central base of operations in the San Francisco
Bay Area, but incorporate activities around the country and, as appropriate,
around the world in member organization locations.
The HDCC research agenda embodies four strategic goals.
Protect the Public. We must assure the nation's critical infrastructure
services upon which individual citizens depend. To meet this strategic goal, we
must identify and promote technologies that can increase confidence in the
safety, reliability, trustworthiness, security, timeliness, and survivability of
systems such as transportation systems and communications systems.
Protect the Consumer. We must find cost-effective means to gain assurance
that enables commercial products to meet certain minimum quality standards. This
includes expedited quality certification, validation, and verification;
shortened times to market; simplicity of use; plug-and-play interconnection;
lower lifecycle costs; and improved customer satisfaction. Confidence is needed
in consumer products and services. Such products could include "smart" cars,
medical devices, consumer electronics, business systems, smart houses, sensor
technologies, Global Positioning System (GPS) receivers, smart cards,
educational technologies, electronic commerce software packages, and digital
libraries.
Preserve Competitiveness. Software production is the ultimate
intellectual industry and there are few barriers to entry. Ten years ago we felt
beleaguered because the Japanese engineering culture seemed to be dominating us
in electronics and semiconductors. Wise men (Gordon Bell, for one) said we must
change the game; and, indeed, we did by making it a software/network game. But
now the game is clear to all and we can expect crushing competition, not only in
price but also in deep ideas. Educating more hackers will not solve our problem;
we must educate new generations of sophisticated software engineers backed by
new science to stay ahead in the global economic race.
Promote National Security. Dependability is most crucial to military
systems that are used to defend our national interests. National security will
require defense-in-depth protection services and assurance that those services
will perform as required. However, economic reality will dictate that these
services be accomplished using largely commercial rather than specially
developed technology.
The relentless pressure to keep up with "Internet Time" results in most
organizations using ad hoc approaches to survive on a daily basis, with
no time or energy left for long-term investments in surviving the coming months
and years. While such an approach can be made to work in the short term, it is
inherently inadequate at addressing trends over the span of years or decades.
Instead, it is vital that a concerted effort be made to prepare for downstream
problems in a number of key areas. The long-term scope will evolve as
appropriate to address the hard, long-term problems facing us. Current areas
of concern include:
- Use of off-the-shelf components: Most systems now rely heavily on the
use of commercial off-the-shelf (COTS) technology for hardware and/or software
for reasons of cost and time to market. Many current approaches to creating
dependable systems assume complete control and understanding of system
components, an assumption that is simply not
representative of the majority of systems that must be built. And, even if
complete understanding of components were possible, the marketplace is such
that components become obsolete and are replaced many times over during the
production and deployment life of many critical systems. New techniques are
urgently needed to create highly dependable systems from "black-box"
components that continually change. Previously useful approaches and simpler
forms of analysis (e.g., old notions of creating components based on
separation of concerns and creating systems based on synthesis rather than
component composition) no longer work for every situation.
- Use of complex, non-dependable components: Achieving high confidence
is becoming more difficult as systems become more complex. Today's trends of
large-scale use of component technology, increased integration, continuous
evolution, and larger scale are yielding more complex systems. Furthermore,
such systems are often built from complex components that are not inherently
dependable. Not only is it difficult to get such systems to work in the first
place; they also frequently exhibit unpredictable emergent
behaviors at inopportune moments. New ways to create dependable systems from
complex components are urgently needed.
- Hostile operating environments: Lacking adequate protection, today's
information and communications systems are being subjected to numerous
malicious attacks. New and advanced techniques are required to achieve
required levels of system integrity and availability. Protection against both
active and insider threats must be developed. Methods are needed for system
monitoring, detection, response, and recovery.
- Embedded Systems: Embedded computer systems are arguably both more
difficult to make dependable, and more in need of complete dependability.
Because they often do not have a human operator acting as a safety net,
embedded systems must achieve absolutely bulletproof operation over years or
decades. But because the actual amount of computational power used is
small, such systems are often perceived as easy to build and are often created
by engineers or technicians with no formal training in software engineering or
critical system design. Whereas desktop computers are built in the tens of
millions per year, embedded microcontrollers are produced in the billions,
soon to be tens of billions per year. The challenge is
how to scale high assurance methods down to the budgets, timelines, and skill
sets prevalent in the embedded system world.
- Ubiquitous critical systems: The days of critical systems being a
niche market are over. Many everyday safety-critical systems will soon have or
already have software in them. Consider, for example, a domestic hot water
heating system, which can cause scalding burns if it drifts even a few degrees
higher than its set point. Or, consider an Internet-based stock trading system
that can bankrupt a user who (foolishly) depends on typical response times
being available during a stock market meltdown. As we entrust our lives and
livelihoods to computers, many systems will effectively become critical. A
challenge here is how to proliferate good practice in highly dependable system
design to everyday practitioners rather than a few select critical system
designers in niche fields such as nuclear power and aerospace applications.
- Indirectly critical systems: As computer systems are becoming highly
complex, so is our society. While the number of critical systems is growing,
the number of indirectly critical systems also grows. For example, the
software that routes messages for a personal pager system becomes indirectly
critical when it transmits the page for an emergency room physician to respond
to a crisis. Similarly, database software becomes indirectly critical when it
identifies owners of vehicles subject to an urgent recall notice or is used to
look up emergency contact information. Even a simple word processor can become
mission critical if it crashes a few minutes before the courier pickup
deadline for a proposal submission. It is vital that even everyday, seemingly
non-critical, applications be raised to a higher level of dependability to
reduce the enormous hidden costs their unreliability levies on businesses and
individuals.
- International markets: The U.S. is not alone in its growing dependence
on computing throughout industries having safety-critical aspects. This is
especially true in transportation, health care, energy, and manufacturing
sectors. However, many areas do not have the technical and labor
infrastructures to support critical system operation. It will be imperative to
create dependable systems that can operate properly even with shortages of
repair parts, scarce availability of skilled operators/maintainers, and
erratically available infrastructure support.
Six research and education activities will contribute to the HDCC strategic
goals.
- Provide a sound theoretical, scientific and technological basis for
assured construction of safe, secure systems. To meet this goal, the
research agenda must:
- achieve the capability to specify, compose, analyze, and assess system
behavior,
- furnish the capability to enforce specific behavioral properties, and
- furnish the capability to be more predictably tolerant of specified
behavioral failures including malicious attack.
These are still hot topics in universities despite the general acceptance of
C (and perhaps, someday, Java) as do-everything programming languages.
Ultimately the proper and reliable functioning of a system depends upon people
describing their designs in a formal specification language. When the
language is shaky, the entire edifice is built on a soft foundation.
Special areas of interest include applications of logic, techniques for
designing and implementing programming languages, and formal specification and
verification of hardware and software systems. It is important to apply these
techniques to problems of realistic scale and complexity, for example:
implementation of high speed network communication software and application of
type theoretic principles in the construction of compilers for proof carrying
code. For Carnegie Mellon activities in principles of programming see http://www.cs.cmu.edu/Groups/pop/pop.html
- Develop hardware, software, and system engineering tools that incorporate
ubiquitous, application-based, domain-based, and risk-based assurance. To
meet this goal the HDCC research agenda must:
- furnish the methods, tools, and environments necessary for the design,
construction, and evaluation of behavioral enforcement mechanisms; and
- establish indicators and characteristics of overall system confidence in
the achieved behavioral properties gained through the application of such
methods, tools and environments.
Software Engineering has grown into a field of Computer Science in its own
right. Its aim is that systems constructed from software can attain the same
reliability and predictability as bridges and other symbols of engineering
excellence. At Carnegie Mellon much of the research and education in this field
is conducted by the Institute for Software Research (http://spoke.compose.cs.cmu.edu/isri/)
and the Software Engineering Institute (http://www.sei.cmu.edu/).
- Reduce the effort, time, and cost of assurance and quality certification
processes. To meet this goal, the HDCC research agenda must:
- furnish the means to improve the productivity of information system
design, development, and analysis,
- while simultaneously improving the levels of confidence that can be
achieved through such productivity enhancements.
The industrial use of system analysis and verification tools has been
limited, but university researchers have made considerable progress in producing
tools that find bugs in real hardware and software. So far, most of the success
has been in hardware where complexity is lower and specifications cleaner; but
there have been promising successes in software as well. For Carnegie Mellon
activities in formal systems see http://www.cs.cmu.edu/Groups/formal-methods/formal-methods.html
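The flavor of such tools can be conveyed with a toy sketch. The following is an illustration of the general idea of exhaustive state-space exploration, not a model of any particular tool: it enumerates every reachable state of a hypothetical two-process lock built from a non-atomic check-then-set, and tests whether both processes can reach the critical section at once.

```python
from collections import deque

# Each process is at one of three points:
#   0 = idle, 1 = has observed the lock free, 2 = in the critical section.
# The deliberate bug: checking the lock and setting it are separate steps.

def successors(state):
    """Yield every state reachable in one step by either process."""
    pcs, lock = state
    for i in range(2):
        pc = pcs[i]
        if pc == 0 and lock == 0:                  # observe lock free
            yield (pcs[:i] + (1,) + pcs[i+1:], lock)
        elif pc == 1:                              # set lock, enter
            yield (pcs[:i] + (2,) + pcs[i+1:], 1)
        elif pc == 2:                              # leave, release lock
            yield (pcs[:i] + (0,) + pcs[i+1:], 0)

def check(initial):
    """Breadth-first search of all reachable states; return a state that
    violates mutual exclusion (both processes at point 2), or None."""
    seen, frontier = {initial}, deque([initial])
    while frontier:
        state = frontier.popleft()
        if state[0] == (2, 2):                     # invariant violated
            return state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return None

violation = check(((0, 0), 0))
print(violation)  # → ((2, 2), 1): both processes in the critical section
```

Because the check and the set are not atomic, both processes can observe the lock free and enter together; the exhaustive search finds this race mechanically, which is precisely the kind of subtle bug such tools have uncovered in real hardware and software.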
- Understand the human problems in creating, maintaining, and using computer
systems. This has become a vital area of research as computers have become
ubiquitous. Seat-of-the-pants design might have been sufficient when the users
of computers were engineers, scientists, and programmers; but now a deep
understanding of human capabilities must be built into design because the
users are often very different from the designers. "Pilot error" is the most
frequently cited cause of airline mishaps, and "programmer error" is similarly
often the purported cause of software defects, except in the frequent case in
which problems are blamed on "user error". We need to understand and account
for the capabilities of both the designers and end users of systems. For
Carnegie Mellon activities in human-computer interaction see http://www.hcii.cmu.edu/.
- Provide measures of results. To meet this goal, the HDCC research agenda
must:
- develop measures of performance and measures of effectiveness for use in
quantifying and qualifying the progress of improvements in system-level
confidence that can be achieved through the application of HDCC
technologies.
- Further, the agenda must show through such measures that the benefits
achieved are cost effective.
One reason to do system fault discovery is to find a metric. Fault discovery
is only somewhat helpful as a debugging technique; it
is much more powerful as a quality assurance technique in support of building
dependable systems. For some Carnegie Mellon research in this area see http://www.ices.cmu.edu/ballista
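A minimal sketch can suggest how fault discovery yields a metric. Loosely in the spirit of robustness testing (the component and inputs below are hypothetical, not taken from any actual test suite), one probes a component with exceptional inputs and reports the fraction it survives:

```python
def parse_port(text):
    """Hypothetical component under test: parse a TCP port from a string."""
    value = int(text)                    # crashes outright on some inputs
    if not 0 <= value <= 65535:
        raise ValueError("port out of range")
    return value

# A battery of exceptional inputs: empty, out of range, wrong type, etc.
EXCEPTIONAL_INPUTS = ["80", "", "  ", "99999", "-1", "0x50", None, 3.14, "åäö"]

def robustness_score(component, inputs):
    """Return (survived, total), a crude dependability metric. A call
    survives if it returns normally or raises the documented ValueError;
    any other exception (e.g., TypeError) counts as a robustness failure."""
    survived = 0
    for x in inputs:
        try:
            component(x)
            survived += 1
        except ValueError:
            survived += 1                # graceful, documented rejection
        except Exception:
            pass                         # crash-style failure
    return survived, len(inputs)

print(robustness_score(parse_port, EXCEPTIONAL_INPUTS))
# → (8, 9): only the None input provokes an undocumented TypeError
```

The score itself, rather than the individual bugs it exposes, is the point: tracked over releases or compared across components, it quantifies dependability in support of building better systems.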
- Promote software engineering education. Currently, the de facto
software engineers emerging from universities come from departments of
computer science and engineering. Unfortunately the computer scientists are
often too theoretical while the engineers are often too hardware-oriented.
What is needed is professional education akin to what medical doctors receive,
but nobody is doing it. Both software engineering research and education must
have strong connections to practice: education needs a practical setting to
develop skill, and research needs access to real problems that expose the deep
issues involved in real-world development.
We should create an institution that serves software engineering as a
teaching hospital serves medicine. Students would learn in the context of real
cases. Clinical faculty would both practice and teach. Research would exploit
access to real cases and data. We would provide a development laboratory in
which real software developers produce real software for real clients.
Developers would interact with researchers to infuse the research agenda with
visibility into real problems and take advantage of research
results. Students would learn through direct experience in a real, not just
"realistic", setting.
Clinical faculty would be skilled professional software developers and have
significant responsibilities for both teaching and software development.
Who Should Participate
As shapers of the future, universities should address the software quality
problem now, before the world at large sees a crisis. Just as Johns Hopkins
led a
reform in medical practice in the early 20th century, we can lead a reform in
software practice now. Fortunately, this effort needn't begin from scratch
because computer scientists and academic software engineers have always taken
the issue of software quality seriously. Computer science's first gift to
industry was the programming language, which has now been thoroughly digested
and exploited. It's time to continue that tradition with a practical, but
comprehensive way to create and operate dependable systems.
The university members should include Carnegie Mellon, Cornell, ETH-Zurich,
Karlsruhe, MIT, Stanford, Georgia Tech, and the Universities of California and
Washington. Collectively, these schools have a diverse group of researchers
already attacking the problem and a strong commitment to engineering.
For inspiration, look to a 15th century character, Prince Henry the Navigator
of Portugal. He was the first great program manager. Intent on finding a sea
route to India, he founded schools for navigators and sponsored research into
shipbuilding. Columbus et al. were the ultimate instruments of his
foresighted plan. He died long before 1492. While the government's role should
not be to seek silver bullets for any one problem, government has a
definite role to play in leading and creating a real movement.
The government agency members should include:
- NASA because it has extraordinary requirements for high-assurance
systems.
- DARPA because of its 50-year commitment to computer science research
and the military's need for high-assurance systems.
The major event in the last twenty years in the computer field is that the
industry has taken the lead in the creation of real systems. The academically
oriented ACM Software System Award has been going to industrial projects
since 1982: UNIX, System R, and the Alto System, to name a few. Some of Software
Engineering's academic leaders (Fred Brooks, Barry Boehm, Watts Humphrey, and
David Garlan) developed their insights in industrial settings and then moved to
continue work in academe. It is essential that experienced engineers from
industry contribute their wisdom to subsequent generations. The following
companies have expressed an interest in the project: SUN, Novell, IBM, Oracle,
Adobe, Cisco, Apple, Autodesk, Amdahl, Micromuse, KLA-Tencor, 3Com, The
Barksdale Group, Veritas, Cadence, Conner, Inktomi, BEA Systems, and Symantec.