E. H. Shortliffe    5P41-RR00785-14

II. Description of Program Activities

This section corresponds to the predefined forms required by the Division of Research Resources (DRR) to provide information about our resource activities for its computerized retrieval system. These forms have been submitted separately and are not reproduced here, to avoid redundancy with the more extensive narrative information about our resource and its progress provided in this report.

II.A. Scientific Subprojects

Our core research and development activities are described starting on page 16, our training activities are summarized starting on page 77, and the progress of our collaborating projects is detailed starting on page 107.

II.B. Books, Papers, and Abstracts

The list of recent publications for our core research and development work starts on page 61; publications for the collaborating projects are listed in the individual project reports starting on page 107.

II.C. Resource Summary Table

The details of resource usage, including a breakdown by subproject, are given in the tables starting on page 79.

III. Narrative Description

III.A. Summary of Research Progress

III.A.1. Resource Overview

This is the annual report for year 14 of the SUMEX-AIM resource (grant RR-00785), the first year of a three-year renewal period supporting further research on applications of artificial intelligence in biomedicine. For both technical and administrative reasons, we merged into the June 1985 SUMEX renewal application the continuation of work on the development and dissemination of medical consultation systems (ONCOCIN), which had been supported as resource-related research under grant RR-01631. Progress on core ONCOCIN research is therefore now reported here as well.
These combined efforts represent an ambitious research program to:

• Continue our long-range core research on knowledge-based systems, aimed at developing the new concepts and methodologies needed for biomedical applications.
• Substantially extend ONCOCIN research on developing and disseminating clinical decision support systems.
• Develop the core system technology to move the national SUMEX-AIM community from dependence on the central SUMEX DEC 2060 to a fully distributed, workstation-based computing environment.
• Introduce these system technologies into the SUMEX-AIM community, with appropriate communications and managerial assistance, so that the central resource and DEC 2060 mainframe can be phased out responsibly and in a manner that supports community efforts to become self-sustaining and to continue scientific interactions through fully distributed means.
• Maintain our aggressive efforts at training and dissemination to help exploit the research potential of this field.

III.A.1.1. SUMEX-AIM as a Resource

SUMEX and the AIM Community

In the fourteen years since the SUMEX-AIM resource was established in late 1973, computing technology and biomedical artificial intelligence research have undergone a remarkable evolution, and SUMEX has both influenced and responded to these changing technologies. It is widely recognized that our resource has fostered highly influential work in biomedical AI, work from which much of the expert systems field emerged, and that it has simultaneously helped define the technological base of applied AI research.

The SUMEX-AIM resource continues to emphasize research on artificial intelligence techniques that guide the design of computer programs to help with the acquisition, representation, management, and use of the many forms of medical knowledge in diverse biomedical research and clinical care settings, ranging from biomolecular structure determination and analysis, to molecular biology, to clinical decision support, to medical education. At the same time, we have long recognized that the ultimate impact of this work in biomedicine will be realized through its assimilation with the full range of methodologies of medical informatics, such as data bases, biostatistics, human-computer interfaces, complex instrument control, and modeling.

From the start, SUMEX-AIM work has been grounded in real-world applications, such as systems for interpreting mass spectral data on biomolecular structures, chemical synthesis, interpretation of x-ray diffraction data on crystals, cognitive modeling, infectious disease diagnosis and therapy, DNA sequence analysis, experiment planning and interpretation in molecular biology, and medical instruction. Our current work extends this emphasis in application domains such as oncology protocol management, clinical decision support, protein structure analysis, and data base information retrieval and analysis. All of these research efforts have demanded close collaborations with diverse parts of the biomedical research community and the integration of many computational methods from those domains with knowledge-based approaches.

Although the "AI-in-medicine" community was quite small at the beginning, it is no longer limited and easily defined; it has spread and is now inextricably linked with the many biomedical application communities with which we have collaborated over the years. Driven both by the ongoing diffusion of AI and by the development of personal computer workstations, which signal the practical decentralization of computing resources, we must develop new resource communication and distributed computing technologies that will continue to facilitate wider intra- and inter-community communication, collaboration, and sharing of biomedical information.
The SUMEX Project has demonstrated that it is possible to operate a computing research resource with a national charter, and that the services it could provide over networks were the ones that would facilitate the growth of AI-in-Medicine. SUMEX now has a reputation as a model national resource, pulling together the best available interactive computing technology, software, and computer communications in the service of a national scientific community. Planning groups for national facilities in cognitive science, computer science, and biomathematical modeling have discussed and studied the SUMEX model, and new resources, such as the recently instituted BIONET resource for molecular biologists, are closely patterned after the SUMEX example.

The projects SUMEX supports have generally required substantial computing resources with highly interactive access. Even today, with the growing but by no means ubiquitous availability of workstations, such computing power is hard to obtain at all but a few universities. SUMEX is, in a sense, a "great equalizer": a scientist gains access by virtue of the quality of his/her research ideas, not by the accident of where s/he happens to be situated. In other words, the resource follows the ethic of the scientific journal.

SUMEX has demonstrated that a computer resource is a useful "linking mechanism" for bringing together, and holding together, teams of experts from different disciplines who share a common problem focus. AI concepts and software are among the most complex products of computer science, and historically it has not been easy for scientists in other fields to gain access to and mastery of them. Yet the collaborative outreach and dissemination efforts of SUMEX have bridged the gap in numerous cases. More than 36 biomedical AI application projects have developed in our national community and have been supported by SUMEX computing resources over the years.
Nine of these have matured to the point that they now continue their research on facilities outside SUMEX. For example, the BIONET resource (named GENET while at SUMEX) is operated by IntelliGenetics; the CADUCEUS project splits its research work among its own IBM PC workstations, a VAX computer, and the SUMEX resource; and the Chemical Synthesis project now operates entirely on a VAX at U.C. Santa Cruz.

The integration of AI ideas with other parts of medical informatics, and their dissemination into biomedicine, is happening largely because of the development in the 1970's and early 1980's of methods and tools for applying AI concepts to difficult, professional-level problem solving. Their impact was heightened by the demonstration, in various areas of medicine and other life sciences, that these methods and tools really work. SUMEX has played a key role here, so much so that it is regarded as "the home of applied AI." SUMEX has been the nursery, as well as the home, of such well-known AI systems as DENDRAL (chemical structure elucidation), MYCIN (infectious disease diagnosis and therapy), INTERNIST (differential diagnosis), ACT (human memory organization), ONCOCIN (cancer chemotherapy protocol advice), SECS (chemical synthesis), EMYCIN (rule-based expert system tool), and AGE (blackboard-based expert system tool). In the past four years, our community has published a dozen books that give a scholarly perspective on the scientific experiments we have been performing. These volumes, and other work done at SUMEX, have played a seminal role in structuring modern AI paradigms and methodology.

III.A.1.2. The Future of SUMEX-AIM

Given this background, what is the future need and course for SUMEX as a resource, especially in view of the ongoing revolution in computer technology and costs and the emergence of powerful single-user workstations and local area networking?
The answers remain clear.

Basic Research in AI in Biomedicine

At the deepest research level, despite our considerable success with medical and biological applications, the problems we can attack are still sharply limited. Our current ideas fall short in many ways when measured against today's important health care and biomedical research problems, problems brought on by the explosion in medical knowledge and with which AI should be able to assist. Just as the research work of the 70's and 80's in the SUMEX-AIM community fuels today's practical and commercial applications, our work of the late 80's will be the basis for the next decade's systems. The report of the panel on medical informatics [12], convened late in 1985 by the National Library of Medicine to review and recommend twenty-year goals for the NLM, listed among its highest-priority recommendations the need to greatly expand and aggressively pursue an interdisciplinary research program to develop computational methods for acquiring, representing, managing, and using biomedical knowledge of all sorts for health care and biomedical research. These are precisely the problems on which the SUMEX-AIM community has been working so successfully, and they will require work well beyond the five-year funding period we have requested. It is essential that this line of research in the SUMEX-AIM community, represented by our core AI research, the ONCOCIN research, and our collaborative research groups, be continued.

The Changing Role of the Central Resource

At the resource level, the active AIM research community has changing, but still intense, needs for computing resources to continue its work over the next five years. The workstations to which we directed our attention in 1980 have now demonstrated their practicality as research tools and, increasingly, as potential mechanisms for disseminating AI systems as cost-effective decision aids in clinical settings such as private offices.
Over the next half decade, we expect the era of highly centralized, general-purpose machines for AI research to come to an end and to be replaced gradually by networks of distributed but heterogeneous single-user machines sharing common information resources and communication paths among members of the biomedical research community.

Many of our community groups still depend on the SUMEX-AIM resources. For those that have been able to take advantage of newly developed local computing facilities, SUMEX-AIM provides a central crossroads for communications and the sharing of programs and knowledge. In its core research and development role, SUMEX-AIM has its sights set on the hardware and software systems of the next decade. We expect that the distributed computing environments just now emerging will need major changes before we can make effective use of their power and adapt them to the development and dissemination of biomedical AI systems for professional user communities. In its training role, SUMEX is a crucial resource for educating the badly needed new researchers and professionals who will continue the development of the biomedical AI field. The "critical mass" of the existing physical SUMEX resource, its development staff, and its intellectual ties with the Stanford Knowledge Systems Laboratory make this an ideal setting in which to integrate, experiment with, and export these methodologies for the rest of the AIM community.

At the beginning, the SUMEX community was small and idea-limited, and the central SUMEX computer facility was an ideal vehicle for the research. Now the community is large, and the momentum of the science is such that its progress is limited by computing power and research manpower.
The size and scientific maturity of the SUMEX community have fully consumed the computing resource in every critical dimension (CPU power, main memory size, address space, and file space), and its needs have overflowed to decentralized machines of many types. Much of our work has already focused on developing and experimenting with workstation environments for biomedical AI applications, and we are fully committed to continuing this line of research as the future hardware thrust of the resource. We will continue our experimental approach to these systems, relying on real experience rather than articles of faith. We must learn to build and exploit distributed networks of these machines and to build and manage graceful software for these systems. Since decentralization is central to our future, we must learn its technical characteristics.

The resource development directions we have sketched have received substantial external impetus as well [12, 2, 7]. For example, another of the key recommendations of the NLM medical informatics planning panel [12] was that high-speed network communication links be established throughout the biomedical research community so that knowledge and information can be shared across diverse research groups and the required interdisciplinary collaborations can take place. A principal goal of SUMEX-AIM from the start has been to experiment with these electronic links, but SUMEX is only a start toward this broad goal. Nevertheless, it continues to be an important pathfinder in developing the technology and community interaction tools needed to expand community system and communication resources.

Highlights of Long-term Goals

• Maintain the synergistic relationship among SUMEX core system development, core AI research, our experimental efforts at disseminating clinical decision-making aids, and new applications efforts.
• Continue to serve the national AIM research community, less and less as a source of raw computing cycles and more and more as a transfer point for new technologies important for community research and communication. We will also continue our coordinating role within the community through electronic media and periodic AIM workshops.
• Maintain our connections to ARPANET, TELENET, and our local Ethernet, and assist other community members in establishing similar links by example, by integrating and providing enabling software, and by offering advice and support within our resources.
• Focus new computing resource developments on more effective exploitation of distributed workstations through better communication and cooperative computing tools, using transparent digital networking schemes.
• Enhance the computing environments of workstations so that only minimal dependence on central, general-purpose computing hosts remains and these mainframe time-sharing systems can eventually be phased out. Remaining central resources will include servers for communications, community information resources, and special computing architectures (e.g., shared- or distributed-memory symbolic multiprocessors) justified by cost-effectiveness and unique functionality.
• Incrementally phase in, disseminate, and evaluate those aspects of the local distributed computing resource that are necessary for continuing national AIM community support within this distributed paradigm. This will ultimately point the way toward the distributed computing resource model that we believe will interlink this community well into the next decade.
• Gradually and responsibly phase out the existing DEC 2060 machine as effective distributed computing alternatives become widely available. We expect this to be possible during the fourth or fifth year of the continuation resource.
• Continue the central staff and management structure, essentially unchanged in size and function during the five-year transition period, except for the merging of the core part of the ONCOCIN research with the SUMEX resource.

III.A.2. Resource Definitions and Goals

SUMEX-AIM is a national computer resource with a multiple mission: a) promoting experimental applications of computer science research in artificial intelligence (AI) to biological and medical problems, b) studying methodologies for the dissemination of biomedical AI systems into target user communities, c) supporting the basic AI research that underlies applications, and d) facilitating network-based computer resource sharing, collaboration, and communication among a national scientific community of health research projects. The SUMEX-AIM resource is physically located in the Stanford University Medical School and serves as a nucleus for a community of medical AI projects at universities around the country. SUMEX provides computing facilities tuned to the needs of AI research, as well as communication tools to facilitate remote access, inter- and intra-group contacts, and the demonstration of developing computer programs to biomedical research collaborators.

III.A.2.1. Knowledge-Based System Research

The SUMEX Project has given strong impetus to the development of knowledge-based system research in biomedicine. Knowledge-based system research is that part of computer science that investigates symbolic reasoning processes and the representation of symbolic knowledge for use in inference.¹ A knowledge-based or expert system is a computer program that uses knowledge and inference procedures to solve problems that are difficult enough to require significant human expertise for their solution.
For some fields of work, the knowledge necessary to perform at such a level, plus the inference procedures used, can be thought of as a model of the expertise of the field's expert practitioners. The knowledge of an expert system consists of facts and heuristics. The facts constitute a body of information that is widely shared, publicly available, and generally agreed upon by experts in a field. The heuristics are the mostly private, little-discussed rules of good judgment (rules of plausible reasoning and of good guessing) that characterize expert-level decision-making in the field. Our work views heuristic knowledge as being of equal importance with factual knowledge, indeed as the essence of what we call expertise. The performance level of an expert system is primarily a function of the size and quality of the knowledge base that it possesses.

Projects in the SUMEX-AIM community are concerned in some way with the application of AI to biomedical research. Brief abstracts of the various projects currently using the SUMEX resource can be found in Appendix B, and more detailed progress summaries in Section IV. The most tangible objective of this approach is the development of computer programs that will be more general and effective consultative tools for the clinician and medical scientist. All of these research efforts have demanded close collaborations with diverse parts of the biomedical research community and the integration of many computational methods from those domains with knowledge-based approaches. We have long recognized that the ultimate impact of this work in biomedicine will be realized through its assimilation with the full range of methodologies of medical informatics, including, for example, data base research, biostatistics, decision support, complex instrument control, and modeling.
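The distinction drawn above between a knowledge base (facts plus heuristic rules) and a general inference procedure can be made concrete with a very small sketch. The Python below is a hypothetical illustration, not code from any SUMEX-AIM system: it assumes knowledge is encoded as IF-THEN rules over symbolic facts and uses simple forward chaining, whereas systems such as MYCIN reasoned backward from goals and attached certainty factors to their conclusions. The names `Rule`, `forward_chain`, `FACTS`, and `RULES`, and the placeholder facts, are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A heuristic IF-THEN rule: if all premises hold, assert the conclusion."""
    name: str
    premises: frozenset[str]
    conclusion: str


def forward_chain(facts: set[str], rules: list[Rule]) -> tuple[set[str], list[str]]:
    """Apply rules repeatedly until no rule adds a new fact.

    Returns the enlarged fact set and the names of the rules that fired,
    in firing order, as a simple trace of the reasoning.
    """
    known = set(facts)
    fired: list[str] = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # A rule fires only when every premise is already known
            # and its conclusion would be new.
            if rule.premises <= known and rule.conclusion not in known:
                known.add(rule.conclusion)
                fired.append(rule.name)
                changed = True
    return known, fired


# Hypothetical placeholder knowledge; these symbols are NOT clinical content.
FACTS = {"finding_a", "finding_b"}
RULES = [
    Rule("r1", frozenset({"finding_a", "finding_b"}), "intermediate_c"),
    Rule("r2", frozenset({"intermediate_c"}), "hypothesis_x"),
]

if __name__ == "__main__":
    conclusions, trace = forward_chain(FACTS, RULES)
    print(sorted(conclusions), trace)
```

Because `forward_chain` itself never changes, what the program concludes depends entirely on the rule set: dropping rule `r2` leaves `hypothesis_x` underived. This is a small-scale echo of the point that an expert system's performance is primarily a function of its knowledge base rather than of its inference machinery.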
There have already been promising results in many application areas, even though state-of-the-art programs are far more narrowly specialized and inflexible than the corresponding aspects of human intelligence they emulate. Needless to say, much is yet to be learned in the process of fashioning a coherent scientific discipline out of the experimental programs, mathematical procedures, and emerging theoretical structure that make up knowledge-based system research.

¹ Many introductory and survey texts on AI and knowledge-based or expert systems have been written by now. See, for example, [1, 11, 13, 5, 23, 4, 18].

III.A.2.2. Resource Sharing

An equally important function of the SUMEX-AIM resource is to explore the use of computer communications as a means of interaction and sharing among geographically remote research groups engaged in biomedical computer science research, and for the dissemination of AI technology. This facet of scientific interaction is becoming increasingly important with the explosion of complex information sources and the regional specialization of groups and facilities that might be shared by remote researchers [10, 3]. As projected, we are also seeing a growing decentralization of computing resources with emerging microelectronics technology, and a correspondingly greater role for digital communications in facilitating scientific exchange.

Our community-building effort is based on the developing state of distributed computing and communications technology. While far from perfected, these capabilities offer powerful tools for collaborative linkages, both within a given research project and among projects. A number of the active projects on SUMEX are based on the collaboration of computer and medical scientists at geographically separate institutions, separate both from each other and from the computer resource (see, for example, the MENTOR and PathFinder projects).
In the early 1970's, the initial model for SUMEX-AIM as a centralized resource was based on the high cost of powerful computing facilities and the infeasibility of duplicating them readily. This central role has already evolved significantly and continues to change with the introduction of more compact and inexpensive computing technology, now available at many more research sites. At the same time, the number of active groups working on biomedical AI problems has grown, and the established ones have increased in size. This has led to a growth in the demand for computing resources far beyond what SUMEX-AIM could reasonably and effectively provide on a national scale. We have therefore turned our core systems research toward actively supporting the development of distributed computing and communications resources to facilitate collaborative project research and continued inter-group communications. Thus, as more remotely available resources have become established, the balance of SUMEX-AIM resource use has shifted toward supporting start-up pilot projects and the growing AI research community at Stanford.

III.A.2.3. Significance and Impact in Biomedicine

Artificial intelligence is the computer science of representations of symbolic knowledge and its use in symbolic inference and problem-solving processes. For computer applications in medicine and biology, this research path is crucial. Medicine and biology are not presently mathematically based sciences; unlike physics and engineering, they are seldom capable of exploiting the mathematical characteristics of computation. They are essentially inferential, not calculational, sciences. If the computer revolution is to affect biomedical scientists, it will be through computers used as inferential aids. The growth in medical knowledge has far surpassed the ability of a single practitioner to master it all, so the computer's superior information-processing capacity has a natural appeal.
Furthermore, the reasoning processes of medical experts are poorly understood; attempts to model expert decision-making necessarily require a degree of introspection and structured experimentation that may, in turn, improve the quality of the physician's own clinical decisions, making them more reproducible and defensible. The resulting insights may also allow us to teach medical students and house staff the techniques for reaching good decisions more adequately, rather than merely offering a collection of facts that they must independently learn to use coherently.

Perhaps the larger impact on medicine and biology will be the exposure and refinement of the hitherto largely private heuristic knowledge of the experts in the various fields studied. The ethic of science that calls for the public exposure and criticism of knowledge has traditionally been flawed for want of a methodology to evoke and give form to the heuristic knowledge of scientists. AI methodology is beginning to fill that need: heuristic knowledge can be elicited, studied, critiqued by peers, and taught to students.

The importance of AI research and its applications is increasing generally, independent of the specific areas of biomedical interest. AI is one of the principal fronts along which university computer science groups are expanding. The pressure from student career choices is great: to cite an admittedly special case, approximately 80% of the students applying to Stanford's computer science Ph.D. program cite AI as a possible field of specialization (up from 30% a few years ago). Federal and industrial support for AI research is vigorous and growing, although support specifically for biomedical applications continues to be limited.
All of the major computer manufacturers (e.g., IBM, DEC, TI, UNISYS, HP, and others) are using and marketing AI technology aggressively, and many software companies are putting more and more products on the market. Many other parts of industry are also actively pursuing AI applications in their own contexts, including defense and aerospace companies, manufacturing companies, financial companies, and others.

Despite the limited research funding available, there is also an explosion of interest in medical AI. The American Association for Artificial Intelligence (AAAI), the principal scientific membership organization for the AI field, has 7000 members, over 1000 of whom belong to the medical special interest group known as AAAI-M. Speakers on medical AI are prominently featured at professional medical meetings, such as those of the American College of Pathology and the American College of Physicians; a decade ago, the words "artificial intelligence" were never heard at such conferences. At medical computing meetings, such as the annual Symposium on Computer Applications in Medical Care (SCAMC) and the international MEDINFO conferences, the growing interest in AI and the rapid increase in papers on AI and expert systems are further testimony to the impact the field is having.

AI is beginning to have a similar effect on medical education. Organizations as diverse as the National Library of Medicine, the American College of Physicians, the Association of American Medical Colleges, and the Medical Library Association have all called for sweeping changes in medical education, increased educational use of computing technology, enhanced research in medical computer science, and career development for people working at the interface between medicine and computing. They all cite evolving computing technology and AI research such as that of SUMEX-AIM as key motivators.
At Stanford, we have vigorous special programs for student training and research in AI: a new graduate program in Medical Information Sciences and the two-year Master's Degree in AI program. Both have many more applicants than available slots. Demand for their graduates, in both academic and industrial settings, is so high that students typically begin to receive solicitations one or two years before completing their degrees.

III.A.2.4. Summary of Current Resource Goals

The following outlines the specific objectives of the SUMEX-AIM resource during the current three-year award period, begun in August 1986. It provides an overall research plan for the resource and the backdrop against which specific progress is reported. Note that these objectives cover only the resource nucleus; objectives for individual collaborating projects are discussed in their respective reports in Section IV.

Specific aims are broken into five categories: 1) Technological Research and Development, 2) Collaborative Research, 3) Service and Resource Operations, 4) Training and Education, and 5) Dissemination.

1) Technological Research and Development

SUMEX funding and computational support for core research complement similar funding from other agencies (including DARPA, NASA, NSF, NLM, private foundations, and industry) and contribute to the long-standing interdisciplinary effort at Stanford in basic AI research and expert system design. We expect this work to provide the underpinnings for increasingly effective consultative programs in medicine and for more practical adaptations of this work within emerging microelectronic technologies. Specific aims include:

• Basic research on AI techniques applicable to biomedical problems.
  Over the next term we will emphasize work on blackboard problem-solving frameworks and architectures, knowledge acquisition or learning, constraint satisfaction, and qualitative simulation.
• Investigate methodologies for disseminating application systems, such as clinical decision-making advisors, into user groups. This will include generalized systems for acquiring, representing, and reasoning about complex treatment protocols such as those used in cancer chemotherapy, which might also be used for clinical trials.
• Support community efforts to organize and generalize AI tools and architectures that have been developed in the context of individual application projects. This will include retrospective evaluations of systems like the AGE blackboard experiment and work on new systems such as BB1, MRS, SOAR, EONCOCIN, EOPAL, Meta-ONYX, and architectures for concurrent symbolic computing. The objective is to evolve a body of software tools that can be used to build future knowledge-based systems more effectively and to explore other biomedical AI applications.
• Develop more effective workstation systems to serve as the basis for research, biomedical application development, and dissemination. We seek to coordinate basic research, application work, and system development so that the AI software we develop for the next 5-10 years will be appropriate to the hardware and system software environments we expect to be practical by then. Our purchases of new hardware will be limited to experimentation with state-of-the-art workstations as they become available for our system developments.

2) Collaborative Research

• Encourage the exploration of new applications of AI to biomedical research and improve mechanisms for inter- and intra-group collaborations and communications. While AI is our defining theme, we may consider exceptional applications justified by some other unique feature of SUMEX-AIM essential for important biomedical research.
We will continue to exploit community expertise and sharing in software development.
• Minimize administrative barriers to the community-oriented goals of SUMEX-AIM and direct our resources toward purely scientific goals. We will retain the current user funding arrangements for projects working on SUMEX facilities. User projects will fund their own manpower and local needs, actively contribute their special expertise to the SUMEX-AIM community, and receive an allocation of computing resources under the control of the AIM management committees. We will begin charging "fees for service" to Stanford users as DRR support for the DEC 2060 is phased out. Fees to national users will be delayed as long as financially possible.
• Provide effective and geographically accessible communication facilities to the SUMEX-AIM community for remote collaborations, communications among distributed computing nodes, and experimental testing of AI programs. We will retain the current ARPANET and TELENET connections for at least the near term and will actively explore other advantageous connections to new communications networks and to dedicated links.

3) Service and Resource Operations

SUMEX-AIM does not have the computing or manpower capacity to provide routine service to the large community of mature projects that has developed over the years. Their computing needs are better met by developing their own computing resources when justified. Thus, the primary focus of SUMEX-AIM is assisting new start-up or pilot projects in biomedical AI applications, in addition to conducting its core research in the setting of a sizable number of collaborative projects. We do offer continuing support for projects through the lengthy process of obtaining funding to establish their own computing base.

4) Training and Education

• Provide documentation and assistance to help users make effective use of resource facilities and systems.
• Exploit particular areas of expertise within the community to assist in the development of pilot efforts in new application areas.
• Accept visitors in Stanford research groups within the limits of manpower, space, and computing resources.
• Support the Medical Information Sciences and MS/AI student programs at Stanford to increase the number of research personnel available to work on biomedical AI applications.
• Support workshop activities, including collaboration with other community groups on the AIM community workshop and with individual projects on more specialized workshops covering specific research, application, or system dissemination topics.

5) Dissemination

While collaborating projects are responsible for the development and dissemination of their own AI systems and results, the SUMEX resource will work to provide community-wide support for dissemination efforts in areas such as the following:
• Encourage, contribute to, and support the ongoing export of software systems and tools within the AIM community and for commercial development.
• Assist in the production of video tapes and films depicting aspects of AIM community research.
• Promote the publication of books, review papers, and basic research articles on all aspects of SUMEX-AIM research.

III.A.3. Details of Technical Progress

This section gives an overview of progress for the nucleus of the SUMEX-AIM resource. A more detailed discussion of our progress in specific areas and related plans for further work is presented beginning in Section III.A.3.2. Objectives and progress for individual collaborating projects are discussed in their respective reports in Section IV. These collaborative projects collectively provide much of the scientific basis for SUMEX as a resource, and our role in assisting them continues the pattern established in past years.
Collaborating projects are autonomous in their management and provide their own manpower and expertise for the development and dissemination of their AI programs.

III.A.3.1. Progress Highlights

In this section we summarize highlights of SUMEX-AIM resource activities over the past year (May 1986 - April 1987), focusing on the resource nucleus.

• We have made significant progress in the core ONCOCIN research work to generalize the tools for clinical trial management beyond the initial cancer chemotherapy application. We began examining the structure of protocols across several medical subspecialties other than cancer chemotherapy, concentrating this year on insulin treatment of diabetes. Graphical tools are under development to facilitate protocol definition and knowledge base entry, and we worked on model-based reasoning to infer therapeutic actions not explicitly encoded in the protocol decision plan. We have also continued to examine the issues of disseminating the ONCOCIN system into actual clinical settings.
• We made significant progress in core AI research, primarily in the areas of knowledge representation, blackboard frameworks, parallel symbolic computing architectures, and machine learning. Work has advanced on the representation of explicit strategic knowledge for problem solving and of blackboard control knowledge, including the cost/benefit trade-offs of increasingly complex control reasoning. The parallel architectures work has developed a flexible, instrumented simulator of distributed-memory multiprocessor architectures and two alternative parallel blackboard frameworks for expressing application problems. These have been applied to several signal-understanding problems, with promising, nearly linear problem-solving speedup. The machine learning work has concentrated on explanation-based generalization and chunking in the SOAR framework, inductive rule learning, and tools for debugging knowledge structures.
Work has also continued on reasoning with uncertainty, seeking ways to combine formal and informal approximate reasoning methods. We also continued work on extending and refining the BB1 blackboard system.
• We have made excellent progress on the core system development work aimed at supporting the distributed AIM community. We have continued implementation of uniform network protocol standards for remote workstation access, redirected our virtual graphics work to take advantage of the X window protocol being adopted by many workstation vendors, and implemented prototype communication tools that integrate text and graphics between linked machines. We have concentrated on the NFS protocol for distributed file access, and experimental versions of NFS and the underlying remote procedure call facilities are working or under way for all of our workstations. An additional service is being implemented to allow remote database queries, through remote procedure calls, to a standard relational database. We have a prototype distributed electronic mail system working on Xerox D-machines and will extend it and port it to other environments shortly. We have also made important progress in extending the general computing environments for text processing, file management, printing, communications, and other services on specific workstations, including support for six different operating system environments.
• We have continued the dissemination of SUMEX-AIM technology through various media. We have reorganized the distribution system for our AI software tools (EMYCIN, AGE, MRS, SACON, and BB1) to academic, industrial, and federal research laboratories, making it more efficient and less demanding of research staff time.
We have also continued to distribute to outside groups video tapes of some of our research projects, including ONCOCIN, and an overview tape of Knowledge Systems Laboratory work. Our group has continued to publish actively on the results of our research, including more than 45 research papers per year in the AI literature and a dozen books in the past five years on various aspects of SUMEX-AIM AI research.
• The Medical Information Sciences program, begun at Stanford in 1983 with Professor Shortliffe as Director, has continued its strong development over the past year. The specialized curriculum offered by the MIS program focuses on training a new generation of researchers able to develop improved computer-based solutions to biomedical needs. The program was made feasible in large part by the prior work and the research computing environment provided by the SUMEX-AIM resource. It has recently received an enthusiastic endorsement from the Stanford Faculty Senate for an additional five years; it has been awarded renewed postdoctoral training support from the National Library of Medicine, with high praise from the reviewing study section for the training and contributions of the SUMEX-AIM environment; and it has received additional industrial and foundation grants for student support. This past year, MIS students have published many papers, including several that have won conference awards.
• While the SUMEX-AIM computing resource hardware has been largely unchanged this past year, we continue to evaluate new workstation technologies that may benefit the AIM community. We continue to operate the DEC 2060 mainframe and file servers for the community. Because of the broad mix of research in the SUMEX-AIM community, no single computer vendor can meet our needs, so we have undertaken long-term support of a heterogeneous computing environment incorporating many types of machines linked through multiprotocol Ethernet facilities.
• We have continued to recruit new user projects and collaborators to explore further biomedical areas for applying AI. A number of these projects are built around the communications network facilities we have assembled, bringing together medical and computer science collaborators from remote institutions and making their research programs available to still other remote users. At the same time, we have encouraged older, mature projects to build their own computing environments, thereby freeing up SUMEX resources for newer projects.
• In June 1986, we moved the SUMEX and Medical Computer Science offices into the newly constructed Stanford Medical School Office Building, funded by the university. This space provides us with almost twice the area we previously occupied, and it is laid out to promote better interactions between our groups and among our students and research staff.
• SUMEX user projects have made good progress in developing and disseminating effective consultative computer programs for biomedical research. These systems provide expertise in areas such as cancer chemotherapy protocol management, clinical diagnosis and decision making, and molecular biology. We have worked hard to meet their needs and are grateful for their expressed appreciation (see Section IV).

III.A.3.2. Core ONCOCIN Research

ONCOCIN is a data-management and therapy-advising program for complex cancer chemotherapy experiments. Development of the system began in 1979, following the successful generalization of MYCIN into the EMYCIN expert system shell. The ONCOCIN project has evolved over the last eight years. The original version of ONCOCIN ran on the time-shared DEC computers, using a standard terminal for the time-oriented display of patient data.
The current version runs on compact, single-user workstations connected to the SUMEX Ethernet network, with large bit-mapped displays for presentation of patient data. The project has also expanded in scope. There are three major research components: 1) ONCOCIN, the therapy planning program and its graphical interface; 2) OPAL, a graphical knowledge entry system for ONCOCIN; and 3) ONYX, a strategic planning program designed to give advice in complex therapy situations. Each of these research components has been split into two parts: continued development of the cancer therapy version of the system, and generalization of the component for use in other areas of medicine. This section concentrates on the three core research topics derived from our applied work: 1) design of therapy planning systems for use in clinical trial experiments (E-ONCOCIN), 2) implementation of knowledge acquisition systems for clinical trials, and 3) development of general approaches to strategic therapy planning. The work on continued development of the ONCOCIN cancer chemotherapy advisor itself is described separately in Section IV.A.3.

1 - Overview of the ONCOCIN Therapy Planning System

ONCOCIN is an advanced expert system for clinical oncology. It is designed for use after a diagnosis has been reached and focuses on assisting with the management of cancer patients who are receiving chemotherapy. Because anticancer agents tend to be highly toxic, and because their tumor-killing effects are routinely accompanied by damage to normal cells, the rules for monitoring and adjusting treatment in response to a given patient's course over time tend to be complex and difficult to memorize. ONCOCIN integrates a temporal record of a patient's ongoing treatment with an underlying knowledge base of treatment protocols and rules for adjusting dosage, delaying treatment, aborting cycles, ordering special tests, and similar management details.
The program uses such knowledge to help physicians with decisions regarding the management of specific patients.

A major lesson of past work in clinical computing has been the need to develop methods for integrating a system smoothly into the patient-care environment for which it is intended. In the case of ONCOCIN, the goal has been to provide expert consultative advice as a by-product of the patient data management process, thereby sparing physicians the need to go out of their way to obtain advice. Oncologists are intended to use ONCOCIN routinely for recording and reviewing patient data on the computer's screen, regardless of whether they feel they need decision-making assistance. This process replaces the conventional recording of data on a paper flowsheet and thus seeks to avoid being perceived as an additional task. Using its knowledge of the patient's chemotherapy protocol, ONCOCIN then provides assistance by suggesting appropriate therapy at the time that the day's treatment is to be recorded on the flowsheet. Physicians maintain control of the decision, however, and can override the computer's recommendation if they wish. ONCOCIN also indicates the appropriate interval until the patient's next treatment and reminds the physician of radiologic and laboratory studies required by the treatment protocol.

This core research report begins with our efforts to extend the techniques of ONCOCIN for use in other areas of medicine (E-ONCOCIN).

2 - E-ONCOCIN: Domain-Independent Therapy Planning

During this past year, our E-ONCOCIN research has concentrated on understanding how protocols in medicine vary across subspecialties. We felt that insulin treatment of diabetes would be a good area to explore. Like cancer chemotherapy, treatments for diabetes continue over long periods of time and have been the subject of intensive protocol development.
Unlike cancer chemotherapy, however, a diabetes treatment plan must handle multiple doses over the course of a single day and places less emphasis on drug combinations (although there are a variety of types of insulin). Other challenges of the diabetes area include consideration of multiple goals, such as finding the "normal dose" of insulin versus adjusting for short-term trends. Diabetes treatment plans must also be flexible enough to take into account diet and exercise patterns and their effects on insulin requirements.

We performed knowledge acquisition sessions about insulin treatment of diabetes, using the medical literature and several internists in the Medical Computer Science research group (Mark Frisse, Mark Musen, and Michael Kahn). The proposed structure for the knowledge base was implemented using the object-oriented programming language on which ONCOCIN is based. These experiments, like those involving the addition of more protocols to ONCOCIN, demonstrated the need for changes in the way the knowledge base accesses the time-oriented database that stores patient data and previous conclusions. The relationships between the different doses and types of insulin treatment will also require alternative ways of building treatment hierarchies. Thus, our initial experiments have shown that many elements of the ONCOCIN design are sufficiently general for other application areas, but that some specific elements (particularly the representation of temporal events) will have to be generalized. During the coming year, we will continue our knowledge acquisition experiments and design a version of the E-ONCOCIN system that is separate from the ongoing "clinic version."

3 - OPAL: Graphical Knowledge Acquisition Interface

OPAL is a graphical environment for use by an oncologist who wishes to enter a new chemotherapy protocol for use by ONCOCIN or to edit an existing protocol.
Although the system is designed for use by oncologists who have been trained in its use, it does not require an understanding of the internal representations or reasoning strategies used by ONCOCIN. The system may be used in two interactive modes, depending on the type of knowledge to be entered. The first permits the entry of a graphical description of the overall flow of the therapy process. The oncologist manipulates boxes on the screen that stand for various steps in the protocol. The resulting diagram is then translated by OPAL into computer code for use by ONCOCIN. Thus, by drawing a flow chart that describes the protocol schematically, the physician is effectively programming the computer to carry out the procedure appropriately when ONCOCIN is later used to guide the management of a patient enrolled in that protocol.

OPAL's second interactive mode permits the oncologist to describe the details of the individual events specified in the graphical description. For example, the rules for administering a given chemotherapy will vary greatly depending upon the patient's response to earlier doses, intercurrent illnesses and toxicities, hematologic status, etc. Figure 1 shows one of the forms provided by OPAL for this type of specification. It permits the entry of an attenuation schedule for an agent based upon the patient's white count and platelet count at the time of treatment. Tables such as this are generally found in the written version of chemotherapy protocols. Thus, OPAL permits oncologists to enter information using familiar forms displayed on the computer's screen. The contents of such forms are subsequently translated into rules and other knowledge structures for use by ONCOCIN.
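To make the translation from form to rules more concrete, the following is a minimal Python sketch of how an attenuation schedule like the one in Figure 1 could be evaluated and expanded into IF-THEN rules. It is an illustration only, not OPAL's implementation (OPAL is based on Interlisp). The threshold values mirror the sample form, treating each band as a half-open interval (a value on a boundary falls in the higher band) is an assumption, and the names `attenuate`, `to_rules`, `describe`, `WBC_BANDS`, `PLATELET_BANDS`, and `SCHEDULE` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional

# Hypothetical encoding of a Figure 1-style attenuation form.
# Band lower bounds in ascending order (counts x 1000). Assumption: a value
# exactly on a boundary falls in the higher band (half-open intervals).
WBC_BANDS = [2.5, 3.0, 3.5]       # <2.5 | 2.5-3.0 | 3.0-3.5 | >=3.5
PLATELET_BANDS = [75, 100, 150]   # <75 | 75-100 | 100-150 | >=150

# SCHEDULE[wbc_band][platelet_band]: fraction of the standard dose, or None
# for "Delay". Columns run from the lowest to the highest platelet band.
SCHEDULE: List[List[Optional[float]]] = [
    [None, None, None, None],     # WBC < 2.5
    [None, None, None, None],     # WBC 2.5-3.0
    [None, None, None, 0.75],     # WBC 3.0-3.5
    [None, None, 0.75, 1.00],     # WBC >= 3.5
]

def describe(factor: Optional[float]) -> str:
    """Render one cell the way the form displays it."""
    return "Delay" if factor is None else f"{round(factor * 100)}% of STD"

def attenuate(wbc: float, platelets: float) -> str:
    """Look up the recommended action for one drug at one visit."""
    row = bisect_right(WBC_BANDS, wbc)
    col = bisect_right(PLATELET_BANDS, platelets)
    return describe(SCHEDULE[row][col])

def to_rules() -> List[str]:
    """Expand the table into one IF-THEN rule per cell."""
    wbc_labels = ["WBC < 2.5", "2.5 <= WBC < 3.0", "3.0 <= WBC < 3.5", "WBC >= 3.5"]
    plt_labels = ["PLT < 75", "75 <= PLT < 100", "100 <= PLT < 150", "PLT >= 150"]
    rules = []
    for i, w in enumerate(wbc_labels):
        for j, p in enumerate(plt_labels):
            rules.append(f"IF {w} AND {p} THEN {describe(SCHEDULE[i][j])}")
    return rules

print(attenuate(3.2, 160))  # prints 75% of STD
```

Here `attenuate(3.2, 160)` falls in the 3.0-3.5 WBC band and the >= 150 platelet band. Keeping the schedule as data rather than as hand-written code is what allows a form-based editor to change it without reprogramming the reasoning component.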
Drug Combination: POCC     Subcycle: A     Drug: PROCARBAZINE

                                Platelets (x 1000)
WBC (x 1000)    >= 150         100 - 150      75 - 100    < 75
>= 3.5          100% of STD    75% of STD     Delay       Delay
3.0 - 3.5       75% of STD     Delay          Delay       Delay
2.5 - 3.0       Delay          Delay          Delay       Delay
< 2.5           Delay          Delay          Delay       Delay

Figure 1: A Sample OPAL Form

Status of the OPAL System

OPAL is one of the few graphical knowledge acquisition systems ever designed for expert systems. Even fewer are designed to be used as the main method for entering knowledge, as opposed to a proof-of-concept implementation. We have pursued three directions in the development of the OPAL system, driven in part by the large number of protocols entered through this system during the last year. The first direction is the modification of graphical forms to allow the entry of facts that did not appear in the protocols used to test the initial version of OPAL. OPAL continues to assume that most of the knowledge to be entered will have highly stereotyped forms; e.g., dose attenuations for most treatment toxicities are based on a comparison of only one laboratory measurement at a time, such as using the BUN to adjust for renal toxicity. We sometimes need much more complex ways of stating the scenarios in which dose adjustments may be necessary. This need has led us in a second direction, toward a "lower-level" rule entry approaching the syntax of the reasoning component of ONCOCIN, but using graphical input devices where applicable. A prototype version of this rule entry system has been completed and will soon be evaluated as an adjunct to the basic OPAL system.

The OPAL program maps the information provided on the graphical forms into a complex data structure (called the IDS) that is used to represent the contents of the protocol.
The data structure is used for copying information from one protocol to another and as the basis for the creation of the ONCOCIN knowledge base. Our experiments with OPAL, and our intention to generalize OPAL for use outside of oncology protocols, suggested that we reorganize the OPAL program to use a relational database to store its knowledge. We have patterned the database after an existing database query syntax. Because no relational database management systems exist for the Interlisp language on which OPAL is based, we reimplemented the database from its written description. The database structure is now almost complete; we have begun to design a revised IDS for chemotherapy protocols and will determine how an IDS would be created for other areas of medicine (e.g., the insulin example being used in the E-ONCOCIN experiments).

Our ability to use the OPAL system for specifying oncology treatments has led us to the design of a new program, named PROTEGE, that will turn an interactive session with an expert and a knowledge engineer into the specification of an OPAL-like system for clinical trials in a wide range of medical areas. We have implemented several prototype forms for PROTEGE. These forms are used to specify a general description of the application area. Of particular importance is the need to specify how the therapy planning process will take place, e.g., how the initial dosage of a drug will be combined with various dosage adjustments due to treatment toxicities to form the final recommended dose. Most of this type of "procedural" knowledge is not entered in the OPAL system and must be hand-coded by the knowledge engineer. A Ph.D. thesis on PROTEGE is in progress by Mark Musen, M.D., and will be completed during the next year.

4 - ONYX: Strategic Therapy Planning

Although the knowledge of cancer chemotherapy is rich and complex, protocols seldom refer directly to underlying models of drug action.
The guidelines in a protocol are, rather, high-level composite descriptions of expert advice, based on the study designers' experience as well as on biological models of the therapeutic agents and their mechanisms of action. We have observed, however, that when protocols fail to cover a complex clinical situation that arises for a given patient, expert oncologists turn to underlying mechanistic models and use them to assist in the decision-making process. ONCOCIN has no such knowledge; it must therefore occasionally decline to make a recommendation and instead refer the physician to the study chairman for a decision about how to manage a particular complex problem. It is accordingly a long-range goal to add model-based, expert-level reasoning to ONCOCIN's performance.

Our research in model-based reasoning is embodied in a program known as ONYX. This system is based on the observation that creative planning strategies in the oncology domain (and many other fields) appear to involve a three-step process: (1) heuristic generation of a small number of plans, i.e., plausible responses to the problem at hand; (2) mental simulation (also called "envisionment") of how the patient would respond over time if each of those plans were carried out; and (3) selection of a preferred plan based upon the likelihood of the various possible outcomes and the value placed on those outcomes by the patient and physician. Step 2 in this process involves patient-specific simulation of tumor pathophysiology and drug action, but it also depends on recognizing that the outcomes of interventions cannot be predicted with certainty and that probabilistic predictions are more realistic. Thus, model-based probabilistic simulations in ONYX are coupled to a decision-analytic module that assists with the third step in the process.

The work outlined here is preliminary. Each of the components in ONYX may be generalized for use in other systems.
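The generate, simulate, and select steps can be illustrated with a small Python sketch. Everything in it is hypothetical: ONYX's pathophysiologic models are far richer, and the plan names, per-cycle transition probabilities, and utilities below are toy values chosen only to make the mechanics visible. As an assumption of the sketch, step 2 is modeled as a simple Markov chain over three coarse patient states, and step 3 as maximization of expected utility.

```python
from typing import Dict, Tuple

Dist = Dict[str, float]  # probability of being in each coarse patient state

# Step 1 (hypothetical): candidate plans, each summarized by the per-cycle
# state transitions it is assumed to produce. Toy numbers only.
PLANS: Dict[str, Dict[str, Dist]] = {
    "standard dose": {
        "tumor":     {"tumor": 0.5, "remission": 0.3, "toxicity": 0.2},
        "remission": {"remission": 0.9, "tumor": 0.1},
        "toxicity":  {"toxicity": 1.0},
    },
    "reduced dose": {
        "tumor":     {"tumor": 0.7, "remission": 0.2, "toxicity": 0.1},
        "remission": {"remission": 0.85, "tumor": 0.15},
        "toxicity":  {"toxicity": 1.0},
    },
}

# Inputs to step 3: utilities elicited from patient and physician (toy values).
UTILITY: Dict[str, float] = {"remission": 1.0, "tumor": 0.4, "toxicity": 0.0}

def simulate(transitions: Dict[str, Dist], start: str, cycles: int) -> Dist:
    """Step 2: propagate the probability distribution over states, cycle by cycle."""
    dist: Dist = {start: 1.0}
    for _ in range(cycles):
        nxt: Dist = {}
        for state, p in dist.items():
            for succ, q in transitions[state].items():
                nxt[succ] = nxt.get(succ, 0.0) + p * q
        dist = nxt
    return dist

def expected_utility(dist: Dist) -> float:
    """Weight each final state's utility by its probability."""
    return sum(p * UTILITY[state] for state, p in dist.items())

def choose_plan(start: str = "tumor", cycles: int = 3) -> Tuple[str, Dict[str, float]]:
    """Step 3: pick the plan whose simulated outcomes have the highest expected utility."""
    scores = {name: expected_utility(simulate(t, start, cycles))
              for name, t in PLANS.items()}
    best = max(scores, key=scores.get)
    return best, scores

best, scores = choose_plan()
```

With these toy numbers and three cycles, the standard dose scores about 0.535 and the reduced dose about 0.532, so `choose_plan()` returns "standard dose". The narrow margin shows how sensitive such a choice can be to the elicited utilities, and why a bare utility number is a weak explanation of the selection.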
We have concentrated our work on the decision analysis component. We are building tools that will allow experts to frame the comparison between several possible treatments that could be administered at one point in a patient's course. Often these treatments will be variations on the standard treatment, but with reduced dosages or a delayed time of treatment. An important part of the treatment decision concerns the patient's evaluation of the possible outcomes and their likelihood, as represented in the utility of the various plans. The program we have built carries out a dialog with the patient to assess the utilities, builds a decision tree, and prints out the "best" choice. A graphical representation of the decision problem is built on the computer display as the dialog takes place.

A major problem with decision analysis programs has been the way the choice is explained to the user. Often, the answer is in the form of one utility number for each choice. Most computer systems for decision trees allow the user to see how much the utilities will change as the probabilities of the expected events are modified. What is not available is an explanation, in English, of why one choice is better than another. As part of his Ph.D. research, Curtis Langlotz has built a system that can create a rationale for the selection. The program compares various parts of the decision tree, looking for differences in the problem structure that account for the variation in the final utilities. This explanation program has been tested with several decision problems from different areas of medicine: treatment of heart disease, antibiotic selection, and cancer treatment.

5 - Implementation of the ONCOCIN Workstation in the Stanford Clinic

In mid-1986, we placed the workstation version of ONCOCIN into the Oncology Day Care clinic.
This version is a completely different program from the version of ONCOCIN that was available in the clinic from 1981 to 1985: it uses protocols entered through the OPAL program, a new graphical data entry interface, and a revised knowledge representation and reasoning component. One person in the clinic (Andy Zelenetz) became primarily responsible for making sure that our design goals for this version of ONCOCIN were met. His suggestions included the addition of key protocols and the ability to make the program useful to clinicians as a data management tool even when the complete treatment protocol had not yet been entered into the system. Both suggestions were carried out during this year, and the program has achieved wider use in the clinic. In addition, laser-printed flowsheets and progress notes have been added to the clinic system.

The process of entering a large number of treatment protocols in a short period of time led to other research topics, including the design of an automated system for producing meaningful test cases for each knowledge base, modification of the design of the time-oriented database and of the methods for accessing it, and the development of methods for graphically viewing multiple protocols that are combined into one large knowledge base. These research efforts will continue into the next year. In addition, some of the treatment regimens developed for the original mainframe version are still in use and can be transferred to the new version of ONCOCIN; this conversion will also be undertaken in the next year. As the knowledge base grows, additional mechanisms will be needed for the incremental update and retraction of protocols. We also developed new insights about the design of the internal structures of the knowledge base (e.g., the relationship between the ways we refer to chemotherapies, drugs, and treatment visits).
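One simple way to produce test cases automatically for a table-driven rule is boundary-value analysis: probe each threshold just below and exactly at its value, then take all combinations across parameters. The Python sketch below illustrates that idea for a two-parameter attenuation table. It is an assumption about one reasonable approach, not a description of the system under design; the thresholds, the step sizes (`eps`), and the function names are hypothetical.

```python
from bisect import bisect_right
from itertools import product
from typing import List, Sequence, Set, Tuple

def probe_values(bounds: Sequence[float], eps: float) -> List[float]:
    """For each band boundary b, probe just below it (b - eps) and exactly at it."""
    values = set()
    for b in bounds:
        values.add(round(b - eps, 6))  # rounding removes float noise such as 2.9000000000000004
        values.add(b)
    return sorted(values)

def generate_cases(wbc_bounds: Sequence[float], plt_bounds: Sequence[float],
                   wbc_eps: float = 0.1, plt_eps: float = 1.0) -> List[Tuple[float, float]]:
    """All combinations of boundary probes for the two laboratory values."""
    return list(product(probe_values(wbc_bounds, wbc_eps),
                        probe_values(plt_bounds, plt_eps)))

def cells_covered(cases, wbc_bounds, plt_bounds) -> Set[Tuple[int, int]]:
    """Which (WBC band, platelet band) cells of the table the cases reach."""
    return {(bisect_right(wbc_bounds, w), bisect_right(plt_bounds, p)) for w, p in cases}

# Hypothetical thresholds (counts x 1000), as in a WBC/platelet attenuation form.
WBC_BOUNDS = [2.5, 3.0, 3.5]
PLT_BOUNDS = [75, 100, 150]
cases = generate_cases(WBC_BOUNDS, PLT_BOUNDS)
```

For three WBC and three platelet thresholds this yields 36 cases that together reach all 16 cells of the table and exercise both sides of every boundary. Each case could then be run through the knowledge base and the recommendation checked against the written protocol.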
We will continue to optimize the question-asking procedure, improve the method for traversing the plan structure in the knowledge base, and consider alternative representations for the structure of chemotherapy plans. Although we have based our review of the ONCOCIN design primarily on the data provided by additional protocols, we know that non-cancer therapy problems may raise similar issues. The E-ONCOCIN effort is designed to produce a domain-independent therapy planning system that incorporates the lessons learned from our oncology research.

6 - Personnel

The development of the generalized version of each of the ONCOCIN components has been undertaken by a large group of computer scientists and physicians. Samson Tu has had primary responsibility for the extensions to the design of the knowledge base, and Clifford Wulfman for the extensions to the data entry interface. David Combs has had primary responsibility for the knowledge acquisition interface. Janice Rohn has been involved with protocol and data management and has primary responsibility for the implementation of the program that sets up the ONCOCIN user environment. Christopher Lane has developed the object-oriented systems software on which the entire ONCOCIN system is built.

III.A.3.3. Core AI Research

1 - Rationale

Artificial intelligence (AI) methods are particularly appropriate for aiding in the management and application of knowledge because they apply to information represented symbolically as well as numerically, and to reasoning with judgmental rules as well as logical ones. They have been applied to medical and biological problems for over a decade with considerable success.
This success reflects the fact that, of all the computing methods known, AI methods are the only ones that deal explicitly with symbolic information and problem solving, and with knowledge that is heuristic (experiential) as well as factual.

Expert systems are one important class of applications of AI to complex problems in medicine, science, engineering, and elsewhere. An expert system is one whose performance level rivals that of a human expert because it has extensive domain knowledge (usually derived from a human expert); it can reason about its knowledge to solve difficult problems in the domain; it can explain its line of reasoning much as a human expert can; and it is flexible enough to incorporate new knowledge without reprogramming. Expert systems draw on the current stock of ideas in AI, for example, about representing and using knowledge. They are adequate for capturing problem-solving expertise for many bounded problem areas. Numerous high-performance expert systems have resulted from this work in such diverse fields as analytical chemistry, medical diagnosis, cancer chemotherapy management, VLSI design, machine fault diagnosis, and molecular biology. Some of these programs rival human experts in solving problems in particular domains, and some are being adapted for commercial use. Other projects have developed generalized software tools for representing and utilizing knowledge (e.g., EMYCIN, UNITS, AGE, MRS, BB1, and GLISP) as well as comprehensive publications such as the three-volume Handbook of Artificial Intelligence and books summarizing lessons learned in the DENDRAL and MYCIN research projects. There is considerable power in the current stock of techniques, as exemplified by the rate of transfer of ideas from the research laboratory to commercial practice. But we also believe that today's technology needs to be augmented to deal with the complexity of medical information processing.
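The properties listed above (domain knowledge held as rules, reasoning over those rules, explanation of the line of reasoning, and extension without reprogramming) can be illustrated with a toy forward-chaining rule interpreter in Python. The rules and certainty factors are hypothetical. The combining function follows the EMYCIN-style certainty-factor scheme; the 0.2 premise threshold and the "rule CF times weakest premise" strength are MYCIN-style conventions adopted here as assumptions. To keep the sketch short, each rule fires at most once, in list order, so rules must be listed after the rules that establish their premises.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Rule:
    name: str
    premises: Tuple[str, ...]
    conclusion: str
    cf: float  # certainty factor in [-1, 1]

def combine(x: float, y: float) -> float:
    """EMYCIN-style combination of two certainty factors for one conclusion."""
    if x >= 0 and y >= 0:
        return x + y * (1 - x)
    if x < 0 and y < 0:
        return x + y * (1 + x)
    denom = 1 - min(abs(x), abs(y))
    if denom == 0:
        raise ValueError("cannot combine definite evidence for and against")
    return (x + y) / denom

THRESHOLD = 0.2  # a premise counts as established only above this CF (assumption)

def run(rules: List[Rule], facts: Dict[str, float]):
    """Forward-chain over the rules; return CFs and the rules that fired per conclusion."""
    cf = dict(facts)
    trace: Dict[str, List[Rule]] = {}
    fired = set()
    changed = True
    while changed:
        changed = False
        for r in rules:
            if r.name in fired or not all(cf.get(p, 0.0) > THRESHOLD for p in r.premises):
                continue
            strength = r.cf * min(cf[p] for p in r.premises)
            cf[r.conclusion] = combine(cf.get(r.conclusion, 0.0), strength)
            trace.setdefault(r.conclusion, []).append(r)
            fired.add(r.name)
            changed = True
    return cf, trace

def explain(goal: str, trace: Dict[str, List[Rule]], depth: int = 0) -> List[str]:
    """Answer "how was this concluded?" by walking back through the fired rules."""
    lines = []
    for r in trace.get(goal, []):
        lines.append("  " * depth + f"{goal} because {r.name}: " + " and ".join(r.premises))
        for p in r.premises:
            lines.extend(explain(p, trace, depth + 1))
    return lines

# Hypothetical toy knowledge base.
RULES = [
    Rule("R1", ("wbc_low",), "myelosuppression", 0.6),
    Rule("R2", ("platelets_low",), "myelosuppression", 0.5),
    Rule("R3", ("myelosuppression",), "attenuate_dose", 0.8),
]

cf, trace = run(RULES, {"wbc_low": 1.0, "platelets_low": 0.8})
print("\n".join(explain("attenuate_dose", trace)))
```

Here the two independent pieces of evidence combine to a certainty of 0.76 for myelosuppression, which in turn supports dose attenuation with certainty 0.608. Adding a new rule to `RULES` changes the conclusions and the explanation without any change to `run` or `explain`.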
Our core research goals, as outlined in the next section, are to analyze the limitations of current techniques and to investigate methods for overcoming them. Long-term success of computer-based aids in medicine and biology depends on improving the programming methods available for representing and using domain knowledge. That knowledge is inherently complex: it contains mixtures of symbolic and numeric facts and relations, many of them uncertain; it contains knowledge at different levels of abstraction and in seemingly inconsistent frameworks; and it links examples and exception clauses with rules of thumb as well as with theoretical principles. Current techniques have been successful only insofar as they severely limit this complexity. As applications become more far-reaching, computer programs will have to deal more effectively with richer expressions and much larger amounts of knowledge.

This report documents progress on the basic or core research activities within the Knowledge Systems Laboratory (KSL), funded in part under the SUMEX resource as well as by other federal and industrial sources. This work explores a broad range of basic research ideas in many application settings, all of which contribute in the long term to improved knowledge-based systems in biomedicine.