SUMEX

STANFORD UNIVERSITY MEDICAL EXPERIMENTAL COMPUTER RESOURCE

RR-00785

ANNUAL REPORT - YEAR 08

Submitted to
BIOTECHNOLOGY RESOURCES PROGRAM
NATIONAL INSTITUTES OF HEALTH

June 1, 1981

STANFORD UNIVERSITY SCHOOL OF MEDICINE
Edward A. Feigenbaum, Principal Investigator


DEPARTMENT OF HEALTH AND HUMAN SERVICES
PUBLIC HEALTH SERVICE
NATIONAL INSTITUTES OF HEALTH
DIVISION OF RESEARCH RESOURCES
BIOTECHNOLOGY RESOURCES PROGRAM

ANNUAL PROGRESS REPORT

PHS GRANT NUMBER: P41RR-00785-08
TITLE OF GRANT: S U MEDICAL EXPERIMENTAL COMPUTER RESOURCE (SUMEX)
NAME OF RECIPIENT INSTITUTION: Stanford University
HEALTH PROFESSIONAL SCHOOL: School of Medicine
REPORTING PERIOD:
    5a. FROM: August 1, 1980
    5b. TO: July 31, 1981
PRINCIPAL INVESTIGATOR:
    6a. NAME: Edward A. Feigenbaum, Ph.D.
    6b. TITLE: Professor and Chairman, Department of Computer Science
    6c. SIGNATURE: [handwritten signature]
DATE SIGNED: [handwritten, illegible]
TELEPHONE: (415) 497-4079


Table of Contents

Section                                                            Page

List of Figures
I.       Narrative Description
I.A      Summary of Research Progress
I.A.1    Overview of Objectives and Rationale
I.A.1.1  What is Artificial Intelligence
I.A.1.2  Resource Sharing
I.A.1.3  Impact of AI in Biomedicine
I.A.2    Synopsis of Recent Progress
I.A.3    Details of Technical Progress
I.A.3.1  Facility Hardware
I.A.3.2  System Software                                             17
I.A.3.3  Network Communications                                      19
I.A.3.4  User Software                                               25
I.A.3.5  Documentation and Education                                 26
I.A.3.6  Software Compatibility and Sharing                          26
I.A.3.7  Core Research                                               28
I.A.3.8  Resource Operations Statistics                              29
I.A.3.9  SUMEX Staff Publications                                    46
I.A.3.10 Future Plans                                                47
I.B      Highlights                                                  64
I.B.1    Handbook of Artificial Intelligence                         64
I.B.2    Tutorial on AI in Clinical Medicine                         66
I.B.3    GENET - Dissemination of AI Tools for Molecular Genetics    67
I.B.4    AGE - A Tool for Knowledge-Based System Development         69
I.B.5    ONCOCIN - An Oncology Chemotherapy Advisor                  71
I.C      Administrative Changes                                      73
I.D      Resource Management and Allocation                          74
I.D.1    Management Committees                                       74
I.D.2    New Project Recruiting                                      75
I.D.3    Stanford Community Building                                 76
I.D.4    Existing Project Reviews                                    76
I.D.5    Resource Allocation Policies                                77
I.E      Dissemination Efforts                                       79
I.E.1    Sixth AIM Workshop                                          79
I.E.2    Tutorial on AI in Medicine                                  82
I.E.3    GENET - An Experiment in AI System Dissemination            84
I.F      Comments on the Biotechnology Resources Program             88
II.      Description of Scientific Subprojects                       89
II.A     Scientific Subprojects                                      89
II.A.1   Stanford Projects                                           90
II.A.1.1 AGE - Attempt to Generalize                                 91
II.A.1.2 AI Handbook Project                                         99
II.A.1.3 DENDRAL Project                                            103
II.A.1.4 EXPEX Project                                              129
II.A.1.5 MOLGEN Project                                             136
II.A.1.6 MYCIN Projects Group                                       144
II.A.1.7 Protein Structure Project                                  166
II.A.1.8 RX Project                                                 172
II.A.2   National AIM Projects                                      181
II.A.2.1 Acquisition of Cognitive Procedures (ACT)                  182
II.A.2.2 CADUCEUS Project (INTERNIST)                               187
II.A.2.3 Hierarchical Models of Human Cognition                     194
II.A.2.4 PUFF-VM Project                                            201
II.A.2.5 Rutgers Computers in Biomedicine Project [Rutgers-AIM]     212
II.A.2.6 Simulation of Cognitive Processes                          226
II.A.2.7 SECS - Simulation and Evaluation of Chemical Synthesis     231
II.A.2.8 SOLVER Project                                             239
II.A.3   Pilot Stanford Projects                                    245
II.A.3.1 Protein Secondary Structure Project                        246
II.A.3.2 Ultrasonic Imaging Project                                 249
II.A.3.3 DECIDER Project: The Psychology of Expert Judgment         257
II.A.4   Pilot AIM Projects                                         259
II.A.4.1 AI-COAG: Coagulation Expert Project                        260
II.A.4.2 DATA Project                                               265
II.A.4.3 EXCHANGE Project                                           272
II.A.4.4 MELANOMA Project                                           275
Books, Papers, and Abstracts                                        276
Resource Summary Table                                              277
Appendix A  Community Growth and Project Synopses                   278
Appendix B  AI Handbook Outline                                     808
Appendix C  AIM Management Committee Membership                     809
References


List of Figures

Figure                                                             Page

 1. Current SUMEX-AIM KI-10 Computer Configuration                   14
 2. Current SUMEX-AIM 2020 Computer Configuration                    15
 3. Intermachine Connections via ETHERNET                            16
 4. ARPANET Geographical Network Map                                 23
 5. ARPANET Logical Network Map                                      24
 6. Total CPU Time Consumed by Month                                 30
 7. Peak Number of Jobs by Month                                     31
 8. Peak Load Average by Month                                       31
 9. Monthly CPU Usage by Community                                   33
10. Monthly File Space Usage by Community                            34
11. Monthly Terminal Connect Time by Community                       35
12. TYMNET Terminal Connect Time                                     43
13. ARPANET Terminal Connect Time                                    44
14. Planned Ethernet System to Integrate System Hardware             55
16. GENET CPU Usage by Month                                        87a
17. GENET Connect Time by Month                                      87
18. GENET File Space by Month                                       87b
15. SUMEX-AIM Growth by Community                                   278


Annual Report

I Narrative Description

This is an annual report for the Stanford University Medical EXperimental computer resource for applications of Artificial Intelligence in Medicine (SUMEX-AIM). It covers the period between May 1, 1980 and April 30, 1981.

We are about to begin a 5-year renewal of the SUMEX resource grant which will launch an important and exciting new phase for SUMEX-AIM community research. Recent successes in developing expert systems, many of them stemming from projects in the SUMEX-AIM community, have stimulated increasing interest in AI research from many fronts. At the same time, the on-going revolution in computational tools, made possible by larger and larger scale microelectronic integration, is making routine applications of AI systems more practical and effective.
Our approved renewal goals focus principally on merging state-of-the-art community research in biomedical AI applications with these new computing tools, and on the challenges they will bring to the SUMEX-AIM community and resource. We expect that the integration and exploitation of these emerging computer technologies will have a profound effect on the development and export of practical biomedical AI programs.

This report on the last year in our current 3-year grant is thus, in a sense, a culmination of the early phase of the SUMEX resource. This phase has been characterized by the building of a national community of biomedical AI collaborators around a central resource located at Stanford University. Beginning with 5 projects in 1973, the AIM community grew to 11 major projects at our renewal in 1978 and currently numbers 16 fully authorized projects plus a group of 7 pilot efforts. Many of the computer programs under development by these groups are maturing into tools increasingly useful to the respective research communities. The demand for production-level use of these programs has surpassed the capacity of the present SUMEX facility and has raised important issues of how such software systems can be optimized for production environments, exported, and maintained.

To be sure, we will continue to seek interesting new AI applications in an expanding community of biomedical and computer scientists interacting through electronic media. However, we expect the SUMEX-AIM community to develop a somewhat different character in the coming years. It will become more decentralized in terms of computing resources, more diverse in scope, and even more heavily dependent on network communication facilities for interactions, collaborations, and sharing.
The following sections report on the activities of the SUMEX-AIM resource this past year including brief summaries of the objectives of SUMEX-AIM, a characterization of biomedical AI research, resource organization and operating procedures, recent core progress in system development and basic AI research, and progress in the collaborative projects.

I.A Summary of Research Progress

I.A.1 Overview of Objectives and Rationale

SUMEX-AIM ("SUMEX") is a national computer resource with a dual mission: a) promoting applications of computer science research in artificial intelligence (AI) to biological and medical problems and b) demonstrating computer resource sharing within a national community of health research projects. The central SUMEX-AIM facility is located physically in the Stanford University Medical School and serves as a nucleus for a community of medical AI projects at universities around the country. SUMEX provides computing facilities tuned to the needs of AI research and communication tools to facilitate remote access, inter- and intra-group contacts, and the demonstration of developing computer programs to biomedical research collaborators.

I.A.1.1 What is Artificial Intelligence

Artificial Intelligence research is that part of Computer Science concerned with symbol manipulation processes that produce intelligent action [1 - 7]. By "intelligent action" is meant an act or decision that is goal-oriented, is arrived at by an understandable chain of symbolic analysis and reasoning steps, and utilizes knowledge of the world to inform and guide the reasoning.

Placing AI in Computer Science

A simplified view relates AI research with the rest of computer science. The ways in which people use computers to accomplish tasks can be "one-dimensionalized" into a spectrum representing the nature of the instructions that must be given the computer to do its job; call it the What-to-How spectrum.
At the How extreme of the spectrum, the user supplies his intelligence to instruct the machine precisely how to do his job, step-by-step. Progress in computer science may be seen as steps away from that extreme How point on the spectrum: the familiar panoply of assembly languages, subroutine libraries, compilers, extensible languages, etc. illustrate this trend.

At the other extreme of the spectrum, the user describes What he wishes the computer to do for him to solve a problem. He wants to communicate what is to be done without having to lay out in detail all necessary subgoals for adequate performance. Still, he demands a reasonable assurance that he is addressing an intelligent agent that is using knowledge of his world to understand his intent, complain or fill in his vagueness, make specific his abstractions, correct his errors, discover appropriate subgoals, and ultimately translate What he wants done into detailed processing steps that define How it shall be done by a real computer. The user wants to provide this specification of What to do in a language that is comfortable to him and the problem domain (perhaps English) and via communication modes that are convenient for him (including perhaps speech or pictures).

The research activity aimed at creating computer programs that act as "intelligent agents" near the What end of the What-to-How spectrum can be viewed as a long-range goal of AI research.

Expert Systems and Applications

The national SUMEX-AIM resource is an outgrowth of a long, interdisciplinary line of artificial intelligence research at Stanford concerned with the development of concepts and techniques for building "expert systems" [1]. An "expert system" is an intelligent computer program that uses knowledge and inference procedures to solve problems that are difficult enough to require significant human expertise for their solution.
For some fields of work, the knowledge necessary to perform at such a level, plus the inference procedures used, can be thought of as a model of the expertise of the expert practitioners of that field.

The knowledge of an expert system consists of facts and heuristics. The "facts" constitute a body of information that is widely shared, publicly available, and generally agreed upon by experts in a field. The "heuristics" are the mostly-private, little-discussed rules of good judgment (rules of plausible reasoning, rules of good guessing) that characterize expert-level decision making in the field. The performance level of an expert system is primarily a function of the size and quality of the knowledge base that it possesses.

Currently authorized projects in the SUMEX community are concerned in some way with the application of AI to biomedical research (*). The tangible objective of this approach is the development of computer programs that will be more general and effective consultative tools for the clinician and medical scientist. There have already been promising results in areas such as chemical structure elucidation and synthesis, diagnostic consultation, and modeling of psychological processes.

Needless to say, much is yet to be learned in the process of fashioning a coherent scientific discipline out of the assemblage of personal intuitions, mathematical procedures, and emerging theoretical structure comprising artificial intelligence research. State-of-the-art programs are far more narrowly specialized and inflexible than the corresponding aspects of human intelligence they emulate; however, in special domains they may be of comparable or greater power, e.g., in the solution of formal problems in organic chemistry.

(*) Brief abstracts of the various projects can be found in Appendix A on page 278 and more detailed progress summaries in Section II on page 89.

I.A.1.2 Resource Sharing

Besides the biomedical AI research theme of SUMEX-AIM, another central goal is an exploration of the use of computer-based communications as a means for interactions and sharing between geographically remote research groups engaged in biomedical computer science research. This facet of scientific interaction is becoming increasingly important with the explosion of complex information sources and the regional specialization of groups and facilities that might be shared by remote researchers [8]. We expect an even greater decentralization of computing resources in the coming years with the emerging VLSI (*) technology in microelectronics and a correspondingly greater role for digital communications.

Our community building effort is based upon the current state of computer communications technology. While far from perfected, these developing capabilities offer highly desirable latitude for collaborative linkages, both within a given research project and among them. A number of the active projects on SUMEX are based upon the collaboration of computer and medical scientists at geographically separate institutions; separate both from each other and from the computer resource. The network experiment also enables diverse projects to interact more directly and facilitates selective demonstrations of available programs to physicians, scientists, and students.

We have actively encouraged the development of additional affiliated computing resources within the AIM community and expect such decentralization to become the "way of the 80's". Since 1977, the facility at Rutgers University has allocated a portion of its capacity for national AIM projects, and our network connections to Rutgers and common facilities for user terminals have been indispensable for effective interchanges between community members, workshop coordination, and software sharing.
In addition, the "Caduceus" project (**) (page 187) is expecting delivery of their own machine momentarily, the "Simulation of Cognitive Processes" project (page 226) already is doing most of their work on their own VAX computer, and several more projects have proposed machines dedicated to their own use. The proliferation of distributed machines will serve to increase the importance of electronic communications to facilitate interactions and sharing.

Even in their current developing state, communication facilities enable effective access to the SUMEX community resources from a great many areas of the United States and to a more limited extent from Canada, Europe, Japan, Australia, and other international locations.

(*) Very Large Scale Integration

(**) Previously called "Internist".

I.A.1.3 Impact of AI in Biomedicine

Artificial Intelligence is the computer science of symbolic representations of knowledge and symbolic inference. There is a certain inevitability to this branch of computer science and its applications, in particular, to medicine and biosciences. The cost of computers will continue to fall drastically during the coming two decades. As it does, many more of the practitioners of the world's professions will be persuaded to turn to economical automatic information processing for assistance in managing the increasing complexity of their daily tasks. They will find, from most of computer science, help only for those of their problems that have a mathematical or statistical core, or are of a routine data-processing nature. But such problems will be relatively rare, except in engineering and physical science. In medicine, biology, management -- indeed in most of the world's work -- the daily tasks are those requiring symbolic reasoning with detailed professional knowledge.
The computers that will act as "intelligent assistants" for these professionals must be endowed with symbolic reasoning capabilities and knowledge.

The growth in medical knowledge has far surpassed the ability of a single practitioner to master it all, and the computer's superior information processing capacity thereby offers a natural appeal. Furthermore, the reasoning processes of medical experts are poorly understood; attempts to model expert decision making necessarily require a degree of introspection and a structured experimentation that may in turn improve the quality of the physician's own clinical decisions, making them more reproducible and defensible. New insights that result may also allow us more adequately to teach medical students and house staff the techniques for reaching good decisions, rather than merely to offer a collection of facts which they must independently learn to utilize coherently.

The knowledge that must be used is a combination of factual knowledge and heuristic knowledge. The latter is especially hard to obtain and represent since the experts providing it are mostly unaware of the heuristic knowledge they are using.

Medical and scientific communities currently face many widely recognized problems relating to the rapid cumulation of knowledge, for example:

- codification of theoretical and heuristic knowledge

- effective use of the wealth of information implicitly available in textbooks, journal articles and from practitioners

- dissemination of that knowledge beyond the intellectual centers where it is collected

- customizing the presentation of that knowledge to individual practitioners as well as customizing the application of the information to individual cases

We believe that computers are the most hopeful technology to help overcome these problems.
While recognizing the value of mathematical modeling, statistical classification, decision theory and other techniques, we believe that effective use of such methods depends on using them in conjunction with less formal knowledge, including contextual and strategic knowledge.

Artificial intelligence offers advantages for representing information and using it that will allow physicians and scientists to use computers as intelligent assistants. In this way we envision a significant extension to the decision making powers of individual practitioners without reducing the significance of the individuals.

Knowledge is power, in the profession and in the intelligent agent. As we proceed to model expertise in medicine and its related sciences, we find that the power of our programs derives mainly from the knowledge that we are able to obtain from our collaborating practitioners, not from the sophistication of the inference processes we observe them using. Crucially, the knowledge that gives power is not merely the knowledge of the textbook, the lecture and the journal but the knowledge of "good practice" -- the experiential knowledge of "good judgment" and "good guessing", the knowledge of the practitioner's art that is often used in lieu of facts and rigor. This heuristic knowledge is mostly private, even in the very public practice of science. It is almost never taught explicitly; almost never discussed and critiqued among peers; and most often is not even in the moment-by-moment awareness of the practitioner.

Perhaps the most expansive view of the significance of the work of the SUMEX-AIM community is that a methodology is emerging therefrom for the systematic explication, testing, dissemination, and teaching of the heuristic knowledge of medical practice and scientific performance.
Perhaps it is less important that computer programs can be organized to use this knowledge than that the knowledge itself can be organized for the use of the human practitioners of today and tomorrow.

The researchers of the SUMEX-AIM community currently constitute a large fraction of all the computer scientists whose work is aimed at the development of symbolic computational methods and tools. SUMEX-AIM is laying the scientific base so that medicine will be able to take advantage of these technological opportunities for inexpensive computer power. Medical diagnostic aids and tools for the medical scientist that operate in an environment of a network of "professional workstation" computers have the practical possibility of large-scale and low-cost use because of anticipated near-term developments in the computing industry.

I.A.2 Synopsis of Recent Progress

As we complete year 08, we can report substantial further progress in the overall mission of the SUMEX-AIM resource. We have continued the refinement of an effective set of hardware and software tools to support the development of large, complex AI programs for medical research and to facilitate communications and interactions between user groups. We have worked to maintain high scientific standards and AI relevance for projects using the SUMEX-AIM resource and have actively sought new application areas and projects for the community. Many projects are built around the communications network facilities we have assembled, bringing together medical and computer science collaborators from remote institutions and making their research programs available to still other remote users. As discussed in the sections describing the individual projects, a number of the computer programs under development by these groups have matured into tools increasingly useful to the respective research communities.
The demand for production-level use of these programs has surpassed the capacity of the present SUMEX facility and, in preparation for our renewal goals, we have been investigating the general issues of how such software systems can be moved from SUMEX and supported in production environments.

A number of significant events and accomplishments affecting the SUMEX-AIM resource occurred during the past year:

1) In August 1980, under the chairmanship of Prof. Ted Shortliffe and with the assistance of Drs. L. Fagan and R. Blum, Stanford hosted the sixth AIM workshop. This workshop was innovative in that the presentations were fully "demo-based" using a live video projection of program typescripts and actual running sessions. The purpose of this approach was to allow participants to see more deeply into the inner workings of the various systems under development.

2) In conjunction with the 1980 workshop, Drs. Clancey and Shortliffe organized a continuing education tutorial for practicing physicians. The tutorial session was attended by over 135 doctors and included an introduction to computing, background information on decision theory and database applications in medicine, and presentations on a number of AI systems by 15 members and affiliates of the SUMEX-AIM community.

3) In November 1980, we defended our pending renewal application before a peer review site visit team. The SUMEX-AIM community was represented by several members of the AIM Executive Committee. A strong endorsement for future SUMEX goals and a recommendation for a 5-year renewal period resulted. These were confirmed by study section and council action. The technical substance of our future goals is outlined beginning on page 47.

4) The SUMEX-AIM collaborator project community has continued vigorous development of their respective programs. Details are reported by the individual investigators in Section II.
The VM and ONCOCIN projects have begun preliminary clinical testing/evaluation this past year using SUMEX network and computing resources. The CADUCEUS (INTERNIST) and SIMULATION OF COGNITIVE PROCESSES projects have been funded for and are setting up their own local VAX computing resources which should help reduce the load on SUMEX for newer pilot efforts. We have continued to work hard to meet the needs of collaborating projects and are grateful for their expressed appreciation.

5) We supported a highly successful, experimental dissemination of the MOLGEN programs into the molecular biology community. "Advertised" through presentations and demonstrations by MOLGEN investigators at several professional conferences, the system has been used by over 200 molecular biologists, most of whom have found it easy to learn and highly effective as a research tool for their investigations.

6) We have continued development of the SUMEX facility hardware, software, and network systems to enhance throughput and to assist user access to existing and planned resources. A good range of internetwork software is available now including telnet, file transfer, and mail handling. Following the council recommendation for approval of our renewal application, our request to augment the AMPEX memory was funded by BRP. We have installed the new memory and are in the process of tuning the monitor to optimize use of the increased user memory.

7) We have actively explored options for professional workstation and VAX LISP systems in preparation for our renewal research. The current state of available systems is encouraging. However, delays in an operational version of Interlisp-VAX and an earlier than expected availability of Interlisp-Dolphin workstations have led us to recommend beginning the workstation phase of our research first.

I.A.3 Details of Technical Progress

The following material covers SUMEX-AIM resource activities over the past year in greater detail. These sections outline accomplishments in the context of the resource staff and the resource management. Details of the progress and plans for our external collaborator projects are presented in Section II beginning on page 89.

I.A.3.1 Facility Hardware

Over the past year, the SUMEX facility hardware configuration, including the main KI-10 machine (Figure 1), the 2020 satellite machine (Figure 2), and system network interconnections (Figure 3), has continued to develop according to plan and to operate effectively within capacity limitations. The primary facility hardware development efforts this year have been directed at:

1) Augmentation of the 256K word AMPEX memory to 512K words.

2) Implementation of Ethernet interface equipment for the KI-10 and other network server facilities.

3) Investigation and planning of hardware alternatives for the system development goals of our renewal grant.

4) Support of local project hardware needs.

Memory Augmentation

The SUMEX-AIM facility has been operating at capacity in terms of prime-time computing load for the past several years as documented in our previous reports. In spite of implementing a number of strategic facility augmentations over the years, we have not been able to satisfy the computing demands of our community. This condition has constrained the growth of the AIM community and our ability to bring AI programs nearing operational status in contact with potential external user communities while continuing to support on-going program development efforts.

We have taken active steps to transfer prime time interactive loading to evening and night hours as much as possible including shifting personnel schedules (particularly for Stanford-based projects).
We have implemented tools to control the fair allocation of CPU resources between various user communities and projects and have encouraged jobs not requiring intimate user interaction to run during off hours using batch job facilities. And we have acquired a 2020 system to offload program demonstrations and evaluations from the main research machine. Despite these efforts, our prime time loading has remained at saturation. Perhaps the most significant effect of the resulting poor response time is the deterrence of interactions with medical and other professional collaborators experimenting with available AI programs, whose schedules cannot be adjusted to meet computer loading patterns.

From the SUMEX viewpoint, we have attempted to do everything feasible and economically justified within available budgets to maximize the use of the existing hardware for productive work. One remaining step has been the expansion of our AMPEX memory from its current 256K word complement to its full 512K word capacity. The effect of this upgrade is to make more physical memory available to user programs thereby reducing swapping overhead (page faults and interrupt handling) and smoothing out system responsiveness under heavy load by keeping more working sets in core.

We requested approval for this expansion in May 1980. Following council approval of our renewal grant application, we received funding for the upgrade. The added memory was received and installed May 14, 1981, checked out during the following week, and a new 786K monitor brought up on May 21. This addition has increased user memory by about 60%. It is still too early to draw detailed conclusions about the effect of this enhancement and further tuning of monitor parameters controlling process scheduling and working set management needs to be done. We will report detailed results next year.
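
As a rough check on the memory figures above, the following sketch works backward from the two reported numbers (256K words added, user memory up "about 60%") to the amount of user memory they imply. It is only an illustration: it assumes that all of the added AMPEX memory became available to user programs and that the gain is exactly 60%, neither of which the report states, and the function name is hypothetical.

```python
K = 1024  # words per "K"

# Reported in the text: AMPEX memory expanded from 256K to 512K words.
added_words = (512 - 256) * K

# Reported in the text as "about 60%"; treated here as exactly 0.60 (assumption).
reported_gain = 0.60


def implied_prior_user_words(added, gain):
    """User memory before the upgrade, assuming every added word went to users."""
    return added / gain


prior_user = implied_prior_user_words(added_words, reported_gain)
after_user = prior_user + added_words

print(f"implied user memory before upgrade: about {round(prior_user / K)}K words")
print(f"implied user memory after upgrade:  about {round(after_user / K)}K words")
```

Under these assumptions the figures imply roughly 427K words of user memory before the upgrade and roughly 683K words afterward; the remainder of physical memory would be held by the resident monitor.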
Local Network Interfaces

The initial design of the SUMEX system was that of a "star" topology centered on the KI-10 processors. In this configuration, all peripheral equipment and terminal ports were connected directly to the KI-10 busses. With the addition of new satellite machines, a unique focus no longer exists and some pieces of equipment need to be able to "connect" to more than one host. For example, a user coming into SUMEX over TYMNET will want to be able to select which machine he connects to. Another TYMNET user may want to make another choice of machine, and so the TYMNET interface needs to be able to connect to any of the hosts. This could be accomplished by creating separate interfaces for each of the hosts to the TYMNET, each with a different address. Besides the expense of duplicating such interface connections, it would be inconvenient for a user to reconnect his terminal from one host to another.

Over the past year and a half we have been developing a local, high-speed Ethernet to provide a flexible basis for our planned facility developments. The KI-10's and the 2020 were connected in time to support the AIM workshop last summer. Our development of Ethernet facilities has been guided by the goals of providing the most effective range of services for SUMEX community needs while remaining compatible with and able to contribute to and draw upon network developments by other groups.

Since the early 3 Mbit/sec Ethernet was given to Stanford and several other universities by Xerox, an agreement has been reached between DEC, Intel, and Xerox on the standards for an even higher performance network [13]. The new network runs at 10 Mbits/sec and supports a significantly larger packet address space. Xerox has started to market products for the new network but debugged interfaces, software, etc. for general use are not routinely available yet.
Furthermore, even though three companies have agreed on a set of low level protocols and interface conventions, the rest of the world may not go along. There is already an alternative (but closely related) IEEE specification in preparation. Even among the three parties in the Ethernet specification, there is no agreement on higher level protocols.

All of this suggests that it is not time to jump to the newer and faster networks yet. We feel the 3 Mbit/sec network is adequate for our bandwidth needs in the near future and there is already a significant investment in 3 Mbit/sec network equipment at Stanford related to SUMEX community interests. In the longer term, we will want to upgrade to whatever hardware and protocol standard is broadly adopted. In the meantime we are continuing to develop our 3 Mbit/sec PUP network services. This places a heavier burden on us to develop and maintain our own equipment for Ethernet support. We have tried to minimize the "home-brew" nature of this work by sharing common hardware and software designs with other groups in the same situation.

The initial KI-10 interface was made via a PDP-11 connected to the I/O bus, which is inefficient under heavy traffic. In anticipation of increased Ethernet demands on the KI-10's for high-speed terminals, file transfers, and other server functions, we have been designing and implementing a more efficient direct memory access interface. This interface uses a phase decoder (design borrowed from the SUN terminal project at Stanford) to detect the incoming serial Ethernet signal, an internal packet buffer to prevent overruns to and from the TENEX time-sharing system, and a memory bus interface to transfer data. The KI-10 DMA interface is partially debugged while highest priority work is proceeding on a gateway to the computer science building across campus.
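
As an illustration of what a phase decoder does, the following minimal sketch decodes a phase-encoded (Manchester) bit stream in software. It rests on assumptions not stated in this report: that the serial Ethernet signal is Manchester encoded, that a 1 bit is a low-to-high mid-cell transition, and that the signal has already been sampled into two levels per bit cell. The function names are hypothetical, and the clock recovery that real hardware must perform is not modeled.

```python
def manchester_encode(bits):
    """Encode bits as half-cell levels.

    Assumed convention (not from the report): 1 -> (0, 1), 0 -> (1, 0).
    """
    levels = []
    for bit in bits:
        levels.extend((0, 1) if bit else (1, 0))
    return levels


def manchester_decode(levels):
    """Recover bits from half-cell levels.

    Every valid bit cell contains a mid-cell transition; a cell without
    one is reported as an error rather than guessed at.
    """
    if len(levels) % 2:
        raise ValueError("odd number of half-cells")
    bits = []
    for i in range(0, len(levels), 2):
        first, second = levels[i], levels[i + 1]
        if first == second:
            raise ValueError(f"no mid-cell transition in bit cell {i // 2}")
        bits.append(1 if (first, second) == (0, 1) else 0)
    return bits
```

Because every bit cell carries a transition, the receiver can detect a corrupted cell locally, which is one reason this style of encoding suits a shared serial medium.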

In our initial connections of the KI-10 and 2020, we used a UNIBUS interface board designed by E. Markowski at Xerox. Because of the limited availability of these boards for our future work (an immediate need being for a gateway between various campus Ethernets), we began work on a PDP-11 interface board. This design is similar to that of the KI-10 interface and shares the serial phase decoder network front end. It provides several features not available on the Xerox board including more explicit error information and a more sophisticated filter on source addresses for incoming packets.

Planning for the Renewal Period

Over the past year we have spent considerable effort evaluating strategies and alternatives for planned system development in our renewal grant. Pending funding, council has approved our plan to acquire two VAX machines, five professional workstations, and a file server for the SUMEX resource starting in August 1981. We have debated at length the appropriate timing for purchases of this equipment within budget constraints. The Initial Review Group and Council enthusiastically endorsed the importance of optimizing the timing of our planned hardware acquisitions to coincide with the availability of desired technological developments and community needs. They recommended in their report that we be allowed considerable flexibility as to phasing of equipment purchases within the 5-year renewal period.

The rapidly changing technical and commercial situation vis a vis the research computing equipment we plan to buy if funded indicates that there would be significant advantage to the SUMEX-AIM community in exercising this flexibility by delaying the purchase of our first VAX until the second renewal year (grant year 10) and advancing the purchase of the Professional Workstations to the first year (grant year 09).
The rationale for this switch is as follows:

1) The INTERLISP language has been the basis for most SUMEX-AIM community AI research. Development of the VAX INTERLISP system at USC-ISI is substantially behind schedule. The most current estimate for completing a usable system is mid-1982 and no viable alternative version of LISP, with a fully developed programming support environment, will be available any sooner than that. Thus, if we purchased a VAX in year 09, we could not offer effective VAX LISP services before year 10. Strong pressure does exist within the ARPANET community to get VAX INTERLISP completed as expeditiously as possible so we believe that VAX will be a good machine choice by year 10 once INTERLISP is running. We are undertaking a separate study of this situation to assess the likelihood of VAX/INTERLISP being completed in a timely fashion and to estimate its performance characteristics on the VAX 11/780.

2) If the interchange in timing of the purchases of the first VAX and the Professional Workstations is approved, we have agreed that SUMEX-AIM will have shared access next year to the VAX 11/780 funded by ARPA to support Stanford Heuristic Programming Project research. This will minimize any delays in SUMEX-AIM work involving VAX that is not dependent on INTERLISP and will enable necessary systems development work and preliminary experimentation by SUMEX-AIM users to proceed without having to commit NIH grant funds. Because of long term commitments for the ARPA VAX and expected growth in SUMEX community needs, however, it can only substitute on a temporary basis during the first renewal grant year. After that a VAX dedicated solely to SUMEX-AIM community use will be needed.

3) The DEC VAX product line is continuously changing and there are some indications that new products may be offered on the 1982-1983 time scale that would be advantageous to SUMEX-AIM research. These may include features that enhance technical performance and/or cost effectiveness for our purposes. By year 10, we should be able to make a more judicious choice of the best configuration for our needs.

4) While VAX/INTERLISP is delayed, a suitable model of the professional workstation we need for our experimentation is available earlier than expected. The Xerox Dolphin is a system that has been in use as a research machine within Xerox for some time. It meets our technological needs including a high-bandwidth bit-mapped display terminal, full TENEX INTERLISP software compatibility, increased address space over the PDP-10 (but not as large as will be available on the VAX), acceptable capacity (roughly twice the single-user KA-10 speed), and existing Ethernet hardware/software support. Dolphins will be produced shortly in limited quantity by Xerox EOS, primarily for the ARPANET computer science community, and will be available for delivery beginning in August 1981. Their cost is currently higher than that expected for comparable systems several years from now. However, the immediate purchase of the limited quantity planned will be cost-effective in allowing research to proceed in the SUMEX-AIM community on software that will be needed to exploit these later systems.

5) If we purchase the five INTERLISP-Dolphins in year 09, a significant increase in LISP processing capacity will be added to the SUMEX-AIM resource earlier than would be possible with VAX/INTERLISP. Even though these are intended primarily for stand-alone use, they nevertheless will afford badly needed relief for the overloaded central machines since the people using the Dolphins will not be running INTERLISP simultaneously on the KI-10's or 2020. In quantitative terms, taking a Dolphin to be about equal in speed to two KA-10's, the five Dolphins will roughly treble our current dual KI-10/2020 computing capacity.

Based on this rationale, it seems clearly to be in the best interest of the SUMEX-AIM community to delay the acquisition of the first VAX system and to accelerate the purchase of the five Professional Workstations.

Other Hardware Development

We have undertaken other hardware efforts as appropriate during the past year. Most significant of these was the development of a controller for a printer in the Stanford Oncology Clinic to support the ONCOCIN evaluation getting underway. This printer is part of an existing internal information system in use by the clinic. In order to integrate the printout from ONCOCIN sessions, we needed to provide a flexible connection to the SUMEX facility spoolers. We built a Z80-based microprocessor controller that senses status of the printer and performs buffering, flow control, and data rate conversion so it can act as a remote printer to the SUMEX machines when needed for ONCOCIN sessions. In addition we have provided broad support to users for terminal and communications connections and repairs.

[Figure 1. Current SUMEX-AIM KI-10 Computer Configuration. Block diagram not reproduced. Labeled components include two DEC KI-10 central processors (#0 and #1), AMPEX ARM10-LX memory (512K words), DEC MF-10 memory (4 units, 256K words), a 4-port memory bus, the MX-10C memory multiplexer, a drum system (1.7M words), the TYMNET interface (4800 bit/sec), ARPANET IMP lines (50K bit/sec), the direct memory access Ethernet interface, a System Concepts SA-10 DEC/IBM controller and interface, Calcomp tape and disk drives, dual DECtape drives (TD-10), a Data Products line printer, a Calcomp plotter, the TTY scanner and line switch, and the link to the SUMEX 2020.]

[Figure 2. Current SUMEX-AIM 2020 Computer Configuration. Block diagram not reproduced. Labeled components include the DEC KS-10 central processor, DEC memory (512K words, MOS), two Unibus adapters, a DEC RP-06 disk, a DEC TU-45 magnetic tape, a DEC DZ-11 line scanner, and the KI-10 Ethernet interface.]

[Figure 3. Intermachine Connections via ETHERNET. Diagram not reproduced. Labeled elements include the KI-TENEX system with its I/O peripherals (LPT, PLT, ...), the SUMEX 2020, the TYMNET interface (4800 bit/sec lines), the ARPANET link (50K bit/sec lines), an Ether TIP, a Xerox Alto, and gateways to Stanford CSD, Stanford Chemistry, UC Santa Cruz, and UC San Francisco.]

I.A.3.2 System Software

Our monitor software work this past year has concentrated on several areas including changes to support hardware development projects, upgrading and enhancing network interface service, correcting encountered system bugs, and implementing new features for better user community support. In addition we have invested substantial effort in becoming familiar with the VAX/UNIX system which will play a key role in our future research efforts.

Hardware Implementation

In parallel with our principal hardware efforts this past year to extend and improve local Ethernet connections, the necessary monitor changes to support the new hardware are being made. The largest effort, for which debugging is still on-going, has been the direct memory access Ethernet interface for the KI-10 duplex. This interface has been partially completed, including developing new interrupt service routines and facilities for doing hardware debugging during time-sharing so as not to disrupt availability of the system. Completion of this work has been delayed by placing higher priority on building a gateway connection to the Department of Computer Science building across campus.
Since the KI-10's have a working, albeit inefficient, PDP-11 interface already, we decided our limited development resources were better used in establishing this badly needed new capability.

Additional work has gone into upgrading software support for the terminal hardline switch (SLM) developed last year. These improvements were to correct several problems in assigning terminals to available lines and to improve user feedback on system status while negotiating for a connection.

Network Interface Service

Effective January 1, 1981, the ARPANET formally changed the standard for packet "leaders" to allow addressing more hosts on an Interface Message Processor (IMP) and more IMP's on the network. This change required substantial upgrades to the monitor ARPANET service routines including the internal handling of data packets and two new JSYS's that communicate with user programs about network information. We imported much of the new code from ARPANET sites working on the development of network software (especially USC-ISI and SRI) but considerable work was required to adapt it to our dual-processor monitor and operating environment. The changeover went extremely smoothly with most users unaware that a change was taking place. We expect further changes to be required by early 1983 when the higher level communication protocols will move from NCP (network control protocol) to TCP (transmission control protocol).

This past year we have also continued to develop the Ethernet PUP software including improved hardware interfaces discussed above and numerous bug fixes. Many of the bug fixes relate to interactions between the PUP management software and other parts of the monitor such as the teletype handler. The Ethernet software is running very reliably now.

Monitor Bug Fixes and Improvements

We have continued to repair important bugs in our TENEX monitor.
In general the system runs extremely reliably with most problems coming from explicit hardware malfunctions or periods of instability following significant monitor changes. We found an additional number of subtle bugs in the system this past year that had been causing various problems. By now, all of the "obvious" bugs have been located and so those remaining are much more elusive, occurring infrequently or only after a long chain of rare events that is difficult to reconstruct. Examples of fixes include:

1) After an extended period of uptime, TYMNET users found all the ports to SUMEX in apparent use. This only happened after about 6.5 days of continuous uptime, itself a rather rare event. After a long search, we found an invalid index into one of the TYMNET connection database arrays which, instead of testing the appropriate state bit, was testing a high order bit of a timer field. Thus, when that bit came on after the system had been up for a long period, the test erroneously detected the port in use.

2) With the installation of the operational Ethernet, the overall timing of system functions changed, including the management of the drum service. Commands for page transfers were sent to the drum controller asynchronously as the requests were placed on the queue. This was done in the drum interrupt service and timing was such that new transfers could be posted "on the fly". As the Ethernet became operational, the timing of interrupt handling changed so that attempts to post these new transfer requests came at the wrong time for the drum controller and caused command sequence errors. A temporary fix was made to avoid this conflict but we still want to rework the drum management software to optimize performance.

3) Finally, users are invariably able to design system call arguments that present special cases to the monitor routines which don't work. We have repaired a number of such problems in various string handling JSYS's and in the floating point output JSYS.
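
The class of bug described in the first fix can be illustrated with a minimal sketch. Everything in it is hypothetical: the two-word port record, the index names, the flag position (bit 29), and the millisecond tick are invented for illustration and are not taken from the TENEX monitor. The sketch only shows how a flag test aimed at the wrong word (a timer rather than a state word) can behave correctly for days and then fail once the timer grows large enough.

```python
# Hypothetical illustration of the TYMNET port bug class; not TENEX code.
WORD_MASK = (1 << 36) - 1      # PDP-10 words are 36 bits wide
IN_USE_BIT = 1 << 29           # assumed position of the "port in use" flag

STATE = 0   # assumed index of the per-port state word
TIMER = 1   # assumed index of the per-port uptime timer word


def make_port(uptime_ticks):
    """Return a record for a free port: [state word, timer word]."""
    return [0, uptime_ticks & WORD_MASK]


def port_in_use_buggy(port):
    # Bug: wrong index, so the flag is tested in the timer word.
    return bool(port[TIMER] & IN_USE_BIT)


def port_in_use_fixed(port):
    # Fix: test the flag in the state word, where it actually lives.
    return bool(port[STATE] & IN_USE_BIT)


def days_until_bit_sets(bit, ticks_per_second):
    """Uptime in days at which the given timer bit first becomes 1."""
    return (1 << bit) / ticks_per_second / 86400
```

Under these assumed values, `days_until_bit_sets(29, 1000)` is roughly 6.2 days, the same order as the uptime reported above. The source does not state the real tick rate or bit position, so that agreement should not be read as a reconstruction of the actual fault.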

System Loading Controls

We previously reported on the system load controls we have implemented on the KI-10 duplex to allocate available system capacity effectively among projects and users according to Executive Committee guidelines. These continue to operate effectively and we have not made any substantial changes in this area. All communities (National, Stanford, and Staff) are under load controls now. We have adjusted relative priorities for projects in the national community in accordance with Executive Committee reviews of the community in August 1980.

We have instituted a mechanism for reserving the 2020 for demonstrations and developmental testing of various expert systems (e.g., DENDRAL, ONCOCIN, etc.). Because of the unpredictability of usage during these reserved times, we feel that too much of the 2020 capacity is lost by simply dedicating the machine to such users. We are now reevaluating the reservation system, probably in favor of a "pie-slice" system that will guarantee dedicated users a large fraction of the machine but which allows other useful work to go on when their demand is low.

Executive Program

We have made several changes and improvements in the SUMEX EXECutive program this past year:

1) Many of the features of the EXEC that enhance its "friendliness" require access to auxiliary files. When we come up after a crash and there is file system damage to repair, these files may be compromised and in general extraneous file access at such times is undesirable. Thus we carefully reviewed the internals of the EXEC and made changes so that in debugging mode it operates "bare bones". All unneeded file accesses and interactions are eliminated.

2) Because of the difficulty in collecting definitive data about user experiences with network connections, we implemented a log of involuntary disconnects from network terminals.
This log allows us to better correlate disconnects, looking for instances when all TYMNET users are disconnected or all users from a given node are disconnected as opposed to drops by individual users which may be caused by hanging up the telephone. We have now collected a database of these disconnects covering several months and indications are that TYMNET users are being dropped occasionally through some sort of network glitch. We are developing programs to better analyze these data so we can distinguish problems at the SUMEX end from those in the network so appropriate solutions can be worked out.

3) We implemented several layers of access constraints for GENET users (see page 84) including a limit to the number of simultaneous logins and a requirement for a user password to restrict access for commercial users. These developments have in fact limited the growth of the GENET community as recommended by AIM Executive Committee policy.

I.A.3.3 Network Communications

A highly important aspect of the SUMEX system is effective communication with remote users and between the growing number of machines available within the SUMEX resource. In addition to the economic arguments for terminal access, networking offers other advantages for shared computing. These include improved inter-user communications, more effective software sharing, uniform user access to multiple machines and special purpose resources, convenient file transfers, more effective backup, and co-processing between remote machines.

We continue to base our remote communication services on two networks - TYMNET and ARPANET - for reasons detailed in previous annual reports. Users asked to accept a remote computer as if it were next door will use a local telephone call to the computer as a standard of comparison. Current network terminal facilities do not quite accomplish the illusion of a local call.
Data loss is not a problem in most network communications - in fact with the more extensive error checking schemes, data integrity is higher than for a long distance phone link. On the other hand, networking relies upon shared community use of communication lines to procure widespread geographical coverage at substantially reduced cost. However, unless enough total line capacity is provided to meet peak loads, substantial queueing and traffic jams result in the loss of terminal responsiveness. Limited responsiveness for character-oriented TENEX interactions continues to be a problem for network users.

TYMNET

TYMNET provides broad geographic coverage for terminal access to SUMEX, spanning the country and also increasingly accessible from foreign countries. Technical aspects of our connection to TYMNET have remained unchanged this past year and have continued to operate reasonably reliably. As noted earlier, however, users have complained periodically about having their connections dropped and we have implemented a data collection facility in the EXEC program to help document and classify these failures. There are definitely episodes in which all connections are lost and the jobs are detached. These occur about once every few days but we are still analyzing these data to try to separate out local from network causes.

TYMNET has made few technical changes to their network that affect us other than to broaden geographical coverage. The previous network delay problems are still apparent although better cross-country trunks into New York and New England are available, improving service there. TYMNET is still primarily a terminal network designed to route users to an appropriate host, and more general services such as outbound connections originated from a host or interhost connections are only done on an experimental basis. This presumably reflects the lack of current economic justification for these services among the predominantly commercial users of the network.
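
The classification that the disconnect log is meant to support (episodes that drop every user, episodes that drop every user on one node, and isolated individual drops) can be sketched as follows. This is an illustration only, not the SUMEX analysis program: the record format `(timestamp_seconds, node, user)`, the 30-second grouping window, the assumption that the set of connected users per node is known and fixed, and all function names are hypothetical.

```python
from collections import defaultdict


def group_by_time(events, window=30):
    """Split time-sorted events into groups separated by gaps > window seconds."""
    groups, current = [], []
    for event in sorted(events):
        if current and event[0] - current[-1][0] > window:
            groups.append(current)
            current = []
        current.append(event)
    if current:
        groups.append(current)
    return groups


def classify_disconnects(events, active_by_node, window=30):
    """Label each disconnect episode.

    events: list of (timestamp_seconds, node, user); assumed log format.
    active_by_node: dict mapping node -> set of users connected through it.
    Returns one label per episode: "all-users", "whole-node", or "individual".
    """
    all_users = set().union(*active_by_node.values())
    labels = []
    for group in group_by_time(events, window):
        dropped_by_node = defaultdict(set)
        for _, node, user in group:
            dropped_by_node[node].add(user)
        dropped = set().union(*dropped_by_node.values())

        def whole_node(node):
            # A node counts only if it carried several users and lost them all.
            active = active_by_node.get(node, set())
            return len(active) > 1 and dropped_by_node[node] >= active

        if all_users and dropped >= all_users:
            labels.append("all-users")
        elif any(whole_node(node) for node in dropped_by_node):
            labels.append("whole-node")
        else:
            labels.append("individual")
    return labels
```

An "all-users" episode points toward the host end or the host's network interface, a "whole-node" episode toward the network, and an "individual" drop is consistent with a user simply hanging up, which is the distinction the log was designed to expose.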
Whereas TYMNET is developing interfaces meeting X.25 protocol standards, the internal workings of the network will likely remain the same, namely, constructing fixed logical circuits for the duration of a connection and multiplexing characters in packets over each link between network nodes from any users sharing that link as part of their logical circuit.

We have continued to purchase TYMNET services through the NLM contract with TYMNET, Inc. Because of current tariff provisions, there is no longer an economic advantage to this based on usage volume. SUMEX charges are computed on its usage volume alone and not the aggregate volume with NLM's contribution to achieve a lower rate. We have implemented the "dedicated port" charging system for SUMEX use and have realized a substantial reduction in monthly usage costs. We will continue to work closely with NIH-BRP and NLM to achieve the most cost-effective purchase of these services.

ARPANET

We continue our advantageous connection to the Department of Defense's ARPANET, now managed by the Defense Communications Agency (DCA). Current ARPANET geographical and logical maps are shown in Figure 4 and Figure 5 on page 23. This connection has facilitated close collaboration with the Rutgers-AIM facility which is also on the net. Consistent with our long-standing agreements with ARPA and DCA we are enforcing a policy that restricts the use of ARPANET to users who have affiliations with DoD-supported contractors and system/software interchange with cooperating network sites. We are somewhat unique in this policy among other network sites since NIH has not become a member of the "sponsor's group" for the Network. We would strongly encourage this step so that biomedical users could have more uniform access to the superior facilities of the ARPANET. This will become increasingly important as more NIH-sponsored sites desire access to the net and each other.

We have maintained good working relationships with other sites on the ARPANET for system backup and software interchange. Such day-to-day working interactions with remote facilities would not be possible without the integrated file transfer, communication, and terminal handling capabilities unique to the ARPANET. The ARPANET is also key to maintaining on-going intellectual contacts between SUMEX projects such as the Stanford Heuristic Programming Project authorized to use the net and other active AI research groups in the ARPANET community.

As indicated in the discussion of monitor software development, we implemented a significant change in ARPANET software support on January 1, 1981. This change added support for the extended (96-bit) leader for packets, which allows more Interface Message Processors (IMPs) on the network and more ports per IMP. Substantial changes to the monitor network control program were necessary as well as to various user-level programs (TELNET, FTP, NETSER, RSSER, NETSTAT, etc.). The changeover went extremely smoothly with most users unaware of any effect.

ETHERNET

A substantial portion of our system effort this past year went into continued development of local network facilities to refine the connection between the KI-10 duplex and the 2020, to extend our network ties to other parts of campus (especially to the Computer Science Department building where the Heuristic Programming Project sits), and to prepare for the addition of new hardware in the renewal grant. As indicated in the earlier sections on monitor software and hardware, much has been done to implement more effective and efficient low-level system network connection facilities for our host systems. We have also developed a number of software tools as a basis for implementing various kinds of Ethernet servers.
These have been done in the language C, primarily because it is the language on which UNIX is based, has an active support community, and is being used for other network software that may be useful for our work. Specific areas of development include:

1) Server operating system: We have developed a simple operating system for use in servers that provides low-level interface to the Ethernet, hardware dependent interrupt service, process scheduling capabilities, and a series of defined monitor calls for invoking communication functions. This system is written initially for the PDP-11 but will also be portable to the MC-68000.

2) Higher Level Protocols: We have written routines that provide datagram, rendezvous/termination, and byte sequential protocol facilities on which other services such as EFTP, TELNET, etc. can be based.

3) We have written software for an Ethernet-to-Ethernet gateway that will establish connections between the SUMEX machine room and the Computer Science Department across campus. This system runs currently on a PDP-11/10 and supports dynamic assimilation of a routing table, periodic broadcast of this information to other hosts on connected networks, routing of addressed packets between connected networks, forwarding of key broadcast packets to allow distribution of network directories, and recording of gateway event status reports.

4) We have developed a wide range of diagnostic programs to assist in Ethernet software development including hardware diagnostics and downloading and debugging software.

5) We are actively working on the design of an Ethernet TIP to provide more terminal ports for the SUMEX system.

INTERNET SOFTWARE

One of the issues confronting the development of complex network-based systems, interconnected by gateway machines, is the support of internet communication of various kinds.
For example, when a user at one of the Stanford Ethernet hosts wants to send a message to someone at MIT on a Chaosnet host, the mail handling programs have to know how to do the routing and the mail server programs have to be prepared to receive such mail for forwarding. Similarly, when establishing terminal telnet connections between such sites, the path of the link should be established automatically with the intervening sites merely acting as relay stations.

In conjunction with groups at MIT and Stanford CSD, we have been developing prototypical systems for internet mail handling and telnet connections. The mail system is most highly developed and currently knows how to route messages between hosts on the Stanford Ethernet, the ARPANET, the MIT Chaosnet, the MIT LCSnet, and the Dialnet. This system has been operating since February. We are also running a version of TELNET developed by Mark Crispin at SU-SCORE that allows a user to establish a connection across network boundaries without having to log into each intervening gateway and telnet further to the next station of a path to the desired destination host.

[Figure 4. ARPANET Geographical Network Map (ARPANET Geographic Map, March 1981). Map not reproduced. Notes on the figure: the map does not show ARPA's experimental satellite connections; names shown are IMP names, not (necessarily) host names.]

[Figure 5. ARPANET Logical Network Map (ARPANET Logical Map, March 1981). Map not reproduced.]
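
The routing decision that an internet mail or TELNET relay has to make, as described in the Internet Software discussion, can be sketched as a search over which networks each relay host is attached to. This is only an illustration under stated assumptions: the network names are those mentioned in the text, but the relay host names (`relay-a`, `relay-b`), the attachment table, and the breadth-first search are hypothetical. The report does not describe the routing method actually used.

```python
from collections import deque

# Hypothetical attachment table: relay host -> networks it is connected to.
RELAYS = {
    "relay-a": {"Stanford Ethernet", "ARPANET"},
    "relay-b": {"ARPANET", "MIT Chaosnet"},
}


def relay_path(src_net, dst_net, relays=RELAYS):
    """Find the shortest chain of relay hosts joining two networks.

    Returns a list of relay host names (empty when both ends are on the
    same network), or None when no chain of relays connects them.
    """
    if src_net == dst_net:
        return []
    queue = deque([(src_net, [])])
    seen = {src_net}
    while queue:
        net, path = queue.popleft()
        for host, nets in sorted(relays.items()):
            if net not in nets or host in path:
                continue
            for next_net in sorted(nets - {net}):
                if next_net == dst_net:
                    return path + [host]
                if next_net not in seen:
                    seen.add(next_net)
                    queue.append((next_net, path + [host]))
    return None
```

With the table above, mail from the Stanford Ethernet to the MIT Chaosnet would pass through both relays in turn, while a destination network that no relay touches yields `None`, which a real mail system would have to report back to the sender.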

I.A.3.4 User Software

We have continued to assemble and maintain a broad range of utilities and user support software. These include operational aids, statistics packages, DEC-supplied programs, improvements to the TOPS-10 emulator, text editors, text search programs, file space management programs, graphics support, a batch program execution monitor, text formatting and justification assistance, magnetic tape conversion aids, and many more.

Over the past year we have undertaken several significant development efforts to provide needed new programs to the SUMEX-AIM community. These include:

1) TTYFTP - Many groups have had the need to move files between computers and do not have the sophisticated facilities of ARPANET or other local networks to help.
These include for example the transfer of data between the PUFF project at Pacific Medical Center in San Francisco and SUMEX and the movement of instrument data in support of the Ultrasound Imaging (Ob-Gyn) project. We reported last year on the development of a file transfer program usable over any teletype line (hardline, dial-up, TYMNET, etc.) which incorporates appropriate control protocols and error checking. The design is based on the DIALNET protocols developed by Crispin at the Stanford AI Laboratory and extended by our group to achieve machine and data source independence. This past year we have had a number of requests from outside groups in similar situations for copies of this software. We have distributed copies to Rutgers, Stanford Research Institute, and the University of Texas.

2) C Compiler - We spent considerable effort bringing up a workable C compiler at SUMEX that would generate code for our KI-TENEX system and also cross compile to generate code for PDP-11's and other machines. We imported an early version of a TOPS-20 C compiler from MIT and adapted it for our system. The linker, code generator, and runtime package for this system were suitably modified to work under TENEX and code generators for other machines developed. The PDP-10 version still generates quite inefficient code in that bytes occupy full 36-bit words rather than being packed. This is satisfactory for debugging purposes but would have to be fixed if C were to become a system programming language for future TENEX work (we do not anticipate this).

3) TV Editor and Display Terminal Support - Much work was done to extend the TV editor which is widely used at SUMEX. This was done by importing the work done by Hedrick at Rutgers, adapting it to our needs, and extending it for additional features.
Important improvements include multiple string searches, string replacement facilities, large block text relocation, and support for additional display terminals (Infoton, Zenith H-19, Concept-100, ADDS Regent 60, and Hewlett-Packard 2600 series terminals). We have also agreed to unify the sources for TV so that closer compatibility with other groups will exist (Rutgers, SUMEX, USC-ISI, SRI, and Stanford CONTEXT).

4) TYPER - For the AIM workshop held at Stanford last summer, we developed a program to assist with on-line typescript manipulation. The goal of the workshop (see Section I.E.1 on page 79) was to present a deeper insight into the workings of developing AI programs by interactively tracing sessions using them. In order to assure a reasonably organized presentation that could be prepared beforehand, we developed TYPER to allow a presenter to display a typescript of a typical session. TYPER provides facilities to randomly move between parts of the typescript, to display a table of contents, and to manage a hierarchical presentation of various parts of the session. At the highest level, only an overview of program operation need be given. By interactive commands, successive layers of detail can be flashed on the display screen as the discussion proceeds or as questions arise.

We have also implemented extensions and maintenance updates to many other existing programs including, for example, EDIR (a directory editing program), DUMPER (the file system backup program), BSYS (the user file archiving program), PA-1050 (the TOPS-10 compatibility package), BBD (the bulletin board program), and PUB (a text formatting program).

I.A.3.5 Documentation and Education

We have spent considerable effort to develop, maintain, and facilitate access to our documentation so as to accurately reflect available software. The HELP and Bulletin Board subsystems have been important in this effort.
As subsystems are updated, we generally publish a bulletin or small document describing the changes. As more and more changes occur, it becomes harder and harder for users to track down all of the change pointers. Within manpower limits, we continually review the existing documentation for compatibility with the programs now on line and integrate changes into the main documents. This will also be done with a view toward developing better tools for maintaining up-to-date documentation.

I.A.3.6 Software Compatibility and Sharing

At SUMEX-AIM we firmly believe in importing rather than reinventing software where possible. As noted above, a number of the packages we have brought up are from outside groups. Many avenues exist for sharing between the system staff, various user projects, other facilities, and vendors. The advent of fast and convenient communication facilities coupling communities of computer facilities has made possible effective intergroup cooperation and decentralized maintenance of software packages. The TENEX sites on the ARPANET have been a good model for this kind of exchange based on a functional division of labor and expertise. The other major advantage is that as a by-product of the constant communication about particular software, personal connections between staff members of the various sites develop. These connections serve to pass general information about software tools and to encourage the exchange of ideas among the sites.

Certain common problems are now regularly discussed on a multi-site level. We continue to draw significant amounts of system software from other ARPANET sites, reciprocating with our own local developments.
Interactions have included mutual backup support, experience with various hardware configurations, experience with new types of computers and operating systems, designs for local networks, operating system enhancements, utility or language software, and user project collaborations. We have been able to import many new pieces of software and improvements to existing ones in this way. Examples of imported software include the message manipulation program MSG, TENEX SAIL, PASCAL, TENEX SOS, INTERLISP, the RECORD program, ARPANET host tables, and many others. Reciprocally, we have exported our contributions such as the crash analysis program, drum page migration system, KI-10 page table efficiency improvements, GTJFN enhancements, PUB macro files, the bulletin board system, MAINSAIL, SPELL, SNDMSG enhancements, our BATCH monitor, and improved SA-10 software.

There are also several important examples of joint development efforts such as the internet mailer program (XMAILR). Because this program incorporates facilities for routing mail through many networks, it is important that the various sections of the program dealing with these specialized protocols be developed by the groups with expertise in the appropriate technology. Network connections have made possible a joint effort involving MIT, Stanford SCORE, and SUMEX.

We spent considerable effort developing a preliminary version of a TENEX/TOPS-20 compatibility package. The issue here is that although TOPS-20, as DEC develops it, is TENEX-like, it is not TENEX compatible and vice versa. Thus, our hope was to write a program that would resolve these incompatibilities automatically rather than force special adaptations for the two operating systems.
The kinds of incompatibilities that exist include PDP-20 machine instructions that do not exist on earlier machines, new JSYS calls, incompatible changes to old JSYS calls, different syntax and facilities for device/file names, and different handling of error returns (types of return and error codes). It has proven unworkable to effectively handle all of these problems at the user level. Monitor changes are required to implement the widely used error return features (ERJMP/ERCAL) and make handling of other incompatibilities easier. We have not accomplished a complete compatibility package in any sense but have implemented requisite monitor changes and have developed several user packages that help emulate TOPS-20 JSYS calls for programs running on TENEX. We do not foresee being able to completely and effectively solve these problems within the expected lifetime of existing TENEX machines.

Finally, we have also assisted groups that have interacted with SUMEX user projects to get access to software available in our community. For example, Prof. Dreiding's group in Switzerland became interested in some of the system software available here after attending the DENDRAL CONGEN workshops (see Section II.A.1.3 on page 103). We have provided him with the non-licensed programs requested. We have also provided software to Professor Bodmer's group at the Imperial Cancer Research Group in England in collaboration with the MOLGEN project (see Section II.A.1.5 on page 136).

I.A.3.7 Core Research

Over the past year we have supported several core research activities aimed at developing information resources, basic AI research, and tools of general interest to the SUMEX-AIM community. Principal areas of current effort include:

1) The AI Handbook which is a compendium of knowledge about the field of Artificial Intelligence being compiled by Professor Feigenbaum and collaborators.
The handbook is broad in scope, covering all of the important ideas, techniques, and systems developed during 20 years of research in AI in a series of articles. Each is about four pages long and is a description written for non-AI specialists and students of AI. The handbook will be published in three volumes, the first of which is now on the market published by William Kaufmann, Inc. The AI Handbook effort is described in more detail in Section II.A.1.2 on page 99 and an outline of the current contents of the handbook can be found in Appendix B.

2) The AGE project which is an attempt to isolate inference, control, and representation techniques from previously developed knowledge-based programs; reprogram them for domain independence; write a rule-based interface that will help a user understand what the package offers and how to use the modules; and make the package available to other members of the AIM community. A more detailed description of progress on the AGE package can be found in Section II.A.1.1 on page 91.

It should be noted that SUMEX is providing only partial support for these projects with complementary support coming from an ARPA contract to the Heuristic Programming Project.

I.A.3.8 Resource Operations Statistics

The following data give an overview of various aspects of SUMEX-AIM resource usage. There are five sub-sections containing data respectively for:

1) Overall resource loading data
2) Relative system loading by community
3) Individual project and community usage
4) Network usage data
5) System reliability data

1. Overall resource loading data

The following plots display several different aspects of system loading over the life of the project. These include total CPU time delivered per month, the peak number of jobs logged in, and the peak load average.
The monthly "peak" value of a given variable is the average of the daily peak values for that variable during the month. Thus, these "peak" values are representative of average monthly loading maxima and do not reflect the largest excursions seen on individual days, which are much higher.

These data show well the continued growth of SUMEX use and the self-limiting saturation effect of system load average, especially after installation of our overload controls early in 1978. Since late 1976, when the dual processor capacity became fully used, the peak daily load average has remained between about 5.5 and 6. This is a measure of the user capacity of our current hardware configuration and the mix of AI programs.

[Plot: Total CPU Hrs/Mo by month, January 1975 - January 1981]
Figure 6. Total CPU Time Consumed by Month

[Plot: Peak Number of Jobs by month, January 1975 - January 1981]
Figure 7. Peak Number of Jobs by Month

[Plot: Peak Load Average by month, January 1975 - January 1981]
Figure 8. Peak Load Average by Month

2. Relative System Loading by Community

The SUMEX resource is divided, for administrative purposes, into 3 major communities: user projects based at the Stanford Medical School, user projects based outside of Stanford (national AIM projects), and common system development efforts.
As defined in the resource management plan approved by BRP at the start of the project, the available system CPU capacity and file space resources are divided between these communities as follows:

     Stanford   40%
     AIM        40%
     Staff      20%

The "available" resources to be divided up in this way are those remaining after various monitor and community-wide functions are accounted for. These include such things as job scheduling, overhead, network service, file space for subsystems, documentation, etc.

The monthly usage of CPU and file space resources for each of these three communities relative to their respective aliquots is shown in the plots in Figure 9 and Figure 10. Terminal connect time is shown in Figure 11. It is clear that the Stanford projects have held an edge in system usage despite our efforts at resource allocation and the substantial voluntary efforts by the Stanford community to utilize non-prime hours. This reflects the maturity of the Stanford group of projects relative to those getting started on the national side and has correspondingly accounted for much of the progress in AI program development to date.

[Plots, three panels (National Projects, Stanford Projects, System Staff): % Allocated CPU Used by month, January 1975 - January 1981]
Figure 9. Monthly CPU Usage by Community
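The aliquot arithmetic described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name, the dictionary layout, and the sample numbers are hypothetical and are not taken from the SUMEX accounting software. Only the 40/40/20 division of the resources remaining after overhead comes from the text.

```python
# Illustrative sketch only: names, data layout, and sample numbers are
# assumptions, not part of the SUMEX accounting software.

# Community shares of the available resource, in percent (from the text).
ALIQUOT_PERCENT = {"Stanford": 40, "AIM": 40, "Staff": 20}


def usage_vs_aliquot(total_capacity, overhead, used):
    """Compare each community's monthly usage with its aliquot.

    total_capacity: total resource in the month (e.g. CPU hours)
    overhead:       amount consumed by monitor and community-wide functions
    used:           dict mapping community name to the amount it consumed

    Returns a dict mapping community name to a pair:
    (percent of available resource used, percent of its own aliquot used).
    """
    # "Available" means what remains after overhead is accounted for.
    available = total_capacity - overhead
    if available <= 0:
        raise ValueError("no capacity left after overhead")

    report = {}
    for community, percent in ALIQUOT_PERCENT.items():
        aliquot = available * percent / 100
        consumed = used.get(community, 0.0)
        report[community] = (
            100 * consumed / available,
            100 * consumed / aliquot,
        )
    return report


if __name__ == "__main__":
    # Hypothetical month: 700 CPU hours in total, 200 of them overhead.
    sample = {"Stanford": 250.0, "AIM": 150.0, "Staff": 100.0}
    result = usage_vs_aliquot(700.0, 200.0, sample)
    for name, (of_available, of_aliquot) in result.items():
        print(f"{name:9s} {of_available:5.1f}% of available, "
              f"{of_aliquot:5.1f}% of aliquot")
```

In this hypothetical month, a figure above 100% of aliquot would correspond to a community exceeding its nominal share of the available resource.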
[Plots, three panels (National Projects, Stanford Projects, System Staff): % Allocated File Space Used by month, January 1975 - January 1981, each panel marked "File System Upgrade"]
Figure 10. Monthly File Space Usage by Community

[Plots, three panels (National Projects, Stanford Projects, System Staff): Connect Hrs/Mo by month, January 1975 - January 1981]
Figure 11. Monthly Terminal Connect Time by Community

3. Individual Project and Community Usage

The table following shows cumulative resource usage by project during the past grant year. The entries include a summary of the operational funding sources (outside of SUMEX-supplied computing resources) for currently active projects, total CPU consumption by project (Hours), total terminal connect time by project (Hours), and average file space in use by project (Pages, 1 page = 512 computer words). These data were accumulated for each project for the months between May 1980 and April 1981.

Again the well developed use of the SUMEX resource by the Stanford community can be seen. It should be noted that the Stanford projects have voluntarily shifted a substantial part of their development work to non-prime time hours which is not explicitly shown in these cumulative data.
It should also be noted that a significant part of the DENDRAL, MYCIN, AGE, AI Handbook, and MOLGEN efforts, here charged to the Stanford aliquot, supports development efforts dedicated to national community access to these systems. The actual demonstration and use of these programs by extramural users (e.g., the GENET community) is charged to the national community in the "AIM USERS" category, however.

Several of the projects admitted to the National AIM community use the Rutgers-AIM resource as their home base. We do not explicitly list these projects in this annual report covering the Stanford SUMEX-AIM resource. We do record information about the Rutgers resource itself, however, and note its separate resource status with the flag "[Rutgers-AIM]".

Resource Use by Individual Project - 5/80 through 4/81

                                                CPU     Connect  File Space
National AIM Community                      (Hours)     (Hours)     (Pages)

1) ACT Project                                95.81     1214.90        2362
   "Acquisition of Cognitive Procedures"
   John R. Anderson, Ph.D.
   Carnegie-Mellon Univ.
   ONR N00014-77-C-0242   9/78-9/80   $175,000
   NSF IST-80-15357   2/81-2/84   $186,000

2) SECS Project                              837.86    11968.57       10239
   "Simulation & Evaluation of Chemical Synthesis"
   W. Todd Wipke, Ph.D.
   U. California, Santa Cruz
   NIH RR-01059-03S1   7/80-12/81   $36,949
   NIH/NCI NO1-CP-75816   1/80-7/81   $74,394

3) Mod Human Cogn Project                    150.08     2084.62         898
   "Hierarchical Models of Human Cognition"
   Walter Kintsch, Ph.D.
   Peter G. Polson, Ph.D.
   University of Colorado
   NIE-G-78-0172   9/80-8/81   $46,537
   NIMH MH-15872-9-13   6/80-5/81   $32,880
   ONR N00014-78-C-0433   6/80-5/81   $60,000
   ONR N00014-78-C-0165   1/80-6/81   $85,000

4) CADUCEUS Project                          344.92     5975.17        8365
   "Clinical Decision Systems Research Resource"
   Jack D. Myers, M.D.
   Harry E. Pople, Jr., Ph.D.
   University of Pittsburgh
   NIH RR-01101-04   7/80-6/81   $465,199
   NLM LM03710-01   7/80-6/81   $148,458
   NLM LM03589-01   7/80-6/81   $32,750

5) SOLVER Project                               .55       13.94
   "Problem Solving Expertise"
   Paul E. Johnson, Ph.D.
   William B. Thompson, Ph.D.
   University of Minnesota
   NSF SE079-13036
   NICHD T36-HD-17151
   NICHD HD-01136
   NSF/BNS-77-22075
   NLM/NSF proposals pending

6) PUFF-VM Project                            97.01     5651.45        3785
   "Biomedical Knowledge Engineering in Clinical Medicine"
   John J. Osborn, M.D.
   Inst. Medical Sciences, San Francisco
   Edward A. Feigenbaum, Ph.D.
   Stanford University
   NIH GM-24669   9/78-8/81   $164,000 (*)
   Renewal pending

7) SCP Project                                 5.13      236.97         931
   "Simulation of Cognitive Processes"
   James G. Greeno, Ph.D.
   Alan M. Lesgold, Ph.D.
   University of Pittsburgh
   NIE-G-80-0014   12/80-11/81   $2,627,067
   ONR N00014-79-C-0215   10/80-9/81   $247,053
   NSF/NIE SED78-22289   12/78-8/81   $149,967

*** [Rutgers-AIM] ***
8) Rutgers Project                            12.87      339.45        8653
   "Computers in Biomedicine"
   Saul Amarel, D.Sc.
   NIH RR-00643   12/80-11/81   $495,079

9) AIM Pilot Projects
   AI-COAG                                     4.02       96.89         672
   EXCHANGE                                    7.64      100.89          60
   HEADMED                                      .66       29.03         770
   KRL                                          .57        6.99         268
   MDX                                          .53       30.96          28
   MELANOMA                                     .99       20.11           9
   MISL                                         .73       14.42         813
   SPA                                         1.42       30.10         251
   SPEECH                                     11.10      224.32         537
   UI                                          8.51      379.92          46
   AIM Pilot Totals                           36.17      933.63        3459

10) AIM Administration                         8.98      449.06        4141

11) AIM Users on Stanford Projects
    AGE                                        2.44       51.90          27
    DENDRAL                                   62.59      905.71        1056
    HMF                                       35.06     1026.88        2735
    HPP                                        5.30      118.23         115
    MOLGEN                                   411.85     5757.19        1095
    MYCIN                                      7.44      620.40         105
    AIM-Associates                             1.65       37.00         153
    Guest (all projects)                      12.88      135.08         239
    AIM User Totals                          569.21     8652.39        5529

Community Totals                            2158.59    37520.15       48371

                                                CPU     Connect  File Space
Stanford Community                          (Hours)     (Hours)     (Pages)

1) AGE Project (Core)                        388.99     4625.17        3575
   "Generalization of AI Tools"
   Edward A. Feigenbaum, Ph.D.
   Dept. Computer Science
   ARPA MDA-903-80-C-0107 (**)
   (partial support)

2) AI Handbook Project (Core)                 31.56     1680.34        2554
   Edward A. Feigenbaum, Ph.D.
   Dept. Computer Science
   ARPA MDA-903-80-C-0107 (**)
   (partial support)

3) DENDRAL Project                           654.47     8170.34       16366
   "Resource Related Research: Computers in Chemistry"
   Carl Djerassi, Ph.D.
   Dept. Chemistry
   NIH RR-00612-12   5/81-4/82   $237,387

4) EXPEX Project                              26.99     1163.37         513
   "Expert Explanation"
   Edward H. Shortliffe, M.D., Ph.D.
   Depts. Medicine/Computer Science
   ONR NR 049-479   1/81-12/81   $140,825

5) MOLGEN Project                            325.62     6146.69        6734
   "Experiment Planning System for Molecular Genetics"
   Edward A. Feigenbaum, Ph.D.
   Bruce G. Buchanan, Ph.D.
   Laurence H. Kedes, M.D.
   Douglas L. Brutlag, Ph.D.
   Depts. Computer Science/Medicine/Biochemistry
   NSF ECS-8016247   10/80-9/81   $146,582 (*)

6) MYCIN Projects                            706.62    11373.94       13261
   "Computer-based Consult. in Clin. Therapeutics"
   Bruce G. Buchanan, Ph.D.
   Edward H. Shortliffe, M.D., Ph.D.
   Depts. Medicine/Computer Science
   NSF MCS-79-03753   7/79-3/81   $146,152
   ONR/ARPA N00014-79-C-0302   3/79-3/82   $396,325
   Kaiser Fdn.   7/79-12/80   $20,000
   NLM LM-03395   7/80-6/81   $47,845
   NLM LM-00048   7/80-6/81   $39,107

7) Protein Struct Modeling                    89.26     1194.98        3916
   "Heuristic Comp. Applied to Prot. Crystallog."
   Edward A. Feigenbaum, Ph.D.
   Dept. Computer Science
   NSF MCS-79-33666   12/79-11/81   $35,318

8) RX Project                                 64.15     1683.05        2222
   Depts. Computer Science/Medicine
   Robert L. Blum, M.D.
   Gio C.M. Wiederhold, Ph.D.
   NLM New Invest.   7/79-6/82   $90,000
   NCHSR   4/79-3/81   $35,000

9) Stanford Pilot Projects
   DECIDER (E. Johnson)                         .27        3.70           0
   STRUCT (Abarbanel)                         12.41      317.34          87
   SCANR (Brinkley)                           36.81      758.69         771
   Stanford Pilot Totals                      49.49     1079.73         858

10) Stanford and HPP Assoc.                  196.49     9538.63       11609

Community Totals                            2433.64    46656.24       61608

                                                CPU     Connect  File Space
SUMEX Staff                                 (Hours)     (Hours)     (Pages)

1) Staff                                     602.53    20648.45        9571
2) System Associates                          78.42     3081.22        4205
3) Misc.
   Usage                                        .16        1.84         770

Community Totals                             681.11    23731.51       14546

                                                CPU     Connect  File Space
System Operations                           (Hours)     (Hours)     (Pages)

1) Operations                               2230.32    94975.93       69094

Resource Totals                             7503.66   202883.83      193619

(*) Award includes indirect costs.

(**) Supported by a larger ARPA contract MDA-903-80-C-0107 awarded to the Stanford Computer Science Department:

                                         Current Year        Total Award
                                       (10/80-11/15/81)     (10/79-9/82)
   Heuristic Programming Project          $ 538,262          $1,613,588
   VLSI/CAD Network-based Graphics
     Development Resource                   214,851             685,374
   Total award                            $ 763,113(*)       $2,298,962(*)

4. Network Usage Statistics

The plots in Figure 12 and Figure 13 show the monthly network terminal connect time for TYMNET and ARPANET. This forms the major billing component for SUMEX-AIM TYMNET usage. The terminal connect time does not reflect the time spent in file transfers and mail forwarding.

[Plot: TYMNET Conn Hrs by month, January 1975 - January 1981]
Figure 12. TYMNET Terminal Connect Time

[Plot: ARPANET Conn Hrs by month, January 1975 - January 1981]
Figure 13. ARPANET Terminal Connect Time

5. System Reliability

System reliability has been very good on average with several periods of particular hardware or software problems. The table below shows monthly system reloads and downtime for the past year. It should be noted that the number of system reloads is greater than the actual number of system crashes since two or more reloads may have to be done within minutes of each other after a crash to repair file damage or to diagnose the cause of failure.

                   1980                                    1981
                    MAY  JUN  JUL  AUG  SEP  OCT  NOV  DEC  JAN  FEB  MAR  APR
RELOADS
  Hardware            3    2    ?    ?    ?   16    ?    ?    ?    2    4    0
  Software            0    3    ?    ?    ?    2    ?    ?    ?    2    1    4
  Environmental       0    1    ?    ?    ?    0    ?    ?    ?    0    0    1
  Operator Error      0    0    ?    ?    ?    0    ?    ?    ?    0    2    1
  Unknown Cause       2    0    ?    ?    ?    0    ?    ?    ?    0    0    0
  Totals              5    6    6   15   10   18    ?    ?    ?    4    7    6
DOWNTIME (Hrs)
  Unscheduled         2   17    ?   18    7   11    1    ?    ?    2    4    6
  Scheduled          30   14    ?   17   17   28   17    ?    ?    4    4   14
  Totals (Hrs)       32   31   95   35   24   39   18    ?   21    6    8   20

(? = entry not legible in the source copy. One further downtime component is legible for July, 79 hours, and for January, 15 hours, but its row cannot be determined.)

TABLE 1. System Reliability by Month

I.A.3.9 SUMEX Staff Publications

The following are publications for the SUMEX staff and include papers describing the SUMEX-AIM resource and on-going research as well as documentation of system and program developments. Many of the publications documenting SUMEX-AIM community research are from the individual collaborating projects and are detailed in their respective reports (see Section II on page 89). Publications for the AGE and AI Handbook core research projects are given there.

[1] Carhart, R.E., Johnson, S.M., Smith, D.H., Buchanan, B.G., Dromey, R.G., and Lederberg, J., Networking and a Collaborative Research Community: A Case Study Using the DENDRAL Programs, ACS Symposium Series, Number 19, Computer Networking and Chemistry, Peter Lykos (Editor), 1975.

[2] Levinthal, E.C., Carhart, R.E., Johnson, S.M., and Lederberg, J., When Computers Talk to Computers, Industrial Research, November 1975.

[3] Wilcox, C. R., MAINSAIL - A Machine-Independent Programming System, Proceedings of the DEC Users Society, Vol. 2, No. 4, Spring 1976.

[4] Wilcox, Clark R., The MAINSAIL Project: Developing Tools for Software Portability, Proceedings, Computer Application in Medical Care, October, 1977, pp. 76-83.

[5] Lederberg, J. L., Digital Communications and the Conduct of Science: The New Literacy, Proc. IEEE, Vol. 66, No. 11, Nov 1978.

[6] Wilcox, C. R., Jirak, G. A., and Dageforde, M.
L., MAINSAIL - Language Manual, Stanford University Computer Science Report STAN-CS-80-791 (1980).

[7] Wilcox, C. R., Jirak, G. A., and Dageforde, M. L., MAINSAIL - Implementation Overview, Stanford University Computer Science Report STAN-CS-80-792 (1980).

Mr. Clark Wilcox also chaired the session on "Languages for Portability" at the DECUS DECsystem10 Spring '76 Symposium.

In addition, a substantial continuing effort has gone into developing, upgrading, and extending documentation about the SUMEX-AIM resource, the SUMEX-TENEX system, and the many subsystems available to users. These efforts include a number of major documents (such as SOS, PUB, TENEX-SAIL, and MAINSAIL manuals) as well as a much larger number of document upgrades, user information and introductory notes, an ARPANET Resource Handbook entry, and policy guidelines.

I.A.3.10 Future Plans

Our plans for the next grant year are based on those approved by the council review of our recent five-year renewal application scheduled to begin in August 1980. In addition to the specific plans for next grant year (discussed in some earlier sections too), we present a summary below of our overall objectives for the next five-year period to serve as a foundation for future reports. Near and long term objectives and plans for individual collaborating projects are discussed in Section II beginning on page 89.

The goals of the SUMEX-AIM resource are long term in supporting basic research in artificial intelligence, applying these techniques to a broad range of biomedical problems, experimenting with communication technologies to promote scientific interchange, and developing better tools and facilities to carry on this research. Just as the tone of our renewal proposal derives from the continuing long-term research objectives of the SUMEX-AIM community, our approach derives from the methods and philosophy already established for the resource.
We will continue to develop useful knowledge-based software tools for biomedical research based on innovative, yet accessible computing technologies. For us it is important to make systems that work and are exportable. Hence, our approach is to integrate available state-of-the-art hardware technology as a basis for the underlying software research and development necessary to support the AI work.

SUMEX-AIM will retain its broad community orientation in choosing and implementing its resources. We will draw upon the expertise of on-going research efforts where possible and build on these where extensions or innovations are necessary. This orientation has proved to be an effective way to build the current facility and community. We have built ties to a broad computer science community; have brought the results of their work to the AIM users; and have exported results of our own work. This broader community is particularly active in developing technological tools in the form of new machine architectures, language support, and interactive modalities.

Toward a More Distributed Resource

The initial model for SUMEX as a centralized resource was based on the high cost of powerful computing facilities and the inability to duplicate them readily. This role is evolving with the introduction of more compact and inexpensive computing technology. Our future goals are guided by community needs for more computing capacity and improved tools to build more effective expert systems and to test operational versions of AI programs in real-world settings. In order to meet these needs, we must take advantage of a range of newly developing machine architectures and systems. As a result, SUMEX-AIM will become more a distributed community resource with heterogeneous computing facilities tethered to each other through communications media.
Many of these machines will be located physically near the projects or biomedical scientists using them. We have actively supported proposals from the more mature AIM projects for additional computing facilities tailored to their particular needs and designed to free the main SUMEX resource for new, developing applications projects. To date, the Rutgers resource has acquired a DEC 2050 facility, part of which is allocated for AIM usage; the "Simulation of Cognitive Processes" project has acquired a VAX which supports their needs; and the "Caduceus" (INTERNIST) project is acquiring a VAX to support experimental clinical testing of their program. Our future plans anticipate an even broader diversification of computing resources to meet the needs of the AIM community.

The Continuing Role of SUMEX-Central

Even with more distributed computing resources, the central resource will continue to play an important role as a communication crossroads, as a research group devoted to integrating the new software and hardware technologies to meet the needs of medical AI applications, as a spawning ground for new application projects, and as a base for local AI projects. A key challenge will be to maintain the scientific community ties that grew naturally out of the previous colocation within a central facility.

Summary of Five-year Objectives

The following outlines the specific objectives of the SUMEX-AIM resource during the follow-on five year period. Note that these objectives cover only the resource nucleus; near and long-term objectives for individual collaborating projects are discussed in their respective reports in Section II. Specific aims are broken into three categories: 1) resource operations, 2) training and education, and 3) core research.

Resource Operations

1) Maintain the vitality of the AIM community.
We will continue to encourage and explore new applications of AI to biomedical research and improve mechanisms for inter- and intra-group collaborations and communications. While AI is our defining theme, we may entertain exceptional applications justified by some other unique feature of SUMEX-AIM essential for important biomedical research.

To minimize administrative barriers to the community-oriented goals of SUMEX-AIM and to direct our resources toward purely scientific goals, we plan to retain the current user funding arrangements for projects working on SUMEX facilities. User projects will fund their own manpower and local needs; will actively contribute their special expertise to the SUMEX-AIM community; and will receive an allocation of computing resources under the control of the AIM management committees. There will be no "fee for service" charges for community members.

We will also continue to exploit community expertise and sharing in software development; and to facilitate more effective information sharing among projects.

2) Continue to provide effective computational support for AIM community goals. Our efforts will be to extend the support for artificial intelligence research and new applications work; to develop new computational tools to support more mature projects; and to facilitate testing and research dissemination of nearly operational programs.

We will continue to operate and develop the existing KI-10/2020 facility as the nucleus of the resource. We will acquire additional equipment to meet developing community needs for more capacity, larger program address spaces, and improved interactive facilities. New computing hardware technologies becoming available now and in the next few years will play a key role in these developments and we expect to take the lead in this community for adapting these new tools to biomedical AI needs.
We plan the phased purchase of two VAX computers to provide increased computing capacity and to support large address space LISP development, a 2000M byte file server to meet file storage needs, and a number of single-user "professional workstations" to experiment with improved human interfaces and AI program dissemination.

3) Provide effective and geographically accessible communication facilities to the SUMEX-AIM community for effective remote collaborations, communications among distributed computing nodes, and experimental testing of AI programs. We will retain the current ARPANET and TYMNET connections for at least the near term and will actively explore other advantageous connections to new communications networks and to dedicated links.

Training and Education

Our goals during the follow-on period for assisting new and established users of the SUMEX-AIM resource are a continuation of those adopted for the previous grant term. Collaborating projects are responsible for the development and dissemination of their own AI programs. The SUMEX resource will provide community-wide support and will work to make resource goals and AI programs known and available to appropriate medical scientists. Specific aims include:

1) Provide documentation and assistance to interface users to resource facilities and programs. We will continue to exploit particular areas of expertise within the community for developing pilot efforts in new application areas.

2) Continue to allocate "collaborative linkage" funds to qualifying new and pilot projects to provide for communications and terminal support pending formal approval and funding of their projects. These funds are allocated in cooperation with the AIM Executive Committee reviews of prospective user projects.
3) Continue to support workshop activities including collaboration with the Rutgers Computers in Biomedicine resource on the AIM community workshop and with individual projects for more specialized workshops covering specific application areas or program dissemination.

Core Research

Our core research efforts will continue to emphasize basic research on AI techniques applicable to biomedical problems and the generalization and documentation of tools to facilitate and broaden application areas. SUMEX core research funding is complementary to similar funding from other agencies and contributes to the long-standing interdisciplinary effort at Stanford in basic AI research and expert system design. We expect this work to provide the underpinnings for increasingly effective consultative programs in medicine and for more practical adaptations of this work within emerging microelectronic technologies. Specific aims include:

1) Continue to explore basic artificial intelligence issues for knowledge acquisition, representation, and utilization; reasoning in the presence of uncertainty; strategy planning; and explanations of reasoning pathways with particular emphasis on biomedical applications.

2) Support community efforts to organize and generalize AI tools that have been developed in the context of individual application projects. This will include work to organize the present state-of-the-art in AI techniques through the AI Handbook effort and the development of practical software packages (e.g., AGE, EMYCIN, UNITS, and EXPERT) for the acquisition, representation, and utilization of knowledge in AI programs. The objective is to evolve a body of software tools that can be used to more efficaciously build future knowledge-based systems and explore other biomedical AI applications.

Hardware Acquisition Rationale

As discussed in our progress report and supported by collaborating project reports, we have implemented an effective set of computing resources to support AI applications to biomedical research. At the resource core is the KI-TENEX/2020 facility, augmented by portions of the Rutgers 2050 and Stanford SCORE 2060 machines. These have provided an unsurpassed set of tools for the initial phases of SUMEX-AIM development in terms of operating system facilities, human engineering, language support for artificial intelligence program development, and community communications tools. As the size of our community and the complexity of knowledge-based programs have increased, several issues have become important for the continued development and practical dissemination of AI programs:

1) The community has a continuing need for more computing capacity. This arises from the growth of new applications projects, new core research ideas, and the need to disseminate mature systems within and outside of the AIM community. Nowhere is this felt more strongly than among the Stanford community where system access constraints have seriously impeded development progress. A picture of system congestion can be found in the summary of loading statistics beginning on page 29 and in the statements from many of our user projects.

2) Many programs require a larger virtual address space. As AI systems become more expert and encompass larger and more complex domains, they require ever larger knowledge bases and data structures that must be traversed in the course of solving problems. The 256K word address limit of the PDP-10 has constrained program development as discussed in our renewal proposal. Increasing effort has gone into "overlays" resulting in higher machine overhead, more difficulty in making program changes, and lost programmer time. Simpler hardware solutions are needed.
3) AI programs are being tested and disseminated increasingly beyond their development communities. We cannot continue to provide all of the computing resources this implies through central systems like SUMEX. The capacity does not exist. Network communications facilities are not able to support facile human interactions (high speed, improved displays, graphics, and speech/touch modalities). And a grant-supported research environment cannot meet the technical and administrative needs of a "production" community. Thus, we need to explore better ways to package complex AI software and distribute the necessary computing tools cost effectively into the user communities.

No single solution to these requirements for future development is available. We proposed, and received peer approval, to investigate a variety of machine architectures and support functions over the next grant period including:

1) experimentation with new shared centralized systems

2) distributed single-user "professional workstations"

3) improved communications tools to integrate them together effectively.

In addition to continuing operation of the existing resources, we plan to direct SUMEX research efforts to explore the potential of such newly available systems as solutions to AIM community needs. Our approach will be to integrate a heterogeneous set of network-connected hardware tools, some of which will be distributed through the user community. We will emphasize the development of system and application level software tools to allow effective use of these resources and continue to provide community leadership to encourage scientific communications.

Specific Hardware Plans for Year 09

In our proposal as approved by council, we described a carefully detailed plan for hardware acquisition.
One of the approved purchases, the augmentation to the AMPEX core memory for the KI-10 duplex, was approved for the current year 08 and has already been implemented. In addition, for the technical reasons discussed on page 11, we have obtained BRP approval to accelerate the purchase of the five approved professional workstations to year 09 and to delay the first VAX purchase to year 10. The following then is a summary of planned hardware purchases for year 09:

- Buy five Interlisp Dolphin professional workstations for use in developing and experimenting with this means for AI program export and human interface enhancements.

- Develop a file server coupled to SUMEX host machines via the high speed Ethernet. This will minimize the need for redundant large file systems on each host and alleviate the file storage limitations of the AIM community.

- Acquire examples of state-of-the-art display equipment including a bit-mapped display station and a hardcopy laser printing device.

- Buy additional required communications, interface, and test equipment to support the above acquisitions and community needs.

Continued Operation of Existing Hardware

The current SUMEX-AIM facilities represent a large existing investment. We do not propose any substantial changes to the existing KI-10 and 2020 hardware systems and we expect them to continue to provide effective community support and serve as a communication nucleus for more distributed resources. The proposed augmentation of the existing KI-10 AMPEX memory box in order to reduce page swapping overhead is underway.

It should be recognized that the KI-10 processors are now 6 years old and will be 12 years old at the end of the proposed grant term. We have already begun to feel maintenance problems from age such as poor electrical contacts from oxidization and dirt, backplane insulation flowing on "tight wraps", and brittle cables.
These problems are quite manageable still and we expect to be able to continue reliable operation over the next grant term.

We plan no upgrades to the 2020 configuration. The current file shortage will be remedied in conjunction with that of the rest of the facility by implementing a community file server sharable and accessible via the Ethernet.

For both systems, we are actively working to complete efficient interfaces to the Ethernet to allow flexible, high speed terminal connections, file transfers, and effective sharing of network, printing, plotting, remote links, and other resources. This system will form the backbone for smooth integration of future hardware additions to the resource.

[Figure 14. Planned Ethernet System to Integrate System Hardware. Diagram not reproduced. Recoverable labels: Ethernet; Xerox Alto; SUMEX 2020; I/O peripherals (LPT, PLT, ...); TYMNET interface; 4800 bit/sec lines; KI-TENEX system; ARPANET; 50K bit/sec lines; gateways (UC Santa Cruz, Stanford CSD, Stanford Chemistry, UC San Francisco); Ether TIP; VAX INTERLISP systems (years 2, 3); file server (year 1); 5 INTERLISP Dolphin workstations (year 1).]

Communication Networks

Networks have been centrally important to the research goals of SUMEX-AIM and will become more so in the context of increasingly distributed computing. Communication will be crucial to maintain community scientific contacts, to facilitate shared system and software maintenance based on regional expertise, to allow necessary information flow and access at all levels, and to meet the technical requirements of shared equipment.

Long-Distance Connections

We have had reasonable success at meeting the geographical needs of the community during the early phases of SUMEX-AIM through our ARPANET and TYMNET connections.
These have allowed users from many locations within the United States and abroad to gain terminal access to the AIM resources (SUMEX, Rutgers, and SCORE) and through ARPANET links to communicate much more voluminous file information. Since many of our users do not have ARPANET access privileges for technical or administrative reasons, a key problem impeding remote use has been the limited communications facilities (speed, file transfer, and terminal handling) offered currently by commercial networks. Commercial improvements are slow in coming but may be expected to solve the file transfer problem in the next few years. A number of vendors (AT&T, IBM, Xerox, etc.) have yet to announce commercially available facilities but TELENET is actively working in this direction. We plan to continue experimenting with improved facilities as offered by commercial or government sources in the next grant term. We have budgeted for continued TYMNET service and an additional amount annually for experimental network connections.

High-speed interactive terminal support will continue to be a problem since one cannot expect to serve 1200-9600 baud terminals effectively over shared long-distance trunk lines with gross capacities of only 9600-19200 baud. We feel this is a problem that is best solved by distributed machines able to effectively support terminal interactions locally and coupled to other AIM machines and facilities through network or telephonic links. As new machine resources are introduced into the community, we will allocate budgeted funds with Executive Committee advice to assure effective communication links.

Local Intermachine Connections

A key feature of our plans for future computing facilities is the support of a heterogeneous processing environment that takes advantage of newly available technology and shared equipment resources between these machines. The "glue" that links these systems together is a high speed local network.
We have chosen Ethernet and the Xerox PUP [9, 12] protocols for these interconnections. This choice was based on the availability of that technology now and the economics of using already developed TENEX and other server software. We expect the Ethernet system to continue to meet our technical needs for the coming grant term and we plan to continue to use it. We are working closely with other groups here at Stanford and elsewhere to share hardware interface and software designs wherever possible. Our goals are to complete integration of the SUMEX-AIM system, including making selected KI-10 peripherals available as Ethernet nodes, creating links to nearby campus resources, and establishing needed remote links to other groups not on the ARPANET such as Wipke at the University of California at Santa Cruz. A diagram of our Ethernet system is shown in Figure 14 on page 55 and includes the following major elements:

1) KI-10 direct memory access interface. We currently have an inefficient I/O bus connection.

2) Stanford campus gateway. Establish links to other Ethernets on campus to allow access to special resources (Dover printer, plotters, typesetting equipment, etc.) and to allow users to easily access various computing resources.

3) Ethertip. We need additional terminal ports into the system and the Ethernet provides a natural mechanism to do this supporting high speed terminals and connections to various resources (KI-10, 2020, VAX's, etc.).

4) TYMNET connection. This connection currently comes through the KI-10's and will be moved to a separate Ethernet node. This will free the KI-10's from handling the special TYMNET protocol and will allow TYMNET users to access any of the SUMEX-AIM resources. Similar facilities for the ARPANET may also be implemented depending on administrative constraints.

5) Printer/plotter service.
We plan to make these local resources accessible from any of the SUMEX-AIM machines instead of being centered on the KI-10's. This will also free up the KI-10's from routine spooler tasks.

7) Connections for other machines (VAX's, Professional Workstations, file server, etc.).

Resource Software

We will continue to maintain the existing system, language, and utility support software on our systems at the most current release levels, including up-to-date documentation. We will also be extending the facilities available to users where appropriate, drawing upon other community developments where possible. We rely heavily on the needs of the user community to direct system software development efforts. Specific development areas for existing systems include:

1) completion of the Ethernet connections and necessary host software. This will include basic packet handling, PUP protocols at all levels, and relocation of shared existing resources to become Ethernet nodes.

2) bug fixes in the current monitors. We still have a number of bugs that cause infrequent crashes and that are hard to isolate because they cause system problems long after the fact. We will continue to work to repair these problems as time permits.

3) continued evaluation of system efficiency to improve performance.

4) compatibility issues. Our current compatibility package for TOPS-20 requires additional work to extend its features. We will also keep it up-to-date as DEC makes new changes to their system.

5) continued work to create similar working and programming environments between our TENEX and TOPS-20 systems. This will include moving TENEX features like the SUMEX GTJFN enhancements and scheduling controls as needed to TOPS-20 and vice versa.

6) continued work to improve system information and help facilities for users.

Our plans for augmenting the SUMEX-AIM resources will entail substantial new system and subsystem programming.
Our goals will be to derive as much software as possible from the user communities of the new VAX and Professional Workstation machines but we expect to have to do considerable work to adapt them to our biomedical AI needs. Many features of these systems are designed for a computer science environment and lack some of the human engineering and "friendliness" capabilities we have found needed to allow non-computer scientists to effectively use them. We are beginning to experiment with physician needs for interfaces to our AI programs to be better able to adapt the new machines as professional aids. Also many of the utility tools that we take for granted in the well-developed TENEX and TOPS-20 environment (communications, text manipulation, file management, accounting, etc.) will have to be reproduced. We expect to set up many of the common information services as network nodes.

Within the AIM community we expect to serve as a center for software sharing between various distributed computing nodes. This will include contributing locally developed programs, distributing those derived from elsewhere in the community, maintaining up-to-date information on subsystems available, and assisting in software maintenance.

Community Management

We plan to retain the current management structure that has worked so well. We will continue to work closely with the management committees to recruit the additional high quality projects which can be accommodated and to evolve resource allocation policies which appropriately reflect assigned priorities and project needs. We expect the Executive and Advisory Committees to play an increasingly important role in advising on priorities for facility evolution and on-going community development planning in addition to their recruitment efforts.
The composition of the Executive Committee will grow as needed to assure representation of major user groups and medical and computer science applications areas. The Advisory Group membership rotates regularly and spans both medical and computer science research expertise. We expect to maintain this policy.

We will continue to make information available about the various projects both inside and outside of the community and thereby promote the kinds of exchanges exemplified earlier and made possible by network facilities.

The AIM workshops under the Rutgers resource have served a valuable function in bringing community members and prospective users together. We will continue to support this effort. This summer the AIM workshop will be held in Vancouver, British Columbia in conjunction with the International Joint Conference on Artificial Intelligence. We are actively helping to organize the meeting. We will continue to assist community participation and provide a computing base for workshop demonstrations and communications. We will also assist individual projects in organizing more specialized workshops as we have done for the DENDRAL and AGE projects.

We plan to continue indefinitely our present policy of non-monetary allocation control. We recognize, of course, that this accentuates our responsibility for the careful selection of projects with high scientific and community merit.

Training and Education Plans

We have an on-going commitment, within the constraints of our staff size, to provide effective user assistance, to maintain high quality documentation of the evolving software support on the SUMEX-AIM system, and to provide software help facilities such as the HELP and Bulletin Board systems. These latter aids are an effective way to assist resource users in staying informed about system and community developments and solving access problems.
We plan to take an active role in encouraging the development and dissemination of community databases such as the AI Handbook, up-to-date bibliographic sources, and developing knowledge bases. Since much of our community is geographically remote from our machine, these on-line aids are indispensable for self help. We will continue to provide on-line personal assistance to users within the capacity of available staff through the SNDMSG and LINK facilities.

We budget funds to continue the "collaborative linkage" support initiated during the first term of the SUMEX-AIM grant. These funds are allocated under Executive Committee authorization for terminal and communications support to help get new users and pilot projects started.

Finally, we will continue to actively support the AIM workshop series in terms of planning assistance, participation in program presentations and discussions, and providing a computing base for AI program demonstrations and experimentation.

Core Research Plans

SUMEX core research includes both basic AI research and development of community tools useful for building expert systems. Expert systems are symbolic problem solving programs capable of expert-level performance, in which domain-specific knowledge is represented and used in an understandable line of reasoning. The programs can be used as problem solving assistants or tutors, but also serve as excellent vehicles for research on representation and control of diverse forms of knowledge. MYCIN is one of the best examples.

Because the main issues of building expert systems are coincident with general issues in AI, we appreciate the difficulty of proposing to "solve" basic problems. However, we do propose to build working programs that demonstrate the feasibility of our ideas within well defined limits. By investigating the nature of expert reasoning within computer programs, the process is "demystified".
Ultimately, the construction of such programs becomes itself a well-understood technical craft.

The foundation of all of our core research work is expert knowledge: its acquisition from practitioners, its accommodation into the existing knowledge bases, its explanation, and its use to solve problems. Continued work on these topics provides new techniques and mechanisms for the design and construction of knowledge-based programs; experience gained from the actual construction of these systems then feeds back both (a) evaluative information on the ideas' utility and (b) reports of quite specific problems and the ways in which they have been overcome, which may suggest some more general method to be tried in other programs.

One of our long-range goals is to isolate AI techniques that are general, to determine the conditions for their use and to build up a knowledge base about AI techniques themselves. SUMEX resources are coordinated for this purpose with the multidisciplinary efforts of the Stanford Heuristic Programming Project (HPP). Under support from ARPA, NIH/NLM, ONR, NSF, and private funding, the HPP conducts research on five key scientific problem areas, as well as a host of subsidiary issues [i]:

1) Knowledge Representation - How shall the knowledge necessary for expert-level performance be represented for computer use? How can one achieve flexibility in adding and changing knowledge in the continuous development of a knowledge base? Are there uniform representations for the diverse kinds of specialized knowledge needed in all domains?

2) Knowledge Utilization - What designs are available for the inference procedure to be used by an expert system? How can the control structure be simple enough to be understandable and yet sophisticated enough for high performance? How can strategy knowledge be used effectively?

3) Knowledge Acquisition - How can the model of expertise in a field of work be systematically acquired for computer use?
If it is true that the power of an expert system is primarily a function of the quality and completeness of the knowledge base, then this is the critical "bottleneck" problem of expert systems research.

4) Explanation - How can the knowledge base and the line of reasoning used in solving a particular problem be explained to users? What constitutes an acceptable explanation for each class of users?

5) Tool Construction - What kinds of software packages can be constructed that will facilitate the implementation of expert systems, not only by the research community but also by various user communities?

Artificial Intelligence is largely an empirical science. We explore questions such as these by designing and building programs that incorporate plausible answers. Then we try to determine the strengths and weaknesses of the answers by experimenting with perturbations of the systems and extrapolations of them into new problem areas. The test of success in this endeavor is whether the next generation of system builders finds the questions relevant and the answers applicable to reduce the effort of building complex reasoning programs.

I.B Highlights

I.B.1 Handbook of Artificial Intelligence

The AI Handbook is a compendium of knowledge about the field of Artificial Intelligence being assembled under Professor Edward Feigenbaum and Messrs Avron Barr and Paul Cohen. It is being compiled by students and investigators at several research facilities across the nation.

The AI Handbook Project is a good example of community collaboration using the SUMEX-AIM communication facilities to prepare, review, and disseminate this reference work on AI techniques. The Handbook articles exist as computer files at the SUMEX facility.
All of our authors and reviewers have access to these files via the network facilities and use the document-editing and formatting programs available at SUMEX. This relatively small investment of resources has resulted in what we feel will be a seminal publication in the field of AI, of particular value to researchers, like those in the AIM community, who want quick access to AI ideas and techniques for application in other areas.

The AI Handbook Project was undertaken as a core activity by SUMEX in the spirit of community building that is the fundamental concern of the facility. We feel that the organization and propagation of this kind of information to the AIM community, as well as to other fields where AI is being applied, is a valuable service that we are uniquely qualified to support.

The scope of the work is broad: Two hundred articles cover all of the important ideas, techniques, and systems developed during 20 years of research in AI. Each article, roughly four pages long, is a description written for non-AI specialists and students of AI. Additional articles serve as overviews, which discuss the various approaches within a subfield, the issues, and the problems.

We expect the Handbook to reach a size of approximately 1000 pages. Roughly two-thirds of this material will constitute Volumes I and II of the Handbook. The material in Volumes I and II will cover AI research in Heuristic Search, Representation of Knowledge, AI Programming Languages, Natural Language Understanding, Speech Understanding, Automatic Programming, and Applications-oriented AI Research in Science, Mathematics, Medicine, and Education. Researchers at Stanford University, Rutgers University, SRI International, Xerox PARC, RAND Corporation, MIT, USC-ISI, Yale, and Carnegie-Mellon University have contributed material to the project.

The current schedule for publication of the several volumes is as follows.
It should be noted that Volume I has been selected by the Library of Computer Science as their August 1981 book club selection.

May, 1981: Publication of Volume I by publisher (Wm. Kaufmann Inc., Los Altos, Ca.)

August, 1981: Submission of final copy to publisher for Volume II (publication by end of 1981).

August-September, 1981: Completion of Technical Reports containing chapters of Handbook

October, 1981: Submission of final copy of Volume III to publisher (for publication first quarter 1982)

I.B.2 Tutorial on AI in Clinical Medicine

In conjunction with the 1980 AIM Workshop, a continuing education tutorial designed for physicians was held at Stanford on August 17-18, 1980. The tutorial was entitled "Computers in Medicine -- Applications of Artificial Intelligence Techniques" and was organized by Drs. W. Clancey and E. Shortliffe. The tutorial was well-attended by 135 physicians, 18 students, 10 members of the press, and several non-physician researchers. It was accredited for postgraduate medical education through Stanford University School of Medicine. Enrollees came from as far away as Mexico and the East Coast.

The course included an optional introduction to computers for those who had no prior experience with the technology, an overview of SUMEX-AIM research, and an introduction to background materials regarding decision theory and data base applications in medicine. Speakers also provided detailed presentations on MYCIN, CASNET/EXPERT, INTERNIST and GUIDON. The course closed with a panel discussion on the problems and promise of AI in Medicine. The syllabus distributed to attendees included a comprehensive survey of medical AI research and comprised recent articles written by the tutorial faculty, mostly for a clinical audience.

The faculty consisted of 15 distinguished researchers from the national AI community, including 7
physicians and 9 speakers from centers other than Stanford. Coordination and planning for the tutorial was facilitated by sending electronic messages; almost all speakers regularly use SUMEX or another ARPANET machine.

The course was exceedingly well received. Attendees were fascinated by the content, generally felt it was well presented, and indicated they would recommend the course to others if it were made available again. Many physicians requested a follow-up course that would introduce them to more technical detail than had been possible in the introductory tutorial.

To evaluate the impact of the tutorial on the participants, and to assess baseline opinions regarding the field, we undertook a survey of the physicians' knowledge about computers as well as their attitudes towards medical consultation systems. The statistical analysis of these questionnaires has now been completed, and a paper summarizing the results has been submitted for publication (*). In brief, the survey showed that physicians were willing to accept the possibility of computer-based clinical decision aids but placed severe demands on the capabilities of such systems if they were to be acceptable for routine use.

(*) Teach, R.L. and Shortliffe, E.H. "An Analysis of Physician Attitudes Regarding Computer-based Clinical Consultation Systems." Submitted for publication, March 1981.

I.B.3 GENET - Dissemination of AI Tools for Molecular Genetics

The MOLGEN project at Stanford has focused on applications of artificial intelligence and symbolic computation in the field of molecular biology. The research began in 1975 and by early 1980, through many collaborative contacts, it was realized that some of the systems developed by MOLGEN were already of direct utility to many scientists in the domain.
In order to broaden MOLGEN's base of scientist collaborators to molecular biologists at institutions other than Stanford and to experiment with the use of a SUMEX-like resource to disseminate sophisticated AI software tools to a generally computer-naive community, we initiated an experimental user group called GENET. The response to our very limited announcement of this facility has been most enthusiastic.

We have offered three main programs to assist molecular genetics users: SEQ, a DNA-RNA sequence analysis program; MAP, a program that assists in the construction of restriction maps from restriction enzyme digest data; and MAPPER (written and maintained by William Pearson from Johns Hopkins University), a simplified version of the MOLGEN MAP program that is somewhat more efficient than the MOLGEN version. Some of the other more sophisticated programs being developed by MOLGEN research efforts have not been offered because they are not ready for novice users. In addition, the GENET users have had access to the SUMEX-AIM programs for electronic messaging, text editing, file searching, etc.

The GENET community, begun in spring of 1980, started to grow exponentially until it was consuming SUMEX resources on a scale equal to the largest AI research project. We were obliged to place restrictions on the number of simultaneous GENET users and to otherwise limit the growth of the community. Even with these restrictions, the community currently consists of approximately 200 users from 63 research institutions.

Of these 200 users, approximately 35 are consistently active users. That is, they log in, run programs, and interact with the MOLGEN members on an almost daily basis. Many of these users have made valuable contributions to our work. About 100 others are frequent, but not regular users. They log in only when they have a major analysis task to perform, which seems to be on the order of once a month. The remaining users rarely use the system.
They have logged in a few times, but for one reason or another they never become regular users of the system. Quite often this is because a lab group will settle on having one or two graduate students or post-doctoral associates become the "computer experts" of the group, and as a result, the computer use by the other people in the lab drops to a very low level. An equally prevalent reason for users to stop using the GENET account is a lack of SUMEX resources. The major complaint that we get from GENET users concerns the lack of compute time and availability of the system. One account just is not enough for 200 people to share.

We have succeeded in the goals set out for GENET. Many of our GENET guests have become active collaborators in core MOLGEN research. We are also pleased by the numerous comments SUMEX has received from GENET users praising the user-sensitive nature of the resource, especially in comparison to typical university computer centers. It is clear we have only had the resources to whet the appetite of this large, active, international community.

I.B.4 AGE - A Tool for Knowledge-Based System Development

One of the most difficult, time-consuming, and expensive aspects of building knowledge-based systems (indeed any kind of software system) is the human effort involved in designing and coding them from the ground up. A major goal of SUMEX core research has been to demystify and make explicit the art of knowledge engineering. More concretely, we have attempted to isolate inference, control, and representation techniques from previous knowledge-based programs; reprogram them for domain independence; write an interface that will help a user understand what the package offers and how to use the modules; and make the package available to other members of the AIM community and the general scientific community to assist in knowledge-based program development.
The AGE (Attempt to Generalize) package, developed by H. P. Nii and E. Feigenbaum, is one of the earliest experimental examples of such a system and has reached a level of practical utility.

The design and implementation of the AGE program is based primarily on the experience gained with knowledge-based programs at the Stanford Heuristic Programming Project in the last decade. The programs that have been, or are being, built include: DENDRAL, meta-DENDRAL, MYCIN, HASP, AM, MOLGEN, CRYSALIS [Feigenbaum 1977], and SACON [Bennett 1978]. Initially, the AGE program embodies the AI techniques used in these programs, but longer range goals are to integrate those developed at other AI laboratories as well.

It is hoped that AGE will speed up the process of building knowledge-based programs and facilitate the dissemination of AI techniques by (1) packaging common AI software tools so that they need not be reprogrammed for every problem; and (2) helping people who are not knowledge engineering specialists write knowledge-based programs.

AGE is being developed along two separate fronts: the "kit" of tools for implementing knowledge-based systems and the "intelligent" interface to assist users in making use of them. The current AGE system provides a set of preprogrammed "components" or "building blocks". A "component" is a collection of functions and variables that support conceptual entities in program form. For example, the production rule component consists of a rule interpreter and various strategies for rule selection and execution. The components in AGE have been carefully selected and modularly programmed to be useable in combinations. For those users not familiar enough to experiment on their own, AGE provides two predefined configurations of components--each configuration is called a "framework".
One framework, called the Blackboard framework, is for building programs that use a globally accessible data structure called a "blackboard" [Lesser 77], and independent sources of knowledge which cooperate to form hypotheses. The other framework, called the Backchain framework, is for building programs that use backward-chained production rules as their primary mechanism of generating inferences (e.g., MYCIN).

Currently AGE-1 is available on a limited basis on the SUMEX-AIM resource and on the Stanford SCORE 2060 computer in the Computer Science Department. We held a three-day workshop in March 1980 to familiarize invitees with the use of AGE and to allow each participant to implement a running program related to his application area. For the 1980 AIM Workshop we reimplemented a major portion of the VM program using AGE. In addition to demonstrating a variety of features of AGE, we were able to illustrate the relatively short implementation time required once the goals of the application and the necessary knowledge were delineated -- a first-year graduate student had the program running in three weeks. We are still working to broaden the user community for AGE and to learn from their experiences what directions our future research efforts should take.

I.B.5 ONCOCIN - An Oncology Chemotherapy Advisor

Work on the oncology chemotherapy consultation system, named ONCOCIN, was begun in July 1979. It is one of the newest application areas being investigated in the Stanford SUMEX-AIM community and is designed to be an interactive system for assigning and managing patients on chemotherapy protocols. This spring, it was installed for initial experimental use by faculty and fellows in the Debbie Probst Oncology Day Care Center at Stanford University Medical Center.
Overall goals for ONCOCIN are (1) to demonstrate that a rule-based consultation system with explanation capabilities can be usefully applied and accepted in a busy clinical environment; (2) to improve the tools available for building knowledge-based expert systems for medical consultation; and (3) to establish an effective scientific relationship with a group of physicians that will facilitate future research and implementation of knowledge-based tools for clinical decision making.

In addition to ONCOCIN's basic AI research goals, it is directed toward the development of a clinically useful oncology consultation tool that will: (1) assist with the identification of protocols that may apply to a given patient and help determine the patient's eligibility for a given protocol; (2) provide detailed information on protocols in response to questions from clinic personnel; (3) assist with chemotherapy dose selection and attenuation for a given patient; (5) provide reminders, at appropriate intervals, of follow-up tests and films required by the protocol in which a given patient is enrolled; and (6) reason about managing current patients in light of stored data from previous visits of the individual patients or aggregate data about groups of "similar" patients.

We are pursuing a five-year plan for accomplishing these goals. We spent the first year working out a prototype ONCOCIN system, drawing from programs and capabilities developed for the EMYCIN system-building project. We also undertook a detailed analysis of the day-to-day activities of the Stanford oncology clinic in order to determine how to introduce ONCOCIN with minimal disruption of an operation which is already running smoothly. Much of this early effort was spent giving careful consideration to the most appropriate mode of interaction with physicians in order to optimize the chances for ONCOCIN to become a useful and accepted tool in this specialized clinical environment.
More recently we have detailed the design and have implemented an actual experimental system. This system is based on multiple processes that manage the physician interface, reasoning and problem-solving, and patient database management. All of the system work has been completed to allow installation of ONCOCIN in the clinic.

Following the initial prototype development based on lymphoma protocols, we wanted to verify that the representation method we are using will be adequate for arbitrary protocol knowledge that may be encountered in the future, so we decided to encode and briefly test the knowledge of a non-lymphoma protocol. We chose the complicated protocol for oat cell (small cell) carcinoma of the lung because it involves a large number of possible therapies and complex interweaving of chemotherapy and radiotherapy. After approximately one month's effort, the oat cell protocol was encoded and run successfully on a number of test cases. In addition, the lymphoma protocol specifications used in the clinic were changed and we spent a few weeks entering the necessary corrections. In all cases the ONCOCIN representation scheme was adequate to accommodate the protocol knowledge with only minor changes, and we are confident that the system will be able to adapt to other protocols that need to be encoded in the coming years.

ONCOCIN has been extensively debugged through runs on several hundred sample patient cases with the results reviewed in detail by the collaborating oncologists. We have just begun to offer the ONCOCIN system for use by the oncology faculty and fellows in the morning chemotherapy clinics in which most of the lymphoma patients receive their treatment. We have taken care in introducing ONCOCIN to provide needed baseline information so we can formally evaluate its impact and effectiveness in the oncology clinic.

I.C Administrative Changes

The SUMEX-AIM resource has undergone several administrative changes this past year that serve to enhance its position within the Stanford Medical School as a resource for AI research:

1) Professor Edward Shortliffe was appointed as co-Principal Investigator of SUMEX-AIM. Professor Shortliffe has been central in the development of the MYCIN group of projects and has long worked closely with Professor Feigenbaum in planning the future development of SUMEX. This appointment gives formal recognition to this role for Professor Shortliffe and strengthens SUMEX-AIM through his close scientific and administrative ties to the Stanford medical community.

2) In parallel with Professor Shortliffe's appointment as co-Principal Investigator, SUMEX moved administratively from the Department of Genetics to the Department of Medicine. It is now administered jointly between the Departments of Medicine and Computer Science. As part of the largest clinical medicine department at Stanford, SUMEX now has increased visibility and opportunity to broaden its local scientific collaborations.

3) Professor Elliott Levinthal began a two-year leave of absence to take a position as head of the Defense Sciences Office at DARPA. Professor Roy Maffly has replaced him as AIM liaison in charge of coordinating the reviews of new project applications and serving as the interface to collaborative projects.

I.D Resource Management and Allocation

The mission of SUMEX-AIM, locally and nationally, entails both the recruitment of appropriate research projects interested in medical AI applications and the catalysis of interactions among these groups and the broader medical community. User projects are separately funded and autonomous in their management.
They are selected for access to SUMEX on the basis of their scientific and medical merits as well as their commitment to the community goals of SUMEX. Currently active projects span a broad range of application areas such as clinical diagnostic consultation, molecular biochemistry, belief systems modeling, mental function modeling, and instrument data interpretation (descriptions of the individual collaborative projects are in Section II beginning on page 89).

I.D.1 Management Committees

Since the SUMEX-AIM project is a multilateral undertaking by its very nature, we have created several management committees to assist in administering the various portions of the SUMEX resource. As defined in the SUMEX-AIM management plan adopted at the time the initial resource grant was awarded, the available facility capacity is allocated 40% to Stanford Medical School projects, 40% to national projects, and 20% to common system development and related functions. Within the Stanford aliquot, Prof. Feigenbaum and BRP have established an advisory committee to assist in selecting and allocating resources among projects appropriate to the SUMEX mission. The current membership of this committee is listed in Appendix C.

For the national community, two committees serve complementary functions. An Executive Committee oversees the operations of the resource as related to national users and makes the final decisions on authorizing admission for new projects and revalidating continued access for existing projects. It also establishes policies for resource allocation and approves plans for resource development and augmentation within the national portion of SUMEX (e.g., hardware upgrades, significant new development projects, etc.). The Executive Committee oversees the planning and implementation of the AIM Workshop series currently implemented under Prof. S. Amarel of Rutgers University and assures coordination with other AIM activities as well.
The committee will play a key role in assessing the possible need for additional future AIM community computing resources and in deciding the optimal placement and management of such facilities. The current membership of the Executive Committee is listed in Appendix C.

Reporting to the Executive Committee, an Advisory Group represents the interests of medical and computer science research relevant to AIM goals. The Advisory Group serves several functions in advising the Executive Committee: 1) recruiting appropriate medical/computer science projects, 2) reviewing and recommending priorities for allocation of resource capacity to specific projects based on scientific quality and medical relevance, and 3) recommending policies and development goals for the resource. The current Advisory Group membership is given in Appendix C.

These committees have actively functioned in support of the resource. Except for the meetings held during the AIM workshops, the committees have "met" by messages, net-mail, and telephone conference owing to the size of the groups and to save the time and expense of personal travel to meet face to face. The telephone meetings, in conjunction with terminal access to related text materials, have served quite well in accomplishing the agenda business and facilitate greatly the arrangement of meetings. Other solicitations of advice requiring review of sizable written proposals are done by mail.

We will continue to work with the management committees to recruit the additional high quality projects which can be accommodated and to evolve resource allocation policies which appropriately reflect assigned priorities and project needs. We will continue to make information available about the various projects both inside and outside of the community and thereby promote the kinds of exchanges exemplified earlier and made possible by network facilities.
I.D.2 New Project Recruiting

The SUMEX-AIM resource has been announced through a variety of media as well as by correspondence, contacts of NIH-BRP with a variety of prospective grantees who use computers, and contacts by our own staff and committee members. The number of formal projects that have been admitted to SUMEX has more than trebled since the start of the project to a current total of 8 national AIM projects and 8 Stanford projects. Others are working tentatively as pilot projects or are under review.

We have prepared a variety of materials for the new user ranging from general information such as is contained in a SUMEX-AIM overview brochure to more detailed information and guidelines for determining whether a user project is appropriate for the SUMEX-AIM resource. A questionnaire is available to assist users seriously considering applying for access to SUMEX-AIM. Pilot project categories have been established both within the Stanford and national aliquots of the facility capacity to assist and encourage new projects in formulating possible AIM proposals and pending their application for funding support. Pilot projects are approved for access for limited periods of time after preliminary review by the Stanford or AIM Advisory Group as appropriate to the origin of the project.

These contacts have sometimes done much more than provide support for already formulated programs. For example, Prof. Feigenbaum's group at Stanford previously initiated a major collaborative effort with Dr. Osborn's group at the Institutes of Medical Sciences in San Francisco. This project in "Pulmonary Function Monitoring and Ventilator Management - PUFF/VM" (see Section II.A.2.4 on page 201) originated as a pilot request to use MLAB in a small way for modeling. Subsequently the AI potentialities of this domain were recognized by Feigenbaum, Nii, and Osborn and a joint proposal was submitted to and funded by NIH.
This past summer John Kunz from Dr. Osborn's laboratory spent approximately half time at Stanford to learn more about AI research and to participate more closely in the development of the PUFF/VM program. Similarly, Prof. Feigenbaum and Ms. Nii recently spent two days with Profs. Kintsch and Polson at the University of Colorado, introducing them to the newly developed AGE package for use in formulating their program on modeling aspects of human cognition.

A list of the fully authorized projects currently comprising the SUMEX-AIM community can be found with brief abstracts in Appendix A on page 278. More detailed descriptions of collaborative project activities can be found in Section II.

As an additional aid to new projects or collaborators with existing projects, we provide a limited amount of funds to support the terminal and communications needs of users without access to such equipment. We are currently providing support for 6 terminals and 4 modems for users as well as a leased line between Stanford and the University of California at Santa Cruz for the Chemical Synthesis project.

I.D.3 Stanford Community Building

The Stanford community has undertaken several internal efforts to encourage interactions and sharing between the projects centered here. Professor Feigenbaum organized a project with the goal of assembling a handbook of AI concepts, techniques, and current state-of-the-art. This project has had enthusiastic support from the students, and substantial progress has been made in preparing many sections of the handbook (see Section II.A.1.2 on page 99 for more details). Weekly informal lunch meetings (SIGLUNCH) are also held between community members to discuss general AI topics, concerns and progress of individual projects, or system problems as appropriate. In addition, presentations from a substantial number of outside speakers are invited.
I.D.4 Existing Project Reviews

We have conducted a continuing careful review of on-going SUMEX-AIM projects to maintain a high scientific quality and relevance to our medical AI goals and to maximize the resources available for newly developing applications projects. At meetings of the AIM Advisory Group and Executive Committee this past year, all the national AIM projects were reviewed. These groups recommended continued access for most formal projects on the system. However, they recommended that the Higher Mental Functions project could better meet its current goals through computer support at UCLA and we have therefore reduced this project to "associate" status.

I.D.5 Resource Allocation Policies

As the SUMEX Facility has become increasingly loaded, a number of diverse and conflicting demands have arisen which require controlled allocation of critical facility resources (file space and central processor time). We have already spelled out a policy for file space management; an allocation of file storage is defined for each authorized project in conjunction with the management committees. This allocation is divided among project members in any way desired by the individual principal investigators. The system enforces allocations project by project each week. As the weekly file dump is done, if the aggregate space in use by a project is over its allocation, files are archived from user directories over allocation until the project is within its allocation.

We have implemented effective system scheduling controls to attempt to maintain the 40:40:20 balance in terms of CPU utilization and to avoid system and user inefficiencies during overload conditions. The initial complement of user projects justifying the SUMEX resource was centered to a large extent at Stanford. Over the past five years of the SUMEX grant, a substantial growth in the number of national projects was realized.
During the same time the Stanford group of projects has matured as well and in practice the 40:40 split between Stanford and non-Stanford projects is not ideally realized although the demand from the national community has increased substantially (see Figure 9 on page 33 and the tables of recent project usage on page 36). Our job scheduling controls bias the allocation of CPU time based on percent time consumed relative to the time allocated over the 40:40:20 community split. The controls are "soft" however in that they do not waste computer cycles if users below their allocated percentages are not on the system to consume the cycles. The operating disparity in CPU use to date reflects a substantial difference in demand between the Stanford community and the developing national projects, rather than inequity of access. For example, the Stanford utilization is spread over a large part of the 24-hour cycle, while national-AIM users tend to be more sensitive to local prime-time constraints. (The 3-hour time zone phase shift across the continent is of substantial help in load balancing.) During peak times under the overload control system reported previously, the Stanford community still experiences mutual contentions and delays while the AIM group has relatively open access to the system. We did enable overload controls for the national community this past year, however, because of their substantial increase in demand. For the present, we propose to continue our policy of "soft" allocation enforcement for the fair split of resource capacity.

Our system also categorizes users in terms of access privileges. These comprise fully authorized users, pilot projects, guests, and network visitors in descending order of system capabilities.
We want to encourage bona fide medical and health research people to experiment with the various programs available with a minimum of red tape while not allowing unauthenticated users to bypass the advisory group screening procedures by coming on as guests. So far we have had relatively little abuse compared to what other network sites have experienced, perhaps on account of the personal attention that senior staff gives to the logon records, and to other security measures. However, the experience of most other computer managers behooves us to be cautious about being as wide open as might be preferred for informal service to pilot efforts and demonstrations. We will continue developing this mechanism in conjunction with management committee policy decisions.

We have also encouraged mature projects to apply for their own machine resources in order to preserve the SUMEX-AIM resource for research and development efforts and to support projects unable to justify their own machines. The INTERNIST project has received approval for a VAX machine to support their planned development and program testing work. Also Profs. Lesgold and Greeno's "Simulation of Cognitive Processes" project has moved the bulk of their work to their own local VAX.

I.E Dissemination Efforts

Throughout its existence, SUMEX-AIM has devoted substantial efforts toward disseminating information about its activities as a resource and about the work of individual collaborative projects. We continue to make many presentations at professional meetings, to provide services to demonstrate developed AI programs for interested groups and individuals, and to work in organizing workshops within the SUMEX-AIM community to introduce our work to collaborating professional communities.
We have also spent substantial efforts in the past working with the Research Resources Information Center to produce the "Seeds of Artificial Intelligence" monograph to address a broader community of technical and lay people. The following sections summarize some of the activities undertaken this past year:

I.E.1 Sixth AIM Workshop

The Sixth Annual AIM (Artificial Intelligence in Medicine) Workshop was held at Stanford University on 13-16 August 1980. The program chairman was Dr. E. Shortliffe, the chairman for demo-based sessions was Dr. L. Fagan, and the short report chairman was Dr. R. Blum. This was the first Workshop to be held in California, and was held in conjunction with the first annual meeting of the AAAI Society (American Association for Artificial Intelligence).

Among the goals of this year's conference was the development of a format for scientific exchange that would help clarify the technical details of the programs that are under development throughout the AIM community. Many individuals have observed that it can be difficult at meetings such as this to obtain detailed understanding of one another's work. Formal presentations with slides and a description of data structures typically are divorced from a sense of the program's operation as seen by the user. As a result, many of us have had to complement our annual Workshop participation with visits to other sites so that we can learn about others' work in depth. In 1980 we experimented with a format that tried to simulate the kind of detailed interactions that have previously occurred only in individual sessions after hours or at times other than the Workshop.

Demo-Based Sessions

This year the major portion of the conference was devoted to detailed discussions of AIM systems through the vehicle of specially prepared demonstrations. Each of the established AIM systems was represented with a two hour presentation.
Each speaker had a display terminal, special projection system and high-speed connections to the SUMEX 2020 computer. Rather than rely on an impromptu live demonstration, each project was asked to prepare a typescript of an interactive session, subject to the following guidelines:

(1) The typescript was to represent the interaction exactly as it occurred on the screen to the user (i.e., the presenters were asked not to delete mistakes, problems, garbage collect messages, etc.), and was to be augmented only with the following:
    (a) annotations to clarify specific points.
    (b) "break" interruptions as described below.
(2) The typescript was to be presented in short segments with enough discussion to identify the current point in the program's reasoning process.
(3) At pertinent points the researchers were asked to break into the program's operation during typescript preparation and display pertinent data structures to illustrate the system's internal representation and organization.

A computer program was written by the SUMEX staff to facilitate the display of annotated and formatted typescripts. The input to the program is a typescript file that has special control characters inserted into text to mark off pages of information. Other control characters are used to highlight (brighten) important points in the typescript, to turn pages or move to a specific page, and to provide for different levels of detail. The provision for different levels of detail was designed to show information selectively in response to questions, or to adjust presentations for different audiences (e.g., physicians vs. computer scientists). Because the program's output is treated as a text file by the system, slide-like material or diagrams can be inserted into the running transcript. A more complete description of the program is available on-line on the SUMEX computer.
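The typescript display idea described above can be illustrated with a minimal modern sketch. It is not the SUMEX staff's program: the report does not name the control characters or the implementation language, so the three markers below (form feed for page breaks, `\x02` to toggle highlighting, `\x01` plus a digit to set a line's detail level), the function names, and the sample text are all assumptions made for illustration. Highlighting is rendered with asterisks as a stand-in for brightened text.

```python
from dataclasses import dataclass, field

# All three control characters are hypothetical; the report does not specify them.
PAGE_BREAK = "\f"    # marks the end of a page of the typescript
HIGHLIGHT = "\x02"   # toggles highlighting on and off within a line
DETAIL = "\x01"      # "\x01<digit>" at line start sets that line's detail level


@dataclass
class Page:
    # Each entry is a (detail_level, text) pair; level 0 is always shown.
    lines: list = field(default_factory=list)


def parse_typescript(text):
    """Split an annotated typescript into pages of (detail_level, text) lines."""
    pages = []
    for chunk in text.split(PAGE_BREAK):
        page = Page()
        for raw in chunk.splitlines():
            level = 0
            if raw.startswith(DETAIL) and len(raw) > 1 and raw[1].isdigit():
                level = int(raw[1])
                raw = raw[2:]
            page.lines.append((level, raw))
        pages.append(page)
    return pages


def render_page(page, max_detail=0):
    """Render one page, hiding lines whose detail level exceeds max_detail."""
    out = []
    for level, text in page.lines:
        if level > max_detail:
            continue
        parts = text.split(HIGHLIGHT)
        # Odd-indexed parts sit between a pair of highlight markers.
        out.append("".join(f"*{p}*" if i % 2 else p for i, p in enumerate(parts)))
    return "\n".join(out)


# Invented sample typescript: two pages, one detail-level-1 line, one highlight.
sample = (
    "PROGRAM: enter patient age\n"
    "\x011[internal data structure shown here]\n"
    "\x02Conclusion\x02 follows\f"
    "second page"
)
pages = parse_typescript(sample)
print(render_page(pages[0]))                # overview for a general audience
print(render_page(pages[0], max_detail=1))  # same page with extra detail
```

Turning pages or jumping to a specific page then amounts to indexing into `pages`, and raising `max_detail` corresponds to showing the deeper material in response to a question or for a more technical audience.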
No detailed evaluation of the demonstration techniques was undertaken, but our general impression was that the extended speaking time and concentration on program typescripts did orient the talks towards the details of how the programs operate. The major limitations were adequate but less-than-optimal imaging quality from the projection system (particularly in the largest auditorium), and the limited experience of AIM users with the equipment and software used. The SUMEX 2020 with the KI-10's as backup provided excellent computer support for the display technology. One group, the BELIEVER project from Rutgers, augmented their typescript with a "live" demonstration running on the Rutgers AIM resource. The SUMEX staff provided excellent support in the development of programs, equipment setup, and computer support.

A series of 20-minute parallel sessions was also provided for newer AIM systems under development. These talks used standard visual aids. If AIM conferences use demonstration sessions in the future, the featured programs should probably be chosen from among these developing systems.

Since the Workshop, several projects (including MOLGEN, GUIDON, and VM) have used the stored typescript for demonstrations. They have been useful when visitors wish to see a particular program but resources are not available to run the program in a real-time setting. Each of the demonstration files is stored on the SUMEX system and is accessible to all SUMEX users.

I.E.2 Tutorial on AI in Medicine

In conjunction with the AIM Workshop, a continuing education tutorial designed for physicians was held at Stanford on August 17-18, 1980. The tutorial was entitled "Computers in Medicine -- Applications of Artificial Intelligence Techniques" and was organized by Drs. W. Clancey and E. Shortliffe.
The tutorial had a remarkably good attendance by physicians as well as several other individuals with an interest in the field. The course included an optional introduction to computers for those who had no prior experience with the technology, an overview of SUMEX-AIM research, and an introduction to background materials regarding decision theory and data base applications in medicine. Speakers also provided detailed presentations on MYCIN, CASNET/EXPERT, INTERNIST, and GUIDON. The course closed with a panel discussion on the problems and promise of AI in Medicine. It was accredited for postgraduate medical education through Stanford University School of Medicine; the 135 physicians in attendance earned 11.5 continuing education credits. In addition, 18 students, several non-physician researchers, and 10 members of the press attended. Enrollees came from as far away as Mexico and the East Coast. For the reasonable fee of $40 covering the two days of lectures, the attendees also received a syllabus of readings and two lunches. The syllabus is a comprehensive survey of medical AI research and consists of recent articles written by the tutorial faculty, mostly for a clinical audience.

The faculty consisted of 15 distinguished researchers from the AIM community, including 7 physicians and 9 speakers from centers other than Stanford. By holding the tutorial immediately after the AIM Workshop and before the first Annual Meeting of the American Association for Artificial Intelligence (AAAI), we were able to secure the participation of expert physicians in the field who were already at Stanford (Drs. Greenes, Lindberg, Myers, and Pauker), as well as computer scientists from the East Coast (Drs. Davis, Kulikowski, Pople, Szolovits, and Swartout). Stanford speakers included Drs. Blum, Buchanan, Clancey, Feigenbaum, Fries, and Shortliffe.
Coordination and planning for the tutorial were facilitated by electronic messages; almost all speakers regularly use SUMEX or another ARPANET machine.

To evaluate the impact of the tutorial on the participants, and to assess baseline opinions regarding the field, we undertook a survey of the physicians' knowledge about computers as well as their attitudes towards medical consultation systems. The statistical analysis of these questionnaires has now been completed, and a paper summarizing the results has been submitted for publication (Teach, R.L. and Shortliffe, E.H. "An analysis of physician attitudes regarding computer-based clinical consultation systems." Submitted for publication, March 1981). In brief, the survey showed that physicians were willing to accept the possibility of computer-based clinical decision aids but placed severe demands on the capabilities of such systems if they were to be acceptable for routine use.

In addition, attendees were asked to evaluate the course itself, as well as the talks by individual speakers. These forms showed that the course was exceedingly well received. Attendees were fascinated by the content, generally felt it was well presented, and indicated they would recommend the course to others if it were made available again. Many physicians requested a follow-up course that would introduce them to more technical detail than had been possible in the introductory tutorial.

In conclusion, we believe that the tutorial was an encouraging success, and demonstrated the effectiveness of this kind of forum for introducing physicians to the research efforts within the AIM community. The faculty is enthusiastic about repeating the course, possibly on the East Coast in conjunction with a future AIM Workshop. Several members of the audience expressed interest in detailed, small group discussions of particular AIM programs.
We believe these discussions could be a valuable way of exporting our methods and approach beyond the immediate AIM community.

I.E.3 GENET - An Experiment in AI System Dissemination

Background

The MOLGEN project at Stanford (see Section II.A.1.5 on page 136) has focused on applications of artificial intelligence and symbolic computation to the field of molecular biology. The research began in 1975 and is currently in the first year of a three-year grant renewal. In early 1980 it was realized that some of the systems developed by MOLGEN were of direct utility to many scientists in the domain. Accordingly, with the cooperation of the SUMEX-AIM staff and close coordination with the AIM Executive Committee, it was decided in February 1980 to provide a carefully limited guest service for the community use of such systems.

There were two major reasons for the establishment of this guest service, which took the form of the GENET account on SUMEX. The first was to broaden MOLGEN's base of scientist collaborators, to find molecular biologists at institutions other than Stanford who could contribute actively to our knowledge-based approach to problem solving. The second was to introduce a generally computer-naive community to the benefits of resource sharing provided by a system like SUMEX, with the hope of serving as a model for the dissemination of other AI software and possibly for an eventual resource for molecular biology.

We believe that we have succeeded in these two goals. Many of our GENET guests have become active collaborators in core MOLGEN research. These collaborators include Professor Allan Maxam at Harvard Medical School, Dr. Walter Goad at Los Alamos, Dr. Richard Roberts at Cold Spring Harbor, Dr. William Pearson at Johns Hopkins, Drs. Walter Bodmer, Julia Bodmer, and Robert Kamen at the Imperial Cancer Research Fund, Professor Fred Blattner at Wisconsin, Dr.
Andrew Taylor at University of Oregon, and Dr. Dan Davison of SUNY-Stonybrook. We are also pleased by the numerous comments SUMEX has received from GENET users praising the user-sensitive nature of the resource, especially in comparison to typical university computer centers.

GENET has been important both for MOLGEN and for the national community of molecular biology. It has ensured a steady flow of ideas for the artificial intelligence research that is core to both the MOLGEN grant and the SUMEX-AIM mission. It has also provided a useful service to an international community that is not readily available elsewhere.

GENET Community Management

Our decision to support the GENET guest experiment and our approach to doing so within the SUMEX-AIM resource have been reviewed and approved both by the AIM Executive Committee and by the Initial Review Group/National Advisory Research Resources Council in the course of the peer review of our pending SUMEX renewal application.

We have tried to manage the GENET guest experiment in such a way that we maintain the "friendly" interface of the SUMEX-AIM resource for molecular biologists unfamiliar with computers while taking appropriate steps so that GENET usage does not detract from on-going AI research and so that we assure prudent administration of SUMEX as an NIH-BRP resource. The key elements of our management approach include:

1) Controlled announcement of the GENET opportunity -- Beginning in February 1980, the availability of GENET services was announced, primarily by talks at professional conferences with accompanying program demonstrations. We decided against publishing "blanket" announcements in professional journals in order to maintain a very high standard of collaborator interest and scientific expertise within the limited group we could serve with available SUMEX resources.

2) Close coordination with the AIM Executive Committee -- We kept the AIM Executive Committee apprised of plans for the GENET experiment and of progress and growth of the community. At the August 1980 AIM Workshop meeting of the Executive Committee, Professor L. Kedes of the MOLGEN project made a presentation on the status of GENET. The Executive Committee approved continuation of the GENET service but, because of the significant growth in the number of GENET users and their consumption of CPU resources, a limit of two simultaneous GENET jobs was placed on the community. The Executive Committee also approved the concept of a proposed Molecular Biology Computing Resource related to but separate from the existing SUMEX resource.

3) Careful control of GENET usage -- We have closely monitored the very rapid growth in GENET usage of SUMEX (see data below). With Executive Committee advice and in cooperation with the MOLGEN project personnel managing the GENET community, we have instituted several successively more stringent controls on GENET users:

a) All GENET users run out of the same directory so that scheduler control limits are enforced to hold GENET usage as a whole down relative to that of AI research projects during heavy loads.

b) The GENET directory has been intentionally limited in disk space allocation so that large numbers of files cannot be retained.

c) Starting in October 1980, a limit of two simultaneous logged-in GENET jobs was placed on the community.

d) Starting in December 1980, a policy statement was issued restricting GENET use to academic collaborators. MOLGEN project management informed industrial collaborators that they could no longer use the GENET facility and actively monitored adherence to this policy. Previously, valuable feedback had been obtained from a small group of industrial collaborators for MOLGEN AI program development.
However, with the rapid growth of the highly competitive molecular genetics industry, there was no way we could adequately control industrial users consistent with SUMEX's status as a federally funded national resource. Thus, we decided to exclude them. In April 1981, we instituted a GENET user password checking system to further control community access, particularly in regard to industrial users.

4) Limited commitment of SUMEX staff resources -- The day-to-day management of the GENET community has been the responsibility of MOLGEN project personnel. SUMEX personnel have only contributed to developing system facilities to help manage GENET (guest and GENET password capabilities), assisted with technical communications problems, and advised in establishing GENET management policies consistent with AIM Executive Committee and SUMEX Principal Investigator resource policies. The total commitment of staff time has been on the order of 1-2 man-months.

Scope of the GENET User Community

The GENET community consists of approximately 200 users from 63 research institutions. Of these 200 users, approximately 35 are consistently active users. That is, they log in, run programs, and interact with the MOLGEN members on an almost daily basis. Many of these users have made valuable contributions to our work. About 100 others are frequent, but not regular, users. They log in only when they have a major analysis task to perform, which seems to be on the order of once a month. The remaining users rarely use the system. They have logged in a few times, but for one reason or another they never became regular users of the system. Quite often this is because a lab group will settle on having one or two graduate students or post-doctoral associates become the "computer experts" of the group, and as a result, the computer use by the other people in the lab drops to an almost non-existent level.
Unfortunately, an equally prevalent reason for users to stop using the GENET account is a lack of resource time. Probably the major complaint that we get from GENET users concerns the lack of compute time and availability of the system. One account just is not enough for 200 people to share, especially when it is restricted to 2 jobs at one time. We constantly remind the GENET users to use their resources wisely. We encourage them to use the BATCH system to run jobs in the wee hours of the morning, and we remind them to be prepared to do their work quickly when they log in to the system, but these efforts do not seem to help the problem very much.

Most GENET users use only a small set of programs. These consist of text editors, which are used to set up the data files for the MOLGEN analysis programs; XSEARCH, which GENET users use to search through our database for sequences that can assist them in their research; and the electronic mail facilities. Very few of our GENET users actually feel comfortable using programs other than the ones that we maintain, not because the other programs would not be useful, but because the users do not have the computer time to experiment with what is available.

There are three noteworthy programs that we provide for GENET users that are used extensively. SEQ, a DNA-RNA sequence analysis program, which is continually being improved, is the most widely used. MAP, a program that assists in the construction of restriction maps from restriction enzyme digest data, is also used a great deal. Finally, a new program, MAPPER (written and maintained by William Pearson from Johns Hopkins University), is a simplified version of the MOLGEN MAP program that is somewhat more efficient than the MOLGEN version.

The MOLGEN UE program and special molecular genetics knowledge bases are not available to the general GENET user at this time for two reasons.
First of all, the UE program is quite costly to use (in terms of computer cycles), and secondly, we feel that the knowledge base is not quite ready for the computer novice to learn and use without a significant amount of initial assistance. A few GENET users (mostly Stanford associates) who have had a significant interest in the knowledge base have become EXO-MOLGEN users and are developing knowledge bases on their own, which we hope will eventually be added to the ones that MOLGEN is developing and maintaining.

GENET Usage Statistics

Following is a table of monthly statistics for GENET usage of SUMEX. Note that "TOTAL CONNECT HOURS" includes connect time for local dialups, hardlines, ARPANET, and TYMNET. "TYMNET CONNECT HOURS" includes that part of the total connect time which is via TYMNET and for which SUMEX pays a separate usage charge. Recent GENET TYMNET usage has been about 20-25% of the total SUMEX TYMNET connect time. Our monthly TYMNET bills are about $5,000, so monthly GENET TYMNET usage costs about $1,125 (taking the midpoint, 22.5% of $5,000). Most GENET users come from other parts of the country, and no additional local dial-up lines have been installed to support GENET usage.

                    Total      TYMNET     GENET % of
Month/    CPU       Connect    Connect    SUMEX         File
Year      Hours     Hours      Hours      TYMNET Use    Pages

Feb/80     3.23      32.72      18.88       2.0%           57
Mar/80     1.28      51.57      12.80       1.4            95
Apr/80     8.37     117.87      51.73       5.4           209
May/80     9.20     104.46      66.65       8.0           166
Jun/80    11.08     188.35     118.03      11.7           253
Jul/80    19.21     342.87     189.00      18.2           231
Aug/80    18.71     257.23     188.53      18.2           367
Sep/80    57.32     409.83     254.53      28.5           626
Oct/80    36.47     348.66     211.95      23.3           920
Nov/80    82.90     648.56     308.40      31.1          1133
Dec/80    19.86     295.85     188.67      22.8          1110
Jan/81    48.00     747.91     277.30      27.2           996
Feb/81    22.58     265.39     163.55      16.1           962
Mar/81    29.73     613.74     313.57      25.0           982
Apr/81    43.04     662.57     unavail     unavail       1633

Plots of the CPU usage, connect time, and file usage data can be found in Figures 16-18.

[Figure 16. GENET CPU Usage by Month (CPU hours per month, April 1980 through April 1981)]

[Figure 17. GENET Connect Time by Month (connect hours per month, April 1980 through April 1981)]

[Figure 18. GENET File Space by Month (file pages per month, April 1980 through April 1981)]

I.F Comments on the Biotechnology Resources Program

Resource Organization

We firmly believe that the Biotechnology Resources Program is one of the most effective vehicles for developing and disseminating technological tools for biomedical research. The goals and methods of the program are well-designed to encourage building of the necessary multi-disciplinary groups, merging appropriate technological and medical disciplines. In our experience with the SUMEX-AIM resource, several elements of this approach seem to emerge as key to the development and management of an effective resource:

1) Effective Management Framework - there needs to be an explicit agreement between the BRP and the resource principal investigator that sets out a clear mandate for the resource and its allocation, provides worthwhile incentives for the host institution and investigator to invest the necessary substantial professional career time to develop and manage the resource, and ensures equitable distribution of resource services to its target community.

2) Close Working Relationship with NIH - a resource is a major and often long-term investment of money and human energy.
A close and mutually supportive working relationship between resource management, its advisory committees, and the NIH administration is essential to assure healthy development of the resource and its relationship to its user community. We at SUMEX-AIM have benefited immensely from such a relationship with Dr. William R. Baker, Jr. in the evolution of the SUMEX-AIM community.

3) Freedom to Explore Resource Potential - a resource, by its nature, operates at the "cutting edge" in developing its characteristic technology and learning how to effectively disseminate it to the biomedical community at large. BRP should not impose artificial constraints on the resource for commercializing its efforts (fees for service) or developing its potential (budget ceilings). Such artificial policy impositions can serve to undermine the very goals central to BRP's reason for existence. Satisfactory policies in this regard have been worked out recently and should be retained.

Electronic Communications

SUMEX-AIM has pioneered in developing more effective methods for facilitating scientific communication. Whereas face-to-face contacts continue to have their place, in the longer term we feel that computer-based communications will become increasingly important to NIH and the biomedical community. We would like to see BRP take a more active role in promoting these tools within NIH and its grantee community. A concrete step would be to become a sponsoring agency for the ARPANET, which remains the most effective means for a very broad spectrum of services to promote good communications. This could serve as a base for interconnecting sponsored machines, offering a broader range of services, and promoting broader collaboration among the biomedical community at large.

II Description of Scientific Subprojects

II.A Scientific Subprojects

The following subsections report on the AIM community of projects and "pilot" efforts including local and national users of the SUMEX-AIM facility at Stanford. Those using the Rutgers-AIM facility are annotated with "[Rutgers-AIM]". In addition to these detailed progress reports, we have included briefer summary abstracts of the fully authorized projects in Appendix A on page 278.

The collaborative project reports and comments are the result of a solicitation for contributions sent to each of the project Principal Investigators requesting the following information:

I. SUMMARY OF RESEARCH PROGRAM
A. Project rationale
B. Medical relevance and collaboration
C. Highlights of research progress
--Accomplishments this past year
--Research in progress
D. List of relevant publications
E. Funding support (see details below)

II. INTERACTIONS WITH THE SUMEX-AIM RESOURCE
A. Medical collaborations and program dissemination via SUMEX
B. Sharing and interactions with other SUMEX-AIM projects (via computing facilities, workshops, personal contacts, etc.)
C. Critique of resource management (community facilitation, computer services, communications services, capacity, etc.)

III. RESEARCH PLANS (8/80-7/86)
A. Project goals and plans
--Near-term
--Long-range
B. Justification and requirements for continued SUMEX use
C. Needs and plans for other computing resources beyond SUMEX-AIM
D. Recommendations for future community and resource development

We believe that the reports of the individual projects speak for themselves as rationales for participation; in any case the reports are recorded as submitted and are the responsibility of the indicated project leaders.

II.A.1 Stanford Projects

The following group of projects is formally approved for access to the Stanford aliquot of the SUMEX-AIM resource.
Their access is based on review by the Stanford Advisory Group and approval by Professor Feigenbaum as Principal Investigator.

II.A.1.1 AGE - Attempt to Generalize

H. Penny Nii and Edward A. Feigenbaum
Computer Science Department
Stanford University

ABSTRACT: Isolate inference, control, and representation techniques from previous knowledge-based programs; reprogram them for domain independence; write an interface that will help a user understand what the package offers and how to use the modules; and make the package available to other members of the AIM community, to labs doing knowledge-based program development, and to the general scientific community.

I. SUMMARY OF RESEARCH PROGRAM

A. Project Rationale

The general goal of the AGE project is to demystify and make explicit the art of knowledge engineering. It is an attempt to formulate the knowledge that knowledge engineers use in constructing knowledge-based programs and put it at the disposal of others in the form of a software laboratory.

The design and implementation of the AGE program is based primarily on the experience gained in building knowledge-based programs at the Stanford Heuristic Programming Project in the last decade. The programs that have been, or are being, built are: DENDRAL, meta-DENDRAL, MYCIN, HASP, AM, MOLGEN, CRYSALIS [Feigenbaum 1977], and SACON [Bennett 1978]. Initially, the AGE program will embody artificial intelligence methods and techniques used in these programs. However, the long-range aspiration is to integrate those developed at other AI laboratories. The final product is to be a collection of building-block programs combined with an "intelligent front-end" that will assist the user in constructing knowledge-based programs.
It is hoped that AGE will speed up the process of building knowledge-based programs and facilitate the dissemination of AI techniques by: (1) packaging common AI software tools so that they need not be reprogrammed for every problem; and (2) helping people who are not knowledge engineering specialists write knowledge-based programs.

B. Medical Relevance and Collaboration

AGE is relevant to the SUMEX-AIM Community in two ways: as a vehicle for disseminating cumulated knowledge about the methodologies of knowledge engineering and as a tool for reducing the amount of time needed to develop knowledge-based programs.

(1). Dissemination of Knowledge: The primary strategy for conducting AI research at the Stanford Heuristic Programming Project is to build complex programs to solve carefully chosen problems and to allow the problems to condition the choice of scientific paths to be explored. The historical context in which this methodology arose and summaries of the programs that have been built over the last decade at HPP are discussed in [Feigenbaum 1977]. While the programs serve as case studies in building a field of "knowledge engineering," they also contribute to a cumulation of theory in representation and control paradigms and of methods in the construction of knowledge-based programs.

The cumulation and concomitant dissemination of theory occur through scientific papers. Over the past decade we have also cumulated and disseminated methodological knowledge. In Computer Science, one effective method of disseminating knowledge is in the form of software packages. Statistical packages, though not related to AI, are one such example of software packages containing cumulated knowledge. AGE is an attempt to make yesterday's "experimental technique" into tomorrow's "tool" in the field of knowledge engineering.

(2).
Speeding up the Process of Building Knowledge-based Programs: Many of the programs built at HPP are intelligent agents to assist human problem solving in tasks of significance to medicine and biology (see separate sections for discussions of work and relevance). Without exception the programs were handcrafted. This process often takes many years, both for the AI scientists and for the experts in the field of collaboration. AGE will reduce this time by providing a set of preprogrammed inference mechanisms and representational forms that can be used for a variety of tasks. Close collaboration is still necessary to provide the knowledge base, but the system design and programming time of the AI scientists can be significantly reduced. Since knowledge engineering is an empirical science, in which many programming experiments are conducted before programs suitable for a task are produced, reducing the programming and experimenting time would significantly reduce the time required to build knowledge-based programs.

C. Highlights of Research Summary

Last year we reported the addition of the Backchaining framework (the chaining of production rules in a manner similar to that used in MYCIN) and an interface to the Units package (for an additional representational form and its use from AGE rules). In the past year we placed our research emphasis on (1) improving the existing component parts and the user interface, (2) developing debugging facilities, and (3) producing additional documents.

We completed the implementation of the Trace and Break packages, as well as a facility for trace-back explanation. Using the trace-back facility, users can inquire about the program's actions; AGE answers the questions by using the execution history list. Some example questions are: "What was the hypothesis before the execution of rule 2 in KS X?", "What Event led to the activation of KS X?".
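
As a rough illustration of how questions like these could be answered from an execution history list, consider the sketch below. It is written in Python for brevity rather than in AGE's own implementation language, and the record fields (`ks`, `rule`, `event`, `hyp_before`, `hyp_after`) and function names are assumptions made for this example, not AGE's actual data structures.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One entry in a hypothetical execution history list."""
    ks: str          # knowledge source (KS) that was activated
    rule: int        # rule number executed within that KS
    event: str       # event that led to the KS being activated
    hyp_before: str  # hypothesis just before the rule executed
    hyp_after: str   # hypothesis just after the rule executed


def hypothesis_before(history, ks, rule):
    """Answer: what was the hypothesis before rule `rule` in KS `ks` ran?"""
    for step in history:
        if step.ks == ks and step.rule == rule:
            return step.hyp_before
    return None  # that rule never executed


def activating_event(history, ks):
    """Answer: what event led to the (first) activation of KS `ks`?"""
    for step in history:
        if step.ks == ks:
            return step.event
    return None  # that KS was never activated
```

Both lookups only replay what was recorded; nothing in the history describes the application domain, which is why answers of this kind are phrased in terms of rules, events, and hypotheses.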
Since AGE has no knowledge of the application domain, it cannot "explain" the program actions in the language of the domain, but it produces "explanations" that are useful to the implementers.

We found that the specification and editing protocols for the various components were awkward and difficult for the users to learn. We redesigned this particular portion of the interface and have completed about 75% of the re-implementation.

In addition to the standard documents (a user's guide and a reference manual), we began a documented series of examples. These examples are actually implemented and running programs; each document consists of a description of the example problem, its formulation in terms of AGE, reasons for the particular formulation, and a complete program listing. In addition, the programs are available for the users to run. We observed that our documents, like most other program documentation, are useful only to those people who are already familiar with AGE. The Example Series is an experiment to see if a combination of standard documents and examples would be of any significant help to new users.

D. Publications

Nii, H. Penny and Aiello, Nelleke, "AGE: a knowledge-based program for building knowledge-based programs," Proc. of IJCAI-6, pp. 645-655, vol. 2, 1979.

Nii, H. Penny, "An Introduction of Knowledge Engineering, Blackboard Model, and AGE," HPP Working Paper, HPP-80-29.

Aiello, N. and Nii, H.P., "The Joy of AGE-ing: A User's Guide to AGE-1."

Aiello, N., Bock, C., Nii, H.P., White, W., "AGE Reference Manual."

AGE Example Series 1: "BOWL: A Beginner's Program."

AGE Example Series 2: "AGEPUFF: A Simple Event-Driven Program."

II. INTERACTION WITH THE SUMEX-AIM RESOURCES

AGE Availability: Currently AGE-1 is available to a limited number of groups on the PDP-10 at the SUMEX-AIM Computing Facility and on the PDP-20/60 at the SCORE Facility of the Computer Science Department.
The current implementation is described briefly in a later section.

Dissemination: We previously reported a three-day workshop that we conducted in March 1980. The aims of the workshop were to familiarize the attendees with the use of AGE, and for each participant to implement a running program related to his application area. Of the attendees, the group from the Institute of Medical Electronics, University of Tokyo, has continued to use AGE to develop a medical diagnosis program. In addition, many of the activities of the past year described earlier were direct results of what we learnt at the workshop.

For the 1980 AIM Workshop we reimplemented in AGE a major portion of the VM program (described elsewhere). In addition to demonstrating a variety of features of AGE, we were able to demonstrate the relatively short implementation time required once the goals of the application and the necessary knowledge were delineated -- a first-year graduate student had the program running in three weeks.

Profile of the Current AGE System: To correspond to the two general technical goals described earlier, AGE is being developed along two separate fronts: the development of tools and the development of an "intelligent" user interface.

Currently Implemented Tools: The current AGE system provides the user with a set of preprogrammed modules called "components" or "building blocks". Using different combinations of these components, the user can build a variety of programs that display different problem-solving behavior. AGE also provides user interface modules that help the user in constructing and specifying the details of the components.

A component is a collection of functions and variables that support conceptual entities in program form.
For example, the production rule, as a component, consists of: (1) a rule interpreter that supports the syntactic and semantic description of the production-rule representation as defined in AGE, and (2) various strategies for rule selection and execution. The components in AGE have been carefully selected and modularly programmed to be usable in combinations.

For those users not familiar enough with AGE to experiment with combining the components, AGE currently provides two predefined configurations of components; each configuration is called a "framework". One framework, called the Blackboard framework, is for building programs that are based on the Blackboard model [Lesser 77]. The Blackboard model uses the concepts of a globally accessible data structure called a "blackboard" and independent sources of knowledge which cooperate to form hypotheses. The Blackboard model has been modified to allow flexibility in representation, selection, and utilization of knowledge. The other framework, called the Backchain framework, is for building programs that use backward-chained production rules as their primary mechanism for generating inferences.

The Front-End: To support the user in the selection, specification, and use of the components, AGE is currently organized around four major subsystems that interact in various ways. Around them is a system executive that allows the user access to the subsystems through menu selection. Figure 1 shows the general interrelationship among these subsystems.

The Browse and Design subsystems help to familiarize the user with AGE and to guide the user in the construction of his programs through the use of predefined frameworks. The third subsystem is a collection of interface modules that help the user specify the various components of the framework. The last subsystem is designed for testing and refining the user program.
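
The Blackboard model mentioned above (a globally accessible data structure plus independent knowledge sources that cooperate to form hypotheses) can be sketched in a few lines. This is a generic illustration in Python, not AGE's Blackboard framework: the control strategy (fire the first applicable knowledge source on each cycle) and all names are assumptions chosen for the example.

```python
def run_blackboard(blackboard, knowledge_sources, max_cycles=100):
    """Run a minimal blackboard control loop.

    `blackboard` is a dict shared by all knowledge sources.
    `knowledge_sources` is a list of (name, condition, action) triples, where
    condition(blackboard) -> bool and action(blackboard) modifies it in place.
    On each cycle the first applicable knowledge source fires; the loop stops
    when none applies or after max_cycles. Returns the names fired, in order.
    """
    fired = []
    for _ in range(max_cycles):
        for name, condition, action in knowledge_sources:
            if condition(blackboard):
                action(blackboard)
                fired.append(name)
                break          # re-evaluate all sources against the new state
        else:
            break              # no knowledge source applies: quiescence
    return fired


# Two hypothetical knowledge sources: one proposes a hypothesis from raw data,
# the other confirms a hypothesis that has already been posted.
EXAMPLE_SOURCES = [
    ("propose",
     lambda bb: "signal" in bb and "hypothesis" not in bb,
     lambda bb: bb.update(hypothesis="source-A")),
    ("confirm",
     lambda bb: bb.get("hypothesis") == "source-A" and not bb.get("confirmed"),
     lambda bb: bb.update(confirmed=True)),
]
```

The knowledge sources never call each other; they interact only through what they read from and post to the shared blackboard, which is the property that lets them be written and combined independently.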
Each of the subsystems is described in more detail below:

BROWSE: The function of the Browse subsystem is to guide the user in browsing through its textual knowledge base, called the MANUAL. The MANUAL contains (a) a general description of the building-block components on the conceptual level; (b) a description of the implementation of these concepts within AGE; (c) a description of how these components are used within the object program; (d) a description of how they can be constructed by the user; and (e) various examples. The information in the MANUAL is organized to represent the conceptual hierarchy of the components and to represent the functional relationship among them.

DESIGN: The function of the DESIGN subsystem is to guide the user in the design and construction of his program through the use of a predefined configuration of components, or framework. Each framework is defined in a DESIGN-SCHEMA, a data structure in the form of an AND/OR tree that, on the one hand, represents all the possible configurations of components within the framework and, on the other hand, represents the decisions the user must make in order to design the details of the user program. Using this schema, the DESIGN subsystem guides the user from one design decision point to another. At each decision point, the user has access to the MANUAL and also to advice regarding design decisions at that point. An appropriate ACQUISITION module can be invoked from the DESIGN subsystem so that general design and implementation specifications can be accomplished simultaneously.

ACQUISITION: For each component that the user must specify, there is a corresponding acquisition/editor module that queries the user for task-specific information. The calling sequence of the acquisition modules is guided by the DESIGN-SCHEMA when the user is using the DESIGN subsystem. The modules can also be accessed directly from the system menu or from Interlisp.

INTERPRETER: This subsystem contains several modules that help the user run and debug his program.
The Check module checks for the completeness and correctness of the specification for an entire framework. The Interpreter executes the user program. The Trace and Break modules are run-time debugging aids. The Editor, Check, Trace, Break, and Explanation (described below) modules are designed to complement each other, and to help the user observe the workings of his program and make corrections as necessary.

EXPLANATION: AGE has enough information to replay its execution steps, and it has reasonable justifications for the actions within the various frameworks. AGE provides a back-trace explanation facility whereby questions related to the execution history can be answered by the system interactively. However, AGE is totally ignorant of the user's task domain and has no means of conducting a dialogue about the specifics of the domain. A detailed history of the execution steps is available to the user to build his own domain-specific explanation, if necessary.

    SYSTEM KNOWLEDGE        SUBSYSTEM             RESULT

    MANUAL          ....>   BROWSE
                              |
                              V
    DESIGN SCHEMA   ....>   DESIGN         ....>  USER SYSTEM DESIGN
                              |
                              V
    COMPONENTS      ....>   ACQUISITION    ....>  USER SYSTEM
                            EDITOR
                              |
                              V
                            INTERPRETER    ....>  EXECUTION HISTORY LIST

Figure 1. AGE System Organization (... = data flow; --- = control flow)

III. RESEARCH PLAN

Research Topics: The task of building a software laboratory for knowledge engineers is divided into two main sub-tasks:

1.
The isolation of techniques used in knowledge-based programs: It has always been difficult to determine whether a particular problem-solving method used in a knowledge-based program is "special" to a particular domain or whether it generalizes easily to other domains. In existing knowledge-based programs, the domain-specific knowledge and the manipulation of such knowledge using AI techniques are often so closely coupled that it is difficult to make use of the programs for other domains. One of our goals is to isolate the AI techniques that are general and determine precisely the conditions for their use.

2. Guiding the user in the initial application of these techniques: Once the various techniques are isolated and programmed for use, an intelligent agent is needed to guide the user in the application of these techniques. In AGE-1, we assume that the user understands AI techniques and knows what he or she wants to do, but does not understand how to use the AGE system to accomplish the task. A longer-range interest involves helping the user determine what techniques are applicable to the task; i.e., AGE will assume that the user does not understand the necessary techniques of writing knowledge-based programs.

Research Plan: In our judgement the first research task has progressed to a point where we can continue on to the second task. The system that embodies the results to date is called AGE-1. The structure of AGE-1 is now frozen and only minor modifications are being made. We will continue to support it by correcting bugs and adding requested features that are easily implementable.

AGE-2: AGE-2 will try to address the second of the research tasks described above.
Although the current Design subsystem provides specification functions that allow the user to interactively specify the knowledge of the domain and the control structure, it does not (aside from simple advice) provide the user any help in the actual design process. For example, AGE should be able to provide some aid to the user on what kinds of inference mechanisms and representations are appropriate for his application problem. We have stated this problem in our previous reports without any promising ideas on how we might attack it. With the variety of feedback we have received from our experimental users, we now understand a few of the problems that inexperienced users face. With these in mind, we have begun, and will continue, to explore ways in which we can redesign and add facilities that will help users who are not familiar with knowledge engineering techniques and methodologies.

One of the major obstacles in the way of AGE-2 development is the way in which AGE-1 is implemented. Although the syntax of AGE-1 is clearly defined (see the Reference Manual), the semantics are not well-defined. They are defined in ad hoc fashion in the Editor, the Interpreter, and the Check modules. In order for AGE-2 to be able to conduct a dialogue about itself with the user, its semantics, as well as its syntax, must be uniformly represented. Since very few research results are available in the area of representing the semantics of systems (one exception is in automatic programming research), we need to experiment with a variety of approaches. We have already begun to look into some alternative representations. In changing the representation of the AGE system, no new components will be added, and a minimum of changes will be made to the definition of the existing components.
Concurrent with re-representing the AGE system, we will identify a dozen or so frameworks, in addition to the existing two, that have simpler constructs and are easier for novice users to understand. The simplicity will be achieved by providing fewer options for the user -- options which, because of their nature, are confusing to new users. Limiting the degrees of freedom for the user has the side benefit of allowing AGE to provide more specific descriptions and aids. For example, in a very constrained framework we can provide a library of "standard" predicates for the users, which can have associated with them English translations; with such texts available the rules and the back-trace explanation can be printed in English-like form. Once the user is comfortable with the simpler frameworks, he can add complexity simply by replacing the predefined options selected for the frameworks.

Computing Resources and Management: We believe the computing and communication resources provided by the SUMEX Facility make it one of the best in the country. The management is responsive to the needs of the research community and provides superb services. However, the system is getting to a point where no serious research and development is possible, because of the lack of computing cycles due to overcrowding. It is a compliment to the facility that there are so many users. On the other hand, our productivity has gone down in recent months because of the heavy load on the system. It would appear that the situation will not improve on its own, since many of the projects that were small a few years ago are maturing into larger, more complex systems, which is the way it should be. The environment in which the work is done also needs to grow.
In short, without augmentation of the current computing power and storage space (which have never been generous), our ability to make research progress at SUMEX will be drastically curtailed.

II.A.1.2 AI Handbook Project

Handbook of Artificial Intelligence

E.A. Feigenbaum, A. Barr, and P. Cohen
Stanford Computer Science Department

I. SUMMARY OF RESEARCH PROGRAM

A. Technical Goals

The AI Handbook is a compendium of knowledge about the field of Artificial Intelligence. It is being compiled by students and investigators at several research facilities across the nation. The scope of the work is broad: two hundred articles cover all of the important ideas, techniques, and systems developed during 20 years of research in AI. Each article, roughly four pages long, is a description written for non-AI specialists and students of AI. Additional articles serve as Overviews, which discuss the various approaches within a subfield, the issues, and the problems.

There is no comparable resource for AI researchers and other scientists who need access to descriptions of AI techniques like problem solving or parsing. The research literature in AI is not generally accessible to outsiders, and the elementary textbooks are not nearly broad enough in scope to be useful to a scientist working primarily in another discipline who wants to do something requiring knowledge of AI. Furthermore, we feel that some of the Overview articles are the best critical discussions available anywhere of activity in the field.

To indicate the scope of the Handbook, we have included an outline of the articles as an appendix to this report (see page 303).

B. Medical Relevance and Collaboration

The AI Handbook Project was undertaken as a core activity by SUMEX in the spirit of community building that is the fundamental concern of the facility.
We feel that the organization and propagation of this kind of information to the AIM community, as well as to other fields where AI is being applied, is a valuable service that we are uniquely qualified to support.

C. Progress Summary

Because our objective is to develop a comprehensive and up-to-date survey of the field, our article-writing procedure is suitably involved. First drafts of articles are reviewed by the staff and returned to the author (either an AI scientist or a student in the area). His final draft is then incorporated into a Chapter, which when completed is sent out for review to one or two experts in that particular area, to check for mistakes and omissions. After corrections and comments from our reviewers are incorporated by the staff, the manuscript is edited, and a final computer-prepared, photo-ready copy of the Chapter is generated.

We expect the Handbook to reach a size of approximately 1000 pages. Roughly two-thirds of this material will constitute Volumes I and II of the Handbook. The material in Volumes I and II will cover AI research in Heuristic Search, Representation of Knowledge, AI Programming Languages, Natural Language Understanding, Speech Understanding, Automatic Programming, and Applications-oriented AI Research in Science, Mathematics, Medicine, and Education. Researchers at Stanford University, Rutgers University, SRI International, Xerox PARC, RAND Corporation, MIT, USC-ISI, Yale, and Carnegie-Mellon University have contributed material to the project.

D. List of Relevant Publications

Many of the chapters of Volumes I and II of the AI Handbook have already appeared in preliminary form as Stanford Computer Science Technical Reports, authored by the respective chapter-editors. References follow. Other chapters of Volumes II and III will appear as Technical Reports in the summer and fall of 1981.

HPP-79-12 (STAN-CS-79-726) Anne Gardner. Search.
HPP-79-17 (STAN-CS-79-749) William Clancey, James Bennett, and Paul Cohen. Applications-oriented AI Research: Education.

HPP-79-21 (STAN-CS-79-754) Anne Gardner, James Davidson, and Terry Winograd. Natural Language Understanding.

HPP-79-22 (STAN-CS-79-756) James S. Bennett, Bruce G. Buchanan, and Paul R. Cohen. Applications-oriented AI Research: Science and Mathematics.

HPP-79-23 (STAN-CS-79-757) Victor Ciesielski, James S. Bennett, and Paul R. Cohen. Applications-oriented AI Research: Medicine.

HPP-79-24 (STAN-CS-79-758) Robert Elschlager and Jorge Phillips. Automatic Programming.

HPP-80-3 (STAN-CS-80-793) Avron Barr and James Davidson. Representation of Knowledge.

E. Funding Support Status

The Handbook Project is partially supported under the Heuristic Programming Project contract with the Advanced Research Projects Agency of the DOD, contract number MDA 903-77-C-0322, E. A. Feigenbaum, Principal Investigator, and under the core research activities of the SUMEX-AIM resource.

II. INTERACTIONS WITH SUMEX-AIM RESOURCE

A. Collaborations and Medical Use of Programs via SUMEX

We have had a modest level of collaboration with a group of students and staff at the Rutgers resource, as well as occasional collaboration with individuals at other ARPANET sites.

B. Sharing and Interactions with Other SUMEX-AIM Projects

As described above, we have had moderate levels of interaction with other members of the SUMEX-AIM community, in the form of writing and reviewing Handbook material. During the development of this material, limited arrangements have been made for sharing the emerging text. As final manuscripts are produced, they will be made available to the SUMEX-AIM community both as on-line files and in the hardcopy, published edition.

C.
Critique of Resource Management

Our requests of the SUMEX management and systems staff (for additional file space, directories, systems support, or program changes) have been answered promptly, courteously, and competently on every occasion.

III. RESEARCH PLANS (8/80 - 7/83)

A. Long Range Project Goals

The following is the schedule for completion and publication of the AI Handbook:

May, 1981: Publication of Volume I by publisher (Wm. Kaufmann Inc., Los Altos, Ca.)

August, 1981: Submission of final copy to publisher for Volume II (publication by end of 1981).

August-September, 1981: Completion of Technical Reports containing chapters of Handbook

October, 1981: Submission of final copy of Volume III to publisher (for publication first quarter 1982)

(Note: Volume I has been selected by the Library of Computer Science as their August 1981 book club selection.)

B. Justifications and Requirements for Continued SUMEX Use

The AI Handbook Project is a good example of community collaboration using the SUMEX-AIM communication facilities to prepare, review, and disseminate this reference work on AI techniques. The Handbook articles currently exist as computer files at the SUMEX facility. All of our authors and reviewers have access to these files via the network facilities and use the document-editing and formatting programs available at SUMEX. This relatively small investment of resources will result in what we feel will be a seminal publication in the field of AI, of particular value to researchers, like those in the AIM community, who want quick access to AI ideas and techniques for application in other areas.

C. Needs and Plans for Other Computational Resources

We use document preparation programs at SUMEX and the Computer Science Department's SCORE machine. We have used and will continue to use a Computer Science Department phototypesetting machine, the Alphatype, to produce the final copy of the AI Handbook.
The phototypesetting software called TEX, developed at Stanford, is the vehicle for this production.

D. Recommendations for Future Community and Resource Development

None.

II.A.1.3 DENDRAL Project

The DENDRAL Project
Resource-Related Research: Computers in Chemistry

Prof. Carl Djerassi
Department of Chemistry
Stanford University

I. SUMMARY OF RESEARCH PROGRAM

The DENDRAL Project is a resource-related research project. The resource to which it is related is SUMEX-AIM, which provides DENDRAL its sole computational resource for program development and dissemination to the biomedical community.

A. Project Rationale

The DENDRAL project is concerned with the application of state-of-the-art computational techniques to several aspects of structural chemistry. The overall goals of our research are to develop and apply computational techniques to the procedures of structural analysis of known and unknown organic compounds based on structural information obtained from physical and chemical methods, and to place these techniques in the hands of a wide community of collaborators to help them solve questions of structure of important biomolecules. These techniques are embodied in interactive computer programs which place structural analysis under the complete control of the scientist working on his or her own structural problem. Thus, we stress the word assisted when we characterize our research effort as computer-assisted structure elucidation or analysis.

Our principal objective is to extend our existing techniques for computer assistance in the representation and manipulation of chemical structures along two complementary, interdigitated lines.
We are developing a comprehensive, interactive system to assist scientists in all phases of structural analysis (SASES, or Semi-Automated Structure Elucidation System) from data interpretation through structure generation to data prediction. This system will act as a computer-based laboratory in which complex structural questions can be posed and answered quickly, thereby conserving time and sample. In a complementary effort we are extending our techniques from the current emphasis on topological, or constitutional, representations of structure to detailed treatment of conformational and configurational stereochemical aspects of structure.

By meeting our objectives we will fill in the "missing link" in computer assistance in structural analysis. A capability for structural analysis based on the three-dimensional nature of molecules is an absolute necessity for relating structural characteristics of molecules to their observed biological, chemical or spectroscopic behavior. These capabilities will represent a quantum leap beyond our current techniques and open new vistas in applications of our programs, both of which will attract new applications among a broad community of structural chemists and biochemists who will have access to our techniques. This access depends entirely on our access to and the continued availability of SUMEX-AIM. These issues are discussed in detail in the subsequent section, Interactions with the SUMEX-AIM Resource.

The primary rationale for our research effort is that structure determination of unknown structures and the relationship of known structures to observed spectroscopic or biological activity are complex and time-consuming tasks. We know from past experience that computer programs can complement the biochemist's knowledge and reasoning power, thereby acting as valuable assistants in solving important biomedical problems.
By meeting our objectives we feel strongly that our programs will become essential tools in the repertoire of techniques available to the structural biochemist.

We are currently beginning the second year of our three-year grant. This period represents a transition in the sense that we have pushed our research efforts in techniques for spectral interpretation, structure generation (e.g., CONGEN) and spectral prediction to their limits within the confines of topological representations of molecular structure. At this time, these techniques are perceived to be of significant utility in the scientific community, as evidenced by our workshops, the demand for the exportable version of CONGEN and the number of persons requesting collaborative or guest access to our programs at Stanford (see Interactions with the SUMEX-AIM Resource). These existing techniques will, for some years to come, remain important first steps in solving structural problems. However, in order to anticipate the future needs of the community for programs which are more generally applicable to biological structure problems and more easily accessible, we must address squarely the limitations inherent in existing approaches and search for ways to overcome them. Our major objectives are based on the following rationale.

None of our techniques (or the techniques of any other investigators) for computer-assisted structure elucidation of unknown molecular structures make full use of stereochemical information. As existing programs were being developed this limitation was less important. The first step in many structure determinations is to establish the constitution of the structure, or the topological structure, and that is what CONGEN, for example, was designed to accomplish. However, most spectroscopic behavior and certainly most biological activities of molecules are due to their three-dimensional nature.
For example, some programs for prediction of the number of resonances observed in 13CMR spectra use the topological symmetry group of a molecule for prediction. However, in reality it is the symmetry group of the stereoisomer that must be used. This group reflects the usually lower symmetry of molecules that possess chiral centers and that generally exist in fewer than the total possible number of conformations. This lower symmetry increases the number of carbon resonances observed over that predicted by the topological symmetry group alone. More generally, few of the techniques in the area of computer-assisted structure elucidation can be used in accurate prediction of structure/property relationships, whether the properties be spectral resonances or biological activities.

A structure is not, in fact, considered to be established until its configuration, at least, has been determined. Its conformational behavior may then be important to determine its spectroscopic or biological behavior. For these reasons we are emphasizing in our current grant period development of stereochemical extensions to CONGEN, our newly-developed structure generator, GENOA (see References 17, 18), and related programs such as the C-13 Nuclear Magnetic Resonance (NMR) programs (see References 15, 16), including machine representations and manipulations of configuration (see References 1, 10) and conformation (see Reference 19) and constrained generators for both aspects of stereochemistry (see References 6, 9, 11, 12).

None of the existing techniques for computer-assisted structure elucidation of unknown molecules, excepting very recent developments in our own laboratory, are capable of structure generation based on inferred partial structures which may overlap to any extent.
Such a capability is a critical element in a computer-based system, such as we propose, for automated inference of substructures and subsequent structure generation based on what is frequently highly redundant structural information including many overlapping part structures. Important elements of our research are concerned with further developments of such a capability for structure generation (the GENOA program; see Reference 17).

Given the above tools for structure representation and generation, we can consider new interpretive and predictive techniques for relating spectroscopic data (or other properties) to molecular structure (see References 2, 3, 7, 8, 14, 15, 16). The capability for representation of stereochemistry is required for any comprehensive treatment of: 1) interpretation of spectroscopic data (see References 15, 16); 2) prediction of spectroscopic data (see References 15, 16); 3) induction of rules relating known molecular structures to observed chemical or biological properties (see Reference 19).

These elements, taken together, will yield a general system for computer-aided structural analysis (the SASES system) with potential for applications far beyond the specific task of structure elucidation.

Parallel to our program development we have embarked on a concerted effort to extend to the scientific community access to our programs, and critical parts of our research effort are devoted to methods for promoting this resource sharing. Our rationale for this effort is that the techniques must be readily accessible in order to be used, and that development of useful programs can only be accomplished by an extended period of testing and refinement based on results obtained in analysis of a variety of structural problems, analyzed by those scientists actively involved in solutions to those problems. Our efforts in this area are summarized in Section II.A, Scientific Collaboration and Program Dissemination.

B.
Medical Relevance and Collaboration

The medical relevance of our research lies in the direct relationship between molecular structure and biological activity. The sciences of chemistry and biochemistry rest on a firm foundation of the past history of well-characterized chemical structures. Indeed, structure elucidation of unknown compounds and the detailed investigation of stereochemical configurations and conformations of known compounds are absolutely essential steps in understanding the physiological role played by structures of demonstrated biological activity. Our research is focussed on providing computational assistance in several areas of structural chemistry and biochemistry, with primary attention directed to those aspects of the problem which are most difficult to solve by strictly manual methods. These aspects include exhaustive and irredundant generation of constitutional isomers, and configurational and conformational stereoisomers, under chemical, biological and spectroscopic constraints, with a guarantee that no plausible stereoisomer has been overlooked.

Although our programs can be applied to a variety of structural problems, in fact most applications by our group and by our collaborators are in the area of natural products, antibiotics, pheromones and other biomolecules which play important biochemical roles. In discussions of collaborative investigations involved with actual applications of our programs we have always stressed the importance of strong links between the structures under investigation and the importance of such structures to health-related research. This emphasis can be seen by examination of the affiliations of current DENDRAL-related investigators and the brief description of current collaborative efforts in Interactions with the SUMEX-AIM Resource.

C.
Highlights of Research Progress

In this section we discuss briefly some major highlights of the past year and research currently in progress.

1. Past Year

1.1 Exportable version of the CONGEN program for computer-assisted structure elucidation. CONGEN is an interactive computer program whose task is to provide to the structural biochemist all chemical structures which are possible candidates for the structure of an unknown chemical compound. Based on this information, experiments can be designed to pinpoint the correct structure, thereby facilitating rapid and unambiguous identification of novel, bioactive chemicals. During the past two years we have completed an exportable version of the CONGEN program and have exported it to a variety of structural analysis laboratories in academic, private and industrial research organizations. CONGEN is being utilized at Stanford and at export sites in the hands of investigators who use it as a tool in solving their own structural problems. We have been exporting versions of CONGEN for about 18 months. The program has been used as an aid in the solution of many new structures, and recent results have formed the basis for at least eight formal lectures by users of CONGEN at remote sites.

1.2 Version I of the GENOA program for structure generation with overlapping atoms. GENOA (see Reference 17) is an outgrowth of CONGEN whose purpose is to suggest candidate structures for an unknown based on redundant and ambiguous structural inferences. This program, which utilizes CONGEN as an integral part of the computational procedures, is far simpler for the practicing biochemist to use. This results from GENOA's capability to construct structures based on substructural information obtained from a variety of spectroscopic, chemical and biochemical techniques.
The program itself considers the structural implications of each new piece of structural data and automatically ensures that all overlaps are considered, thereby freeing the investigator from concerns about the potential for overlapping, or redundant, substructural information. In addition, GENOA is the ideal tool for interfacing to automated procedures for spectral interpretation (see References 14, 15), because manual intervention in the assignment of substructures is no longer required, as it was for CONGEN.

1.3 Programs for Interpretation and Prediction of Spectral Data. We are actively pursuing several novel approaches to the automated interpretation of spectral data, concentrating on carbon-13 magnetic resonance (CMR), proton magnetic resonance (PMR) and mass spectral (MS) data. These approaches utilize large data bases of correlations between substructural features of a molecule and spectral signatures of such features. Our approaches are unique in that: 1) we can incorporate stereochemical features of substructures into the data bases; and 2) we can use the same data bases for both interpretation and prediction of data. We have recently reported several new developments in the area of analysis of mass spectral data, including methods for mass spectral data interpretation (see Reference 14) and mass spectrum prediction (see References 3, 7, 8).

For either interpretation or prediction of magnetic resonance data, stereochemical substructure descriptors are absolutely essential. Resonance positions are a strong function of the local environment of a resonating atom, including position in space relative to other neighboring atoms. Descriptors which include the three-dimensional relationships among atoms in a substructure are required in order to obtain meaningful correlations. We have recently completed the first phases of development of the data base and associated interpretation and prediction programs for C-13 NMR data (see References 15, 16).
This approach uses a structure and substructure representation which incorporates configurational stereochemistry (see Reference 16). Such data bases can be used to interpret spectral data to obtain substructures to be used in CONGEN and GENOA, the structure generating programs (see References 15, 17). Continued automation of this aspect of structure elucidation will significantly ease the burden on the structural biochemist because the computer-based files are much more comprehensive and easier to use than correlation tables or diffuse literature sources.

The same data bases can be used to predict spectral signatures in the context of a set of complete molecular structures. Comparison of predicted and observed spectra allows a rank-ordering of candidates and will be very useful in directing the attention of the investigator to the most plausible alternatives (see References 7, 8, 15).

1.4 Constrained generation of configurational stereoisomers. During the previous grant period we solved the problem of computer generation of configurational stereoisomers. These are isomeric chemical structures that differ from one another in the arrangement of atoms in three-dimensional space. We have developed this method further, including now the capability for construction of all possible stereoisomers under stereochemical constraints (see Reference 9). Previously, CONGEN and GENOA were capable only of generation of constitutional isomers, which convey no information about the structure in three dimensions. The interaction of biomolecules with biochemical systems is based on their three-dimensional nature, not simply their constitution. Therefore, these new developments are crucial to use of computational techniques in structural studies.
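The idea of generating every configurational alternative and then discarding those that violate constraints can be sketched in a few lines of Python. This is a deliberately naive illustration and not the method of Reference 9: it assumes a molecule is described only by a count of independent stereocenters, labels each one R or S, and ignores molecular symmetry (so it would overcount for symmetric molecules) as well as double-bond stereochemistry. The function name, the constraint format, and the example constraint are all hypothetical.

```python
# Naive sketch of constrained enumeration of configurations.
# Assumptions (not from the report): stereocenters are independent,
# each is labeled "R" or "S", and symmetry is ignored entirely.
from itertools import product


def enumerate_configurations(n_centers, constraints=()):
    """Yield tuples of R/S labels, one label per stereocenter.

    `constraints` is a sequence of functions; each takes a tuple of
    labels and returns True if the assignment is acceptable.
    """
    for labels in product("RS", repeat=n_centers):
        if all(check(labels) for check in constraints):
            yield labels


def same_at_0_and_1(labels):
    """Hypothetical constraint: centers 0 and 1 carry the same label."""
    return labels[0] == labels[1]


if __name__ == "__main__":
    unconstrained = list(enumerate_configurations(3))
    constrained = list(enumerate_configurations(3, [same_at_0_and_1]))
    print(len(unconstrained), "assignments without constraints")
    print(len(constrained), "assignments with the constraint")
    for labels in constrained:
        print(" ", "".join(labels))
```

For three independent centers the unconstrained enumeration yields 2^3 = 8 label assignments, and the single example constraint halves that to 4. A practical generator must also recognize symmetry-equivalent assignments so that each distinct stereoisomer appears exactly once, which this sketch does not attempt.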
Now, for the first time, a computer program can begin with the molecular formula of an unknown compound and, using constraints on both molecular connectivity and configuration, arrive at a set of structural alternatives which include potential stereochemical variability. This capability allows use of spectral data whose interpretation depends strongly on stereochemical features of molecules. Most importantly, it gives us a structural representation and methods for structure generation and manipulation which form the foundation for future development of the one important remaining aspect of structural analysis, the treatment of molecular conformations.

2. Research in Progress

The following are some highlights of research in progress. The common theme of these studies is representation of stereochemistry and use of stereochemical information in answering questions concerning the nature of known or unknown molecular structures.

2.1 Development of GENOA and STRCHK.

GENOA can now deal with representations of configurational stereochemistry, although it does not make active use of such representations in generating constitutional isomers. The STRCHK (for Structure Checking) program represents the next stage in development of the post-generation analysis programs. STRCHK provides the entry point into the STEREO program for constrained generation of stereoisomers (see Reference 9) from constitutional isomers generated by either CONGEN or GENOA. In the case of GENOA, stereochemical information is passed to STRCHK, where it can be used in STEREO. In addition, the mass and C-13 spectrum prediction and ranking programs (see References 7, 8, 14, 15) are available from STRCHK, together with several other utility programs for examining structural candidates. Both programs will be developed further, to the point where export to other computer facilities, as was done with CONGEN, will be possible.

2.2 Development of the C-13 data base and interpretive program.
We plan further expansion of the C-13 NMR data base, using data obtained by us from the literature and supplied by others in collaborative efforts. Eventually we would like to pass this work on to an organization better equipped to build and maintain data bases. For the time being, however, our work is sufficiently experimental that we will maintain responsibility for the data base. The C-13 spectrum interpretation program will continue to be developed, as we attempt to make the program more "intelligent" chemically.

2.3 Representation and manipulation of conformational stereochemistry.

The next year will see intensive efforts to develop programs for representation of molecular conformations. Preliminary work has led to an algorithm for representation and enumeration of conformations, and to a method for searching for common three-dimensional substructures in a set of structures (see Reference 19). The former study will first be directed toward a program for representation of substructures with conformation designations. This will lead directly to a method for development of a data base and prediction program for any spectroscopic technique, such as proton NMR, where the spectral signatures are strongly influenced by molecular configurations. Subsequently, a program for generation and, eventually, constrained generation of molecular conformations will be developed. Parallel to this work, the program for searching for three-dimensional common substructures, a problem important in structure/biological activity correlations, will be developed and tested extensively on previous studies presented in the literature. This work (see Reference 19) is based to an extent on similar work carried out for constitutional representations of structure (see Reference 5).

D. List of Recent Publications

(1) J.G. Nourse, R.E. Carhart, D.H. Smith, and C. Djerassi, "Exhaustive Generation of Stereoisomers for Structure Elucidation," J. Am. Chem. Soc., 101, 1216 (1979).
(2) C. Djerassi, D.H. Smith, and T.H. Varkony, "A Novel Role of Computers in the Natural Products Field," Naturwiss., 66, 9 (1979).
(3) N.A.B. Gray, D.H. Smith, T.H. Varkony, R.E. Carhart and B.G. Buchanan, "Use of a Computer to Identify Unknown Compounds. The Automation of Scientific Inference," Chapter 7 in "Biomedical Applications of Mass Spectrometry, First Supplementary Volume," G.R. Waller and O.C. Dermer, Eds., John Wiley and Sons, Inc., New York, 1980, p. 125.
(4) T.C. Rindfleisch, D.H. Smith, W.J. Yeager, M.W. Achenbach, and A. Wegmann, "Mass Spectrometer Data Acquisition and Processing Systems," in Chapter 3 of "Biomedical Applications of Mass Spectrometry, First Supplementary Volume," G.R. Waller and O.C. Dermer, Eds., John Wiley and Sons, Inc., New York, 1980, p. 55.
(5) T.H. Varkony, Y. Shiloach, and D.H. Smith, "Computer-Assisted Examination of Chemical Compounds for Structural Similarities," J. Chem. Inf. Comp. Sci., 19, 104 (1979).
(6) J.G. Nourse and D.H. Smith, "Nonnumerical Mathematical Methods in the Problem of Stereoisomer Generation," Match, (No. 6), 259 (1979).
(7) N.A.B. Gray, R.E. Carhart, A. Lavanchy, D.H. Smith, T. Varkony, B.G. Buchanan, W.C. White, and L. Creary, "Computerized Mass Spectrum Prediction and Ranking," Anal. Chem., 52, 1095 (1980).
(8) A. Lavanchy, T. Varkony, D.H. Smith, N.A.B. Gray, W.C. White, R.E. Carhart, B.G. Buchanan, and C. Djerassi, "Rule-Based Mass Spectrum Prediction and Ranking: Applications to Structure Elucidation of Novel Marine Sterols," Org. Mass Spectrom., 15, 355 (1980).
(9) J.G. Nourse, D.H. Smith, and C. Djerassi, "Computer-Assisted Elucidation of Molecular Structure with Stereochemistry," J. Am. Chem. Soc., 102, 6289 (1980).
(10) J.G. Nourse, "Applications of Artificial Intelligence for Chemical Inference. 28. The Configuration Symmetry Group and Its Application to Stereoisomer Generation, Specification, and Enumeration," J. Am. Chem. Soc., 101, 1210 (1979).
(11) J.G. Nourse, "Application of the Permutation Group to Stereoisomer Generation for Computer Assisted Structure Elucidation," in "The Permutation Group in Physics and Chemistry," Lecture Notes in Chemistry, Vol. 12, Springer-Verlag, New York, 1979, p. 19.
(12) J.G. Nourse, "Applications of the Permutation Group in Dynamic Stereochemistry," in "The Permutation Group in Physics and Chemistry," Lecture Notes in Chemistry, Vol. 12, Springer-Verlag, New York, 1979, p. 28.
(13) J.G. Nourse, "Selfinverse and Nonselfinverse Degenerate Isomerizations," J. Am. Chem. Soc., in press (1980).
(14) N.A.B. Gray, A. Buchs, D.H. Smith, and C. Djerassi, "Computer-Assisted Structural Interpretation of Mass Spectral Data," Helv. Chim. Acta, in press (1981).
(15) N.A.B. Gray, C.W. Crandell, J.G. Nourse, D.H. Smith, and C. Djerassi, "Computer-Assisted Interpretation of C-13 Spectral Data," J. Org. Chem., 46, 703 (1981).
(16) N.A.B. Gray, J.G. Nourse, C.W. Crandell, D.H. Smith, and C. Djerassi, "Stereochemical Substructure Codes for C-13 Spectral Analysis," Org. Magn. Res., 15, 375 (1981).
(17) R.E. Carhart, D.H. Smith, N.A.B. Gray, J.G. Nourse, and C. Djerassi, "GENOA: A Computer Program for Structure Elucidation Based on Overlapping and Alternative Substructures," J. Org. Chem., 46, 1708 (1981).
(18) D.H. Smith, N.A.B. Gray, J.G. Nourse, and C.W. Crandell, "The DENDRAL Project: Recent Advances in Computer Assisted Structure Elucidation," Anal. Chim. Acta, Computer Techniques and Optimization, in press (1981).
(19) D.H. Smith, J.G. Nourse, and C.W. Crandell, "Computer Techniques for Representation of Three-Dimensional Substructures and Exploration of Potential Pharmacophores," Proceedings of a Chemical Industries Institute of Technology Symposium on "Structure Activity Correlation as a Predictive Tool in Toxicology," Feb. 10-12, 1981, Raleigh, NC, in press.

E. Funding Support

Title: RESOURCE RELATED RESEARCH: COMPUTERS IN CHEMISTRY (grant)
Principal Investigator: Carl Djerassi, Professor of Chemistry, Department of Chemistry, Stanford University
Associate Investigator: Dennis H. Smith, Senior Research Associate, Department of Chemistry, Stanford University
Funding Agency: Biotechnology Resources Program, Division of Research Resources, National Institutes of Health
Grant Identification Number: RR-00612-12
Total Award and Period: 5/1/80 - 4/30/83, $641,419
Current Award and Period: 5/1/81 - 4/30/82, $237,387

II. INTERACTIONS WITH THE SUMEX-AIM RESOURCE

In the coming period of our research, our computational approaches to structural biochemistry will become much more general and we plan wide dissemination of the programs resulting from our work. These more general approaches to aids for the structural biochemist will yield computer programs with much wider applicability than, for example, the existing CONGEN, GENOA, STEREO and STRCHK programs. We expect that this will create a significant increase in requests for access to our programs, placing heavy emphasis on our relationship with SUMEX to provide this access (see Justification and Requirements for Continued SUMEX Use for additional details).

For these reasons, in our current grant period the SUMEX-AIM resource is identified as the resource to which our research is related. The SUMEX-AIM resource has provided the computational basis for our past program developments and for initial exposure of the scientific community to these programs.
The resource is, however, funded completely separately from our own research; we are only one of a nationwide community of users of the SUMEX-AIM facility.

Our relationship to SUMEX goes far beyond mere consumption of cycles on the SUMEX machine. It has been the goal of the SUMEX project to provide a computational resource for research in symbolic computational procedures applied to health-related problems. As such research matures, it produces results, among which are computer programs, of potential utility to a broad community of scientists. A second goal of SUMEX has been to promote dissemination of useful results to that community, in part by providing network access to programs running on the SUMEX-AIM facility during their development phases. SUMEX does not, however, have the capacity to support extensive operational use of such programs. It was expected from the beginning that user projects would develop alternative computing resources as operational demands for their programs grew. Such a state has been reached for the CONGEN, GENOA, STEREO and STRCHK programs, and future developments in the DENDRAL Project to yield more generally useful programs will simply magnify the problem.

We will, therefore, under our relationship with SUMEX-AIM, participate as before in the SUMEX-AIM community in sharing methods and results with other groups during development of new programs. In addition, we plan to utilize the small machines requested as part of the SUMEX renewal. Our project will benefit by being able to provide more extensive operational access to our existing and developing programs using these machines, and to provide a test environment for adapting our programs to a more realistic laboratory computing environment than the special-purpose SUMEX resource (see Justification and Requirements for Continued SUMEX Use for additional information).
SUMEX will benefit by moving a substantial part of the DENDRAL production load to more cost-effective systems, thereby freeing the SUMEX resource for new program development. Collaborators who wish to use existing programs for specific problems would access SUMEX via the network as before, but now would be routed to new machines. New program developments will be carried out on SUMEX itself, taking advantage of the much more extensive repertoire of peripheral devices, languages, debugging tools and text editors, i.e., precisely the tasks for which that system was designed.

Our proposed relationship to SUMEX-AIM has important implications beyond the practical considerations mentioned above. There is a significant research component to our proposal to make small machines an integral part of the resource sharing aspects of our relationship to SUMEX. The DENDRAL project is one of the first of the SUMEX-AIM projects to have developed sufficient maturity to require additional computer facilities to support production use and to facilitate export of its programs to be applied to real-world, biomedical structural problems. In a sense, then, we will be acting in a pathfinding role for the rest of the SUMEX-AIM community as other projects reach maturity and seek realistic mechanisms for dissemination of their software to meet the computational needs of their collaborators. Cooperating with SUMEX in the use of small machines, implementing new software, and regulating access to divert development and applications to the appropriate machine are all experiments which we are willing to undertake together with SUMEX, knowing that we will be providing direction to future efforts along similar lines.
We will also be in a pathfinding role for a large segment of the biochemical community involved in computing, as we explore the utility of machines which will be much more widely available in department and laboratory environments than DEC-10's and -20's. There are currently very few widely available computing resources which provide access to symbolic, problem solving programs operating in an interactive environment. We would be able to fulfill that need to the extent that applications have direct biomedical relevance, to the limits of our share of the SUMEX-AIM computing resource.

A. Scientific Collaboration and Program Dissemination

Scientific Collaborations:

The following is a brief description of collaborative efforts that have been taking place or will soon commence in the use of DENDRAL programs for various aspects of structural analysis.

1) Drs. Larry Anderson and Elliott Organick, Depts. of Fuels Engineering and Computer Science, University of Utah. Dr. Anderson's research is in establishing the structure of coal and related polymers via various thermal and chemical degradation schemes. The degradation products are of interest to both energy and environmental studies. Professor Organick is responsible in part for the computer and graphics facility on which CONGEN and related programs can be run. We are exploring with them structure representations based on the Superatom concept in CONGEN as a means of representing families of structures. Access to our programs is primarily via the computer facility at Utah.

2) Dr. Raymond Carhart, Lederle Laboratories. Dr. Carhart (a former member of our group) is engaged in research concerned with computer applications to structure/activity relationships. Program development is done jointly between Lederle and Stanford with free exchange of software. Lederle applications are carried out on their own computer facility.

3) Dr. Janet Finer-Moore, University of Georgia. Dr. Finer-Moore is engaged in structure analysis of alkaloids in Dr. Peletier's group at Georgia. This research makes extensive use of 13C NMR. Our collaboration involves the development and application of our 13C interpretive and predictive programs in structure elucidation of new compounds based on an extensive set of 13C data available on closely related compounds. Access is via network to our programs at Stanford. We have just completed the draft of a manuscript as a result of this collaboration. (Dr. Finer-Moore has recently moved to the University of California, San Francisco.)

4) Dr. Brenda Kimble, University of California, Davis. Dr. Kimble's research is in structural analysis of compounds which are present in trace amounts in environmental milieus and which show mutagenic activity. Many of these compounds are largely aromatic. We are developing the capabilities of our programs to deal efficiently with large, polynuclear aromatic compounds. Access to our programs is via network to Stanford.

5) Dr. Fred McLafferty, Cornell University. Dr. McLafferty's research is involved with instrumental and analytical aspects of mass spectrometry. We are working with him on the development and application of an interface between his STIRS system and CONGEN/GENOA for structure determination based on mass spectral data. Part of this collaboration is development of IBM versions of some of our programs. Access is in part to Stanford, shifting primarily to Cornell as development proceeds.

6) Dr. David Cowburn, The Rockefeller University. Dr. Cowburn's research is in the area of conformational analysis, primarily of peptides. We are working with him on the development and application of our programs for generation of molecular conformations. Dr. Cowburn works with large ring peptides, which represent a significant challenge for a conformation generator. His participation will help assure an eventual program of practical use rather than just theoretical interest. Collaboration will be via network access to our programs at Stanford.

7) Dr. Gilda Loew, SRI International and The Rockefeller University. Dr. Loew's research is in the area of quantitative structure/activity relationships, using primarily the methods of quantum mechanical calculations. We are working with her to interface our conformational generator to her coordinate-based calculation methods. Collaboration is carried out via accounts at Stanford with concurrent development of her programs on a VAX facility (NASA Ames Research Center).

8) Dr. D.C. Rohrer, Medical Foundation of Buffalo Research Laboratories, Buffalo, New York. We have initiated a collaboration with Dr. Rohrer on the problem of finding the common 3-dimensional substructural features of a set of chemical structures. The use of such a program would be to postulate substructural features which are responsible for similar biological or spectral properties. The initial approach is similar to that used successfully to find the greatest common subgraph of a set of constitutional structures. Collaboration will be via network access to Stanford.

9) Dr. J.N. Shoolery, Varian Associates, Palo Alto. We are collaborating with Dr. Shoolery and others at Varian to obtain high quality C-13 spectra of several marine sterols available only in very small quantities. This is being done as part of our ongoing project to develop programs which are capable of spectral interpretation and prediction. The Varian people access our programs directly or via network.

Program Dissemination:

We have provided access to our programs to a community of collaborators via 1) distribution of the CONGEN program to other laboratories, and 2) guest or individual accounts on the SUMEX computer facility here at Stanford.
These methods to promote the dissemination and use of our programs are elaborated below, followed by a brief description of some of our collaborations.

a) Program Export

Over the past two years we have distributed CONGEN to a number of laboratories owning computers on which the exportable version can now execute. These currently include DEC PDP-10 and -20 systems operating under the TENEX, TOPS-10 and TOPS-20 operating systems and, more recently, the beginnings of a version for IBM systems. The following persons are currently running CONGEN on their own laboratory computers:

Dr. Larry Anderson - University of Utah (work described in section on collaborations)
Dr. Hartmut Braun - Organische-Chemisches Institut der Universitat Zurich, Switzerland. A former member of Prof. Wipke's group at UC Santa Cruz, he has only recently installed the program at ETH, Zurich.
Dr. Raymond Carhart - Lederle Laboratories (work described in section on collaborations)
Dr. Roy Carrington - Shell Biosciences Laboratory, England. Dr. Carrington has used the program both as a guest user and recently in export. He has given presentations on the use of CONGEN and has applied the program to the structure determination of a new acidic amino acid, 2,4-methanoglutamic acid, and other compounds from plant seeds. This work was done in collaboration with Prof. Jon Clardy at Cornell, who is also a guest user.
Dr. Robert Carter - University of Lund, Sweden. Dr. Carter obtained a version of the program for use by several groups at universities in Sweden.
Dr. Daniel Chodosh - Smith, Kline & French Laboratories. He has installed CONGEN and written an extensive users' manual for the use of SKF chemists.
Dr. Henry Dayringer - Monsanto Agricultural Products Co. He and Dr. Schwenzer (now at Gulf) were responsible for obtaining and installing CONGEN. Primary use is as an aid to structure elucidation of photoproducts and metabolites of agricultural chemicals.
Dr. Douglas Dorman - Lilly Research Labs. Dr. Dorman has been one of our best users. He attended our 1978 workshop and has given several presentations on the use of CONGEN. He has used the program as an aid in solving a number of structures, including some beta-lactam antibiotic derivatives.
Dr. Philip Ihrig - Amoco Standard Oil (Indiana)
Dr. Martin Huber - Ciba-Geigy, Switzerland. Dr. Huber is a former member of Prof. Wipke's group at UC Santa Cruz. He has recently received the program and is currently working to interest his coworkers at Ciba in computer assisted structure elucidation.
Dr. Carroll Johnson - Oak Ridge National Laboratory. Dr. Johnson is a long time colleague who spent a year at Stanford in 1976. He is involved with the analytical group at Oak Ridge and is using the program as an analytical aid and as a model for programs he is developing.
Dr. G. Jones - ICI Pharmaceuticals, England. He has installed CONGEN and is currently evaluating its utility for use by analytical chemists at ICI.
Dr. Fred W. McLafferty - Cornell University (work summarized under collaborations)
Dr. Peter W. Milne - CSIRO Division of Computing Research, Australia. He contacted us through his association with the Heuristic Programming Project at Stanford. He has acted as the Australian contact for distribution of CONGEN in that country.
Dr. James Morrison - Latrobe University, Australia (see Milne, above)
Dr. David Pensak - E.I. duPont de Nemours and Company (see EXODENDRAL account DUPONT, and workshop)
Dr. Joseph SanFilippo - Rutgers University. Dr. SanFilippo is using CONGEN in conjunction with his work on superoxide chemistry and in the evaluation of mass spectral data for environmental samples.
Dr. William Sieber - Sandoz, Ltd., Switzerland. He has installed CONGEN for use by structural chemists at Sandoz. Currently they are evaluating its utility.
Dr. M.D. Sutherland - University of Queensland, Australia (see Milne, above)
Dr. R.O. Watts - Australian National University (see Milne, above)

b) EXODENDRAL Account

We reserve a special account on SUMEX for persons interested in access to our programs. Initially, this account was used for anyone desiring access, independent of expected level of use or eventual interest. As the SUMEX system became more heavily loaded, a mechanism for guest access was provided, and at that point we began to differentiate our users by level of interest. For those desiring merely to try programs we provide guest access (see section c, GUEST Access). If there is interest in continuing collaboration, EXODENDRAL status is given, which provides access to more system facilities and good file management capabilities. The persons who have been active under EXODENDRAL status this year are the following (with the account name followed by the contact person and association):

Dr. Jean-Claude Braekman - Universite Libre de Bruxelles, Belgium. He is a former postdoctoral fellow in our group, and accesses CONGEN from Belgium for natural products structure elucidation.
Dr. Hartmut Braun - Organische-Chemisches Institut der Universitat Zurich, Switzerland (see section on export)
Dr. Roy Carrington - Shell Biosciences Laboratory, England (see section on export)
Dr. David Cowburn - The Rockefeller University (see section on collaborations)
Dr. Douglas Dorman - Lilly Research Laboratories (see section on export)
Dr. Andre Dreiding - Organische-Chemisches Institut der Universitat Zurich, Switzerland. He has used CONGEN and STEREO extensively in structural studies. He has also worked closely with Braun (see section on export under Braun).
Dr. Earl Abrahamson - E.I. duPont de Nemours and Company. Dr. Abrahamson and 4 colleagues attended our 1980 workshop.
They are attempting to integrate our program into their overall computer software system, which includes a wide variety of programs for applications to chemical problems.
Dr. Janet Finer-Moore - University of Georgia (see section on collaborations)
Dr. Kenneth Gash - California State College at Dominguez Hills
Dr. Steven Heller - Environmental Protection Agency. We are continuing our work with the NIH/EPA Chemical Information System, through Heller, to attempt to find mechanisms for making CONGEN accessible through that system.
Dr. Martin Huber - Ciba-Geigy, Switzerland (see section on export)
Dr. Peter W. Milne - CSIRO Division of Computing Research, Australia (see section on export)
Dr. Henry Dayringer - Monsanto Company (see section on export)
Dr. Mark Wood - Rutgers University
Dr. Raymond Carhart - Lederle Laboratories (see section on collaborations)
Dr. Douglas C. Rohrer - Medical Foundation of Buffalo (see section on collaborations)
Dr. Jean Mathieu - Roussel UCLAF (see section on guest access under Delaroff)
Dr. William Sieber - Sandoz Ltd., Switzerland (see section on export)
Dr. James Shoolery - Varian Associates (see section on collaborations)

c) GUEST Access

We have provided GUEST access to our programs for those persons desiring occasional access to study a structural problem and for those who wish a "hands-on" introduction to the programs. Persons who have received information about this method of access are listed below (the names of those who have actually logged in as guests are preceded by an asterisk):

*Dr. Robert Adamski - Alcon Labs
*Dr. A. Bothner-by - Carnegie Mellon University. Dr. Bothner-by has requested access to aid others in the Chemistry Department with structure elucidation work.
*Dr. Reimar Bruening - Institut fur Pharmazeutische Arzneimittellehre der Universitat, West Germany. Dr. Bruening has used the program to aid in his solution of the structure of the alkaloid Cassine.
He was a participant in our 1978 workshop and has maintained interest since then. He has given at least one presentation in Germany on our programs.
*Dr. William Brugger - International Flavors and Fragrances. Dr. Brugger is interested in eventually obtaining CONGEN for use at IFF in natural products structure elucidation.
*Dr. Robert Carter - University of Lund, Sweden (see section on export)
*Dr. Francois Choplin - Institut Le Bel, France
*Dr. Jon Clardy - Cornell University. He has used CONGEN on occasion to determine the potential structural variety for an unknown prior to obtaining the X-ray crystal structure.
Dr. Brian Coleman - Koninklijke/Shell-Laboratorium, Holland
*Dr. Mike Crocco - American Hoechst Corp.
*Dr. V. Delaroff - Roussel UCLAF, France. Dr. Delaroff attended our 1980 workshop. He is in charge of a spectroscopic team which checks structures and suggests structures for unknown compounds with important biological activities. They have been using our programs by remote access to aid these investigations.
*Dr. Dan Dolata - University of California at Santa Cruz. He is one of our contacts with Prof. Wipke's group at UC Santa Cruz.
*Dr. Bruno Frei - Laboratorium f. Organische Chemie, Switzerland
*Dr. Y. Gopichand - University of Oklahoma. He has worked with Prof. Schmitz on the solution of several structures of various marine natural products.
*Dr. John Gordon - Kent State University. Dr. Gordon has been using CONGEN while working at Chemical Abstracts in Columbus, Ohio. He has been using CONGEN to investigate general issues of structure representation.
Dr. Peter Gund - Merck, Sharp and Dohme Research Labs
*Ms. Wendy Harrison - University of Hawaii at Manoa. Ms. Harrison is a student with Prof. Scheuer at Hawaii. She attended our 1978 workshop and has used the programs occasionally as an aid to structure determination in marine chemistry.
Dr. J. Hartenstein - Goedecke Co., Germany
*Dr. Richard Hogue - University of California at Santa Cruz. He is another contact with Prof. Wipke's group.
Dr. H. Honig - Institut fur Organische-Chemie u. Organisch-Chemische Technologie, Austria
Dr. Kenneth Houk - Louisiana State University. Dr. Houk has recently moved to Pittsburgh, where he hopes to develop closer contact with our group.
Dr. H. Kating - Institut fur Pharmazeutische Biologie, Germany
Dr. Brenda Kimble - University of California at Davis (see section on collaborations)
Dr. Sydell Lewis - University of California at Berkeley
*Dr. David Lynn - University of Virginia. Dr. Lynn attended our 1978 workshop when he was working with Prof. Nakanishi at Columbia.
*Dr. In Ki Mun - Cornell University (see section on collaborations under McLafferty)
*Dr. Koji Nakanishi - Columbia University. We have worked with him and his students (see Lynn) on structures of several synthetic and natural products.
*Dr. Suba Neir - Washington University, St. Louis. Dr. Neir used the program to aid in determination of the structure of a mutagen.
Dr. A. Neszmelyi - Central Research Institute for Chemistry of the Hungarian Academy of Sciences
Dr. A.C. Oehlschlager - Simon Fraser University, Canada
*Ms. Connie Oshirio - Lawrence Berkeley Labs
Dr. J.R. Jocelyn Pare - The J.R.J. Pare Establishment for Chemistry Ltd., Canada
Dr. James M. Perry - Worcester Polytechnic Institute, Massachusetts
*Dr. Philip Pfeffer - USDA (Philadelphia)
*Dr. Ned Phillips - University of Florida
*Dr. J.D. Roberts - California Institute of Technology
Dr. Robert Santini - Purdue University
Dr. Norm Stemple - Alcon Labs
Dr. Richard Teeter - Chevron Chemical Co. We have used CONGEN and the mass spectrum analysis programs to verify the structural assignment of an unknown compound.
*Dr. Babu Venkataraghavan - Lederle Laboratories (see section on collaborations)
Dr. Stephen Wilson - Indiana University
*Dr. W.T. Wipke - University of California at Santa Cruz. We have worked closely with Prof. Wipke's group for several years on problems of structure representation and manipulation in our complementary areas of computer applications in chemistry.
*Dr. Michael Zippel - Institut fur Biochemie Zentrale Arbeitsgruppe Spectroskopie, Germany. Dr. Zippel used CONGEN to investigate the possible connection with their spectral search system.

d) Industrial Affiliates Program

The high level of interest shown by industrial research laboratories in our programs has always presented us with delicate questions about access to SUMEX-AIM. In the past we have granted access for trials of our programs under the conditions that access is necessarily limited and that the recording mechanisms of our programs be used to ensure that all such trial use be in the public domain. As of April, 1980, we began solicitation of interested industrial organizations to participate in a DENDRAL Project Industrial Affiliates Program. As of May 1, 1981, we have six members. We intend to use this program as a means by which we can offer collaborations with our on-going research to industrial organizations separate from SUMEX-AIM. Although EXODENDRAL accounts to such organizations are used to facilitate communication and sharing of new programs and concepts of interest with the community as a whole, all significant and certainly all proprietary use of our programs will be carried out on their own computational facilities.

e) Program License

We are currently exploring the mechanism of program license to commercial firms as a method for dissemination of well-developed programs, for example CONGEN. This mechanism involves a negotiated agreement between a company and Stanford University for rights to access to and dissemination of identified computer programs. Currently, two companies are negotiating with Stanford.
We see this mechanism as serving the function of technology transfer in a very realistic way. We do not, as a research project, have the charter or the resources to do what is essentially final engineering of a program and integration of the program into an existing, larger system. Such "value added" effort is crucial to broad acceptance of a computer-based method. In addition, a participating company would take on the burden of maintenance, documentation and training, freeing our personnel to pursue our research objectives and to bring experimental programs to the level of performance where they, too, can be disseminated by licenses.

B. Interactions with Other SUMEX-AIM Projects

We routinely collaborate with the SUMEX projects most closely related to our own research. In particular, these collaborations have taken place with the CRYSALIS, MOLGEN and SECS projects, and have begun with Dr. Carroll Johnson at Oak Ridge.

CRYSALIS is concerned with new approaches to the interpretation of X-ray crystallographic data. X-ray crystallography is another approach to molecular structure elucidation. One of our long-term interests is exploring ways in which CONGEN or GENOA generated structures might be used to guide the search of electron density maps. We are also communicating with Prof. Jon Clardy at Cornell on this problem. It is hoped that having narrowed down the structural possibilities for an unknown using physical and chemical data, the few remaining candidates can be used to guide interpretation of such maps.

Most of the structural problems investigated by MOLGEN involve much larger molecules than the size normally investigated in DENDRAL research. Thus, structural representations involving higher levels of abstraction are of utility in MOLGEN, making our structure manipulation tasks quite different.
However, many of the ways in which MOLGEN manipulates its structural representations drew on past experience in DENDRAL in developing algorithms to perform these manipulations. We collaborate frequently with the SECS project in a number of ways. Although our research efforts are in one sense directed toward opposite ends of work on chemical structures, SECS being devoted to synthesis, DENDRAL being devoted to analysis, the underlying problems of structural manipulation share many common aspects. We have exchanged software where possible, particularly in the area of chemical structure display. We have held several discussions in joint group meetings and at several symposia including the AIM Workshops on common problems, including substructure searching, canonical representations and representation and manipulation of stereochemistry. Persons visiting one laboratory often take the opportunity to visit the other. For example, recent visitors to both laboratories have included Prof. Andre Dreiding, Zurich, Dr. Martin Huber, Basel, and Prof. Robert Carter, Lund. Dr. Carroll Johnson has collaborated on the CRYSALIS project in the past. More recently he has taken an interest in the use of knowledge-based programs for certain problems in spectral data interpretation. For this reason he is exploring the AGE and EMYCIN systems as frameworks for his program structure, and is involved in discussions with DENDRAL to see where common areas of data interpretation can be identified so that he can draw on our experience and programs. This effort is just beginning at this time; we plan to meet early in May at Stanford to continue discussions. C. Critique of Resource Management The SUMEX-AIM environment, including hardware, system software and staff, has proven absolutely ideal for the development and dissemination of DENDRAL programs. The virtual memory operating system has greatly facilitated development of large programs. 
The emphasis on time-sharing and interactive computing has been essential to our development of interactive programs. Our experience with other computer facilities has only emphasized the importance of the SUMEX environment for real-world applications of our programs. To run CONGEN, for example, in a batch computing environment would make no sense whatever because the program (and our other, related programs) is successful in large part because an investigator can closely monitor and control the program as it works toward solution. We have no complaints whatsoever about the computing environment.

We do have, however, significant problems with SUMEX-AIM capacity, both in available computer cycles and on-line file storage. In a sense DENDRAL suffers from its success. The rapid progress made during the last grant period and now continuing into the next period has led to development of many new programs as adjuncts to CONGEN and GENOA and at the same time has inspired many persons in the scientific community to request some form of access to our programs. The net result is that the high load average on the system often makes it very difficult to carry on program development at the same time as collaborations that apply our programs to structural problems.

The current overcrowding we see on SUMEX creates two major problems for us in the conduct of our research. First, it diminishes productivity as many people compete for the resource; the "time-sharing syndrome" leads to idle, wasted time at the terminal waiting for trivial computations to be completed. Second, the slow response time of the system is an aggravation to an outside investigator who is anxiously trying to solve a structural problem. At some point even the most interested persons will give up, log off the computer and resort to manual methods where possible.
We have taken many steps within our project to try to work around heavy use periods on SUMEX. Our group works a staggered schedule, both in terms of the actual hours worked each day and in terms of what days each week are worked. This results in some problems in intra-group communication, but fortunately the message and other communication systems of SUMEX help alleviate that situation. We try to run all demonstrations on the DEC-2020 to help ease the burden on the dual KI-10 system. We encourage our collaborators to avoid prime-time use of the system when possible.

For these reasons, we strongly support the planned augmentation of the SUMEX-AIM hardware. Any part of our computations which can be shifted to another machine will not only facilitate export of our software but will ease the load on the DEC-10s and make it easier to continue our research. Both will serve to make SUMEX more responsive and our productivity higher.

III. RESEARCH PLANS

A. Project Goals and Plans

Current research efforts were described in highlight form in the first section, Summary of Research Program. In this section we discuss in outline form the major goals of our current grant period (5/1/80 - 4/30/83), with an indication of the progress made to date. Our goals include the following:

1) Develop SASES (Semi-Automated Structure Elucidation System) as a general system for computer aided structural analysis, utilizing stereochemical structural representations as the fundamental structural description. SASES will represent a computer-based "laboratory" for detailed exploration of structural questions on the computer. It will have as key components the following:

   A) Capabilities for interpretation of spectral data which, together with inferences from chemical or other data, would be used for determination of (possibly overlapping) substructures. We have made considerable progress in the areas of mass spectrometry (see References 3, 14) and C-13 NMR spectroscopy (see References 15, 18);

   B) The GENOA (structure Generation with Overlapping Atoms) program which will have the capability of exhaustive generation of (topological and stereochemical) structural candidates and include as an essential component the existing CONGEN program. We have developed Version I of GENOA for use by our collaborators (see Reference 17);

   C) Capabilities for prediction of spectral (and biological) properties to rank-order candidates on the basis of agreement between predicted and observed properties. Again, we have made considerable progress in mass (see References 3, 7, 8) and C-13 NMR (see References 15, 16, 18) spectroscopy;

2) Develop the GENOA program and integrate it with CONGEN. GENOA will represent the heart of SASES for exploration of structures of unknown compounds, or configurations or conformations of known compounds. GENOA will be a completely general method for construction of structural candidates for an unknown based on redundant, overlapping substructural information, and it will include capabilities for generation of topological and stereochemical (see References 1, 6, 9, 10, 11) isomers;

3) Develop automated approaches to both interpretation and prediction of spectroscopic data, including but not limited to the following spectroscopic techniques:

   A) carbon-13 magnetic resonance (13CMR) (see References 15, 16, 18);
   B) proton magnetic resonance (1HMR);
   C) infrared spectroscopy (IR);
   D) mass spectrometry (MS) (see References 3, 7, 8);
   E) chiroptical methods including circular dichroism (CD), magnetic circular dichroism (MCD).

The interpretive procedures will yield substructural information, including stereochemical features, which can be used to construct structural candidates using GENOA. We have illustrated this method in recent publications (see References 14, 18).
The predictive procedures will be designed to provide approximate but rapid predictions of expected spectroscopic behavior of large numbers of structural candidates, including various conformers of particular structures. Such procedures can be used to rank-order candidates and/or conformers. The predictive procedures will also be designed to provide more detailed predictions of structure/property relationships for known or candidate structures in specific biological applications. These procedures have been illustrated in recent publications (see References 3, 7, 8, 15, 18).

4) Develop a constrained generator of stereoisomers (see Reference 9), including:

   A) design and implement a complete and irredundant generator of possible conformations for a given known, or a candidate for an unknown, structure;

   B) provide constraints for the conformation generator so that proposed structures for a known or unknown compound possess only those features allowed by:
      i) intrinsic structural features such as ring closure and dynamics of the chemical structure; and
      ii) data sensitive to molecular conformations (e.g., MCD, NMR);

   C) integrate the stereochemical developments with the GENOA program as a final, comprehensive solution to the structure generation problem and allow for interface of the program with other methods dependent on atomic coordinates.
5) Promote applications of these new techniques to structural problems of a community of collaborators, including improved methods for structure elucidation and potential new biomedical applications, through resource sharing involving the following methods of access to our facilities and personnel:

   A) nationwide computer network access, via the SUMEX-AIM computer resource;

   B) exportable versions of programs to specific sites;

   C) workshops at Stanford to provide collaborators with access to existing and new developments in computer-assisted structure elucidation in an environment where complex questions of utility and application can be answered directly by our own scientific staff;

   D) interface to a commercially-available graphics terminal for structural input and output, at as low a cost as possible, so that chemists can draw or visualize structures more simply and intuitively than with our current, teletype-oriented interfaces.

B. Justification and Requirements for Continued SUMEX Use

In previous sections we discussed the relationship between the DENDRAL Project and SUMEX-AIM, methods for using SUMEX-AIM for dissemination of our programs to a broad community of structural chemists and biochemists and a critique of resource management. In this section we wish to emphasize certain factors which were not discussed earlier and to show how our future directions and interests are closely related to the proposed continuation and augmentation of the SUMEX-AIM resource.

As resource-related research, DENDRAL is intimately tied to the SUMEX resource. Our involvement with SUMEX goes far beyond simple use of the facility.
We use SUMEX as the focal point for a number of collaborative efforts, for export of our software and for the communication facilities essential to maintaining close contact with remote research groups working with us. SUMEX provides computational facilities for our workshops, where we bring outside investigators to Stanford to use new programs applied to real structural problems. We have already discussed in our critique the difficulties we have, in view of heavy SUMEX load, of maintaining both our research effort and the resource-sharing aspects of our project.

In view of these factors and because SUMEX is our sole source of computational facilities, we took certain steps in our renewal proposal to attempt to alleviate our situation. Specifically, we requested a computer for our own project, a DEC VAX 11/780, to be linked to SUMEX via ETHERNET. This computer was meant to help offload some of the computational burden DENDRAL places on SUMEX, to provide a facility for production use of our programs by our collaborators and to represent a model for the type of low-cost, scientific computer available in the future to many investigators who could then run our programs in their own laboratories.

Our request for the VAX was turned down with specific comments made that SUMEX facilities should be used to support development of new programs and, to the extent possible, encourage preliminary production use of our programs by outside persons. In our opinion this view is somewhat shortsighted, because SUMEX is currently overloaded to the extent that even development is impeded. In addition, our current situation leaves no room for the computational burden created by some of our collaborators who need considerably more than "preliminary" access because they have no access to a computer suitable for running our programs.

For these reasons, we strongly support the effort of SUMEX to acquire a VAX and other small machines in future years.
Although we realize that such machines will have to be shared among the SUMEX-AIM community as a whole, the augmentation of the resource would go a significant way to meeting the computational requirements of our project and provide a variety of systems of potential use for future export of our programs.

C. Needs and Plans for Other Computing Resources

For several years now we have directed some attention toward alternative computing resources which could be used to support all "production" use of our programs, i.e., all applications designed to use the programs to solve real problems. Although this would have the severe disadvantage of separating our research effort from many of the applications, it has been our hope that emerging technology in networking would enable us to keep in reasonably close contact with another resource.

Two resources have emerged as candidates for systems where our programs can be accessed and used in problem-solving. Unfortunately, neither has so far proven feasible for several reasons (mentioned below). At this time we cannot determine if the problems will be resolved. Until such time, we will remain completely dependent on SUMEX for all our computational needs.

One alternative resource is the NIH/EPA Chemical Information System. For more than three years we have been working with them to obtain sufficient contract money to provide a version of CONGEN integrated into that system. The concept and the funds were approved but a contract has never been issued due to administrative problems at the EPA. Although there have been some developments recently, we still have no firm idea on when such a contract will be issued. If this effort is successful, then we can encourage persons who desire access to our programs to consider using the NIH/EPA system.

A second alternative is the National Resource for Computation in Chemistry (NRCC). This Resource has recently had its funding terminated.
We are now pursuing an alternative discussed previously, that of arranging license agreements with private industry for dissemination of our software. This will likely be the focus of our future efforts to disseminate programs to those researchers who merely wish to use them rather than work together with us in collaborative arrangements to develop more powerful programs.

D. Recommendations for Future Resource and Community Development

We have discussed previously our recommendation for the hardware augmentation, particularly with regard to purchase of small machines to facilitate future export.

We also have increasing need for more file storage on-line. This is a result of building large data bases as part of our research in spectral interpretation. For the time being we are working with experimental programs and small data bases. As time progresses, however, these data bases will grow rapidly as our group and a number of our collaborators add additional structures and associated spectral data.

Another capability which is of increasing importance to our own work is access to low-cost graphics systems. Our programs will develop increasing dependence on graphics for visualization of three-dimensional molecular structures. Scientists desiring access to our programs will need a graphics terminal for optimum use of our systems. Currently available vector displays are simply too expensive for the average investigator. The emerging technology of low-cost raster display systems offers a more promising possibility. However, no currently available machine has the required capabilities for under $10,000, and this is an area where machines like the Alto hold more promise. SUMEX could perhaps initiate an effort to obtain a system which has the hardware necessary for frame-based display. Such a system allows rotation of three-dimensional objects in a way which permits visualization of the actual shape of the object.

II.A.1.4 EXPEX Project

EXPEX - Expert Explanation Project

Edward H. Shortliffe, M.D., Ph.D.
Departments of Medicine and Computer Science
Stanford University

Michael R. Genesereth, Ph.D.
Computer Science Department
Stanford University

I. SUMMARY OF RESEARCH PROGRAM

A. Project Rationale

EXPEX is a new Stanford project that joined the AIM community only a few months before this report was prepared. We therefore have little to report in the way of progress other than our background work that led to a recently funded proposal and the initiation of this new research. The major thrust of the work is the development of powerful representational schemes to facilitate knowledge acquisition and explanation. This includes not only the study of fundamental representational formalisms but also the encoding of various types of knowledge, such as causal information and user models.

We believe that the productivity of basic computer science research tends to be heightened by experiments that deal with significant real world problem domains. Challenges drawn from chemistry, medicine, and molecular biology have introduced additional complexity to expert systems work at Stanford, but have simultaneously forced system developers to respond to pragmatic constraints and user demands that have had a significant impact on the basic AI techniques selected or developed. Thus, we believe that creative investigation into symbolic reasoning techniques is facilitated by working in real world settings where the application forces us to avoid oversimplification. The explanation portion of the research effort will therefore deal with a medical domain (endocrinology) and be undertaken on SUMEX, whereas the knowledge acquisition portion will deal with nonmedical topics and use other computing resources at Stanford.
Our report here will only describe EXPEX, the research on expert medical explanation, but it should be understood that this is actually only one part of a coordinated effort to tie together research in knowledge acquisition and explanation through common representation techniques.

B. Medical Relevance and Collaboration

Our interest in explanation derives from the insights we gained in developing explanatory capabilities for the MYCIN system. In the case of MYCIN and its descendants, we have been able to generate intelligible explanations by taking advantage of its rule-based representation scheme. Rules can be translated into English for display to a user, and their interactions can also be explicitly demonstrated. By adding mechanisms for understanding questions expressed in simple English, we were able to create an interactive system that allowed physicians to convince themselves that they agreed with the basis for the program's recommendations. The limitations of the explanations generated in this way have become increasingly obvious, however, and have led to improved characterization of the kinds of explanation capabilities that must be developed if clinical consultation systems are to be accepted by physicians.

C. Highlights of Research Progress

MYCIN's explanation capabilities were generalized in EMYCIN and thus became available for any EMYCIN consultation system. They were further modified and utilized in both TEIRESIAS and GUIDON. Although we had experienced problems using MYCIN's rules for certain kinds of explanations (e.g., control mechanisms that were sometimes encoded in rules, or algorithmic knowledge such as the mechanisms for drug selection), it was in the setting of GUIDON that the inadequacies of MYCIN's approach became most apparent.
Consider, for example, a simple MYCIN rule such as:

    If:   the patient is less than 8 years old
    Then: don't give tetracycline

This rule is adequate for MYCIN's decision making task, and would be understood by most physicians if it were used in an explanation, but it is obvious to a casual observer that it contains a giant leap in logic. It is accordingly difficult for GUIDON to teach this rule to a novice medical student because the underlying pathophysiologic knowledge (i.e., that tetracycline is deposited in the developing bone and teeth of youngsters, weakening the former and disfiguring the latter) is not explicitly represented in MYCIN.

Examples such as this one emphasize that a variety of knowledge forms are necessary if an intelligent system is to customize its explanations to the individual who is using the program. Underlying structural and causal relationships are generally required in addition to the high level judgmental rules that had contained almost all of the domain knowledge in MYCIN and the other EMYCIN systems.

We therefore began to study in more detail the nature of the explanatory process, and were surprised to find that there are very few writers who have addressed the issues which now interest us. Perhaps the most relevant studies of explanation are in the education literature; several educators have tried to identify the characteristics of explanations which make individuals good teachers. These analyses are accordingly relevant to computer-aided instruction work, such as GUIDON, although issues of automation are not addressed explicitly. On the other hand, they seem less pertinent when applied to the "persuasiveness" of a justification offered by a scientist to a colleague.

A weekly seminar group has been formed to discuss knowledge representation and to analyze the characteristics of good explanations.
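The gap in the tetracycline rule discussed above can be pictured with a small sketch. The Python fragment below is illustrative only: it is not the representation used by MYCIN or GUIDON, and the names `Rule`, `support` and `explain` are hypothetical. It assumes that a judgmental rule can carry an optional chain of underlying causal facts, so that the same rule yields a terse justification for a physician and an expanded one for a student.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    """A judgmental rule plus optional causal support (hypothetical layout)."""
    premise: str
    action: str
    # Causal steps linking premise to action; empty in a purely judgmental rule.
    support: List[str] = field(default_factory=list)

    def explain(self, audience: str) -> str:
        """Terse justification by default; expanded with causal support for students."""
        text = f"Because {self.premise}, {self.action}."
        if audience == "student" and self.support:
            text += " Reason: " + "; ".join(self.support) + "."
        return text


# The causal facts are those stated in the report for the tetracycline example.
tetracycline_rule = Rule(
    premise="the patient is less than 8 years old",
    action="do not give tetracycline",
    support=[
        "tetracycline is deposited in developing bone and teeth",
        "this weakens the bone and disfigures the teeth",
    ],
)

print(tetracycline_rule.explain("physician"))
print(tetracycline_rule.explain("student"))
```

With an empty `support` list the rule behaves like a purely judgmental rule, which is the situation the text describes for MYCIN: nothing beyond the premise and the action is available to explain.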
We have often kept our discussions separate from computer science issues, concentrating instead on the psychology of explanation and planning to return eventually to consider ways in which our developing theory might be implemented in knowledge-based consultation systems. Although there are several subproblems, it was agreed that the problems of explanation can generally be divided into four categories: (1) modeling the knowledge of the system user; (2) selecting a response strategy; (3) modeling contextual information regarding the interaction; and (4) understanding the question. One goal of our new work, then, is to build an explanation system which explicitly addresses these topics.

Modeling the User's Knowledge

GUIDON and other ICAI systems have recognized the need to keep an internal model of the student, i.e., what he has shown he knows, what you have already told him, and perhaps a record of where his greatest weaknesses lie. Similarly, it is clear that an expert human consultant customizes his explanations so that they can be understood by the person requesting the consultation (and are thereby maximally convincing). The expert starts with certain suppositions about his client's knowledge (e.g., a teacher may presume his student is starting from scratch, but a cardiologist will assume that another physician requesting advice probably already knows a fair amount of cardiology). The default presumption is modulated, however, as the interaction proceeds and the client demonstrates his strengths or weaknesses.

We have recently begun some experiments to investigate methods for encoding, along with the domain knowledge, the complexity and importance of that knowledge. These two parameters seem to be independently important in deciding whether to include a given reasoning step in an explanation. "Key" points (i.e., those that are highly important) probably should be mentioned even if they are not complex and are likely to be known to the user. On the other hand, less important but complex items probably need not be mentioned unless an expert user is really pressing for details of a decision pathway. Thus, static measures of complexity and importance can be compared with user descriptors that are initially assigned by default (depending upon the status of the user, e.g., expert vs. student), but are later altered dynamically in response to the course of the dialog and what it has revealed about the user's background knowledge.

These ideas have been encoded in a small computer program which uses a limited knowledge base of rules and associations from the domains of pharyngitis (sore throats) and calcium metabolism. We have experimented with a semantic network representation in which the nodes are values of attributes and rules are only one form of link between nodes. All nodes and rules have complexity and importance measures associated with them. An "opinion" regarding a specific patient can be represented as a subset of the nodes in the network, plus the links between them that account for how it has been determined which nodes are active. In this setting, a question tends to ask how it has been determined that a given node is active for a given patient. The appropriate explanation could be very complex if an effort were made to explain every link leading from data observations to the node descriptor in question. A customized explanation is therefore generated based on three variables which can be dynamically manipulated by the program: (1) the focus of the dialog (e.g., broad-based vs. localized), (2) the expertise of the user, and (3) the degree of generality which is appropriate.
These three variables are clearly not independent, and we are experimenting with ways to have their values manipulated in a reasonable fashion as the dialog proceeds.

This early effort has provided the basis for further discussions in our seminar group as we have attempted to arrive at an optimal representation for the research to follow. We have been fortunate to enlist the collaboration of an endocrinologist at Stanford, Dr. Larry Crapo, who is eager to work with us on building an endocrinology knowledge base. It is likely that we will select the pathophysiology of calcium disorders as a small focused area to study. This domain is appealing for computer-based representation because the relationships are well-understood and there are some challenging problems of feedback homeostasis that will need to be represented. In the years ahead, we will encode this knowledge base in detail and begin experiments on the generation of explanations using the kinds of techniques outlined above.

Selecting A Response Strategy

Our explanation efforts to date have tended to be simple reiterations of individual reasoning steps, but it is clear that experts and teachers use several alternate strategies for conveying their ideas or key facts. Many of these techniques draw upon common sense world knowledge (e.g., analogies with familiar concepts outside the domain), but we have thus far failed to capitalize on these teaching strategies in our work. Thus another goal of the work that lies ahead will be to develop structures for drawing parallels or otherwise representing the strategies used by good "explainers."

Modeling Contextual Information Regarding the Interaction

We have already mentioned some of the ways in which contextual information may be useful in determining the best way to answer a question.
For example, a more accurate model of the user's knowledge can be developed over time, and the extent to which a given conversation is focused on a particular local topic can be assessed. Note that we are emphasizing here issues other than those related to natural language understanding; computational linguists also often cite the need to record contextual dialog information in order to handle problems such as anaphora. An understanding of the "flow" of a dialog is also important in understanding the meaning of subsequent questions, as we discuss below.

Understanding The Question

This issue interfaces with the problem of natural language understanding, but we view it in a somewhat different light. We emphasize instead the ways in which the model of the user and contextual information may allow us to disambiguate questions. To draw from a medical example that we have frequently discussed, consider the following scenario. A reasoning program for pharyngitis diagnosis and management has just diagnosed strep throat and recommended penicillin, and the user asks the question "Why would you give penicillin?" In the most obvious case, one might imagine a response that itemizes the risks of streptococcal infections and the reasons for treating early with penicillin. Similarly, one might expect a more detailed response for a student and a quick summary for a physician using the system. However, an alternate interpretation is that EVERY physician knows the theoretical reasons for giving penicillin in strep pharyngitis, and that if the user is a physician and is asking the question then he must be asking something different than the simple informational question. In this case the query might be interpreted as a challenge (one that might have been conveyed by tone of voice if it had been asked of a human consultant).
Apparently the user has reason to doubt that penicillin was the appropriate agent in this case, or thinks that no drug was required. Other background information and contextual knowledge should also help, and an intelligent program might thereby answer the question in a given case in any of the following ways: "Because the patient has pre-existing rheumatic heart disease." "Because I doubt that he is atlergic to penicillin, even though he reported that he is." "Because he is unreliable and I am afraid I will? not be able to reach him to call him back if his strep culture comes back positive.” "Because I tend to treat conservatively and give penicillin for strep throat even though I know there hasn't been a case of rheumatic heart disease in California in over 10 years." Note how different these kinds of explanations are from the simple justification that a program such as MYCIN might have given: "Because streptococcal pharyngitis may be followed by rheumatic myocarditis or glomerulonephritis, mediated by immune complexes, and I can prevent this complication by giving penicillin (to which streptococci are uniformly sensitive)." The ideal intelligent assistant should be able to determine from knowledge of the user, the domain, the individual case, and the context of the dialog, which of the preceding responses is most appropriate. We will attempt to identify methods for giving our program this kind of capahility. D. Publications Since January 1980 Wallis, J.W. and Shortliffe, E.H. Explanatory power for expert systems: studies in the representation of causal relationships for medical consultations. Internal working memo, Heuristic Programming Project, Stanford University, May 1981. 133 E. A. Feigenbaum EXPEX Project P41 RROO785-08 E. Funding Support Grant Title: "The Development of Representation Methods to Facilitate Knowledge Acquisition and Exposition in Expert Systems" Principal Investigator: Edward H. 
Shortliffe
Agency: Office of Naval Research
ID Number: NR 049-479
Term: January 1981 to December 1983
Total award: $456,622
Current award (1981): $140,825

II. INTERACTION WITH THE SUMEX-AIM RESOURCE

A. Medical Collaborations and Program Dissemination via SUMEX

We are only beginning program development at this time, and have therefore had no opportunity to share our results with others as yet.

B. Sharing and Interaction with Other SUMEX-AIM Projects

We anticipate frequent ongoing interactions with other SUMEX-AIM research efforts because the development of explanation techniques is a pertinent research issue for all expert systems work in medicine. Bill Clancey's work on GUIDON is addressing many of the same issues and we expect frequent opportunities for interchange.

C. Critique of Resource Management

Although we have not yet placed significant demands on SUMEX management, our previous experience working with Tom Rindfleisch and his staff would suggest that this new project will receive the same kind of laudatory service for which SUMEX has become known.

III. RESEARCH PLANS (6/81-12/83)

A. Project Goals and Plans

We intend to investigate optimal techniques for the computer-based representation of expert knowledge. Because we have come to recognize the limitations of any single representation technique taken alone, a principal objective will be to merge alternate approaches, augmented with new capabilities. We will, in turn, evaluate the effectiveness of the new representation scheme by focusing on issues of both knowledge acquisition and explanation. Furthermore, we will perform these experiments in two expert domains, medical reasoning (EXPEX) and computer circuitry debugging (DART). These areas were selected because we have local expertise in each, but also because they are sufficiently different from one another that they will force us to ensure the generality of the techniques we are developing.
Utilizing a single representation scheme for all aspects of the work will also encourage generality of the developed techniques because this will force us to avoid concentrating on either the input (knowledge acquisition) or output (explanation) functions alone.

Initially we shall concentrate on defining the knowledge representation scheme to be used. Although modifications will of course be necessary in response to additional lessons learned thereafter, we expect to reach an early consensus on the major components of the internal representation we will be using.

Subsequently our efforts will divide into two components, each of which will utilize the representation scheme devised in the initial period. EXPEX will concentrate on manually constructing a knowledge base regarding calcium metabolism and pathophysiology, whereas the DART effort will be concentrating on knowledge acquisition for its non-medical domain. In the EXPEX work, the construction of the clinical knowledge base will have created an environment for the development of the explanation capabilities which are the second thrust of our work. Drawing on the early work of Jerry Wallis, described in the memo referenced in the publications section of this report, we will next construct an expository system that uses the endocrinology knowledge base in order to generate interactive explanations.

Ultimately we hope to perform experiments using both the knowledge acquisition and explanation tools that will have been developed in the two separate domains. One task will be to see if we can develop from scratch the endocrinology knowledge base that will have been hand-coded for the EXPEX effort. Because this knowledge will have previously been encoded manually, we will have a well-defined model of the form the knowledge should take as it is acquired interactively using the new system building tools developed in the electronics environment.
At the same time, some of us will be developing experiments to test the validity and effectiveness of the explanation tools that were developed for EXPEX. An excellent test of the generality we have been seeking will be to build the circuit debugging knowledge base using the new knowledge acquisition tools, and then to demonstrate the utility of the explanation routines for exposition of the knowledge in this new domain.

Our ultimate goal, then, is to have developed a unified system of knowledge representation that facilitates both system building, through interactive knowledge acquisition, and explanation, through interactive responses to knowledge base queries. It should be emphasized that throughout our work the focus will be on the underlying representation issues and not on polished text generation or natural language understanding.

II.A.1.5 MOLGEN Project

MOLGEN - A Computer Science Application to Molecular Biology

Prof. E. Feigenbaum and Dr. P. Friedland
Department of Computer Science
Stanford University

Assoc. Prof. D. Brutlag
Department of Biochemistry
Stanford University

Assoc. Prof. L. Kedes
Department of Medicine
Stanford University

I. SUMMARY OF RESEARCH PROGRAM

A. Project Rationale

The MOLGEN project has focused on research into the applications of symbolic computation and inference to the field of molecular biology. This has taken the specific form of systems which provide assistance to the experimental scientist in various tasks, the most important of which have been the design of complex experiment plans and the analysis of nucleic acid sequences. We plan to expand and improve these systems and build new ones to meet the rapidly growing needs of the domain of recombinant DNA technology. We do this with the view of including the widest possible national user community through the facilities available on the SUMEX-AIM computer resource.
It is only within the last few years that the domain of molecular biology has needed automated methods for experimental assistance. The advent of rapid DNA cloning and sequencing methods has had an explosive effect on the amount of data that can be most readily represented and analyzed by computer. Moreover we have already reached a point where progress in the analysis of the information in DNA sequences is being limited by the combinatorics of the various types of analytical comparison methods available. The application of judicious rules for the detection of profitable directions of analysis and for pruning those which obviously lack merit will have an autocatalytic effect on this field in the immediate future.

The MOLGEN project has continuing computer science goals of exploring issues of knowledge representation, problem-solving, and planning within a real and complex domain. The project operates in a framework of collaboration between the Heuristic Programming Project (HPP) in the Computer Science Department and various domain experts in the departments of Biochemistry, Medicine, and Genetics. It draws from the experience of several other projects in the HPP which deal with applications of artificial intelligence to medicine, organic chemistry, and engineering.

During the next three years of MOLGEN research we intend to begin a transition from being primarily a computer science research project to being an interdisciplinary project with a strong applications focus. The tools that we have already developed will be improved to the point where they make a significant contribution to both research and engineering in the domain of molecular biology.

B. Medical Relevance and Collaboration

The field of molecular biology is nearing the point where the results of current research will have immediate and important application to the pharmaceutical and chemical industries.
Recombinant DNA technology has already demonstrated the possibility of harnessing bacteria to produce nearly limitless amounts of such drugs as insulin and somatostatin. Governmental reports estimate that there are more than 200 new and established industrial firms already undertaking product development using these new genetic tools.

The programs being developed in the MOLGEN project have already proven useful and important to a considerable number of molecular biologists. Currently several dozen researchers in various laboratories at Stanford (Prof. Paul Berg's, Prof. Stanley Cohen's, Prof. Laurence Kedes', Prof. Douglas Brutlag's, Prof. Henry Kaplan's, and Prof. Douglas Wallace's) and over 300 others throughout the country are using MOLGEN programs over the SUMEX-AIM facility. We have exported some of our programs to users outside the range of our computer network (University of Geneva [Switzerland], Imperial Cancer Research Fund [England], and European Molecular Biology Institute [Heidelberg] are examples).

C. Highlights of Research Progress

Accomplishments:

The current year has seen the completion of what might be considered the first phase of the MOLGEN project. This section will summarize the major accomplishments of that first phase.

1. Representation Research

The domain of molecular biology has proven a fruitful testbed in the development of a flexible software package, the Unit System, for symbolic representation of knowledge. The package is already in use by a variety of research projects both within the Heuristic Programming Project at Stanford and at other institutions. It provides for acquisition and storage of many different types of knowledge, ranging from simple declarative types like integers and strings to complex declarative types like nucleic acid restriction maps to procedural types like a rule language in a subset of English.
A major effort has been made in the past year to take the Unit System and, observing the experience of its many scientific users, improve and enhance those features which are most important. This has resulted in a speed improvement of at least two orders of magnitude for the most used functions in the representation system. The MOLGEN project has provided a unique laboratory for the conversion of a theoretically-based knowledge base system into a practical package for knowledge acquisition and manipulation. This is because of the active daily use of the system for real laboratory problems. The Unit System has become what may be considered the first "second-generation" knowledge representation package.

We have concentrated on representation methods that are unique to molecular biology, particularly convenient methods for storing information about nucleic acid sequences and maps of those sequences, as well as an English-like language for manipulating that information. This language has allowed and encouraged the molecular biologist members of the MOLGEN project to become their own "programmers" without having to worry about the underlying representation structures of their knowledge bases. For example, the phrase:

JOIN SV40 FROM FIRST ECOR1 SITE TO FIRST BAMH1 SITE TO PBR322 FROM 250 TO 3000 INTO NEWSEQUENCE

will perform the clearly indicated operation.

2. Planning Research

The problem of designing laboratory experiments in molecular biology has been fundamental to MOLGEN research. The work has been split into two major subparts, each resulting in a doctoral thesis in computer science. The two systems, developed by Peter Friedland and Mark Stefik, produce reasonable experiment designs on test problems suggested by laboratory scientists.

The majority of MOLGEN planning work has awaited the successful completion of the latest phase of representation improvement described above. Dr.
Rene Bach, a post-doctoral fellow, has recently completed a DNA sequencing experiment advisor, built entirely with the Unit System and associated procedural description language. This system provides guidance in developing a sequencing protocol given a partial initial restriction map for a new nucleic acid sequence. Other members of the MOLGEN group have begun exploring the representations needed with the Unit System to provide transparent descriptions (i.e., invisible to the non-computer scientist user) of growing experimental plans, so that plans may be treated in a manner identical with all other types of knowledge.

3. Knowledge Base Construction

Over six man-years have now been spent in constructing knowledge bases for various fields of molecular biology. Professors Kedes and Brutlag, Dr. Bach, and several students have cooperated on one knowledge base which is expert in restriction enzyme methodology and another which is competent for a wide range of general laboratory techniques. Professor Kedes has worked on a knowledge base for his own interests in gene structure. Professor Brutlag is concentrating on a knowledge base for satellite DNAs. Dr. Bach has built a knowledge base expert in sequencing methods for his sequencing advisor (see above). Professor Sninsky and Dr. Bach have collaborated in a knowledge base for expression vectors. Professor Sninsky and Dr. Abarbanel have collaborated on the beginnings of a protein knowledge base to explore methods for predicting secondary protein structure from primary amino acid sequence. Several researchers in Professor Hogness's laboratory in the Department of Biochemistry have begun to build a knowledge base for storing information about many different lambda vector clones. Finally, several scientists in Professor Kaplan's Cancer Biology Research Laboratory have started to explore a monoclonal antibody knowledge base.
The knowledge bases developed under the MOLGEN grant have begun to find their way into the daily laboratory practice of many of the scientists associated with the project. They have provided a mechanism for managing the explosive growth of data and strategies in many areas of molecular biology without the necessity of building special purpose systems for each area. Also, the expert scientists themselves have been able to design and build their own systems, avoiding the time and reliability problem of a knowledge base passing through the filter of a computer scientist intermediary. The knowledge bases have served as "intelligent encyclopedias," as simulation systems, and as training vehicles. It should also be noted that the Unit System allows for the easy transfer of knowledge from one knowledge base to another, and indeed the various expert molecular biologists have freely shared information as they work on related knowledge bases.

4. Other Applications of Symbolic Computation to Molecular Biology

MOLGEN programmers have spent full time enhancing existing MOLGEN applications programs and developing new systems. The SEQ program is a general purpose nucleic acid sequence analysis system. It provides a range of functions including translation, lexicography, regions of richness, restriction mapping, general string search, and intra- and inter-sequence homologies, symmetries, and dyad symmetries. The program calculates statistical probabilities for homologies, symmetries, and dyad symmetries, and also determines approximate free energy contribution of dyad symmetry structures in RNA. SEQ is highly interactive and provides many built-in explanation and help facilities. MAP is a program which determines restriction sites from enzymatic digest data. A recent collaboration with Dr. Pearson has been exploring ways to combine the ideas from MAP and Dr. Pearson's system for solving similar problems.
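Two of the SEQ functions listed above, restriction site search and dyad symmetry detection, can be illustrated with a minimal Python sketch. This is not the SEQ implementation, which the report does not describe; the function names and the demonstration sequence are hypothetical, and a dyad symmetry is taken here in its simplest form, a stretch of DNA identical to its own reverse complement. The recognition sequences shown are the standard ones for EcoRI and BamHI.

```python
# Illustrative sketch only: hypothetical helper names, not the SEQ program.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, site):
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # step by one so overlaps are kept
    return positions


def dyad_symmetries(seq, length):
    """Return start positions of windows equal to their own reverse complement."""
    hits = []
    for i in range(len(seq) - length + 1):
        window = seq[i:i + length]
        if window == reverse_complement(window):
            hits.append(i)
    return hits


ECORI = "GAATTC"   # EcoRI recognition sequence
BAMHI = "GGATCC"   # BamHI recognition sequence
DEMO = "TTGAATTCAAGGATCCTT"  # invented test sequence

if __name__ == "__main__":
    print("EcoRI sites:", find_sites(DEMO, ECORI))
    print("BamHI sites:", find_sites(DEMO, BAMHI))
    print("6-base dyad symmetries:", dyad_symmetries(DEMO, 6))
```

Both recognition sequences are themselves dyad symmetries, so in this sketch the site search and the symmetry scan report the same positions for the demonstration sequence.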
Work has begun on two major new applications programs: GEL, a system which provides bookkeeping and overlap determining assistance for "shotgun" sequencing experiments; and AA, a program which provides most of the functions of SEQ for amino acid sequences.

During the first year of the current MOLGEN grant, we have provided guest access to the SEQ and MAP programs to the national academic community through the facilities of the SUMEX-AIM computer system. This has meant free, dial-up access from almost anywhere in the United States. Over 300 researchers at over 80 institutions have used the service. It has been so popular that the SUMEX-AIM Executive Committee has found it necessary to limit the service to at most two simultaneous users. The facilities provided to MOLGEN guests have been very limited, with a single directory and 250 disk pages serving the entire national community. Despite this, a wide variety of interesting research has been done, and MOLGEN is most grateful to the SUMEX-AIM staff for their kind assistance.

Research in Progress:

The remainder of the current grant period will be spent on the further development of the tools that have been constructed for experiment design and sequence analysis and on expansion and improvement of the knowledge base. This section details those research plans.

1. Representation

The Unit System is now at a stage of general utility. The MOLGEN group will continue to enhance and improve both the underlying representation methods and the user interface. A further order of magnitude improvement in the speed of common operations will be achieved during the next year. There will be at least a doubling of the genetic-specific vocabulary of the procedural description language. Mechanisms will be developed to allow the Unit System to communicate effectively with all of the major MOLGEN applications programs, particularly SEQ, MAP, and GEL.
We also anticipate a major effort, beginning in the summer of 1981, to adapt the Unit System to run on one of the newly available personal scientific work-station computers--most likely the Xerox Dolphin. This will provide a qualitative improvement in user interaction because of the large bit-map display and graphics capabilities, and the ability to free individual knowledge base builders from time-sharing system load.

2. Knowledge Base Development and Planning

Planning work will proceed by moving into at least one new sub-domain and by synthesizing the research of previous years. Dr. Bach will construct a knowledge-based system for the planning of cloning experiments using the experience he has gained from the simpler task of designing sequencing experiments.

The previous MOLGEN planning research has produced two major ideas. One, the "skeletal-plan" approach of Dr. Friedland, involves the selection and refinement of planning strategies, which may range from abstract to specific, provided as part of an expert knowledge base. This idea resulted from a study of the way in which molecular biologists design experiments. The other idea was the "constraint-posting" method of Dr. Mark Stefik, which concentrates on the extensive evaluation of the constraints introduced by each step of a growing plan in order to guide the selection of the next step of the plan. The MOLGEN group will undertake the project of synthesizing these two ideas, combining the practical efficiency of skeletal-plan refinement with the general power of constraint posting.

We will also begin work on plan verification and optimization systems within the same general framework, i.e., systems to check proposed plans for suitability and to improve such plans. We hope to begin to extend this to a plan debugging system, one which interacts with an experimenter to determine where an experiment failed and how to correct the problem.
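The proposed synthesis of skeletal-plan refinement and constraint posting can be pictured with a deliberately simplified Python sketch. It is not the algorithm of either thesis, and the report gives no implementation details: the step names, the `Technique` class, and the candidate lists are all hypothetical. An abstract plan is refined one step at a time, each chosen technique posts facts that constrain later choices, and selection is greedy with no backtracking.

```python
# Illustrative sketch only: invented step names and candidates, greedy selection.
from dataclasses import dataclass


@dataclass(frozen=True)
class Technique:
    name: str
    requires: frozenset = frozenset()  # facts that must already be posted
    posts: frozenset = frozenset()     # facts this choice makes available


# The skeletal plan: abstract steps supplied by an expert knowledge base.
SKELETON = ["pick-vector", "select-transformants"]

# Hypothetical candidate refinements for each abstract step.
CANDIDATES = {
    "pick-vector": [
        Technique(
            "pBR322",
            posts=frozenset({"marker:ampicillin", "marker:tetracycline"}),
        ),
    ],
    "select-transformants": [
        Technique("kanamycin plates", requires=frozenset({"marker:kanamycin"})),
        Technique("ampicillin plates", requires=frozenset({"marker:ampicillin"})),
    ],
}


def refine(skeleton, candidates):
    """Refine each abstract step with the first technique whose requirements
    are met by the facts posted so far, then post that technique's facts."""
    posted = set()
    plan = []
    for step in skeleton:
        choice = None
        for technique in candidates[step]:
            if technique.requires.issubset(posted):
                choice = technique
                break
        if choice is None:
            raise ValueError(f"no technique satisfies the constraints for {step!r}")
        plan.append((step, choice.name))
        posted.update(choice.posts)
    return plan


if __name__ == "__main__":
    for step, technique in refine(SKELETON, CANDIDATES):
        print(step, "with", technique)
```

In this toy case the vector chosen in the first step posts its resistance markers, which rules out the kanamycin selection and leaves ampicillin as the refinement of the second step. A fuller treatment would propagate constraints in both directions and revise earlier choices when a later step cannot be refined.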
Knowledge base construction will proceed, with each MOLGEN collaborator exploring the particular sub-domain most interesting to him. Professor Maxam will begin building a knowledge base, most likely in the area of detailed nucleic acid structure, this summer. This will lead to joint work with Dr. Friedland on the application of knowledge-based methods to the study of the mechanism of gene regulation. Several researchers at the Imperial Cancer Research Fund in London, England will join the effort starting in the early autumn of 1981. In particular, we expect the area of cloning methodology to occupy a large portion of Dr. Bach's time in knowledge base construction. We will draw from the wide and varied expertise within the Stanford community during this effort.

3. Applications Systems

Work will continue on improving and enhancing SEQ and MAP. The GEL and AA programs will begin operation. The currently available tools within the Unit System allow the molecular biologists themselves to design and construct special-purpose applications systems within their knowledge bases. Professors Brutlag and Kedes have already built over a dozen such "programs" for construction of specific restriction maps and fragment tables and the simulation of recombinant DNA operations. As the domain experts decide that these systems are generally useful, they will be optimized and packaged into stand-alone programs.

The MOLGEN group will continue to cooperate with any national efforts to develop a sequence analysis data bank and facility for the academic community. We hope our current collaborative service activities on SUMEX-AIM will serve as the prototype for a larger and more comprehensive national facility. D.
Publications

Feitelson J., Stefik M.J., A Case Study of the Reasoning in a Genetics Experiment, Heuristic Programming Project Report HPP-77-18 (Working Paper) (May 1977)

Friedland P., Knowledge-Based Experiment Design in Molecular Genetics, Proceedings Sixth International Joint Conference on Artificial Intelligence, 285-287 (August 1979)

Friedland P., Knowledge-Based Experiment Design in Molecular Genetics, Ph.D. Thesis, Stanford CS Report CS79-760 (December 1979)

Martin N., Friedland P., King J., Stefik M.J., Knowledge Base Management for Experiment Planning in Molecular Genetics, Fifth International Joint Conference on Artificial Intelligence, 882-887 (August 1977)

Stefik M., Friedland P., Machine Inference for Molecular Genetics: Methods and Applications, Proceedings of the National Computer Conference, (June 1978)

Stefik M.J., Martin N., A Review of Knowledge Based Problem Solving As a Basis for a Genetics Experiment Designing System, Stanford Computer Science Department Report STAN-CS-77-596 (March 1977)

Stefik M., Inferring DNA Structures From Segmentation Data: A Case Study, Artificial Intelligence 11, 85-114 (December 1977)

Stefik M., An Examination of a Frame-Structured Representation System, Proceedings Sixth International Joint Conference on Artificial Intelligence, 844-852 (August 1979)

Stefik M., Planning with Constraints, Ph.D. Thesis, Stanford CS Report CS80-784 (March 1980)

E. Funding Support

The MOLGEN grant is titled: MOLGEN: A Computer Science Application to Molecular Biology. It is NSF Grant ECS-8016247. Current Principal Investigators are Edward A. Feigenbaum and Bruce G. Buchanan, Professors of Computer Science, Laurence H. Kedes, Investigator, Howard Hughes Medical Institute and Associate Professor of Medicine, and Douglas L. Brutlag, Associate Professor of Biochemistry.
MOLGEN is currently funded from 10/80 to 9/81 at $146,582 including indirect costs as the first year of a three year renewal.

II. INTERACTIONS WITH THE SUMEX-AIM RESOURCE

Until this year, all system development had taken place on the SUMEX-AIM facility. The facility has not only provided excellent support for our programming efforts but has served as a major communication link among members of the project. Systems available on SUMEX-AIM such as INTERLISP, TV-EDIT, and BULLETIN BOARD have made possible the project's programming, documentation and communication efforts. The interactive environment of the facility is especially important in this type of project development.

Unfortunately, the computing environment at SUMEX has suffered in the recent past from heavy demands on cycle time, creating serious real-time delays for programmers, and for knowledge-base building especially. The Units Editor is especially sensitive because of its relatively large demands on cpu-resources. Accordingly, a significant fraction of the MOLGEN group activity has been transferred to the SCORE computer in the Department of Computer Science at Stanford. When SUMEX hardware is updated, we anticipate that its response time will improve and the MOLGEN computing will return full time to SUMEX. It is clear, however, that the MOLGEN project continues to thrive and prosper because of the computing environment only available at SUMEX: the interactive environment including instantaneous communications among collaborators who are physically distant (even on the Stanford campus), and especially the unique telecommunications facilities that have allowed the development of the GENET community with its access to MOLGEN applications tools are two clear examples.

We have taken advantage of the collective expertise on medically-oriented knowledge-based systems of the other SUMEX-AIM projects.
In addition to especially close ties with other projects at Stanford, we have greatly benefitted by interaction with other projects at yearly meetings and through exchange of working papers and ideas over the system. The ability for instant communication with a large number of experts in this field has been a determining factor in the success of the MOLGEN project. It has made possible the near instantaneous dissemination of MOLGEN systems to a host of experimental users in laboratories across the country. The wide-ranging input from these users has greatly improved the general utility of our project. We find it very difficult to find fault with any aspect of the SUMEX resource management. It has made it easy for us to expand our user group, to give demonstrations (through the 20/20 adjunct system), and to disseminate software to non-SUMEX users overseas. III. RESEARCH PLANS A. Justification and Requirements for Continued SUMEX Use The MOLGEN project depends heavily on the SUMEX facility. We have already developed several useful tools on the facility and are continuing research toward applying the methods of artificial intelligence to the field of molecular biology. The community of potential users is growing nearly exponentially as researchers from most of the bio-medical fields become interested in the technology of recombinant DNA. We believe the MOLGEN work is already important to this growing community and will continue to be important. The evidence for this is an already large list of pilot exo-MOLGEN users on SUMEX. SUMEX is currently having difficulty meeting the research needs of the MOLGEN project adequately. We expect to need more file space as our knowledge bases grow; perhaps an additional 5000 disk blocks in the next few years for that work. Our real difficulties will come in the applications testing of MOLGEN tools. 
We support with great enthusiasm the acquisition of satellite computers for technology transfer and hope that the SUMEX staff continue to develop and support these systems. One of the oft-mentioned problems of artificial intelligence research is exactly the problem of taking prototypical systems and applying them to real problems. SUMEX gives the MOLGEN project a chance to conquer that problem and potentially supply scientific computing resources to a national audience of bio-medical research scientists.

II.A.1.6 MYCIN Projects Group

MYCIN Projects

Edward H. Shortliffe, M.D., Ph.D.
Departments of Medicine and Computer Science
Stanford University

Bruce G. Buchanan, Ph.D.
Computer Science Department
Stanford University

I. SUMMARY OF RESEARCH PROGRAM

A. Project Rationale

The MYCIN Projects are a set of related research programs, each devoted to the development of knowledge-based expert systems for application to medicine and the allied sciences. The name was derived from our first system, the MYCIN program. That research has now given way to three active sub-projects (EMYCIN, GUIDON, and ONCOCIN), each of which is discussed in the sections that appear below. The key issue for all sub-projects has been to develop programs that can provide advice similar in quality to that given by human experts, and to develop systems that are easy to use and acceptable to physicians and medical students.
The success of the original MYCIN infectious disease consultation program has led us to try to generalize and expand the methods employed in that program to a number of ends: (1) to develop consultation systems for other domains (our generalized system-building tool is known as "Essential MYCIN", or EMYCIN, and has been applied in several new areas; ONCOCIN is our newest consultation system and was inspired by EMYCIN although it is actually an entirely new program); (2) to explore other uses of the MYCIN knowledge base (our tutoring system, GUIDON, uses the infectious disease knowledge in MYCIN to teach medical students about diagnosis and management of infections); (3) to continue to improve the interactive process, both for the developer of a knowledge-based system, and for the user of such a system (both EMYCIN and ONCOCIN have stressed simplified techniques for interacting with a knowledge base and entering data); and (4) to experiment with alternate techniques for knowledge representation, recognizing that the pure production rule method used in MYCIN was inadequate at times and frequently led to confusion regarding the separation of strategy or control processes from domain knowledge (ONCOCIN uses production rules as only one of several knowledge representation techniques, and the work on GUIDON has led to a more robust revised version of MYCIN known as NEOMYCIN).

B. Medical Relevance and Collaboration

By utilizing our EMYCIN system to collaborate on building the PUFF program, we learned that it is possible in a short period of time to develop a clinically useful consultation system using the domain-independent parts of MYCIN. EMYCIN has since been applied in a number of additional medical domains. With each successive application we learn more about the representation of medical knowledge and the scope and limitations of the production rule formalism used in EMYCIN.
For example, it has become clear that "shallow" rules relating signs and symptoms to diagnoses through a few intermediate concepts can be sufficient for high performance in medical diagnosis. On the other hand, such shallow rules are not always sufficient for teaching medical students because they lack the deeper causal links needed for justifying and remembering the more shallow associations (see GUIDON discussion below).

Although EMYCIN was not used to build our new ONCOCIN program, the lessons learned in building prior production rule systems have allowed us to create a large oncology protocol management system much more rapidly than was the case when we started to build MYCIN. We are introducing ONCOCIN for use by Stanford oncologists in the Spring of 1981. This would not have been possible, of course, without the active collaboration of Stanford oncologists who helped with the construction of the knowledge base and also kept project computer scientists aware of the psychological and logistical issues related to the operation of a busy outpatient clinic.

In addition, there is a growing realization that medical knowledge, originally codified for the purpose of computer-based consultations, may be utilized in additional ways that are medically relevant. Using the knowledge to teach medical students is perhaps foremost among these, and GUIDON continues to focus on methods for augmenting clinical knowledge in order to facilitate its use in a tutorial setting. A particularly exciting aspect of this work is the insight that has been gained regarding the need to structure knowledge differently, and in more detail, when it is being used for different purposes (e.g., teaching as opposed to clinical decision making). This aspect of the GUIDON research has led to the development of a modified version of MYCIN, NEOMYCIN, which is an evolving computational model of medical diagnostic reasoning that we hope will enable us to better understand and teach diagnosis to students.
C. Highlights of Research Progress

1. Accomplishments This Past Year

EMYCIN

In the last year, substantial research efforts were completed and described in publications. First, a complete EMYCIN system, including a rule compiler and interactive rule editor, was packaged and documented in Bill van Melle's thesis. Second, an investigation into mixing production rules and frames as a representation of medical knowledge was completed and documented by Jan Aikins. Third, a redesign of the EMYCIN system to include more of the structural and strategic knowledge needed for tutoring was completed and documented (see description of NEOMYCIN in the GUIDON sections below). In addition, the LISP code and the EMYCIN Manual were both improved considerably in response to suggestions from outside users.

The complete EMYCIN system includes: a) a rule interpreter, b) an explanation facility, c) an abbreviated rule language and editor for rule input, d) a debugging package, and e) a rule compiler. Some new developments in these parts are described briefly below and details are given in van Melle's thesis.

The rule interpreter has remained much the same since its conceptualization as a procedure that traces backward through chains of rules, asking questions of the user only when values of parameters cannot be deduced. One important new development was a change in the certainty factor model used to propagate degrees of certainty from multiple pieces of positive and negative evidence to a conclusion. The new model is commutative, which means that it is no longer necessary to accumulate separate measures for positive and negative evidence; a single measure reflecting both is sufficient. The new model gives the same result as the old one in combining certainty factors of like sign, but is more gentle when combining CF's of opposite sign.
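The difference between the two certainty factor schemes can be made concrete with a small sketch. This report does not state the combining formulas, so the functions below are an assumption: they follow the forms usually given in later descriptions of the original MYCIN model (separate belief and disbelief totals, subtracted at the end) and of the EMYCIN revision (a single running value). The function names are illustrative only.

```python
from functools import reduce


def combine_old(cfs):
    """Assumed original scheme: keep separate belief and disbelief totals."""
    mb = md = 0.0
    for cf in cfs:
        if cf > 0:
            mb += cf * (1.0 - mb)      # accumulate positive evidence
        elif cf < 0:
            md += -cf * (1.0 - md)     # accumulate negative evidence
    return mb - md


def combine_new(x, y):
    """Assumed revised scheme: fold two CFs into one running value."""
    if x >= 0 and y >= 0:
        return x + y * (1.0 - x)       # same result as the old scheme
    if x <= 0 and y <= 0:
        return x + y * (1.0 + x)       # mirror image for negative evidence
    denom = 1.0 - min(abs(x), abs(y))
    if denom == 0.0:
        raise ValueError("cannot combine CF=+1 with CF=-1")
    return (x + y) / denom             # gentler treatment of opposite signs


def combine_all_new(cfs):
    """Combine a sequence of CFs with the revised rule, starting from 0."""
    return reduce(combine_new, cfs, 0.0)


# Ten confirming conclusions at 0.9 followed by one disconfirming at -0.8.
evidence = [0.9] * 10 + [-0.8]
old_result = combine_old(evidence)
new_result = combine_all_new(evidence)
```

Under these assumed formulas the old scheme drops to roughly 0.2 on this evidence, while the revised scheme stays above 0.99, and feeding the same evidence in reverse order gives the same revised value.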
The previous scheme had the problem of compressing too much information into the region near 1, and, as a result, ten conclusions of CF = 0.9 could be substantially overthrown by a single conclusion of CF = -0.8.

The explanation capability was improved by implementing the system's dictionary in hash tables. This gives the reasoning program more working space, since the dictionary can be kept on secondary storage until needed. Also, access time for parts of the dictionary is small because of the hash coding. The mechanisms used in EMYCIN have been used in other systems as well.

The input language for new rules has been simplified and stylized in an abbreviated rule language (ARL). ARL exploits the fact that associative triples are almost as easy for a person to read and manipulate as English text -- "the X of Y is Z", although stylized, is understandable. ARL resembles a shorthand form we have seen several domain experts use to sketch out sets of rules. The parameter names used in ARL are simply the labels the expert uses in defining the parameters of the domain. The conciseness of ARL makes it much easier to input than English, which is an important consideration when entering a large body of rules. Its conciseness is also a benefit when EMYCIN prints large numbers of rules for the expert to examine. Because knowledge acquisition is a critical problem for building expert systems [cf. Buchanan, 1981], we look on ARL as a pragmatic solution to a large problem we are addressing more fully in a knowledge acquisition system called ROGET.

The debugging package brings together pieces that have been part of the system for some time.
These include:

1) the EMYCIN explanation facility;
2) a program that automatically explains how the system arrived at the results of a consultation;
3) a program that reviews each result of a consultation, allowing the user to judge whether the result is correct, and assisting the user in refining the knowledge base in order to correct any errors noted in the result or in intermediate conclusions; and
4) a program that automatically compares the results of a consultation to stored "correct" results for the same case, and explains any errors in the conclusions.

The rule compiler, described in last year's progress report, has been integrated with the whole system. Production rules, while convenient in their modularity, are not the best representation for speedy execution. The rule compiler transforms a program's production rules into a decision tree, eliminating the redundant computation inherent in a rule interpreter, and compiles the resulting tree into machine code. The program can thereby use an efficient deductive mechanism for running the actual consultation, while the flexible rule format remains available for acquisition, explanation, and debugging.

Finally, the EMYCIN user's manual has been improved. This manual is designed to be used by system builders who are creating a consultation system, not by the eventual users of the consultation system itself.

The second major research effort completed this year was an investigation of the efficacy of mixing production rules and frames in a medical reasoning program. This system, called CENTAUR, was described in Jan Aikins' thesis. The medical problem is identical to the topic for the PUFF system, namely the diagnosis of pulmonary function disorders. The medical knowledge, too, was fixed for some of the experiments. The ways the knowledge was represented and used in CENTAUR, however, were changed in order to determine whether improvements over PUFF could be found.
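Returning briefly to the rule compiler described above, the idea of turning rules into a decision tree can be illustrated with a toy sketch. This is not the EMYCIN compiler (which works on LISP rules and emits machine code); it is a minimal illustration under simplifying assumptions: each rule is a set of parameter = value tests plus a conclusion, and the rules and parameter names shown are invented for the example. The interpreter re-tests every premise of every rule, while the tree tests each parameter at most once along any path.

```python
# Hypothetical rules: (premise as {parameter: required value}, conclusion).
RULES = [
    ({"stain": "neg", "shape": "rod"}, "conclusion-1"),
    ({"stain": "pos", "shape": "coccus"}, "conclusion-2"),
    ({"stain": "neg"}, "conclusion-3"),
]


def interpret(rules, case):
    """Rule-interpreter baseline: test every premise of every rule."""
    return [concl for prem, concl in rules
            if all(case.get(p) == v for p, v in prem.items())]


def build_tree(rules):
    """Compile rules into a tree; a leaf is the list of conclusions that fire."""
    pending = [r for r in rules if r[0]]
    if not pending:
        return [concl for _, concl in rules]
    param = sorted(pending[0][0])[0]          # next parameter to test
    values = {prem[param] for prem, _ in rules if param in prem}
    branches = {}
    for value in values | {None}:             # None stands for "any other value"
        kept = []
        for prem, concl in rules:
            if param not in prem:
                kept.append((prem, concl))    # rule does not care about param
            elif prem[param] == value:
                rest = {p: v for p, v in prem.items() if p != param}
                kept.append((rest, concl))    # this test is now satisfied
        branches[value] = build_tree(kept)
    return (param, branches)


def run_tree(tree, case):
    """Walk the compiled tree, testing one parameter per internal node."""
    while isinstance(tree, tuple):
        param, branches = tree
        tree = branches.get(case.get(param), branches[None])
    return tree


TREE = build_tree(RULES)
```

For any case, `run_tree(TREE, case)` should return the same conclusions as `interpret(RULES, case)`; the original rules remain available unchanged for explanation and editing, which mirrors the division of labor described in the text.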
The representation of prototypical cases in CENTAUR improves the understandability of the program's line of reasoning and gives it more focus than PUFF has. Frames (called "prototypes" by Aikins) encode the knowledge of typical cases. These are linked in a hierarchy of ever more specialized descriptions of the subtypes, and are linked to production rules that associate evidence with hypotheses. The control of reasoning in CENTAUR is also represented in frames, which allows explicit changes in reasoning strategies and experimentation with alternatives. Because of this explicitness, the program can explain its control strategy to a user, thus making it more understandable than PUFF.

EMYCIN Applications

EMYCIN is intended for use by system builders who wish to construct a consultation program around a production rule representation of knowledge (with individual facts represented as associative triples) and a backward chaining control structure. It is not a universal programming language, but because the form of the final consultation program is fixed in advance, EMYCIN can save the system builder considerable effort. We have demonstrated this fact with several applications. Four were described in last year's report (PUFF, SACON, HEADMED, and CLOT). Three more were undertaken this year, and a fourth was recently begun in the field of dermatology by Dr. Blois and his associates at UCSF.

GRAVIDA

A medical consultant called GRAVIDA was developed to track an expectant mother through her pregnancy. Constructed by Dr. Val Catanzarite, currently a resident at Santa Clara Valley Medical Center, the system acquires information about current and past medical problems of the mother, any previous pregnancies, and general historical data about the patient.
GRAVIDA then keeps track of the patient on a per-visit basis, recommending tests, detecting potentially dangerous medical conditions, and estimating the current age of gestation. The construction of this consultant required the extension of the rule language (by adding several new predicate functions) to look for simple trends and events over a series of previous visits.

DART

The other consultants are applied to non-medical domains. In conjunction with the IBM Corporation we have developed a consultant, called DART, that identifies probable causes of failures in teleprocessing subsystems of IBM 370-class computer systems. The system accepts stylized descriptions of the observed failure (e.g., lost data, machine went into a loop, terminal doesn't respond, etc.) and then directs the acquisition of data which are collected from traces available to field service personnel. Finally, DART uses these data to indict specific components, both hardware and software, which might be broken.

LITHO

The other major consultant now under development seeks to identify rock formations found at various depths of an oil-well bore hole. The consultant, called LITHO, examines geological and physical data of individual zones of interest to identify various aspects of the geological formations. This consultant is being constructed in conjunction with the Schlumberger Corporation and is similar to the GEO consultant developed with the AGE system.

With the publication of van Melle's EMYCIN thesis, which deals with the design of improved knowledge acquisition facilities for EMYCIN, and with the availability of an EMYCIN manual, each of the three consultants described above was constructed largely by the experts themselves.
After initial discussions concerning the design of the system's goals, the identification of the data to be gathered, and the basic flow of the consultant's dialogue, the process of writing and inputting the hundreds of rules and parameters per system has been done primarily by the expert. All of them have remarked on the ease with which the current facilities allow this interaction to occur. As a result of these experiments, numerous improvements and modifications, both to the EMYCIN system and to the manual, are being incorporated into the package.

GUIDON

The original version of GUIDON, described in Clancey's 1979 thesis, was developed as an experiment to test the educational potential of MYCIN's rules and the ability to use the rule base outside of the consultation setting. Experiments with medical students indicated that GUIDON's framework for teaching knowledge was reasonably satisfactory, but the teaching points were not always clear in the rules. We then conceived a two-step plan: first, to analyze the rule set and change it as necessary for teaching purposes (retaining the consultative capability), and second, to test this revised rule set in a new version of GUIDON. In 1980, the first step of this plan was achieved.

Analysis of MYCIN's Rules

From February 1 through December 1, 1980, we met regularly with a physician consultant for the purpose of revising MYCIN's rules so that teaching points were clear. Protocol analysis (presenting cases MYCIN had solved) was the chief method. We also experimented with sorting of medical findings, direct lectures, and undirected recall ("tell me everything you know about..."). We attended a series of courses taught by the same physician and compared them to another physician's handling of the same course.

Key Findings:

a.
Our framework of structural, support, and strategic knowledge for organizing, justifying and controlling the use of heuristic rules served well in knowledge acquisition dialogues. We would always ask ourselves, "What kind of rationale is he giving me? A data/hypothesis rule? Why does he believe a rule? Why did he think to consider that association (the indexing, the approach)?" We put our analysis on this psychological footing from the start, because we learned in GUIDON1 that a tutorial program must incorporate knowledge that people use to access and control their heuristics.

b. It is not sufficient to revise MYCIN's rules; the decomposition of knowledge into subgoals is itself sometimes imprecise or non-standard.

c. Knowledge has to be added, namely the expertise for when to use MYCIN: when should one think about meningitis? what might it be confused with? MYCIN was not designed to be the "primary care" physician, but teaching diagnosis, our goal, involves expanding the knowledge base to include initial problem formulation.

The physician's approach was logical and easy for us to emulate. He was consistent from case to case, and moreover did what he told students to do. This is not necessarily typical. Other teachers we observed were not able to articulate their approach as clearly and seemed to be less sure of what students were thinking. There were common strategical concepts, however, that our three experts all used to explain their reasoning.

Development of NEOMYCIN

We implemented a prototype consultation system that constitutes a psychological model of diagnostic problem solving. This system is upward compatible with EMYCIN systems, and thus could replace the EMYCIN language and interpreter.

Key theoretical features of the design:

a.
Forward-directed reasoning from data to hypotheses and state-categories, emulating expert problem solving:

   1) Trigger rules place hypotheses in the differential diagnosis directly as data are received. The differential is maintained so that more specific causes replace general hypotheses.
   2) Data are abstracted immediately, e.g., "diplopia" is thought of as an "abnormal neurological finding".
   3) Process-oriented questions are immediately asked if they are relevant to the domain, even if not directed to any particular hypothesis, e.g., asking when a symptom began and how it has changed over time.
   4) Data suggest causal state-categories, possibly jumping over a chain of causal links to conjecture some generic problem.
   5) Data/hypothesis associations are applied in the context of the current differential diagnosis (working memory of hypotheses).

b. Explicit, separate representation of:

   1) a problem-space hierarchy to which data/hypothesis rules are attached ("etiological taxonomy") (previously implicit as the "context clauses" of rules);
   2) causal rules that ultimately tie into this hierarchy;
   3) world relations that constrain the relevance of data (previously implemented as "screening clauses");
   4) disease process knowledge that cuts across the etiological distinctions, useful for initial problem formulation.

c. A hierarchical set of domain-independent meta-rules constitutes a diagnostic meta-strategy. These rules examine the knowledge sources listed above and the current differential to select an hypothesis to focus on and the next datum to collect.

The key strategical idea to teach students is that collecting circumstantial evidence is preparation for making physical measurements. Its purpose is to "establish the hypothesis space," to determine the range of possibilities that might be causing the problem.
Strategies for achieving this involve considering common and unusual causes, looking for evidence that will broaden the space of possibilities. There are two orientations when establishing the hypothesis space: 1) "group and differentiate" -- upward-looking, initial problem formulation in which one tries to cluster the data under some generic process (cause); and 2) "explore and refine" -- attempting to confirm successively more specific causes.

The diagnostic meta-rules are generally applied as a pure-production system for each subtask (e.g., "find a new focus" is a subtask). Abort conditions are inherited to simulate shifting of focus (and return to higher goals) as data broaden the differential or exploration suggests that a conjecture is unlikely.

2. Research in Progress

Short-term Plans for NEOMYCIN and GUIDON

We are shifting development of GUIDON to the Dolphin computer, now on loan from Xerox and located in the Computer Science Department building at Stanford. GUIDON must be revised to be compatible with the NEOMYCIN system. These revisions take two forms: a) simplifications to the code (NEOMYCIN is designed to make it easier to index rules as they are used in the tutorial), and b) extensions to take advantage of knowledge now represented explicitly in NEOMYCIN (taxonomy of problems, world facts, diagnostic strategy).

NEOMYCIN is essentially a psychological model of diagnosis that enables us to monitor the student's problem solving and provide assistance in ways that were not possible before. For example, we will be teaching forward-directed inferences -- leaps from data to hypotheses -- that we represent in NEOMYCIN's trigger rules. With this additional knowledge of how experts think, GUIDON version 2 will have leverage for interrupting the student to test his knowledge, as well as having a better basis for understanding a student's partial solutions.
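The forward-directed behavior attributed to NEOMYCIN's trigger rules (data are abstracted on arrival, triggers place hypotheses in the differential, and more specific causes replace general ones) can be sketched as follows. This is a minimal illustration, not NEOMYCIN's implementation: the taxonomy, the trigger table, and all hypothesis names other than the "diplopia" abstraction quoted in the text are invented for the example and carry no medical authority.

```python
# Hypothetical etiological taxonomy: child hypothesis -> parent hypothesis.
PARENT = {
    "hypothesis-specific": "hypothesis-general",
    "hypothesis-general": "generic-problem",
}

# Immediate data abstraction (the diplopia example comes from the text).
ABSTRACTION = {"diplopia": "abnormal-neurological-finding"}

# Hypothetical trigger rules: finding -> hypothesis placed in the differential.
TRIGGER = {
    "abnormal-neurological-finding": "hypothesis-general",
    "finding-x": "hypothesis-specific",
}


def ancestors(hypothesis):
    """All more general hypotheses above the given one in the taxonomy."""
    found = set()
    while hypothesis in PARENT:
        hypothesis = PARENT[hypothesis]
        found.add(hypothesis)
    return found


def add_hypothesis(differential, hypothesis):
    """Add a hypothesis so that specific causes replace general ones."""
    if hypothesis in differential:
        return differential
    if any(hypothesis in ancestors(h) for h in differential):
        return differential                     # something more specific is present
    general = ancestors(hypothesis)
    return [h for h in differential if h not in general] + [hypothesis]


def receive(differential, datum):
    """Process one datum: abstract it, then fire any matching trigger rules."""
    findings = [datum]
    if datum in ABSTRACTION:
        findings.append(ABSTRACTION[datum])
    for finding in findings:
        if finding in TRIGGER:
            differential = add_hypothesis(differential, TRIGGER[finding])
    return differential
```

Starting from an empty differential, receiving "diplopia" places the general hypothesis in the differential by way of its abstraction; a later "finding-x" replaces it with the more specific one, and repeating "diplopia" afterwards leaves the differential unchanged.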
We expect that a complete revision of GUIDON, so that it can take advantage of what is now in NEOMYCIN, will require 6 months. This includes an entirely redesigned student model, plus the new capabilities for interruption, assistance, and evaluation of student hypotheses. In parallel, we will be refining NEOMYCIN by testing it on the 100 meningitis cases in our library. Two students will be revising GUIDON; a third student will continue development of NEOMYCIN (on the SUMEX-AIM computer facility). Dr. Clancey will direct and participate in both aspects of the project.

Formalization of Teaching Principles

One of the students who will be revising GUIDON is a doctoral candidate in the Education Department at Stanford. For his thesis research, this student will be parameterizing GUIDON's tutorial rules so they are controlled by a higher order model of teaching methods. Design of this model is complete on paper now. It will be implemented after GUIDON2 is working.

Formal Experimentation

Through our contacts with the medical school, we have arranged to test GUIDON with medical students during the period September '81 - March '82. This aspect of the project will be managed by the doctoral student in Education. Plans are to do exploratory experimentation 1) to test the usefulness of the diagnostic model for interpreting student behavior and 2) to determine whether theoretical differences in tutoring behavior are detectable by the students. Analysis of results should provide a basis for extending the diagnostic model.

Development of a Mechanical/Electronic Diagnostic Program

We have begun collaboration with researchers from IBM to develop a system similar to NEOMYCIN in the domain of computer failure diagnosis. The purpose of this project will be to determine to what extent the domain-independent strategies we formalized from experience in the medical domain are applicable to electronic troubleshooting. In the past year, Prof.
Buchanan supervised development of an EMYCIN consultation program, named DART, for diagnosing teleprocessing problems. Through this experience, IBM personnel learned about our techniques, and we were introduced to the hardware and software problems they need to solve. We will be drawing upon this experience in the next year.

ONCOCIN

The oncology chemotherapy consultation system, named ONCOCIN, has achieved many of its goals since work on the project began in July 1979. We are developing an interactive system to be used by oncology faculty and fellows in the Debbie Probst Oncology Day Care Center at Stanford University Medical Center. Our overall goals are:

(1) to demonstrate that a rule-based consultation system with explanation capabilities can be usefully applied and gain acceptance in a busy clinical environment;
(2) to improve the tools currently available, and to develop new tools, for building knowledge-based expert systems for medical consultation; and
(3) to establish both an effective relationship with a specific group of physicians, and a scientific foundation, that will together facilitate future research and implementation of computer-based tools for clinical decision making.

Specific Objectives:

The ONCOCIN research goals are directed both towards the basic science of artificial intelligence and towards the development of clinically useful oncology consultation tools.
Artificial Intelligence Objectives

We have undertaken AI research with the following aims:

(1) to implement and evaluate recently developed techniques designed to make computer technology more natural and acceptable to physicians;
(2) to extend the methods of rule-based consultation systems to interact with a large database of clinical information; and
(3) to continue basic research into the following problem areas: mechanisms for handling time relationships, techniques for quantifying uncertainty and interfacing such measures with a production rule methodology, approaches to acquiring knowledge interactively from clinical experts, and assessment of knowledge base completeness and consistency.

Oncology Clinic Objectives

We have begun to develop and implement a protocol management system for use in the oncology day care center, with the following capabilities:

(1) to assist with identification of current protocols that may apply to a given patient;
(2) to assist with determining a patient's eligibility for a given protocol;
(3) to provide detailed information on protocols in response to questions from clinic personnel;
(4) to assist with chemotherapy dose selection and attenuation for a given patient;
(5) to provide reminders, at appropriate intervals, of follow-up tests and films required by the protocol in which a given patient is enrolled;
(6) to reason about managing current patients in light of stored data from previous visits of (a) the individual patients, or (b) the aggregate of all "similar" patients.

Overview of Goals for 1980:

We have described a five-year plan for accomplishing the above goals. As discussed at this time last year, we spent our first year developing a prototype ONCOCIN consultation system, drawing from programs and capabilities developed for the EMYCIN system-building project.
During that year, we also undertook a detailed analysis of the day-to-day activities of the Stanford oncology clinic in order to determine how to introduce ONCOCIN with minimal disruption of an operation which is already running smoothly. We also spent much of our time in the first year giving careful consideration to the most appropriate mode of interaction with physicians in order to optimize the chances for ONCOCIN to become a useful and accepted tool in this specialized clinical environment.

During our second year of the project, we have accomplished all the goals we identified for 1980:

(1) We have completed a special interface program that responds to commands from the customized keypad described last year;
(2) We encoded the rules for one more chemotherapy protocol (oat cell carcinoma of the lung) and updated the Hodgkin's Disease protocols when new versions were released late in 1980; these exercises demonstrated the generality and flexibility of the representation scheme we have devised;
(3) We developed the software protocols for achieving communication between the interface program and the reasoning program;
(4) We have coordinated the printing routines needed to produce hardcopy flowsheets, patient summaries, and encounter sheets;
(5) Lines have been installed between the SUMEX machine room and the oncology clinic, and the new terminal and a hard copy device have been installed in the Oncology Day Care Center for final testing and debugging; and
(6) We have just begun to offer the ONCOCIN system for use by oncology faculty and fellows in the morning chemotherapy clinics in which most of the lymphoma patients receive their treatment.

We had two additional goals, not explicitly stated in last year's report. One was to design formal evaluation studies that would allow us to assess the impact of ONCOCIN and its acceptance by the physicians for whom it is designed.
Second, we wanted to experiment with computational techniques for verifying the completeness and consistency of a developing knowledge base.

PROGRESS - 1980/81:

Further Development and Testing of the Reasoning Program

The early prototype of the Reasoning system was described in last year's report in some detail. A more recent summary has been submitted for presentation at the 7th International Joint Conference on Artificial Intelligence. The Reasoner is coded in Interlisp, and is running on the SUMEX computers (both the PDP-10 and the 20/20 on which we have been running when the system is used in the oncology clinic).

The Reasoner has been extensively debugged this year. Several hundred sample patient cases have been run, and the results have been reviewed in detail by the collaborating oncologists. When problems have been uncovered by this process, changes in the Reasoner program (or in the encoding of the lymphoma protocol knowledge) have been undertaken.

Verification of the Adequacy of the Knowledge Representation Scheme

In an effort to verify that the representation scheme we are using will be adequate for arbitrary protocol knowledge that may be encountered in the future, we decided to encode and briefly test the knowledge of a non-lymphoma protocol. We chose the complicated protocol for oat cell (small cell) carcinoma of the lung because it involves a large number of possible therapies and complex interweaving of chemotherapy and radiotherapy. After approximately one month's effort by an experienced programmer, the oat cell protocol had been encoded and ran successfully on a few test cases. In addition, the lymphoma protocols themselves were changed in late 1980, and we spent a few weeks in early 1981 entering the changes implicit in these new versions.
In all cases the ONCOCIN representation scheme was adequate to accommodate the protocol knowledge with only minor modifications, if any, and for this reason we are confident that our system will be able to adapt to any other protocols that need to be encoded in the coming years.

Physician/Computer Interaction

The actual mechanics of computer terminal interaction is as important to a clinical system's acceptance as the quality of the program's advice. If a system is slow or cumbersome, physicians will tend to reject it. With this in mind, we have sought to develop an optimal interactive mechanism that will not unreasonably tax the budget of the project. In last year's report we indicated that this interactive system was to be written in PASCAL. After some initial experiments, however, we decided to use SAIL instead. The system is referred to as the "Interviewer", and it has now been fully implemented and debugged.

As we emphasized when outlining our research goals, we have wanted ONCOCIN to maintain the explanation and justification capabilities that we have argued are crucial to the acceptance of clinical consultation systems. The Interviewer uses a specialized split-screen display that enables the physician to enter patient data in one region while pertinent explanations are displayed in another.

Development of Mechanisms for Interprocess Communication

Because the Reasoner (the Interlisp reasoning program) and the Interviewer (the SAIL program with which the physician interacts) must run in parallel in two different processes on the same machine, we needed to devise mechanisms for allowing these two programs to communicate with one another. This has been a major systems programming task, but we are pleased with the effectiveness of the generalized interprocess communication mechanism that we devised.
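The general shape of such an arrangement, with a data-entry front end and a reasoning back end exchanging messages while running concurrently, can be sketched as below. The report does not describe the actual mechanism, so everything here is an assumption made for illustration: threads and in-memory queues stand in for the two operating-system processes, and the message format, field names, and function names are hypothetical.

```python
import queue
import threading


def reasoner(inbox, outbox):
    """Stand-in for the reasoning program: reply to each entry it receives."""
    while True:
        message = inbox.get()
        if message is None:                 # sentinel: the session is over
            break
        field, value = message
        outbox.put(f"noted {field}={value}")


def run_session(entries):
    """Stand-in for the interface program: send entries, collect replies."""
    to_reasoner = queue.Queue()
    to_interviewer = queue.Queue()
    worker = threading.Thread(target=reasoner,
                              args=(to_reasoner, to_interviewer))
    worker.start()
    replies = []
    for entry in entries:
        to_reasoner.put(entry)              # forward one data entry
        replies.append(to_interviewer.get())  # wait for the matching reply
    to_reasoner.put(None)                   # ask the reasoner to stop
    worker.join()
    return replies
```

A call such as `run_session([("field-a", 1), ("field-b", 2)])` returns one reply per entry, in order. Keeping the two sides connected only by message channels is what lets each be written in the language best suited to its job, which is the design point the text makes about pairing a SAIL front end with an Interlisp reasoner.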
Designing an Evaluation of the ONCOCIN System

Because we wish to evaluate formally the impact of ONCOCIN and its effectiveness in the oncology clinic, we have devised a set of three experiments, two of which are already underway. The study designs are outlined in detail in an evaluation document that we have prepared.

Verifying the Completeness and Consistency of the Knowledge Base

An important question for AI researchers involved with the development of expert systems is how to ascertain that a knowledge base for a consultation program is complete and consistent. Dr. Motoi Suwa, a visitor to Stanford from Japan, became fascinated with this question and collaborated with us on a formal analysis of the developing ONCOCIN knowledge base. His paper describing that work was submitted for presentation at the 7th International Joint Conference on Artificial Intelligence.

D. Publications Since January 1980

Aikins, Janice S. Prototypes and Production Rules: A Knowledge Representation for Computer Consultations. PhD Dissertation, Stanford University. Memo HPP-80-17, August, 1980.

Bennett, S.W., and Scott, A.C. Computer-assisted customized antimicrobial dosages. Amer. J. Hosp. Pharm. 37:523-9 (1980).

Buchanan, B.G. Research on Expert Systems. Memo HPP-81-1, January, 1981. To appear in D. Michie (ed.) Machine Intelligence 10.

Campbell, A.B., Chang, P., Cho, J., Hickam, D., Shortliffe, E.H., and Teach, R. Preliminary proposal for the evaluation of the ONCOCIN System. Internal memo, the ONCOCIN Project, April 1981.

Clancey, W.J. The Epistemology of a Rule-based Expert System: the effect of proceduralization of knowledge on explanation. To appear in the Journal of Artificial Intelligence.

Clancey, W.J. and Letsinger, R. NEOMYCIN: Reconfiguring a rule-based expert system for application to teaching. Submitted to IJCAI7.

Clancey, W.J. Methodology for building an intelligent tutoring system.
To appear in a book on Cognitive Science Methodology, edited by Kintsch, Miller, and Polson.

Fagan, L.M., Shortliffe, E.H., and Buchanan, B.G. Computer-based medical decision making: from MYCIN to VM. Automedica 3:97-106 (1980).

Gerring, Phil. System documentation: interprocess communication system (TopDog and Interactor). Internal memo, the ONCOCIN Project, November 1980.

Shortliffe, E.H., Scott, A.C., Bischoff, M.B., Campbell, A.B., van Melle, W., and Jacobs, C.D. ONCOCIN: An expert system for oncology protocol management. Submitted to Proceedings of the 7th IJCAI, Vancouver, B.C., August 1981.

Shortliffe, E.H. Consultation systems for physicians. Proceedings of the CSCSI/SCEIO Conference, 14-16 May 1980, University of Victoria, British Columbia, pp. 1-11.

Shortliffe, E.H. Medical Cybernetics: The Challenges of Clinical Computing. To appear in Cybernetics, Technology, and Growth, S. Basheer Ahmed, editor; Lexington Books, 1981.

Shortliffe, E.H. Medical Computing: Another Basic Science? Proceedings of the 4th Symposium on Computer Applications in Medical Care, Washington, D.C., November 1980.

Shortliffe, E.H. The computer as clinical consultant (editorial). Arch. Int. Med., March 1980.

Suwa, M., Scott, A.C., and Shortliffe, E.H. An approach to verifying completeness and consistency in a rule-based expert system. Submitted to Proceedings of 7th IJCAI, Vancouver, B.C., August 1981.

Teach, R.L. and Shortliffe, E.H. An analysis of physician attitudes regarding computer-based clinical consultation systems. Submitted for publication in Comput. Biomed. Res., February 1981.

van Melle, W., Shortliffe, E.H., Buchanan, B.G. EMYCIN: A domain-independent system that aids in constructing knowledge-based consultation programs. To appear in Pergamon-Infotech State of the Art Report on Machine Intelligence, 1981.

van Melle, W. A domain-independent system that aids in constructing knowledge-based consultation programs.
PhD thesis, Computer Science Department, Stanford University, June, 1980.

E. Funding Support

Grant Title: "Knowledge-Based Consultation Systems"
Principal Investigator: Bruce G. Buchanan
Agency: National Science Foundation
ID Number: MCS-7903753
Term: July 1979 to March 1981
Total award: $146,152
Current award (1980): $72,493

[No continuation proposal was submitted to the NSF since the current version of the system successfully completes our proposed work. We intend to use EMYCIN as a vehicle for experimental research under other funding, including SUMEX core research, but we are not proposing further research or development on EMYCIN itself.]

Contract Title: "Exploration of Tutoring and Problem-Solving Strategies"
Principal Investigator: Bruce G. Buchanan
Agency: Office of Naval Research and Advanced Research Projects Agency (joint)
ID Number: N00014-79-C-0302
Term: March 1979 to March 1982
Total award: $396,325

Grant Title: "Explanatory Patterns In Clinical Medicine"
Principal Investigator: Edward H. Shortliffe
Agency: Kaiser Family Foundation
Term: July 1979 to December 1980
Total award: $20,000

Grant Title: "Research Program: Biomedical Knowledge Representation"
Principal Investigator: Edward A. Feigenbaum
Co-Principal Investigator (ONCOCIN Project): Edward H. Shortliffe
Agency: National Library of Medicine
ID Number: LM-03395
Term: July 1979 to June 1984
Total award: $497,420
Current award (1980-1981): $99,400
Administered through Medicine: ONCOCIN suballocation ($47,845)

Grant Title: "Symbolic Computation Methods For Clinical Reasoning"
Principal Investigator: Edward H. Shortliffe
Agency: National Library of Medicine
ID Number: LM-00048
Term: July 1979 to June 1984
Total award: $196,425
Current award (1980-1981): $39,107

II. INTERACTION WITH THE SUMEX-AIM RESOURCE

A.
Medical Collaborations and Program Dissemination via SUMEX
A great deal of interest in both MYCIN and EMYCIN has been shown by the medical and academic communities. For two years in succession we were invited by the American College of Physicians to demonstrate MYCIN at the organization's annual meeting (San Francisco, March 1979, and New Orleans, April 1980). The physicians have uniformly been enthusiastic about the program's potential and what it reveals about one current approach to computer-based medical decision making. In both cases, the demonstrations were performed on-line using network access to the SUMEX computer.
We have demonstrated our programs to both physicians and computer scientists on numerous additional occasions. At the AIM tutorial in August 1980, both MYCIN and GUIDON were presented to introduce physicians to the field of AI in medicine. GUIDON was also demonstrated on the Dolphin machine at the Xerox-PARC open house during the AAAI in August. In addition, both EMYCIN and GUIDON were featured demonstrations at the annual AIM Workshop, held the same week at Stanford. The TYPER program, developed by SUMEX staff in collaboration with Dr. Larry Fagan of Stanford, was used to good effect at this workshop as well as for informal demonstrations throughout the year.
Several project members contributed to the Expert Systems Workshop, sponsored by RAND and ARPA and held in San Diego in August, where EMYCIN was one of the "system building tools" that was studied in detail. The Workshop has led to the preparation of a book, "Building Expert Systems," and many members of our research group have written portions of that volume (Buchanan, Clancey, Scott, Aikins, Shortliffe, van Melle).
NEOMYCIN was presented at a meeting of the contractors of the ONR "Instructional systems and advanced training" division, held in Pittsburgh in January 1981. Presentations of this kind carry SUMEX-AIM results out to cognitive psychologists from around the country. Dr.
Clancey also presented a talk on GUIDON research at the annual conference of the Association for Development of Computer Instructional Systems in Atlanta, in March 1981, and at the annual conference of AERA in Los Angeles, in April 1981.
Several medical school and computer science teachers have also asked to use MYCIN in their computer science or medical computing courses, and we continue to make the programs available frequently to researchers around the world who access SUMEX using the GUEST account.
EMYCIN has generated considerable interest in the academic and business communities. We have been in frequent contact with Bud Frawley and Alain Bonnet of Schlumberger, Chuck Brodnax and Milt Waxman of the Hughes Aircraft Corporation, and Harry Reinstein and Cliff Hollander from IBM Scientific Research Center. EMYCIN, on SUMEX, has been used at the University of Illinois and Michigan State University to explore the construction of expert systems.
B. Sharing and Interaction with Other SUMEX-AIM Projects
We have continued collaboration with the RX, VM, and PUFF projects. Our development of a domain-independent system is facilitated by having a number of very different working systems on which to test our additions and modifications to EMYCIN. All the projects have provided us with useful comments and suggestions.
The community created on the SUMEX resource has other benefits that go beyond actual shared computing. Because we are able to experiment with other developing systems, such as INTERNIST, and because we frequently interact with other workers (at the AIM Workshop or at other meetings around the country), many of us have found the scientific exchange and stimulation to be heightened. Several of us have visited workers at other sites, sometimes for extended periods, in order to pursue further issues which have arisen through SUMEX- or Workshop-based interactions.
In this regard, the ability to exchange messages with other workers, both on SUMEX and at other sites, has been crucial to rapid and efficient exchange of ideas. For example, most of the invitations and planning for the 6th AIM Workshop, held at Stanford in August 1980 and described in detail elsewhere in this report, were accomplished via SUMEX or ARPANET mail. Certainly it is unusual for a small community of researchers with similar scholarly interests to have at their disposal such powerful and efficient communication mechanisms, even among those on opposite coasts of the country.
C. Critique of Resource Management
The SUMEX facility has maintained the high standards that we have praised in the past. The staff members are always helpful and friendly, and work as hard to please the SUMEX community as to please themselves. As a result, the computer is as accessible and easy to use as they can make it. More importantly, it is a reliable and convenient research tool. We extend special thanks to Tom Rindfleisch for maintaining high professional standards for all aspects of the facility.
Due to the introduction of our ONCOCIN work with its special hardware and communication needs, we continue to be aware that we are taxing the limited resources of SUMEX with regard to technical hardware support. It has been next to impossible for one technical specialist (Nick Veizades) to balance the numerous diverse demands on his time. This is not a problem with management of the Resource but a reflection of the need for additional technical personnel associated with SUMEX. We perceive this to be a particularly important requirement in the future as the Resource undertakes an expanded role in the implementation and testing of new hardware.
Special mention should be made of the remarkable role played by Tom Rindfleisch and his staff in helping to organize remote demonstrations of SUMEX-AIM programs.
In October, 1980, when the NIH Council on Research Resources met in Atlanta, demonstrations of MYCIN and INTERNIST on the DEC 2020 at Stanford were so carefully arranged as to make them seem commonplace. We salute Tom and the staff for their uncomplaining assistance, and are grateful for the efforts they have made to provide a mechanism for facilitating future demonstrations at remote locations.
Finally, we continue to feel the need for more computing power. Much of our research and development continues to take place in the hours from 7 p.m. to 10 a.m., but it is unreasonable to expect all our programming staff to adjust their own schedules around a computer. The existence of the 20/20 has been helpful in permitting demonstrations with good response time, and it has allowed us to introduce ONCOCIN in a real clinical environment, but ongoing R&D on the main machine remains difficult much of the time. Even the evening hours are now seeing higher load averages than was once the case. We anticipate considerable improvement in this regard as the recently approved additional computing hardware becomes available. In the meantime, much of the work on EMYCIN has been moved to the SCORE computer in the Computer Science Department.
Response time aside, we have shifted our development of GUIDON to the Xerox Dolphin in order to take advantage of the larger address space. This also frees up disk space so that we can comfortably develop NEOMYCIN on SUMEX.
We also strongly support the creation of the new position assumed by Anne Fadenrecht; her excellent early efforts should be especially helpful in taking some of the load off of Carole Miller and Tom Rindfleisch.
III. RESEARCH PLANS
A. Project Goals and Plans
EMYCIN
Now that the design and capabilities of EMYCIN are essentially fixed, we are planning to develop new applications and to use the system as an experimental tool.
The applications to electronic fault diagnosis and geology will continue and we expect to find additional medical applications as well. Many of these we expect will be undertaken by other research groups. Because we view artificial intelligence as an experimental science [Buchanan, 1981], we wish to collect data on the nature of problems EMYCIN can help solve and the limitations of the problem solving method embodied in EMYCIN.
Our research on knowledge acquisition depends on the existence of a working EMYCIN system. In the ROGET program, currently under development, hierarchical knowledge about consultation systems and their knowledge bases is used to help an expert define a new knowledge base to be used by EMYCIN. For example, the meningitis and pulmonary function knowledge bases both contain rules associating diagnoses with laboratory tests and with clinical findings. ROGET will be able to use this fact to help an expert divide a new rule set into one set of rules that uses test results and measurements as evidence and another that uses more subjective evidence.
GUIDON
We have now established a good framework for organizing knowledge in an expert system to be used for tutoring. We characterize knowledge by its use for: structuring knowledge sources, supporting (justifying) knowledge sources, or controlling their invocation. In the most general terms, our plans are to do research in acquiring, representing and presenting structural, support, and strategic knowledge.
We used this framework to design NEOMYCIN. Experiments during the coming year will provide a basis for developing our model of diagnosis. In particular, we propose:
a) to extend NEOMYCIN's model of diagnostic strategy to include common, non-expert approaches.
Besides improving the program's ability to model the student, this enumeration of the space of strategies will allow us to follow a plan of research similar to Brown's and Burton's, but in the domain of diagnostic strategy as opposed to subtraction procedures. Eventually, we want to develop a principled psychological model that will relate strategies to knowledge and processing abilities.
b) to pursue further studies of expert reasoning in domains that require "forming a picture" of a malfunctioning process. Experience with NEOMYCIN showed that expert diagnosticians attempt to order the data they collect causally, on a time line. Interpretation of observations can be partially understood as an attempt to match this description of onset, course, severity (intensity, frequency), and causal relations of findings onto known malfunctions that are recalled (indexed) by these process variables. This work will build upon recent advances in understanding causality (e.g., deKleer and Brown).
c) to exploit new technology for experimentation with teaching methods. How can we take advantage of the Dolphin's graphic capabilities in a GUIDON tutorial? Besides graphically presenting rule relationships, we might show the student the same kind of diagrams that we use when describing our knowledge bases to our AI colleagues (hierarchies, diagrams relating compiled associations to underlying causal chains). Beyond presentation strategies, we would like to experiment with different interfaces, perhaps breaking away from a continuous dialogue to use the screen more as a work space for annotating and examining the knowledge base, and for organizing data and hypotheses in a diagnostic problem.
d) to incorporate GUIDON as an integral part of the curriculum in medical diagnosis at Stanford.
We propose to make GUIDON available at the Fleischmann Learning Center at Stanford Medical School, just as the traditional programs built at Massachusetts General and Ohio State were made available. In addition, we will work with one or more teaching fellows at the medical school to include GUIDON as part of the "clinical diagnosis" course which is taught regularly at Stanford. This will continue our commitment to empirical research to develop our model of diagnosis and the teaching procedures.
ONCOCIN
During the coming year, there are four principal areas in which we expect to expend our efforts on the ONCOCIN System:
(1) The system will be implemented for ongoing use in the Stanford Oncology Clinic, with an experimental evaluation period to begin July 1, 1981.
(2) The system will be formally evaluated with regard to its impact on (a) the attitudes of the oncologists, (b) the accuracy and completeness of data collection, and (c) the adequacy of the management decisions made in the clinic.
(3) We will begin to encode additional protocols as the lymphoma system comes into regular use and physicians begin to demand the inclusion of a greater percentage of the protocols used in the management of cancer patients at Stanford.
(4) We will begin to devote a greater percentage of our time to experiments in encoding complex judgmental reasoning of the sort that is usually performed by expert oncologists and is not formally specified in the protocol documents themselves.
Throughout the year we shall continue to relate the requirements of the system we are developing to the underlying artificial intelligence methodologies. We are convinced that the basic science frontiers of AI are best explored in the context of systems for real world use; thus ONCOCIN serves as a vehicle for developing an improved understanding of the issues that underlie all forms of knowledge engineering.
B.
Requirements for Continued SUMEX Use
All the work we are doing (EMYCIN, GUIDON, ONCOCIN, plus continued use of the original MYCIN program) is totally dependent on continued use of the SUMEX resource. The programs all make assumptions regarding the computing environment in which they operate, and the ONCOCIN design in particular depends upon proximity to the DEC 2020, which enables us to use a 9600 baud interface.
In addition, we have long appreciated the benefits of GUEST and network access to the programs we are developing. SUMEX greatly enhances our ability to obtain feedback from interested physicians and computer scientists around the country. Network access has also permitted high quality formal demonstrations of our work both from around the United States and from sites abroad (e.g., Japan, Sweden, Switzerland).
We plan to continue development of NEOMYCIN on SUMEX during the next year, whereas the GUIDON/Dolphin effort will continue on the crowded Computer Science Department Dolphin only until the SUMEX individual workstations become available. Using the main SUMEX machine, we intend to make NEOMYCIN fully usable as a consultation program so that it can be compared with MYCIN. In particular, we will be comparing cases run through both MYCIN and NEOMYCIN to see whether simplification and clarification of the rules for purposes of teaching will in turn change the program's accuracy.
C. Requirements for Additional Computing Resources
The acquisition of the DEC 2020 by SUMEX has been crucial to the growth of our research work, both to ensure high quality demonstrations and to enable us to develop a system such as ONCOCIN for real-world use in a clinical setting. As we continue to develop systems that are potentially useful as stand-alone packages (e.g., an exportable EMYCIN), the additional small computers that are planned will be particularly valuable resources.
It is not yet clear which machines are optimal for the LISP-based applications we are developing, and an opportunity to test our systems on several small-to-medium machines will be invaluable and in keeping with our desire to move some of the AIM products into a community of service users.
As we have mentioned, the response time on the main machine continues to be a major problem, both during the daytime hours and frequently in the evenings as well. The proposed SUMEX acquisitions that will provide additional cycles and permit off-loading of some users from the PDP-10 will significantly benefit the SUMEX research community.
In addition, we believe that our GUIDON experience using the Dolphin personal computer is a significant part of our research. First, the Dolphin's large address space will permit development of the large knowledge base that an intelligent tutoring system requires; we have outgrown the facilities available at SUMEX. Second, the Dolphin's graphics will enable us to develop new methods for presenting material from the knowledge base. Third, the Dolphin will provide a reliable, constant "load-average" machine for running experiments with students. Finally, the development of GUIDON on the Dolphin demonstrates the feasibility of running intelligent tutoring systems on small, affordable machines in schools and remote sites.
We seem to have an insatiable appetite for disk storage space, even though ONCOCIN received an additional substantial allocation since our report last year. ONCOCIN, in particular, has become an extremely large system, and the data files for a clinic full of patients will require substantial additional space. We hope that the planned SUMEX file server will allow the allocation of several thousand more pages. It should also help alleviate the need to keep copies of all patient files on both the 10 and the 20/20.
D.
Recommendations for Future Community and Resource Development
In last year's report we made two recommendations for new SUMEX developments: (1) the acquisition of several small machines, linked to the main processor through the Ethernet and able to run INTERLISP, and (2) the formal establishment of a mechanism for providing hardware and communications equipment for SUMEX demonstrations at a distance. Both of these have been acted upon by SUMEX, and we are delighted by this kind of responsiveness.
The AIM community is small and close-knit, but there remain communication problems within it. The AIM Workshops are excellent means of transferring information annually, but between Workshops all of us are remiss in not communicating new technical reports and articles. It would be very desirable to maintain a list of current publications from all the AIM research groups, for distribution by ARPANET or U.S. Mail to all others. No group will add to the list, however, unless the benefit of the information gained from such a list exceeds the cost of adding to it. SUMEX may be able to function as a catalyst to this kind of community communication.
II.A.1.7 Protein Structure Project
Protein Structure Modeling Project
Prof. E. Feigenbaum and Mr. Allan J. Terry
Department of Computer Science
Stanford University
I. SUMMARY OF RESEARCH PROGRAM
A. Technical Goals
The goals of the protein structure modeling project are to 1) identify critical tasks in protein structure elucidation which may benefit by the application of AI problem-solving techniques, and 2) design and implement programs to perform those tasks. We have identified two principal areas which are of practical and theoretical interest to both protein crystallographers and computer scientists working in AI. The first is the problem of interpreting a three-dimensional electron density map.
The second is the problem of determining a plausible structure in the absence of phase information normally inferred from experimental isomorphous replacement data. Current emphasis is on the implementation of a program for interpreting electron density maps (EDM's).
B. Medical Relevance and Collaboration
The biomedical relevance of protein crystallography has been well stated in an excellent textbook on the subject (Blundell & Johnson, Protein Crystallography, Academic Press, 1976): "Protein Crystallography is the application of the techniques of X-ray diffraction ... to crystals of one of the most important classes of biological molecules, the proteins. ... It is known that the diverse biological functions of these complex molecules are determined by and are dependent upon their three-dimensional structure and upon the ability of these structures to respond to other molecules by changes in shape. At the present time X-ray analysis of protein crystals forms the only method by which detailed structural information (in terms of the spatial coordinates of the atoms) may be obtained. The results of these analyses have provided firm structural evidence which, together with biochemical and chemical studies, immediately suggests proposals concerning the molecular basis of biological activity."
The project involves a collaboration between computer scientists at Stanford University and crystallographers at Oak Ridge National Laboratories (Dr. Carroll Johnson), the University of California at San Francisco (Dr. Robert Langridge), and the University of California at San Diego (under the direction of Prof. Joseph Kraut). Our principal collaborator at UCSD is Dr. Stephan Freer. We also collaborate with Dr. Eric Grosse at Bell Laboratories, whose field is numerical analysis.
C.
Progress Summary
We have completed a major cycle of design review and program reorganization, resulting in the system described in publications four and nine below. The system now has a completely hierarchical, rule-based control structure proceeding from strategy rules, to a set of task rules, ending with individual knowledge sources. This new design seems powerful and flexible enough to provide the basis of a useful EDM interpretation system for protein structure determination.
After building the control structure we wanted, we have worked on building up the knowledge base. Large chunks of knowledge are called "tasks"; we have implemented five out of a projected set of nine. To date, we have implemented the Initialization task, two tracing tasks, a task to split "group toeholds", and a version of a task that finds "second generation" toeholds. Further details of these tasks and their content can be found in publication number four.
We have also continued our efforts to improve the power of our data representations. Towards this end we have implemented a new preprocessor based on Dr. Grosse's thesis research. This program implements an improved method for finding the critical points of a function. In our case, the peaks of the electron density map are useful guides to atom locations, and the full set of critical points is used in the ridge-line analysis discussed in publication one.
Finally, we are compiling documentation on the system and the knowledge it embodies. These documents should be sufficiently complete so that we, or other groups, will have little difficulty picking up where we leave off. We also feel that explicit documentation of our model-building heuristics will be useful to the crystallographic community as it provides a new viewpoint, complementary to traditional crystallographic methods.
The work currently in progress can be characterized as additions to the knowledge base and work on new data representations.
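The three-level control structure described above (strategy rules select a task, task rules select a knowledge source, and the knowledge source updates the hypothesis) can be illustrated with a minimal Python sketch. This is not the CRYSALIS implementation, which this report does not reproduce; the rule contents, the state fields (`initialized`, `toeholds`, `traced`), and every function name here are hypothetical and chosen only to show the flow of control.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]  # hypothetical shared hypothesis structure


@dataclass
class Rule:
    name: str
    condition: Callable[[State], bool]
    target: str  # a task name (strategy level) or a knowledge-source name (task level)


def first_match(rules: List[Rule], state: State) -> Optional[str]:
    """Return the target of the first rule whose condition holds, else None."""
    for rule in rules:
        if rule.condition(state):
            return rule.target
    return None


# Lowest level: knowledge sources act directly on the hypothesis (illustrative only).
def initialize(state: State) -> None:
    state["toeholds"] = ["peak-1"]
    state["initialized"] = True


def trace_chain(state: State) -> None:
    state["traced"].append(state["toeholds"].pop())


KNOWLEDGE_SOURCES = {"initialize": initialize, "trace-chain": trace_chain}

# Middle level: each task owns rules that choose a knowledge source.
TASK_RULES = {
    "initialization": [Rule("init", lambda s: not s["initialized"], "initialize")],
    "tracing": [Rule("extend", lambda s: bool(s["toeholds"]), "trace-chain")],
}

# Top level: strategy rules choose which task to run next.
STRATEGY_RULES = [
    Rule("start", lambda s: not s["initialized"], "initialization"),
    Rule("trace", lambda s: bool(s["toeholds"]), "tracing"),
]


def run(state: State, max_steps: int = 10) -> List[Tuple[str, str]]:
    """Descend strategy -> task -> knowledge source until no rule applies."""
    log = []
    for _ in range(max_steps):
        task = first_match(STRATEGY_RULES, state)
        if task is None:
            break
        source = first_match(TASK_RULES[task], state)
        if source is None:
            break
        KNOWLEDGE_SOURCES[source](state)
        log.append((task, source))
    return log
```

Starting from an uninitialized state with no toeholds, `run` first fires the initialization task and then the tracing task, and stops once no strategy rule applies. Keeping the rules as data at each level is what would let alternative strategies be swapped in without touching the knowledge sources.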
The five tasks currently implemented form the core of the system and suffice to solve about half of a small protein. The remaining tasks will embody knowledge about finding new toeholds (to restart the trace when it is blocked) and about tracing in areas of the data too complex to interpret with present heuristics. One of the main areas of work along these lines is the incorporation of some notion of stereochemistry and the constraints on three dimensional structure it provides. This will be useful in the matching of features and in the prediction of secondary structure.
The last item of work in progress is an attempt to design a data representation that captures volume information. Current representations such as the skeleton preserve topology but do not preserve shape. With the inclusion of volume information, we should be able to capture much of the expert's knowledge of shape and form that presently goes unused.
D. List of Publications
(1) Carroll Johnson and Eric Grosse, "Interpolation Polynomials, Minimal Spanning Trees, and Ridge-Line Analysis in Density Map Interpretation", American Crystallographic Association Program and Abstracts, 4:2, Evanston, Ill., Aug. 1976
(2) Robert S. Engelmore and H. Penny Nii, "A Knowledge-Based System for the Interpretation of Protein X-Ray Crystallographic Data," Heuristic Programming Project Memo HPP-77-2, January, 1977. (Alternate identification: STAN-CS-77-589)
(3) E.A. Feigenbaum, R.S. Engelmore, C.K. Johnson, "A Correlation Between Crystallographic Computing and Artificial Intelligence," in Acta Crystallographica, A33:13, (1977). (Alternate identification: HPP-77-15)
(4) Robert Engelmore and Allan Terry, "Structure and Function of the CRYSALIS System", Proc. 6IJCAI, 1979, pp. 250-256 (Alternative identification: HPP-79-16)
(5) R.S. Engelmore, A. Terry, S.T. Freer, and C.K. Johnson, "A Knowledge-Based System for Interpreting Protein Electron Density Maps", Abstracts of Amer. Crystallographic Ass. 7,1 (1979) p38
(6) E.H. Grosse, "Approximation and Optimization of Electron Density Maps", Stanford University Ph.D. Thesis, Dec. 1980 (Alternative identification: STAN-CS-80-835)
(7) R. Engelmore and A. Terry, Article VII.C3 (Crysalis) in Barr, A., and Feigenbaum, E. A. (eds.), The Handbook of Artificial Intelligence, Vol. II, Stanford Ca., HeurisTech Press, Los Altos, Ca.: Kaufman, 1981
(8) A. Terry and R. Engelmore, "A Knowledge-Based Approach to the Interpretation of Protein Electron Density Maps", to appear in a forthcoming book on expert systems by Pergamon Infotech International, Maidenhead, England
(9) A. Terry, "Hierarchical Control of Production Systems", paper submitted to 7IJCAI
E. Funding status
Grant title: The Automation of Scientific Inference: Heuristic Computing Applied to Protein Crystallography
Principal Investigator: Prof. Edward A. Feigenbaum
Funding Agency: National Science Foundation
Grant identification number: MCS 79-33666
Term of award: December 1, 1979 through November 31, 1981
Amount of award: $35,318 (direct costs only)
II. INTERACTION WITH THE SUMEX-AIM RESOURCE
A. Collaborations
The protein structure modeling project has been a collaborative effort since its inception, involving co-workers at Stanford and UCSD (and, more recently, at Oak Ridge, UCSF, and Bell Laboratories). The SUMEX facility has provided a focus for the communication of knowledge, programs and data. Without the special facilities provided by SUMEX the research would be seriously impeded.
Computer networking has been especially effective in facilitating the transfer of information. For example, the more traditional computational analyses of the UCSD crystallographic data are made at the CDC 7600 facility at Berkeley.
As the processed data, specifically the EDM's and their Fourier transforms, become available, they are transferred to SUMEX via the FTP facility of the ARPAnet, with a minimum of fuss. (Unfortunately, other methods of data transfer are often necessary as well -- see below.) Programs developed at SUMEX, or transferred to SUMEX from other laboratories, are shared directly among the collaborators. Indeed, with some of the programs which have originated at UCSD and elsewhere, our off-campus collaborators frequently find it easier to use the SUMEX versions because of the interactive computing environment and ease of access. Advice, progress reports, new ideas, general information, etc. are communicated via the message and/or bulletin board facilities.
B. Interaction with Other SUMEX-AIM Projects
Our interactions with other SUMEX-AIM projects have been mostly in the form of personal contacts. We have strong ties to the MYCIN, AGE and MOLGEN projects and keep abreast of research in those areas on a regular basis through informal discussions. The SUMEX-AIM workshops provide an excellent opportunity to survey all the projects in the community. Common research themes, e.g. knowledge-based systems, as well as alternate problem-solving methodologies were particularly valuable to share.
C. Critique of Resource Services
The SUMEX facility provides a wide spectrum of computing services which are genuinely useful to our project -- message handling, file management, Interlisp, Fortran and text editors come immediately to mind. Moreover, the staff, particularly the operators, are to be commended for their willingness to help solve special problems (e.g., reading tapes) or to provide extra service (e.g., immediate retrieval of an archived file). We would also like to commend the staff for its extensive help in setting up a link between SUMEX and Dr. Langridge's group at UCSF.
Such cooperative behavior is rare in computer centers.
There are several facilities we wish to single out as particularly useful in furthering our research goals. Since the members of the project are physically distant, the MSG program is very useful. Similarly, the file system, the ARCHIVE facility, and the general ease of getting backup files from the operator greatly aid our efforts at coordinating the efforts of collaborators using many large data sets and programs. The crystallographers in the project find SUMEX to be a friendly environment which allows them to do their work with a minimum of dealing with operating system details.
It has become increasingly evident, however, that as CRYSALIS expands, the facility cannot provide enough machine cycles during prime time to support the implementation and debugging of new features. For example, our segment-labeling preprocessor requires about an hour of machine time per 100 residues of protein (this is typically five to eight hours of terminal time during working hours) even when the Lisp code is compiled.
III. USE OF SUMEX DURING THE REMAINING GRANT PERIOD (8/79 - 7/81)
A. Long-Range Goals
Our short term goals are to build up the knowledge base to the point where it can solve a small, known protein from "live" data. This will probably entail the implementation of at least seven tasks. By this point we should also have a package of data-reduction programs suitable for export to interested crystallographers.
Our long range goals are the exploitation of the rule-based control structure for investigating alternative problem-solving strategies, the investigation of modes of explanation of the program's reasoning steps, and the expansion and generalization of the system to cover a wider range of input data.
B. Justification for Continued Use of SUMEX
We feel that SUMEX is the ideal vehicle for further research on CRYSALIS.
While some of our work is numerical in nature and uses such facilities as FORTRAN, our main interest is in artificial intelligence. Besides being an expert system of use to the crystallographic community, CRYSALIS is an exploration of the general signal processing problem. We are vitally concerned with issues such as proper architecture for using a wide variety of heuristics effectively and hypothesis formation when both data and model are poor. The utility of our work to the AI community is partially demonstrated by the development of the AGE project, an extension of Ms. Nii's early work on CRYSALIS.
This project progresses by the collaboration of several physically separated groups. SUMEX provides a unique resource, an electronic community of researchers in our field, through the many systems such as net mail, country-wide access, and community workshops. We feel that CRYSALIS would not be possible outside of such a community.
C. Needs and Plans for Other Computing Resources
Our major need for other computing resources is for graphical display of our data and results. This need will be met by use of Dr. Langridge's Evans and Sutherland Picture System at UCSF and Dr. Johnson's raster-based graphics system at ORNL. The major impediment is SUMEX's current inability to support data transfer to other machines at more than 1200 baud. We are attempting to link SUMEX to UCSF by using FTP over the ARPAnet to the LBL machine and then using an existing link from LBL to UCSF.
We will make minor use of the Stanford Computer Science Department's SCORE machine, mostly to run the SCRIBE text formatting program until such time as it is available on SUMEX.
D. Recommendations for Future Community and Resource Development
There are two recommendations we wish to make. The first and most important is to expand the computing power available to SUMEX users. CRYSALIS is an inherently large problem.
Proteins contain hundreds to thousands of atoms, which means large hypothesis structures, large quantities of data, and a compute-bound inference program. As the system grows to maturity, we expect increasingly serious problems with address space limitations and with machine cycle availability.

The second recommendation is that SUMEX develop some relatively inexpensive file transfer facility for machines not on the ARPAnet. Software for this already exists in the form of the TTYFTP program (or possible future programs like it, but in a more portable language); the development needed is in hardware and in the TENEX operating system so that transfer rates greater than 1200 baud can be achieved. We are motivated to recommend this not only by our own need for such a facility, but also by the belief that it would aid other collaborations involving SUMEX and outside computers (the SECS project, for example), and aid in the dissemination of useful programs from the research setting of SUMEX to user laboratories.

II.A.1.8 RX Project

The RX Project: Deriving Medical Knowledge from Time-Oriented Clinical Databases

Robert L. Blum, M.D.
Department of Computer Science
Stanford University

Gio C. M. Wiederhold, Ph.D.
Departments of Computer Science and Electrical Engineering
Stanford University

I. SUMMARY OF RESEARCH PROGRAM

A. Technical Goals

Introduction: Medical and Computer Science Goals

The objective of the RX Project is to develop a medical information system capable of accurately deriving knowledge of the course and consequences of treatment of chronic diseases from a large collection of stored patient records.

Computerized clinical databases and automated medical records systems have been under development throughout the world for at least a decade. Among the earliest of these endeavors was the ARAMIS Project (American Rheumatism Association Medical Information System), under development at Stanford by Dr.
James Fries and his colleagues since 1969. A prototype ambulatory records system was generalized in the early 1970's by Prof. Gio Wiederhold and Stephen Weyl in the form of a Time-Oriented Database (TOD) System. The TOD System, run on the IBM 370/3033 at the Stanford Center for Information Processing (SCIP), now supports the ARAMIS Project as well as a host of other chronic disease databases which store patient data gathered at many institutions nation-wide. At the present time ARAMIS contains records of over 14,000 patients with a variety of rheumatologic diagnoses. Over 62,000 patient visits have been recorded, accounting for 50,000 patient-years of observation.

The fundamental objective of ARAMIS, the other TOD research groups, and all other clinical data bank researchers is to use the raw data which has been gathered by clinical observation in order to study the evolution and medical management of chronic diseases. Unfortunately, the process of reliably deriving knowledge from raw data has proven to be refractory to existing techniques because of problems stemming from the complexity of disease, therapy, and outcome definitions; the complexity of time relationships; complex causal relationships creating strong sources of bias; and problems of missing and outlying data.

A major objective of the RX Project is to explore the utility of symbolic computational methods and knowledge-based techniques in solving this problem of accurate knowledge inference from non-randomized, non-protocol patient records.

A central component of RX is a knowledge base of medicine and statistics, organized as a hierarchy or taxonomic tree consisting of nodes with attached data and procedures. Nodes representing diseases and therapeutic regimens contain procedures which use a variety of time-dependent predicates to label patient records in the database, facilitating the retrieval of time-intervals of interest in the records.
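As a rough illustration of the labeling step just described, the following Python sketch applies a time-dependent predicate to one patient's visits and returns the time-intervals during which the predicate holds. It is only a sketch under stated assumptions: RX itself is written in INTERLISP and the report does not give its data structures, so the record layout (`Visit`), the function `label_intervals`, the predicate `high_cholesterol`, and the 250 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Visit:
    """One patient visit (hypothetical layout, not the TOD record format)."""
    day: int                  # days since the patient's first visit
    values: Dict[str, float]  # attribute name -> recorded value


def label_intervals(
    visits: List[Visit],
    predicate: Callable[[Visit], bool],
) -> List[Tuple[int, int]]:
    """Return (start_day, end_day) spans of consecutive visits satisfying predicate."""
    intervals: List[Tuple[int, int]] = []
    start = None  # first day of the interval currently being built
    last = None   # most recent day on which the predicate held
    for visit in sorted(visits, key=lambda v: v.day):
        if predicate(visit):
            if start is None:
                start = visit.day
            last = visit.day
        elif start is not None:
            # The predicate stopped holding: close the open interval.
            intervals.append((start, last))
            start = None
    if start is not None:
        intervals.append((start, last))
    return intervals


def high_cholesterol(visit: Visit) -> bool:
    """Hypothetical predicate; a missing value is treated as not satisfying it."""
    return visit.values.get("cholesterol", 0.0) >= 250.0


record = [
    Visit(0, {"cholesterol": 210.0}),
    Visit(30, {"cholesterol": 260.0}),
    Visit(60, {"cholesterol": 275.0}),
    Visit(90, {"cholesterol": 220.0}),
    Visit(120, {"cholesterol": 255.0}),
]
intervals = label_intervals(record, high_cholesterol)
print(intervals)
```

In this toy record the predicate holds at days 30 and 60 and again at day 120, so two intervals are produced. A real predicate would presumably also need to handle gaps between visits and missing values more carefully than this sketch does.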
The database is then inverted so that each node or object in the knowledge base contains pointers to all time-intervals during which its definition is satisfied.

Nodes in the knowledge base also contain lists of other nodes which are causally related. These functional dependencies are used to infer causal pathways among nodes for purposes of selecting confounding variables which need to be controlled for in the study of a specific hypothesis. Causal pathways may also be used in an exploratory mode to assist in the discovery of new hypotheses. To support the study of a particular causal hypothesis, the knowledge base also contains information on the applicability of various statistical procedures and procedures for applying them.

B. Medical Relevance and Collaboration

As a test bed for system development our focus of attention has been on the records of patients with systemic lupus erythematosus (SLE) contained in the Stanford portion of the ARAMIS Data Bank. SLE is a chronic rheumatologic disease with a broad spectrum of manifestations which can lead to death in the third decade of life. With many perplexing diagnostic and therapeutic dilemmas, it is a disease of considerable medical interest.

In the future we anticipate possible collaborations with other project users of the TOD System such as the National Stroke Data Bank, the Northern California Oncology Group, and the Stanford Divisions of Oncology and of Radiation Therapy.

The RX Project is a new research effort, in existence for only about two years, and hence the project is still in a developmental stage. The primary issues being addressed at this stage are those concerned with the specifics of knowledge representation.

We believe that this research project is broadly applicable to the entire gamut of chronic diseases which constitute the bulk of morbidity and mortality in the United States.
Consider five major diagnostic categories which are responsible for approximately two thirds of the two million deaths per year in the United States: myocardial infarction, stroke, cancer, hypertension, and diabetes. Therapy for each of these diagnoses is fraught with controversy concerning the balance of benefits versus costs.

1) Myocardial Infarction: Indications for and efficacy of coronary artery bypass graft vs. medical management alone. Indications for long-term antiarrhythmics ... long-term anticoagulants. Benefits of cholesterol-lowering diets, exercise, etc.

2) Stroke: Efficacy of long-term anti-platelet agents, long-term anticoagulation. Indications for revascularization.

3) Cancer: Relative efficacy of radiation therapy, chemotherapy, surgical excision - singly or in combination. Optimal frequency of screening procedures. Prophylactic therapy.

4) Hypertension: Indications for therapy. Efficacy versus adverse effects of chronic antihypertensive drugs. Role of various diagnostic tests such as renal arteriography in work-up.

5) Diabetes: Influence of insulin administration on microvascular complications. Role of oral hypoglycemics.

Despite the expenditure of billions of dollars over recent years for randomized controlled trials (RCT's) designed to answer these and other questions, answers have been slow in coming. RCT's are expensive in funds and personnel. The therapeutic questions in clinical medicine are too numerous for each to be addressed by its own series of RCT's.

On the other hand, the data regularly gathered in patient records in the course of the normal performance of health care delivery is a rich and largely underutilized resource. The ease of accessibility and manipulation of these data afforded by computerized clinical data banks holds out the possibility of a major new resource for acquiring knowledge on the evolution and therapy of chronic diseases.
The goal of the research which we are pursuing on SUMEX is to increase the reliability of knowledge derived from clinical data banks with the hope of providing a new tool for augmenting knowledge of diseases and therapies as a supplement to knowledge derived from formal prospective clinical trials. Furthermore, the incorporation of knowledge from both clinical data banks and other sources into a uniform knowledge base should increase the ease of access by individual clinicians to this knowledge and thereby facilitate both the practice of medicine and the investigation of human disease processes.

C. Highlights of Research Progress

1. 1 July 1980 to 1 May 1981

Our predominant objective was to detail the overall conceptual framework for the knowledge base and to develop the extensive computational machinery necessary for retrieving, analyzing, and displaying defined time-intervals within patient records.

The RX Knowledge Base (KB): The central component of RX is a knowledge base of medicine and statistics, organized as a frame-based, taxonomic tree consisting of units with attached data and procedures. Units representing diseases and therapies contain procedures which use a variety of time-dependent predicates to label the patient records, facilitating the retrieval of time-intervals of interest in the records. Other units representing statistical techniques are used to map hypotheses onto study designs and event definitions. Implementing the algorithms and data structures of this KB was one of the major tasks of the current year.

At the current time the RX KB contains about 200 units of which 75 contain definitions and other relevant information pertaining to disease courses, effects of drugs, lab values, etc.
This information comprises a small subset of medical knowledge dealing with some of the signs and symptoms of systemic lupus erythematosus (SLE) as well as the effects and indications of some drugs used for this disease. Other units contain machine-readable knowledge of statistical techniques needed for testing entered hypotheses. There are approximately 40 time-dependent functions used to map from the database values onto defined units.

The entire RX system currently contains approximately 400 INTERLISP functions accounting for 150 disk pages of code. The KB is about 60 disk pages. One disk page = 512 words * 36 bits per word. Also one disk page = approx. 1.5 typed pages on 8.5 by 11.5 inch paper.

Statistical Interfaces: Once the relevant episodes have been defined and retrieved from the database they must be analyzed statistically. To do this we have recently adopted the IDL (Interactive Data-Analysis Language) package developed at the Xerox Palo Alto Research Center. IDL is a matrix manipulation language similar to APL and is built upon INTERLISP, as is RX itself. The use of IDL for statistical analysis confers a tremendous advantage in that analyses are now highly interactive. IDL has completely supplanted our use of SPSS.

Time-Oriented Graphics Package: This package enables data on an individual patient to be graphed over time, either linearly by visit or by calendar time with a "telescoping" capability. The program overlays graphs of both point data and data represented as episodes.

Study Editor: Dr. Jerrold Kaplan, a research associate affiliated with the project, has implemented an additional package of programs which display to the clinician user those decisions which have been made by the knowledge base concerning which statistical techniques are to be employed, which variables are to be controlled for, and which time intervals are to be excluded.
This affords the user a means of seeing a sketch of the study plan before it is executed, and enables him to modify that plan.

Clinical Study: The Effect of Prednisone on Cholesterol

As a testbed for the prototype system we have been investigating the hypothesis that the steroid, prednisone, produces a significant elevation of plasma cholesterol. To test this hypothesis, the records of 50 patients with systemic lupus erythematosus (SLE) were transferred from the ARAMIS Database to SUMEX. Of these patients, 18 were found to have five or more cholesterol determinations and to have had sufficient variance in their prednisone regimens to be testable.

The KB is used to elaborate a complex causal model for the prednisone/cholesterol hypothesis which is tested using a hierarchical multiple regression method with time-lagged values. The KB is used to determine sources of possible bias and to control for those variables in the regression or to eliminate corresponding time-intervals from records. An empirical Bayes method is used to average the estimated effects in patients with varying amounts of data. The result, a highly statistically significant elevation of cholesterol by prednisone, will be submitted for publication during the coming year.

2. Research In Progress

Much work remains to be done in expanding the system software and in expanding the knowledge base. Current work is addressed to increasing the flexibility of the time-segmentation functions and enriching the data structures which encode relationships among objects.

We are trying to make increasingly general the class of medical hypotheses which the system can analyze automatically. This requires incorporating knowledge of additional statistical methods into the KB. We are also attempting to generalize our algorithms for selecting the set of variables which may potentially confound a given hypothesis.
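The prednisone/cholesterol analysis is described here only in outline, so the following Python sketch is a loose illustration of its general shape and not the project's method. For each patient it regresses cholesterol at a visit on the prednisone dose at the previous visit, skips patients who are not testable, and then pools the per-patient slopes. The pooling weight (number of usable observation pairs) is a deliberately simplified stand-in for the empirical Bayes averaging, and a single-predictor slope stands in for the hierarchical multiple regression. All function names and all data values are invented for illustration.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]


def lagged_pairs(dose: Sequence[float], chol: Sequence[float], lag: int = 1) -> List[Pair]:
    """Pair the dose at visit i - lag with the cholesterol value at visit i."""
    return [(dose[i - lag], chol[i]) for i in range(lag, len(chol))]


def ols_slope(pairs: Sequence[Pair]) -> float:
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx


def pooled_slope(patients: Sequence[Tuple[Sequence[float], Sequence[float]]], lag: int = 1) -> float:
    """Average per-patient lagged slopes, weighting by number of usable pairs.

    Assumption: this weighting only mimics the idea of giving more influence
    to patients with more data; it is not an empirical Bayes estimator.
    """
    weighted_sum = 0.0
    total_weight = 0
    for dose, chol in patients:
        pairs = lagged_pairs(dose, chol, lag)
        # Skip untestable patients: too few points or no variation in dose.
        if len(pairs) < 2 or len({x for x, _ in pairs}) < 2:
            continue
        weighted_sum += len(pairs) * ols_slope(pairs)
        total_weight += len(pairs)
    if total_weight == 0:
        raise ValueError("no patient has enough data to estimate a slope")
    return weighted_sum / total_weight


# Invented data: (prednisone dose per visit, cholesterol per visit).
patients = [
    ([0, 10, 20, 30], [200, 200, 220, 240]),  # lagged slope 2.0, three pairs
    ([0, 20, 40], [180, 190, 210]),           # lagged slope 1.0, two pairs
    ([10, 10, 10], [200, 205, 210]),          # constant dose: skipped
]
estimate = pooled_slope(patients)
print(round(estimate, 3))
```

The skip rule loosely mirrors the selection of testable patients described above (enough cholesterol determinations and sufficient variance in the prednisone regimen). Controlling for confounding variables, which the KB handles in RX, is omitted entirely here.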
As a means for testing and expanding the system's capabilities we intend to perform several specific studies of importance in the management of the rheumatic diseases. Our study of the effect of prednisone on cholesterol was mentioned above. Other studies now being planned include the effect of chronic aspirin ingestion on liver function in rheumatoid arthritis, the specific incidence of infectious complications of steroids as a function of dose and duration, and the utility of various autoantibodies in the prediction of flares of SLE as compared to the utility of other indicators.

Finally, we are developing a methodology for discovering hypotheses of interest in the database using a heuristically guided search of non-parametric lagged cross correlations.

D. Publications

Blum, Robert L.: Discovery and Representation of Causal Relationships from a Time-Oriented Clinical Database: The RX Project. Stanford University Doctoral Dissertation (in press), 1981

Blum, Robert L.: Displaying Clinical Data from a Time-Oriented Database. (in press) Computers in Biology and Medicine, 1981

Wiederhold, Gio: Databases in Healthcare. Compendium Series on Technology in Healthcare, sponsored by the Healthcare Technology Center, Univ. of Missouri, Columbia, Mo., also available as Stanford CS Report 80-790

Blum, Robert L.: Automating the Study of Clinical Hypotheses on a Time-Oriented Data Base: The RX Project. Submitted for publication to MEDINFO80, Tokyo, Japan, Oct. 1980

Blum, Robert L., Wiederhold, Gio: Inferring Knowledge from Clinical Data Banks Utilizing Techniques from Artificial Intelligence. Proc. of The 2nd Annual Symp. on Computer Applications in Medical Care, pp. 303 to 307, IEEE, Washington, D.C., November 5-9, 1978

E. Funding Support Status

1) A Computer-Based System for Advising Physicians on Clinical Therapeutics
Robert L. Blum, M.D.: Awardee
Post-Doctoral Research Fellowship in Clinical Pharmacology
Pharmaceutical Manufacturers' Association Foundation
Total award: $32,500 (direct)
Term: July 1, 1978 to June 30, 1980

2) Integrating Medical Knowledge and Clinical Data Banks
Robert L. Blum, M.D.: Principal Investigator
National Library of Medicine, New Investigator Award
Total award: $90,000 (direct)
Term: July 1, 1979 to June 30, 1982

3) Integrating Medical Knowledge and Clinical Data Banks
Gio C. M. Wiederhold, Ph.D.: Principal Investigator
National Center for Health Services Research, Small Grants
Total award: $35,000 (direct)
Term: April 1, 1979 to March 31, 1981

II. INTERACTIONS WITH THE SUMEX-AIM RESOURCE

A. Collaborations

Since our project is relatively new, we do not yet have public versions of the programs. There is, however, a large sphere of collaboration which we expect in the future. Once the RX program is developed, we would anticipate collaboration with all of the ARAMIS project sites in the further development of a knowledge base pertaining to the chronic arthritides. The ARAMIS Project at SCIP is used by a number of institutions around the country via commercial leased lines to store and process their data. These institutions include the University of California School of Medicine, San Francisco and Los Angeles; The Phoenix Arthritis Center, Phoenix; The University of Cincinnati School of Medicine; The University of Pittsburgh School of Medicine; Kansas University; and The University of Saskatchewan. All of the rheumatologists at these sites have closely collaborated in the development of ARAMIS, and their interest in and use of the RX project is anticipated.
We hasten to mention that we do not expect SUMEX to support the active use of RX as an on-going service to this extensive network of arthritis centers, but we would like to be able to allow the national centers to participate in the development of the arthritis knowledge base and to test that knowledge base on their own clinical data banks.

B. Interactions with Other SUMEX-AIM Projects

Several of the concepts incorporated into the design of the RX Project have been inspired by other SUMEX-AIM Projects. The RX knowledge base is similar to the Units Package of the MOLGEN Project. The production rule inference mechanism used by us is similar to that in the MYCIN Project. Several programs developed by the MYCIN group are regularly used by RX. These include disk hash file facilities, text editing facilities, and miscellaneous LISP functions. Regular communication on programming details is facilitated by the on-line mail system.

C. Critique of Resource Management

The SUMEX KI-10 has been severely overloaded for at least a year. Working in LISP is impossible during the day and is even difficult at times which were formerly low-utilization times. This has forced us to rely increasingly on other local computation facilities. The SUMEX resource management, per se, has always been accessible and cooperative in trying to provide our project with adequate resources subject to prevailing constraints.

III. RESEARCH PLANS

A. Project Goals and Plans

The overall goal of the RX Project is to develop a computerized medical information system capable of accurately extracting medical knowledge pertaining to the therapy and evolution of chronic diseases from a database consisting of a collection of stored patient records.

1. Short-Term Goals

Goals for the year August, 1980 to July, 1981 have been detailed in section I.C above on research in progress.
To summarize that section, our main short-term goal is to generalize and refine our methods for labeling and retrieving time-intervals or episodes from individual patient records and to generalize the class of hypotheses which the system is capable of analyzing. This requires further refinements in RX's algorithms for choosing and controlling for variables which may potentially confound an hypothesis of interest.

2. Long-Range Goals: August, 1981 to July, 1986

There are two inter-related long-range goals of the RX Project: 1) automatic discovery of knowledge in a large time-oriented database and 2) provision of assistance to a clinician who is interested in testing a specific hypothesis. These tasks overlap to the extent that some of the algorithms used for discovery are also used in the process of testing an hypothesis. We hope to make these algorithms sufficiently robust that they will work over a broad range of hypotheses and over a broad spectrum of data distributions in the patient records.

B. Justification for Continued Use of SUMEX

Computerized clinical data banks possess great potential as tools for assessing the efficacy of new diagnostic and therapeutic modalities, for monitoring the quality of health care delivery, and for support of basic medical research. Because of this potential, many clinical data banks have recently been developed throughout the United States. However, once the initial problems of data acquisition, storage, and retrieval have been dealt with, there remains a set of complex problems inherent in the task of accurately inferring medical knowledge from a collection of observations in patient records. These problems concern the complexity of disease and outcome definitions, the complexity of time relationships, potential biases in compared subsets, and missing and outlying data.
The major problem of medical data banking is in the reliable inference of medical knowledge from primary observational data. We see in the RX Project a method of solution to this problem through the utilization of knowledge engineering techniques from artificial intelligence.

The RX Project, in providing this solution, will provide an important conceptual and technologic link to a large community of medical research groups involved in the treatment and study of the chronic arthritides throughout the United States and Canada, who are presently using the ARAMIS Data Bank through the SCIP facility via TELENET.

Beyond the arthritis centers which we have mentioned in this report, the TOD (Time-Oriented Data Base) User Group involves a broad range of university and community medical institutions involved in the treatment of cancer, stroke, cardiovascular disease, nephrologic disease, and others. Through the RX Project, the opportunity will be provided to foster national collaborations with these research groups and to provide a major arena in which to demonstrate the utility of artificial intelligence to clinical medicine.

SUMEX as a Resource:

To discuss SUMEX as a resource for program development, one need only compare it to the environment provided by our other resource, the IBM 370/3033 installation at SCIP - the major computing resource at Stanford. Of the programs which we use daily on SUMEX - INTERLISP, MSG, TVEDIT, BBO, LINK - there is nothing even approaching equivalence on the 370, despite its huge user community. These programs greatly facilitate communication with other researchers in the SUMEX community, documentation of our programs, and the rapid interactive development of the programs themselves. The development at the SCIP facility of a program as large and complex as RX, involving extensive symbolic processing, would require a staff many times as large as ours.
The SUMEX environment greatly increases the productive potential of a research group such as ours to the point where a large project like RX becomes feasible.

Computation Resources Required by RX:

Disk Allocation: RX requires the use of two large data files which need to be kept on-line: the patient database (DB) and the knowledge base (KB). In the course of testing a hypothesis several other files are used: inverted files, source files for statistical processing, LISP SYSOUT files, etc. Our current total disk allocation of 1500 pages for all RX group members has been just adequate. In the future, with anticipated expansions in numbers of patients and size of the KB, we intend to request an increase of our total allocation to 2000 pages.

C. Other Computational Resources

It is clear that the scope of potential application of the RX Project is large. Within the term of the SUMEX-AIM grant projected through July, 1986, we anticipate the involvement of several of the national ARAMIS collaborating institutions in developing and testing arthritis knowledge bases which reflect their own patient populations and therapeutic biases. The current SUMEX machine configuration will not be able to support this national interaction because the central processors of the KI-10 are already taxed to the limit. Ours is among the SUMEX groups which would greatly benefit by the addition of one or more PDP-10 compatible machines, which could provide support to our anticipated national user community.

Another resource which would be highly desirable is a faster and more reliable means for transferring data interactively between SUMEX and the SCIP IBM 370. Our current method utilizes a 2400 baud line with transmission from SCIP to SUMEX only, and is fraught with a high error rate. The addition of a reliable local network facility would greatly facilitate our ability to transfer patient files from SCIP to SUMEX.

D. Recommendations for Resource Development

SUMEX is heavily loaded every day and almost every evening. Program research is next to impossible during those periods. Program development would be greatly facilitated by the addition of any resources which lessened this loading: upgrading the current machine to a KL or adding core to decrease page swapping.

II.A.2 National AIM Projects

The following group of projects is formally approved for access to the AIM aliquot of the SUMEX-AIM resource or the Rutgers-AIM resource. Their access is based on review by the AIM Advisory Group and approval by the AIM Executive Committee.

II.A.2.1 Acquisition of Cognitive Procedures (ACT)

Acquisition of Cognitive Procedures (ACT)

Dr. John Anderson
Carnegie-Mellon University

I. SUMMARY OF RESEARCH PROGRAM

A. Project Rationale

To develop a production system that will serve as an interpreter of the active portion of an associative network. To model a range of cognitive tasks including memory tasks, inferential reasoning, language processing, and problem solving. To develop an induction system capable of acquiring cognitive procedures with a special emphasis on language acquisition and problem-solving skills.

B. Medical Relevance and Collaboration

1. The ACT model is a general model of cognition. It provides a useful model of the development of and performance of the sorts of decision making that occur in medicine.

2. The ACT model also represents basic work in AI. It is in part an attempt to develop a self-organizing intelligent system. As such it is relevant to the goal of development of intelligent artificial aids in medicine.

We have been evolving a collaborative relationship with James Greeno and Allan Lesgold at the University of Pittsburgh. They are applying ACT to modeling the acquisition of reading and problem solving skills.
We have made ACT a guest system within SUMEX. ACT is currently at the state where it can be shipped to other INTERLISP facilities. We have received a number of inquiries about the ACT system. ACT is a system in a continual state of development, but we periodically freeze versions of ACT which we maintain and make available to the national AI community.

C. Highlights of Research Progress

Our ACTF system is a production system that operates in a semantic network data base. Our learning work has been focused on ways of increasing the power of production systems for performing various tasks.

One class of learning mechanisms concerns what we call knowledge compilation. This involves automatic mechanisms for creating productions that directly perform behavior that formerly required interpretative processing of knowledge in the semantic network. These compilation mechanisms also model the process by which human experts develop special purpose procedures to deal with the different types of problems that occur in their domain of expertise.

Another class of learning mechanisms is concerned with tuning existing procedures so that they apply more appropriately. There are various mechanisms concerned with extending or generalizing the range of application of a procedure. In the past year we have been working at reducing these different generalization processes to a common partial matching process. In addition to generalization, tuning occurs in the ACTF system by means of discrimination and composition. Discrimination is a process for restricting the range of applicability of a production. Composition attempts to build macro-operators out of a series of productions.

The third direction of our learning work has been concerned with developing a flexible strength-based set of conflict resolution rules.
Here we are concerned with modelling the gradual improvement seen in human cognitive skills and also providing the system with the resilience so that it can recover from noise and changes in environmental contingencies.

We have been applying this theory in detail to a simulation of how students acquire proof skills in geometry. We have a more or less thorough analysis of how students learn new postulates of geometry; how they initially use these postulates in an interpretative fashion, integrating them with prior knowledge; how they compile special purpose procedures that directly apply this knowledge to proof generation; and how these procedures become tuned with practice. This application has provided strong evidence for most of the learning developments in the ACT system. It has also forced us to develop formalisms for how planning and problem-solving should be structured within a production-system framework.

D. List of Project Publications

[1] Anderson, J.R. Language, Memory, and Thought. Hillsdale, N.J.: L. Erlbaum Assoc., 1976.

[2] Kline, P.J. & Anderson, J.R. The ACTE User's Manual, 1976.

[3] Anderson, J.R., Kline, P. & Lewis, C. Language processing by production systems. In P. Carpenter and M. Just (Eds.). Cognitive Processes in Comprehension, L. Erlbaum Assoc., 1977.

[4] Anderson, J.R. Induction of augmented transition networks. Cognitive Science, 1977, 125-157.

[5] Anderson, J.R. & Kline, P. Design of a production system. Paper presented at the Workshop on Pattern-Directed Inference Systems, Hawaii, May 23-27, 1977.

[6] Anderson, J.R. Computer simulation of a language acquisition system: A second report. In D. LaBerge and S.J. Samuels (Eds.). Perception and Comprehension. Hillsdale, N.J.: L. Erlbaum Assoc., 1978.

[7] Anderson, J.R., Kline, P.J., & Beasley, C.M. A theory of the acquisition of cognitive skills. In G.H. Bower (Ed.). Learning and Motivation, Vol. 13. New York: Academic Press, 1979.

[8] Anderson, J.R., Kline, P.J., & Beasley, C.M. Complex Learning. In R. Snow, P.A. Frederico, & W. Montague (Eds.). Aptitude, Learning, and Instruction: Cognitive Processes Analyses. Hillsdale, N.J.: Lawrence Erlbaum Assoc., 1980.

[9] Anderson, J.R. & Kline, P.J. A learning system and its psychological implications. In the Proceedings of the Sixth International Joint Conference on Artificial Intelligence, 1979.

[10] Reder, L.M. & Anderson, J.R. Use of thematic information to speed search of semantic nets. Proceedings of the Sixth International Joint Conference on Artificial Intelligence, 1979, 708-710.

[11] Neves, D.M. & Anderson, J.R. Knowledge compilation: Mechanisms for the automatization of cognitive skills. In J.R. Anderson (Ed.), Cognitive Skills and their Acquisition. Hillsdale, N.J.: Lawrence Erlbaum Associates, 1981.

[12] Anderson, J.R., Greeno, J.G., Kline, P.J., & Neves, D.M. Acquisition of Problem-solving skill. In J.R. Anderson (Ed.), Cognitive Skills and their Acquisition. Hillsdale, N.J.: Lawrence Erlbaum Associates, 1981.

E. Funding Support

An information-processing analysis of learning in geometry
John R. Anderson, Principal Investigator
National Science Foundation (IST-80-15357)
$186,000
Feb 15, 1981 - Feb 15, 1984

II. INTERACTION WITH THE SUMEX-AIM RESOURCE

A. Collaborations, Interactions, and Sharing of Programs via SUMEX

We have received and answered many inquiries about the ACT system over the ARPANET. This involves sending documentation, papers, and copies of programs.

The most extensive collaboration has been with Greeno and Lesgold, who are also on SUMEX (see the report of the Simulation of Comprehension Processes project). There is an ongoing effort to assist them in their research. Feedback from their work is helping us with system design.
We find the SUMEX-AIM workshops (those that we could manage to attend) ideal vehicles for updating ourselves on the field and for talking with colleagues about aspects of their work that are important to us. Because of the memory space problems encountered by ACT, we expect that we will soon need to make use of the smaller version of INTERLISP developed at SUMEX for use in the CONGEN program.

B. Critique of Resource Management

The SUMEX-AIM resource has been well suited to the needs of our project. We have made the most extensive use of the INTERLISP facilities and the facilities for communication on the ARPANET. We have found the SUMEX personnel extremely helpful, both in responding to our immediate emergencies and in providing advice helpful to the long-range progress of the project. Despite the fact that we are not located at Stanford, we have not encountered any serious difficulties in using the SUMEX system; in fact, there are real advantages in being in the Eastern time zone, where we can take advantage of the low load on the system during the morning hours. We have been able to get a great deal of work done during these hours and try to save our computer-intensive work for this time. Two location changes by the ACT project (from Michigan to Yale in the summer of 1976 and from Yale to Carnegie-Mellon in the summer of 1978) have demonstrated another advantage of working on SUMEX: in both cases we were back to work on SUMEX the day after our arrival.

III. RESEARCH PLANS (8/80-7/86)

A. Project Goals and Plans

Our long-range goals are: (1) continued development of the ACT system; (2) application of the system to modeling of various cognitive processes; (3) dissemination of the ACT system to the national AI community. This is a period of major evolution for the ACT theory.
We have been developing three special versions of the ACTF learning system that allow us to simulate learning more efficiently in three domains: proving theorems in geometry, speaking a new language, and writing programs in LISP. We are also performing special purpose simulations of the processes of spreading activation in memory retrieval and of pattern-matching processes in reading. We will be assimilating our experiences with these special purpose simulations in putting forth a major revision of the ACT theory. A research monograph setting forth this theory is being written and is scheduled for completion in late 1982. After this monograph is written we intend to create ACTG, a successor to ACTF that will embody the new conceptions.

B. Justification for Continued Use of SUMEX

Our goal for the ACT system is that it should serve as a ready-made "programming language" available to members of the cognitive science community for assembling psychologically-accurate simulations of a wide range of cognitive processes. Our intention and ability to provide such a resource justifies our use of the SUMEX facility. This facility is designed expressly for the purpose of developing and supporting such national AI resources and is, in this regard, clearly superior to the facilities we have available locally from the Carnegie-Mellon computer science department. Among the most important SUMEX advantages are the availability of INTERLISP on a machine accessible by either the ARPANET or TYMNET and the existence of a GUEST login. It appears that, at least for the time being, ACT has no hope of being a national resource unless it resides at SUMEX and, given the local unavailability of a network-accessible INTERLISP, it would even be very difficult to shift any significant portion of our development work from SUMEX to CMU.

C. Needs and Plans for Other Computational Resources

Carnegie-Mellon's plans to begin upgrading its PDP-10 hardware to emerging state-of-the-art machines (VAX, LISP machines, etc.) promise to provide an excellent resource eventually, and we hope to have access to that resource as it develops. However, given that a considerable amount of software development will be required, a sophisticated LISP system such as INTERLISP is not likely to be available on this hardware in the near future.

D. Comments and Suggestions for Future Resource Goals

We are beginning to feel squeezed by various limitations of the SUMEX facility. The problem of peak load is quite serious. We have also been struggling with the address limitations of the current INTERLISP, which are made more grievous by the amount of space INTERLISP requires. The computation time and address space limitations have meant that we have not been able to pursue certain projects that we would otherwise have pursued. We applaud any efforts to increase computational power, to increase the address space of INTERLISP (e.g. VAXes), or to create significantly more space-efficient versions of INTERLISP.

II.A.2.2 CADUCEUS Project (INTERNIST)

CADUCEUS Project (*)

J. D. Myers, M.D. and H. Pople, Ph.D.
University of Pittsburgh
Pittsburgh, Pennsylvania

I. SUMMARY OF RESEARCH PROGRAM

A. Medical Rationale

The principal objective of this project is the development of a high-level computer diagnostic program in the broad field of internal medicine as an aid in the solution of complex and complicated diagnostic problems. To be effective, the program must be capable of multiple diagnoses (related or independent) in a given patient.
A major achievement of this research undertaking has been the design of a program called INTERNIST-I, along with an extensive medical data base now encompassing over 500 diseases and some 3450 individual manifestations of disease. Although this consultative program is designed primarily to aid skilled internists in complicated medical problems, the program may have spin-off as a diagnostic and triage aid to physician's assistants, rural health clinics, military medicine and space travel.

Development of the INTERNIST-I system was begun about ten years ago. The system was successfully demonstrated for the first time in 1974 and has been used since that time in the analysis of hundreds of clinical problems.

A major point of departure for the design of the original INTERNIST program was the realization that the task of clinical decision making in internal medicine is an ill-structured problem. In other domains, the task of diagnosis is often viewed as one of pattern recognition or discrimination: there is available a predefined collection of possible classifications (characterizing disease entities or clinical states), one and only one of which is considered possible in the case being studied. A diagnostic problem solver dealing with such a well structured domain has the fairly straightforward task of selecting the one alternative from this fixed set that best fits the facts of the case. Many statistical, pattern recognition, and algorithmic techniques have been employed successfully in performing computer aided diagnosis in these well structured clinical problem domains.

(*) For a variety of reasons, including a request from an agency alleging a prior claim on the name, future generations of the diagnostic program originally called INTERNIST will subsequently be referred to as CADUCEUS. This universal symbol of the medical profession seems appropriate to the expanded role we see for this type of program in the years to come. To avoid confusion in this report, the original program will continue to be called INTERNIST-I while references to the successor system, originally called INTERNIST-II, will now employ the new name.

Primarily because complex cases often involve two or more concurrently active disease processes, no set of exhaustive and mutually exclusive classifications can be developed to structure the diagnostic problem in internal medicine. In principle, it might be argued that this more complex problem domain could be reduced to a simple discrimination task if, in addition to the individual disease entities, one includes appropriate multiple disease complexes in the set of allowable patient descriptors. However, since our experience indicates that as many as ten or twelve individual descriptors may apply in a complex clinical problem, and considering that there are a thousand or more individual descriptors of interest in internal medicine, recording explicitly all possible multiple disease classifications is clearly infeasible.

Our thesis is that, in the absence of explicit structure derived from the problem domain, the successful clinician engages in heuristic imposition of structure so that effective problem solving strategies might be selected and employed for decision making relative to the postulated problem structure.

In INTERNIST-I, this concept of heuristic imposition of structure is expressed primarily by means of a novel "problem-formation" heuristic. In effect, the program composes dynamically, on the basis of evidence provided, what in context constitutes a presumed exhaustive and mutually exclusive subset of disease entities that can explain, more or less equally well, some significant subset of the observed findings in a clinical case.
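To put rough numbers on the infeasibility argument above, the following minimal sketch counts how many distinct multiple-disease classifications would have to be recorded explicitly. It uses only the figures quoted in the text (about a thousand descriptors, up to twelve applying in one case); the function name and the simplifying assumption that any combination of descriptors is admissible are ours, not the project's.

```python
from math import comb


def count_classifications(n_descriptors: int, max_per_case: int) -> int:
    """Count non-empty descriptor sets containing at most max_per_case members.

    Assumption (for illustration only): every combination of descriptors is a
    legal classification, so the count is a sum of binomial coefficients.
    """
    return sum(comb(n_descriptors, k) for k in range(1, max_per_case + 1))


if __name__ == "__main__":
    # Figures taken from the text: ~1000 descriptors, up to 12 per complex case.
    total = count_classifications(1000, 12)
    print(f"explicit classifications needed: about {total:.2e}")
```

Even if most combinations were ruled out on medical grounds, a count of this order makes it clear why the classifications must be composed dynamically rather than enumerated in advance.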
This heuristic problem structuring procedure is invoked repeatedly during the course of a diagnostic consultation in order to deal sequentially with the component parts of a complex clinical problem.

Because this program is intended to serve a consulting role in medical diagnosis, it has been challenged with a wide variety of difficult clinical problems: cases published in the medical journals, CPCs, and other interesting and unusual problems arising in the local teaching hospitals. In the great majority of these test cases, the problem-formation strategy of INTERNIST-I has proved to be effective in sorting out the pieces of the puzzle and coming to a correct diagnosis, involving in some cases as many as a dozen disease entities.

On the basis of this extensive test of the initial INTERNIST-I system, it has become clear that many aspects of the system's performance could be significantly enhanced if it were possible to deal with the various component problems and their interrelationships simultaneously. This has led to the design of CADUCEUS, a system embodying strategies of concurrent problem-formation which we expect will yield more rapid convergence to the correct diagnosis in many cases, and in at least some cases provide more acceptable diagnostic behavior.

B. Medical Relevance and Collaboration

The program inherently has direct and substantial medical relevance.

The institution of collaborative studies with other institutions has been deferred pending completion of the programs and knowledge base enhancements required for CADUCEUS. The installation of our own dedicated VAX computer, expected this summer, will considerably aid future collaboration.

C. Highlights of Research Progress

Accomplishments This Past Year:

a) Prototypic computer programs have been written to operate CADUCEUS in the new diagnostic mode.
The entire medical data base for the liver and biliary tract diseases has been reorganized into a form compatible with and utilizable by the CADUCEUS programs. Implementation of this work is pending the installation of the VAX computer, at which point all of the programs must be written or rewritten in the FRANZ-LISP language.

b) The medical knowledge base, now comprising just over 500 individual diseases, some 3450 manifestations of disease, and hundreds of thousands of individual medical "facts," has been accumulating for the past eight years. Much effort has been spent during the past year in updating several dozen diseases, most of which had been profiled years ago, and in establishing uniformity and consistency in this vast knowledge base. In addition, 17 new diseases have been profiled. The pediatric knowledge base has been expanded and now includes 78 diseases.

c) INTERNIST up to this time has been deficient in anatomic knowledge, particularly in topographical anatomy and anatomic laterality. An anatomic knowledge base beginning with neuroanatomy (the most complex) is being built for later incorporation into CADUCEUS. The knowledge base for the peripheral nervous system and the spinal cord is largely completed. The topographical anatomy of the abdomen and thorax is partially completed.

Research in Progress:

There are five major components to the continuation of this research project:

1) The completion, continued updating, refinement and testing of the extensive medical knowledge base required for the operation of INTERNIST-I.

2) The completion and implementation of the improved diagnostic consulting program, CADUCEUS, which has been designed to overcome certain performance problems identified during the past five years' experience with the original INTERNIST-I program.

3) Institution of field trials of CADUCEUS on the clinical services in internal medicine at the Health Center of the University of Pittsburgh.

4) Expansion of the clinical field trials to other university health centers which have expressed interest in working with the system.

5) Adaptation of the diagnostic program and data base of CADUCEUS to subserve educational purposes and the evaluation of clinical performance and competence.

Current activity is devoted mainly to the first two of these, namely, the continued development of the medical knowledge base, and the implementation of the improved diagnostic consulting program (CADUCEUS).

The development of the anatomic knowledge base is mentioned above. Doctor Gordon Banks, a skilled neurologist who also has a Ph.D. in physics and considerable experience in computing, will be joining the team as of July 1, 1981 and will provide manpower and expertise for the further development of the sizeable and important neurological component of the medical knowledge base and its manipulation by the CADUCEUS programs.

D. List of Relevant Publications

1. Pople, H.E. "The Formation of Composite Hypotheses in Diagnostic Problem Solving: An Exercise in Synthetic Reasoning", Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Boston, August 1977.

2. Pople, H.E. "On the Knowledge Acquisition Process in Applied A.I. Systems", Report of Panel on Applications of A.I., Proceedings of Fifth International Joint Conference on Artificial Intelligence, 1977.

3. Pople, H.E., Myers, J. D. & Miller, R.A. "The DIALOG Model of Diagnostic Logic and its Use in Internal Medicine", Proceedings of the Fourth International Joint Conference on Artificial Intelligence, Tbilisi, USSR, September 1975.

4. Pople, H.E. "Artificial Intelligence Approaches to Computer-Based Medical Consultation", Proceedings IEEE Intercon, New York, 1975.

5. Myers, J. D. "The Process of Clinical Diagnosis and Its Adaptation to the Computer," in The Logic of Discovery and Diagnosis in Medicine, University of Pittsburgh Series in the Philosophy and History of Science, University of California Press (in press).

6. Myers, J. D., Pople, H. E. & Miller, R. A. "INTERNIST: Can Artificial Intelligence Help?" in Clinical Decision and Laboratory Use, University of Minnesota Press (in press).

7. Pople, H. E. "Coming to Grips with the Multiple Diagnosis Problem," in Computer-Assisted Decision Making Using Clinical and Paraclinical (Laboratory) Data. B. Statland & S. Bauer (eds.) Mediad Inc., Tarrytown, N. Y., 1980, pp. 81-88. Reprinted in The Logic of Discovery and Diagnosis in Medicine, University of Pittsburgh Series in the Philosophy and History of Science, University of California Press (in press).

8. Pople, H. E. "Heuristic Methods for Imposing Structure on Ill-Structured Problems: The Structuring of Medical Diagnostics," in Artificial Intelligence in Medicine, AAAS Symposium Series, Westview Press (forthcoming 1981).

E. Funding Support

1. Clinical Decision Systems Research Resource
Harry E. Pople, Jr., Ph.D., Associate Professor of Business
Jack D. Myers, M.D., University Professor (Medicine)
University of Pittsburgh
Division of Research Resources, National Institutes of Health
2 R24 RR01101-04
07/01/80 - 06/30/85: $1,607,717
07/01/80 - 06/30/81: $465,199

2. INTERNIST: A Computer-Based Diagnostic Consultant
Harry E. Pople, Jr., Ph.D., Associate Professor of Business
Jack D. Myers, M.D., University Professor (Medicine)
University of Pittsburgh
National Library of Medicine, National Institutes of Health
1 R01 LM03710-01
07/01/80 - 06/30/85: $817,884
07/01/80 - 06/30/81: $148,458

3. New Computer-Based Patient Case Simulator
Randolph A. Miller, M.D., Associate Professor of Medicine
University of Pittsburgh
National Library of Medicine - New Investigator, National Institutes of Health
1 R23 LM03589-01
07/01/80 - 06/30/83: $89,350
07/01/80 - 06/30/81: $32,750

II. INTERACTIONS WITH THE SUMEX-AIM RESOURCE

A., B. Collaborations and Medical Use of Program Via SUMEX

CADUCEUS remains in a stage of research and development. As noted above, we are continuing to develop better computer programs to operate the diagnostic system, and the knowledge base cannot be used very effectively for collaborative purposes until it has reached a critical stage of completion. These factors have stifled collaboration via SUMEX up to this point and will continue to do so for the next year or two. In the meanwhile, through the SUMEX community there continues to be an exchange of information and states of progress. Such interactions particularly take place at the annual AIM Workshop.

C. Critique of Resource Management

SUMEX has been an excellent resource for the development of CADUCEUS. Our large program is handled efficiently, effectively and accurately. The staff at SUMEX have been uniformly supportive, cooperative, and innovative in connection with our project's needs.

III. RESEARCH PLANS (7/81-6/86)

A. Project Goals and Plans

The prototype CADUCEUS programs and the trial reorganization of the liver and biliary tract diseases will be installed on the VAX over the summer and fall of this year. As rapidly as possible, and pending further refinement and reorganization from experience with the new system, the remainder of the medical knowledge base will be entered. Local and later collaborative field trials must necessarily be postponed until this development has been accomplished.

At least 200 important medical diseases remain to be programmed. Renewed effort in this direction is being expended now that other tasks have been surmounted.
Expanded efforts in the fields of neurology and pediatrics are included as described above.

B. Justification and Requirements for Continued SUMEX Use

Our use of SUMEX will obviously decline upon the installation of our VAX. Nevertheless, the excellent facilities of SUMEX are expected to be used for certain developmental work. It is intended, further, to keep INTERNIST-I at SUMEX for comparative use as CADUCEUS is developed here. Our team hopes to remain a component of the SUMEX community and to share experiences and developments.

C. Needs and Plans for Other Computing Resources beyond SUMEX-AIM

Our predictable needs in this area will be met by the dedicated VAX computer soon to be installed.

D. Recommendations for Future Community and Resource Development

Whether a program like CADUCEUS, when mature, will be better operated from centralized, larger computers or from the developing self-contained personal computer is difficult to predict. For the foreseeable future it would seem that centralized, advanced facilities like SUMEX will be important in further program development and refinement.

II.A.2.3 Hierarchical Models of Human Cognition

Hierarchical Models of Human Cognition (CLIPR Project)

Walter Kintsch and Peter G. Polson
University of Colorado
Boulder, Colorado

I. SUMMARY OF RESEARCH PROGRAM

A. Project Rationale

The two CLIPR projects have made substantial progress in their research in this past year. This progress is almost completely due to our access to the SUMEX facility. The prose comprehension group has completed one major project, and is currently interacting with other SUMEX projects with the goal of building a prose comprehension model that reflects state-of-the-art knowledge from psychology and artificial intelligence.
The main activity of the planning group during the last year has been the detailed analysis of thinking-out-loud protocols collected from both expert and novice software designers. SUMEX facilities have been used to store, edit, and reformat the raw protocols to facilitate later analysis. Results of successive analyses are then input to SUMEX, and SUMEX facilities are used to collate the various results.

Technical Goals:

The CLIPR project consists of two subprojects. The first, the text comprehension project, is headed by Walter Kintsch and is a continuation of work on understanding of connected discourse that has been underway in Kintsch's laboratory for over seven years. The second, the planning project, is headed by Peter Polson of the University of Colorado and Michael Atwood of Science Applications Incorporated, Denver, and is studying the processes of planning using software design tasks.

The goal of the prose comprehension project is to develop a computer system capable of the meaningful processing of prose. This work has been generally guided by the prose comprehension model discussed by Kintsch and van Dijk (1978), although our programming efforts have identified necessary clarifications and modifications in that model (Miller & Kintsch, 1980a). Our more recent research (Miller & Kintsch, 1980b) has emphasized the importance of knowledge and knowledge-based processes in comprehension, and we are accordingly working with the AGE and UNITS groups at SUMEX toward the development of a knowledge-based, blackboard model of prose comprehension. We hope to be able to merge the substantial artificial intelligence research on these systems with psychological interpretations of prose comprehension, resulting in a computational model that is also psychologically respectable.

The primary goal of the planning project is the development of a model of human performance on software design tasks.
We intend to begin by modeling protocols of experts solving a particular problem, eventually extending the model to other levels of experience and other problems. We propose a two-pronged attack on the process of developing a model. The first is to develop a deeper understanding of our protocol data, to increase our knowledge of the details of the planning processes and the knowledge structures that experts use in the process of planning. We have developed a method of protocol analysis that essentially involves transforming the protocol into a low-level theoretical description of the processes used to solve the design problem. We have assumed a very simplified version of a blackboard model that is described in Atwood and Jeffries (1980). We currently carry out our analysis by hand, developing a form of this low-level model for each protocol. However, many of the activities involved in developing this model are clerical in nature and involve the categorization of segments of a verbal protocol and then the reorganization of the categorized information. Much of this work can be automated, and we propose to develop a program that will facilitate our protocol analysis and the development of the low-level models that we use to describe the behavior of individual subjects.

Our second and much longer term objective is the development of a substantive model in AGE that can simulate the design processes. We feel that the software tools that are being developed at SUMEX -- in particular AGE and the UNITS package -- will dramatically facilitate our ability to develop this substantive model. Furthermore, current theoretical ideas about both the process of design and the representation of knowledge involved in developing a design have been strongly influenced by the MOLGEN project at SUMEX (Stefik, 1980).

B. Medical Relevance and Collaboration

The text comprehension project impacts indirectly on medicine, as the medical profession is no stranger to the problems of the information glut. Research on how people understand prose can only help medicine, both by adding to the work on how computer systems might understand and summarize texts and by determining ways in which the readability of texts can be improved. Development of a more thorough understanding of the various processes responsible for different types of learning problems in children, and the corresponding development of a successful remediation strategy, would also be facilitated by an explicit theory of the normal comprehension process.

Note that our goal of a blackboard model is particularly relevant to the understanding of learning difficulties. One important aspect of a blackboard model is the separation of cognitive processes into a set of interacting subprocesses. Once such subprocesses have been identified and constructed, it would be instructive to observe the model's performance when certain of these processes are facilitated or inhibited. Many researchers have shown that there are a variety of cognitive deficits (insufficient short-term memory capacity, poor long-term memory retrieval, and such) that can lead to reading problems. Having a blackboard model in which the power of individual components could be manipulated would be a significant step in determining the nature of such reading problems.

The planning project is attempting to gain understanding of the cognitive mechanisms involved in design and planning tasks. The knowledge gained in such research should be directly relevant to a better understanding of the processes involved in medical policy making and in the design of complex experiments.
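The idea of facilitating or inhibiting individual subprocesses in a blackboard model, raised above in connection with reading problems, can be made concrete with a small sketch. The Python below is illustrative only and is not drawn from AGE, UNITS, or the project's own model: the knowledge-source names (segment, buffer, summarize), the capacity parameter standing in for short-term memory, and the first-applicable control rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Blackboard = Dict[str, Any]


@dataclass
class KnowledgeSource:
    """One subprocess that reads from and writes to the shared blackboard."""
    name: str
    can_run: Callable[[Blackboard], bool]
    run: Callable[[Blackboard], None]
    enabled: bool = True  # set to False to "inhibit" this subprocess


def run_blackboard(sources: List[KnowledgeSource], bb: Blackboard,
                   max_cycles: int = 20) -> List[str]:
    """Each cycle, fire the first enabled source whose precondition holds."""
    trace: List[str] = []
    for _ in range(max_cycles):
        ready = [s for s in sources if s.enabled and s.can_run(bb)]
        if not ready:
            break
        ready[0].run(bb)
        trace.append(ready[0].name)
    return trace


def _segment(bb: Blackboard) -> None:
    bb["words"] = bb["text"].split()


def _buffer(bb: Blackboard) -> None:
    # Hypothetical short-term store: keep only the last `capacity` words.
    bb["held"] = bb["words"][-bb["capacity"]:]


def _summarize(bb: Blackboard) -> None:
    bb["gist"] = " ".join(bb["held"])


def make_sources() -> List[KnowledgeSource]:
    """Build three toy subprocesses for an invented reading task."""
    return [
        KnowledgeSource("segment",
                        lambda bb: "text" in bb and "words" not in bb, _segment),
        KnowledgeSource("buffer",
                        lambda bb: "words" in bb and "held" not in bb, _buffer),
        KnowledgeSource("summarize",
                        lambda bb: "held" in bb and "gist" not in bb, _summarize),
    ]


if __name__ == "__main__":
    normal = {"text": "the cat sat on the mat", "capacity": 3}
    print(run_blackboard(make_sources(), normal), normal.get("gist"))

    impaired_sources = make_sources()
    impaired_sources[1].enabled = False  # inhibit the short-term buffer
    impaired = {"text": "the cat sat on the mat", "capacity": 3}
    print(run_blackboard(impaired_sources, impaired), impaired.get("gist"))
```

Because every subprocess communicates only through the blackboard, switching one off or shrinking its capacity changes the model's behavior without touching the other components, which is the kind of manipulation the text proposes for studying reading deficits.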
We are currently using the task of software design to describe the processes underlying more general planning mechanisms that are also used in a large number of task-oriented environments like policy making.

Both the text comprehension project and the planning project involve the development of explicit models of complex cognitive processes; cognitive modelling is a stated goal of both SUMEX and research supported by NIMH.

The on-going development of the prose comprehension model would not be possible without our collaboration with the AGE and UNITS research groups. We look forward to a continued collaboration, with, we hope, mutually beneficial results. Several other psychologists have either used or shown an interest in using an early version of the prose comprehension model, including Alan Lesgold of SUMEX's SCP project, who is exporting the system to the LRDC VAX.

Needless to say, all of this interaction has been greatly facilitated by the local and network-wide communication systems supported by SUMEX. There has been considerable communication between members of the prose comprehension and AGE/UNITS groups as program bugs have been discovered and corrected; the presence of a mail system has made this process far easier than if telephone or surface mail messages were required. The mail system, of course, has also enabled us to maintain professional contacts established at conferences and other meetings, and to share and discuss ideas with these contacts.

C. Progress Summary

The prose comprehension project has completed an initial version of a model of prose comprehension (Miller & Kintsch, 1980a). This model has been applied to a large number of texts, and has yielded quite reasonable predictions of recall and readability. Psychologists from other universities have used this system to derive reading time and recall predictions for their own experimental materials; publication of this work is pending.
We are currently using the AGE and UNITS packages to extend this model toward one that can make use of world knowledge in its analyses.

The planning group has completed the detailed analysis of several long thinking-out-loud protocols collected from both expert and novice software designers. These analyses involved the development of a lower level model for each of the protocols. See Atwood and Jeffries (1980) for details and examples.

D. List of Relevant Publications

Atwood, M. E., & Jeffries, R. Studies in plan construction I: Analysis of an extended protocol. Technical Report SAI-80-028-DEN, Science Applications, Incorporated, Denver, Colorado, March, 1980.

Atwood, M. E., & Jeffries, R. Studies in plan construction II: Novice design behavior. Technical Report SAI-80-154-DEN, Science Applications, Incorporated, Denver, Colorado, December, 1980.

Polson, P. G., Jeffries, R., Turner, A., & Atwood, M. E. The process of designing software. To appear in J. R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, N.J.: Erlbaum.

Atwood, M. E., Polson, P. G., Jeffries, R., & Ramsey, H. R. Planning as a process of synthesis. Technical Report SAI-78-144-DEN, Science Applications, Incorporated, Denver, Colorado, December, 1978.

Kintsch, W. On modelling comprehension. Invited address at the American Educational Research Association convention, San Francisco, April 10, 1979.

Kintsch, W. & van Dijk, T. A. Toward a model of text comprehension and production. Psychological Review, 1978, 85, 363-394.

Miller, J. R., & Kintsch, W. Readability and recall of short prose passages: A theoretical analysis. Journal of Experimental Psychology: Human Learning and Memory, 1980, in press.

Miller, J. R., & Kintsch, W. Readability and recall of short prose passages. Text, 1981, in press.

Miller, J. R. A knowledge-based model of prose comprehension: Applications to expository text. Paper presented at the American Educational Research Association meeting, April, 1981.

E. Funding Support Status

1. Readability and Comprehension
Walter Kintsch, Professor, University of Colorado
National Institute of Education
NIE-G-78-0172
9/1/78 - 8/31/81: $96,627
9/1/80 - 8/31/81: $46,537

2. Text Comprehension and Memory
Walter Kintsch, Professor, University of Colorado
National Institute of Mental Health
5 R01 MH15872-9-13
6/1/76 - 5/31/81: $159,060
6/1/80 - 5/31/81: $32,880

3. Comprehension and Analysis of Information in Text
Walter Kintsch, Professor, University of Colorado, and Lyle E. Bourne, Jr., Professor, University of Colorado
Office of Naval Research, Personnel and Training Programs
ONR N00014-78-C-0433
6/1/78 - 5/31/80: $68,315
6/1/80 - 5/31/81: $60,000

4. Procedural Net Theories of Human Planning and Problem Solving
Michael Atwood, Research Psychologist, Science Applications, Incorporated, Denver, Colorado
Office of Naval Research, Personnel and Training Programs
ONR N00014-78-C-0165
1/25/78 - 12/31/80: $230,000
1/1/80 - 6/30/81: $85,000

II. INTERACTIONS WITH THE SUMEX-AIM RESOURCE

A. Sharing and Interactions with Other SUMEX-AIM Projects

Our primary interaction with the SUMEX community has been the work of the prose comprehension group with the AGE and UNITS projects at SUMEX. Feigenbaum and Nii have visited Colorado, and one of us (Miller) recently attended the AGE workshop at SUMEX. Both of these meetings have been very valuable in increasing our understanding of how our problems might best be solved by the various systems available at SUMEX. We also hope that our experiments with the AGE and UNITS packages have been helpful to the development of those projects.

We should also mention theoretical and experimental insights that we have received from Alan Lesgold and other members of the SUMEX SCP project.
The initial comprehension model (Miller & Kintsch, 1980) has been used by Dr. Lesgold and other researchers at the University of Pittsburgh, as well as researchers at Carnegie-Mellon University, the University of Manitoba, Rockefeller University, and the University of Victoria.

B. Critique of Resource Management

The SUMEX-AIM resource is clearly suitable for the current and future needs of our project. We have found the staff of SUMEX to be cooperative and effective in dealing with special requirements and in responding to our questions. The facilities for communication on the ARPANET have also facilitated collaborative work with investigators throughout the country.

III. RESEARCH PLANS (8/79 - 7/81)

A. Long Range Projects Goals and Plans

The primary long-term goal of the prose comprehension group is the development of a blackboard-based model of prose comprehension. Correspondingly, we anticipate continued use of the AGE and UNITS packages. These packages allow us to model the knowledge structures possessed by people and the inferential processes that operate upon those structures, and are essential to our work.

The primary goal of the planning project is the development of a model, or a series of models, of human performance on the software design task. We intend to begin by modeling the protocols of experts on a particular task, eventually extending the model to other levels of experience and other tasks. To do this we will have to become more familiar with AGE and work on articulating our theory in a way that is compatible with the AGE framework. This will involve two parallel lines of effort. One is a deeper analysis of our protocol data, to increase our knowledge of the detailed planning processes and knowledge structures experts are using to solve these problems. The second is the development of a model in AGE that can simulate these processes.
We have to date been using SUMEX only for the latter activity, but we are beginning to discover that both objectives are so intertwined that it is counter-productive for us to be using separate computer systems. We have transferred much of our protocol analysis activities to SUMEX, making it easier for us to share this very rich data source with other investigators.

B. Justification and Requirements for Continued SUMEX Use

The research of the prose comprehension project is clearly tied to continued access to the AGE and UNITS packages, which are simply not available elsewhere. We hope that our continued use of these systems will be offset by the input we have provided and will continue to provide to those projects: our relationship has been symbiotic, and we look forward to its continuation.

C. Needs and Plans for Other Computational Resources

We currently use two other computing systems located at the University of Colorado. One is the Department of Psychology's VAX 11/780, which is used primarily to run real-time experiments to be modeled on SUMEX. The second is the University of Colorado's CDC 6400, which is used for various types of statistical analysis.

When the ARPA-sponsored VAX/Interlisp project is completed, we would be most interested in experimenting with becoming a remote AGE/UNITS site. It would seem that this sort of development is the ultimate goal of the package projects, and this type of interaction, once it becomes feasible, would be a logical extension of our association with the SUMEX facility.

D. Recommendations for Future Community and Resource Development

Our primary recommendation for future development within SUMEX involves (a) the continued support of INTERLISP, which is needed for AGE and for other work we have underway on SUMEX and (b) the continued development of the AGE and UNITS projects.
In particular, we would like to see an extension of AGE to include a wider variety of control structures so that our psychological models would not be confined to one particular view of knowledge-based processing.

The limited physical capacity of SUMEX, both in terms of address space and overloading, is, as before, a major problem. The prose comprehension group can no longer use the publicly released AGE/UNITS system due to its severely limited address space, and has had to build a personal AGE system from a stripped-down version of Interlisp and a selected subset of AGE and UNITS. We heartily endorse the plans underway to obtain more computing capacity for the SUMEX project.

Given our acquisition of a VAX, we particularly support the ongoing and continued development of INTERLISP for the VAX, so that local use of AGE and UNITS would be possible. Since we, as well as other psychologists, need the real-time capability of VAX/VMS to run on-line experiments, we hope that the INTERLISP system to be developed will be compatible with VMS. Note that this need for real-time work coincides with real-world applications of SUMEX programs, in which a VAX might be devoted to both real-time patient monitoring and diagnostic systems such as PUFF or MYCIN.

II.A.2.4 PUFF-VM Project

PUFF-VM: Biomedical Knowledge Engineering in Clinical Medicine

John J. Osborn, M.D.
The Institutes of Medical Sciences (San Francisco)
Pacific Medical Center

and

Edward H. Shortliffe, M.D., Ph.D.
Department of General Internal Medicine
Stanford University Medical Center
Stanford University

The immediate goal of this project is the development of knowledge-based programs to interpret physiological measurements made in clinical medicine. The interpretations are intended to be used to aid in diagnostic decision making and in therapeutic actions.
The programs will operate within medical domains which have well developed measurement technologies and reasonably well understood procedures for interpretation of measured results. The programs are:

(1) PUFF: the interpretation of standard pulmonary function laboratory data, which include measured flows, lung volumes, pulmonary diffusion capacity, and pulmonary mechanics, and

(2) VM: management of respiratory insufficiency in the intensive care unit.

The second, but equally important, goal of this project is the dissemination of Artificial Intelligence techniques and methodologies to medical communities that are involved in computer aided medical diagnosis and interpretation of patient data.

I. SUMMARY OF RESEARCH PROGRAM

PUFF:

A. Technical Goals

The task of the PUFF program is to interpret standard measures of pulmonary function. It is intended that PUFF produce a report for the patient record, explaining the clinical significance of measured test results. PUFF also must provide a diagnosis of the presence and severity of pulmonary disease in terms of measured data, referral diagnosis, and patient characteristics. The program must operate effectively over a wide range of pathological conditions with a broad clinical perspective about the possible complexity of the pathology.

B. Medical Relevance and Collaboration

Interpretation of standard pulmonary function tests involves attempting to identify the presence of obstructive airways disease (OAD: indicated by reduced flow rates during forced exhalation), restrictive lung disease (RLD: indicated by reduced lung volumes), and alveolar-capillary diffusion defect (DD: indicated by reduced diffusivity of inhaled CO into the blood). Obstruction and restriction may exist concurrently, and the presence of one mediates the severity of the other. Obstruction of several types can exist.
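As a rough illustration only, the three indications just described can be sketched as simple threshold checks. This is not PUFF's rule base: the class name, field names, the use of percent-of-predicted values, and the single 80 percent cutoff are all assumptions made for this sketch, and it deliberately ignores the way obstruction and restriction modify each other's severity.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoff, in percent of the predicted normal value.
# Chosen only for illustration; it is not taken from PUFF.
ASSUMED_LOWER_LIMIT = 80.0


@dataclass
class PulmonaryFunctionTest:
    """Illustrative container; all fields are percent of predicted."""
    flow_rate_pct: float      # flow rate during forced exhalation
    lung_volume_pct: float    # measured lung volume
    co_diffusion_pct: float   # diffusivity of inhaled CO into the blood


def indicated_abnormalities(test: PulmonaryFunctionTest) -> List[str]:
    """Return the abbreviations of the abnormalities the values suggest."""
    findings = []
    if test.flow_rate_pct < ASSUMED_LOWER_LIMIT:
        findings.append("OAD")   # obstructive airways disease
    if test.lung_volume_pct < ASSUMED_LOWER_LIMIT:
        findings.append("RLD")   # restrictive lung disease
    if test.co_diffusion_pct < ASSUMED_LOWER_LIMIT:
        findings.append("DD")    # alveolar-capillary diffusion defect
    return findings


if __name__ == "__main__":
    # Obstruction and restriction can be indicated concurrently.
    mixed = PulmonaryFunctionTest(55.0, 70.0, 90.0)
    print(indicated_abnormalities(mixed))  # prints ['OAD', 'RLD']
```

A real interpretation would also have to grade severity and weigh patient history and referral diagnosis, which this sketch leaves out.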
In the laboratory at the Pacific Medical Center (PMC), about 50 parameters are calculated from measurement of lung volumes, flow rates, and diffusion capacity. In addition to these measurements, the physician may also consider patient history and referral diagnosis in interpreting the test results and diagnosing the presence and severity of pulmonary disease.

Currently PUFF contains a set of about 250 physiologically based interpretation "rules". Each rule is of the form "IF <condition> THEN <conclusion>". Each rule relates physiological measurements or states to a conclusion about the physiological significance of the measurement or state. The interpretation system operates in a batch mode, accepting input data and printing a report for each patient. The report includes:

(1) Interpretation of the physiological meaning of the test results; the limitation on the interpretation because of bad or missing data; the response to bronchodilators if used; and the consistency of the findings and referral diagnosis.

(2) Clinical findings, including the applicability of the use of bronchodilators, the consistency of multiple indications for airway obstruction, and the relation between test results, patient characteristics, and referral diagnosis.

(3) Interpretation Summary, which consists of the diagnosis of presence and severity of abnormality of pulmonary function.

C. Progress Summary

Knowledge base: PUFF is implemented on the PDP-10 in an EMYCIN system which is designed to accept rules from new task domains. A typical rule is:

If (FVC>=80) and (FEV1/FVC

AUTOMATIC DEDUCTION
   Overview
   Resolution-based theorem proving
   Nonresolution theorem proving
   Applications of theorem proving
   Nonmonotonic logic

VISION
   Overview
   Blocks-world understanding
   Processing of visual data
   Shape understanding
   Representation and control methods in vision
   Sample applications in vision research

Robotics
   Overview
   Computation in a physical environment
   Engineering and kinematics
   Languages and simulation
   Planning and representation

Appendix B
AI Handbook Outline

XV. Learning and Inductive Inference
   A. Overview
   B. Rote learning
   C. Advice taking
   D. Learning from examples
      1. Overview
      2. Adaptive learning
      3. Learning single concepts
      4. Learning multiple concepts
      5. Learning by doing

XVI. Planning and Problem Solving
   A. Overview
   B. Linear planners
   C. Hierarchical planners
      1. NOAH and extensions
      2. MOLGEN
   D. Opportunistic planning

Appendix C
AIM Management Committee Membership

The following are the membership lists of the various SUMEX-AIM management committees at the present time:

AIM Executive Committee:

LEDERBERG, Joshua, Ph.D. (Chairman)
President
The Rockefeller University
1230 York Avenue
New York, New York 10021
(212) 360-1234, 360-1235

AMAREL, Saul, Ph.D.
Department of Computer Science
Rutgers University
New Brunswick, New Jersey 08903
(201) 932-3546

BAKER, William R., Jr., Ph.D. (Exec. Secretary)
Biotechnology Resources Program
National Institutes of Health
Building 31, Room 5B43
9000 Rockville Pike
Bethesda, Maryland 20205
(301) 496-5411

FEIGENBAUM, Edward, Ph.D.
Principal Investigator - SUMEX
Department of Computer Science
Margaret Jacks Hall, Room 216
Stanford University
Stanford, California 94305
(415) 497-4079

LINDBERG, Donald, M.D. (Adv Grp Member)
605 Lewis Hall
University of Missouri
Columbia, Missouri 65201
(314) 882-6966

MYERS, Jack D., M.D.
School of Medicine
Scaife Hall, 1291
University of Pittsburgh
Pittsburgh, Pennsylvania 15261

SHORTLIFFE, Edward H., M.D., Ph.D.
Co-Principal Investigator - SUMEX
Division of General Internal Medicine, TC117
Stanford University Medical Center
Stanford, California 94305
(415) 497-5821

AIM Advisory Group:

LINDBERG, Donald, M.D. (Chairman)
605 Lewis Hall
University of Missouri
Columbia, Missouri 65201
(314) 882-6966

AMAREL, Saul, Ph.D.
Department of Computer Science
Rutgers University
New Brunswick, New Jersey 08903
(201) 932-3546

BAKER, William R., Jr., Ph.D. (Exec. Secretary)
Biotechnology Resources Program
National Institutes of Health
Building 31, Room 5B43
9000 Rockville Pike
Bethesda, Maryland 20205
(301) 496-5411

FEIGENBAUM, Edward, Ph.D. (Ex-officio)
Principal Investigator - SUMEX
Department of Computer Science
Margaret Jacks Hall, Room 216
Stanford University
Stanford, California 94305
(415) 497-4079

LEDERBERG, Joshua, Ph.D.
President
The Rockefeller University
1230 York Avenue
New York, New York 10021
(212) 360-1234, 360-1235

MINSKY, Marvin, Ph.D.
Artificial Intelligence Laboratory
Massachusetts Institute of Technology
545 Technology Square
Cambridge, Massachusetts 02139
(617) 253-5864

MOHLER, William C., M.D.
Associate Director
Division of Computer Research and Technology
National Institutes of Health
Building 12A, Room 3033
9000 Rockville Pike
Bethesda, Maryland 20205
(301) 496-1168

MYERS, Jack D., M.D.
School of Medicine
Scaife Hall, 1291
University of Pittsburgh
Pittsburgh, Pennsylvania 15261
(412) 624-2649

PAUKER, Stephen G., M.D.
Department of Medicine - Cardiology
Tufts New England Medical Center Hospital
171 Harrison Avenue
Boston, Massachusetts 02111
(617) 956-5910

SHORTLIFFE, Edward H., M.D., Ph.D. (Ex-officio)
Co-Principal Investigator - SUMEX
Division of General Internal Medicine, TC117
Stanford University Medical Center
Stanford, California 94305
(415) 497-5821

SIMON, Herbert A., Ph.D.
Department of Psychology
Baker Hall, 339
Carnegie-Mellon University
Schenley Park
Pittsburgh, Pennsylvania 15213
(412) 578-2787 or 578-2000

Stanford Community Advisory Committee:

FEIGENBAUM, Edward, Ph.D. (Chairman)
Department of Computer Science
Margaret Jacks Hall, Room 216
Stanford University
Stanford, California 94305
(415) 497-4079

SHORTLIFFE, Edward H., M.D., Ph.D.
Co-Principal Investigator - SUMEX
Division of General Internal Medicine, TC117
Stanford University Medical Center
Stanford, California 94305
(415) 497-5821

DJERASSI, Carl, Ph.D.
Department of Chemistry, Stauffer I-106
Stanford University
Stanford, California 94305
(415) 497-2783

MAFFLY, Roy H., M.D.
Division of Nephrology
Veterans Administration Hospital
3801 Miranda Avenue
Palo Alto, California 94304
(415) 858-3971

References

1. Feigenbaum, E.A., The Art of Artificial Intelligence: Themes and Case Studies of Knowledge Engineering, Proceedings of the 1978 National Computer Conference, AFIPS Press (1978).

2. Nilsson, N.J., Principles of Artificial Intelligence, Tioga Publishing Company, Palo Alto, California (1980).

3. Winston, P.H., Artificial Intelligence, Addison-Wesley Publishing Co. (1977).

4. Nilsson, N.J., Artificial Intelligence, Information Processing 74, North-Holland Pub. Co. (1975).

5. Barr, A. and Feigenbaum, E.A. (Eds.), The Handbook of Artificial Intelligence, Volume I, William Kaufmann, Inc., Los Altos, Calif. (1981).

6. Boden, M., Artificial Intelligence and Natural Man, Basic Books, New York (1977).

7. McCorduck, P., Machines Who Think, W.H. Freeman and Co., San Francisco (1979).

8. Coulter, C. L., Research Instrument Sharing, Science, Vol. 201, No. 4354, August 4, 1978.

9. Metcalfe, R.M. and Boggs, D.R., Ethernet: Distributed Packet Switching for Local Computer Networks, Comm. ACM, Vol. 19, No. 7 (July 1976).

10. Shoch, J.F. and Hupp, J.A., Performance of an Ethernet Local Network -- A Preliminary Report, Proceedings of the Local Area Communications Network Symposium, Boston, May 1979.

11. Taft, E.A., Implementation of PUP in TENEX, Internal XEROX PARC memorandum, June 1978.

12. Boggs, D.R., Shoch, J.F., Taft, E.A., and Metcalfe, R.M., PUP: An Internetwork Architecture, XEROX PARC report CSL-79-10, July 1979.

13. Digital Equip. Corp., Intel Corp., and Xerox Corp., The Ethernet - Data Link and Physical Layer Specifications, Version 1.0, September 30, 1980.