Simulation of Cognitive Processes    P41 RR00785-08

II.A.2.6 Simulation of Cognitive Processes

James G. Greeno
Alan M. Lesgold
Learning Research and Development Center
University of Pittsburgh

I. SUMMARY OF RESEARCH PROGRAM

A. Project Rationale

Our goal is to improve understanding of the basic cognitive processes involved in the use and acquisition of cognitive skills. Our concern is divided between basic competence in reading, mathematics, and general problem solving on the one hand and complex expertise in domains such as radiology on the other. A basic form for our theorizing is computer simulation of human performance. Such simulation modeling allows richer understanding in the course of theory building and stronger empirical validation of theory than would otherwise be possible in domains where several separate cognitive systems or components interact during skilled performance.

B. Medical Relevance

Increased understanding of basic cognitive processes is relevant to medical needs in two ways. One form of relevance involves performance of professional medical tasks. Lesgold has been working with other local colleagues to understand the nature of the skill of reading and diagnosing chest radiographs. The work is leading toward both insights into techniques for improving the training of radiologists and improved understanding of the aspects of radiology that seem to be learned only with many years of experience. This improved specification of what is hard to learn, and why, should inform decision making on automatic diagnostic facilities and diagnostic "prostheses" whose cost-effectiveness is otherwise difficult to establish.

The second form of relevance of basic research in cognition to medical needs is in development of understanding of the cognitive requirements of elementary skills such as reading and arithmetic computation, in which cognitive deficits can constitute severe disability.
Improved understanding of these basic skills should provide principles useful in developing a more valid approach to diagnosis and remediation of learning disabilities.

C. Highlights of Research Progress

1. Accomplishments This Past Year

In the study of the acquisition of reading processes, firm data began to emerge documenting the causal role of slow and nonfacile word recognition in hampering progress in the acquisition of higher-order reading skills (Lesgold, Resnick, Roth, & Hammond, 1981; Lesgold & Curtis, in press; Resnick, Lesgold & Roth, in preparation). Recent results of longitudinal research in our laboratory have shown curriculum differences in the primary grades that significantly affect the specific patterns of learning word recognition that different children exhibit. In addition, there has been an improved understanding of the nature of component process interactions in reading that will drive a simulation of the specific problems in learning comprehension skills that come from inadequately practiced word recognition.

E. A. Feigenbaum

The work on radiological skill has been progressing (Lesgold, Feltovich, Glaser, & Wang, 1981). After extensive review of protocol data from both expert radiologists and residents, we have a good sense of the ways in which radiology skill is a blend of opportunistic problem solving skills from medical diagnosis and more specific spatial representation skills for human anatomy. Modeling work is now beginning in earnest. We will be experimenting both with the sorts of blackboard approaches subsumed under the AGE system and with parallel activation models of the sort being used in perception work (e.g., CAPS).

A modeling effort that was carried out in its initial phase with the SUMEX facility, enabling use of Anderson's ACT system, involved study of the use of spatial information in solving problems in elementary probability theory.
In this project we are simulating the processing of information in Venn diagrams in the calculation of probabilities of composite events. The study extends earlier analyses of spatial information in problem solving in the domain of geometry proof exercises, which also included collaboration with Anderson and use of SUMEX in early phases of simulation programming (Anderson, Greeno, Kline, & Neves, in press).

We have developed a simulation of learning by children who have defective procedures for arithmetic calculation. This work has depended on concepts and techniques with which we became familiar initially in collaborative work with Anderson's group. The model simulates learning from special instruction designed to communicate understanding of concepts of place value as well as a correct computational procedure, and thus provides a preliminary hypothesis about the nature of learning in which conceptual understanding and procedural knowledge interact meaningfully.

2. Research in Progress

Research is continuing on all these topics. In each case, we are building, through porting of code from other centers and through some efforts of our own, an appropriate simulation environment on our own VAX-11/780 system. In doing this, we are clearly building on what we have learned within the SUMEX community.

The work in reading and radiology has many aspects in common, and we hope to exploit this commonality in simulation modeling. However, the emphases will be slightly different. The radiology effort will concentrate on a highly refined level of skill, even in the learner, while the reading effort deals with the earliest stages of skill acquisition and is particularly concerned with coordinations in time (in active memory) of representations for concepts that are interrelated in a text.

Studies of mathematical cognition are continuing, with increasing emphasis on individual differences and on learning.
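As a small worked illustration of the kind of calculation the Venn diagram work concerns, the following Python sketch computes probabilities of composite events by summing the disjoint regions of a two-event diagram. It is not the ACT-based simulation described in this report; the region probabilities and the names `regions` and `prob` are assumptions chosen only for the example.

```python
from fractions import Fraction

# Hypothetical probabilities for the four disjoint regions of a
# two-event Venn diagram, keyed by (inside A?, inside B?).
regions = {
    (True, True): Fraction(1, 10),    # A and B
    (True, False): Fraction(3, 10),   # A only
    (False, True): Fraction(2, 10),   # B only
    (False, False): Fraction(4, 10),  # neither
}


def prob(event):
    """Sum the regions for which the predicate event(in_a, in_b) holds."""
    return sum(
        (p for (in_a, in_b), p in regions.items() if event(in_a, in_b)),
        Fraction(0),
    )


p_a = prob(lambda a, b: a)
p_b = prob(lambda a, b: b)
p_a_and_b = prob(lambda a, b: a and b)
p_a_or_b = prob(lambda a, b: a or b)
p_a_given_b = p_a_and_b / p_b  # conditional probability P(A | B)

print(p_a_or_b, p_a_given_b)
```

Reading the union off the diagram as a sum of regions gives the same value as the formula P(A) + P(B) - P(A and B), which is the correspondence between composite forms and composite events that the diagram makes visible.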
We are pursuing empirical and theoretical studies of individual differences in the use of composite forms in Venn diagrams that correspond to the composite events involved in problems. We are extending the work on learning in arithmetic, relating our model of acquiring understanding to some general principles of heuristic procedure modification, and we will apply these results to analysis of other learning phenomena in the early arithmetic domain, such as those studied by Riley, Greeno, and Heller (forthcoming).

D. List of Relevant Publications

Anderson, J.R., Greeno, J.G., Kline, P.J., & Neves, D.M. Acquisition of problem-solving skill. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates, in press.

Lesgold, A.M., Resnick, L.B., Roth, S.F., & Hammond, K.L. Patterns of learning to read. Paper presented at the biennial meeting of the Society for Research in Child Development, 1981.

Lesgold, A.M., Feltovich, P., Glaser, R., & Wang, Y. Radiological expertise. Symposium presentation at the annual meeting of the American Educational Research Association, 1981.

Lesgold, A.M., & Curtis, M.E. Learning to read words efficiently. To appear in A.M. Lesgold & C.A. Perfetti (Eds.), Interactive processes in reading. Hillsdale, NJ: Erlbaum, in press.

Lesgold, A.M., & Perfetti, C.A. Interactive processes in reading: Where do we stand? In A.M. Lesgold & C.A. Perfetti (Eds.), Interactive processes in reading. Hillsdale, NJ: Erlbaum, in press.

Lesgold, A.M., & Perfetti, C.A. (Eds.). Interactive processes in reading. Hillsdale, NJ: Erlbaum, in press.

Resnick, L.B., & Lesgold, A.M. A longitudinal study of difficulties in learning to read. To appear in J.P. Das, R. Mulcahy, & A.E. Wall (Eds.), Theory and Research in Learning Disability, forthcoming.

Riley, M.S., Greeno, J.G., & Heller, J.I. Development of children's problem-solving ability in arithmetic. In H.P. Ginsburg (Ed.), Development of mathematical thinking.
New York: Academic Press, in press.

Lesgold, A.M. Cognitive simulation modeling on the VAX-11. Behavior Research Methods and Instrumentation, in press.

E. Funding Support

1) National Institute of Education
   Title: Research on learning and schooling.
   Principal Investigators: Robert Glaser, University Professor and Co-Director, Learning Research and Development Center (LRDC), University of Pittsburgh; and Lauren Resnick, University Professor of Psychology and Co-Director, LRDC.
   Funding Agency: National Institute of Education
   Grant Number: NIE-G-80-0014
   Total Award: 1 December 79 to 30 November 82, $7,880,729.
   Current Period: 1 December 80 to 30 November 81, $2,627,067.

2) Office of Naval Research
   Title: Cognitive and instructional factors in the acquisition and maintenance of skill.
   Principal Investigators: Robert Glaser, University Professor and Co-Director, LRDC; James Greeno, University Professor; Alan Lesgold, Research Associate Professor of Psychology; Michelene Chi, Research Assistant Professor of Psychology.
   Funding Agency: Office of Naval Research
   Contract Number: N00014-79-C-0215
   Total Award: 1 January 79 to 30 September 83, $1,676,950.
   Current Period: 1 October 80 to 30 September 81, $247,053.

3) National Science Foundation/National Institute of Education
   Title: Invention and understanding in the acquisition of computation.
   Principal Investigators: Lauren B. Resnick, Professor of Psychology and Co-Director, LRDC; and James Greeno, University Professor of Psychology.
   Funding Agency: National Science Foundation and National Institute of Education (Joint Program)
   Grant Number: SED78-22289
   Total Award: $149,967.
   Current Period: 1 December 78 to 31 August 81

II. INTERACTIONS WITH THE SUMEX-AIM RESOURCE

A. Medical Collaborations and Program Distribution via SUMEX

The work on development of radiology skills is being done in collaboration with Dr.
Yen Wang, Clinical Professor of Medicine, University of Pittsburgh.

B. Sharing and Interaction with Other SUMEX-AIM Projects

We have shared software from the ACT and AGE projects via SUMEX. In addition, the ties to the ACT project have led to interactions and joint development of software and programming environments with psychologists at Carnegie-Mellon University. We expect to maintain a common simulation development environment for psychologists in both institutions, largely because SUMEX proved the value of such standardization. Similarly, especially if the COGNET proposal for a Cognitive Science Network is funded, we expect to support a standard demo and guest interface in both places.

C. Critique of Resource Management

None.

D. Future Involvement

This is a final report for this project. Because of the successful acquisition of on-site facilities for artificial intelligence work, we are moving from national project to associate status within the SUMEX-AIM community. We are particularly grateful for the many opportunities SUMEX involvement provided when we most needed them. We have learned from them in building our own resource and expect to continue to share in the community-building aspects of AIM in the future.

II.A.2.7 SECS - Simulation and Evaluation of Chemical Synthesis

PI: W. Todd Wipke
Board of Studies in Chemistry
University of California
Santa Cruz, California 95064

Coworkers:
D. Dolata (Grad student)
I. Kim (Grad student)
D. Rogers (Grad student)
J. Chou (Postdoctoral)
P. Condran (Postdoctoral)
T. Moock (Postdoctoral)
H. Kuehmstedt (Visiting Professor, Univ. of Greifswald, East Germany)

I. SUMMARY OF RESEARCH PROGRAM

A. Technical Goals.
The long-range goal of this project is to develop the logical principles of molecular construction and to use these in developing practical computer programs to assist investigators in designing stereospecific syntheses of complex bio-organic molecules. Our specific goals this past year included adding a starting material library, initial exploration of strategies based on starting materials, developing the representation for a reasoning module, and completion of the user-defined transform capabilities. The objectives for the XENO project were to establish extensive collaborations with metabolism experimentalists to test XENO predictions and to begin development of methods for assessing the potential biological activity of each metabolite. The world-wide SECS User's Group for sharing chemical transforms was initiated.

B. Medical Relevance and Collaboration

The development of new drugs and the study of how drug structure is related to biological activity depend upon the chemist's ability to synthesize new molecules as well as to modify existing structures, e.g., by incorporating isotopic labels or other substituents into biomolecular substrates. The Simulation and Evaluation of Chemical Synthesis (SECS) project aims at assisting the synthetic chemist in designing stereospecific syntheses of biologically important molecules. The advantages of this computer approach over normal manual approaches are many: 1) greater speed in designing a synthesis; 2) freedom from bias of past experience and past solutions; 3) thorough consideration of all possible syntheses using a more extensive library of chemical reactions than any individual person can remember; 4) greater capability of the computer to deal with the many structures which result; and 5) capability of the computer to see molecules in a graph-theoretical sense, free from the bias of 2-D projection.
The objective of using XENO (a spinoff of SECS) in metabolism is to predict the plausible metabolites of a given xenobiotic in order that they may be analyzed for possible carcinogenicity. Metabolism researchers may also find this useful in the identification of metabolites, in that it suggests what to look for. Finally, this technique may also apply in problem domains where one wishes to alter molecules so that certain types of metabolism will be blocked.

C. Highlights of Research Progress

1. Progress and Accomplishments

RESEARCH ENVIRONMENT: At the University of California, Santa Cruz, we have a GT40 and a GT46 graphics terminal connected to the SUMEX-AIM resource by 1200 and 2400 baud leased lines (one leased line supported by SUMEX). We also have a TI 725, TI 745, CDI-1030, DIABLO 1620, and an ADM-3A terminal used over 300 baud leased lines to SUMEX. UCSC has only a small IBM 370/145, a PDP-11/45, a PDP-11/70, and a VAX 11/780 (the 11's are restricted to running small jobs for student time-sharing), all of which are unsuitable for this research. The SECS laboratory is located in a newly renovated room with raised floor in 125 Thimann Laboratories, adjacent to the synthetic organic laboratories at Santa Cruz, so the environment is excellent.

2. SECS Program Developments

The Simulation and Evaluation of Chemical Synthesis (SECS) program has undergone many additions to improve its capabilities and usefulness to synthetic chemists.

The ALCHEM language has been made extensible to facilitate addition of new groups. New functional groups and ring systems can be defined and referred to by name within a transform, and individual atoms and bonds within the named entity can also be referenced. Thus, a transform may now inquire if an indole ring system exists elsewhere in the molecule, and may then inquire about substituents at various positions on that ring system. Such a query was previously impossible.
This kind of query arises frequently in heterocyclic chemistry.

The User Defined Transform module (UDT) allows the chemist to define a chemical transform on-line in the middle of a synthetic analysis. We have now enabled the program to "learn" the transform, i.e., remember it and be able to apply it again where applicable. This is the first work in machine learning of chemical reactions. A graphical front-end to the UDT module was added and the whole module incorporated into the production SECS version.

A META-SECS top-level plan generator is being implemented to reason using synthetic principles and conclude plans which will then be used to guide the existing SECS program in synthetic analysis. The First Order Predicate Calculus is being used to represent the synthetic strategies and principles. The statement parser and data structure manipulation routines have been implemented; the inference engine is in progress at this time. This explicit reasoning in synthetic strategies should prove very interesting as a way to control development of the synthesis tree.

Synthesis tree output is now possible on the PRINTRONIX printer/plotter using a new software vector-to-raster conversion program developed by our group. The resulting raster file is compressed, transmitted from SUMEX to Santa Cruz, then expanded and fed to the printer. This approach avoids our previous problems resulting from the long transmission time required to plot the big trees over a slow communication line and the mechanical problems associated with pen plotters.

The Aldrich catalog of commercially available chemicals has been converted from Wiswesser line notation to connection table and to the SEMA unique representation. This allows SECS to determine if a precursor is commercially available. Initial studies of using this library to develop synthetic strategies have begun.
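The availability check described above rests on a general idea: if every structure is reduced to a unique, numbering-independent key, catalog lookup becomes simple set membership. The Python sketch below illustrates that idea only. It is not the SEMA algorithm; the brute-force `canonical_key` function, the hydrogen-suppressed graph encoding, and the two-entry `catalog` are assumptions made for the example, and the approach is practical only for very small graphs.

```python
from itertools import permutations


def canonical_key(atoms, bonds):
    """Return a numbering-independent key for a small molecular graph.

    atoms: list of element symbols, indexed by atom number.
    bonds: iterable of (i, j, order) tuples.
    Tries every renumbering and keeps the smallest description, so two
    connection tables for the same structure yield the same key.
    """
    n = len(atoms)
    best = None
    for perm in permutations(range(n)):  # perm[old_index] = new_index
        labels = [None] * n
        for old, new in enumerate(perm):
            labels[new] = atoms[old]
        edges = sorted(
            (min(perm[i], perm[j]), max(perm[i], perm[j]), order)
            for i, j, order in bonds
        )
        candidate = (tuple(labels), tuple(edges))
        if best is None or candidate < best:
            best = candidate
    return best


# Hypothetical two-entry catalog (hydrogens omitted).
ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
acetaldehyde = (["C", "C", "O"], [(0, 1, 1), (1, 2, 2)])
catalog = {
    canonical_key(*ethanol): "ethanol",
    canonical_key(*acetaldehyde): "acetaldehyde",
}


def is_available(atoms, bonds):
    """True if the structure matches a catalog entry under any numbering."""
    return canonical_key(atoms, bonds) in catalog


# Ethanol entered with a different atom numbering is still found.
print(is_available(["O", "C", "C"], [(0, 1, 1), (1, 2, 1)]))
```

A production system would replace the exhaustive search with an efficient canonical numbering scheme, but the lookup step would have the same shape.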
Several algorithms have been developed to identify potential starting materials for a target molecule. The effectiveness of these algorithms is being evaluated against literature syntheses.

3. XENO - A Program to Predict Plausible Metabolites

The XENO program was developed to assist metabolism researchers in predicting plausible metabolites of compounds foreign to an organism, and in evaluating the potential biological activity of the resulting metabolites. The knowledge base of XENO has been revised completely and now includes 110 types of metabolic processes. We have specialized in rat and mouse systems to date. The XENO program takes graphical input of a compound to be metabolized and generates, step by step, a tree of metabolite structures which might result.

The second phase of XENO, which evaluates potential biological activity, is underway. We are developing a series of rules for each class of compounds to relate structure to biological activity. Work to date has concentrated on aromatic amines and polycyclic aromatic hydrocarbons.

Collaborations with metabolism experimentalists have begun in order that XENO can make predictions for compounds actively being studied in the laboratory. Initial feedback from ICI Pharmaceuticals on one particular drug indicates that the major metabolites were correctly generated and that XENO proposed structures that may explain some polar compounds that ICI has not yet been able to identify. XENO generated several possible pathways to explain an unusual metabolite. ICI is seeking to differentiate between the pathways. We are similarly collaborating with Mead Johnson. Initial results there indicated several transforms were missing from the XENO library or were improperly represented. Those have been corrected and new analyses sent for evaluation. Other collaborations are in progress with investigators at NIH and at the Australian National University.

We are also comparing XENO analyses to results reported in the literature for compounds such as cyclophosphamide. This work is sponsored by the National Cancer Institute.

D. List of Current Project Publications

M.L. Spann, K.C. Chu, W.T. Wipke, and G. Ouchi, "Use of Computerized Methods to Predict Metabolic Pathways and Metabolites," J. of Env. Pathology and Toxicology, 2, 123 (1978); also reprinted in "Hazards from Toxic Chemicals," ed. M.A. Mehlman, R.E. Shapiro, M.F. Cranmer and M.J. Norvell, Pathotox Publishers, Inc., Park Forest South, Ill., 1978, pp. 123-121.

P. Gund, E.J.J. Grabowski, D.R. Hoff, G.H. Smith, J.D. Andose, J.B. Rhodes, and W.T. Wipke, "Computer-Assisted Synthetic Analysis at Merck," J. Chem. Info. and Comput. Sci., 20, 288 (1980).

S.A. Godleski, P.v.R. Schleyer, E. Osawa, and W.T. Wipke, "The Systematic Prediction of the Most Stable Neutral Hydrocarbon Isomer," Progress in Physical Organic Chemistry, Vol. 13, 1981, pp. 63-118.

R.E. Carter and W.T. Wipke, "SECS--Ett Hjalpmedel Vid Organisk Syntesplanering," Kemisk Tidskrift, in press.

W.T. Wipke, G. Ouchi, and J. Chou, "Computer-Assisted Prediction of Metabolites," Proceedings of Conference Structure Activity Relations, Research Triangle Park, N.C., Feb., 1980.

E. Funding Status

1) Resource-Related Research: Biomolecular Synthesis
   PI: W. Todd Wipke, Associate Professor, UCSC
   Agency: NIH, Research Resources
   No: RR01059-035S1
   7/1/80-12/31/81    $36,949 TDC

2) Computer-Aided Prediction of Metabolites for Carcinogenicity Studies
   PI: W. Todd Wipke
   Agency: NIH, National Cancer Institute
   No: NO1-CP-75816
   1/1/80-7/31/81    $74,394 TDC

II. INTERACTIONS WITH SUMEX-AIM RESOURCE

A.
Medical Collaborations and Program Dissemination via SUMEX.

SECS is available in the GUEST area of SUMEX for casual users, and in the SECS DEMO area for serious collaborators who plan to use a significant amount of time and need to save the synthesis tree generated. Much of the access by others has been through the terminal equipment at Santa Cruz because graphic terminals make structure input and output so much more convenient.

Prof. H. Kuehmstedt of the University of Greifswald, E. Germany, used SECS to generate some interesting and novel synthetic routes to progesterone. Demonstrations and sample synthetic analyses were generated for Drs. Terry Brunck and Steve Roman of Shell Development; John Harper of Amoco Chemicals; Prof. Fujiwara, University of Tsukuba, Japan; and Dr. Peder Berntsson, Hassle, Sweden. Other visitors included Dr. M. Onozuka, A. Tomonaga and H. Itoh, Kureha Chemical Co., Tokyo, Japan, and Dr. Rhyner, Director of Research, Ciba-Geigy, Basel, Switzerland.

Demonstrations of SECS in Sweden were performed by Dr. Carter at many universities and companies. Ned Phillips of the College of Pharmacy, Univ. of Florida, is accessing SECS via SUMEX GUEST access. A synthesis of vellerolactone, a substance found to be toxic and teratogenic, was generated for Prof. R. E. Carter, Univ. Lund, Sweden. Dr. Wipke has also used several SUMEX programs such as CONGEN in his course on Computers and Information Processing in Chemistry.

Communication between SECS collaborators is facilitated by using SUMEX message drops, especially since the time difference between the U.S. and Europe and Australia makes normal telephone communication practically impossible. Testing and collaboration on the XENO project with researchers at the NCI depend on having access through SUMEX and TYMNET.

B. Examples of Sharing, Contacts and Cross-Fertilization with Other SUMEX-AIM Projects.
This year the SECS and XENO projects have made use of the teletype plot program which Ray Carhart of the CONGEN project wrote at Stanford. We modified the program to fit the needs of our projects. This was facilitated by being able to transfer the programs between areas on the same computer system at SUMEX. We continue to have intellectual interactions with the DENDRAL and MOLGEN projects in areas where we have common interests and have had people from those projects speak at our group seminars. SUMEX also is used for discussions with others in the area of artificial intelligence on the ARPANET. With the help of the SUMEX staff, we developed a local print capability through SUMEX, which has facilitated our work greatly. We have also communicated with SUMEX staff regarding selection of terminals and other computer equipment.

C. Critique of Resource Services.

We find the SUMEX-AIM network very well human-engineered and the staff very friendly and helpful. The SECS project is probably one of the few on the AIM network which must depend exclusively on remote computers, and we have been able to work rather effectively via SUMEX. Basically we have found that SUMEX-AIM provides a productive and scientifically stimulating environment, and we are thankful that we are able to access the resource and participate in its activities. SUMEX-AIM gives us at UCSC, a small university, the advantages of a larger group of colleagues and interaction with people all over the country. We especially thank SUMEX for support of the leased line for our GT40, and for helping develop our remote print capability.

SUMEX, however, has fallen short of our goals and desires: the load average on SUMEX has increased and reduced my group's efficiency greatly; the system is too overloaded. We are installing SECS on the 2020 to begin to make use of that additional capability.
We also have not been able to utilize the 4800 baud high speed line we purchased because SUMEX limitations forced running at 2400 baud. We had hoped to be able to write tapes locally with the 4800 baud line, but at 2400 baud it is too slow to be practical. We would like to see some of the local lines at Stanford slowed down so that remote people doing graphics can run at a higher speed. We have found that when a FORTRAN program is overlaid, the symbol table is lost, making symbolic debugging with DDT impossible; we wish that could be corrected.

D. Collaborations and Medical Use of Programs via Computers other than SUMEX.

SECS 2.9 now resides on the CompuServe computer network so anyone can access it without having to convert code for their machine. This has proved very useful as a method of getting people to try this new technology. Dr. George Purvis of Battelle is accessing SECS via CompuServe, as are Gene Dougherty of Rohm and Haas and many others. SECS also resides on the Medicindat machine at the University of Gothenburg, Sweden, and is available all over that country by phone. Similarly in Australia, SECS resides at the University of Western Australia and is available throughout the country over CSIRONET. Plans are underway for a similar situation in Japan.

III. RESEARCH PLANS (6/81-6/82)

A. Long Range Project Goals and Plans.

The SECS project now consists of two major efforts, computer synthesis and metabolism, the latter being a very young project. Our plans for SECS for the next year include completing the high level reasoning module for proposing strategies and goals, and providing control which continues over several steps. This reasoning module also will be able to trace the derivation of goals and thus explain some of its reasoning. We also plan to focus on bringing the transform library up in sophistication to improve the performance and capabilities of SECS.
In particular we plan to allow a transform to have access to the precursors generated as well as to the product. This will allow much greater control and more natural transform writing, but it requires extensive changes in the SECS control structure. We will continue to explore starting-material-oriented strategies based on the Aldrich Chemical file we now have implemented. We are especially interested in chirality-based strategies, which we feel are very strong. We plan to explore running SECS on a virtual memory 32-bit computer like a VAX-11/780 or a PRIME, since many chemistry departments now have these machines available and thus could run SECS.

The XENO metabolism project will be expanding the data base to cover more metabolic transforms, including species differences, sequences of transforms, and stereochemical specificities of enzymatic systems. Development of the second phase, which assesses the biological activity of the metabolites, will continue, as will efforts to simulate excretion and incorporation, the endpoints of metabolism. Finally, application of the current program to the molecules actively being investigated by metabolism researchers will occur concurrently to test and verify the work done to date on XENO and provide examples for publication.

In the next five years we foresee the SECS and XENO projects reaching a stage of maturity where they will find much application in other research groups. Our research will continue in these areas, but turn to some new programs that approach the problems from different viewpoints and allow us an opportunity to begin fresh, taking advantage of what we have learned from the building of SECS and XENO.

B. Justification and Requirements for Continued Use of SUMEX.

The SECS and XENO projects require a large interactive time-sharing capability with high level languages and support programs.
I am on the campus computing advisory committee and am the campus representative to the UC systemwide computing advisory committee, and I know that the UCSC campus is not likely in the future to be able to provide this kind of resource. Further, there does not appear to be in the offing anywhere in the UC system a computer which would be able to offer the capabilities we need. Thus from a practical standpoint, the SECS and XENO projects still need access to SUMEX for survival.

Scientifically, interaction with the SUMEX community is still extremely important to my research, and will continue to be so because of the direction and orientation of our projects. Collaborations on the metabolism project and the synthesis project need the networking capability of SUMEX-AIM, for we are and will continue to be interacting with synthetic chemists at distant sites and metabolism experts at the National Cancer Institute.

Our requirements are for good support of FORTRAN. Our needs for SUMEX include fixing the overlay loader so that an overlaid program can retain its symbol table and permit symbolic use of DDT. This is a serious problem we hope can be fixed by SUMEX staff, because we must run SECS and XENO overlaid, and without symbols debugging is very difficult and time-consuming.

C. Needs beyond SUMEX-AIM.

We do plan to acquire a virtual memory minicomputer like a VAX or PRIME in the future to offload some of our processing from SUMEX. Such a machine would enable us to do some production and development work locally and to explore the feasibility of those types of machines as hosts for SECS and XENO. A local machine would also free us from the problems we have experienced in the winter, when the telephone lines to Stanford get wet and are too noisy to use. Even if we had such a machine we would still need to use SUMEX, because we plan to continue to develop and maintain the PDP-10 version of SECS and we need SUMEX for its networking capabilities.
In the future, if we had a mini at UCSC, we would lighten our load on SUMEX, but currently we see our load increasing as our group grows and as we start new projects yet must maintain existing large programs.

We especially need the local capability to read and write magnetic tape because we receive and send many tapes between our collaborators. Driving to SUMEX to write a tape is not efficient for our personnel and hinders communication with collaborators via tape. The problem will worsen because the SECS Users Group will be sending UCSC tapes of chemical transforms on a regular basis.

D. Recommendations for Community and Resource Development.

The AIM Workshops have been excellent in the past and should be continued. We feel the SUMEX resource is too heavily utilized at times to get any productive work done. SUMEX staff could lighten the load on the machine by reducing the speed of text terminals at Stanford from 2400 baud and above down to 1200 baud, which is plenty fast for humans to read, and giving remote users faster capabilities, say 4800 baud. We feel the community would benefit if remote users such as ourselves had a virtual memory minicomputer, so that the load could be distributed more and not everything would have to go through Stanford, which is highly congested and quite expensive for multiple leased lines. SUMEX cannot currently handle an increase in the outside community using SECS or XENO for testing. The response time guests and outside collaborators see is not a good reflection of the actual efficiency of the programs.

II.A.2.8 SOLVER Project

SOLVER: Problem Solving Expertise

Dr. P. E. Johnson
Center for Research in Human Learning
University of Minnesota

Dr. W. B. Thompson
Department of Computer Science
University of Minnesota

I. SUMMARY OF RESEARCH PROGRAM

A.
Project Rationale

This project focuses upon the development of strategies for discovering and documenting the knowledge and skill of expert problem solvers. In the last fifteen years, great progress has been made in synthesizing the expertise required for solving extremely complex problems. Computer programs exist with competency comparable to human experts in diverse areas ranging from the analysis of mass spectrograms and nuclear magnetic resonance [DENDRAL] to the diagnosis of certain infectious diseases [MYCIN].

Design of an expert system for a particular task domain usually involves the interaction of two distinct groups of individuals: "knowledge engineers," who are primarily concerned with the specification and implementation of formal problem solving techniques, and "experts" (in the relevant problem area), who provide factual and heuristic information of use for the problem solving task under consideration. Typically, the knowledge engineer, after consulting with one or more experts, decides on a particular knowledge representational structure and inference strategy. Next, "units" of factual information are specified. That is, properties of the problem domain are decomposed into a set of manageable elements suitable for processing by the inference operations. Once this organization has been established, major efforts are required to refine representations and acquire factual knowledge organized in an appropriate form.

Major research problems exist in developing more effective representations, improving the inference process, and in finding better means of acquiring information from either experts or the problem area itself. Programs currently exist for empirical investigation of some of these questions for a particular problem domain [AGE, UNITS, RLL]. These tools allow the investigation of alternate organizations, inference strategies, and rule bases in an efficient manner.
What is still lacking, however, is a theoretical framework capable of reducing dependence on the expert's intuition or on near exhaustive testing of possible organizations.

Despite their successes, there seems to be a consensus that expert systems could be better than they are. Most expert systems embody only the limited amount of expertise that individuals are able to report in a particular, constrained language (e.g., production rules). If current systems are approximately as good as human experts, given that they represent only a portion of what individual human experts know, then improvement in the "knowledge capturing" process should lead to systems with considerably better performance.

B. Medical Relevance and Collaboration

Collaboration with Dr. James Moller, Department of Pediatrics, and Dr. Donald Connelly in the Department of Laboratory Medicine and Pathology, both at the University of Minnesota Medical School.

C. Highlights of Research Progress

Accomplishments of this past year.

Prior research at Minnesota on expertise in diagnosis of congenital heart disease has resulted in a theory of diagnosis and an embodiment of that theory in the form of a computer simulation model which diagnoses cases of congenital heart disease. At a macroscopic level, the simulation model contains four categories of knowledge used by the expert physician (pediatric cardiologist) in diagnosis.

First, the model has clinical knowledge of disease. This knowledge is hierarchically structured, including categories of disease, specific diseases, and variants of the same disease that differ in presentation. For each element in the hierarchy, there is knowledge of the associated anatomy and physiology and the expected clinical manifestations.

Second, the model has deductive knowledge of disease: knowledge of principles of cardiovascular pathophysiology and the clinical manifestations useful in detecting underlying pathology.
The causal knowledge deductively relates cardiovascular defects to hemodynamics and to expected patient data. This category of knowledge is not typically used in diagnosis, since the expert physician can simply recall expected clinical manifestations through clinical knowledge of disease, rather than deduce them.

Third, the model has heuristic knowledge of disease and of clinical findings useful in its diagnosis. One aspect of this knowledge provides indices from clinical manifestations to diseases and from disease to disease, useful in choosing diagnostic hypotheses to consider. Another aspect of heuristic knowledge is related to evaluation of diagnostic alternatives -- rules of thumb for ruling in and ruling out alternatives as correct or incorrect. Another form of heuristic knowledge screens abnormal from normal clinical findings and identifies subsets of findings which are likely to have the same underlying cause. All heuristics are aimed at basically the same end -- reducing the cognitive demands of diagnosis without loss of diagnostic accuracy.

The fourth and last category is knowledge of data acquisition techniques: interviewing methods, physical exam maneuvers, special procedures, and laboratory utilization.

Diagnosis is characterized as a heuristic search process, with four sources of knowledge involved. Clinical knowledge of disease is the hierarchical structure to be searched. Deductive knowledge of disease is useful in construction of missing pieces of that hierarchy and in justifying the clinical information it contains. Heuristic knowledge aids in limiting the section of clinical knowledge to be searched and in providing simple evaluation functions for use in search. Knowledge of data acquisition is essential in obtaining patient information to be used in the search process.
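
As a rough illustration of the search characterization above, the following Python sketch shows one way heuristic indexing and a simple evaluation function might cooperate over a disease hierarchy. It is not the Minnesota simulation model: the `Disease` class, the `triggers` table, the scoring rule, and the placeholder names (`category_A`, `f1`, and so on) are all assumptions introduced here for illustration, and deductive and data acquisition knowledge are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Disease:
    """Node in a hypothetical hierarchy: category -> disease -> variant."""
    name: str
    expected: set = field(default_factory=set)    # expected clinical manifestations
    children: list = field(default_factory=list)  # more specific diseases or variants


def score(node, findings):
    """Simple evaluation function: fraction of expected manifestations observed."""
    if not node.expected:
        return 0.0
    return len(node.expected & findings) / len(node.expected)


def diagnose(root, findings, triggers):
    """Search only the categories that the findings suggest.

    `triggers` maps a finding to the category names it indexes (an assumed
    stand-in for heuristic knowledge). Every node under a suggested category
    is visited, best-scoring first, and the best-scoring leaf is returned.
    """
    suggested = {c for f in findings for c in triggers.get(f, ())}
    frontier = [c for c in root.children if c.name in suggested]
    best = None
    while frontier:
        frontier.sort(key=lambda n: score(n, findings), reverse=True)
        node = frontier.pop(0)
        if not node.children:
            if best is None or score(node, findings) > score(best, findings):
                best = node
        frontier.extend(node.children)
    return best


# Placeholder hierarchy; names and findings carry no clinical meaning.
root = Disease("root", children=[
    Disease("category_A", children=[
        Disease("disease_A1", {"f1", "f2"}),
        Disease("disease_A2", {"f1", "f3"}),
    ]),
    Disease("category_B", children=[Disease("disease_B1", {"f2"})]),
])
triggers = {"f1": ["category_A"]}
result = diagnose(root, {"f1", "f2"}, triggers)
print(result.name if result else None)
```

In this toy case `category_B` is never examined because no finding indexes it, which is the sense in which heuristic knowledge limits the section of clinical knowledge to be searched.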
The simulation model is currently programmed in UT Lisp on the University of Minnesota CDC Cyber 74; it contains just over one million characters. One of the first goals of our project is to transfer this program over to the Sumex-AIM system - see Project Goals and Plans (below).

Research in progress.

The methodology of our research derives from the discipline of cognitive science, and from our study of expert problem solvers. This methodology consists of: (1) extensive use of verbal thinking aloud protocols as well as other experimental data as a source of information from which to make inferences about underlying cognitive structures and processes; (2) development of computer models as a means of testing the adequacy of inferences derived from the protocol studies; (3) testing and refinement of the cognitive models based upon the study of human and model performance in experimental settings.

This past year we have been investigating expertise in solving certain classes of physics problems. We have chosen this area because reasonable data on human competency is available from past work at Minnesota and elsewhere, some work on formal modeling has already been done, and both the problems and required knowledge are well specified. Effort will be concentrated on investigating control structures and search heuristics. Those portions of the system unique to physics will be isolated so that generality can be easily studied by extending the program to other domains.

Once a computational model has been implemented, three classes of experiments will provide useful information. First, a series of validation tests will be used to estimate the effectiveness of the model. A series of problems will be given to the model and a solution trace produced. The degree to which the program's behavior corresponds to the ways in which the humans solved the same problems will be determined.
In addition, the effectiveness of the heuristics in producing efficiencies relative to alternative approaches to the problems will be estimated.

In the second set of experiments, the model will be optimized for different types of problems within the same domain to determine which heuristics are likely to be problem specific. We are particularly interested in the degree to which the problems are well specified and the effects of problem space topology. Expert/novice differences will be investigated by operating the model with different levels of sophistication in its knowledge base.

The last set of experiments will investigate errors in problem solving. Classes of problems likely to produce an incorrect answer will be identified. The degree to which these errors are a necessary consequence of the search heuristics will be investigated. Finally, the model will be extended so that when garden path errors are recognized, a generalization process is invoked so that similar situations can be avoided in the future.

D. List of Relevant Publications

Connelly, D., & Johnson, P.E. The medical problem solving process. Human Pathology, 1980, 11, 412-419.

Elstein, A., Gorry, A., Johnson, P., & Kassirer, J. New research direction. In D.C. Connelly, E. Benson, & D. Burke (Eds.), Clinical Decision Making and Laboratory Use. University of Minnesota Press (in press).

Feltovich, P.J. Knowledge based components of expertise in medical diagnosis. Unpublished doctoral dissertation, University of Minnesota, 1981.

Johnson, P.E. Cognitive models of medical problem solvers. In D.C. Connelly, E. Benson, & D. Burke (Eds.), Clinical Decision Making and Laboratory Use. University of Minnesota Press (in press).

Johnson, P.E., Severance, D.G., & Feltovich, P.J. Design of decision support systems in medicine: Rationale and principles from the analysis of physician expertise.
Proceedings of the Twelfth Hawaii International Conference on System Sciences, Western Periodicals Co., 1979, 3, 105-118.

Johnson, P.E., Barreto, A., Hassebrock, F., Moller, J., & Prietula, M. Expertise and error in diagnostic reasoning. Cognitive Science (in press).

Johnson, P.E., & Thompson, W.B. Strolling down the garden path: Detection and recovery from error in expert problem solving. Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, B.C., August 1981.

Moller, J.H., Bass, G.M., Jr., & Johnson, P.E. New techniques in the construction of patient management problems. Medical Education (in press).

Swanson, D.B. Computer simulation of expert problem solving in medical diagnosis. Unpublished doctoral dissertation, University of Minnesota, 1978.

Swanson, D.B., Feltovich, P.J., & Johnson, P.E. Analysis of physician expertise: Implications for the design of decision support systems. In D.B. Shires & H. Wold (Eds.), Medinfo77. Amsterdam: North-Holland Publishing Co., 1977.

E. Funding and Support

Work being done in scientific reasoning is sponsored under a current NSF (SE079-13036) grant to Paul Johnson. The work in law has been supported by the Minnesota Center for Research in Human Learning and is described in a proposal which is currently under review by the NSF Law and Social Science Program. The work in medicine has been supported by NICHD (T736-HD-17151 and HD-01136) and NSF (NSF/BNS-77-22075) grants to the Minnesota Center for Research in Human Learning, of which Paul Johnson is a principal investigator. Additional support is being sought from the National Library of Medicine and the Information Sciences Program at NSF.

II. INTERACTIONS WITH THE SUMEX-AIM RESOURCE

A. Medical Collaborations and Program Dissemination via SUMEX

Collaboration with Tufts University - New England Medical Center (See III.A.)

B.
Sharing and Interactions with Other SUMEX-AIM Projects

Our work is complementary to many of the current projects supported on Sumex-AIM. We will be investigating certain aspects of expert problem solving in order to develop better organizational and knowledge acquisition strategies. Such work requires that we be able to build upon the extensive experience in knowledge engineering within the Sumex-AIM communities. Specifically, we first need to investigate a number of existing programs in order to determine the degree to which they satisfy the design goals which we will be establishing. We then hope to use the program construction tools that are available in order to build prototype systems to illustrate our ideas.

C. Critique of Resource Management

(None)

III. RESEARCH PLANS

A. Project Goals and Plans

Near term.

The research for which we wish to use the Sumex-AIM resource has two subcomponents, one directed at selective reimplementation of the diagnosis program (described above) and the other directed at extensions of the research more broadly.

Reimplementation of the diagnosis program will be carried out first, using AGE and UNITS knowledge engineering tools, and Interlisp. The present program is a blend of a semantic network representation (for clinical knowledge of disease) and a production system/blackboard control structure (heuristic activation of portions of clinical knowledge). This structure is nicely congruent with the facilities provided through UNITS and AGE linked together. This relatively well specified reimplementation task will provide an appropriate environment for learning about Sumex-AIM and the resources it provides. It will also provide an interesting comparison of performance of similar diagnosis programs implemented in different ways.

The second subcomponent of the proposed research will focus upon data collection and patient management skills of the expert physician.
Work thus far has focused largely on diagnosis because of our lack of familiarity with the medical domain and because of the paramount importance of methodological development, as discussed in the next section on knowledge capturing. In pediatric cardiology, we now have the substantive knowledge and methodological tools required to approach medical problem solving more broadly, including knowledge and procedures related to both collection of patient data for diagnostic purposes and patient management. In collaboration with experimental psychologists and physicians at Minnesota, and computer scientists and physicians at Tufts University in Boston, we propose to investigate the stimulus information utilized by physicians in various patient data sources (X-ray, EKG, and heart sounds).

Long range.

We propose to investigate the "knowledge capturing" process that occurs in the early stages of the development of expert systems when problem decomposition and solution strategies are being specified. Several related questions will be addressed: What are the performance consequences of different organization approaches, how can these consequences be evaluated, and what tools can assist in making the best choice? How can organizations be determined which not only perform well, but are structured so as to facilitate knowledge acquisition from human experts?

B. Justification and Requirements for Continued SUMEX Use

We are currently using a CDC Cyber 74 and Cyber 172 for most of our work in problem solving. The Cyber computers are not well suited for interactive computing and have serious limitations with respect to address space and available support software.

C. Needs and Plans for Other Computing Resources beyond SUMEX-AIM

We expect delivery of a VAX-11/780 in June 1981 at the University of Minnesota. Hopefully, much of our work will eventually be implementable on the VAX.
However, this depends both on our ability to acquire additional peripheral hardware and on increased availability of AI software for the VAX. Until that time, access to a TENEX facility is extremely desirable.

D. Recommendations for Future Community and Resource Development

(None)

II.A.3 Pilot Stanford Projects

The following are descriptions of the informal pilot projects currently using the Stanford portion of the SUMEX-AIM resource pending funding, and full review and authorization.

II.A.3.1 Protein Secondary Structure Project

Protein Secondary Structure Project

Robert M. Abarbanel, M.D.
University of California Medical Center
University of California at San Francisco

I. SUMMARY OF RESEARCH

A. Project Rationale

Development of a protein structure knowledge base and tools for manipulation of that knowledge to aid in the investigation of new structures. System to include cooperating knowledge sources that work under the guidance of other system drivers to find solutions to protein secondary structure problems. Evaluations of structure predictions using known proteins and other user feedback available to aid user in developing new methods of prediction.

B. Medical Relevance and Collaboration

Many important proteins have been sequenced but have not, as yet, had their secondary or tertiary structures revealed. The systems developed here would aid medical scientists in the search for particular configurations, for example, around the active sites in enzymes. Predictions of secondary structure will aid in the determination of the full "natural" configuration of important biological materials. Development of systems such as these will contribute to our knowledge of medical scientific data representation and retrieval.

C. Highlights of Research Progress

This is a relatively new project at SUMEX.
During the last year, a representation of protein sequences and rules for their manipulation have been built using the UNITS system developed at SUMEX. At this time, various existing structure prediction algorithms are being implemented to explore the needs for tools in a fully developed system where a scientist may describe a new prediction scheme in nearly natural language, and see it run on protein sequences with aids to discovery of errors and problems in the new methods developed.

D. List of Relevant Publications

None.

E. Funding Support

New Investigator Award proposal pending with NLM.

II. INTERACTIONS WITH SUMEX-AIM RESOURCE

A. Medical Collaborations

None.

B. Sharing and Interactions with SUMEX Projects

This project is closely allied with the MOLGEN group, both in computer and scientific interests. Some pattern matching methodology created for the protein data base has been adopted and used in the various DNA knowledge bases. The principal persons in the MOLGEN group have contributed to this project's use and understanding of knowledge base software and resources.

C. Critique of Resource Management

System load has been a significant problem. Except during late night or early morning hours, use of complex packages like the UNITS editor or other system-available text editors is a major burden. Often the schedule manager has moved the active processes to background. This is a problem shared with many others in the community using knowledge base tools. It is a common practice to transfer files to other computing resources for interaction during daylight hours.

Communications via MSG have aided this project in establishing connections with other researchers and groups with parallel interests.

III. RESEARCH PLANS

A.
Project Goals and Plans

Near-Term: Design of rules modules in UNITS editor that will allow an inexperienced user to express algorithms for structure prediction in near-natural language. These prediction schemata will then be translated into appropriate combinations of invocations of knowledge sources, editors, and feedback tools to aid the user in further refinement of his algorithms.

Long-Term: Systems to be developed for the discovery of rules and/or algorithms that can transform a protein sequence into a secondary structure, possibly with implications about tertiary structure. This will involve an effort along the lines of the meta-DENDRAL project.

B. Need for Resources

1. SUMEX Resources

Environment of knowledge base tools and people is the primary motive for doing this work using SUMEX. Access to both established and developing systems aids this project in setting down standards of excellence, forward thinking about computing tools and methodologies, and active exchange of techniques and ideas. The close collaboration with the MOLGEN researchers is particularly useful in this regard.

2. Other Computing Resources

A soon to be established network connection with the Computer Graphics Laboratory at UCSF will provide access to 1) the latest in protein structural information, and 2) color line drawing graphics facilities for evaluation and display of this project's product. A real time display using color graphics will become a possibility. Connections with the HPP Unix system may also prove useful for program sharing.

II.A.3.2 Ultrasonic Imaging Project

ULTRASONIC IMAGING PROJECT

James F. Brinkley, M.D.
W.D. McCallum, M.D.
Depts. of Computer Science, Obstetrics and Gynecology
Stanford University

I. SUMMARY OF RESEARCH PROGRAM

A.
Project Rationale

The long range goal of this project is the development of an ultrasonic imaging and display system for three-dimensional modeling of body organs. The models will be used for non-invasive study of anatomic structure and shape as well as for calculation of accurate organ volumes for use in clinical diagnosis. Initially, the system will be used to determine fetal volume as an indicator of fetal weight; later it will be adapted to measure left ventricular volume, or liver and kidney volume.

The general method we are using is the reconstruction of an organ from a series of ultrasonic cross-sections taken in an arbitrary fashion. A real-time ultrasonic scanner is coupled to a three-dimensional acoustic position locating system so that the three-dimensional orientation of the scan plane is known at all times. During the patient exam a dedicated microcomputer based data acquisition system is used to record a series of scans over the organ being modelled. The scans are recorded on a video tape recorder before being transferred to a video disk. 3D position information is stored on a floppy disk file.

In the proposed system the microprocessor will then be connected to SUMEX where it will become a slave to an AI program running on SUMEX. The SUMEX program will use a model appropriate for the organ which will form the basis of an initial hypothesis about the shape of the organ. This hypothesis will be refined at first by asking the user relevant clinical questions such as (for the fetus) the gestational age, the lie of the fetus in the abdomen and complicating medical factors. This kind of information is the same as that used by the clinician before he even places the scan head on the patient. The model will then be used to request those scans from the video disk which have the best chance of giving useful information.
Heuristics based on the protocols used by clinicians during an exam will be incorporated since clinicians tend to collect scans in a manner which gives the most information about the organ. For each requested scan a prototype outline derived from the model will be sent to the microcomputer. The requested scan will be retrieved from the video disk, digitized into a frame buffer, and the prototype used to direct a border recognition process that will determine the organ outline on the scan. The resulting outline will be sent to SUMEX where it will be used to update the model. The scan requesting process will be continued until it is judged that enough information has been collected. The final model will then be used to determine volume and other quantitative parameters, and will be displayed in three dimensions.

We believe that this hypothesize-verify method is similar to that used by clinicians when they perform an ultrasound exam. An initial model, based on clinical evidence and past experience, is present in the clinician's mind even before he begins the exam. During the exam this model is updated by collecting scans in a very specific manner which is known to provide the maximum amount of information. By building an ultrasound imaging system which closely resembles the way a physician thinks we hope to not only provide a useful diagnostic tool but also to explore very fundamental questions about the way people see.

We are developing this system in phases, starting with an earlier version developed at the University of Washington. During the first phase the previous system has been adapted and extended to run in the SUMEX environment. Clinical studies have been initiated to determine its effectiveness in predicting fetal weight and left ventricular volume.
At the same time computer vision techniques are being studied in order to develop the system further in the direction of increased applicability and ease of use. We thus hope to develop a limited system in order to demonstrate the feasibility of the technique, and then to gradually extend it with more complex computer processing techniques, to the point where it becomes a useful clinical tool.

B. Medical Relevance

This project is being developed in collaboration with the Ultrasound Division of the Department of Obstetrics at Stanford, of which W.D. McCallum is the head. Fetal weight is known to be a strong indicator of fetal well-being: small babies generally do more poorly than larger ones. In addition, the rate of growth is an important indicator: fetuses which are "small-for-dates" tend to have higher morbidity and mortality. It is thought that these small-for-dates fetuses may be suffering from placental insufficiency, so that if the diagnosis could be made soon enough early delivery might prevent some of the complications. In addition such growth curves would aid in understanding the normal physiology of the fetus.

Several attempts have been made to use ultrasound for predicting fetal weight since ultrasound is painless, noninvasive, and apparently risk-free. These techniques generally use one or two measurements such as abdominal circumference or biparietal diameter in a multiple regression against weight. We recently studied several of these methods and concluded that the most accurate were about +/-200 gms/kg, which is not accurate enough for adequate growth curves (the fetus grows about 200 gms/week). The method we are proposing is based on the assumption that fetal weight is directly related to volume since the density of fetal tissue is nearly constant. We are hoping that by utilizing three dimensional information more accurate volumes and hence weights can be obtained.
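
The arithmetic behind this argument can be made concrete with a small Python sketch. The figures of 200 gms/kg and 200 gms/week come from the text above; the density value of 1.0 g/ml, the 3000 g example weight, and the function names are assumptions chosen only for illustration.

```python
def weight_from_volume(volume_ml, density_g_per_ml=1.0):
    """Weight in grams, assuming a constant tissue density.

    The default density of 1.0 g/ml is a placeholder assumption, not a
    value given in the report.
    """
    return volume_ml * density_g_per_ml


def regression_error_g(weight_g, error_g_per_kg=200.0):
    """Absolute error in grams for a method accurate to +/- error_g_per_kg."""
    return error_g_per_kg * weight_g / 1000.0


def error_in_weeks(error_g, growth_g_per_week=200.0):
    """Express a weight error as the equivalent number of weeks of growth."""
    return error_g / growth_g_per_week


weight = weight_from_volume(3000.0)   # hypothetical 3000 ml volume
err = regression_error_g(weight)      # 600.0 g at +/-200 gms/kg
print(weight, err, error_in_weeks(err))
```

Under these assumptions a 3000 g fetus carries an uncertainty of about 600 g, equivalent to roughly three weeks of growth, which suggests why week-to-week growth curves are out of reach for the regression methods.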
In addition to fetal weight, the current implementation of this system is now being evaluated for its ability to determine other organ volumes. In collaboration with Dr. Richard Popp of the Stanford Division of Cardiology we have started to evaluate the system on in vitro hearts. Left ventricular volumes are routinely obtained by means of cardiac