Bioinformatics and Computing Curricula 2001
Why Computer Science is well positioned in a post-genomic world

Mark D. LeBlanc, Computer Science
Betsey D. Dyer, Biology
Wheaton College, Norton, MA 02766

This work was funded in part by a SIGCSE Special Projects Grant.

Introduction

Depending on the computer scientist you talk to, the mere mention of "bioinformatics" generates responses ranging from "the saving grace for finding a job" to "just another flash in the pan." Wherever you fall within this range, the fact remains that Biology has emerged as "the new kid on the (computing) block." While computer scientists have worked for decades with physicists, chemists, economists, astronomers, and engineers, it was rare to find collaborations between biologists and professionals/academics in computing.[1] But a combination of events, including the sequencing of entire genomes and high-throughput drug discovery techniques, has led to a steady growth of bioinformatics-related jobs even during the declining and neutral economies of the last few years. At the very least, it is clear that a number of students who leave undergraduate programs in computer science will find themselves involved with research, programming, system administration, and/or data mining in biology-related jobs. So as a discipline, is computer science ready to deliver?

We argue that not only is computer science ready to deliver, but that the discipline is well positioned to produce significant players in the interdisciplinary field of bioinformatics. While some universities and technical institutes (e.g., Wright State University, Rochester Institute of Technology, Canisius College) have taken a lead with degree programs in bioinformatics, there is considerable latitude for other models that do not necessarily involve major curricular changes. Indeed, we would argue that smaller universities and colleges have considerable opportunities to contribute to the emerging field.

A significant number of core knowledge areas ("knowledge focus groups" or KFGs) in the new computer science curriculum standards (Roberts, Computing Curricula, 2001) address the needs of the bioinformatics community.
Based on our research and classroom experiences, ten of the fourteen KFGs (Table 1) expose students to the types of computational and analytical rigor that are needed for generating original hypotheses and running computationally intensive experiments with DNA and protein sequence. In addition, we feel that the standards' topics "beyond the core" are right on target. Curricular development in bioinformatics can and should center on applications and examples that use most of the preexisting CS curriculum.

In the following sections, we ask and attempt to answer the following questions:

(1) Is it really computer science if we teach bioinformatics?
(2) Can Bioinformatics be presented via an "infusion" model? Or is a separate "Bioinformatics Major" essential?
(3) How well does Computing Curricula 2001 prepare computer science students for roles in bioinformatics-related work?
(4) How can an institution begin to include bioinformatics-related material into their already full curriculum?

(1) Is it really computer science if we teach bioinformatics/genomics?

Yes. In fact, an underlying motivation for this SIGCSE initiative (at least for LeBlanc) has been to show that the core knowledge areas recommended in Computing Curricula 2001 are excellent preparation for future bioinformaticists (see Question #3). Said differently, bioinformatics/genomics is so tied to algorithms, systems, and information storage, retrieval, and analysis that much of our core curriculum provides what Bruce et al. (2003) call "just-in-case" education for work in bioinformatics/genomics/proteomics. For example, a working knowledge of recurrence relations will probably not be needed on the first day in a bioinformatics research group, but at some point someone in the group must be wise enough to find a closed form for a recursive solution under consideration. That is, recurrence relations are not an example of a "just in time" topic but rather a "just in case" topic. Speaking from four-plus years of experience in undergraduate genomics research (including the classroom), "just in case" can be translated to "at some point, count on it."

(2) Can Bioinformatics be presented via an "infusion" model? Or is a separate "Bioinformatics Major" essential?

To be honest, this emerging question is a political minefield. Our experience to date, including our personal collaborative work, our interdisciplinary teaching, and our efforts with our respective departments, has shown us that there are multiple answers to these questions and that multiple answers are OK!

A Piece of the Pie: First, bioinformatics is clearly interdisciplinary, but the disciplines involved are not limited to computer science and biology. Mathematics, chemistry, and physics can be vocal proponents for a piece of the pie. Our participation in the MAA-NIH-NSF sponsored "Meeting the Challenges in Emerging Areas: Education across the Biological, Mathematical, and Computer Sciences" (Bethesda, MD, March 2003) served as a case in point.
In this gathering of mathematicians, biologists, and computer scientists, a divergent set of views was aired as to "what was needed in order to study bioinformatics." A number of mathematicians and biologists involved in the "more math" emphasis in the BIO 2010 report (Whitacre, 2003) argued that an undergraduate would need many math courses, including Calculus I, II, and III, linear algebra, discrete math, etc. Biologists were loath to drop much of their molecular biology requirements, and a number of the computer scientists in attendance left the meeting wondering if computing was even seriously considered (see Congdon et al., 2003).

Molecular Biology as a Similar Example: A parallel development (and accompanying crisis in the curriculum) occurred with molecular biology a couple of decades ago. At that time, many universities established separate degree programs in molecular biology to acknowledge that more and more biology was intertwined with (and supported by) chemistry, especially that of macromolecules such as DNA, RNA, and proteins. Yet today, molecular biology is so pervasive that separate programs may seem redundant. All biologists, regardless of specialty, must to some degree consider the essential roles of DNA, RNA, and proteins. In actual practice, molecular biology is not a separate discipline but a well-integrated part of any comprehensive biology program.

Infusion is a Realistic Model in many Cases: We envision a future in which bioinformatics, as a topic for applications and examples, pervades many areas of the computer science curriculum and computing likewise pervades the biology curriculum. Furthermore, applied courses in mathematics, statistics, chemistry, and even some physics courses might be enriched by including examples and concepts from bioinformatics.
Any college or university, no matter how its resources might be allocated, can begin to infuse bioinformatics, and computer science is perhaps most conveniently poised to do so, having most of the curriculum already in place. Much of our confidence in the preexisting computer science curriculum and in infusion is based on our success at Wheaton College (Norton, MA) with exposing students to bioinformatics/genomics. In particular, computer science (and other) students need interdisciplinary experiences where they find themselves outside their domain of expertise. Surely they need to bring a rich set of "traditional" expertise to the table, but successful work in this new field depends significantly on students' abilities to work together on problems where conventional solutions are not obvious. See our answer to Question #4 for more suggestions for bringing bioinformatics/genomics instruction to our students outside the framework of a formal major.

A Bioinformatics Major appropriate for some settings: For those institutions (e.g., RIT, Wright State, Canisius) with the resources for a separate curriculum, bioinformatics as a major is clearly one of the more rigorous undergraduate programs. Coordinators of these new major tracks can attest to the diplomacy that is needed when agreeing on how much, and to what extent, students study traditional subjects in biology, mathematics, and computer science. When one views the list of requirements for one of these bioinformatics majors, we think it is safe to say that, yes, a program in bioinformatics involves considerably more work than any of the individual programs of biology, math, or computer science. From our liberal arts perspective, this is especially evident. More directly, given our curricular focus on content outside the major area of study, sometimes up to two-thirds of the entire four-year experience, we do not see a bioinformatics major appearing at our campus.
In fact, some of our present students working in our Genomics Research Group are double majors, majoring in both computer science and biochemistry.

(3) How well does Computing Curricula 2001 prepare computer science students for roles in bioinformatics-related work?

We have started to identify computationally rich examples from bioinformatics that map to core units in Computing Curricula 2001 and to investigate ways of incorporating them into the computer science (and biology) curricula. In this preliminary work, we identify applications and algorithms in bioinformatics that can serve as examples for teaching the Programming Fundamentals (PF) and Algorithms and Complexity (AL) areas from Computing Curricula 2001. Our anticipated outcomes include the fruitful application of these materials both within existing computer science (and biology) courses and in new courses that build on the interdisciplinary nature of the field by engaging computer science and biology students in collaborative work.

Knowledge Focus Groups, Body of Knowledge, and Core Units: Computing Curricula 2001 (CC2001, p85) embodies 132 units in the "body of knowledge," 64 of which are considered "core material." These 132 units are distributed across 14 knowledge focus groups (KFGs), e.g., Programming Fundamentals (PF) and Algorithms and Complexity (AL). As we consider our genomics research with undergraduates over the past four years, we estimate that our work intersected with 10 of the 14 KFGs (71%), involving at least 36 of the 132 units (27%), with 29 of those 36 units being "core units". Said differently, our research in genomics relied on and used content from 29 of the 64 core units (45%) in CC2001. In Table 1, we show the 36 units from CC2001 that we estimate we have used in our genomics research. For some of these we provide links to course materials that we have developed that reinforce and apply the knowledge unit(s). For example, we have developed interdisciplinary labs and programming assignments for our "linked" courses "Algorithms" and "Genetics" (see Question #4 for details on "linked" courses).
These assignments involve recursion (PF4), basic algorithmic analysis (AL1), algorithmic strategies (AL2), software design (SE1), software requirements and specifications (SE5), and software project management (SE8). In addition to our own course material development, we especially hope this Table will serve as an open invitation for like-minded faculty to develop bioinformatics/genomics course materials for their own courses and areas of expertise. Although examples are beginning to emerge (e.g., see D'Antonio, 2003; Krane and Raymer, 2003; and DeJongh, 2004), most faculty will need a healthy selection of example materials to help them infuse bioinformatics content into their existing and possibly new courses.
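As a minimal sketch of what such an assignment might look like, the following Python example computes the length of the longest common subsequence of two DNA strings. It is an illustration only, not one of the course materials referenced in Table 1; the function name `lcs_length` and the sample sequences are hypothetical. The recursive formulation exercises recursion (PF4). Without the cache, the number of calls grows exponentially in the sequence lengths. With it, at most (m+1)(n+1) distinct subproblems are solved for sequences of lengths m and n, which gives students something concrete to analyze (AL1) and an instance of a dynamic-programming strategy (AL2).

```python
from functools import lru_cache


def lcs_length(a: str, b: str) -> int:
    """Return the length of the longest common subsequence of a and b.

    Hypothetical teaching example; intended for short sequences, since
    the recursion depth can reach len(a) + len(b).
    """

    @lru_cache(maxsize=None)
    def solve(i: int, j: int) -> int:
        # Base case: one of the two suffixes is empty.
        if i == len(a) or j == len(b):
            return 0
        # Matching bases extend the common subsequence by one.
        if a[i] == b[j]:
            return 1 + solve(i + 1, j + 1)
        # Otherwise skip one base from either sequence and keep the best.
        return max(solve(i + 1, j), solve(i, j + 1))

    return solve(0, 0)


if __name__ == "__main__":
    # Made-up sequences, for demonstration only.
    print(lcs_length("ACGT", "AGT"))
```

Removing the `lru_cache` decorator leaves the answers unchanged but makes the running time blow up on longer inputs, which is one way a lab could motivate the analysis.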
Pedagogical focus groups: In addition to the knowledge units, CC2001 features six pedagogical focus groups that consider curricular issues across computer science as a whole. In particular, we wish to highlight that our work with infusing genomics into the computer science curriculum is consistent with the pedagogical efforts to (i) support topics and courses outside of traditional computer science, including science (PFG2a); (ii) expose students to undergraduate research (PFG5d); and (iii) create new courses and content that present a rigorous challenge for non-CS majors, "computing across the curriculum" (PFG6) (CC2001, p4).

(4) How can an institution begin to include bioinformatics-related material into their already full curriculum?

We strongly believe that any college or university, no matter how its resources might be allocated, can begin to infuse bioinformatics into the computer science and biology curricula already in place. In particular, over the last few years we have "linked" computer science and biology classes together. We define "linked" courses as two independently run courses that share genomics/bioinformatics as a common thread in their respective syllabi and that share time in the form of guest lectures, some common lab sessions (e.g., four out of 12 labs over the semester), collaborative programming assignments outside of lab time, and final interdisciplinary team projects and presentations.

Linked courses offer faculty a flexible way to infuse genomics content into appropriate courses and gain the benefits of interdisciplinary experiences while still maintaining control of most topics in the syllabus. For example, both "Genetics" and "Algorithms" are core courses in small departments with a considerable amount of "traditional" material that must be covered. Of particular importance is our goal to develop course materials that focus faculty and student attention on genomics research, facilitate various types of collaborative work between students, and are easy to integrate into various combinations of course linkages (Dyer and LeBlanc, 2002). To meet our objective of reaching a significant number of majors in computer science and across the biological sciences, we have been careful to include a core course from each discipline ("Algorithms" for computer science and "Genetics" for biology). However, our ongoing plan is to develop course materials that facilitate collaborations around genomics content and that are not necessarily specific to a particular course. We consider this to be one of the more significant challenges of our educational materials development.
In preliminary efforts, we linked "Algorithms" with both "Cell Evolution" and "Genetics" and experimented with a re-use of course materials. We submit that other faculty will find "Artificial Intelligence", "Software Engineering", and "Databases" to be potential computer science "links".

Linking two courses assumes a range of teaching options for faculty, from sharing one programming assignment to collaborating in multiple labs, homework, and final projects throughout the semester. An underlying assumption in our plan is that new exciting research questions will be answered by new software (or at least modifications to existing software). Our personal research collaboration and classroom experiences to date have convinced us that successful collaborations emerge over time, not, for example, in one or two isolated laboratory sessions. Thus, we intend to continue to develop materials that bring students from both disciplines together in class and lab and then later in follow-ups outside of lab, for example, in a joint homework assignment, in modifications to a programming assignment, design
of an experiment in genomics, or preparations for final project presentations.

References

Bruce, K.B., Drysdale, S., Kelemen, C., and Tucker, A. (2003). Why math? Communications of the ACM, v46(9), 40-44.

Congdon, C., Dougherty, J., Evans, D., LeBlanc, M., Currie Little, J., Chu Prey, J., Stojkovic, V., and Tymann, P. (2003). Computing, Biology, and Bioinformatics - Chapter for Report on Education across Biology, Computing, and Mathematics. Unpublished report.

D'Antonio, L. (2003). Incorporating Bioinformatics in an Algorithms Course. Inroads - SIGCSE Bulletin, v35(3), 211-214.

DeJongh, M. (2004). Materials for an Introductory Course in Bioinformatics. http://www.cs.hope.edu/~dejongh/bioinformatics/sigcse/hopecourse.html

Krane, D. and Raymer, M. (2003). Fundamental Concepts of Bioinformatics. Benjamin Cummings, San Francisco.

Roberts, E., Ed. (2002). Computing Curricula 2001: Computer Science Final Report. IEEE Computer Society, New York, April 2002.

Whitacre, Paula T. (2003). BIO 2010 - Transforming Undergraduate Education for Future Research Biologists. A report from the National Research Council of the National Academies, the National Academies Press, Washington D.C.

[1] The exception, of course, is the biological modeling community. For example, ecological modelers have a tradition of interdisciplinary work.

Maintained by: Mark LeBlanc, Dept of Math & Computer Science, Wheaton College, Norton, Massachusetts