David Harel

William Sussman Professor, Dept. of Computer Science and Applied Mathematics, The Weizmann Institute of Science, Israel

One day, experiments on living beings will, to a large extent, be replaced by interactive simulations on computers. A decade ago, computer scientist David Harel, winner of the Israel Prize, the ACM Software System Award, and the Emet Prize, and recipient of four honorary degrees, proposed a grand scientific challenge: modeling a full multi-cellular organism as a reactive system. He suggested the 1000-cell nematode C. elegans as the model organism. With an estimated 10-15 years of intensive work, a multidisciplinary team could construct a full model of this organism, or a similar one, covering both its development and its behavior. The model would be dynamic, interactive, and zoomable, allowing changes and probes at the cellular and molecular levels. Such tools would allow cyber-experiments that answer questions too complex for laboratory techniques, including comparisons not only within a species but also with evolutionarily related species that have different forms and behaviors. Research in the life sciences and medicine in the 21st century is poised to undergo a major transition, in which computer science will play a central role, similar to the role mathematics played in the physical sciences of the 20th century.

Breaking the Wall of Biocomplexity. How a Reactive Systems Approach May Lead to Full Dynamic Models of Multi-Cellular Organisms


I am a computer scientist, but I am going to be talking about biology, and essentially about building computerized models of biological systems. One of the questions we ask ourselves is: why do computerized modeling at all? What should we model, and how should we do the modeling? Once we have built a model, one of the most interesting questions is: how do we know that the model is complete and valid? These questions don't really arise when you are building a human-made machine. If you are building a model of a telephone or of an aircraft, you know the answers to these questions easily. However, they are especially acute if we want to build a computerized model of something in nature. If you want to build a model of the weather, for example, how to do it and why to do it are important questions, and the question colored red on the slide is the most important one: how do we know when we have finished and the model is valid?

A key issue underlying the kind of work I do in this area is drawing a connection between what we call a reactive system and the kinds of systems we find in biology. In computer science, a reactive system is one whose complexity comes from the way it reacts to things that happen to it: you press a button, the temperature rises above a certain level, a valve opens, something shuts down; you put your card into your Bankomat, your ATM, and it causes something to happen. These are reactive systems. Our claim is that what makes biology so hard to understand is exactly the same kind of thing that makes an F-15, or Windows, or some other complicated human-made system hard to understand.
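
To make the notion concrete, here is a minimal sketch of a reactive system in Python: a hypothetical valve controller whose behavior is defined entirely by how it responds to a stream of events. The class name, the 80-degree threshold, and the event names are illustrative assumptions, not taken from any real system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ValveController:
    """Toy reactive system: what it does is defined by how it reacts to events."""
    threshold: float = 80.0                  # hypothetical temperature limit
    valve_open: bool = False
    shut_down: bool = False
    log: List[str] = field(default_factory=list)

    def on_event(self, event: str, value: Optional[float] = None) -> None:
        if self.shut_down:
            # Once stopped, the system ignores further stimuli.
            self.log.append(f"ignored {event}: system is shut down")
            return
        if event == "temperature" and value is not None:
            if value > self.threshold and not self.valve_open:
                self.valve_open = True
                self.log.append(f"temperature {value} > {self.threshold}: valve opened")
            elif value <= self.threshold and self.valve_open:
                self.valve_open = False
                self.log.append(f"temperature {value} back to normal: valve closed")
        elif event == "stop_button":
            self.shut_down = True
            self.valve_open = False
            self.log.append("stop button pressed: system shut down")


controller = ValveController()
events = [("temperature", 75.0), ("temperature", 85.0), ("temperature", 70.0),
          ("stop_button", None), ("temperature", 90.0)]
for name, reading in events:
    controller.on_event(name, reading)
for line in controller.log:
    print(line)
```

The controller has no "main computation": everything it does is a reaction, and its current state depends on the history of events it has seen.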

So, here is a very complicated man-made, or human-made, reactive system. And here is a very complicated God-made, or Buddha-made, or whatever-you-want-made, that is, nature-made, reactive system. I like to say that the similarity between these two things is a lot greater than the difference. If there is a major difference between an F-15 and an elephant, it is that the elephant is orders upon orders of magnitude more complicated. But the difficulty in understanding one is very similar to the difficulty in understanding the other. The whole idea behind the kind of work I am talking about today is to employ the techniques we use to build the things on the right in order to understand, or reverse-engineer, the things on the left. So, if you want a sound bite, how about trying to reverse-engineer an elephant, or a worm, or a heart, rather than engineering an F-15? Coming from the Middle East, I have other reasons to try to do less of the latter and more of the former...

One of the questions that arises is: what should we be modeling? My approach is: don't just take a question and try to answer it using a model; try to do the whole system, and be comprehensive. This raises two further questions, which I call horizontal delineation and vertical delineation.

Horizontal delineation is: what is the whole system? That is, what is the precise borderline between what you are going to model and what is outside the model? Various possibilities come up. One is to try to do an entire cell. There was a wonderful paper published a few months ago about a model of a cell. Some of my friends say to me, “You are talking about elephants and worms; let’s see you even do one cell or two cells!” That is a good idea. But then you might want to try to do a heart, or a liver, or a toenail, or a tail; or perhaps an entire organism, an elephant, for example, or a fly. But you can also think about doing a whole herd of elephants or a colony of ants. The point right now is not to make a decision but to understand that you have to make it very clear what the system you are modeling is, and what is outside of it and part of the environment.

Then there is vertical delineation. You have to decide on the level of detail that your model is going to deal with. You can say, “I am going to stop at the intercellular level. I am going to talk about cells and about communication between cells. Each cell will just be a black box.” But if you do that, it quickly becomes obvious that you have to go, to some extent, inside the cells and talk about some of the major molecules therein. If you do that, people might say, “If you go there, then you also have to talk about genes, about DNA, and so on.” But I also have friends who say, “If you go to that level, you really have to get into the biochemistry and maybe even into physics.” We might have to get into quantum effects, string theory, or whatever. Again, my point here is not to make a decision right now but to explain that when you start out, you have to decide ahead of time what level of detail you are going to deal with. You cannot talk about elephants and then let someone ask “What happens when I turn off this gene?” if you haven’t told them ahead of time that you are not modeling at the level of genes.
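
The two delineation decisions can be written down explicitly before any modeling starts. The Python sketch below records a hypothetical scope for an elephant model: what is inside the system, what counts as environment, and the finest level of detail modeled. The level names follow the talk, but their numeric ordering, the class names, and the `out_of_scope_reason` method are assumptions made for illustration.

```python
from enum import IntEnum
from typing import Optional, Set


class Level(IntEnum):
    """Levels of detail from the talk, ordered coarse to fine (illustrative)."""
    ORGANISM = 0
    CELL = 1          # cells and communication between cells
    MOLECULE = 2      # major molecules inside cells
    GENE = 3          # genes, DNA
    BIOCHEMISTRY = 4
    PHYSICS = 5


class ModelScope:
    """Hypothetical record of horizontal and vertical delineation, fixed up front."""

    def __init__(self, system: str, environment: Set[str], finest_level: Level):
        self.system = system              # horizontal: what the model covers
        self.environment = environment    # horizontal: what is outside the model
        self.finest_level = finest_level  # vertical: deepest level modeled

    def out_of_scope_reason(self, target: str, level: Level) -> Optional[str]:
        """Return None if the question can be asked of this model, else why not."""
        if target in self.environment:
            return f"{target!r} is part of the environment, not of the {self.system} model"
        if level > self.finest_level:
            return (f"{level.name} is below the declared level of detail "
                    f"({self.finest_level.name})")
        return None


elephant = ModelScope("elephant", environment={"savanna", "herd"},
                      finest_level=Level.MOLECULE)
print(elephant.out_of_scope_reason("heart", Level.CELL))     # None: answerable
print(elephant.out_of_scope_reason("tusk", Level.GENE))      # gene-level question refused
print(elephant.out_of_scope_reason("herd", Level.ORGANISM))  # outside the system boundary
```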

The crucial point, however, in doing comprehensive modeling of a complete biological system is this: if you want to do elephants, fine. Go ahead and interview all the world’s elephant experts and put everything they know, consistently, into your model. The interesting point is that if you do that, you are free to fill in anything they don’t know in any way you want. So, for example, suppose you know the elephant is here, and a month later it is here (this might be spatial movement or it might be development), but no one seems to know how it gets from here to there. You are doing computerized modeling in silicon, so you can buy some software and make it go like this. The reason that is OK is that if this gentleman, our elephant expert, comes along, looks at the screen, and says, “No, it doesn’t go like that; it goes like this,” then you have left out of your model something that is known about elephants. If you somehow put everything that is known into your model in a consistent way, you can make up the rest. Not that that is easy, but it is, at least in principle, possible in a computerized model.
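
The "make up the rest" argument can be illustrated with a toy example. In the Python sketch below, whose names and numbers are all hypothetical, known facts are pinned down exactly, the unknown path between them is filled in by linear interpolation (just one arbitrary choice), and a validation step lists every expert fact the model contradicts. When the expert supplies a new fact, the model is rebuilt to include it.

```python
from bisect import bisect_right
from typing import Callable, Dict, List, Tuple


def make_trajectory(facts: Dict[float, float]) -> Callable[[float], float]:
    """Build position(t) that passes through every known fact and fills the
    unknown gaps by linear interpolation (one arbitrary choice among many)."""
    times = sorted(facts)

    def position(t: float) -> float:
        if t <= times[0]:
            return facts[times[0]]
        if t >= times[-1]:
            return facts[times[-1]]
        i = bisect_right(times, t)
        t0, t1 = times[i - 1], times[i]
        x0, x1 = facts[t0], facts[t1]
        return x0 + (x1 - x0) * (t - t0) / (t1 - t0)

    return position


def contradictions(model: Callable[[float], float], facts: Dict[float, float],
                   tol: float = 1e-9) -> List[Tuple[float, float, float]]:
    """List (time, known value, model value) for every fact the model violates."""
    return [(t, x, model(t)) for t, x in sorted(facts.items())
            if abs(model(t) - x) > tol]


known = {0.0: 0.0, 30.0: 12.0}          # hypothetical: day -> km along a path
model = make_trajectory(known)
print(model(15.0))                      # 6.0, made up by interpolation
expert = {15.0: 2.0}                    # the expert: "No, it goes like this"
print(contradictions(model, expert))    # [(15.0, 2.0, 6.0)]
refined = make_trajectory({**known, **expert})
print(contradictions(refined, {**known, **expert}))  # []
```

Any gap-filling rule would do here; what matters is that the model is checked against every known fact, and that a contradiction points to knowledge that was left out.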

So, the challenge here, or the dream, the wall that I would like to break, or would like someone to break, is what I call “the whole organism challenge.” It is to construct a “full”, “correct”, true-to-all-known-facts, 4-dimensional model, not of a single cell, which is very difficult in itself and has already partly been done, but of a multi-cellular organism, namely, an animal. “Full” and “correct” are in quotes because “full” depends on the level of detail, and “correct” depends on something that I haven’t said yet. But, in general, I hope you get the idea. We want it to be consistent with everything that is known. We want to see it in three dimensions, and we want it to move and be animated over time, and so on.

Some of the properties this model should have, which I will just run through quickly: it should be realistic; it should show the elephant, or whatever it is, developing from a single cell and also moving around, eating, having breakfast, and so on. It should be interactively executable, modifiable, and zoomable; obviously, it should not just be a nice movie, but should be mathematically precise and rigorous. But the mathematics should somehow be hidden under the surface, so that biologists can do the modeling. Someone once said to me, “Why don’t you say, even biologists?” That sounds rather deprecating, but it is fine, because biologists are trained to do things that I have no idea how to do, and vice versa. And, yes, we want biologists to be able to do the modeling themselves. So a bunch of differential equations is probably not going to be the right way to do the model, at least on the surface.

Why? Why shouldn’t we be satisfied with having a question we want answered, building a mathematical or computerized model around it, getting an answer, and going on to the next thing? Why do we want to do, or why do I want to do, the big thing? Well, there is scientific altruism: I would like to truly understand life, and I think this is a wonderful way to understand life in detail. But there is a lot more to it. Here is a partial list of some of the almost obvious gains you might have if you are able to build a full interactive computerized model of an animal: you can uncover gaps, correct errors, form theories, predict new phenomena and suggest experiments to be done in the lab, discover emergent properties (things that emerge out of the model on their own), and verify theories against observations; and the sky is the limit: synthesis, drug construction, and so on.

How to do the modeling? The brief answer is that I don’t have time, of course, to give you the techniques and to show you what languages and tools we use from computer science and software engineering to do this, but one thing I do want to say and show you in a minute is: make it look good. We don’t just want numbers and colored dots running across the screen.

I am now going to give you some very modest examples of scratching the surface of this challenge, not in the sense of trying to do a whole animal, but in the sense of trying to capture some complicated phenomenon. The first project has to do with the thymus gland, where we modeled several thousand T cells behaving and differentiating. An enormous amount of knowledge, not data, went into this project, which took almost five years to do.

So, you will see several thousand cells. Each of these cells has a very complicated program, so there is an enormously complicated computation going on here. Any minute now, in roughly the left third of the screen, you will start seeing cells conjugating around these long lines, which are epithelial cells. The details are not important. But what is going on in the left third of the screen is an emergent property: these cells are elbowing each other to carry out some chemical activity that causes the lucky ones to reach the right-hand side of the screen and become fully fledged T cells. The black “X”s, the black crosses, mark apoptosis: you see cells dying. Again, I don’t have time for the details here.
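
The idea that each cell runs its own program, and that competition emerges from many such programs running together, can be sketched as a small agent-based simulation in Python. This is not the thymus model from the talk: the states, the fixed number of epithelial contact slots per step, and all numeric parameters below are hypothetical. Each simulated thymocyte only counts its own successful contacts; the competition comes from the limited number of slots.

```python
import random
from collections import Counter
from enum import Enum


class State(Enum):
    IMMATURE = "immature"
    MATURE = "mature"          # reached the right-hand side as a T cell
    APOPTOTIC = "apoptotic"    # a black cross on the screen


class Thymocyte:
    """One cell running its own small program (hypothetical rules)."""

    def __init__(self, lifespan: int, signal_needed: int) -> None:
        self.state = State.IMMATURE
        self.age = 0
        self.signal = 0
        self.lifespan = lifespan
        self.signal_needed = signal_needed

    def step(self, got_contact: bool) -> None:
        if self.state is not State.IMMATURE:
            return
        self.age += 1
        if got_contact:
            self.signal += 1               # one successful epithelial interaction
        if self.signal >= self.signal_needed:
            self.state = State.MATURE
        elif self.age >= self.lifespan:
            self.state = State.APOPTOTIC   # ran out of time without enough signal


def simulate(n_cells=1000, contact_slots=50, steps=40, lifespan=30,
             signal_needed=5, seed=0):
    rng = random.Random(seed)
    cells = [Thymocyte(lifespan, signal_needed) for _ in range(n_cells)]
    for _ in range(steps):
        waiting = [c for c in cells if c.state is State.IMMATURE]
        # Competition: only `contact_slots` cells per step reach an epithelial cell.
        lucky = set(rng.sample(range(len(waiting)), min(contact_slots, len(waiting))))
        for i, cell in enumerate(waiting):
            cell.step(i in lucky)
    return Counter(cell.state for cell in cells)


counts = simulate()
print({state.value: counts[state] for state in State})
```

Because contacts are scarce, only a minority of cells collects enough signal before its time runs out. None of the individual cell rules mentions competition, yet competition shows up in the population.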

Now, if we go to the next slide, I want to show you that, in contrast to the previous one, which was really just a recorded run of the system, you can actually interact with the system. This is an example of how we sit in front of the screen and make changes in the execution itself. We can zoom in; we can zoom in further; we can see exactly which cell is the parent of which other cell. If you look closely, you will see some of the receptors protruding. Those are active. Now we right-click one of these cells and make changes. So you are actually treading on the elephant’s foot, so to speak, in this case by changing the receptor status. This is just a recording of how we played with the system. Now you suddenly see one of the receptors growing. We have just made that change, and now we can go on to the next one.
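
Interacting with a running model, as opposed to replaying a recording, requires that the simulation can be paused, inspected, and modified between steps. Here is a minimal Python sketch of that pattern. The division rule, the `parent_of` and `set_receptor` hooks, and the receptor "growth" are hypothetical stand-ins for what a graphical tool would expose through zooming and right-clicking.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cell:
    cell_id: int
    parent_id: Optional[int]
    receptor_on: bool = False
    receptor_level: int = 0


class InteractiveSim:
    """Toy simulation that can be paused, inspected and changed between steps."""

    def __init__(self, n_founders: int) -> None:
        self.cells: Dict[int, Cell] = {i: Cell(i, None) for i in range(n_founders)}
        self.next_id = n_founders
        self.time = 0

    def step(self) -> None:
        self.time += 1
        for cell in list(self.cells.values()):   # snapshot: newborns act next step
            if cell.receptor_on:
                cell.receptor_level += 1          # an active receptor "grows"
            if self.time % 3 == 0:                # hypothetical: divide every 3rd step
                child = Cell(self.next_id, cell.cell_id, cell.receptor_on)
                self.cells[child.cell_id] = child
                self.next_id += 1

    def run(self, steps: int) -> None:
        for _ in range(steps):
            self.step()

    # Interaction hooks, the analogue of zooming in and right-clicking a cell.
    def parent_of(self, cell_id: int) -> Optional[int]:
        return self.cells[cell_id].parent_id

    def set_receptor(self, cell_id: int, on: bool) -> None:
        self.cells[cell_id].receptor_on = on


sim = InteractiveSim(n_founders=2)
sim.run(3)                            # founders 0 and 1 divide at t=3
print(sim.parent_of(3))               # 1
sim.set_receptor(3, True)             # intervene in the running system
sim.run(2)
print(sim.cells[3].receptor_level)    # 2
```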

This example is the same system run under two different sets of initial circumstances. The first one is a wild-type run. In the second one, we inserted a change (I don’t want to get into the details) which we had a feeling was very important in making these cells compete to become T cells and in producing this emergent property on the left. Let’s see this now in operation. On the top screen, you will see a run very similar to the one we just saw, recorded a bit more slowly. On the bottom is the same system, where we switched off a couple of things that we thought were relevant to this. What you will see on the top in a moment, again in the left third of the picture, is hundreds and hundreds of cells, some of them committing suicide to let their siblings move ahead and the others competing fiercely to carry out these interactions with the epithelial cells. In the bottom run, almost nothing like that is happening. It is a bit like taking us scientists and switching off our scientific competition genes: I probably wouldn’t have travelled to Berlin today, and maybe these gentlemen would have been playing golf or sitting down to play with their grandchildren. The fierce competition that drives us to break down walls and carry out our science better comes from something. Here we made the change in the biology, and we got very weak development of the thymus gland; this happens to be a mouse, and its entire immune system would collapse if that were really the case. I am showing this to give you an example of how you can do experimental biology in silicon and actually see on the screen the difference between the first experiment and the second.
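
An in-silico experiment of this kind compares a wild-type run with a modified run under otherwise identical conditions. One common way to make such comparisons fair is to give both runs the same random seed, so that any difference comes from the change itself. The Python sketch below assumes a deliberately crude stand-in model, `run_thymus_model`, with made-up success probabilities; it illustrates the paired design, not the actual thymus simulation.

```python
import random
import statistics
from typing import List, Tuple


def run_thymus_model(knockout: bool, seed: int, n_cells: int = 500,
                     attempts: int = 10) -> int:
    """Hypothetical stand-in for a full simulation run.

    Returns how many cells mature. The knockout lowers each attempt's success
    probability; the values 0.2 and 0.02 are made up for illustration."""
    rng = random.Random(seed)
    p_success = 0.02 if knockout else 0.2
    matured = 0
    for _ in range(n_cells):
        # Draw the same number of random values in both conditions, so a given
        # seed feeds identical randomness to the wild-type and knockout runs.
        best_draw = min(rng.random() for _ in range(attempts))
        if best_draw < p_success:
            matured += 1
    return matured


def compare(seeds: List[int]) -> List[Tuple[int, int]]:
    """Paired in-silico experiment: wild type vs knockout under the same seeds."""
    return [(run_thymus_model(False, s), run_thymus_model(True, s)) for s in seeds]


pairs = compare(list(range(5)))
for wild_type, knockout in pairs:
    print(f"wild type: {wild_type:4d}   knockout: {knockout:4d}")
print("mean difference:", statistics.mean(wt - ko for wt, ko in pairs))
```

Because the knockout run consumes exactly the same random draws as the wild-type run for each seed, every cell that matures in the knockout also matures in the wild type, so the paired difference is never negative.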

Another project I want to show you a video of is the pancreas. Here we modeled organogenesis, that is, we tried to capture how an organ grows. As a side remark, which is not on the slide, 2012 marks the 100th anniversary of Alan Turing’s birth. A lot of you know Turing from his work on breaking the Enigma cipher machine in the Second World War. He also did important work in biology, on morphogenesis, the question of how organisms develop their shape. (So if we can run the video, please.) I will just show you the beginning of this. This is a pancreas growing. Again, the programming was done for each cell: a cell has a program, you throw several thousand of these cells together, and you get the shape of a pancreas growing essentially out of nothing.

I am going to skip forward so that I can spend my last minute concluding, but this goes on and becomes more elaborate. You can see at the bottom what the model looks like versus what the real thing looks like, and you can see the similarity. We also had some, I think, rather amazing insights into why a pancreas looks like a pancreas, a lung looks like a lung, and a liver looks like a liver. By making changes to the same computerized model, we were able to get the system to look like a liver (on the left) or like a lung (on the right).
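
The claim that a shape can emerge from per-cell programs, and that changing the model changes the shape, can be illustrated with a toy lattice-growth sketch in Python. Every cell follows the same local rule, and a single hypothetical parameter, `vertical_bias`, sets how often cells try to divide vertically rather than horizontally. This is an assumed rule for illustration, not the pancreas model from the talk.

```python
import random
from typing import List, Set, Tuple

Site = Tuple[int, int]
DIRECTIONS: List[Site] = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def grow(n_cells: int, vertical_bias: float, seed: int = 0) -> Set[Site]:
    """Grow a tissue on a square lattice from a single cell.

    Every cell runs the same local program: pick a division direction
    (vertical directions weighted by `vertical_bias`) and place a daughter
    there if the site is free. Hypothetical rule, not the published model."""
    rng = random.Random(seed)
    weights = [vertical_bias, vertical_bias, 1.0, 1.0]
    cells = {(0, 0)}
    order = [(0, 0)]                       # insertion order keeps runs reproducible
    while len(cells) < n_cells:
        x, y = rng.choice(order)           # a random cell takes its turn
        dx, dy = rng.choices(DIRECTIONS, weights=weights)[0]
        daughter = (x + dx, y + dy)
        if daughter not in cells:          # divide only into free space
            cells.add(daughter)
            order.append(daughter)
    return cells


def aspect_ratio(cells: Set[Site]) -> float:
    """Height / width of the bounding box; above 1 means taller than wide."""
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return (max(ys) - min(ys) + 1) / (max(xs) - min(xs) + 1)


for bias in (1.0, 8.0):
    tissue = grow(300, vertical_bias=bias, seed=1)
    print(f"vertical_bias={bias}: aspect ratio {aspect_ratio(tissue):.2f}")
```

With `vertical_bias=1.0` growth is roughly isotropic, while larger values tend to produce taller, more elongated clusters: the global shape changes even though no rule refers to it.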

The idea that I want to put forward is to build a complete model of the roughly thousand-cell nematode worm C. elegans, which, as you can see from the video on the right, is completely transparent. As you can see on the left, we actually know exactly how it develops, at least in video form. There is an enormous amount of information about this thousand-cell animal. I would like one of you to be able to stand here as a speaker at the Falling Walls Conference ten years from now and show an animated video of the C. elegans nematode completely modeled, so that what you see is not a worm but a fully computerized model from which we will understand a lot about life. Cheers!