Thomas Lengauer

Director at the Max Planck Institute for Informatics, Saarbrücken, Honorary Professor, Rheinische Friedrich-Wilhelms-Universität Bonn and Universität des Saarlandes, Saarbrücken

Thomas Lengauer and his collaborators have developed bioinformatics models for predicting the resistance of HIV variants to applied drug combinations, significantly improving the effectiveness of drug therapy for Aids patients. Their research aims to break new ground in effective Aids therapy, especially in the later stages of the disease, when the virus has evolved to be highly resistant. Lengauer has received prestigious honours such as the Karl Heinz Beckurts Award and the Konrad-Zuse Medal of the German Informatics Society, as well as membership of respected science academies such as acatech and Leopoldina. This computational biologist provides physicians with important software tools for diagnosis and therapy, based on large databases of clinical and virological data on HIV therapy. Decoding the building blocks of life with mathematical-statistical methods, Lengauer’s work opens up new therapy options for Aids patients for whom ‘traditional’ alternatives are hard or impossible to find.

Breaking the Wall of Viral Drug Resistance. How Bioinformatics Can Help to Improve Aids Therapies


On November 9th 1989, I was at home in Paderborn and watched everything on TV. The wall was a very present reality in my youth; it separated my grandparents from the rest of my family.
I am very grateful that I can speak on this occasion, at this very historic place. Mine is the first of two talks on HIV; both of them try to break walls, in different respects. I am more concerned with making the therapies against HIV that are available today as effective as possible. The next talk is going to explore new approaches to HIV therapy.

Aids, caused by HIV, is one of the big infectious killers of humans. You see the numbers of newly infected people in 2008. The main problem that we have to deal with today (the next speaker will introduce steps to overcome it) is that today’s therapy cannot eradicate the virus from the infected patient. This means: once you have got it, it is an alliance for life. So the goal of therapy can only be to suppress viral replication inside the patient’s body and, in this way, to ease symptoms and prolong life.

This, too, is a very difficult task, which I want to illustrate in this diagram. The patient harbours a lot of viral particles, and these viral particles are not all the same. They are quite diverse, indicated here by different colours. Drugs suppress the viral life cycle, the viral replication, but they do so to different degrees for different viral variants. Here we have a drug, which we will call drug “A”, which is especially good at suppressing the blue variant and the black variant; but it is not good at suppressing the green variant, so the green variant becomes enriched and the therapy quickly becomes ineffective, because this resistant green variant now dominates.

A countermeasure could be to administer another drug, “B”, that specifically targets the green variant. Now the barrier for the virus to break through into resistance is higher, because the drugs cover a broader spectrum of the viral variants. But you see we have this minority here that is very rare, the red variant, which eventually breaks through, because it is resistant against both drugs. Of course, the optimal picture would be a drug combination that somehow catches all of them; this is utopia. The virus will always win in the end. The only thing we can do is to push this end as far into the future as possible.

So the major problem is: which drugs should a patient who has developed resistance receive? This is the central problem that doctors treating Aids patients have to deal with today. Classically, the problem has been solved by way of medical expertise, but that is quickly becoming more and more difficult, because we have to deal with millions of virus variants out there. This is probably an understatement; it is probably, rather, billions. For that very reason, the number of drugs also grows larger and larger. We have over two dozen drugs. And we don’t give a single drug; we give several drugs, so we have hundreds or thousands of possible combination therapies. So making this mapping, finding the best combination on the basis of the viral population, is, I think, quite clearly a problem that might be solved by computers.
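To make the size of this search space concrete, here is a small counting sketch. It assumes exactly 24 drugs (the talk only says "over two dozen") and treats any set of distinct drugs as a valid regimen, which real prescribing rules would not; the regimen sizes are illustrative as well.

```python
from math import comb

# Assumption: 24 available drugs ("over two dozen" in the talk).
# The regimen sizes used below are illustrative, not taken from the talk.
NUM_DRUGS = 24

def count_regimens(num_drugs: int, regimen_size: int) -> int:
    """Number of ways to pick `regimen_size` distinct drugs, order ignored."""
    return comb(num_drugs, regimen_size)

for size in (2, 3, 4):
    print(f"{size}-drug regimens: {count_regimens(NUM_DRUGS, size)}")
```

Under these assumptions there are 276 two-drug and 2,024 three-drug regimens, which matches the "hundreds or thousands" order of magnitude mentioned above.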

That is what we are doing. Our basis is a database of clinical information on more than a thousand HIV variants. All of these variants are characterised in two ways: by their genome, the genotype, and by their level of resistance against each of the drugs in our study. Here we have a pictorial representation: these are a thousand viral variants. The red ones are resistant against the given drug, and the green ones are susceptible, meaning that the drug is effective. You now have one of these coloured pictures for each of the drugs in our study, of which there are between one and two dozen.

So this is our database, and from this database we now want to learn with mathematical methods: what would be the resistance level of a new viral variant against the drug, one of the millions of variants that are out there in the patients but not among the thousand or so in our database? Of course, this is where the mathematics is; this is where the difficult methods are, and this is what I am not going to tell you. I just want to give you a small insight into how you can learn from such data. It is a very banal idea: just look at a single mutation, a single difference, in the viral genome. That difference can be carried by a virus or not. Either the virus has the mutation or it doesn’t.

So the database is divided into two parts by such a given mutation: those viruses that have the mutation, here on the left, and those viruses that don’t. If you now have a mutation that gives you this picture, where the viruses that have the mutation tend to be resistant and the viruses that don’t have the mutation tend not to be resistant, then that mutation carries information about the viral resistance.
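One way to put a number on "carries information" is the mutual information between the two yes/no properties. The sketch below is an illustration only: the talk does not say which measure the group uses, and the eight-virus toy data and the function name are invented for the example.

```python
from math import log2

def mutation_information(has_mutation, is_resistant):
    """Mutual information (in bits) between two parallel lists of booleans."""
    n = len(has_mutation)
    if n == 0 or n != len(is_resistant):
        raise ValueError("need two non-empty lists of equal length")
    info = 0.0
    for m in (True, False):
        for r in (True, False):
            # Joint and marginal frequencies for this cell of the 2x2 table.
            joint = sum(1 for a, b in zip(has_mutation, is_resistant)
                        if a == m and b == r) / n
            p_m = sum(1 for a in has_mutation if a == m) / n
            p_r = sum(1 for b in is_resistant if b == r) / n
            if joint > 0:
                info += joint * log2(joint / (p_m * p_r))
    return info

# Invented toy data: eight viruses, the first four carry the mutation.
carriers = [True, True, True, True, False, False, False, False]
# Resistance mostly follows the mutation: the split is informative.
mostly_linked = [True, True, True, False, False, False, False, True]
# Resistance is spread evenly over both halves: the split tells us nothing.
unlinked = [True, True, False, False, True, True, False, False]

print(mutation_information(carriers, mostly_linked))
print(mutation_information(carriers, unlinked))
```

A mutation whose carriers and non-carriers are equally often resistant scores zero bits; the more cleanly the mutation separates red from green, the closer the score gets to one bit.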

On the basis of these kinds of analyses, we can find out whether the viral genotype that is presented to us by the patient could be resistant or not. Of course there are many, many mutations, so we somehow have to analyse all of them in concert. This is where our mathematics comes in. The models are optimised in two respects. The first quality criterion is, of course, accuracy. We want the model’s predictions of the resistance level of the virus, which the model computes from the viral genotype, to be as accurate as possible. This first criterion is quite a mathematical thing, which you can optimise with mathematical methods. That is standard practice.

The second quality criterion is very important in the medical domain, and it is not quite as mathematical, because it concerns interpretability. What I mean by that is this: once the model outputs the answer and says, “I think this virus is resistant against that drug”, that should not be the end of it. The doctor wants to hear why that is the case. The doctor wants to hear some plausibility, some argument, for how the computer arrived at its decision. That is what I mean by interpretability. This is all I am going to say about methods.

I just want to run you through an anonymised example of a real case that has been treated on the basis of our software. We have the software available on a server that is freely accessible over the Internet, under the URL you see here. What you do with the server is input the sequence of the viral genome, or the relevant parts of the viral genome, into the input page. Then the server does three kinds of analyses. The first analysis just finds the mutations: where does the virus differ from the virus that we would expect? That is information that doctors would also get without our computer. It is just something that comes out of the experimental assay.

In this case, we have a patient who came into treatment, into the practice, several years ago, and who under classical circumstances would have been considered a hopeless case. This is just a small portion of the viral genome from this patient, and you have 16 mutations accumulated in this gene, which codes for just a single protein. Classical expertise basically told the physician that there was no therapy option; it is all resistance in there. So the physician turned to our server, and our server did the second step of the analysis.

The output of that second step is this report, where the server computed the resistance level of the virus against each of these fifteen-odd drugs. There is a row for each drug, with a column that names the drug and a column that gives a number quantifying the level of resistance: the larger the number, the more resistant the virus inside the patient is to that drug. This colourful part here is the interpretation part. Here you see the mutations that the virus inside the patient has acquired. These mutations are coloured red if they increase the resistance and green if they lower the resistance. Such mutations exist too.
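A report of this shape is easy to produce when the prediction is a sum of per-mutation contributions. The sketch below assumes such an additive model purely for illustration; the talk does not say which model the server uses, and the mutation names, weights and function name are all invented.

```python
# Hypothetical per-mutation weights for one drug: positive values raise
# resistance, negative values lower it. Names and numbers are invented.
WEIGHTS = {"mut_A": 2.5, "mut_B": 1.0, "mut_C": -1.5}

def explain_resistance(mutations, weights, baseline=0.0):
    """Return (score, annotations) for an additive resistance model."""
    score = baseline
    annotations = []
    for mutation in mutations:
        weight = weights.get(mutation, 0.0)  # unknown mutations count as neutral
        score += weight
        if weight > 0:
            colour = "red"      # increases resistance
        elif weight < 0:
            colour = "green"    # lowers resistance
        else:
            colour = "neutral"  # no known effect in this toy model
        annotations.append((mutation, colour))
    return score, annotations

score, annotations = explain_resistance(["mut_A", "mut_C", "mut_X"], WEIGHTS)
print(score, annotations)
```

The appeal of an additive form is that the same numbers serve both purposes: their sum is the prediction, and their signs are the explanation shown to the doctor.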

This is what the doctor got, and as a matter of fact you see that this patient looks quite hopeless, because there are so many mutations and all these levels are very high. A level above four basically says that there is substantial resistance. So what was the therapy that the doctor now found? As a matter of fact, the argument that I am giving you now was the result of an expert consultation between several doctors and labs. It was not a standard line of argument from normal clinical practice, because this is such a difficult patient.

This is the therapy that was given. This drug we can understand. It has one of the lowest resistance levels in the whole suite, as you can see, so we could expect this drug to be effective. But this drug, with such a high resistance level, why would that drug be given? Well, it is actually not given in order to suppress the viral life cycle, but because the committee found out that this drug, Saquinavir, is effective because this mutation sensitises the virus to the drug. You see the green colour. This mutation is, at the same time, a resistance mutation for this drug; so keeping this drug in the regimen was specifically meant to fix the mutation, to make sure that the mutation doesn’t go away, so that this drug stays effective.

Here we have a complicated argument that doesn’t just rely on the resistance level, but talks about the viral escape into resistance: what would the virus do when it is presented with this therapy? So, of course, we were interested in automating this process and making it accessible, also in settings in which you don’t have all the experts together to talk about it for hours in order to find out what to do.

So we prepared a second line of analysis that actually computes the escape of the virus into resistance and ranks therapies. This is the output of that part of the server. You see here the top of the ranking. These are therapies ranked by their effectiveness. That is the top-ranking therapy, and it is exactly the therapy that the expert committee arrived at. It only has a success probability of 62%, but in that patient it turned out to be successful. As far as I can tell, it was successful at least until half a year ago (I don’t have newer data), and that was over six years, in what was a hopeless case. So this shows you that, especially in hard situations, such software can help in an essential way.

This is just a slide showing that the software is used in clinical practice. It reduces the error rate from one in four to one in seven. It is used for treating two thirds of the Aids patients in Germany, and it is accessed from over thirty countries. One of the services in particular, which I haven’t described here, has taken over 200,000 enquiries in the last three years. It is basically a unique service worldwide that people in many countries use in order to treat their patients.

In the last minute I want to resolve a caveat. I talked so much about the diversity of the viral population inside the patient, and then I said you paste a single sequence into the server. How can you describe a diverse population by a single genome? Well, you can’t. If you wanted to do that here in terms of colours, you would just give the average colour, which would be something between green and blue. It would disregard the red, but the red variant is the one that breaks the therapy in the end. So you see that this cannot possibly work.

Therefore, we now have a new sequencing technology, which can actually resolve the whole quasi-species, the whole population. Instead of a single sequence, you now get thousands of sequences. We have adapted our server to that kind of analysis. Here you see two patient examples, with lots of sequences, a thousand sequences, arrayed here. The top colours mean that the virus is resistant; the bottom colours mean that the virus is susceptible. Here over 95% of the viruses in the population are susceptible, so you can give the drug. This is a patient in whom half of the population is resistant virus. So, don’t give the drug. We have increased the power of our server to deal with this new data, and we are currently analysing what exactly that means clinically: how much better our predictions become.
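The two patient examples can be mimicked with a per-sequence tally. This is a sketch under stated assumptions: each sequence is taken to have already been called susceptible ("S") or resistant ("R"), and the cutoff `min_susceptible` is a hypothetical parameter, since the talk gives only the two examples and no threshold.

```python
from collections import Counter

def susceptible_fraction(calls):
    """Fraction of sequences predicted susceptible; calls are 'S' or 'R'."""
    if not calls:
        raise ValueError("no sequences given")
    return Counter(calls)["S"] / len(calls)

def recommend(calls, min_susceptible=0.9):
    # min_susceptible is a hypothetical cutoff, not a value from the talk.
    if susceptible_fraction(calls) >= min_susceptible:
        return "give"
    return "avoid"

# Invented populations of 1,000 sequences shaped like the two examples.
patient_1 = ["S"] * 960 + ["R"] * 40   # over 95% susceptible
patient_2 = ["S"] * 500 + ["R"] * 500  # half of the population resistant

print(recommend(patient_1), recommend(patient_2))
```

Unlike a single averaged sequence, the tally keeps the resistant minority visible, so its size can enter the decision directly.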
Finally, I just want to say: this is a big interdisciplinary project. The red people are mathematicians and computer scientists; the green people are virologists; the blue people are clinicians. They have all helped us to succeed over the last ten years. I thank them, and I thank you for your attention.