Thomas Wiegand

Chair of Image Communication Department, Technische Universität Berlin; Head of Image Processing Department, Fraunhofer Heinrich-Hertz-Institut Berlin, Germany

3D – isn’t that the funny way of watching movies wearing klutzy glasses? For now that may be true, but in only fifteen years’ time, television as we know it will be over: we will be watching it in a radically new three-dimensional form. Then, we will look back at 2D television the way we look at old black-and-white movies today: as the remains of another era. Thomas Wiegand (born 1970), professor at the Technische Universität Berlin and head of the Image Processing Department of the Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, is working on transforming television – and our viewing habits. Having studied Electrical Engineering at the universities of Hamburg and Erlangen-Nuremberg, he quickly became one of the pioneers writing the code for H.264/MPEG-4 AVC, the standard for video compression. His forays into coding, visualizing and streaming 3D environments are regarded as the most promising developments in the telecommunications standardization sector. In 2009, Wiegand received the Innovation Award of the Vodafone Foundation for Research in Mobile Communications. Watch him transform our ways of seeing. No klutzy glasses required.

Breaking The Wall of the Flat World of TV. What Three-Dimensional Television Pictures Will Look Like.


20 years ago I was at my grandmother’s birthday party. We had already celebrated too much to drive to the border.

Thank you very much. Happy birthday, Grandma; she is going to be turning 86 this year, and is happy after all. I am going to be talking about television. I won’t even attempt to build a bridge to the three previous speakers. Without television, the wall would not have been broken. If Mr. Schabowski had not announced that the wall was, kind of, down (well, he didn’t say it, and he actually didn’t mean it), then the wall wouldn’t have come down. People thought it was happening; it was happening in a visual world first, and only later did it actually happen in the real world.

Some of you may not be aware that the digital media revolution also started in 1989. That was the starting point of the entire technical development towards digital broadcast and DVD, and some parts of it happened in Berlin. About ten years later came another major step in digital media: the birth of H.264/MPEG-4 AVC in Berlin. It is the basis of all HDTV: every HDTV receiver is going to use this codec, as will every Blu-ray player in the future. It is also the basis of most Internet video (as you may know, our young generation consumes video and entertainment mostly over the Internet and has stopped looking at television and Blu-ray discs) and of a great deal of mobile video. Within five years, about one billion devices have emerged from zero. Within five years! What is next? What is going to come after this digital revolution of video? Video is just happening everywhere, as we see it.

One of the things that we think may happen is high resolution: you are immersed in the scene. This is a high-resolution cinema that we have built at the Heinrich-Hertz-Institut here in Berlin. The lady you see there is the director of the Berlin Philharmonic. We are in discussions with them about broadcasting symphonies. This gives you not only a huge cinema screen that immerses you in the scene; it also gives you 3D sound. You can place every sound source anywhere you like in the room.

There is one thing that goes beyond that, and let me show you this movie. (Voices from movie: “I saw U2 in 3D! It was awesome!” “The whole movie is going to be defining a whole new genre.” “The technology, the 3D, Bono was there; Bono was on; he was on fire. It reached out to you. You felt his message even more, because it felt like he was there and reaching out to you. It was like a spiritual experience”) A spiritual experience? Oh, I don’t think we are there yet, but this is not the first time I have heard that. 3D cinema: is it really happening? Actually, it is. The number of 3D productions coming out is growing tremendously. Obviously, those who produce these movies have some artistic motivation, but they also have a motivation to sell things: 3D movies sell at a higher price.

The glasses that you see from the past are anaglyph glasses; today you have polarized and shutter glasses. Why is that? For 3D cinema you need two views to be transmitted. Some 3D effects you can already see in 2D. Our 2D television kind of works, but not exactly in 3D as you would like it to work.

Why glasses? Say this is your display, and you want to see an object behind it. The two dots where your lines of sight to the object cross the display should be spatially apart: the left eye’s dot is on the left side, and the right eye’s dot is on the right side. When the object is right on the display, the two dots coincide. When the object is in front of the display, the two rays cross each other, and the two dots are again spaced apart. Glasses fulfil the need to project a different image into each eye: they give you that control by showing different pictures to the left and to the right eye.
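
The geometry of the two dots can be written down with similar triangles. The following is a minimal sketch, not something shown in the talk: the function name `screen_parallax`, the two-metre viewing distance, and the assumption that the object sits straight ahead of the viewer are all illustrative choices.

```python
def screen_parallax(eye_separation_m, viewing_distance_m, object_distance_m):
    """Signed offset on the display: right-eye dot minus left-eye dot, in metres.

    Assumes the object lies straight ahead of the viewer (illustrative setup).
    Positive: left dot is left of right dot, object appears behind the display.
    Zero:     the two dots coincide, object appears on the display.
    Negative: the two rays cross, object appears in front of the display.
    """
    if object_distance_m <= 0:
        raise ValueError("object distance must be positive")
    return eye_separation_m * (1.0 - viewing_distance_m / object_distance_m)


eye_separation = 0.06    # about six centimetres, as mentioned in the talk
viewing_distance = 2.0   # assumed distance from viewer to display

for object_distance in (4.0, 2.0, 1.0):
    offset = screen_parallax(eye_separation, viewing_distance, object_distance)
    print(f"object at {object_distance} m -> dot offset {offset:+.3f} m")
```

The sign of the result separates the three cases described above: behind the display, on the display, and in front of it.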

Is that sufficient? Some of you wear glasses. Would you want to wear glasses at home? We have asked for some opinions about that, and quite a few people don’t like it.

The solution is an autostereoscopic display. An autostereoscopic display is a normal display, as you can see here, with a lens in front of it. This lens enables us to emit multiple views to different positions, so that each of your eyes sees a different view. You don’t need glasses, because the display itself is emitting these two views. Now, if I emit just two views, you need to stand still with your head fixed in place; so we need to emit multiple views.

Here, for example, are four views. But we actually need to emit more than four. The collection of views that is emitted forms a field of view, and it is repeated, because we cannot emit an infinite number of views. How many views are necessary? Our perception guys tell us that you should have a view every three centimetres; your eyes are spaced about six centimetres apart. So if you and your beloved one want to watch television together, you probably need a viewing range of 1.5 metres. You need 1.5 metres divided by three centimetres: 50 views. 50 views is a huge problem! It is a huge problem for transmission: we would need 50 times the video bit rate to transmit to a person who wants to watch video, and maybe 50 cameras to acquire it. Actually, the technical solution to this problem is different.
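
The arithmetic behind the 50 views can be sketched in a few lines. The function names and the per-view bit rate of 8 Mbit/s are hypothetical and only there to make the scaling visible; the talk gives only the 1.5 metre range and the three-centimetre spacing.

```python
import math


def views_needed(viewing_range_cm, view_spacing_cm):
    """Number of views needed to cover the range at the given spacing."""
    return math.ceil(viewing_range_cm / view_spacing_cm)


def naive_bit_rate(num_views, single_view_mbit_s):
    """Bit rate if every view were transmitted as its own video stream."""
    return num_views * single_view_mbit_s


views = views_needed(viewing_range_cm=150, view_spacing_cm=3)
print(views)  # 50

# Hypothetical single-view rate, chosen only for illustration.
print(naive_bit_rate(views, single_view_mbit_s=8.0))  # 400.0
```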

What we can do is the following. We can point two cameras at an object; this is a brick of the Berlin Wall from 1989. Then we can try to produce a continuous reproduction of all the intermediate views between these two cameras, using weighted averaging. How would that work? I wanted to do this because I wanted to get you excited about engineering; I am the first engineering talk here. So, let’s say you have a left view and a right view. These are the two dots that show the same thing in these pictures, at the same position. If I want to create an intermediate view, I just take these two dots and average them. If, for instance, the view is closer to the left, I weight the left view more strongly. That way I get a weight that runs from left to right, and I can sweep from left to right to show all these views.
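
The weighted averaging just described can be sketched on a single row of pixels. This is an illustrative toy, not the speaker's code: the name `blend_views`, the eight-pixel rows, and the weight `alpha` (0 for the left camera, 1 for the right camera) are assumptions.

```python
def blend_views(left, right, alpha):
    """Average pixels at the same position in both views.

    alpha = 0.0 reproduces the left view, alpha = 1.0 the right view.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    if len(left) != len(right):
        raise ValueError("both views must have the same width")
    return [(1.0 - alpha) * l + alpha * r for l, r in zip(left, right)]


# One bright point seen by both cameras, at different pixel positions.
left_row = [0, 0, 0, 0, 0, 0, 1, 0]
right_row = [0, 0, 1, 0, 0, 0, 0, 0]

print(blend_views(left_row, right_row, 0.5))
# [0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.5, 0.0]
```

In this toy row the halfway view contains two half-bright copies of the point rather than one point in between.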

If I do this, just as I explained, the result is going to look like this. It is not working. What is the problem? Let’s look a little closer. Let’s say we have this object at a distance “Z” from the two cameras. We need to model these two cameras. A very simple camera model consists of the image plane and the focal points. The two focal points are a distance “X” apart. The focal length, which is the distance of the image plane from the focal points, is “F”. The object emits a ray, and the ray passes through the image plane into a focal point. Where it passes through the image plane, it creates an intensity value, which is the actual picture that you are seeing.

Now, if I take the ray here and move it over to the right (this is a copy of this one here), you can see that there is a disparity between these two rays. This disparity is causing the problem that you saw in the first video. The disparity, and this is the first formula of today, is given by D = F * X / Z. “F” is the focal length of the camera; that is how the camera is built, so it is fixed. “X” is the distance between the two cameras. That is fixed too, although I could adjust it. “Z” is the distance of the object from the cameras. That is the interesting point: I need to know how far away objects are in order to reproduce the views between my two cameras, and this distance varies. You can see from the equation that the disparity becomes smaller when the distance is larger, and larger when the distance is smaller.
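
The formula D = F * X / Z is easy to try out numerically. In this small sketch the units and numbers are assumptions made for illustration: the focal length is expressed in pixels and the camera distance and object distance in metres, so the disparity comes out in pixels.

```python
def disparity(focal_length, camera_distance, object_distance):
    """D = F * X / Z for two parallel cameras."""
    if object_distance <= 0:
        raise ValueError("object distance Z must be positive")
    return focal_length * camera_distance / object_distance


F = 1000.0  # assumed focal length in pixels
X = 0.1     # assumed distance between the two cameras in metres

for Z in (1.0, 2.0, 10.0):
    print(f"Z = {Z} m -> D = {disparity(F, X, Z):.1f} pixels")
```

With F and X held fixed, doubling Z halves D, which is the inverse relationship read off the equation above.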

Consequently, what you need to do is estimate these depth values. Things that are closer get a brighter value, and things that are further away get a darker value. When you then use these values to support your interpolation, because they let you calculate the disparity between the two views, you actually get a much better result.
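
One way to picture depth-supported interpolation is to shift every pixel by a fraction of its disparity before averaging. The sketch below is a simplified illustration under stated assumptions, not the method used at the institute: it works on one row of pixels, takes per-pixel disparities as given (in practice they would follow from the estimated depth via D = F * X / Z), assumes a point at position x in the left view appears at x - d in the right view, rounds to whole pixels, and lets nearer pixels overwrite farther ones. All function names are hypothetical.

```python
def warp_row(pixels, disparities, shift_factor):
    """Move each pixel by shift_factor * disparity; holes stay None.

    Where two pixels land on the same spot, the one with the larger
    disparity (the nearer object) wins.
    """
    width = len(pixels)
    warped = [None] * width
    best_disparity = [float("-inf")] * width
    for x in range(width):
        target = round(x + shift_factor * disparities[x])
        if 0 <= target < width and disparities[x] > best_disparity[target]:
            warped[target] = pixels[x]
            best_disparity[target] = disparities[x]
    return warped


def interpolate_view(left, right, disp_left, disp_right, alpha):
    """Intermediate view; alpha = 0 is the left camera, 1 the right camera."""
    from_left = warp_row(left, disp_left, -alpha)
    from_right = warp_row(right, disp_right, 1.0 - alpha)
    result = []
    for l, r in zip(from_left, from_right):
        if l is not None and r is not None:
            result.append((1.0 - alpha) * l + alpha * r)
        elif l is not None:
            result.append(l)   # hole in the right warp: use the left view
        elif r is not None:
            result.append(r)   # hole in the left warp: use the right view
        else:
            result.append(0.0) # visible in neither view
    return result


# A near bright point (disparity 4) in front of a far dark background.
left_row = [0, 0, 0, 0, 0, 0, 1, 0]
right_row = [0, 0, 1, 0, 0, 0, 0, 0]
disp_left = [0, 0, 0, 0, 0, 0, 4, 0]
disp_right = [0, 0, 4, 0, 0, 0, 0, 0]

print(interpolate_view(left_row, right_row, disp_left, disp_right, 0.5))
```

For this toy row the halfway view contains a single full-brightness point at position 4, midway between its positions in the two camera views.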

This display is obviously not 3D. I cannot show you 3D on a 2D display; that does not work. Not even this conference is going to break that wall. We still need 3D displays. Outside, where you came in, you can see a 3D display. There you can see our first results on autostereoscopy.

Let me summarise. In cinema, 3D is already coming. At home, we need 3D displays without glasses, for sure. Well, this is a brave sentence, but let me say it: twenty years from now, we will look at 2D video the way we look at black-and-white video now.

I would like to thank the many people who worked on this subject. We also had a lot of support. We are continuing to work, because the revolution that we are looking at is probably like the revolution when we moved from black-and-white to colour.

I have one minute left, so let me make a small personal remark. I was born in East Germany, and I was 19 when the wall came down. When the currency was then converted, my parents owned a total of one Trabant, some old East German furniture, and 4,000 Deutsche Marks. That was it. That was the starting point. Herr Turner, thanks for doing this, and thanks for having me. Thank you very much.