Stereo 3D Vision (How to avoid being dinner for Wolves) – Computerphile

You can get results from this where you can't get results from lasers; lasers get bleached out in sunlight. I had a colleague I was speaking to who went to Mexico to do crop scanning, and he had a handheld laser scanner, and he had to do it at night in a tent because the sun wrecked the laser scanner, and there were wolves about, and it was a big problem for him. If you'd just used a camera, you might have found that you've got to work harder on your stereo matching, but there are things it will do that laser scanners can't. So there's going to be a time for one and a time for the other. The top tip for the day is: use a stereo pair of cameras, don't get eaten by wolves. Yeah, that would be my advice.

We find corresponding points in our left and right eyes, and then we can use that to work out how far away from us something is. With an individual eye on its own we have some monocular cues, some monocular clues as it were, that we can use to find out depth, or at least to estimate depth, but true 3D only comes from two eyes. In a single eye you might have something like: the object is bigger than it was before, so it's coming towards us; or one object is passing our view faster than another, that's parallax, and that gives us a clue that it's in front of something else. Occlusion is an obvious one: if something actually is in front of something else, we can make some reasoning about that. So our brains will take those monocular cues and do something with them and work out what's going on, but when we have two eyes we can do actual 3D depth perception. The classic example is those Magic Eye things that were around in the 90s. I'm not very good at seeing those; I kind of go cross-eyed and it kind of works, but it's all a bit backwards. The idea there is that we trick our eyes into seeing slightly different images, and that gives us a perception of depth.

If we've got a stereo system, the main thing we need to know is where our cameras are. Our brains know where our eyes are because they've learned it: one's here and one's here. People maybe have eyes slightly further apart, but your brain will account for this. If we're going to do this mathematically, using a computer, we need to know where these cameras are. If we've seen an object in one view and then we go into the other view, we need to try and find corresponding points. Without knowing where the cameras were, your search space is increased: you've got to look over the whole image; maybe you get points confused; maybe there's a corner that appears multiple times, because it's like a book and it has four corners, and then you've got to try and resolve which one's which. And some of these features won't appear in both views because of occlusion. If you take your left and right view of my hands, some of my left hand that's visible from one eye isn't visible in the other eye, and that's a huge problem.

So what we do is we start with a process called camera calibration. We have two cameras that are nearly next to each other, and we don't know exactly what their angles are, but we can find that out by using camera calibration. We have to take the picture from both cameras at the exact same time, because otherwise the scene is going to have changed, so we'll assume we're taking pictures with the cameras at the same time, something that isn't true of some visual reconstruction systems. We take a picture of this board, we calibrate the positions of our cameras, and then we move the cameras and take a picture of something we're trying to reconstruct in 3D.
So then we have a situation where we have one image here on this side, our left view, and we have an image here which is our right view. In our previous video on the camera matrix we talked about the lens and all the system in front; we'll do away with that for now, just for simplicity's sake, and say these are pinhole cameras. Because we're using a pinhole camera model, the optical center of our camera is somewhere behind this image plane. Any light rays coming from some object in the world are going to travel down this ray, intersect our image plane, and then go into the optical center of the camera, and this will happen for any points in our scene that this camera can see.

We want to say: we've got a point on this image plane, where did it come from? And the crucial problem is that it could have come from here, or from here, or here, or anywhere along this ray, and we don't know. That's what we're trying to find out; that's the depth problem. Now, we also have an optical center for this camera, which is here, and rays will be coming out and intersecting through these points. So if we knew that this point in this image was this point in that image, then we just project the rays, find where they intersect, and use simple triangulation, simple maths, to work out how far away that position is. But we don't know what point that is, because it's going to change; it might not be visible in this image, which is one problem; and the search space is quite large. Reliably finding the exact same point as this in a different image, when it might have rotated and changed slightly, is a lot of work in two dimensions, and you've got to do that for every single pixel in this image: you've got to find maybe one that tries to match in here. That's a lot of work to do, so we don't tend to do that. We use a nice observation called epipolar geometry to try and make this a little bit easier.

If this is our intersection point x1, and this is some object x all the way out there, and we're trying to find out how far away it is, we need to try and make our search in this image a little bit easier. What we do is we imagine that this is part of a big triangle coming out: this is one corner of our triangle, this is another. This x is somewhere along here and comes in through this point. So, let me get a different pen and make things easier: we can draw a ray that goes from this optical center to here, and to here, and to any of these points, and they intersect this image like this. What this is is our epipolar line. This line here, through these points, is all the possible projections of this ray into this image. So now we've simplified our problem: because we know where these cameras are, we can say we're trying to find this position x1 in this image by knowing that it's going to be somewhere along here, somewhere along this line. We've got a limited set of pixels we now have to look through, so all we need to do is go through each of these pixels in a list and say which one of them looks most like this; then we find it, we find our triangulation point, and we find out how far away it is.

Is this because you already know where the cameras are? Yes, it's only possible because we know where the cameras are. If we don't, then we have to just search through the whole of the other image, and it takes ages.
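Here is a hedged sketch of how that search-then-triangulate step might look with OpenCV, reusing K1, K2, R, T and F from the calibration sketch above. The pixel coordinates here are invented purely for illustration; in practice x_right would come from scanning along the epipolar line for the best match.

```python
# Sketch: using the epipolar constraint to narrow the search, then
# triangulating. Assumes K1, K2, R, T, F come from a calibration like
# the one above; the pixel coordinates are hypothetical.
import cv2
import numpy as np

# Projection matrices: put the left camera at the origin and the
# right camera at the pose (R, T) recovered by calibration.
P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K2 @ np.hstack([R, T.reshape(3, 1)])

# A point detected in the left image (made-up pixel coordinates)
x_left = np.array([[320.0, 240.0]], np.float32).reshape(-1, 1, 2)

# Its epipolar line in the right image, as a*x + b*y + c = 0.
# The matching point, wherever it is, must lie on this line.
a, b, c = cv2.computeCorrespondEpilines(x_left, 1, F).reshape(3)

# Once a best match x_right has been picked along that line,
# intersect the two rays to recover the 3D position.
x_right = np.array([[290.0], [240.0]], np.float32)
X_h = cv2.triangulatePoints(P1, P2, x_left.reshape(2, 1), x_right)
X = (X_h[:3] / X_h[3]).ravel()  # homogeneous -> Euclidean 3D point
```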
One edge of our triangle is between the optical centers of the cameras; one goes through this point and out into the world; and the other is some value we don't know, which is going to be along this line, because it's just a flat triangle cutting through this image. That makes it a lot easier to find out where these things are. What we will do, if we're writing a stereo reconstruction algorithm, is, for every point in this image (and maybe we'll do it backwards as well, for completeness), try to find the point along its particular epipolar line that best matches it. Then of course you can go much more complicated than that: you can try and find the global image map between here and here, which is a combination of not only the best feature matches but also, you know, it needs to be nice and smooth; objects don't tend to go back and forth a lot, so you want them to be rounded, and you have to bear that in mind. Finding a point in this image based on another one from this image is called the correspondence problem, and that's really the core of what we're solving here. Finding the occluded pixels is hard, and there are approaches based on this where they not only try and find what we call the disparity map, the difference between this x and this x…
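Since the transcript ends on the disparity map, a short sketch of the standard block-matching route may help. It assumes the image pair has already been rectified, so each pixel's epipolar line is simply the same row in the other image; the file paths are placeholders.

```python
# Sketch: computing a disparity map with semi-global block matching.
# Assumes a rectified pair, so the epipolar search is along each row.
import cv2

left = cv2.imread("left_rect.png", cv2.IMREAD_GRAYSCALE)    # placeholder
right = cv2.imread("right_rect.png", cv2.IMREAD_GRAYSCALE)  # paths

matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=64,  # search range along the epipolar line (x16)
    blockSize=7,        # window compared between the two images
    P1=8 * 7 * 7,       # smoothness penalties: neighbouring pixels
    P2=32 * 7 * 7)      # shouldn't jump back and forth in depth

# OpenCV returns fixed-point disparities scaled by 16
disparity = matcher.compute(left, right).astype("float32") / 16.0

# With focal length f (pixels) and baseline B (metres), depth follows
# from the triangulation described above: Z = f * B / disparity.
```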
