The steps required for 3D reconstruction.
There are many ways to reconstruct the world around us, but they all come down to obtaining a depth map.
A depth map is a picture where every pixel has depth information (instead of color information). It is normally represented like a grayscale picture.
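To make the grayscale representation concrete, here is a minimal sketch in plain Python. The function name `depth_to_grayscale`, the sample depths, and the "near is bright, far is dark" convention are assumptions chosen for illustration; real depth maps may use the opposite convention or different units.

```python
def depth_to_grayscale(depth_m, max_depth_m):
    """Map depths in metres to 8-bit gray levels (assumed: near = bright, far = dark)."""
    gray = []
    for row in depth_m:
        gray_row = []
        for d in row:
            d = min(max(d, 0.0), max_depth_m)  # clamp to the valid range
            gray_row.append(round(255 * (1.0 - d / max_depth_m)))
        gray.append(gray_row)
    return gray


# A tiny 2x3 "depth map": every pixel stores a distance instead of a color.
depth_m = [[0.5, 1.0, 2.0],
           [1.0, 2.0, 4.0]]
gray = depth_to_grayscale(depth_m, max_depth_m=4.0)
```

Each value in `gray` can be stored as one pixel of an ordinary 8-bit grayscale image.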
As mentioned before, there are different ways to obtain a depth map, and these depend on the sensor being used. The sensor could be a simple camera (called an RGB camera from now on in this text), but it is also possible to use others such as LiDAR, infrared, or a combination.
The type of sensor will determine the accuracy of the depth map. In terms of accuracy it normally goes like this: LiDAR > Infrared > Cameras. Depth maps can also be colorized to better visualize depth.
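As a rough illustration of colorizing, the sketch below maps 8-bit gray levels to a simple blue-to-red ramp. The function name `colorize_depth` and the ramp itself are assumptions; visualization tools usually offer richer color maps.

```python
def colorize_depth(gray):
    """Turn 8-bit gray levels into (R, G, B) tuples: bright = red, dark = blue."""
    return [[(g, 0, 255 - g) for g in row] for row in gray]


# Hypothetical 1x3 grayscale depth map (255 = nearest, 0 = farthest here).
colored = colorize_depth([[255, 128, 0]])
```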
Depending on the kind of sensor used, more or fewer steps are required to get the depth map. The Kinect, for example, combines an infrared sensor with an RGB camera, so you get a depth map right away (it is the information produced by the infrared sensor).
But what if you have nothing but your phone camera? In this case you need to do stereo reconstruction, which uses the same principle your brain and eyes use to perceive depth.
The gist of it is to look at the same scene from two different angles, find the same thing in both pictures, and infer depth from the difference in its position. This is called stereo matching.
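The idea can be sketched on a single row of pixels. The example below is a simplified illustration, not the method used later in this series: it assumes the two pictures are already aligned so that a feature only shifts horizontally, it compares small windows with a sum of absolute differences, and it converts the shift (the disparity) to depth with the standard relation depth = focal length × baseline / disparity. The function names, pixel values, focal length, and baseline are all hypothetical.

```python
def match_disparity(left_row, right_row, x, half_window=1, max_disparity=4):
    """Find how far the patch around left_row[x] has shifted in right_row.

    Compares small windows using the sum of absolute differences (SAD).
    The caller must keep x + half_window inside the row.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        if x - d - half_window < 0:  # window would fall off the left edge
            break
        cost = sum(
            abs(left_row[x + k] - right_row[x - d + k])
            for k in range(-half_window, half_window + 1)
        )
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Larger shift between the two views means the object is closer."""
    return focal_px * baseline_m / disparity_px


# The bright feature sits at x=6 in the left row and at x=3 in the right row.
left_row = [10, 10, 10, 10, 10, 80, 200, 80, 10, 10]
right_row = [10, 10, 80, 200, 80, 10, 10, 10, 10, 10]

disparity = match_disparity(left_row, right_row, x=6)
depth = depth_from_disparity(disparity, focal_px=600.0, baseline_m=0.1)
```

Real stereo matchers run this kind of search for every pixel and add smoothing and validity checks, but the principle is the same.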
To do stereo matching, it is important that both pictures have exactly the same characteristics. Put differently, neither picture should have any distortion. This is a problem because the lens in most cameras causes distortion. To do stereo matching accurately, one therefore needs to know the focal length and optical center of the camera, so that the distortion can be measured and removed.
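To see where the focal length and optical center enter the picture, here is a minimal sketch of the common pinhole camera model with a single radial distortion term. The function name `project_point`, the coefficient `k1`, and all numeric values are assumptions for illustration; real lenses usually need more coefficients.

```python
def project_point(point_3d, fx, fy, cx, cy, k1=0.0):
    """Project a 3D point (camera coordinates) to pixel coordinates.

    fx, fy: focal lengths in pixels; cx, cy: optical center in pixels;
    k1: one radial distortion coefficient (0.0 means an ideal lens).
    """
    X, Y, Z = point_3d
    x, y = X / Z, Y / Z        # normalized image coordinates
    r2 = x * x + y * y         # squared distance from the optical axis
    scale = 1.0 + k1 * r2      # radial distortion factor
    u = fx * x * scale + cx
    v = fy * y * scale + cy
    return u, v


point = (0.5, 0.25, 1.0)  # hypothetical point one metre in front of the camera
ideal = project_point(point, fx=800.0, fy=800.0, cx=320.0, cy=240.0)
distorted = project_point(point, fx=800.0, fy=800.0, cx=320.0, cy=240.0, k1=-0.2)
```

The same 3D point lands on different pixels depending on the lens, which is why matching features across two distorted pictures gives unreliable depth.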
In most cases this information will be unknown (especially for your phone camera) and this is why stereo 3D reconstruction requires the following steps:
1. Camera calibration: Use a set of images to infer the focal length and optical center of your camera.
2. Undistort images: Get rid of lens distortion in the pictures used for reconstruction.
3. Feature matching: Look for similar features between both pictures and build a depth map.
4. Reproject points: Use the depth map to reproject pixels into 3D space.
5. Build point cloud: Generate a new file that contains points in 3D space for visualization.
6. Build mesh: Get an actual 3D model (outside the scope of this tutorial, but coming soon in a different tutorial).
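The reprojection and point cloud steps listed above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code used later in this series: it inverts the pinhole model using a focal length (`fx`, `fy`) and optical center (`cx`, `cy`) that would normally come from calibration, and it writes the points as an ASCII PLY file, one commonly used point cloud format. The function names and all numbers are hypothetical.

```python
def reproject_to_3d(depth_m, fx, fy, cx, cy):
    """Turn every pixel with a valid depth into an (x, y, z) point in metres."""
    points = []
    for v, row in enumerate(depth_m):
        for u, z in enumerate(row):
            if z <= 0:  # skip pixels with no valid depth
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def to_ascii_ply(points):
    """Serialize the points as the text of an ASCII PLY file."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    body = [f"{x:.4f} {y:.4f} {z:.4f}" for x, y, z in points]
    return "\n".join(header + body) + "\n"


# Tiny 2x2 depth map; 0.0 marks a pixel where matching found nothing.
depth_m = [[2.0, 0.0],
           [2.0, 4.0]]
points = reproject_to_3d(depth_m, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
ply_text = to_ascii_ply(points)
```

Writing `ply_text` to a file ending in `.ply` gives something that common point cloud viewers can open.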
Step 1 only needs to be executed once unless you change cameras. Steps 2–5 are required every time you take a new pair of pictures, and that is pretty much it.
The actual mathematical theory (the why) is much more complicated, but it will be easier to tackle after this tutorial, since by the end of it you will have a working example to experiment with.
In the next part we will explore how to actually calibrate a phone camera, along with some best practices for calibration. See you then.