Dense Reconstruction¶
As shown in the chapter “Super Panorama”, we obtained a model of all three floors of the “winter garden” scene with about 1,000,000 faces and reasonable quality.
We want our image-based reconstruction to produce a model of similar quality. We offer two test datasets: a small indoor scene (code “cuxz”) and an indoor garden scene (code 4em9).
The main problems with the original Colmap dense reconstruction results are:
- Reflections from the smooth floor or windows. ==> Deep learning image segmentation of the floor, followed by a point cloud post-processing step.
- A large amount of noise in textured areas, and very unsatisfying edges. ==> Total Variation reconstruction to preserve sharp edges.
- We also tested a few deep learning reconstruction methods, but their results were not satisfying.
1. Colmap MVS¶
- Use the Colmap MVS results (computed with the PatchMatch algorithm).
- Build the model with Poisson reconstruction.
- Simplify the model using MeshLab Quadric Edge Collapse Decimation.
We get a quite satisfying result for our garden scene (the one-layer garden part, built from about 1200 images). The model has about 11,000,000 faces before simplification and 200,000 after simplification.
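The first two steps above can be sketched as a sequence of COLMAP command-line calls. This is a minimal sketch, not the exact commands we ran: the directory layout (`images`, `sparse/0`, `dense`) and the helper names `colmap_dense_commands` and `run_all` are assumptions made for illustration. The decimation step is done interactively in MeshLab and is not covered here.

```python
from pathlib import Path
import subprocess


def colmap_dense_commands(dataset_path):
    """Build the COLMAP CLI calls for dense reconstruction and Poisson meshing.

    Assumed (hypothetical) layout: <dataset>/images holds the input images and
    <dataset>/sparse/0 holds an existing sparse reconstruction.
    """
    root = Path(dataset_path)
    dense = root / "dense"
    fused = dense / "fused.ply"
    return [
        # Undistort images and prepare the dense workspace.
        ["colmap", "image_undistorter",
         "--image_path", str(root / "images"),
         "--input_path", str(root / "sparse" / "0"),
         "--output_path", str(dense),
         "--output_type", "COLMAP"],
        # PatchMatch stereo: per-image depth and normal maps.
        ["colmap", "patch_match_stereo",
         "--workspace_path", str(dense),
         "--workspace_format", "COLMAP",
         "--PatchMatchStereo.geom_consistency", "true"],
        # Fuse the depth maps into one point cloud.
        ["colmap", "stereo_fusion",
         "--workspace_path", str(dense),
         "--workspace_format", "COLMAP",
         "--input_type", "geometric",
         "--output_path", str(fused)],
        # Poisson surface reconstruction on the fused cloud.
        ["colmap", "poisson_mesher",
         "--input_path", str(fused),
         "--output_path", str(dense / "meshed-poisson.ply")],
    ]


def run_all(dataset_path):
    """Run every step in order and stop at the first failure."""
    for cmd in colmap_dense_commands(dataset_path):
        subprocess.run(cmd, check=True)
```

Building the argument lists separately from running them makes it easy to print and check the commands before starting a run that takes hours.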
Problems :
- Still too many faces; the model needs to be about O(100,000) faces.
- The depth estimation is incomplete, which results in holes. ==> Try TV reconstruction.
- Reflections on the ground and some textureless areas lead to poor reconstruction. ==> Use deep learning image segmentation.
2. Deep Learning¶
- Deep learning MVS method.
- Depth Completion
The deep learning methods are just not stable enough, and training on every dataset is too expensive.
2.1 DeepMVS¶
We tried DeepMVS on our scene.
Problems:
- It only captures relative depth relationships, not the real distance (see more in my report).
- It gives good results only in some scenes and cannot be applied to general cases. This greatly limits its application, as training on a new scene is costly (mainly the cost of building the dataset).
2.2 NetMVS¶
Problems:
- The official NetMVS shows great results, but we found its test data far too simple. When we tested it on our own scene, it produced a terrible result (see my report jupyter notebook).
- The algorithm (we used a PyTorch implementation) needs too much GPU memory. The official results are built with D=256 (see the explanation of the parameter in the project), while on our 8 GB GTX 1080 GPU we could only add 10 source images with D set to 80, which may explain the poor result.
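To see why D matters so much for memory, here is a rough estimate of the size of one plane-sweep cost volume. It is only a sketch: the defaults (features at 1/4 resolution, 32 channels, float32) and the 640x512 image size are assumptions for illustration, not values read from the implementation we used, and a real network keeps several such volumes plus activations.

```python
def cost_volume_bytes(height, width, depth_planes,
                      channels=32, downscale=4, bytes_per_value=4):
    """Approximate size in bytes of one D x H' x W' x C cost volume.

    All defaults are assumptions: features at 1/downscale resolution,
    `channels` feature channels, float32 values.
    """
    h = height // downscale
    w = width // downscale
    return depth_planes * h * w * channels * bytes_per_value


# Hypothetical 640x512 input image.
full = cost_volume_bytes(512, 640, depth_planes=256)
reduced = cost_volume_bytes(512, 640, depth_planes=80)
print(f"D=256: {full / 2**20:.0f} MiB, D=80: {reduced / 2**20:.0f} MiB")
print(f"ratio: {reduced / full:.4f}")
```

The volume grows linearly with D, so dropping from 256 to 80 depth planes keeps about 31% of the memory, at the price of a much coarser depth sampling.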
2.3 CSPN¶
We tested the pretrained model of CSPN (github project). Our results can be seen here. The results are not satisfying.
2.4 Sparse-to-Dense¶
We tested the pretrained model of sparse-to-dense (github project). Our results can be seen here. The results are not satisfying.
2.5 DeMoN¶
2.6 Video¶
2.7 Hybrid-method¶
Consistent Video Depth Estimation combines several deep learning methods to achieve relatively good results.
- Step 1. Colmap: sparse reconstruction.
- Step 2. FlowNet2: estimate optical flow, used for a flow-displacement consistency evaluation.
- Step 3. Fine-tune (train) Monodepth2: use the optical flow displacement consistency (obtained in Step 2) and the geometric consistency (using the 3D SfM physical model obtained in Step 1).
- Step 4. Scale calibration between the Monodepth2 depth and the Colmap depth.
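Step 4 is needed because a monocular network predicts depth only up to an unknown scale. One simple way to do the calibration, shown here as a sketch, is to take the median ratio between the Colmap depth and the predicted depth at pixels where both are valid. The median estimator and the function names are assumptions; the source does not state which estimator the method actually uses.

```python
from statistics import median


def estimate_scale(mono_depths, colmap_depths, min_depth=1e-6):
    """Median of colmap/mono ratios over pixels valid in both depth maps.

    Depths at or below `min_depth` are treated as invalid (missing) values.
    """
    ratios = [
        c / m
        for m, c in zip(mono_depths, colmap_depths)
        if m > min_depth and c > min_depth
    ]
    if not ratios:
        raise ValueError("no valid depth pairs")
    return median(ratios)


def rescale(mono_depths, scale):
    """Bring the monocular depths into the Colmap scale."""
    return [d * scale for d in mono_depths]


# Flattened toy depth maps: one missing pixel (0) and one outlier (30.0).
mono = [1.0, 2.0, 4.0, 0.0, 3.0]
colmap = [2.5, 5.0, 10.0, 7.0, 30.0]
scale = estimate_scale(mono, colmap)
print(scale, rescale(mono, scale))
```

The median is used instead of the mean so that a few wrong Colmap depths, for example on reflective surfaces, do not bias the scale.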
3. Our process¶
Step 1. Semantic segmentation¶
We use the PyTorch Encoding library, which offers image segmentation models for two datasets (ADE20K for indoor scenes and Pascal Context for outdoor scenes). We use the best entry in its results table: the ResNet + DeepLab models. We found that the models pretrained on ADE20K are very suitable for our task. Our result for the Indoor Garden scene is available with code ipju.
Step 2. Floor repair¶
The floor repair process:
- Extract the floor point cloud using the semantic segmentation results.
- Estimate a plane from these points with RANSAC.
- Filter out the points far from the plane.
- Fill the area using the estimated plane model.
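The steps above can be sketched with NumPy as follows. This is a minimal illustration, not our production code: the threshold, the iteration count, the least-squares refinement, and the choice to "fill" by projecting floor points onto the plane are all assumptions, and `points` is assumed to be the (N, 3) array of points already labelled as floor by the segmentation.

```python
import numpy as np


def fit_plane_ransac(points, threshold=0.02, iterations=200, seed=0):
    """Fit a plane n . p + d = 0 with RANSAC; return (normal, d, inlier_mask)."""
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    for _ in range(iterations):
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:  # collinear sample, no plane defined
            continue
        normal = normal / norm
        d = -normal.dot(sample[0])
        mask = np.abs(points @ normal + d) < threshold
        if mask.sum() > best_mask.sum():
            best_mask = mask
    if best_mask.sum() < 3:
        raise ValueError("could not find a plane")
    # Least-squares refinement on the inliers: the plane normal is the
    # direction of least variance of the centred inlier points.
    inliers = points[best_mask]
    centroid = inliers.mean(axis=0)
    _, _, vt = np.linalg.svd(inliers - centroid)
    normal = vt[-1]
    d = -normal.dot(centroid)
    mask = np.abs(points @ normal + d) < threshold
    return normal, d, mask


def project_to_plane(points, normal, d):
    """Move points onto the plane along its normal."""
    dist = points @ normal + d
    return points - dist[:, None] * normal
```

A typical use is `normal, d, mask = fit_plane_ransac(floor_points)`, then keeping `floor_points[mask]` and replacing the removed region with points generated or projected on the plane. Points caused by reflections usually lie well below the real floor, so they fall outside the threshold and are discarded.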
Step 3. TV Reconstruction¶
To fill the holes in the depth estimation, we apply Total Variation L2 reconstruction (using the ADMM algorithm; see more detail in my convex optimization document) to refine the depth result of Colmap PatchMatch MVS (see the example shown in the jupyter notebook).
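As an illustration of the idea, here is a minimal 1D sketch of TV-L2 denoising solved with ADMM, minimising 0.5 * ||x - y||^2 + lam * ||Dx||_1 where D is the forward difference operator. It is not the notebook code: the real problem is a 2D depth map with missing values, and the parameter values and the dense matrix inverse (only reasonable for short signals) are assumptions chosen for clarity.

```python
import numpy as np


def tv_l2_denoise(y, lam=1.0, rho=1.0, iterations=300):
    """1D TV-L2 denoising with ADMM, using the splitting z = D x."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = np.diff(np.eye(n), axis=0)  # (n-1, n): (D x)[i] = x[i+1] - x[i]
    # The x-update solves (I + rho D^T D) x = y + rho D^T (z - u).
    A_inv = np.linalg.inv(np.eye(n) + rho * D.T @ D)
    x = y.copy()
    z = D @ x
    u = np.zeros(n - 1)  # scaled dual variable
    for _ in range(iterations):
        x = A_inv @ (y + rho * D.T @ (z - u))
        v = D @ x + u
        # z-update: soft-thresholding, the proximal operator of the L1 norm.
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        u = u + D @ x - z
    return x


# Noisy step edge: the flat parts are smoothed while the jump is kept.
rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(20), np.ones(20)])
noisy = clean + 0.1 * rng.standard_normal(40)
smooth = tv_l2_denoise(noisy, lam=0.5)
```

The L1 penalty on the differences is what preserves sharp edges: a quadratic penalty would blur the step, while TV only shrinks its height slightly.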
Problems:
- Too slow. ==> Use other, faster algorithms.
- Still needs refinement. We tried using deep learning image segmentation labels (higher TV weight for pixels with the same label), but it didn’t end up well (example).
Step 4. TSDF Reconstruction¶
We use TSDF (truncated signed distance function) reconstruction to build our mesh model.
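The core of TSDF fusion is a weighted running average of truncated signed distances per voxel. The sketch below shows it in 1D, along a single camera ray, to keep it short. It is only an illustration under stated assumptions: a real pipeline uses a 3D voxel grid and projects each voxel into every depth map, and the truncation distance, uniform weights, and function names here are made up for the example.

```python
import numpy as np


def integrate_tsdf(voxel_depths, observations, trunc=0.1):
    """Fuse several depth observations along one ray into a 1D TSDF."""
    voxel_depths = np.asarray(voxel_depths, dtype=float)
    tsdf = np.ones_like(voxel_depths)
    weight = np.zeros_like(voxel_depths)
    for d in observations:
        sdf = d - voxel_depths           # positive in front of the surface
        visible = sdf > -trunc           # skip voxels far behind the surface
        value = np.clip(sdf / trunc, -1.0, 1.0)
        new_weight = weight + visible    # each observation has weight 1
        fused = (tsdf * weight + value) / np.maximum(new_weight, 1.0)
        tsdf = np.where(visible, fused, tsdf)
        weight = new_weight
    return tsdf, weight


def zero_crossing(voxel_depths, tsdf, weight):
    """Depth of the first sign change from positive to non-positive, or None."""
    for i in range(len(tsdf) - 1):
        if weight[i] > 0 and weight[i + 1] > 0 and tsdf[i] > 0 >= tsdf[i + 1]:
            t = tsdf[i] / (tsdf[i] - tsdf[i + 1])
            return voxel_depths[i] + t * (voxel_depths[i + 1] - voxel_depths[i])
    return None


voxels = np.linspace(0.0, 2.0, 201)
tsdf, weight = integrate_tsdf(voxels, [0.98, 1.00, 1.02, 1.01, 0.99])
surface = zero_crossing(voxels, tsdf, weight)
```

Averaging the noisy observations in the distance field, rather than averaging points, is what makes the extracted surface smooth; in 3D the zero level set is then meshed, for example with marching cubes.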
Step 5. Post-process¶
Post-processing of the TSDF mesh result:
- Remove isolated pieces (wrt face number): 25
- Cut the undesired parts
- Simplification: quadric edge collapse decimation (0.1 reduction, planar simplification)
Finally we got a model with 118,403 faces. The results can be found on Baidu Yun (code 28gp).
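The first step above can be sketched in plain Python. We did this step with a mesh tool, so the code below only illustrates the idea: it assumes the value 25 is a minimum face count per connected component, and it treats faces as connected when they share a vertex, which may differ from the adjacency rule of the tool.

```python
from collections import Counter


def remove_small_components(faces, min_faces=25):
    """Drop connected components that have fewer than `min_faces` faces.

    `faces` is a list of (a, b, c) vertex-index triples. Components are
    found with union-find over the vertices.
    """
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for a, b, c in faces:
        union(a, b)
        union(b, c)

    roots = [find(face[0]) for face in faces]
    counts = Counter(roots)
    return [face for face, root in zip(faces, roots) if counts[root] >= min_faces]


# A strip of 30 connected triangles plus a floating piece of 2 triangles.
strip = [(i, i + 1, i + 2) for i in range(30)]
floater = [(100, 101, 102), (101, 102, 103)]
kept = remove_small_components(strip + floater, min_faces=25)
print(len(kept))
```

Small floating pieces are typical TSDF artifacts from noisy depth values, and removing them before decimation avoids spending the face budget on them.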