Commit 13521c6 (initial commit): 51 changed files with 3,173 additions and 0 deletions.
MIT License

Copyright (c) 2020 Zhihao Xia, Patrick Sullivan, Ayan Chakrabarti

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# Generating and Exploiting Probabilistic Monocular Depth Estimates
### [Project Page](https://projects.ayanc.org/prdepth) | [Video](https://youtu.be/lNw9326KSlU) | [Paper](https://openaccess.thecvf.com/content_CVPR_2020/html/Xia_Generating_and_Exploiting_Probabilistic_Monocular_Depth_Estimates_CVPR_2020_paper.html)
Tensorflow implementation of training and utilizing a neural depth sampler for a variety of depth applications.<br>

[Generating and Exploiting Probabilistic Monocular Depth Estimates](https://projects.ayanc.org/prdepth) (CVPR 2020, **Oral Presentation**) <br>
[Zhihao Xia](https://www.cse.wustl.edu/~zhihao.xia/)<sup>1</sup>,
[Patrick Sullivan](https://github.com/algoterranean)<sup>2</sup>,
[Ayan Chakrabarti](https://projects.ayanc.org/)<sup>1</sup> <br>
<sup>1</sup>WUSTL, <sup>2</sup>The Boeing Company

<img src='example/figure.jpg'/>

## Setup

Python 3 dependencies:

* tensorflow 1.15
* tensorflow-probability
* matplotlib
* numpy
* imageio

We provide a conda environment setup file including all of the above dependencies. Create the conda environment `prdepth` by running:
```
conda env create -f prdepth/environment.yml
conda activate prdepth
```

## Download pre-trained models
Our pre-trained models, along with the pre-trained DORN feature extractor, which is converted from the [caffe model](https://github.com/hufu6371/DORN) provided by the [DORN paper](https://arxiv.org/pdf/1806.02446.pdf), can be found [here](https://github.com/likesum/prdepth/releases/download/v1.0/trained_models.zip). You can download and unzip them by running
```
bash ./scripts/download_models.sh
```
We provide example input images under `example/inputs/`. To test the pre-trained model on your own data, resize your images to the NYUv2 resolution and rename the input images (RGB, sparse depth, low-resolution depth, etc.) accordingly. Please see the code and `example/inputs` for details, and the sketch below for one way to prepare an image.
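For example, a minimal resizing sketch that uses only the listed dependencies. The output filename suffix below is only a placeholder; match the naming actually used in `example/inputs` for your input type.
```
# Minimal sketch (not part of the repo): nearest-neighbor resize to the
# NYUv2 resolution using only numpy + imageio.
import numpy as np
import imageio

NYU_H, NYU_W = 480, 640  # NYUv2 frames are 480x640 (rows x cols)

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize implemented with plain numpy indexing."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows[:, None], cols[None, :]]

rgb = imageio.imread('my_scene.png')                   # hypothetical input image
imageio.imwrite('example/inputs/my_scene_rgb.png',     # assumed naming pattern
                resize_nearest(rgb, NYU_H, NYU_W))
```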

## Depth estimation with a single RGB image
Our neural sampler (VAE) is trained on a standard monocular RGB-D dataset, but can be used for a diverse set of applications, including monocular depth estimation and depth estimation with additional information. When no additional information is available, our model can perform general monocular inference tasks beyond per-pixel depth.

### Monocular depth estimation
In the standard monocular depth estimation setting, when only an RGB image is available, run
```
python test_VAE.py --save_dir example/outputs
```
to get a single depth estimate (the mean prediction). To show the diversity of our distributional output, this script also returns the results of ideally selected (best) samples and adversarially selected (worst) samples.
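Conceptually, "ideally selected" and "adversarially selected" correspond to picking the sampled prediction with the lowest and highest error against the ground truth. A toy illustration with made-up shapes (the script's actual selection may differ, e.g. operate per patch):
```
import numpy as np

samples = np.random.rand(100, 480, 640)   # hypothetical stack of N sampled depth maps
gt = np.random.rand(480, 640)             # hypothetical ground-truth depth

rmse = np.sqrt(((samples - gt[None]) ** 2).mean(axis=(1, 2)))
mean_pred = samples.mean(axis=0)          # the "mean prediction"
best_pred = samples[rmse.argmin()]        # ideally selected sample
worst_pred = samples[rmse.argmax()]       # adversarially selected sample
```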

### Pair-wise relative depth estimation
Beyond metric depth, our model can also be used to predict which of two points in the scene is closer to the camera.
Run
```
python original_estimation.py
```
to get the accuracy of relative depth estimation using our distributional output, compared to using the ordering of the corresponding depth values in a single monocular depth map estimate.
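Conceptually, the distributional output turns the comparison into a vote over samples rather than a comparison of two values in one map. A toy illustration (shapes and pixel locations are made up):
```
import numpy as np

samples = np.random.rand(100, 480, 640)     # hypothetical stack of N sampled depth maps
p1, p2 = (100, 200), (300, 400)             # two query pixels (row, col)

d1 = samples[:, p1[0], p1[1]]
d2 = samples[:, p2[0], p2[1]]
prob_p1_closer = (d1 < d2).mean()           # distributional vote
single_map_says_p1_closer = samples.mean(0)[p1] < samples.mean(0)[p2]
```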

## Depth estimation with additional information
We provide the implementation to combine our distributional output (from the monocular cue) with a variety of additional partial information about depth and derive a more accurate scene depth estimate, *without retraining*. All applications use the same single model; to get predictions for a different application or setting, simply change the input. The hyper-parameters for each application, found on our validation dataset, are provided in the corresponding script. Each script alternates between updating a global depth estimate and re-selecting, for each patch, the sample that best agrees with it; a schematic of that loop follows this paragraph.
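For intuition, here is a much-simplified stand-in for what these optimizers do, written for per-pixel "patches" and a sparse-depth data term. The real optimizers work on overlapping patch distributions with application-specific costs, so treat this only as a schematic under those simplifying assumptions:
```
import numpy as np

def fuse(samples, sparse_depth, sparse_mask, lmd=150.0, max_iter=200, tol=1e-8):
    """samples: (N, H, W) sampled depths; sparse_depth, sparse_mask: (H, W)."""
    global_est = samples.mean(axis=0)                  # start from the mean prediction
    prev = global_est
    for _ in range(max_iter):
        # (1) Sample selection: per pixel, the sample nearest the global estimate.
        idx = np.abs(samples - global_est[None]).argmin(axis=0)
        selected = np.take_along_axis(samples, idx[None], axis=0)[0]
        # (2) Global update: agree with the selection and the partial measurements.
        global_est = np.where(
            sparse_mask, (selected + lmd * sparse_depth) / (1.0 + lmd), selected)
        diff = np.abs(global_est - prev).max()
        prev = global_est
        if diff < tol:
            break
    return global_est

# Usage (hypothetical inputs):
# fused = fuse(samples, sparse_depth, sparse_mask)
```
The control flow matches the loops in the application scripts included in this commit, which alternate `update_global_estimation` and `update_sample_selection` until the change falls below a tolerance.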

### Sparse to dense with arbitrary sparse measurements
Consider the task of estimating the depth map when the inputs are a color image and a sparse depth map. Run
```
python sparse_to_dense.py --save_dir example/outputs [--nsparse num_of_sparse_points]
```
and compare the result with our monocular estimate (mean prediction).

### Sparse to dense with confidence-guided sampling
Our distributional output provides a spatial map of the relative monocular ambiguity in depth at different locations, which can guide where to take depth measurements given a fixed budget. Run
```
python guided_sampling.py --save_dir example/outputs [--nsparse num_of_sparse_points]
```
to get the guided sparse measurements and the dense depth estimate computed from them. Compare it with the output of sparse to dense with arbitrary sparse measurements. A toy version of the guidance step is sketched below.
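The sketch uses the per-pixel spread of the sampled depths as the ambiguity map; the array shapes and the plain top-k selection are assumptions, and the script may choose measurement sites differently (e.g. enforcing spatial spread):
```
import numpy as np

samples = np.random.rand(100, 480, 640)       # hypothetical stack of N sampled depth maps
nsparse = 100                                 # measurement budget

ambiguity = samples.std(axis=0)               # high spread = high monocular ambiguity
flat_idx = np.argsort(ambiguity, axis=None)[-nsparse:]
rows, cols = np.unravel_index(flat_idx, ambiguity.shape)
# (rows, cols) are the pixels where sparse depth should be measured.
```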

### Depth upsampling
Consider the task of depth super-resolution from a low-resolution depth map along with a color image. Run
```
python depth_upsample.py --save_dir example/outputs [--factor downsampling_factor]
```

### Depth uncropping
Consider the task of extrapolating depth measurements from a sensor with a small field-of-view (FOV), or measurements along a single line, given a color image with a larger FOV. Run
```
python depth_uncrop.py --save_dir example/outputs [--height height_of_crop] [--width width_of_crop]
```

### Diverse global estimates for the user to select
Our model can provide a set of diverse global depth estimates for a user to select from, given a single image. We simulate the user selection by picking the most accurate estimate. Run
```
python diverse_estimaion.py --save_dir example/outputs [--n_estimation num_of_diverse_estimations]
```

### Depth estimation with user annotations of erroneous regions
As an extension, we also consider obtaining user annotations of high-error regions in each estimate (locations only, not depth values). We simulate these annotations by comparing the estimate with the ground-truth depth, as sketched below. Run
```
python interactive_estimation.py --save_dir example/outputs [--n_estimation num_of_diverse_estimations]
```
and compare the result with the diverse estimates obtained without user annotations.
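One simple way to simulate such an annotation, with made-up shapes and an assumed worst-5% criterion (not necessarily what `interactive_estimation.py` uses):
```
import numpy as np

estimate = np.random.rand(480, 640)   # hypothetical current depth estimate
gt = np.random.rand(480, 640)         # hypothetical ground-truth depth

error = np.abs(estimate - gt)
threshold = np.percentile(error, 95)  # assumed criterion: worst 5% of pixels
annotated_mask = error > threshold    # locations flagged as erroneous, no depth values
```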

## Training your own models
Our model is trained on the standard [NYUv2 RGB-D dataset](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html). To train your own model, download the dataset and update `data/NYUtrain.txt` and `data/NYUval.txt` with the path to each training and validation image. Then run
```
python train_VAE.py
```
You can press `ctrl-c` at any time to stop training and save a checkpoint (model weights and optimizer state). The training script resumes from the latest checkpoint (if any) in the model directory and continues training.
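This checkpoint-and-resume behaviour follows the standard TF1 pattern; the sketch below is a generic illustration of that pattern, with a placeholder directory and a dummy variable, not the repo's `train_VAE.py`:
```
import os
import tensorflow as tf

MODEL_DIR = 'trained_models/vae'          # hypothetical model directory
os.makedirs(MODEL_DIR, exist_ok=True)

weights = tf.Variable(0.0)                # stand-in for the model's variables
saver = tf.train.Saver(max_to_keep=2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    latest = tf.train.latest_checkpoint(MODEL_DIR)
    if latest is not None:
        saver.restore(sess, latest)       # resume from the latest checkpoint
    try:
        for step in range(1000):
            pass                          # sess.run(train_op) in a real training loop
    except KeyboardInterrupt:
        pass                              # ctrl-c falls through to the save below
    saver.save(sess, os.path.join(MODEL_DIR, 'model.ckpt'))
```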

Since our VAE uses a pre-trained [DORN](https://github.com/hufu6371/DORN) model as the feature extractor, to train our model on your own dataset you might need to train a DORN model first.

## Citation
If you find the code useful for your research, we request that you cite the paper. Please contact zhihao.xia@wustl.edu with any questions.
```
@inproceedings{xia2020generating,
  title={Generating and Exploiting Probabilistic Monocular Depth Estimates},
  author={Xia, Zhihao and Sullivan, Patrick and Chakrabarti, Ayan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={65--74},
  year={2020}
}
```

## License
The code in `prdepth/tf_extract_grad.py` is taken from an [older version of Tensorflow](https://github.com/tensorflow/tensorflow/blob/r1.7/tensorflow/python/ops/array_grad.py#L725). Our TF implementation of DORN is based on the [caffe implementation](https://github.com/hufu6371/DORN) provided by the authors. The rest of the code is licensed under the MIT License.
path-to-NYUv2/kitchen_0031/00581
path-to-NYUv2/classroom_0004/00031
example/inputs/00959
#!/usr/bin/env python3

import os
import argparse

import numpy as np
import tensorflow as tf

from prdepth import sampler
from prdepth import metric
import prdepth.utils as ut
from prdepth.optimization.uncrop_optimizer import UncropOptimizer as Optimizer


parser = argparse.ArgumentParser()
parser.add_argument(
    '--height', default=120, type=int, help='height of the cropped depth')
parser.add_argument(
    '--width', default=160, type=int, help='width of the cropped depth')
parser.add_argument(
    '--save_dir', default=None, help='directory in which to save predictions')
opts = parser.parse_args()
save_dir = opts.save_dir

TLIST = 'data/test.txt'
MAXITER = 200
TOLERANCE = 1e-8
LMD = 150.

#########################################################################

depth_sampler = sampler.Sampler(nsamples=100, read_gt=True)

optimizer = Optimizer(depth_sampler, LMD)

sess = tf.Session()
depth_sampler.load_model(sess)

#########################################################################
# Main Loop
flist = [i.strip('\n') for i in open(TLIST).readlines()]
depths, preds, masks = [], [], []

for filename in flist:
    # Run VAE to sample patch-wise predictions.
    depth_sampler.sample_predictions(filename, sess)

    # Load cropped depth.
    depth = sess.run(depth_sampler.image_depth).squeeze()
    cropped_depth = ut.read_depth(
        filename + '_crop%dx%d.png' % (opts.height, opts.width))

    optimizer.initialize(sess)
    optimizer.compute_additional_cost(cropped_depth, sess)

    # Alternate between updating the global estimate and re-selecting samples.
    for i in range(MAXITER):
        global_current = optimizer.update_global_estimation(sess)
        diff = optimizer.update_sample_selection(sess)

        if diff < TOLERANCE:
            break
    pred = optimizer.update_global_estimation(sess)

    pred = np.clip(pred.squeeze(), 0.01, 1.).astype(np.float64)
    # Keep the measured depth where it is available; fill in elsewhere.
    pred[cropped_depth > 0] = cropped_depth[cropped_depth > 0]
    preds.append(pred)

    depth = sess.run(depth_sampler.image_depth).squeeze().astype(np.float64)
    depths.append(depth)
    masks.append(cropped_depth == 0)

    if save_dir is not None:
        nm = os.path.join(save_dir, os.path.basename(filename))
        min_depth = np.maximum(0.01, np.min(depth))
        max_depth = np.minimum(1., np.max(depth))
        ut.save_color_depth(nm + '_gt.png', depth, min_depth, max_depth)
        ut.save_color_depth(nm + '_uncropped.png', pred, min_depth, max_depth)

# Metrics computed only on filled-in regions.
metrics = metric.get_metrics(depths, preds, projection_mask=True, masks=masks)
for k in metric.METRIC_NAMES:
    print("%s: %.3f" % (k, metrics[k]))
#!/usr/bin/env python3

import os
import argparse

import numpy as np
import tensorflow as tf

from prdepth import sampler
from prdepth import metric
import prdepth.utils as ut
from prdepth.optimization.s2d_optimizer import UpsamplingOptimizer as Optimizer


parser = argparse.ArgumentParser()
parser.add_argument(
    '--factor', default=48, type=int, help='super-resolution factor')
parser.add_argument(
    '--save_dir', default=None, help='directory in which to save predictions')
opts = parser.parse_args()
save_dir = opts.save_dir

TLIST = 'data/test.txt'
MAXITER = 200
TOLERANCE = 1e-8
# Hyper-parameters tuned on the validation set for each upsampling factor.
if opts.factor == 48:
    GAMMA = 0.3
    NUM_GD_STEPS = 3
else:
    GAMMA = 0.2
    NUM_GD_STEPS = 1

#########################################################################

depth_sampler = sampler.Sampler(nsamples=100, read_gt=True)

optimizer = Optimizer(depth_sampler)

sess = tf.Session()
depth_sampler.load_model(sess)

#########################################################################
# Main Loop
flist = [i.strip('\n') for i in open(TLIST).readlines()]
depths, preds = [], []

for filename in flist:
    # Load downsampled input.
    lowres_depth = ut.read_depth(filename + '_lowres%dx.png' % opts.factor)

    # Run VAE to sample patch-wise predictions.
    depth_sampler.sample_predictions(filename, sess)

    optimizer.initialize(sess)

    # Alternate between updating the global estimate and re-selecting samples.
    for i in range(MAXITER):
        global_current = optimizer.update_global_estimation(
            lowres_depth, GAMMA, NUM_GD_STEPS, sess)
        diff = optimizer.update_sample_selection(global_current, sess)

        if diff < TOLERANCE:
            break
    pred = optimizer.update_global_estimation(
        lowres_depth, GAMMA, NUM_GD_STEPS, sess)

    pred = np.clip(pred.squeeze(), 0.01, 1.).astype(np.float64)
    preds.append(pred)

    depth = sess.run(depth_sampler.image_depth).squeeze().astype(np.float64)
    depths.append(depth)

    if save_dir is not None:
        nm = os.path.join(save_dir, os.path.basename(filename))
        min_depth = np.maximum(0.01, np.min(depth))
        max_depth = np.minimum(1., np.max(depth))
        ut.save_color_depth(nm + '_gt.png', depth, min_depth, max_depth)
        ut.save_color_depth(nm + '_upsampled.png', pred, min_depth, max_depth)

metrics = metric.get_metrics(depths, preds, projection_mask=True)
for k in metric.METRIC_NAMES:
    print("%s: %.3f" % (k, metrics[k]))